Appendix S1.Further details on analysis methods

Variable transformations and Factor Analysis

The factor analyses were justified as indicated by the Kaiser-Meyer-Olkin measure of sampling adequacy (human impact variables: 0.86; environmental variables: 0.59; a minimum of 0.5 is usually considered indicative of data being appropriate for FA, McGregor 1992) as well as Bartlett's test of sphericity (human impact variables: 2=1189.9, df=19, P<0.001; environmental variables: 2=1147.2, df=28, P<0.001; McGregor 1992).

The third environmental factor (EFactor3) and the second human impact factor (HFactor2) revealed from the FA were square root transformed to achieve more symmetrical distributions before using them as predictors in the GLM.

Accounting for spatial autocorrelation

The abundances investigated were likely to show some spatial autocorrelation (i.e., neighboring transects revealing more similar values than more distant ones) unexplained by the predictor variables included in the model. Such autocorrelation leads to spatially non-independent residuals, violating a crucial prerequisite of any linear model and, hence, devalues its reliability. We thus explicitly incorporated spatial autocorrelation into the model. We did this using the following approach: First, we ran the full model with all environmental gradients, the squared terms, the species, the interactions and the offset term(log transformed transect length) and transect ID included and derived the residuals from it. Then we calculated, separately for each for each individual data point (i.e., combination of species and individual transect), the weighted average of the residuals of all other data points of the same species. The weight used was inversely related to the distance between two transects and followed a Gaussian function. The mean of this function was set to zero (i.e., maximum weight at a distance of zero) and its standard deviation was determined by maximizing the likelihood of the full model with the derived autocorrelation term included.

Species-specific analyses

To assess the impact of the different covariates on the abundances of the different species and to understand the complex patterns of species specific impacts of the investigated gradients we ran separate models for each species and investigated their results with regard to direction, magnitude and significance of estimates. From these models we removed squared terms which were clearly not significant (p>0.5). The autocorrelation term included into these models was that derived from the full model ran for all species and transect length was again included as an offset variable.

Species model predictions used for identifying core distributional range areas

After extracting the variables for each cell in the country (cell size: 0.05 degrees) and transforming them when necessary to achieve approximately symmetrical distributions (Table S1, S2) we subjected them to two separate factor analyses with varimax rotation (conducted using the function fa of the R-package psych; Revelle 2012), one for the environmental variables and one for the human impact variables. These were justified as indicated by the Kaiser-Meyer-Olkin measure of sampling adequacy (environmental variables: 0.421; human impact variables: 0.755) as well as Bartlett's test of sphericity(environmental variables: 2=588.46, df=15, P<0.001; human impact variables: 2=550.05, df=21, P<0.001; McGregor 1992). Both revealed two factors with Eigenvalues larger than one, together explaining roughly 50% of the total variance in the set of variables (Table S1, S2). We extracted the respective scores (thereafter 'factor scores') and used them as predictors in the models.

Out of the four factor scores and their squares we constructed all subsets (i.e., a total of 81 one models; models that included a squared covariate included also the respective covariate unsquared). The squared terms we included to allow for non-linear impacts of the predictors on species abundances (i.e., higher abundance at intermediate values of a covariate). We fitted the models to the transect data (N=266 transects) using Generalized Linear Models (McCullaghNelder 2008) with negative binomial error structure and log link function (function glm.nb of the R package MASS; Venables & Ripley 2002). In some cases a given model could not be fitted. This concerned chimpanzees and cane rat (one model each), Spot nosed monkeys (five models), Giant rat (seven models), Warthog (12 models), Bay duiker (26 models), Buffalo (30 models), and Campbell's monkey (32 models). The respective models we discarded from further consideration. We then determined for each model the predicted values for all the grid cells in the country and also its Akaike weight (Burnham & Anderson 2002) and then averaged the predictions whereby we weighted their contribution by the respective Akaike weights. We used these estimated abundances per species and grid cell as the basis of the spatial prioritization.

Table S1: Results of the factor analysis for the human impact variables. For each variable the largest absolute loading is shown in bold.

variable / transformation / factor 1 / factor 2
tr.ht_30 / square root / -0.630 / 0.177
distance to nearestvillage / square root / 0.782 / 0.198
distanceotmajorroads / square root / 0.609 / -0.363
distance to minorroads / square root / 0.720 / -0.211
humanpopulation density / log / -0.399 / 0.633
minimum distance to protected area / square root / -0.151 / -0.613
humanpopulationchange / square root / -0.295 / 0.538
Eigenvalue / 2.167 / 1.311
prop. variance explained / 0.310 / 0.187

Table S2: Results of the factor analysis for the environmental variables. For each variable the largest absolute loading is shown in bold.

variable / transformation / factor 1 / factor 2
elevation / square root / 0.961 / -0.267
tr.ht_40 / square root / 0.354 / 0.143
CTI / square root / -0.667 / -0.016
ht_60 / -0.210 / -0.549
Precipitation seasonality / square root / -0.342 / 0.504
mean precipitation / -0.021 / 0.997
Eigenvalue / 1.655 / 1.642
prop. variance explained / 0.276 / 0.274

Algorithm for identifying core distributional ranges

We first determined as starting point Qtot for the configuration of the selected 20% of grid cells with maximum relative abundance (Qtot actual; AppendixS2, Fig. S4). In a second step we identified those selected cells that had at least one adjacent unselected cell (adjacent cells were considered those four pixels immediately above, below, to the left and to the right of the pixel, not diagonal). These cells could potentially be dropped from the CDRA ('pixels to be dropped'). In the third step we identified those pixels that were not selected, but that were adjacent to at least one selected pixel following the same criteria as in step two. These were the pixels that could potentially be included in the CDRA ('pixels to be included'). In the fourth step we then calculated Qtot for each combination of pixels to be dropped and pixels to be included and chose that combination that maximized Qtot (Qtot new) by keeping 20% of all pixels. In the fifth step we evaluated if Qtot newQtot actual, and if this was the case we chose the configuration of the new area as updated CDRA and repeated steps one to five; if Qtot newQtot actual we terminated the search and used the actual configuration as final CDRA. The algorithm was implemented in R (R Core Team 2012) and applied separately for each species.

Implementation

All analyses were conducted in R (version 2.11; R Development Core Team 2010 or version 3.1.2; R Core Team 2014). GLMMs were fitted using the functions glmmadmb of the R package glmmADMB(Fournier et al. 2012; Skaug et al. 2014), the FAs were conducted using the function factanal or the function fa of the package psych (Revelle 2012), and diagnostics of the applicability of FAs were derived using the function pafof the package rela (Chajewski 2009). The autocorrelation term and the fitting of the standard deviation of the respective weight function were derived using a self-written R-script. The CCA was conducted and its results were plotted using the functions cca and plot.cca of the R package vegan (Oksanen et al. 2010).

The identification of species core ranges turned out to be computationally intense. We therefore parallelized the algorithm (to do multiple computer operations at the same time) to achieve reasonable computation times using the R-package ‘parallel’. The algorithm finished after 8 to 106 iterations (mean = 50, median = 42; average number of arrangements tested per iteration: 66070) and needed on average 14.9 hours (arithmetic mean) per species to finalize on a quadcore processor with 2.83Ghz.

References

Burnham, KP & Anderson, DR. (2002).Model Selection and Multimodel Inference. 2nd ed. Berlin: Springer.

Chajewski, M. 2009. rela: Scale item analysis. R package version 4.1.

Fournier DA, Skaug HJ, Ancheta J, Ianelli J, Magnusson A, Maunder M, Nielsen A & Sibert J. 2012. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optim. Methods Softw., 27, 233-249.

McGregor PK. 1992. Quantifying responses to playback: one, many, or composite multivariate measures? In: McGregor PK. (Ed.): Playback and Studies of Animal Communication. Plenum Press. New York, London.

Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., O'Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H. & Wagner, H. 2010.vegan: Community Ecology Package. R package version 1.17-4.

R Development Core Team. 2010. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria.R Core Team. 2014. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria.

Skaug H, Fournier D, Bolker B, Magnusson A & Nielsen A. 2014. Generalized Linear Mixed Models using AD Model Builder.R package version 0.8.0.

Revelle, W. 2012.psych: Procedures for Personality and Psychological Research.

Venables, WN & Ripley, BD. 2002. Modern Applied Statistics with S. Fourth Edition. Springer, New York.

1