**Supplementary information**

**Braking effect of climate and topography on global change-induced upslope forest expansion**

Juha M. Alatalo1*, Alessandro Ferrarini2

1 Department of Biological and Environmental Sciences, College of Arts and Sciences, Qatar University, P.O. Box 2713, Doha, Qatar,

2 Via G. Saragat 4, I-43123 Parma, Italy

*Correspondence:

E-mail:

Tel: (+974) 4403-6823

**Text S1. Detailed Methods**

**1. Description of the GMDH model**

GMDH (Group Method of Data Handling; Ivakhnenko 1971) is an inductive modelling method that has been applied to complicated systems in several fields such as modelling, prediction, data mining and system identification(Srinivasan, 2008; Ghanadzadeh et al., 2012).

The goal of GMDH in our framework was to find thefunction gthat could be used instead of the actual unknown function f in order to predict the output Yp(predicted treeline elevation) for a vector of topo-climatic predictors X=(x1, x2,…, xn) as close as possible to its real Y (current treeline elevation).

In mathematical terms, given N predictor variables and M observations (points sampled along the treeline)

(1)

where.

The function g can be estimated by the Volterra-Kolmogorov-Gabor polynomial

(2)

where a is the vector of weight coefficients.

In order to determine the best fit values, the partial derivatives of the difference between predicted and real values with respect to each constant ai were taken and set equal to 0 as follows:

(3)

Solving Eq. (3) in a least squares sense requires that

(4)

where a is the vector of unknown coefficients of the polynomial in Eq. 2, Y is the vector of observed values (current treeline points) and A is a matrix in the form

(5)

The least squares technique from multiple regression analysis leads to a solution of the form

(6)

where -1 indicates the inverse matrix and Tthe matrix transpose. Eq. 6 determines the vector of the best coefficients of Eq. 2.

**2. Description of the naive Bayesian classiﬁer**

A naïve Bayesian classifier (NBC) was built to forecast how changes to independent variables (topo-climatic variables) modify the probability of treepresence (dependent variable).

The NBC makes use of the Bayesian theorem to calculate the a posteriori (conditioned) probability starting from an estimated a priori probability. The probability of an instance X = <a1, a2 ,..., an(i.e. vector of topo-climatic values in our study) being a member of class c(i.e. tree presencein our study) is computed as

(7)

where is the a priori probability that c(i.e. tree presence) occurs;is the a priori probability for the instance X = <a1, a2 ,..., an and is the a posteriori probability of c (tree presence) given the vector of topo-climatic conditionsX = <a1, a2 ,..., an.

The NBC was trained using 80% of sampling sites, and validated using the remaining 20% of sampling sites).

**3. Generation of response curves**

Response curves were generated based on Eq. (7) and built using nomograms. A nomogram can be considered a model to evaluate how much an instance fits to a class.

Let c be the class (tree presence in our study) for which a nomogram is constructed, and let c1 be the class (tree absence in our study) other than c. P(c1 | X ) is the probability that an instance X = <a1, a2 ,..., an(i.e. vector of topo-climatic values in our study) is not a member of class c.

The odds ratio for these two probabilities is calculated as

(8)

Logit of Odds can be expressed as

(9)

The terms in summation can be expressed as odds ratios

(10)

and estimate the ratio of posterior to prior probability given the attribute value ai.

The individual contribution (point score) of each known attribute value in the nomogram is equal to the log of odds ratios

(11)

Last, log odds ratios were rescaled in the [-100, 100] interval to facilitate interpretation.

**4. Computation of variable importance**

Nomograms plot log odds ratios (Eq. 11) for each value of each predictor. Length of the line corresponds to span of the odds ratio, showing the importance of the related predictor. It also shows the impacts of individual values of the predictor.

REFERENCES

Ghanadzadeh H, Ganji M, Fallahi S (2012) Mathematical model of liquid–liquid equilibrium for a ternary system using the GMDH-type neural network and genetic algorithm. Applied Mathematical Modelling, 36, 4096–4105.

Ivakhnenko AG (1971) Polynomial theory of complex systems. Syst Man Cybern IEEE Trans On 364–378.

Srinivasan D (2008) Energy demand prediction using GMDH networks. Neurocomputing, 72, 625–629.

Figure S1. The study was carried out in the Handölan valley (64°14' N; 12°25' E) in the southern Swedish Scandes, province of Jämtland. The study area is 63.25 Km2 wide. The valley floor is at about 550 m a.s.l. and the surrounding mountains rise to 1400 m a.s.l.

Figure S2. We set oneGIS point (in magenta) every 100 m along the digitalized treeline. For each GIS point, we determined current treeline elevation (Y; m a.s.l.), slope acclivity (x1; °), slope aspect (x2; categorical) and the 8 climate variables described in the manuscript.The current treeline in the study area was found to be 35.7 km in length, thus 357 points were placed along it.

Figure S3. We set oneGIS point (in blue) every 100 m in both X and Y directions, thus 6325 points were used. For each GIS point, we determined tree presence/absence (*Y), slope acclivity (x1; °), slope aspect (x2*;categorical) and the 8 climate variables described in the manuscript.

Figure S4. Map of the current treeline (in light green) in the Handölan valley (southern Swedish Scandes, province of Jämtland). It was found to be 35.7 km in length. The lowest treeline elevation is 552 m a.s.l., while the highest one is 1022 m a.s.l.

Figure S5. To represent present climate conditions of the study area, we used the 1991-2009 climate period. Eight biologically-relevant candidate climate variables were selected: degree-days above 5°C (growing degree-days; °C); degree-days below 0°C (chilling degree-days; °C); mean annual precipitation (mm); mean warmest month temperature (°C); 8) mean coldest month temperature (°C); summer heat:moisture index (°C mm-1); number of frost-free days (unitless); precipitation as snow (mm).To represent potential future climates, we used projections of the CMIP5 multimodel data set for the A2 emission scenario.

Figure S6. The developed treeline model had predictive accuracy (R2) equal to 99.8223% in the training set and 99.9028% in the test set (model validation); MAPE was equal to 0.08% in the training set and 0.07% in the test set; MAE was 0.58 m in the training data and 0.57 m in the test data. Model accuracy resulted excellent thus allowing for reliable projections.

Figure S7. On the X-axis, the 357 points sampled along the current treeline are placed in increasing order of elevation. The green line represents the current elevation of each point sampled along the treeline, while the colored lines represent the predicted shifts of each point over short (2010-2039; in yellow), medium (2040-2069; in orange) and long (2070-2099; in red) climate periods.The curves indicate that over the brief period treeline dynamics are expected only in the lower portion of the valley. Minor dynamics are to be expected as slopes and elevation increase. Over the mid-period, dynamics of the subalpine forest in the lower part of the valley will be completed, and tree cover will be able to start climb up the less gentle slopes. Over the long period, projections suggest that subalpine forestwill be able to climb up even the most elevatedslopes of the study area whose maximum elevation is 1400 m a.s.l..

Figure S8. On the left, projections ofthe upslope advance of subalpine forests over short (in yellow), medium (in orange) and long (in red) climate periods using the multivariate topo-climatic model. On the right, projections based on a lapse-rate model where we assumed that the position of the treeline in the study area is only determined by the mean warmest month temperature.

1