Appendix S1. Construction of the Model

Appendix S1. Construction of the model

a. Creation of a lattice

A landscape can be viewed as a mosaic of ecologically homogeneous patches, each with its own invasibility to Prunus serotina. Considered as such, the Compiègne forest consists of a mosaic of 6 434 patches whose size ranges from 0.25 to 56 ha. Hence, modelling the landscape at the patch scale would require a two dimensional grid consisting of 115 488 50 m x 50 m cells (401 x 288) of which 60 430 correspond to the forest. As this number is too high to be stored by usual softwares, we prefered a patch mosaic scale, as defined by Chabrerie et al. (2007b).

We first superimposed a lattice of 500 x 500 m cells over the forest map. This generated a grid of 41 x 29 cells, i.e. 1189 cells, among which 696 and 493 corresponded to forest and non-forest cells respectively (Fig. 1a). Each forest cell was thus ecologically heterogeneous, consisting in a mosaic of patches with particular soil conditions, vegetation and target tree species.

Then, we reshaped the two-dimensional landscape matrix to create a vector ‘probability to be invaded’. Each cell can be either safe (the invader is absent) or invaded (the invader is present as fertile trees in the canopy, whatever its density).

b. Assignement of invasibility indices

We create a vector ‘invasibility’, that is assigning an invasibility index to each cell of the matrix. For this purpose, we followed the approach proposed by Chabrerie et al. (2007b). Each stand has been carefully surveyed so as to assert with sufficient certainty whether P. serotina was present or absent. A set of three environmental variables was collected and mapped using a Geographic Information System (GIS): soil types, soil drainage levels, and dominant tree species. Partial risk indices were then derived for each environmental variable (Isoil, Idrainage, Ispecies).

The soil suitability for P. serotina establishment in cell i was estimated by the soil partial index Iisoil. As a cell consisted in a mosaic of patches, it could contain soils promoting and/or soils limiting P. serotina establishment. This partial index was based on positive or negative Pearson correlations between P. serotina and different soil types weighted by the relative area of the latter in the cell. Hence, Iisoil gave an overall picture of the soil suitability in cell i and was calculated as follows:

Iisoil = (Σ Cisoil x Aisoil) / |Csoilmax x Amax|(1)

where Cisoil is the correlation coefficient between each soil type and P. serotina in tree layer (establishment risk, level 2, see Chabrerie et al. 2007b) and Aisoil is the area of each soil type in cell i. At the denominator, Csoilmax is the maximum correlation coefficient found between soil types and P. serotina in the tree layer and Amax is the total cell area (i.e. 250 000 m²). The absolute value of the denominator was used because correlation coefficients may be negative. Iisoil was maximum (-1 or +1) when the cell i was totally covered by the soil type having the highest correlation coefficient, negative or positive, with P. serotina in the tree layer. The partial indices for soil drainage level and species cover were built following the same reasoning, with respectively:

Iidrainage = (Σ Cidrainage x Aidrainage) / |Cdrainagemax x Amax| (2)

where Cidrainage is the correlation coefficient between each drainage level and P. serotina in the tree layer, and

Iispecies = (Σ Cispecies x Aispecies) / |Cspeciesmax x Amax|(3)

where Cispecies is the correlation coefficient between each dominant tree species and P. serotina in the tree layer. Management-related disturbances (e.g. thinning, clearcutting, logging) were taken into account in the dominant tree species partial index, since their frequency and spatial extent is species-specific.

Then, the three partial risk indices were averaged to obtain a single overall risk index for each cell i:

Iiestablishment=mean (Iisoil, Iidrainage, Iispecies)(4)

This overall risk index is a probability for cell i to be invaded and is considered as a quantitative estimation of the landscape cell invasibility. This index initially ranges from -1 (i.e. cell resistant to P. serotina establishment) to +1 (i.e. cell which would be invaded with certainty if P. serotina disperses into it), but was rescaled on a [0,2] interval since we aimed at elaborating a multiplicative coefficient. This index indicated how close the local environment was to the species’ requirements; the higher its value, the more suitable the cell, a value below 1 preventing from establishment.

We assigned an invasibility index c(i) to each forest cell i and a null coefficient to the 493 non-forest cells (Fig. 1a). The values of the invasibility index in the 1 189 landscape cells (696 forest cells + 493 non-forest cells) defined the vector c, referred as the ‘invasibility vector’.

c. Determination of the dispersal matrix

Three types of dispersal were distinguished: short-distance (corresponding to seeds dispersed by gravity or locally regurgitated by birds), mid-distance (corresponding to birds gathering cherry stones) and long-distance (corresponding to mammals, especially foxes), with proportion 98%, 1.5% and 0.5 respectively, to take into account the cell size (500 x 500 m). The number of seeds dispersing into neighbouring cells thus decreased with distance.

We combined two lognormal dispersal functions, f1 and f2, to distribute the recruitment potential outside an invaded cell by birds and by foxes respectively. We disregarded seeds reaching a neighbouring cell by gravity, as well as animal-dispersed seeds remaining in the same cell, because of the low probability of such events. Hence, the seeds that left a cell in one time unit were restricted to those redistributed by birds and foxes, while those remaining in the same cell came from the effect of gravity (including local regurgitation by birds). Modelling dispersal in this way ensured that the model simulated both the mass action of local dispersal and the stochastic nature of long-distance dispersal.

For p=1 or 2, the lognormal function fp depended on two parameters: the shape parameter Sp and the scale parameter Lp. It gave the probability for a seed coming from cell i to reach cell j:

(5)

where ri,j is the Euclidian distance (in grid units) between cells i and j. Let (k,l) and (x,y) be the cell i and cell j co-ordinates respectively; k and x are row indices, that varied from 1 to 29, while l and y are column indices that varied from 1 to 41. The formal relation between cell co-ordinates and indices in matrices and vectors are: i=(k-1)41+l and j=(x-1)41+y, and ri,j=[(k-x)2 + (l-y)2]0.5. Following the recommendations of Greene et al. (2004), we chose S1=S2=1, and two values were selected for Lp, that approximate the mean dispersal distance by birds and by foxes. After Deckers et al. (2005), Pairon et al. (2006) and personal field observations, we retained a mean dispersal distance of 100 m for birds (i.e. L1=0.2 grid units) and 918 m for foxes (i.e. L2=1.836 grid units) (Fig. S1).

The integral dr, where the integration variable r is the distance to the centre of the cell i, Aj is the area of the cell j, and f is a dispersal function, gave the fraction of seeds coming from cell i received by cell j, i.e. the dispersal kernel (Cannas et al. 2003). Turning to a discrete space, with each cell i of area one, we used the approximation for a cell j:

dr(6)

Finally, we wrote a complete dispersal matrix M which described all possible transitions between the cells. If mi,j is the probability for the seeds to move from cell i to cell j, then the diagonal coefficients mi,i corresponded to the within-cell dispersal, while the other coefficients mi,j accounted for outside cell dispersal. This led to the following formula:

mi,i=1-p-q(7)

mi,j=p*f1(ri,j) +q*f2(ri,j)(8)

where p = 0.005 and q = 0.015.

Figure S1: The two dispersal lognormal functions f1 and f2 describing seed dispersal by birds (S1 = 1; L1 = 0.2) and by foxes (S2 = 1; L2 = 1.836) respectively. One grid unit equals 500m.

d. Computation

The model ran on a 8 yr time-step to match the average time lag between the release of saplings from suppression and the first seed production (Closset-Kopp et al. 2007). At each time step t, we defined a vector Xt=(Xti) containing the probability for the forest to be invaded. For each cell i (i=1…1189), Xti gives the probability for cell i to be invaded at time t. At each time-step (i.e. every 8 years) the characteristic of each cell is updated.

P. serotina was initially introduced in a cell i0 at time t=0, i.e. X0i0=1. Two conditions were necessary for P. serotina to invade another cell: it must reach this cell (as seeds) and become established (depending on the cell invasibility). Hence, the probability for P. serotina to invade cell i was the product of the probability to reach the cell i, which was given by the dispersal function, by the probability to become established in cell i, which was given by the invasibility index. We used a split-step method to compute the probability for cell i to be invaded at time t+1. At each time step t:

- if cell i was invaded with probability 1 at time t, then Xt+1i=1

- else:

• Dispersal: Xt+0.5 =M Xt(9)

• Establishment: Xt+1 i = Xt+0.5i . ci(10)

We imposed that the value of Xt+1iwasnever exceeding 1.

To fit the real environmental conditions during the past few decades we incorporated the effects of storms of 1984 and 1990 (see Study area). In each cell i and for each storm event (t=16, 17), a disturbance coefficient dit was assigned, which equalled the proportion of the cell area disturbed by storm. The disturbance vector dt was included into the establishment phase; hence, equation (10) became:

if t=16 or t=17,Xt+1 i = Xt+0.5i. ci . (1+dit)(11)

e. Assumptions

Our model is based on the following assumptions:

-Time-invariance: we assumed that no significant environmental change able to modify the landscape components can occur during the simulation run, except for natural disturbances (i.e. storms) which were incorporated separately. We needed this assumption since the computation of the partial indices was based on synchronic field measures. Thus, invasibility indices were maintained constant over time for each cell i. It seemed to be a reasonable assumption considering that soil conditions cannot spontaneously change over this short time lag. It was also unlikely that the dominant tree species can dramatically change, since it was long-living and strongly associated to the soil type. Since they were target tree species-specific, management-associated disturbances (e.g. thinning, clearcutting) were included in the dominant tree species partial index of the overall invasibility index. Small-scale disturbances (i.e. single treefalls) have been neglected. Regarding large-scale disturbances, only the storms that occured after 1980 were incorporated in the model since no data was available on the effects of former storm events.

-Dispersal from each point of the landscape can be approximated using a lognormal kernel, and dispersion is a non-directional process. Such a dispersal function has the advantage to reproduce short-, mid- and long-distance dispersal (Neubert & Caswell 2000, Garnier & Lecomte 2006) and thus avoids pitfalls of models that either neglect dispersal (e.g. Goslee et al. 2006) or restrict it to certain cells (e.g. von Neumann neighbourhood, Söndgerath & Schröder 2002). We used a lognormal kernel, which has been shown to be the best function for modelling tree seed dispersal compared to the 2Dt kernel and the Weibull kernel with two parameters (Greene et al. 2004). This has been recently confirmed for P. serotina (Pairon et al. 2006).

-P. serotina achieved maturity within 8 years once it has established in a cell. We retained this mean value after field measures.

-Once a cell was invaded with probability 1, it was keeping this probability until the end of the simulation. In other words, once established P. serotina could not go extinct. This was based on a previous study which has shown that once a invasive population of P. serotina has colonized a recipient ecosystem, it was virtually permanently established (Closset-Kopp et al. 2007).

f. Validation

We compared the predicted maps which were obtained at the 19th iteration to the real map of the current observed distribution of P. serotina in the forest. Predicted outcomes were classified into presences or absences at the following probabilities: p=1, p0.8, p0.5, and p0.1. Then, we built up a contingency table of the model predictions against the current effective distribution, the numbers of correctly predicted cells (invaded and uninvaded) being in the diagonal. To quantify how accurately the output map was predicting the presence/absence of the species, we computed four statistical measures of goodness-of-fit:

- overall agreement (OA), i.e. the percentage of cells (invaded or uninvaded) that were correctly predicted by the model; OA should be maximal.

- underestimation error (UE), i.e. the percentage of invaded cells that were not predicted as such by the model; UE should be minimal.

- Kstandard (Pontius, 2000), i.e. the kappa statistic, which is a better measure for the predictive quality for each cell than the overall agreement, since it incorporates the expected proportion of correct classification due to chance. Its value ranges from –1 to 1. The maximal value 1 indicates perfect agreement, whereas the minimal value –1 indicates maximum disagreement. A kappa value equal to 0 indicates that the observed agreement matches the agreement expected by random arranging of cells.

- Klocation (Pontius, 2000), which is the component of Kstandard quantifying the proportion of agreement due to location. Its maximal value is 1. There is no minimal value.

Comparison between predicted (Fig. 2c) and real (Fig. 1b) distribution maps gave the best results of Kstandard at a threshold p=1 (Table S1). For p=1, p0.8 and p0.5, we found very similar values for percentage of correctly predicted cells (from 79% to 81%). Those values of goodness-of-fit allow us to validate the model, retaining the threshold probability 1 as the best predictor of the invasion pattern.

Table S1: Measures of goodness-of-fit between predicted (19 iterations) and observed distribution maps of Prunus serotina when it is introduced in cell i0=388

OA: overall agreement, UE: underestimation error, p: threshold probability to which a cell is considered as invaded.

Two limitations may explain the differences between predicted and observed maps. First, fluctuations of fox population densities were neglected in our simulations. Second, due to the lack of accurate information, we did not incorporate storm events that occurred between 1850 and 1980. However, it should be noted that in Compiègne area, storms usually impact the forest along the same wind corridor, which includes only the south and south-eastern parts of the forest. Hence, the extent of large-scale disturbances may be highly restricted in space.

g. Mean field model (no space) along the invasibility gradient

Spatial structure naturally forms when one or more spatially varying factors affect a system. However, spatial structure can also form because of distance-dependent interactions among cells in an initially homogeneous system. A sound understanding of the mean field model along a continuous gradient of the invasibility index was thus required before implementing spatially realistic simulations. For this purpose we preliminary ran the model to test whether the system exhibited founder control along some regions of the gradient and search for thresholds along the invasibility gradient that would drive the system dynamics. We thus studied the special case where the invasibility index ci = c was the same for each cell i. In this case, equation (10) was rewritten as follow:

Xt+1 i = Xt+0.5i . c (12)

Then, equation (9) and equation (10) were merged into:

Xt+1 = M Xt. c = c. M Xt (13)

and finally:

Xt= (c. M)t X0 (14)

Equation (14) fully determined a homogeneous Markov chain. This implied that invasion was uniform and went from cell i0 towards all directions: there is no directional process (Figs. S2 & S3).

c≤0.7c =0.9c =1.1

c =1.2c =1.3c ≥1.5

Figure S2: Invasion maps after 19 iterations, when starting from i0=388.

Figure S3: Propagation distance, invasion speed, and invasion extent plotted against the invasibility index, after 19 iterations, when starting from i0=388.

The invasion speed was increasing with the invasibility index; for a given time interval, the higher the index, the faster the spatial spread of P. serotina. For c≥1.5, the landscape is entirely invaded at the 18th iteration and beyond, and both invasion extent and propagation distance became constant. Conversely, for c1.1, no cell was invaded with probability 1 after 19 iterations.

It should be noted that the mean invasibility index for the 696 forest cells of the real landscape (Compiègne forest) was 1.16, that is close to the threshold value of 1.1. Simulations conducted with the mean field model when an invasibility index of 1.16 is equally affected to all cells are shown in Figs. S4 & S5.

t=15 t=19

t=23 t=27

Figure S4: Invasion maps when an invasibility index of 1.16 is equally affected to all cells and starting from i0= 388.

Figure S5: Quantitative descriptors of the spatial spread of Prunus serotina following its introduction in cell i0=388, when an invasibility index of 1.16 is equally affected to all cells.

Additional references

Goslee SC, Peters D, Beck K (2006) Spatial prediction of invasion success across heterogeneous landscapes using an individual-based model. Biol Invasions8: 193-200

Greene DF, Canham CD, Coates KD, Lepage PT (2004) An evaluation of alternative dispersal functions for trees. J Ecol92: 758-766

Neubert MG, Caswell H (2000) Demography and dispersal: calculation and sensitivity analysis of invasion speed for structured populations. Ecology81: 1613-1628

Pontius RG (2000) Quantification error versus location error in comparison of categorical maps. Photogramm Eng Rem S66: 1011-1016