Null Model Analyses of Presence-Absence Matrices Need a Definition of Independence

ESM S1. Distributions for the quasi-abundance (Poisson, negative binomial and binomial), model selection and parameter estimation

Let $A_{ui}$ denote the quasi-abundance of species $u$ ($u = 1, \dots, t$) at location $i$ ($i = 1, \dots, s$). A log-linear model is assumed for the expected quasi-abundance, $\mu_{ui} = E(A_{ui})$. It depends on a constant overall effect $\gamma$, a qualitative species effect $\delta_u$ and a qualitative location effect $\theta_i$ through the linear model $\log \mu_{ui} = \gamma + \delta_u + \theta_i$, with the constraints $\delta_1 = \theta_1 = 0$ so that the remaining parameters are identifiable. In addition, suppose that $Y_{ui}$ ($u = 1, \dots, t$; $i = 1, \dots, s$) are $ts$ independent Bernoulli random variables with $\pi_{ui} = P(Y_{ui} = 1) = E(Y_{ui})$. The two probability structures are linked through $\pi_{ui} = P(Y_{ui} = 1) = P(A_{ui} > 0)$, that is, a species is recorded as present whenever its quasi-abundance is positive. Under this assumption, three special cases are considered below for modeling the probability of occurrence $\pi_{ui}$ of species $u$ at location $i$.
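
As a minimal sketch, the constraints $\delta_1 = \theta_1 = 0$ translate into a design matrix with one intercept column, $t - 1$ species dummies and $s - 1$ location dummies, giving $t + s - 1$ parameters in total. The code assumes Python. The function names `design_matrix` and `expected_quasi_abundance` are hypothetical and do not come from the original study.

```python
import math

def design_matrix(t, s):
    """Design-matrix rows for log mu_ui = gamma + delta_u + theta_i.

    Columns are [gamma, delta_2..delta_t, theta_2..theta_s]: the constraints
    delta_1 = theta_1 = 0 drop the columns of species 1 and location 1.
    Rows are ordered species-major: (u=1, i=1), (u=1, i=2), ..., (u=t, i=s).
    """
    rows = []
    for u in range(t):          # 0-based species index
        for i in range(s):      # 0-based location index
            species = [1.0 if u == k else 0.0 for k in range(1, t)]
            location = [1.0 if i == k else 0.0 for k in range(1, s)]
            rows.append([1.0] + species + location)
    return rows

def expected_quasi_abundance(gamma, delta, theta):
    """mu_ui = exp(gamma + delta_u + theta_i) as a t x s nested list."""
    if delta[0] != 0.0 or theta[0] != 0.0:
        raise ValueError("the constraints require delta_1 = theta_1 = 0")
    return [[math.exp(gamma + d + th) for th in theta] for d in delta]

X = design_matrix(3, 4)  # 12 rows (cells), 6 columns (t + s - 1 parameters)
mu = expected_quasi_abundance(0.5, [0.0, 1.0], [0.0, -1.0, 2.0])
print(len(X), len(X[0]), mu)
```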

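1) If $A_{ui}$ has a Poisson distribution with mean $\mu_{ui}$, then the probability mass function (p.m.f.) for $A_{ui}$ is:

$$P(A_{ui} = a) = \frac{e^{-\mu_{ui}}\,\mu_{ui}^{a}}{a!}, \qquad a = 0, 1, 2, \dots$$

Letting $\eta_{ui} = \gamma + \delta_u + \theta_i$, so that $\mu_{ui} = \exp(\eta_{ui})$, the generalized linear model for the probability of occurrence of species $u$ at location $i$ is:

$$\pi_{ui} = 1 - P(A_{ui} = 0) = 1 - \exp\{-\exp(\eta_{ui})\} \tag{A}$$

Solving (A) for $\eta_{ui}$ gives $-\log(1 - \pi_{ui}) = \exp(\eta_{ui})$, so the link function is the well-known complementary log-log link:

$$\eta_{ui} = g(\pi_{ui}) = \log[-\log(1 - \pi_{ui})] \tag{B}$$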

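2) If $A_{ui}$ has a negative binomial distribution with mean $\mu_{ui}$ and aggregation parameter $\lambda$ (the same $\lambda$ for all $ts$ distributions), the p.m.f. for $A_{ui}$ is:

$$P(A_{ui} = a) = \frac{\Gamma(a + \lambda)}{a!\,\Gamma(\lambda)} \left(\frac{\lambda}{\lambda + \mu_{ui}}\right)^{\lambda} \left(\frac{\mu_{ui}}{\lambda + \mu_{ui}}\right)^{a}, \qquad a = 0, 1, 2, \dots$$

Letting again $\eta_{ui} = \gamma + \delta_u + \theta_i$, the generalized linear model for the probability of occurrence of species $u$ at location $i$ is:

$$\pi_{ui} = 1 - P(A_{ui} = 0) = 1 - \left(1 + \frac{\exp(\eta_{ui})}{\lambda}\right)^{-\lambda} \tag{C}$$

and the link function is:

$$\eta_{ui} = g(\pi_{ui}) = \log\left\{\lambda\left[(1 - \pi_{ui})^{-1/\lambda} - 1\right]\right\} \tag{D}$$

This is a one-parameter link family (Aranda-Ordaz 1981) that contains both the (canonical) logistic link ($\lambda = 1$) and the complementary log-log link ($\lambda \to \infty$) as special cases. The model for $\pi_{ui}$ can also be written as:

$$\pi_{ui} = 1 - \left(\frac{\lambda}{\lambda + \exp(\gamma + \delta_u + \theta_i)}\right)^{\lambda}$$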

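3) When $A_{ui}$ follows a binomial distribution with index $n$ and mean $\mu_{ui}$ (the same $n$ for all $ts$ distributions, so that $\mu_{ui} \le n$), then:

$$P(A_{ui} = a) = \binom{n}{a}\left(\frac{\mu_{ui}}{n}\right)^{a}\left(1 - \frac{\mu_{ui}}{n}\right)^{n - a}, \qquad a = 0, 1, \dots, n$$

The generalized linear model for $\pi_{ui}$ is:

$$\pi_{ui} = 1 - P(A_{ui} = 0) = 1 - \left(1 - \frac{\exp(\gamma + \delta_u + \theta_i)}{n}\right)^{n} \tag{E}$$

or, more succinctly, $\eta_{ui} = \gamma + \delta_u + \theta_i$, where $\eta_{ui}$ is given by the (non-canonical) link function:

$$\eta_{ui} = g(\pi_{ui}) = \log\left\{n\left[1 - (1 - \pi_{ui})^{1/n}\right]\right\} \tag{F}$$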
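
The three occurrence models (A), (C) and (E) and their links (B), (D) and (F) can be checked numerically. The following is a minimal Python sketch with hypothetical function names. It evaluates each $\pi_{ui} = h(\eta_{ui})$ and the corresponding link $g$, using `math.log1p` and `math.expm1` to keep precision when probabilities are close to 0 or 1.

```python
import math

# Occurrence models pi = h(eta) and links eta = g(pi), all from pi = P(A_ui > 0).

def pi_poisson(eta):
    """Model (A): pi = 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))

def link_poisson(pi):
    """Link (B): complementary log-log, eta = log(-log(1 - pi))."""
    return math.log(-math.log1p(-pi))

def pi_negbin(eta, lam):
    """Model (C): pi = 1 - (1 + exp(eta)/lam)^(-lam)."""
    return -math.expm1(-lam * math.log1p(math.exp(eta) / lam))

def link_negbin(pi, lam):
    """Link (D): eta = log(lam * ((1 - pi)^(-1/lam) - 1))."""
    return math.log(lam * math.expm1(-math.log1p(-pi) / lam))

def pi_binom(eta, n):
    """Model (E): pi = 1 - (1 - exp(eta)/n)^n, defined only for exp(eta) < n."""
    p = math.exp(eta) / n  # binomial success probability mu/n
    if p >= 1.0:
        raise ValueError("exp(eta) >= n: mu/n is not a valid success probability")
    return -math.expm1(n * math.log1p(-p))

def link_binom(pi, n):
    """Link (F): eta = log(n * (1 - (1 - pi)^(1/n)))."""
    return math.log(-n * math.expm1(math.log1p(-pi) / n))

eta = 0.3  # hypothetical value of gamma + delta_u + theta_i
for name, h, g in [("Poisson", pi_poisson, link_poisson),
                   ("negative binomial, lam = 2", lambda e: pi_negbin(e, 2.0),
                    lambda q: link_negbin(q, 2.0)),
                   ("binomial, n = 5", lambda e: pi_binom(e, 5.0),
                    lambda q: link_binom(q, 5.0))]:
    pi = h(eta)
    print(f"{name}: pi = {pi:.4f}, g(pi) = {g(pi):.4f}")
```

With these functions one can confirm several properties of the links. Each link inverts its model. `link_negbin(p, 1.0)` equals the logit $\log[p/(1-p)]$. Both `link_negbin(p, lam)` and `link_binom(p, n)` approach the complementary log-log link as `lam` or `n` grows. Finally, `pi_binom` rejects any $\eta$ with $\exp(\eta) \ge n$.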

Model selection

The link families we consider as “competitor models” are each indexed by one extra parameter, say $\varsigma$; the links derived from the binomial and negative binomial distributions are of this kind. The deviance (likelihood ratio) statistic is used to find the estimate $\hat{\varsigma}$: the model is fitted for a range of fixed values of $\varsigma$, which yields the profile likelihood (or profile deviance) for $\varsigma$. The profile deviance can be used to obtain an approximate $(1-\alpha)100\%$ interval estimate for $\varsigma$ (McCullagh and Nelder 1989):

$$\{\varsigma : D(\varsigma) < D(\hat{\varsigma}) + \chi^2_{1,\,1-\alpha}\}$$

Here $\hat{\varsigma}$ is the maximum likelihood estimate of $\varsigma$, i.e. the value attaining the minimum deviance, located by visual inspection of the graph of $D(\varsigma)$ as a function of $\varsigma$. The term $\chi^2_{1,\,1-\alpha}$ is the $(1-\alpha)$ quantile (the upper $100\alpha\%$ point) of the chi-squared distribution with 1 degree of freedom; for example, it equals 3.84 when $\alpha = 0.05$.
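
The interval rule can be sketched in Python under two assumptions: the profile deviance has already been tabulated on a grid of $\varsigma$ values, and the grid minimum stands in for visual inspection of the graph. The sketch uses the fact that $\chi^2_{1,\,1-\alpha} = z_{1-\alpha/2}^2$, where $z$ is a standard normal quantile. The deviance curve below is a toy quadratic, not data from the study, and the function names are hypothetical.

```python
from statistics import NormalDist

def chi2_1_quantile(level):
    """Quantile of chi-squared(1 df) at `level`, using chi2_1 = Z**2 for Z ~ N(0, 1)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * z

def profile_interval(grid, deviances, alpha=0.05):
    """Return (estimate, lower, upper) from a profile deviance tabulated on a grid.

    The estimate is the grid value with the smallest deviance; lower and upper are
    the extreme grid values with D < D_min + chi2_{1, 1-alpha} (assumed contiguous).
    """
    d_min = min(deviances)
    estimate = grid[deviances.index(d_min)]
    cutoff = d_min + chi2_1_quantile(1.0 - alpha)
    inside = [g for g, d in zip(grid, deviances) if d < cutoff]
    return estimate, min(inside), max(inside)

# Toy profile deviance D(s) = 100 + (s - 5)^2 on the grid 0.00, 0.01, ..., 10.00.
grid = [k / 100 for k in range(1001)]
devs = [100.0 + (g - 5.0) ** 2 for g in grid]
print(profile_interval(grid, devs))
```

For this toy curve the cutoff is $100 + 3.8415$, so the call returns the estimate 5.0 and the grid endpoints 3.05 and 6.95, close to the exact limits $5 \pm 1.96$. A finer grid would narrow that gap.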

The profile deviance for each family tested in this study is calculated following the strategy just described. When the quasi-abundance is negative binomially distributed, the links $\eta_{ui}$ belong to a parametric family indexed by the aggregation parameter $\varsigma = \lambda$ (Aranda-Ordaz 1981). The model with linear predictor $\gamma + \delta_u + \theta_i$ is fitted for a range of values of $\lambda$, the profile deviance is produced, and a 95% confidence interval for $\lambda$ is constructed. In the same way, when the quasi-abundance is binomially distributed, the profile deviance for $\varsigma = n$ and the corresponding 95% confidence interval for $n$ are constructed. Models in which the quasi-abundance is negative binomial (indexed by $\lambda$) or Poisson can be compared with one another through their deviances, since the Poisson model is the limiting member of the negative binomial family as $\lambda \to \infty$. Analogously, deviances are compared to choose among the models of the binomial family (indexed by $n$). Because the Poisson distribution is also a limiting case of this family ($n \to \infty$), these models can likewise be compared with the Poisson model. The model attaining the overall minimum deviance is regarded as the best representative of the models for the binary response $y_{ui}$ as a function of $\gamma$, $\delta_u$ and $\theta_i$.
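
The profiling step for the negative binomial family can be sketched as follows. This is an illustration under stated assumptions, not the authors' GENSTAT code. It fits the additive model by Fisher scoring (iteratively reweighted least squares), a standard GLM fitting method swapped in here, with step-halving on the deviance. It works on a small hypothetical presence-absence matrix and records the deviance for several fixed values of $\lambda$, with `None` standing for the Poisson (complementary log-log) limit. All names are hypothetical.

```python
import math

def design(t, s):
    """Rows [1, species dummies 2..t, location dummies 2..s]; delta_1 = theta_1 = 0."""
    return [[1.0] + [float(u == k) for k in range(1, t)]
            + [float(i == k) for k in range(1, s)]
            for u in range(t) for i in range(s)]

def negbin_family(lam):
    """Inverse link h (model C) and dh/deta; lam=None gives the Poisson limit (model A)."""
    if lam is None:
        return (lambda e: -math.expm1(-math.exp(e)),
                lambda e: math.exp(e - math.exp(e)))
    return (lambda e: -math.expm1(-lam * math.log1p(math.exp(e) / lam)),
            lambda e: math.exp(e - (lam + 1.0) * math.log1p(math.exp(e) / lam)))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def deviance(X, y, beta, h):
    """Binary-response deviance -2 log L (saturated log-likelihood is 0); inf if infeasible."""
    total = 0.0
    for row, yy in zip(X, y):
        try:
            pi = h(sum(a * c for a, c in zip(row, beta)))
        except OverflowError:
            return math.inf
        if not 0.0 < pi < 1.0:
            return math.inf
        total -= 2.0 * math.log(pi if yy else 1.0 - pi)
    return total

def fit(X, y, h, dh, max_iter=100, tol=1e-10):
    """Fisher scoring (IRLS) with step-halving from beta = 0; returns (beta, deviance)."""
    p = len(X[0])
    beta = [0.0] * p
    dev = deviance(X, y, beta, h)
    for _ in range(max_iter):
        A, b = [[0.0] * p for _ in range(p)], [0.0] * p
        for row, yy in zip(X, y):
            eta = sum(a * c for a, c in zip(row, beta))
            pi, d = h(eta), dh(eta)
            w = d * d / (pi * (1.0 - pi))      # IRLS weight
            z = eta + (yy - pi) / d            # working response
            for j in range(p):
                b[j] += row[j] * w * z
                for k in range(p):
                    A[j][k] += row[j] * w * row[k]
        target, step = solve(A, b), 1.0
        while step > 1e-8:                     # step-halving on the deviance
            trial = [c + step * (tc - c) for c, tc in zip(beta, target)]
            trial_dev = deviance(X, y, trial, h)
            if trial_dev <= dev:
                break
            step /= 2.0
        else:
            break                              # no improving step: stop here
        done = dev - trial_dev < tol
        beta, dev = trial, trial_dev
        if done:
            break
    return beta, dev

# Hypothetical 4 x 5 presence-absence matrix (rows = species, columns = locations).
Y = [[1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 1, 1, 0],
     [1, 1, 0, 0, 1]]
X = design(len(Y), len(Y[0]))
y = [v for row in Y for v in row]

profile = {}
for lam in [0.5, 1.0, 2.0, 4.0, 8.0, None]:   # None = Poisson limit
    h, dh = negbin_family(lam)
    profile[lam] = fit(X, y, h, dh)[1]
best = min(profile, key=profile.get)
print(profile, "minimum-deviance lambda:", best)
```

The $\lambda$ with the smallest deviance plays the role of $\hat{\lambda}$. Applying the rule $\{\lambda : D(\lambda) < D(\hat{\lambda}) + \chi^2_{1,\,0.95}\}$ to a finer grid of such deviances gives the approximate 95% interval. Binomial links would additionally need the constraint $\exp(\eta_{ui}) < n$ during fitting.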

Parameter estimation

For the Poisson and negative binomial models for the quasi-abundance, the profile likelihood procedure can be run in statistical software such as GENSTAT (VSN International Ltd 2007), where non-canonical links are easily handled with the “GLMLINK” directive. However, the link for the model of $\pi_{ui}$ that assumes a binomial quasi-abundance is difficult to program with “GLMLINK”. Because of the particular form of the log-likelihood, the logarithms it contains, such as $\log(1 - \mu_{ui}/n)$, are not real-valued unless $\mu_{ui} < n$. Feasible solutions (local minima) must therefore be constrained so that $\mu_{ui}/n$ stays within the interval $[0, 1]$, which makes the minimization a constrained non-linear programming problem. The authors implemented a FORTRAN 77 program that runs the constrained minimization algorithm described by Powell (1982, 1985), available in the HSL library (HSL 2007). This algorithm extends the Broyden-Fletcher-Goldfarb-Shanno (BFGS) variable metric method (Press et al. 1992) to the constrained case.
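
The feasibility issue can be illustrated with a small sketch. It is not Powell's algorithm or the HSL routine. Instead, it evaluates the binomial-model log-likelihood, returning `None` outside the feasible region $\exp(\eta_{ui}) < n$. It then applies a simple backtracking rule that shortens a proposed search step until every $\mu_{ui}$ stays below $n$. The function names, the `margin` parameter and the example values are hypothetical.

```python
import math

def binom_loglik(etas, ys, n):
    """Bernoulli log-likelihood under model (E); None if the point is infeasible.

    Feasibility requires mu_ui = exp(eta_ui) < n for every cell; otherwise
    log(1 - mu_ui/n) is not real-valued.
    """
    ll = 0.0
    for eta, y in zip(etas, ys):
        p = math.exp(eta) / n              # binomial success probability mu/n
        if p >= 1.0:
            return None
        log_q = n * math.log1p(-p)         # log P(A = 0) = n log(1 - mu/n)
        ll += math.log(-math.expm1(log_q)) if y else log_q
    return ll

def feasible_step(etas, direction, n, shrink=0.5, margin=1e-6):
    """Backtrack s = 1, shrink, shrink**2, ... until every exp(eta + s*d) < n(1 - margin)."""
    limit = math.log(n * (1.0 - margin))
    s = 1.0
    while any(e + s * d >= limit for e, d in zip(etas, direction)):
        s *= shrink
        if s < 1e-12:                      # no feasible step along this direction
            return 0.0
    return s

etas, ys, n = [0.0, 0.5], [1, 0], 3.0      # hypothetical current point and data
direction = [0.2, 1.0]                     # e.g. a proposed quasi-Newton step
s = feasible_step(etas, direction, n)      # the full step would give exp(1.5) > 3
new = [e + s * d for e, d in zip(etas, direction)]
print(s, binom_loglik(new, ys, n))
```

Here the full step is rejected and halved once ($s = 0.5$), after which both cells remain feasible. A genuine constrained quasi-Newton method, such as the Powell algorithm used by the authors, handles the constraints inside the search direction itself rather than only by shortening the step.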

References

Aranda-Ordaz FJ (1981) On two families of transformations to additivity for binary response data. Biometrika 68:357-363

HSL (2007) A collection of FORTRAN codes for large scientific computation. AspenTech Limited, United Kingdom

McCullagh P, Nelder JA (1989) Generalized Linear Models. Chapman and Hall, London

Powell MJD (1982) Extensions to Subroutine VF02. In: Drenick RF, Kozin F (eds) Systems Modelling and Optimisation, Lecture Notes in Control and Information Sciences 38. Springer-Verlag, Berlin, pp 529-538

Powell MJD (1985) On the quadratic programming algorithm of Goldfarb and Idnani. Mathematical Programming Study 25:46-61

Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical Recipes in FORTRAN 77: The Art of Scientific Computing, 2nd edn. Cambridge University Press, Cambridge

VSN International Ltd (2007) GenStat Version 7.2.220 Discovery Edition. Rothamsted, Lawes Agricultural Trust

Electronic Supplementary Material S1, Page 1 of 5