Electronic Supplement belonging to the article “Variation in plasticity of personality traits implies that the ranking of personality measures changes between environmental contexts: Calculating the cross-environmental correlation”
by
Jon E Brommer
R code
I below follow the convention of printing the R code in the Courier font and include the ‘>’ produced by the R command window for each input line. To run a line of code, the ‘>’ is not typed in.
The estimates in Table 2 are derived from the published estimates (provided in Table 1) by first defining the double column vector phi based on the environmental values E1 and E2 reported in Table 2. For example, for E1=0 and E2=1,
phi=matrix(c(1,1,0,1),2,2)
The matrix K is defined for the variance in elevation, slope and their covariance, e.g. for the values reported by Dingemanse et al. (2012a) as
K=matrix(c(26.7,14.8,14.8,11.1),2,2)
The matrix P with the variances in E1 and E2 in the diagonal and the covariance between these is provided by multiplication
P <- phi%*%K%*%t(phi),
and the covariance is scaled to a correlation as
r_between<-P[1,2]/sqrt(P[1,1]*P[2,2]).
The above can be extended straightforwardly to more than two environments. Suppose behaviors were quantified as a function of a continuous variable (e.g. feeding rates as a function of rainfall). When rainfall is standardized to a zero mean, we may want to calculate the between-individual variances and covariances for standardized environmental values of -1, 0 and +1. The random regression estimates
K=matrix(c(30,4,4,10),2,2)
> K
[,1] [,2]
[1,] 30 4
[2,] 4 10
Now, the vector phi has three rows and two columns
phi=matrix(c(1,1,1,-1,0,1),3,2)
phi
[,1] [,2]
[1,] 1 -1
[2,] 1 0
[3,] 1 1
Again,
P <- phi%*%K%*%t(phi)
produces
> P
[,1] [,2] [,3]
[1,] 32 26 20
[2,] 26 30 34
[3,] 20 34 48
which is, when scaled to a correlation
cov2cor(P)
[,1] [,2] [,3]
[1,] 1.0000000 0.8391464 0.5103104
[2,] 0.8391464 1.0000000 0.8959787
[3,] 0.5103104 0.8959787 1.0000000
Hence, the cross-environmental individual correlation between E=-1 and E=0 is 0.839, and reduces to 0.510 between E=-1 and E=+1, whereas the correlation between E=0 and E=+1 is 0.896. The above can also be used to plot e.g. how the between-individual variance as predicted by the random regression is expected to vary over E. For example,
> phi=matrix(c(rep(1,11),seq(-1,1,by=0.2)),11,2)
> P <- phi%*%K%*%t(phi)
> plot(phi[,2],diag(P),type="l",xlab=list("E",cex=1.5), ylab=list("Individual Variance",cex=1.5))
produces a line plot showing the increase in between-individual variance with increasing values of E.
Bayesian Random Regression and Character State and the cross-environmental correlation
Below R code is provided as an example of an implementation of the calculation of the cross-environmental correlation following random regression in a Bayesian framework implemented in R using the package ‘MCMCglmm’. The approach is illustrated for simulated data. The data was simulated for 200 individuals measured twice in each of two environments. Kluen and Brommer (2013) presents how to implement random regression in a frequentist linear mixed model and include a parametric bootstrap of random regression based on the routine ‘lme’ in the package ‘nlme’ run in R. Another approach is to use Bayesian statistics to obtain posteriors for the RR estimates of variance in elevation, slope and their covariance and use the mathematical formulas presented in the main text and above to, from these estimates, derive the environment-specific between-individual variances and their covariance (correlation).
The R code below implements the Random Regression model in a Bayesian framework and applies the relevant equations to calculate the environment specific individual variances and the cross-environmental correlation on the posteriors. Further, the character state approach is implemented on the same data which estimates directly the environmental-specific individual variances and their covariance. Lastly, code is provided for one approach of plotting the data with the predictions provided by the RR model is presented in order to facilitate comparison of model output and data (Fig. S1).
Comparison of estimates based on these two approaches (Table S1) illustrates good quantitative agreement in terms of estimates produced. The random regression and character state models estimated the cross-environmental correlation more accurately than simply using the raw data to calculate this statistic (Table S1). This result is general, because the raw data correlation also includes the variation between individuals due to residual variance, and the raw data correlation will hence be biased towards 0 (since the residual cross-environmental correlation is zero). Both modeling approaches tended, however, to underestimate the environment-specific individual variances and the cross-environmental correlation, and overestimated the residual variance. Nevertheless, all confidence intervals overlap with the true values and the model therefore performed sufficiently. In general, random regression analyses are fairly “data hungry” methods (requirements detailed by van de Pol 2012). In particular, Bayesian Random Regression appears to require a substantial amount of information for the problem considered in this example code and when assuming realistic effect sizes. The Bayesian character state and a frequentist RR model tend to work better when sample sizes are reduced (results not shown).
Fig. S1. Plot of the simulated data of the focal trait z and the expected means (solid line) and confidence interval (dashed lines) based on the random regression model estimates in the two environments. Data was simulated for 200 individual measured twice in each of two environments (E1=0 and E2 = 1) with VI(E1) = 0.2; VI(E2) = 0.3; VR = 0.8 and a cross-environmental correlation of –0.3. A slight random jitter has been added in order to aid visual presentation of the data. Confidence intervals are based on the sum of the environment-specific SD between individuals and the residual SD and assuming a normal distribution (where 95% lies within 1.96*SD). This figure is produced using the R code provided in this supplement and presents one option for relating the data to model predictions. The Random Regression model clearly provides a decent fit to the data in this case.
Code for running the Bayesian Random Regression and Character State models in R. Copy-paste the code into the R editor or equivalent. The code produces output as reported in Table S1 and Fig. S1.
--- start of code.do not copy this line ----
# This R code examplifies how one can run a Bayesian Random Regression (RR) analysis on data collected by taking
# repeated measures of individuals (subjects) in different environmental conditions.
# The posteriors are used to calculate the posterior distribution of the between-individual variance in each environment
# and the cross-environmental corrrelation from the RR estimates of variance in elevation and slope and their covariance.
# The example considers the simple case of two environments, but the approach is general
# Also included is a character state implementation of the same problem as well as an example on
# how one can plot the data with the expected RR distribution on top in order to evaluate the model output.
# The code can be extended to consider what power these approaches have to detect hypothesised effect sizes.
#load the packages needed : install these if not installed
library("MCMCglmm") #Bayesian mixed models
library(MSBVAR)
############################################3
## SIMULATION of example data set
##########################################33333
#this uses MSBVAR library, which has function
#rmultnorm(1, matrix(c(1,2),2,1), vmat=matrix(c(1,1,0,1),2,2))
#rmultnorm(n, mu, vmat, tol = 1e-10)
#n Number of variates to draw.
#mu m column matrix of multivariate means
#vmat m x m covariance matrix
#tol Tolerance level used for SVD of the covariance. Default is 1e-10
##############
# below simulates a dataset for Nind individuals which are assumed to have been measured two times in each of two environments
# sim.v1: between-individual variance in environment 1
# sim.v2: between-individual variance in environment 2
# sim.r12: individual correlation in values in E1 and E2
Nind=200
sim.v1=.2
sim.v2=.3
sim.r12=-0.3
sim.cov12=sim.r12*sqrt(sim.v1*sim.v2)
# residual variance assumed to be 0.8
res.var=0.8
#simulate the individual specific trait values: means assumed to be 0 and 1 and E1 and E2
sim<-rmultnorm(Nind, matrix(c(0,1),2,1), vmat=matrix(c(sim.v1,sim.cov12,sim.cov12,sim.v2),2,2))
#store simulated data in a data.frame
sim.data=list()
sim.data$ind=c(1:Nind,1:Nind,1:Nind,1:Nind) #ID of individuals
sim.data$E=c(rep(0,2*Nind),rep(1,2*Nind)) # the environmental value (0 or 1)
sim.data$Efac=c(rep('a',2*Nind),rep('b',2*Nind)) #environemnt as a factor
#the observed value 'z' is the individual specific value plus a randomly drawn value based on the residual variance
sim.data$z=c(sim[1:Nind,1],sim[1:Nind,1],sim[1:Nind,2],sim[1:Nind,2])+rnorm(4*Nind,sd=sqrt(res.var))
sim.data=data.frame(sim.data)
#########1###########################################################################
##### random regression approach with univariate errors ##################
#####################################################################################
#specify the prior : non informative for the residual variance ; inverse gamma for the RR estimates
prior.m1.RR <- list(R = list(V = 1, n = 1), G = list(G1 = list(V = diag(2)*0.08, n=2)))
#below specifies a paramter expanded prior as an alternative
prior.m1.RR <- list(R = list(V = 1, n = 1), G = list(G1 = list(V = diag(2)*0.1, nu=diag(2),alpha.mu=c(0,0),alpha.V=diag(2)*25^2)))
#specify the model, first order RR: the covariate is 'E' which is 0 or 1. Use 'poly' to specify higher orders
#this model specification takes some time to run because it used lots of iterations.
#This is an overkill but tends to produce decent uncorrelated chain. Should be checked and possibly adjusted to fit individual needs
mRR.sim <- MCMCglmm(z ~ E, random = ~us(1 + E):ind, data = sim.data, verbose = FALSE, pr = TRUE, prior = prior.m1.RR, nitt=550000,thin=500,burnin=50000)
#check the output
summary(mRR.sim)
plot(mRR.sim$VCV) #plots the posteriors of the random regression covariances
autocorr(mRR.sim$VCV) #the autocorrelation between the covariances which should be low
########2############################
####### CROSS ENVIRONMENT ###########
#####################################
#derive the environment-specific variances and the covariance between these
phi=matrix(c(1,1,0,1),2,2) #the second column of this matrix denotes the points where you want to evaluate the RR
#below declares the output we want to get, V(ariance in environment)1, R(epeatability in environment)1, etc...
var.r<-data.frame(V1=numeric(),V2=numeric(),R1=numeric(),R2=numeric(),r12=numeric())
for (i in 1:length(mRR.sim$VCV[,1])) {
K=matrix(mRR.sim$VCV[i,1:4],2,2) #K matrix for the i'th posterior
P <- phi%*%K%*%t(phi) #apply the relevant equations
temp<-data.frame(V1=P[1,1], V2=P[2,2], R1=P[1,1]/(P[1,1]+as.vector(mRR.sim$VCV[i,5])), R2=P[2,2]/(P[2,2]+as.vector(mRR.sim$VCV[i,5])), r12=P[1,2]/sqrt(P[1,1]*P[2,2]))
var.r<-rbind(var.r,temp) #store the data
} #for (i in
#########3################################################################
##### code below implements the character state approach ##################
###########################################################################
#lines below fit the Character State model with a univariate residual error (as in the RR approach above)
priorb = list(R = list(V = diag(1), nu = 0.002), G = list(G1 = list(V = diag(2) * 0.2, nu = 4)))
m.cs <- MCMCglmm(z ~ E, random = ~us(Efac):ind, data = sim.data, verbose = FALSE, prior = priorb,nitt=150000,thin=100,burnin=50000)
summary(m.cs)
######4##########################################
## Organise the output in a data.frame
################################################
output=list()
output$approach=c("BayesianRR","BayesianCS")
output$V1=c( posterior.mode(as.mcmc(var.r$V1)),posterior.mode(m.cs$VCV[,1]) )
output$V1.low=c(HPDinterval(as.mcmc(var.r$V1))[1],HPDinterval(m.cs$VCV[,1])[1] )
output$V1.high=c(HPDinterval(as.mcmc(var.r$V1))[2],HPDinterval(m.cs$VCV[,1])[2] )
output$V2=c( posterior.mode(as.mcmc(var.r$V2)),posterior.mode(m.cs$VCV[,4]) )
output$V2.low=c(HPDinterval(as.mcmc(var.r$V2))[1],HPDinterval(m.cs$VCV[,4])[1] )
output$V2.high=c(HPDinterval(as.mcmc(var.r$V2))[2],HPDinterval(m.cs$VCV[,4])[2] )
output$Res=c( posterior.mode(mRR.sim$VCV[,5]),posterior.mode(m.cs$VCV[,5]) )
output$Res.low=c(HPDinterval(mRR.sim$VCV[,5])[1],HPDinterval(m.cs$VCV[,5])[1] )
output$Res.high=c(HPDinterval(mRR.sim$VCV[,5])[2],HPDinterval(m.cs$VCV[,5])[2] )
#repeatability in each environment
output$R1=c(posterior.mode( as.mcmc(var.r$V1/(var.r$V1+mRR.sim$VCV[,5]) ) ), posterior.mode(m.cs$VCV[,1]/(m.cs$VCV[,1]+m.cs$VCV[,5]) ) )
output$R1.low=c(HPDinterval( as.mcmc(var.r$V1/(var.r$V1+mRR.sim$VCV[,5]) ) )[1], HPDinterval(m.cs$VCV[,1]/(m.cs$VCV[,1]+m.cs$VCV[,5]) )[1] )
output$R1.high=c(HPDinterval( as.mcmc(var.r$V1/(var.r$V1+mRR.sim$VCV[,5]) ) )[2], HPDinterval(m.cs$VCV[,1]/(m.cs$VCV[,1]+m.cs$VCV[,5]) )[2] )
output$R2=c(posterior.mode( as.mcmc(var.r$V2/(var.r$V2+mRR.sim$VCV[,5]) ) ), posterior.mode(m.cs$VCV[,4]/(m.cs$VCV[,4]+m.cs$VCV[,5]) ) )
output$R2.low=c(HPDinterval( as.mcmc(var.r$V2/(var.r$V2+mRR.sim$VCV[,5]) ) )[1], HPDinterval(m.cs$VCV[,4]/(m.cs$VCV[,4]+m.cs$VCV[,5]) )[1] )
output$R2.high=c(HPDinterval( as.mcmc(var.r$V2/(var.r$V2+mRR.sim$VCV[,5]) ) )[2], HPDinterval(m.cs$VCV[,4]/(m.cs$VCV[,4]+m.cs$VCV[,5]) )[2] )
#cross-environmental correlation
output$r12=c(posterior.mode(as.mcmc(var.r$r12)),posterior.mode(m.cs$VCV[,2]/sqrt(m.cs$VCV[,1]*m.cs$VCV[,4]) ) )
output$r12.low=c(HPDinterval(as.mcmc(var.r$r12))[1],HPDinterval(m.cs$VCV[,2]/sqrt(m.cs$VCV[,1]*m.cs$VCV[,4]) )[1] )
output$r12.high=c(HPDinterval(as.mcmc(var.r$r12))[2],HPDinterval(m.cs$VCV[,2]/sqrt(m.cs$VCV[,1]*m.cs$VCV[,4]) )[2] )
data.frame(output) #present a summary of the outputs produced by these approaches
#######5#######################################################
## raw data correlation between E's (for comparison)
###############################################################
ind.data<-aggregate(sim.data$z,list(sim.data$ind,sim.data$E),mean) #mean z per E per individual
cor(ind.data[1:Nind,3],ind.data[(Nind+1):(2*Nind),3]) #raw data individual-level correlation between E1 and E2
#####6###################################################
## plotting the output of the RR model
########################################################
### following gives an example of plotting the data and the random regression expectations on top of it
plot(z ~ jitter(E), data = sim.data, cex.lab = 1.5) #plots the data, a random jitter is added along the X-axis
mus=c(posterior.mode(mRR.sim$Sol[,1]),posterior.mode(mRR.sim$Sol[,1])+posterior.mode(mRR.sim$Sol[,2])) #means in season 0 and 1
#below draws lines in the graph
lines(mus ~ c(0,1), lwd = 2)
#below calculates the expected SD per environment as the sum of the residual and the environment-specific
#between-individual variance
sds=c(sqrt(posterior.mode(as.mcmc(var.r$V1+mRR.sim$VCV[,5]) ) ), sqrt(posterior.mode(as.mcmc(var.r$V2+mRR.sim$VCV[,5]) ) ) )
#draw the expected 95% confidence around the intercepts, assuming the variances are normally distributed
lines(I(mus + 1.96 * sds) ~c(0,1), lty = 2, lwd = 2)
lines(I(mus - 1.96 * sds) ~c(0,1), lty = 2, lwd = 2)
#clearly, a satisfactory fit of the RR should produce an expectation of the means (solid lines) and the variance(dashed
# lines) which reflects the data.
# See Hadfield's MCMCglmm Course notes for another example (type: vignette("CourseNotes", "MCMCglmm") in the R Console)
--- end of code. do not copy this line ---
Table S1. A comparison of estimates for the cross-environmental correlation and other quantities derived from different approaches for simulated data as produced by the supplemented R code. RR stands for random regression, CS for character state approach. VI(E1) and VI(E2) give the between-individual variances in environment 1 and 2 respectively, and VR the residual variance. Data was simulated with VI(E1) = 0.2; VI(E2) = 0.3; VR = 0.8; and a cross-environmental correlation of –0.3. R(E1) and R(E2) are the repeatability in environment 1 (value 0) and environment 2 (value 1) respectively, with rI(E1,E2) the cross-environmental correlation estimated by the model. Between brackets are the 95% credible interval. The code for the Bayesian models which produce the output reported here is provided in this Supplement (data.frame ‘output’).
MethodVI(E1)VI(E2)VRR(E1)R(E2)rI(E1,E2)
Raw data-0.049
Bayesian RR0.11 (7.6E-7, 0.25)0.25 (0.083, 0.40)0.86 (0.76, 1.03)0.11 (8.03E-7, 0.23)0.27 (0.097, 0.34)-0.12 (-0.79, 0.50)
Bayesian CS0.14 (0.073, 0.27)0.24 (0.12, 0.38)0.88 (0.77, 1.00)0.13 (0.075, 0.25)0.24 (0.12, 0.32)-0.14 (-0.50, 0.28)