Ronald Heck Week 12: Class Notes 1

EDEP 768E: Seminar in Categorical Data Modeling (F2012) Nov. 2, 2012

Class Notes

Examining Repeated Measures Data on Individuals

Generalized linear mixed models (GLMM) also provide a means of incorporating longitudinal designs with categorical outcomes into situations where there are clustered data structures. One of the attractive properties of the GLMM is that it allows for linear as well as nonlinear models under a single framework that addresses issues of clustering. It is possible to fit models with outcomes resulting from various probability distributions including normal (or Gaussian), inverse Gaussian, gamma, Poisson, multinomial, binomial, and negative binomial through an appropriate link function. At level 1, repeated observations (e.g., students’ proficiency status in math, students’ enrollment over successive semesters in college, changes in clinical or health status) are nested within individuals, perhaps with additional time-varying covariates. At level 2, we can define variables describing differences between individuals (e.g., treatment groups, participation status, subject background variables and attitudes).

Generalized Estimating Equations
Alternatively, by using the Generalized Estimating Equations (GEE) approach, we can examine a number of categorical measurements nested within individuals (i.e., individuals represent the clusters), but where individuals themselves are considered to be independent and randomly sampled from a population of interest. More specifically, in this latter type of model, the pairs of dependent and independent variables (y_i, x_i) for individuals are assumed to be independent and identically distributed (Ziegler, Kastner, & Blettner, 1998) rather than clustered within organizations. GEE is used to characterize the marginal expectation of a set of repeated measures (i.e., average response for observations sharing the same covariates) as a function of a set of study variables. As a result, the important point is that the growth parameters are not assumed to vary randomly across individuals (or higher groups) as in a typical random-coefficients (or mixed) model. This is an important distinction between the two types of models to keep in mind—that is, while random-coefficient models explicitly address variation across individuals as well as clustering among subjects in higher-order groups, GEE models assume simple random sampling of subjects representing a population as opposed to a set of higher-order groups. Hence, GEE models provide what are called “population average” results; that is, they model the marginal expectation as a function of the explanatory variables. In contrast, typical multilevel models provide “unit specific” results.

Regression coefficients based on population averages (GEE) will generally be similar to unit-specific (random-effects model) coefficients but smaller in size (Raudenbush & Bryk, 2002). This distinction does not arise in models with continuous outcomes and identity link functions. For example, for a GEE model, the odds ratio is the average estimate in the population; that is, the expected increase for a unit change in X in the population. In contrast, in random-effect (unit-specific) models, the odds ratio will be the subject-specific effect for a particular level of clustering (i.e., the person or unit of clustering) given a unit change in X.

We first begin with a within- and between-subjects model estimated using the GEE (or population-average) approach. GEE was developed to extend GLM further by accommodating repeated categorical measures, logistic regression, and various other models for time series or other correlated data, where relationships between successive measurements on the same individual are assumed to influence the estimation of model parameters (Horton & Lipsitz, 1999; Liang & Zeger, 1986; Zeger, Liang, & Albert, 1988). The GEE analytic approach handles a number of different types of categorical outcomes, their associated sampling distributions, and corresponding link functions. It is suitable to use where the repeated observations are nested within individuals over time, but the individuals are considered to be a random sample of a population.

One scenario is where individuals are randomly assigned to treatment conditions that unfold over time. If the outcome is a count, we can make use of an additional “exposure” parameter (i.e., referred to as an offset term), which, as you will recall, is a "structural" predictor that can be added to the model. Its coefficient is not estimated by the model but is assumed to have the value 1.0; thus, the values of the offset are simply added to the linear predictor of the dependent variable. This extra parameter can be especially useful in Poisson regression models, where each case may have different levels of exposure to the event of interest. At present in IBM SPSS, the GEE approach only accommodates a two-level data hierarchy (measurements nested in individuals). If we intend to add a group-level variable, we would need to use GENLIN MIXED to specify the group structure.
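As a minimal sketch of how an offset works, the following illustrates that the offset enters the linear predictor with a fixed coefficient of 1.0, so that the exposure multiplies the expected rate directly. The coefficients and exposure values here are hypothetical, not from any data in these notes.

```python
import numpy as np

# Assumed (hypothetical) Poisson coefficients for illustration only.
b0, b1 = -2.0, 0.5
x = np.array([0.0, 1.0, 2.0])
exposure = np.array([10.0, 20.0, 10.0])   # e.g., person-years observed per case

# The offset log(exposure) is added to the linear predictor with coefficient 1.0:
# log(expected count) = log(exposure) + b0 + b1*x.
linear_predictor = np.log(exposure) + b0 + b1 * x
expected_count = np.exp(linear_predictor)

# Equivalently, exposure scales the underlying rate multiplicatively.
rate = np.exp(b0 + b1 * x)
```

This is why cases with longer exposure are expected to accumulate proportionally more events even at the same covariate values.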

Students’ Proficiency in Reading Over Time

Consider a study to examine students’ likelihood to be proficient in reading over time and to assess whether their background might affect their varying patterns of meeting proficiency or not. We may first be interested in answering whether a change takes place over time in students’ likelihood to be proficient. This concern addresses whether the probability of a student being proficient is the same or different over the occasions of measurement. The assumption is that if we can reject the hypothesis that the likelihood of being proficient is the same over time, it implies that a change in individuals has taken place. In this situation, occasions of measurement are assumed to be nested within subjects but independent between subjects.

We may have a number of research questions we are interested in examining such as the following: “What is the probability of students being proficient in reading over time? Do probabilities of being proficient change over time? What do students’ trends look like over time? Are there between-individual variables that explain students’ likelihood to be proficient over time?”

Vertical Alignment of Data Within Individuals

The data in this study consist of 2,228 individuals who were measured on four occasions regarding their proficiency in reading. To examine growth within and between individuals using GEE (or GENLIN MIXED), the data must first be organized differently (see Chapter 2 in the text). The time-related observations must be organized vertically, which will require four lines for each subject, since there are four repeated observations regarding proficiency. You will recall that an intercept is defined as the level of Y when X (Time) is 0. For categorical outcomes, the time variable functions to separate contrasts between time, for example, between a baseline measurement and end of a treatment intervention, or to examine change over a particular time period. This coding pattern for Time (0, 1, 2, 3) identifies the intercept in the model as students’ initial (Time 1) proficiency status (i.e., since it is coded 0, and the intercept represents the individual’s status when the other predictors are 0). This is the most common type of coding for models involving individual change.
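The vertical restructuring described above can be sketched in pandas. The column names (id, prof1 through prof4) are hypothetical stand-ins for whatever the actual file uses.

```python
import pandas as pd

# Hypothetical wide-format data: one row per student, one column per occasion.
wide = pd.DataFrame({
    "id":    [1, 2],
    "prof1": [1, 0],
    "prof2": [1, 0],
    "prof3": [0, 1],
    "prof4": [1, 1],
})

# Melt to long (vertical) format: four rows per student.
long = wide.melt(id_vars="id",
                 value_vars=["prof1", "prof2", "prof3", "prof4"],
                 var_name="wave", value_name="readprof")

# Code Time as 0, 1, 2, 3 so the intercept represents initial status.
long["time"] = long["wave"].str.extract(r"(\d)", expand=False).astype(int) - 1
long = long.sort_values(["id", "time"]).drop(columns="wave").reset_index(drop=True)
```

Each student now contributes one line per measurement occasion, which is the structure GEE (and GENLIN MIXED) expects.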

There are several important steps that must be specified in conducting the analysis. Users identify the type of outcome and appropriate link function, define the regression model, select the correlation structure between repeated measures, and select either model-based or robust standard errors. There are a number of different ways to notate the models. We will let y_ti be the dichotomous response at time t (t = 1, 2, …, T) for individual i (i = 1, 2, …, N), where we assume the observations of different individuals are independent, but we allow for an association between the repeated measures for the same subject. This will allow us later in the chapter to add the subscript j to define random effects of individuals nested within groups such as classrooms or schools. We assume the following marginal regression model for the expected value of y_ti:

g[E(y_ti)] = x′_ti β,

where x_ti is a (p + 1) x 1 vector of covariates (the prime designates a transposed vector) for the ith subject on the tth measurement occasion (t = 1, 2, …, T), β represents the corresponding regression parameters, and g refers to one of several corresponding link functions, depending on the measurement of y_ti. This suggests that the data can be summarized to the vector y_i and the matrix X_i. The slope β can be interpreted as the rate of change in the population-averaged E(y) with x (Zeger et al., 1988). Typically, the parameters are constant for all t (Ziegler et al., 1998). Where the data are dichotomous, the marginal mean (a probability) is most commonly modeled via the logit link (i.e., whether a child is proficient or not at time t). The coefficients are then interpreted as log odds.

For the Bernoulli case (i.e., where the number of trials is 1), y_ti has a binomial distribution with probability of success π_ti and variance π_ti(1 - π_ti). For binary data with the logit link function, we have the familiar

η_ti = log[π_ti/(1 - π_ti)] = x′_ti β,

where η_ti is the underlying transformed predictor of y_ti, in this case, the log of the odds of y_ti = 1. It should again be noted that the model represents a ratio of the probability of the event coded 1 occurring versus the probability of the event coded 0 occurring at a particular time point. There is no residual variance parameter (σ²), as the variance is related to the expected value of y_ti and therefore cannot be uniquely defined.
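The logit transformation and its inverse can be sketched with two small helper functions (a minimal illustration, not part of the original notes):

```python
import math

def logit(p):
    """Log of the odds of the event coded 1: log[p/(1 - p)]."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Convert a log-odds (logit) value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# The Bernoulli variance pi*(1 - pi) is determined entirely by the mean,
# which is why no separate residual variance parameter is estimated.
p = 0.68
variance = p * (1.0 - p)
```

Because the variance is a fixed function of the mean, the scale parameter in Table 3 below is held at 1.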

In the first model, we specify the repeated measures outcomes in two parameters which describe the intercept and time-related slope as follows:

η_ti = β_0 + β_1(time_ti),

where time is coded to indicate the interval between successive measurements, β_0 is an intercept, and β_1 describes the rate of change on a logit scale in the fraction of positive responses in the population of subjects per unit time, rather than the typical change for an individual subject. As the above equation suggests, β_0 is the log odds of response when time is 0 (i.e., initial status). In this case, β_1 is the log odds associated with a one-year interval. The model assumes there are no between-subject random effects; therefore, there are two parameters to estimate. Since this is a single-level model, for convenience we’ll drop the subscripts referring to the predictors.

Correlation Structures Between Repeated Measures

It is possible to specify several different types of correlation structures to describe the within-subject dependencies over time. However, one does not often know the correct structure ahead of time, and different choices can make some difference in the model’s parameter estimates; the working structure is therefore chosen to improve the efficiency of the estimates. It often does take a bit of preliminary work to determine the optimal working correlation matrix for a particular data structure. Examples of GEE correlation/covariance structure specifications include independence, exchangeable, autoregressive, stationary m-dependent, and unstructured.

The independent matrix assumes that the repeated measurements are uncorrelated; however, this will not be the case in most instances. Generally, in longitudinal models the successive measurements are correlated at least to some extent. An exchangeable (or compound symmetry) covariance (or correlation) matrix assumes homogeneous correlations between elements; that is, the correlations are assumed to be the same over time. This can sometimes be difficult to support in a longitudinal study, however.

The autoregressive, or AR(1), matrix assumes the repeated measures have a first-order autoregressive structure. This implies that the correlation between any two adjacent elements is equal to ρ (rho), to ρ² for elements separated by a third element, and so on, with ρ constrained such that -1 < ρ < 1.

An m-dependent matrix assumes consecutive measurements have a common correlation coefficient, pairs of measurements separated by a third have a common correlation coefficient, and so on, through pairs of measurements separated by m - 1 other measurements. Where measurements are not evenly spaced, it may be reasonable to consider a model where the correlation is a function of the time between observations (i.e., m-dependent or autoregressive). Measurements with greater separation are assumed to be uncorrelated. When choosing this structure, specify a value of m less than the order of the working correlation matrix.

Finally, an unstructured correlation (or covariance) matrix provides a separate coefficient for each covariance. As with cross-sectional models, we have found that model estimates can vary slightly according to the matrix structure specified.
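The structures described above can be sketched as working correlation matrices for T = 4 occasions (the value of ρ below is an arbitrary illustrative choice):

```python
import numpy as np

T, rho = 4, 0.3   # rho is an assumed common correlation, for illustration only

# Independence: repeated measurements are treated as uncorrelated.
independent = np.eye(T)

# Exchangeable (compound symmetry): one common off-diagonal correlation.
exchangeable = np.full((T, T), rho)
np.fill_diagonal(exchangeable, 1.0)

# AR(1): correlation rho^|t - s| decays with the separation between occasions.
idx = np.arange(T)
ar1 = rho ** np.abs(idx[:, None] - idx[None, :])
```

An unstructured matrix would instead estimate all T(T - 1)/2 off-diagonal elements separately.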

Standard Errors and Estimation

Model-based standard errors are based on the correlational structure chosen. Hence, they may be inconsistent if the correlation structure is incorrectly specified. They are usually a little smaller than the robust standard errors (SEs). For smaller numbers of clusters, model-based SEs are generally preferred over robust SEs. In contrast, robust standard errors vary only slightly depending on the choice of hypothesized correlational structure among the repeated measures; that is, the estimates are consistent even if the correlational structure is specified incorrectly. The robust SE approach uses a “sandwich” estimator based on an approximation to maximum likelihood. Because of this, there can be occasions when one approach will converge and the other may not. Robust standard errors are often preferred when the number of clustered observations is large. We will estimate our models in this example using robust standard errors, since we have a considerable amount of data. Once again, users should keep in mind that GEE uses a type of quasi-likelihood estimation (as opposed to full information ML), which can make direct model comparisons based on fit statistics that depend on the true likelihood (e.g., deviance, AIC, BIC) not very accurate (Hox, 2010).

Table 1. Model Information
Dependent Variable / readprofa
Probability Distribution / Binomial
Link Function / Logit
Subject Effect / 1 / Id
Within-Subject Effect / 1 / Time
Working Correlation Matrix Structure / Exchangeable
a. The procedure models 1 as the response, treating 0 as the reference category.

Table 1 provides information about how the model is defined (e.g., probability distribution and link function, number of effects in the model, type of correlation matrix used to describe the within-subject structure). As the output shows, the distribution is binomial, and a logit link function is used to transform the predicted values of the outcome. The working correlation structure is exchangeable, which is the same as compound symmetry. This implies that the correlations are the same over each time interval. We can subsequently investigate whether this is a viable assumption for these data.

Next, we can observe how many of the total cases for the dependent variable (reading proficiency) are coded 1 (proficient) versus 0 (not proficient). As the table suggests, across the four time periods, an average of 68% of the individuals were proficient and 32% were not.

Table 2. Reading Proficiency Information
N / Percent
Dependent Variable / readprof / 0 / 2850 / 32.0%
1 / 6062 / 68.0%
Total / 8912 / 100.0%

If we did not include the time variable, the log odds intercept would be 0.755 (not tabled), which would be the grand mean log odds coefficient across the four time periods. We can translate the odds ratio back to the predicted population probability of y = 1 [odds/(1 + odds)], which would be 2.128/3.128, or 0.680, which fits with the Table 2 estimate.

Next in Table 3 are the fixed-effect results for the intercept and the time-related predictor. The estimated intercept log odds coefficient is 0.838, which, because of the coding of the time variable (i.e., 0, 1, 2, 3), corresponds to the proportion of individuals who are proficient at the start of the study. The intercept represents the predicted log odds when any variables in the model are 0. If we exponentiate the log odds, we obtain the corresponding odds ratio of 2.311. This suggests individuals are almost 2.3 times as likely to be proficient as non-proficient at the beginning of the study (.70/.30 ≈ 2.3).

Table 3. Parameter Estimates
Parameter / B / Std. Error / 95% Wald Confidence Interval / Hypothesis Test / Exp(B) / 95% Wald Confidence Interval for Exp(B)
Lower / Upper / Wald Chi-Square / Df / Sig. / Lower / Upper
(Intercept) / .838 / .0478 / .744 / .931 / 306.910 / 1 / .000 / 2.311 / 2.104 / 2.538
Time / -.055 / .0165 / -.087 / -.022 / 11.021 / 1 / .000 / .947 / .916 / .978
(Scale) / 1
Dependent Variable: readprof
Model: (Intercept), time

Regarding the time variable, the coefficient suggests that over each interval students’ likelihood of being proficient decreases significantly (log odds = -0.055, p < .001). We can translate this into a predicted probability by adding it to the intercept. Initially (i.e., at time = 0), the log odds of being proficient is 0.838. For the second interval (time = 1) the estimated log odds will then be 0.783 [0.838 + (-0.055) = 0.783]. We could then estimate the new probability as about 0.69, which is calculated as 1/[1 + e^(-0.783)], which reduces to 1/1.457, or 0.686. Note this estimate is slightly different from the actual observed probability in the table below, since there was no actual change that took place between time 0 and time 1. The odds ratio suggests the odds of being proficient are multiplied by 0.947 (or reduced by 5.3%) over each one-unit interval. We can see that in this situation an assumed negative linear time trend in reduced probability of being proficient does not quite fit the data optimally.
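The probability arithmetic above can be checked directly, using only the Table 3 estimates:

```python
import math

# Estimates from Table 3: intercept and linear time slope on the logit scale.
b0, b1 = 0.838, -0.055

def prob(t):
    """Predicted probability of proficiency at occasion t under the linear trend."""
    eta = b0 + b1 * t                # predicted log odds at occasion t
    return 1.0 / (1.0 + math.exp(-eta))

p0 = prob(0)   # initial status, about 0.698
p1 = prob(1)   # second occasion, about 0.686
```

Comparing p1 with the observed proportion at Time 1 in Table 4 (0.6988) shows how the forced linear trend under-predicts that occasion.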

Table 4. Proportion of proficient students
Readprof
Time / Mean / N / Std. Deviation
0 / .6984 / 2228 / .459
1 / .6988 / 2228 / .459
2 / .6481 / 2228 / .478
3 / .6755 / 2228 / .468
Total / .6802 / 8912 / .466

In this case, we might decide to code the data somewhat differently to obtain results that model the trend a bit better. We might wish to treat the time-related variable as ordinal (1,2…,C) rather than scale. If we make this change, we will have C-1 estimates, since one category will serve as the reference group. In this case, we will specify “descending” for the factor category order so that the first category (Time = 0) will serve as the reference group. This is the same as creating a series of C-1 dummy variables for a categorical factor and specifying them in the model.
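The C - 1 dummy coding with Time = 0 as the reference category can be sketched with pandas (a minimal illustration; the procedure in SPSS constructs these contrasts internally):

```python
import pandas as pd

# Long-format time variable for two subjects, coded 0-3.
df = pd.DataFrame({"time": [0, 1, 2, 3, 0, 1, 2, 3]})

# C - 1 = 3 dummy variables; dropping time_0 makes it the reference group,
# mirroring the coding shown in Table 5.
dummies = pd.get_dummies(df["time"], prefix="time").drop(columns="time_0")
```

Rows where time = 0 are all zeros, so the intercept absorbs the reference (initial status) category.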

Table 5. Model 1.2 Parameter Estimates
Parameter / B / Std. Error / Hypothesis Test / Exp(B) / 95% Wald Confidence Interval for Exp(B)
Wald Chi-Square / df / Sig. / Lower / Upper
(Intercept) / .840 / .0462 / 330.845 / 1 / .000 / 2.315 / 2.115 / 2.535
[time=3] / -.106 / .0457 / 5.438 / 1 / .020 / .899 / .822 / .983
[time=2] / -.229 / .0445 / 26.440 / 1 / .000 / .795 / .729 / .868
[time=1] / .002 / .0177 / .014 / 1 / .904 / 1.002 / .968 / 1.038
[time=0] / 0a / . / . / . / . / 1 / . / .
(Scale) / 1
Dependent Variable: readprof Model: (Intercept), time (ordinal)
a. Set to zero because this parameter is redundant.

The intercept log odds is now 0.840. This is only slightly different from the last table. If we calculate the predicted probability of being proficient initially (Time = 0), we see it will be 1/[1 + e^(-0.840)], or 1/1.432 = 0.698. Note we can also use the odds ratio to estimate the probability (2.315/3.315). This probability is consistent with the observed probability of 0.6984 in the previous table. We can see further that at Time = 1, there was little change in log odds units regarding students’ probability of being proficient (log odds = 0.002, p = .904). At Time 2 (log odds = -0.229, p < .001) and Time 3 (log odds = -0.106, p = .020), however, students were significantly lower in probability of being proficient relative to their proficiency status at Time 0. Regarding the odds ratios (OR), we can interpret the nonsignificant relationship at Time = 1 as indicating there was no significant change in odds of being proficient at Time 1 (OR = 1.002, p = .904). In contrast, the odds of being proficient at Time = 2 versus Time 0 are multiplied by 0.795 (or reduced by 20.5%) compared to the initial level. At Time = 3, it suggests that the odds of being proficient at Time 3 versus Time 0 (i.e., initial status intercept) are multiplied by 0.899 (or reduced by about 10%).