Ronald H. Heck and Lynn N. Tabata Investigating Models with Two or Three Categories 5
EDEP 606: Multivariate Methods (S2013) April 23, 2013

Investigating Models with Two or Three Categories

For the past few weeks we have been working with discriminant analysis. Let's now see what the same sort of model might look like if we used logistic regression with a dichotomous outcome. We could also extend this to the multinomial case (e.g., three job categories), which is similar to the three-group discriminant analyses with which we have been working. We will save that second type of categorical model for another day, however.

The first type of model examines an outcome that is dichotomous (coded 0 and 1). When there are only two categories in the scale of measurement (e.g., no or yes) or three categories (e.g., clerk, custodian, and manager), another approach is to incorporate the necessary transformation and choice of error distribution directly into the statistical modeling approach (Hox, 2010). These types of models are often referred to as generalized linear models (McCullagh & Nelder, 1989). As Hox notes, generalized linear models make it possible to extend standard regression models in several ways, including the inclusion of non-normal error distributions and the use of nonlinear link functions; that is, a means of linking expected values of the outcome variable (e.g., binary, multinomial) to an underlying (latent) variable that represents predicted values of the outcome. The probability that y takes on a particular value (e.g., y = 1) can then be expressed in terms of an explanatory (or regression) model through the link function. One such link is the logit, which provides log odds coefficients and corresponding odds ratios and is estimated through an iterative algorithm such as maximum likelihood (ML). Generalized linear models, therefore, avoid trying to transform the observed values of a binary or multinomial variable and, instead, apply a transformation to the expected values.

Consider a case where we want to examine whether or not students are proficient in reading[1]. We wish to see if there is a relationship between background variables (i.e., student gender, SES, race/ethnicity) and students' likelihood of being proficient (coded 1 = proficient, 0 = not proficient). We might simply ask: Are students' socioeconomic status, gender, and race/ethnicity related to their likelihood of being proficient in reading? We will begin with a simple discriminant analysis where we will try to separate students into two groups based on their gender, socioeconomic status, and race/ethnicity. We probably will not have great ability to classify, since there are likely a number of other variables missing from our model.
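The handout runs this analysis in SPSS; a rough sketch of the same two-group discriminant analysis can be written with scikit-learn. The data below are simulated stand-ins for the handout's variables (lowses, female, minor, readprof) — the real data set is not reproduced here, so the coefficients and accuracy will not match the tables.

```python
# Sketch of a two-group discriminant analysis, using scikit-learn in place of
# SPSS. The data are simulated stand-ins for the handout's variables.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 1000
# Three dichotomous background predictors: lowses, female, minor.
X = rng.integers(0, 2, size=(n, 3)).astype(float)
# Simulate proficiency so low SES and minority status lower the log odds.
eta = 1.5 - 1.0 * X[:, 0] + 0.5 * X[:, 1] - 0.8 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(int)  # 1 = proficient

lda = LinearDiscriminantAnalysis().fit(X, y)
accuracy = lda.score(X, y)  # proportion of cases correctly classified
print(round(accuracy, 3))
```

As in the handout's results, with a high base rate of proficiency the overall classification accuracy tends to hover near what majority-class guessing alone would achieve.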

Discriminant Analysis

Table 1. Wilks' Lambda
Test of Function(s) / Wilks' Lambda / Chi-square / df / Sig.
1 / .891 / 755.289 / 3 / .000

We can see that the single function (since there are only two groups) is significant, which suggests that the set of three background variables is significantly related to classifying students into the two groups. The canonical correlation is 0.33 (not tabled). In the sample, 69.1% are proficient and 30.9% are not proficient (not tabled). If we examine the standardized coefficients, we see that SES dominates in classifying students, followed by race/ethnicity.

Table 2. Standardized Canonical Discriminant Function Coefficients
Function
1
lowses / .706
female / -.303
minor / .506

We would likely classify almost 70% correctly by chance alone, so we can see that our simple model is really not doing any better than that. In particular, it is not very accurate in classifying individuals who are not proficient (with only 26% of the non-proficient students being correctly classified).

Table 3. Classification Results^b,c
readprof / Predicted Group Membership / Total
0 / 1
Original / Count / 0 / 529 / 1490 / 2019
1 / 437 / 4072 / 4509
% / 0 / 26.2 / 73.8 / 100.0
1 / 9.7 / 90.3 / 100.0
Cross-validated^a / Count / 0 / 529 / 1490 / 2019
1 / 437 / 4072 / 4509
% / 0 / 26.2 / 73.8 / 100.0
1 / 9.7 / 90.3 / 100.0
a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 70.5% of original grouped cases correctly classified.
c. 70.5% of cross-validated grouped cases correctly classified.


Next, we can use SPSS to fit a number of different models with various types of categorical outcomes (e.g., dichotomous, multinomial, count, ordinal). As Hox notes, generalized linear models include the necessary transformation and appropriate error distribution within the statistical model. They have three common components:

·  An outcome variable Y with a specific error distribution with mean μ and variance σ²,

·  A linear additive regression equation that produces a linear predictor η of Y, and

·  A link function which connects the expected values of Y to the predicted values of η: η = g(μ).

Depending on the sampling distribution of the outcome variable, particular error distributions (e.g., normal, binomial or Bernoulli, Poisson) are incorporated into the particular link function chosen. In the case where the link function is identity and the errors are normally distributed (as when the outcome is continuous), no transformation of the outcome is needed. The generalized linear model simplifies to the familiar multiple regression model. For models where the outcome is categorical, this expected value can be transformed so the predictions are constrained to lie within a particular interval. Generalized linear models therefore make possible the extension of the linear model with a continuous outcome to situations where the outcome has some type of non-normal error distribution. In these cases, the appropriate selection of a non-normal error distribution and relevant nonlinear link function can provide more efficient estimates of model parameters.
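The link-function idea above can be sketched in a few lines of Python (used here purely for illustration; the handout itself uses SPSS). The logit link maps a probability in (0, 1) onto the whole real line, and its inverse maps a linear predictor back into (0, 1):

```python
# The logit link and its inverse: the link maps a probability in (0, 1) onto
# the real line; the inverse link maps a linear predictor back into (0, 1).
import math

def logit(p):
    """Link function: the log odds of probability p."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link: the probability implied by log odds eta."""
    return 1 / (1 + math.exp(-eta))

print(logit(0.5))                # 0.0: both outcomes equally likely
print(round(inv_logit(1.5), 3))  # a positive log odds maps above 0.5
```

With the identity link in place of logit, the same scheme reduces to ordinary multiple regression, as the paragraph above notes.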

Dichotomous Outcome

Where the outcome is dichotomous, the sampling model is binomial (Hox, 2002). One appropriate choice is logistic regression (another possibility is probit regression, which uses a probit link function). The link function is the logit, given by η = log[π/(1 − π)], and the transformed predicted value of the outcome Y can be represented through a linear structural model of the form η = β0 + β1X1 + … + βkXk. Logistic regression is based on the probability that the dichotomous outcome Y is either 0 or 1. If the population proportion of cases for which Y = 1 is defined as the probability π, then the probability that Y = 0 can be defined as 1 − π. This can be expressed as a linear logit equation, where in the binary case the logit, or log odds, for the event is a linear expression. Taking the logarithm of the odds provides a way of representing the additive effects of the set of predictors on the outcome. The log odds (ηi) for the likelihood of individual i being proficient in reading can be written as follows:

ηi = log[πi/(1 − πi)]. (1)

Equation 1 suggests a log odds coefficient is based on a ratio of two probabilities. The ratio πi/(1 − πi) is defined as the odds for y = 1 as opposed to y = 0. For example, if the probability of being proficient is 0.5, then 0.5/0.5 = 1, and the corresponding log(1) = 0. This suggests each event is equally likely to occur. If the probability of being proficient is 0.9, then the odds will be greater than 1.0 (0.9/(1 − 0.9) = 9), and the natural log(9) = 2.197. Conversely, if the probability is less than 0.5, the odds that Y = 1 will be less than 1.0. Therefore, although the predicted value for the transformed outcome η can take on any real value, the probability that y = 1 will vary between 0 and 1. The usual residual variance (ei in a typical regression model) is not included in the logistic regression model represented in Eq. 1 because for a binomial distribution the residual variance is a function of the population mean (or proportion) and cannot be estimated separately (Hox, 2002). It is typically set to a scale factor of 1.0, which suggests it does not need to be interpreted (Hox, 2010).
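The odds arithmetic in the paragraph above can be checked directly (natural logs, as used by the logistic model):

```python
# Probabilities converted to odds and natural log odds, matching the
# worked examples in the text.
import math

p = 0.5
odds = p / (1 - p)
print(odds, math.log(odds))      # odds = 1.0, log(1) = 0: equally likely

p = 0.9
odds = p / (1 - p)
print(round(odds, 1), round(math.log(odds), 3))  # odds = 9.0, ln(9) = 2.197
```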

One of the desirable features of the logit link is that odds ratios can be obtained by exponentiating the coefficients (e^β), where e is approximately 2.71828 and β is the specific log odds coefficient (so if the log odds β = 0, the odds ratio e^0 equals 1). Odds ratios are typically easier to interpret than log odds. The log odds can also be used to estimate the predicted probability that Y = 1:

π = 1/(1 + e^−(β0 + β1X1 + β2X2)), (2)

where the βs are logistic regression coefficients for the intercept and two covariates. This model assures that whether the predicted value for η is positive, negative, or zero, the resulting predicted probability will lie between 0 and 1. In the table following, where we just have the simple intercept log odds for the proficiency example (0.803), we can estimate the predicted probability of being proficient as 0.691 (where e^−0.803 = 0.448, so 1/(1 + 0.448) = 1/1.448 = 0.691).

Table 4. Parameter Estimates
Parameter / B / Std. Error / 95% Wald Confidence Interval / Hypothesis Test
Lower / Upper / Wald Chi-Square / df / Sig.
(Intercept) / .803 / .0268 / .751 / .856 / 900.283 / 1 / .000
(Scale) / 1^a
Dependent Variable: readprof
Model: (Intercept)
a. Fixed at the displayed value.
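The intercept-only calculation above is easy to verify: Table 4's log odds of 0.803 run through the inverse logit recovers the sample proportion proficient.

```python
# Reproducing the intercept-only calculation from Table 4: a log odds of
# 0.803 implies the sample proportion proficient, 0.691.
import math

b0 = 0.803                       # intercept log odds from Table 4
p_hat = 1 / (1 + math.exp(-b0))  # inverse logit
print(round(p_hat, 3))           # 0.691
```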

The estimated probability of a student being proficient in reading (0.691) matches the percentage proficient in the table below.

Table 5. Categorical Variable Information
N / Percent
Dependent Variable / readprof / 0 / 2019 / 30.9%
1 / 4509 / 69.1%
Total / 6528 / 100.0%

When we estimate this preliminary model, we obtain the following results. First, it is important to note that the defaults in IBM SPSS use the last category as the reference group. For example, in the model, this would be students with low SES background, females, minority by race/ethnicity, and proficient in reading (coded 1). We can, however, change the reference group to be the groups coded 0. This is consistent, for example, with dummy coding, where the reference category is coded 0 and the named category (e.g., female) is coded 1. We can then obtain estimates for the categories coded 1 (as shown in the table below).


Table 6. Parameter Estimates
Parameter / B / Std. Error / 95% Wald Confidence Interval / Hypothesis Test / Exp(B) / 95% Wald Confidence Interval for Exp(B)
Lower / Upper / Wald Chi-Square / df / Sig. / Lower / Upper
(Intercept) / 1.545 / .0562 / 1.435 / 1.656 / 754.915 / 1 / .000 / 4.690 / 4.200 / 5.236
[lowses=1] / -1.053 / .0589 / -1.169 / -.938 / 320.063 / 1 / .000 / .349 / .311 / .391
[lowses=0] / 0a / . / . / . / . / . / . / 1 / . / .
[female=1] / .465 / .0572 / .353 / .577 / 66.013 / 1 / .000 / 1.592 / 1.423 / 1.781
[female=0] / 0a / . / . / . / . / . / . / 1 / . / .
[minor=1] / -.770 / .0592 / -.886 / -.654 / 169.181 / 1 / .000 / .463 / .413 / .520
[minor=0] / 0a / . / . / . / . / . / . / 1 / . / .
(Scale) / 1^b
Dependent Variable: readprof
Model: (Intercept), lowses, female, minor
a. Set to zero because this parameter is redundant.
b. Fixed at the displayed value.


We can interpret the intercept (1.545) as the predicted log odds of Y = 1 when all the predictors are equal to 0. That individual would be average/high SES (0), male (0), and not a minority by race/ethnicity (0). Regarding gender, since males are the reference group, the table suggests females are significantly more likely to be proficient in reading (β = 0.465, p < .01) than males. The odds ratio (1.592) suggests that females have about 59.2% greater odds of being proficient compared with males. Stated differently, the odds of being proficient are increased by about 1.6 times for females compared to males. For comparative purposes, an odds ratio of 2:1 would represent a 100% increase in the odds of being proficient. In contrast, low SES is negatively related to the likelihood of being proficient (β = -1.053, p < .01). Expressed as an odds ratio, the odds of a low SES student being proficient are reduced by a factor of 0.349 (or 65.1%) compared to the reference group. Odds ratios below 1.0 are sometimes easier to explain by converting them into the odds of not being proficient. This can be accomplished by dividing 1 by the obtained odds ratio (1/0.349), which yields an odds ratio of 2.865 for not being proficient. This suggests the odds of not being proficient are almost 2.9 times greater for low SES students compared with their peers of average or high SES background.
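The odds-ratio conversions in the interpretation above can be reproduced from Table 6's log odds coefficients:

```python
# Converting Table 6's log odds coefficients to the odds ratios discussed
# in the text (values match the Exp(B) column).
import math

b_female = 0.465
b_lowses = -1.053
print(round(math.exp(b_female), 3))  # 1.592: odds up about 59% for females
or_lowses = math.exp(b_lowses)
print(round(or_lowses, 3))           # 0.349: odds down about 65% for low SES
print(round(1 / or_lowses, 2))       # 2.87: odds of NOT being proficient
```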

In addition to examining the statistical significance of the individual variables, we can also obtain other information to evaluate how well the model fits the data, such as the ability to classify individuals correctly as likely to be proficient or not, or the model's deviance (−2 log likelihood), which can be used to conduct a series of model tests. Models with smaller deviance values represent better-fitting models. Using the overall percentage of individuals correctly classified, in the table below, we note this model also correctly classified 70.5% of the participants (again with greater accuracy for proficient students).