Analysis of Covariance (ANCOVA) and Multiple Regression
Example 1 – Gestational Age, Birth Weight, and Mother’s Smoking Status During Pregnancy
Data File: Lowbirthweight.JMP
The variables in this data file are:
- id – identification number of the infant (labeling purpose only)
- headcir – head circumference of infant (nearest in.)
- length – length of infant (nearest in.)
- weight – birth weight (lbs.)
- gest – gestational age of infant (weeks)
- mage – mother’s age
- mnocig – daily number of cigarettes during pregnancy, mother
- mheight – mother’s height (nearest in.)
- mppwt – mother’s pre-pregnancy weight (lbs.)
- fage – father’s age
- fedyrs – father’s education level (yrs.)
- fnocig – daily number of cigarettes, father
- fheight – father’s height (in.)
- lowbwt – low birth weight indicator (1 = yes, 0 = no)
- mage35 – mother over 35 years of age (1 = yes, 0 = no)
- smoker – mother smoked during pregnancy (1 = yes, 0 = no)
There are many questions of interest one could examine using these data. In this analysis we will examine the relationship between birth weight and smoking. We could do this by using a two-sample t-test, either pooled or non-pooled depending upon the equality of population variances, to compare the mean birth weight of infants born to non-smokers vs. smokers. The results of such an analysis are presented below.
Here we can see that the mean birth weight of infants born to smokers is significantly lower than the mean birth weight of infants born to non-smokers (p < .0001). In particular we estimate that the mean birth weight of infants born to smokers is between .33 and .65 lbs. less than the mean birth weight of infants born to non-smokers.
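For readers working outside JMP, the pooled and non-pooled (Welch) two-sample comparisons described above can be sketched in Python. This is only an illustration: the birth weights below are made-up values, not data from Lowbirthweight.JMP, and the function name `two_sample_t` is our own. The sketch returns the difference in means, its standard error, the t statistic, and the degrees of freedom; a p-value or confidence interval would additionally require a t distribution table or library.

```python
import math
from statistics import mean, variance

def two_sample_t(x, y, pooled=True):
    """Return (difference in means, standard error, t statistic, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    diff = mean(x) - mean(y)
    if pooled:
        # Pooled version: assumes the two population variances are equal
        df = n1 + n2 - 2
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        # Non-pooled (Welch) version with Satterthwaite degrees of freedom
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return diff, se, diff / se, df

# Made-up birth weights in lbs. (hypothetical, NOT from Lowbirthweight.JMP)
nonsmokers = [7.9, 8.1, 7.4, 8.6, 7.7, 8.3]
smokers = [7.1, 7.6, 6.9, 7.8, 7.2]

pooled_result = two_sample_t(smokers, nonsmokers, pooled=True)
welch_result = two_sample_t(smokers, nonsmokers, pooled=False)
print("pooled:", pooled_result)
print("welch: ", welch_result)
```

Both versions give the same difference in means; they differ only in the standard error and the degrees of freedom used to judge it.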
Does this mean that if we compared the population of infants that have the same gestational age those born to smokers will have smaller birth weight by between .33 and .65 lbs. on average when compared to those born to non-smokers?
Perhaps smoking during pregnancy leads to infants being born earlier and hence having a smaller birth weight as a result. We cannot tell unless we include information about gestational age in our analysis. To do this we can use Analysis of Covariance (ANCOVA), which is really just multiple regression where one of the predictors is a factor of interest (i.e. smoking in this example) and the other variables (covariates) are used as “adjustments”. For example, if we include information about gestational age in our model we will be able to say “adjusting for gestational age of the infant, we estimate that the effect of smoking during pregnancy is (fill in the blank) on the birth weight of infants”.
How do we include information about smoking and gestational age in a multiple regression model?
Potential Models
1) Smoking Effects Only
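E(Birth Weight | Smoking Status) = β0 + β1(Smoking Status)
where Smoking Status is a numeric indicator of whether the mother smoked during pregnancy,
so the regression model can be expressed separately for smokers and non-smokers as follows: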
To fit this model in JMP select Analyze > Fit Model and place weight in the Y box and smoker in the Construct Model Effects box.
Here are the results of fitting the smoking status model outlined on the previous page, using Analyze > Fit Model in JMP.
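As a cross-check outside JMP, the smoking-only model can be sketched with NumPy least squares. The data below are made up for illustration (not from Lowbirthweight.JMP), and `fit_ols` is a helper name we introduce here. The sketch fits the model under two common codings of the indicator, since the interpretation of the coefficient depends on which one the software uses: with 0/1 coding the coefficient is the difference in group means, while with +1/-1 coding it is half that difference and the intercept is the average of the two group means.

```python
import numpy as np

# Made-up birth weights (lbs.) and smoking indicator (hypothetical data)
weight = np.array([7.9, 8.1, 7.4, 8.6, 7.1, 7.6, 6.9, 7.8])
smoker = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def fit_ols(X, y):
    """Least-squares coefficients for design matrix X and response y."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Coding A: 0/1 indicator (smoker = 1, non-smoker = 0)
X01 = np.column_stack([np.ones(len(weight)), smoker])
b0, b1 = fit_ols(X01, weight)

# Coding B: +1/-1 coding (non-smoker = +1, smoker = -1)
effect = np.where(smoker == 0, 1.0, -1.0)
Xeff = np.column_stack([np.ones(len(weight)), effect])
a0, a1 = fit_ols(Xeff, weight)

print("0/1 coding: non-smoker mean =", b0, " smoker mean =", b0 + b1)
print("+1/-1 coding: non-smoker mean =", a0 + a1, " smoker mean =", a0 - a1)
```

Either coding reproduces the same two group means, so the fitted model is the same; only the meaning of the individual coefficients changes. It is worth checking which coding a package applies before reading a coefficient as "the smoking effect".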
2) Both Smoking and Gestational Age (Parallel Lines Model)
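E(Birth Weight | Smoking Status, Gest. Age) = β0 + β1(Smoking Status) + β2(Gest. Age)
where Smoking Status is a numeric indicator of whether the mother smoked during pregnancy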
and Gest. Age = gestational age of the infant in weeks.
Picture of this model:
To fit this model in JMP we again use Analyze > Fit Model and place weight in the Y box and both smoker and gest in the Construct Model Effects box as shown below.
The results from JMP are shown below.
Our estimated model is:
For smokers we have
For non-smokers we have
Predict the mean birth weight for an infant with a gestational age of 36 weeks born to a smoker.
Predict the mean birth weight for an infant with a gestational age of 36 weeks born to a non-smoker.
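The mechanics of these two predictions can be sketched in Python. The numbers below are made up (they are not the JMP estimates for Lowbirthweight.JMP), a 0/1 coding of smoking status is assumed, and the names `predict_mean_weight`, `b_smoke`, and `b_gest` are ours. The point is only that, in the parallel lines model, a prediction plugs the smoking code and the gestational age into the fitted equation, and the gap between the two predictions is the same at every gestational age.

```python
import numpy as np

# Made-up data (hypothetical, NOT from Lowbirthweight.JMP)
gest = np.array([36, 38, 39, 40, 41, 36, 38, 39, 40, 41], dtype=float)
smoker = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)  # assumed 0/1 coding
weight = np.array([6.4, 7.0, 7.5, 7.7, 8.1, 6.0, 6.7, 7.0, 7.4, 7.6])

# Parallel lines model: intercept + smoking shift + common slope for gestational age
X = np.column_stack([np.ones_like(gest), smoker, gest])
coef, *_ = np.linalg.lstsq(X, weight, rcond=None)
b0, b_smoke, b_gest = coef

def predict_mean_weight(is_smoker, gest_weeks):
    """Estimated mean birth weight (lbs.) from the fitted parallel lines model."""
    return b0 + b_smoke * is_smoker + b_gest * gest_weeks

pred_smoker_36 = predict_mean_weight(1, 36)
pred_nonsmoker_36 = predict_mean_weight(0, 36)
print("smoker, 36 weeks:    ", pred_smoker_36)
print("non-smoker, 36 weeks:", pred_nonsmoker_36)
print("difference (any age):", pred_smoker_36 - pred_nonsmoker_36)
```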
95% CI for the “Smoking Effect” for Infants with a Given Gestational Age
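As a reminder of the general form such an interval takes, here is a sketch under stated assumptions: smoking status is coded 0/1, the parallel lines model has three parameters, and the symbols (β1 for the smoking coefficient, n for the number of infants) are our own notation rather than the handout's.

```latex
\hat{\beta}_1 \;\pm\; t_{\alpha/2,\; n-3}\,\mathrm{SE}(\hat{\beta}_1), \qquad \alpha = 0.05
```

Because the lines are parallel, this interval applies at every gestational age. If the smoking factor is instead coded +1/-1, the difference between the two groups is twice the coefficient in magnitude, so the estimate and both endpoints are doubled.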
What if the effect of gestational age is different for smokers and non-smokers? For example, maybe for smokers an additional week of gestational age does not translate to the same increase in birth weight as it does for non-smokers? What should we do?
3) Both Smoking and Gestational Age (Unrelated Lines Model)
E(Birth Weight | Smoking Status, Gest. Age) = β0 + β1(Smoking Status) + β2(Gest. Age) + β3(Gest. Age)(Smoking Status)
For smokers we have the following
For non-smokers we have the following
Picture of the unrelated lines model:
To visualize the unrelated lines model in JMP select Analyze > Fit Y by X to construct the plot of Y vs. X. Next, from the pull-down menu in the upper left-hand corner of the plot, select Group By… and highlight the categorical variable or factor you wish to use in constructing the unrelated lines (see above).
To fit the unrelated lines regression model in JMP select Analyze > Fit Model, put weight in the Y box, and then highlight both gest and smoker in the list of variables while holding down the CTRL key. Next click on Full Factorial from the Macros pull-down menu, which will place the two main effects of gestational age and mother’s smoking status, along with the interaction between them, into the model. The interaction term allows for a potential difference in the effect of gestational age for smokers and non-smokers, i.e. it allows the regression lines for smokers and non-smokers to have different slopes.
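To make the role of the interaction term concrete, here is a minimal Python sketch using made-up data (not Lowbirthweight.JMP) and an assumed 0/1 coding for smoking status. It fits the model with an interaction column, recovers a separate slope for each group, and then refits with the gestational age centered at its mean inside the interaction, which is the form the JMP output takes. Centering changes the main-effect coefficient for smoking but not the interaction coefficient.

```python
import numpy as np

# Made-up data (hypothetical); smoker uses an assumed 0/1 coding
gest = np.array([36, 37, 38, 39, 40, 41, 36, 37, 38, 39, 40, 41], dtype=float)
smoker = np.array([0] * 6 + [1] * 6, dtype=float)
weight = np.array([6.3, 6.6, 7.0, 7.3, 7.7, 8.1, 6.1, 6.3, 6.6, 6.8, 7.1, 7.3])

ones = np.ones_like(gest)

# Unrelated lines model: main effects plus smoker-by-gest interaction
X = np.column_stack([ones, smoker, gest, smoker * gest])
b, *_ = np.linalg.lstsq(X, weight, rcond=None)
slope_nonsmokers = b[2]          # slope when smoker = 0
slope_smokers = b[2] + b[3]      # interaction shifts the slope for smokers

# Same model with gestational age centered inside the interaction term
Xc = np.column_stack([ones, smoker, gest, smoker * (gest - gest.mean())])
c, *_ = np.linalg.lstsq(Xc, weight, rcond=None)

print("slope, non-smokers:", slope_nonsmokers)
print("slope, smokers:    ", slope_smokers)
print("interaction (uncentered vs centered):", b[3], c[3])
```

The two slopes match what one would get by fitting a separate simple regression line to each group, which is why this is called the unrelated lines model.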
The resulting output from JMP is shown below.
The estimated regression equation is
E(Birth Weight | Smoking Status, Gest. Age) = -2.04 + .201(Smoking Status) + .240(Gest. Age) - .0183(Gest. Age – 39.77)(Smoking Status)
The interaction term is NOT significant (p = .3616), so we should go with the simpler (i.e. parallel lines) model.
Quantifying the “Smoking Effect” adjusted for Gestational Age
Adjusting for gestational age we estimate that….
Example 2 – Birth Weight and Smoking Adjusting for all Potential Covariates
We now consider adding all relevant predictors to the model for predicting birth weight.
The variables available in this data file are again listed below:
- headcir – head circumference of infant (nearest in.)
- length – length of infant (nearest in.)
- weight – birth weight (lbs.)
- gest – gestational age of infant (weeks)
- mage – mother’s age
- mnocig – daily number of cigarettes during pregnancy, mother
- mheight – mother’s height (nearest in.)
- mppwt – mother’s pre-pregnancy weight (lbs.)
- fage – father’s age
- fedyrs – father’s education level (yrs.)
- fnocig – daily number of cigarettes, father
- fheight – father’s height (in.)
- mage35 – mother over 35 years of age (1 = yes, 0 = no)
- smoker – mother smoked during pregnancy (1 = yes, 0 = no)
We first fit a large model using most of the available covariates. Rather than use the mother’s number of cigarettes (mnocig), we are again using the smoking status indicator (smoker). The other infant size measurements, head circumference and length, have also not been included. These would actually be other responses we might wish to examine.
A summary of this model is found on the next page.
Summary of Preliminary Model
Backward Elimination
Removing one predictor at a time, using p < .10 as the criterion to retain a predictor, we arrive at the following model.
Looking specifically at the effect of the mother’s smoking, we find:
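The backward elimination procedure can be sketched in Python as follows. This is an illustration under several assumptions, not the JMP procedure itself: the data are simulated (not Lowbirthweight.JMP), the function names are ours, and p-values are computed with a large-sample normal approximation to the t distribution so that only NumPy and the standard library are needed. At each step the predictor with the largest p-value is dropped, until every remaining predictor has p < .10.

```python
import numpy as np
from statistics import NormalDist

def ols_p_values(X, y):
    """Fit OLS; return coefficients and two-sided p-values (normal approximation)."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)               # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the estimates
    t_stats = coef / np.sqrt(np.diag(cov))
    p_vals = np.array([2 * (1 - NormalDist().cdf(abs(t))) for t in t_stats])
    return coef, p_vals

def backward_eliminate(columns, y, alpha=0.10):
    """Drop the least significant predictor until all remaining have p < alpha."""
    kept = list(columns)
    while kept:
        X = np.column_stack([np.ones(len(y))] + [columns[name] for name in kept])
        _, p_vals = ols_p_values(X, y)
        p_pred = p_vals[1:]                        # skip the intercept
        worst = int(np.argmax(p_pred))
        if p_pred[worst] < alpha:
            break                                  # everything left is retained
        kept.pop(worst)
    return kept

# Simulated data (hypothetical): weight depends on gest and smoker, not on noise
rng = np.random.default_rng(0)
n = 200
cols = {
    "gest": rng.normal(39, 2, n),
    "smoker": rng.integers(0, 2, n).astype(float),
    "noise": rng.normal(0, 1, n),
}
y = -2 + 0.24 * cols["gest"] - 0.4 * cols["smoker"] + rng.normal(0, 0.5, n)

kept = backward_eliminate(cols, y)
print("retained predictors:", kept)
```

With a sample as small as a few dozen infants, the normal approximation would be too liberal, and exact t-based p-values from a statistics package should be used instead.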
Conclusion:
After adjusting for gestational age of the infant, mother’s height, mother’s pre-pregnancy weight, and father’s height, we estimate that women who smoke during pregnancy will have infants with a mean birth weight between .23 and .52 lbs. less than the mean birth weight of infants born to non-smokers.