Chapter 5-10 Linear Regression Robustness to Assumptions
In ANOVA and linear regression, the following assumptions are made (van Belle et al., 2004, p.397); a brief Stata sketch for checking them appears just after the list:
1. Homogeneity of variance (In one-way ANOVA, each group has the same variance on the
outcome variable. In linear regression with a single continuous predictor, the variance around
the regression line is the same at every point along the X axis.)
2. Normality of the residual error (the distribution of differences between the actual and
predicted values has a normal distribution).
3. Statistical independence of the residual errors (the residuals have no discernible pattern).
4. Linearity of the model (the form Y = a + bX + error is a good representation of the data,
so the variability in Y can be partitioned into these separate terms).
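As a concrete illustration of checking these assumptions, here is a minimal Stata sketch, assuming a hypothetical outcome y and continuous predictor x already in memory (the variable names are ours, not from any dataset in this manual):

    * Fit Y = a + bX + error and examine the residuals
    regress y x
    predict resid, residuals        // residual errors
    rvfplot                         // residuals vs fitted values: judge
                                    //   equal variance (1) and linearity (4)
    histogram resid, normal         // judge normality of residuals (2)
    qnorm resid                     // normal quantile plot of residuals
    estat hettest                   // Breusch-Pagan test of constant variance
    * Independence of the residuals (3) is usually judged from the study
    * design (e.g., no repeated measurements on the same subject)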
If either or both of the assumptions of homogeneity of variance or normality of the residual error are violated, transformations of the data, such as taking logarithms, are frequently advocated. If the right transformation is selected, either or both assumptions are usually met on the transformed scale.
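For example, a log transformation takes only a couple of lines in Stata. This sketch (same hypothetical variables as above, and it assumes y is strictly positive) refits the model on the log scale and rechecks the residuals:

    * Log-transform the outcome and recheck the assumptions
    gen logy = ln(y)                // requires y > 0
    regress logy x
    predict resid_log, residuals
    histogram resid_log, normal     // recheck normality
    rvfplot                         // recheck constant variance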
Statisticians frequently make the comment that t tests, analysis of variance (ANOVA), and linear regression are robust to violations of the assumptions of homogeneity of variance (equal variance) and normality. These two assumptions are the focus of this chapter, which provides authoritative references to back up the robustness claim.
These robustness claims originally came from statistical papers on ANOVA. However, ANOVA is simply a special case of linear regression, as we saw in Chapter 5-4. Further, an independent-groups t test is just a one-way ANOVA comparing two means. So, the robustness described in this chapter applies to the t test, ANOVA, and linear regression.
______
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2010.
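This equivalence is easy to verify in Stata: for a hypothetical outcome y and grouping variable group, a one-way ANOVA and a regression on indicator variables give the identical overall F test.

    * One-way ANOVA and its regression equivalent
    oneway y group                  // ANOVA F test comparing group means
    regress y i.group               // overall model F is the same F test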
Homogeneity of Variance
Lorenzen and Anderson (1993, p.35) state,
“Historically, the assumption thought to be the most critical was the homogeneity of variance. However, Box (1954) demonstrated that the F test in ANOVA was most robust for α while working with a fixed model having equal sample sizes. He showed that for relatively large (one variance up to nine times larger than another) departures from homogeneity, the α level may only change from .05 to about .06. This is not considered to be of any practical importance. (It should be pointed out that the only time an α level increased dramatically was when the sample size was negatively correlated with the size of the variance.)”
To put this in simpler terms, statisticians concern themselves with ensuring that significance tests do not increase the type I error rate. Violations of assumptions, then, are of concern if they lead to rejection of a true null hypothesis more frequently than α, almost always set at 0.05. Box showed that you could have very large departures from homogeneity of variance without affecting the alpha level in any appreciable way.
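Box's result can be demonstrated with a small simulation. In this sketch, the sample sizes, the 9:1 variance ratio (SD 1 versus SD 3), and the seed are our own choices, made to mirror Box's equal-sample-size scenario:

    * Simulate the type I error of the t test under a true null
    * hypothesis when one group's variance is 9 times the other's
    capture program drop sim_hetero
    program define sim_hetero, rclass
        drop _all
        set obs 60
        gen group = _n > 30              // two groups of 30
        gen y = rnormal(0, 1 + 2*group)  // SD 1 in group 0, SD 3 in group 1
        ttest y, by(group)
        return scalar p = r(p)
    end
    simulate p=r(p), reps(1000) seed(12345): sim_hetero
    count if p < .05                     // expect roughly 50-60 per 1,000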
When the homogeneity of variance assumption is violated, as judged by a test such as Levene’s test of homogeneity, authors of statistics textbooks frequently advise transforming the outcome variable. Lorenzen and Anderson (1993, p.35) offer this advice,
“When there are large departures from homogeneity, it is felt that the data should be transformed to produce more meaningful results. However, one must take care in the interpretation of the results after transforming since transforming also changes the form of the mathematical model. To our knowledge, no one has come up with an α level on homogeneity tests that protects against too much heterogeneity. A set of working rules that seems to be effective for the practitioner is as follows:
1. If the homogeneity test is accepted at α = .01, do not transform.
2. If the homogeneity test is rejected at α = .001, transform.
3. If the result of the homogeneity test is between α = .01 and α = .001, try very hard to find out the theoretical distribution from the investigator. If there is a practical reason to transform and the transformed variable makes sense, go ahead and transform. Otherwise, we recommend not transforming.”
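In Stata, Levene’s test is available through the robvar command, so these working rules can be applied by reading its p-value (hypothetical variables y and group; W0 is the Levene statistic):

    * Levene's test of homogeneity of variance across groups
    robvar y, by(group)
    * Applying the working rules to the p-value for W0:
    *   p > .01          -> do not transform (rule 1)
    *   p < .001         -> transform (rule 2)
    *   .001 <= p <= .01 -> consult the investigator first (rule 3)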
In discussing the assumptions of classical hypothesis tests (t test, ANOVA, linear regression) van Belle (2002, p.10) states,
“The second condition for the validity of tests of hypotheses is that of homogeneity of variance. Box (1953) already showed that hypothesis tests are reasonably robust against heterogeneity of variance. For a two-sample test a three-fold difference in variances does not affect the probability of a Type I error. Tests of equality of variances are very sensitive to departures from the assumptions and usually don’t provide a good basis for proceeding with hypothesis tests. Box (1953) observed that,
...to make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!”
Normality
Lorenzen and Anderson (1993, p.41) state,
“Generally speaking, the F ratio used in the analysis of variance has been shown to be very robust to departures from normality, Eisenhart (1947).”
In discussing commonly used tests of means, Box (1953) states,
“...thanks to the work of Pearson (1931), Bartlett (1935), Geary (1947), Gayen (1950 a, b), David & Johnson (1951 a, b) there is abundant evidence that these comparative tests on means are remarkably insensitive to general* non-normality of the parent population.
____
*By ‘general’ parent non-normality is meant that the departure from normality, in particular skewness, is the same in the different groups, as could usually be assumed when the data were from an experiment in which the groups corresponded with different applied treatments to be compared. In tests in which sample means are compared, general skewness tends to be cancelled out; larger effects are found, however, if the skewness is in different directions in the different groups.
References
Pearson, E.S. (1931). Biometrika, 23, 114.
Bartlett, M.S. (1935). Proc. Camb. Phil. Soc. 31, 223.
Geary, R.C. (1947). Biometrika, 34, 209.
Gayen, A.K. (1950a). Biometrika, 37, 236.
Gayen, A.K. (1950b). Biometrika, 37, 399.”
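The footnote’s point, that skewness common to all groups tends to cancel when means are compared, can also be illustrated by simulation. In this sketch, the chi-squared error distribution, sample sizes, and seed are our own assumptions:

    * Simulate the t test under a true null hypothesis with the same
    * skewed (chi-squared, 3 df) distribution in both groups
    capture program drop sim_skew
    program define sim_skew, rclass
        drop _all
        set obs 60
        gen group = _n > 30
        gen y = rchi2(3)                 // identically skewed in both groups
        ttest y, by(group)
        return scalar p = r(p)
    end
    simulate p=r(p), reps(1000) seed(54321): sim_skew
    count if p < .05                     // expect close to 50 per 1,000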
In discussing and contrasting independence of observations, homogeneity of variance, and normality in t tests, ANOVA, and linear regression, van Belle (2002, p. 10) states,
“Normality is the least important in tests of hypotheses. It should be noted that the assumption of normality deals with the error term of the model, not the original data. This is frequently forgotten by researchers who plot histograms of the raw data rather than the residuals from the model....”
Note: van Belle points out that the assumption of normality actually concerns the residuals, not the variables themselves. However, if you use indicator variables to model the groups, the regression line goes directly through the group means, so the residuals have the same distributional shape as the outcome variable examined separately within each group.
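This point is easy to verify in Stata for hypothetical variables y and group: with only indicator variables in the model, each fitted value is a group mean, so the residuals are simply the group-mean-centered outcomes.

    * With indicator predictors, fitted values are the group means
    regress y i.group
    predict resid, residuals
    bysort group: summarize resid        // mean 0 within each group; the
                                         //   shape matches y within group
    histogram resid, normal              // assess normality of the
                                         //   residuals, not the raw outcome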
Just how far from normality can one go without creating a problem? It is not so simple to quantify a departure from normality, but a contrast to the homogeneity of variance assumption can be made. We saw above that for the homogeneity of variance assumption, the variance of one group could be three times the variance of another group without changing alpha (the type I error) at all, and could be nine times the variance and only change alpha from 0.05 to 0.06, a change of no importance. What might seem like appalling violations, then, are of no consequence due to the robustness of the hypothesis tests. In contrast, the hypothesis tests are even more robust to violations of the normality assumption. van Belle (2002, p.8) explains,
“Many investigators have studied the issue of the relative importance of the assumptions underlying hypothesis tests (see, Cochran, 1947; Gastwirth and Rubin, 1971; Glass et al., 1972; Lissitz and Chardos, 1975; Millard et al., 1985; Pettitt and Siskind, 1981; Praetz, 1981; Scheffé, 1959; and others). All studied the effects of correlated errors on classical parametric and nonparametric tests. In all of these studies, positive correlation resulted in an inflated Type I error level, and negative correlation resulted in a deflated Type I error level. The effects of correlation were more important than differences in variances between groups, and differences in variances were more important than the assumption of a normal distribution.”
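The dominant role of correlated errors is likewise easy to see by simulation. In this sketch (the cluster structure, sizes, and seed are our own assumptions), observations share a within-cluster effect, inducing a within-cluster correlation of 0.5 that the ordinary t test ignores:

    * Simulate positively correlated errors: observations share a
    * cluster effect, but the t test treats them as independent
    capture program drop sim_corr
    program define sim_corr, rclass
        drop _all
        set obs 10                       // 10 clusters
        gen cluster = _n
        gen u = rnormal(0, 1)            // shared cluster effect
        expand 10                        // 10 observations per cluster
        gen group = cluster <= 5
        gen y = u + rnormal(0, 1)
        ttest y, by(group)
        return scalar p = r(p)
    end
    simulate p=r(p), reps(1000) seed(98765): sim_corr
    count if p < .05                     // far more than 50 per 1,000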
What To Do With This Knowledge
Your model is probably fine in most cases if your data are reasonably symmetrical and skewness does not occur in opposite directions across your study groups.
There is no need to rush into transformations, which can be avoided in most cases.
Still, many journal reviewers are not aware of the robustness of linear regression to violations of normality and homogeneity of variance, since this is not taught in introductory statistics courses, so sometimes you get a reviewer response to your manuscript asking whether you tested the assumptions of normality and equal variances.
As a final note, even when aware that linear regression is robust to the equal variance and normality assumptions, statisticians will sometimes still go ahead and try a transformation when the assumptions are violated. This is because an increase in precision is frequently gained by the transformation, making it easier to achieve statistical significance. We will see an example of this in the “modeling costs” chapter, where a log transformation of the hospitalization cost variable shrinks the variance considerably.
References
Box, GEP. (1953). Non-normality and tests on variances. Biometrika 40:318-335.
Box, GEP. (1954). Some theorems on quadratic forms applied in the study of analysis of variance
problems, I. Effect of inequality of variance in the one-way classification. Annals of
Mathematical Statistics, 25: 290-302.
Cochran WG. (1947). Some consequences when the assumptions for the analysis of variance are
not satisfied. Biometrics 3:22-38.
Eisenhart C. (1947). The assumptions underlying the analysis of variance. Biometrics 3:1-21.
Gastwirth JL, Rubin H. (1971). Effect of dependence on the level of some one-sample tests.
Journal of the American Statistical Association. 66:816-820.
Glass GV, Peckham PD, Sanders JR. (1972). Consequences of failure to meet the assumptions
underlying the fixed effects analysis of variance and covariance. Review of Educational
Research. 42:237-288.
Lissitz RW, Chardos S. (1975). A study of the effect of the violation of the assumption of
independent sampling upon the type I error rate of the two group t-test. Educational and
Psychological Measurement. 35:353-359.
Lorenzen TJ, Anderson VL. (1993). Design of Experiments: a No-Name Approach. New York,
Marcel Dekker.
Millard SP, Yearsley JR, Lettenmaier DP. (1985). Space-time correlation and its effect on
methods for detecting aquatic ecological change. Canadian Journal of Fisheries and Aquatic Science. 42:1391-1400. Correction: (1986) 43:1680.
Nawata K, Sohmiya M, Kawaguchi M, et al. (2004). Increased resting metabolic rate in patients
with type 2 diabetes mellitus accompanied by advanced diabetic nephropathy.
Metabolism 53(11) Nov: 1395-1398.
Pettitt AN, Siskind V. (1981). Effect of within-sample dependence on the Mann-Whitney-
Wilcoxon statistic. Biometrika 68:437-441.
Praetz P. (1981). A note on the effect of autocorrelation on multiple regression statistics.
Australian Journal of Statistics. 23:309-313.
Scheffé H. (1959). The Analysis of Variance. New York, John Wiley and Sons.
van Belle G. (2002). Statistical Rules of Thumb. New York, John Wiley & Sons.
van Belle G, Fisher LD, Heagerty PJ, Lumley T. (2004). Biostatistics: A Methodology for the
Health Sciences, 2nd ed. Hoboken, NJ, John Wiley & Sons.