Lecture & Examples

Topic 4: Checking the Model Assumptions with Residual Plots

The least-squares estimates of the unknown parameters (β0, β1, β2, …, βk, and σ²) in a multiple regression model are reliable only when the assumptions we made about the error term of the probabilistic model are reasonably satisfied. It is therefore very important to check these assumptions. Recall that the assumptions about the error term ε are:

·  E(ε) = 0: the mean of the random error term is zero

·  Var(ε) = σ²: the variance of the probability distribution of ε is σ², the same for every setting of the independent variables

·  The ε’s are independent

·  The ε’s have a normal distribution

In this lecture we introduce the residual plot as a tool for examining these assumptions. The actual random error associated with a particular value of y is the difference between the actual y value and its unknown mean, so we estimate the error by the difference between the actual y value and the estimated mean, y − ŷ. We call this difference, ε̂ = y − ŷ, the regression residual, or simply the residual.

The regression residuals have the following properties:

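·  A residual is equal to the difference between the observed y value and its estimated mean, i.e., Residual = y − ŷ.

·  The sum of all residuals is equal to zero, i.e., Σ(y − ŷ) = 0.

·  The standard deviation of the residuals is equal to the standard deviation of the fitted model, s. Thus, s = √[SSE / (n − (k + 1))], where SSE = Σ(y − ŷ)² is the sum of the squared residuals and n is the number of observations.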
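The properties above can be checked numerically. The following is a minimal Python sketch, not the SAS workflow used in this lecture. It assumes a straight-line model (k = 1) and a small made-up data set; the function names fit_line and residual_summary are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the straight-line model y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residual_summary(x, y, k=1):
    """Residuals y - y_hat, their sum of squares SSE, and s = sqrt(SSE / (n - (k + 1)))."""
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    s = math.sqrt(sse / (len(x) - (k + 1)))
    return residuals, sse, s

# Made-up illustrative data (assumption: not taken from the lecture's examples)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

residuals, sse, s = residual_summary(x, y)
print("residuals:", [round(r, 3) for r in residuals])
print("sum of residuals (zero up to rounding):", round(sum(residuals), 10))
print("SSE:", round(sse, 4), " s:", round(s, 4))
```

The sum of the residuals is zero here because the model includes an intercept term; the divisor n − (k + 1) counts the k + 1 estimated β parameters.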

We can use the following steps to perform a residual analysis.

1.  Produce a residual plot with a statistical software package, plotting the residuals against each independent variable in turn.

2.  To check for outliers, add two horizontal lines to the plot, one three standard deviations (3s) above zero and the other three standard deviations below zero. Residuals that fall outside these lines are suspected outliers.

3.  When the model assumptions hold, the residual plot should show no systematic pattern: the residuals should scatter randomly around the zero line. A systematic pattern is evidence that the model is not adequate and that we need to search for a better one.

4.  We can also use a box plot of the residuals to check their symmetry. Box plots also provide a better way to detect outliers.
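Steps 2 and 4 can be sketched without any plotting. The following Python sketch is an illustration only, not the SAS procedure used in this lecture. It assumes a made-up set of residuals from a model with k = 1 independent variable and uses the common 1.5 × IQR fences for the box plot (software packages differ in how they compute quartiles). The function names are hypothetical.

```python
import math
import statistics

def model_std(residuals, k):
    """s = sqrt(SSE / (n - (k + 1))) for a model with k independent variables."""
    sse = sum(r ** 2 for r in residuals)
    return math.sqrt(sse / (len(residuals) - (k + 1)))

def outside_3s(residuals, s):
    """Residuals lying above the +3s line or below the -3s line (step 2)."""
    return [r for r in residuals if abs(r) > 3 * s]

def box_plot_fences(residuals):
    """Quartiles and the 1.5*IQR fences a box plot commonly uses to mark outliers (step 4)."""
    q1, median, q3 = statistics.quantiles(residuals, n=4)
    iqr = q3 - q1
    return q1, median, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Made-up residuals that sum to zero (assumption: not from the lecture's examples)
residuals = [0.4, -0.6, 0.1, -0.8, 3.9, -0.3, -0.5, -0.7, -0.9, -0.6]

s = model_std(residuals, k=1)
q1, median, q3, lower, upper = box_plot_fences(residuals)
beyond_fences = [r for r in residuals if r < lower or r > upper]

print("s:", round(s, 3), " 3s lines at +/-", round(3 * s, 3))
print("outside the 3s lines:", outside_3s(residuals, s))
print("quartiles:", round(q1, 3), round(median, 3), round(q3, 3))
print("beyond the box-plot fences:", beyond_fences)
```

In this made-up sample the residual 3.9 stays inside the 3s lines, because a single large residual inflates s when n is small, yet it falls beyond the upper box-plot fence. The median lying well below zero also hints at the lack of symmetry that a box plot would show.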

We will use the SAS package and the following two examples to show you how to perform residual analysis.

Example 12.8: (Continuation of Example 12.1)

(a) Analyze the residual plot for variable x1 and determine whether visual evidence exists for any nonrandom pattern.

Solution: No, there is no visual evidence of a nonrandom pattern. The residuals appear to be randomly distributed around the 0 line, as expected. Also, no residual lies outside the 3-standard-deviation lines, so there are no suspected outliers.

(b) Analyze the residual plot for variable x2 and determine whether visual evidence exists for any nonrandom pattern.

Solution: No, there is no visual evidence of a nonrandom pattern. The residuals appear to be randomly distributed around the 0 line, as expected. Also, no residual lies outside the 3-standard-deviation lines, so there are no suspected outliers.

(c) Analyze the residual plot for variable x3 and determine whether visual evidence exists for any nonrandom pattern.

Solution: No, there is no visual evidence of a nonrandom pattern. The residuals appear to be randomly distributed around the 0 line, as expected. Also, no residual lies outside the 3-standard-deviation lines, so there are no suspected outliers.

(d) Analyze the residual plot for variable x4 and determine whether visual evidence exists for any nonrandom pattern.

Solution: No, there is no visual evidence of a nonrandom pattern. The residuals appear to be randomly distributed around the 0 line, as expected. Also, no residual lies outside the 3-standard-deviation lines, so there are no suspected outliers.

Example 12.9:

(a) Analyze the following residual plot.

Solution: The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved shape, with the residuals for the small values of x above the horizontal 0 line, the residuals corresponding to the middle values of x below the 0 line, and the residuals for the largest values of x again above the zero line.
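A curved pattern of this kind typically arises when a straight-line model is fitted to data whose true relationship is curved. The following Python sketch reproduces the effect under stated assumptions: the data are made up (y = x² with no random error), the model is a straight line, and the function name fit_line is hypothetical. It is not the data or the SAS output behind this example.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the straight-line model y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up data with a curved relationship (assumption: y = x^2, no random error)
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
residuals = [round(yi - (b0 + b1 * xi), 9) for xi, yi in zip(x, y)]

# Mark each residual as above (+), below (-), or on (0) the zero line
signs = "".join("+" if r > 0 else "-" if r < 0 else "0" for r in residuals)
print("residuals:", residuals)
print("signs by increasing x:", signs)
```

The signs run positive at the small x values, negative in the middle, and positive again at the largest x values, which is the same shape described in the solution.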

(b) Analyze the following residual plot.

Solution: The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a linear shape, with the residuals for the small values of x below the horizontal 0 line and the residuals for the large values of x above the zero line.

(c) Analyze the following residual plot.

Figure 12.7 Residual Plot of X2 for Example 12.9 (c)

Solution: This residual plot looks fine; no nonrandom pattern is evident.
