Chapter 13
Simple Regression Analysis
LEARNING OBJECTIVES
The overall objective of this chapter is to give you an understanding of bivariate regression and correlation analysis, thereby enabling you to:
1. Compute the equation of a simple regression line from a sample of data and interpret the slope and intercept of the equation.
2. Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.
3. Compute a standard error of the estimate and interpret its meaning.
4. Compute a coefficient of determination and interpret it.
5. Test hypotheses about the slope of the regression model and interpret the results.
6. Estimate values of y using the regression model.
CHAPTER OUTLINE
13.1 Introduction to Simple Regression Analysis
13.2 Determining the Equation of the Regression Line
13.3 Residual Analysis
Using Residuals to Test the Assumptions of the Regression Model
Using the Computer for Residual Analysis
13.4 Standard Error of the Estimate
13.5 Coefficient of Determination
Relationship between r and \(r^2\)
13.6 Hypothesis Tests for the Slope of the Regression Model and Testing the Overall Model
Testing the Slope
Testing the Overall Model
13.7 Estimation
Confidence Intervals to Estimate the Conditional Mean of y: \(\mu_{y|x}\)
Prediction Intervals to Estimate a Single Value of y
13.8 Interpreting the Output
KEY WORDS
coefficient of determination (\(r^2\))
confidence interval
dependent variable
deterministic model
heteroscedasticity
homoscedasticity
independent variable
least squares analysis
outliers
prediction interval
probabilistic model
regression analysis
residual
residual plot
scatter plot
simple regression
standard error of the estimate (\(s_e\))
sum of squares of error (SSE)
STUDY QUESTIONS
1. The process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable is ______.
2. Bivariate linear regression is often termed ______ regression.
3. In regression, the variable being predicted is usually referred to as the ______ variable.
4. In regression, the predictor is called the ______ variable.
5. The first step in simple regression analysis often is to graph or construct a ______.
6. In regression analysis, \(\beta_1\) represents the population ______.
7. In regression analysis, \(b_0\) represents the sample ______.
8. A researcher wants to develop a regression model to predict the price of gold by the prime interest rate. The dependent variable is ______.
9. In an effort to develop a regression model, the following data were gathered:
x: 2, 9, 11, 19, 21, 25
y: 26, 17, 18, 15, 15, 8
The slope of the regression line determined from these data is ______.
The y intercept is ______.
10. A researcher wants to develop a regression line from the data given below:
x: 12, 11, 5, 6, 9
y: 31, 25, 14, 12, 16
The equation of the regression line is ______.
11. In regression, the value of \(y - \hat{y}\) is called the ______.
12. Data points that lie apart from the rest of the points are called ______.
13. The regression assumption of constant error variance is called ______.
If the error variances are not constant, it is called ______.
14. Suppose the following data are used to determine the equation of the regression line given below:
x: 2, 5, 11, 24, 31
y: 12, 13, 16, 14, 19
\(\hat{y} = 12.224 + 0.1764x\)
The residual for x = 11 is ______.
15. The total of the residuals squared is called the ______.
16. The standard deviation of the error of the regression model is called the ______ and is denoted by ______.
17. Suppose a regression model is developed for ten pairs of data resulting in S.S.E. = 1,203. The standard error of the estimate is ______.
18. A regression analysis results in the following data:
\(\sum x = 276 \quad \sum x^2 = 12{,}014 \quad \sum xy = 2{,}438\)
\(\sum y = 77 \quad \sum y^2 = 1{,}183 \quad n = 7\)
The value of S.S.E. is ______.
19. The value of \(s_e\) computed from the data of question 18 is ______.
20. Suppose a regression model results in a value of \(s_e = 27.9\). Approximately 95% of the residuals should fall within ______.
21. The coefficient of determination is denoted by ______.
22. ______ is the proportion of variability of the dependent variable accounted for or explained by the independent variable.
23. The value of \(r^2\) always falls between ______ and ______, inclusive.
24. Suppose a regression analysis results in the following:
\(b_1 = .19364 \quad \sum y = 1{,}019\)
\(b_0 = 59.4798 \quad \sum y^2 = 134{,}451\)
\(n = 8 \quad \sum xy = 378{,}932\)
The value of \(r^2\) for this regression model is ______.
25. Suppose the data below are used to determine the equation of a regression line:
x: 18, 14, 9, 6, 2
y: 14, 25, 22, 23, 27
The value of \(r^2\) associated with this model is ______.
26. A researcher has developed a regression model from sixteen pairs of data points. He wants to test whether the slope is significantly different from zero. He uses a two-tailed test and \(\alpha = .01\). The critical table t value is ______.
27. The following data are used to develop a simple regression model:
x: 22, 20, 15, 15, 14, 9
y: 31, 20, 12, 9, 10, 6
The observed t value used to test the slope of this regression model is ______.
28. If \(\alpha = .05\) and a two-tailed test is being conducted, the critical table t value to test the slope of the model developed in question 27 is ______.
29. The decision reached about the slope of the model computed in question 27 is to ______the null hypothesis.
ANSWERS TO STUDY QUESTIONS
1. Regression
2. Simple
3. Dependent
4. Independent
5. Scatter Plot
6. Slope
7. y Intercept
8. Price of Gold
9. –0.626, 25.575
10. \(\hat{y} = -1.253 + 2.425x\)
11. Residual
12. Outliers
13. Homoscedasticity, Heteroscedasticity
14. 1.8356
15. Sum of Squares of Error
16. Standard Error of the Estimate, \(s_e\)
17. 12.263
18. 20.015
19. 2.00
20. \(0 \pm 55.8\)
21. \(r^2\)
22. Coefficient of Determination
23. 0, 1
24. .900
25. .578
26. 2.977
27. 4.72
28. \(\pm 2.776\)
29. Reject
SOLUTIONS TO ODD-NUMBERED PROBLEMS IN CHAPTER 13
13.1 x y
12 17
21 15
28 22
8 19
20 24
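\(\sum x = 89 \quad \sum y = 97 \quad \sum xy = 1{,}767\)
\(\sum x^2 = 1{,}833 \quad \sum y^2 = 1{,}935 \quad n = 5\)
\(b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{1{,}767 - \frac{(89)(97)}{5}}{1{,}833 - \frac{(89)^2}{5}} = \frac{40.4}{248.8} = 0.162\)
\(b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{97}{5} - 0.162\left(\frac{89}{5}\right) = 16.5\)
\(\hat{y} = 16.5 + 0.162x\)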
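The computational formulas for \(b_1\) and \(b_0\) translate directly into code. The following is a minimal Python sketch (standard library only; the function name `least_squares` is our own, not from the text) that recomputes the Problem 13.1 line from the raw data rather than from rounded sums.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    ss_xy = sum_xy - sum_x * sum_y / n   # SSxy
    ss_xx = sum_x2 - sum_x ** 2 / n      # SSxx
    b1 = ss_xy / ss_xx                   # slope
    b0 = sum_y / n - b1 * sum_x / n      # intercept
    return b0, b1


# Problem 13.1 data
x = [12, 21, 28, 8, 20]
y = [17, 15, 22, 19, 24]
b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.1f} + {b1:.3f}x")  # y-hat = 16.5 + 0.162x
```

Applied to the data of study question 9, `least_squares([2, 9, 11, 19, 21, 25], [26, 17, 18, 15, 15, 8])` should return an intercept of about 25.575 and a slope of about −0.626.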
13.3 Advertising (x) Sales (y)
12.5 148
3.7 55
21.6 338
60.0 994
37.6 541
6.1 89
16.8 126
41.2 379
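\(\sum x = 199.5 \quad \sum y = 2{,}670 \quad \sum xy = 107{,}610.4\)
\(\sum x^2 = 7{,}667.15 \quad \sum y^2 = 1{,}587{,}328 \quad n = 8\)
\(b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{107{,}610.4 - \frac{(199.5)(2{,}670)}{8}}{7{,}667.15 - \frac{(199.5)^2}{8}} = \frac{41{,}027.275}{2{,}692.11875} = 15.24\)
\(b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{2{,}670}{8} - b_1\left(\frac{199.5}{8}\right) = -46.29\)
\(\hat{y} = -46.29 + 15.24x\)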
13.5 Starts (x) Failures (y)
233,710 57,097
199,091 50,361
181,645 60,747
158,930 88,140
155,672 97,069
164,086 86,133
166,154 71,558
188,387 71,128
168,158 71,931
170,475 83,384
166,740 71,857
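\(\sum x = 1{,}953{,}048 \quad \sum y = 809{,}405 \quad \sum x^2 = 351{,}907{,}107{,}960\)
\(\sum y^2 = 61{,}566{,}568{,}203 \quad \sum xy = 141{,}238{,}520{,}688 \quad n = 11\)
\(b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{141{,}238{,}520{,}688 - \frac{(1{,}953{,}048)(809{,}405)}{11}}{351{,}907{,}107{,}960 - \frac{(1{,}953{,}048)^2}{11}} = \frac{-2{,}471{,}189{,}897.45}{5{,}143{,}790{,}659.64}\)
\(b_1 = -0.48042194\)
\(b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{809{,}405}{11} - b_1\left(\frac{1{,}953{,}048}{11}\right) = 158{,}881.1\)
\(\hat{y} = 158{,}881.1 - 0.48042194x\)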
13.7 Steel (x) New Orders (y)
99.9 2.74
97.9 2.87
98.9 2.93
87.9 2.87
92.9 2.98
97.9 3.09
100.6 3.36
104.9 3.61
105.3 3.75
108.6 3.95
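\(\sum x = 994.8 \quad \sum y = 32.15 \quad \sum x^2 = 99{,}293.28\)
\(\sum y^2 = 104.9815 \quad \sum xy = 3{,}216.652 \quad n = 10\)
\(b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{3{,}216.652 - \frac{(994.8)(32.15)}{10}}{99{,}293.28 - \frac{(994.8)^2}{10}} = \frac{18.37}{330.576} = 0.05557\)
\(b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{32.15}{10} - b_1\left(\frac{994.8}{10}\right) = -2.31307\)
\(\hat{y} = -2.31307 + 0.05557x\)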
13.9 x y Predicted (\(\hat{y}\)) Residuals (\(y - \hat{y}\))
12 17 18.4582 –1.4582
21 15 19.9196 –4.9196
28 22 21.0563 0.9437
8 19 17.8087 1.1913
20 24 19.7572 4.2428
\(\hat{y} = 16.5 + 0.162x\)
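The predicted values in this table were evidently computed with unrounded coefficients: plugging the rounded 16.5 and 0.162 into the equation gives 18.444 rather than 18.4582 at x = 12. A short Python sketch, assuming full-precision \(b_0\) and \(b_1\), regenerates the table.

```python
# Problem 13.1 data
x = [12, 21, 28, 8, 20]
y = [17, 15, 22, 19, 24]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Full-precision least-squares coefficients (deviation form)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]  # y - y-hat
for xi, yi, yh, e in zip(x, y, predicted, residuals):
    print(f"{xi:>4} {yi:>4} {yh:10.4f} {e:10.4f}")
```

Least-squares residuals always sum to zero (up to floating-point error), so summing the residual column is a quick check on a hand-computed table.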
13.11 x y Predicted (\(\hat{y}\)) Residuals (\(y - \hat{y}\))
12.5 148 144.2053 3.7947
3.7 55 10.0953 44.9047
21.6 338 282.8873 55.1127
60.0 994 868.0945 125.9055
37.6 541 526.7236 14.2764
6.1 89 46.6708 42.3292
16.8 126 209.7364 –83.7364
41.2 379 581.5868 –202.5868
\(\hat{y} = -46.29 + 15.24x\)
13.13 x y Predicted (\(\hat{y}\)) Residuals (\(y - \hat{y}\))
5 47 42.2756 4.7244
7 38 38.9836 –0.9836
11 32 32.3996 –0.3996
12 24 30.7537 –6.7537
19 22 19.2317 2.7683
25 10 9.3558 0.6442
\(\hat{y} = 50.5056 - 1.6460x\)
No apparent violation of assumptions
13.15
The error terms appear to be nonindependent.
13.17
There appears to be a nonlinear relationship between x and y.
13.19 \(SSE = \sum y^2 - b_0\sum y - b_1\sum xy = 1{,}935 - (16.51)(97) - 0.1624(1{,}767) = 46.5692\)
\(s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{46.5692}{3}} = 3.94\)
Approximately 68% of the residuals should fall within \(\pm 1s_e\).
Three out of five (60%) of the actual residuals in 13.1 fell within \(\pm 1s_e\).
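The sketch below applies the computational formula for SSE and then counts how many residuals fall within \(\pm 1s_e\). It assumes full-precision coefficients, so SSE comes out near 46.64 rather than the 46.5692 obtained above from the rounded values 16.51 and 0.1624; \(s_e\) still rounds to 3.94 and the count is still 3 of 5. The gap illustrates how sensitive this formula is to rounding \(b_0\) and \(b_1\).

```python
import math

# Problem 13.1 data
x = [12, 21, 28, 8, 20]
y = [17, 15, 22, 19, 24]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# Computational formula: SSE = sum(y^2) - b0*sum(y) - b1*sum(xy)
sse = sum_y2 - b0 * sum_y - b1 * sum_xy
se = math.sqrt(sse / (n - 2))  # standard error of the estimate, df = n - 2

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
within = sum(1 for e in residuals if abs(e) <= se)
print(f"SSE = {sse:.4f}, se = {se:.2f}, {within} of {n} residuals within ±1 se")
```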
13.21 \(SSE = \sum y^2 - b_0\sum y - b_1\sum xy = 1{,}587{,}328 - (-46.29)(2{,}670) - 15.24(107{,}610.4) = 70{,}940\)
\(s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{70{,}940}{6}} = 108.7\)
Six out of eight (75%) of the sales estimates are within $108.7 million.
13.23 \((y - \hat{y})\) \((y - \hat{y})^2\)
4.7244 22.3200
–0.9836 .9675
–0.3996 .1597
–6.7537 45.6125
2.7683 7.6635
0.6442 .4150
\(SSE = \sum (y - \hat{y})^2 = 77.1382\)
\(s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{77.1382}{4}} = 4.391\)
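As a direct check of the definition \(SSE = \sum (y - \hat{y})^2\), here is a short Python sketch that plugs the Problem 13.13 equation \(\hat{y} = 50.5056 - 1.6460x\) into the residual formula. Because it uses the four-decimal coefficients, the total can differ from 77.1382 in the last decimal place.

```python
import math

# Problem 13.13 data and the fitted line given in the text
x = [5, 7, 11, 12, 19, 25]
y = [47, 38, 32, 24, 22, 10]
b0, b1 = 50.5056, -1.6460

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)   # SSE = sum of (y - y-hat)^2
se = math.sqrt(sse / (len(x) - 2))     # df = n - 2
print(f"SSE = {sse:.4f}, se = {se:.3f}")
```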
13.25 Volume (x) Sales (y)
728.6 10.5
497.9 48.1
439.1 64.8
377.9 20.1
375.5 11.4
363.8 123.8
276.3 89.0
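\(n = 7 \quad \sum x = 3{,}059.1 \quad \sum y = 367.7\)
\(\sum x^2 = 1{,}464{,}071.97 \quad \sum y^2 = 30{,}404.31 \quad \sum xy = 141{,}558.6\)
\(b_1 = -.1504 \quad b_0 = 118.257\)
\(\hat{y} = 118.257 - .1504x\)
\(SSE = \sum y^2 - b_0\sum y - b_1\sum xy = 30{,}404.31 - (118.257)(367.7) - (-0.1504)(141{,}558.6) = 8{,}211.6245\)
\(s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{8{,}211.6245}{5}} = 40.5256\)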
This is a relatively large standard error of the estimate given the sales values
(ranging from 10.5 to 123.8).
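13.27 \(r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = .972\)
This is a high value of \(r^2\).
13.29 \(r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = .685\)
This value of \(r^2\) is modest: 68.5% of the variation of y is accounted for by x, but 31.5% is unaccounted for.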
13.31 CCI (y) Median Income (x)
116.8 37.415
91.5 36.770
68.5 35.501
61.6 35.047
65.9 34.700
90.6 34.942
100.0 35.887
104.6 36.306
125.4 37.005
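\(\sum x = 323.573 \quad \sum y = 824.9 \quad \sum x^2 = 11{,}640.93413\)
\(\sum y^2 = 79{,}718.79 \quad \sum xy = 29{,}804.4505 \quad n = 9\)
\(b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{29{,}804.4505 - \frac{(323.573)(824.9)}{9}}{11{,}640.93413 - \frac{(323.573)^2}{9}} = \frac{147.18742}{7.65787}\)
\(b_1 = 19.2204\)
\(b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{824.9}{9} - b_1\left(\frac{323.573}{9}\right) = -599.3674\)
\(\hat{y} = -599.3674 + 19.2204x\)
\(SSE = \sum y^2 - b_0\sum y - b_1\sum xy = 79{,}718.79 - (-599.3674)(824.9) - 19.2204(29{,}804.4505) = 1{,}283.13435\)
\(s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1{,}283.13435}{7}} = 13.539\)
\(r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{1{,}283.13435}{79{,}718.79 - \frac{(824.9)^2}{9}} = .688\)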
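A Python sketch (standard library only) that recomputes \(r^2\) for Problem 13.31 from the raw table and confirms that, in simple regression, \(r^2\) equals the square of the correlation coefficient r. It works with deviation sums rather than the computational formula, which avoids the rounding sensitivity of \(\sum y^2 - b_0\sum y - b_1\sum xy\).

```python
import math

# Problem 13.31: x = median income, y = CCI
income = [37.415, 36.770, 35.501, 35.047, 34.700, 34.942, 35.887, 36.306, 37.005]
cci = [116.8, 91.5, 68.5, 61.6, 65.9, 90.6, 100.0, 104.6, 125.4]
n = len(income)
x_bar, y_bar = sum(income) / n, sum(cci) / n

ss_xx = sum((x - x_bar) ** 2 for x in income)
ss_yy = sum((y - y_bar) ** 2 for y in cci)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, cci))

b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy                # equals the sum of squared residuals
r2 = 1 - sse / ss_yy                    # coefficient of determination
r = ss_xy / math.sqrt(ss_xx * ss_yy)    # Pearson correlation coefficient
print(f"r^2 = {r2:.3f}, r^2 from r = {r ** 2:.3f}, r = {r:.3f}")
```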
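13.33 \(s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = .068145\)
\(b_1 = -0.898\)
\(H_0: \beta_1 = 0 \quad \alpha = .01\)
\(H_a: \beta_1 \ne 0\)
Two-tailed test, \(\alpha/2 = .005\), df = n – 2 = 7 – 2 = 5
\(t_{.005,5} = \pm 4.032\)
\(t = \frac{b_1 - \beta_1}{s_b} = \frac{-0.898 - 0}{.068145} = -13.18\)
Since the observed \(t = -13.18 < t_{.005,5} = -4.032\), the decision is to reject the null hypothesis.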
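13.35 \(s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = .27963\)
\(b_1 = -0.715\)
\(H_0: \beta_1 = 0 \quad \alpha = .05\)
\(H_a: \beta_1 \ne 0\)
For a two-tailed test, \(\alpha/2 = .025\), df = n – 2 = 5 – 2 = 3
\(t_{.025,3} = \pm 3.182\)
\(t = \frac{b_1 - \beta_1}{s_b} = \frac{-0.715 - 0}{.27963} = -2.56\)
Since the observed \(t = -2.56 > t_{.025,3} = -3.182\), the decision is to fail to reject the null hypothesis.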
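A minimal sketch of the slope test as a Python function. It assumes that Problem 13.35 uses the data summarized in Problem 13.41 (n = 5, \(\sum x = 41\), \(\sum x^2 = 421\), \(s_e = 2.575\)), which reproduces \(s_b = .27963\). The critical value 3.182 is entered from the t table rather than computed, so only the standard library is needed; the function name `slope_t_test` is hypothetical.

```python
import math


def slope_t_test(b1, se, n, sum_x, sum_x2, t_crit, beta1_null=0.0):
    """Two-tailed t test of H0: beta1 = beta1_null in simple regression.

    t_crit is the table value t_{alpha/2, n-2}; returns (sb, t, reject_h0).
    """
    ss_xx = sum_x2 - sum_x ** 2 / n
    sb = se / math.sqrt(ss_xx)        # standard error of the slope
    t = (b1 - beta1_null) / sb
    return sb, t, abs(t) > t_crit


sb, t, reject = slope_t_test(b1=-0.715, se=2.575, n=5,
                             sum_x=41, sum_x2=421, t_crit=3.182)
print(f"sb = {sb:.5f}, t = {t:.2f}, reject H0: {reject}")
```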
13.37 F = 8.26 with a p-value of .021. The overall model is significant at \(\alpha = .05\) but not at \(\alpha = .01\). For simple regression,
\(t = \sqrt{F} = \sqrt{8.26} = 2.874\)
\(t_{.05,5} = 2.015\) but \(t_{.01,5} = 3.365\). The slope is significant at \(\alpha = .05\) but not at \(\alpha = .01\).
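Since the computer output for Problem 13.37 is not reproduced here, the following Python sketch illustrates the identity \(F = t^2\) with the Problem 13.43 data instead, purely as a convenient example. It builds F = MSR/MSE from the sums of squares, computes \(t = b_1/s_b\) separately, and compares the two.

```python
import math

# Problem 13.43 data (used only to illustrate F = t^2)
x = [53, 47, 41, 50, 58, 62, 45, 60]
y = [5, 5, 7, 4, 10, 12, 3, 11]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((a - x_bar) ** 2 for a in x)
ss_yy = sum((b - y_bar) ** 2 for b in y)
ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy          # error sum of squares
ssr = ss_yy - sse                 # regression sum of squares
mse = sse / (n - 2)

f_stat = (ssr / 1) / mse          # regression df = 1 in simple regression
t_stat = b1 / math.sqrt(mse / ss_xx)  # t = b1 / sb
print(f"F = {f_stat:.4f}, t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

Algebraically, \(t^2 = b_1^2 SS_{xx}/MSE = SSR/MSE = F\), so the two tests of the slope always agree in simple regression.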
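13.39 \(x_0 = 100\). For 90% confidence, \(\alpha/2 = .05\)
df = n – 2 = 7 – 2 = 5, \(t_{.05,5} = \pm 2.015\)
\(\bar{x} = \frac{\sum x}{n} = \frac{571}{7} = 81.57143\)
\(\sum x = 571 \quad \sum x^2 = 58{,}293 \quad s_e = 7.377\)
\(\hat{y} = 144.414 - .898(100) = 54.614\)
\(\hat{y} \pm t_{\alpha/2,n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} = 54.614 \pm 2.015(7.377)\sqrt{1 + \frac{1}{7} + \frac{(100 - 81.57143)^2}{58{,}293 - \frac{(571)^2}{7}}} = 54.614 \pm 2.015(7.377)(1.08252) = 54.614 \pm 16.091\)
\(38.523 \le y \le 70.705\)
For \(x_0 = 130\), \(\hat{y} = 144.414 - .898(130) = 27.674\)
\(\hat{y} \pm t_{\alpha/2,n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} = 27.674 \pm 2.015(7.377)\sqrt{1 + \frac{1}{7} + \frac{(130 - 81.57143)^2}{58{,}293 - \frac{(571)^2}{7}}} = 27.674 \pm 2.015(7.377)(1.1589) = 27.674 \pm 17.227\)
\(10.447 \le y \le 44.901\)
The prediction interval for \(x_0 = 130\) is wider than the interval for \(x_0 = 100\) because \(x_0 = 100\) is nearer to \(\bar{x} = 81.57\) than is \(x_0 = 130\).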
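A Python sketch of the prediction-interval formula, using the summary values listed for Problem 13.39 and a slope of −.898 (the value that reproduces \(\hat{y} = 54.614\) at \(x_0 = 100\)). The function name `prediction_interval` is hypothetical and the t value is entered from the table. Running it for both \(x_0\) values makes the widening away from \(\bar{x}\) visible.

```python
import math


def prediction_interval(x0, b0, b1, se, n, sum_x, sum_x2, t_crit):
    """Interval for a single value of y at x0 (note the leading 1 under the root)."""
    x_bar = sum_x / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    y_hat = b0 + b1 * x0
    half_width = t_crit * se * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


# Summary values from Problem 13.39 (90% level, t_{.05,5} = 2.015 from the table)
for x0 in (100, 130):
    low, high = prediction_interval(x0, b0=144.414, b1=-0.898, se=7.377,
                                    n=7, sum_x=571, sum_x2=58293, t_crit=2.015)
    print(f"x0 = {x0}: {low:.3f} <= y <= {high:.3f}")
```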
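13.41 \(x_0 = 10\). For 99% confidence, \(\alpha/2 = .005\)
df = n – 2 = 5 – 2 = 3, \(t_{.005,3} = 5.841\)
\(\bar{x} = \frac{\sum x}{n} = \frac{41}{5} = 8.20\)
\(\sum x = 41 \quad \sum x^2 = 421 \quad s_e = 2.575\)
\(\hat{y} = 15.46 - 0.715(10) = 8.31\)
\(\hat{y} \pm t_{\alpha/2,n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} = 8.31 \pm 5.841(2.575)\sqrt{\frac{1}{5} + \frac{(10 - 8.20)^2}{421 - \frac{(41)^2}{5}}} = 8.31 \pm 5.841(2.575)(.488065) = 8.31 \pm 7.34\)
\(0.97 \le E(y_{10}) \le 15.65\)
If the prime interest rate is 10%, we are 99% confident that the average bond rate is between 0.97% and 15.65%.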
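The confidence interval for the conditional mean differs from the prediction interval only by the missing 1 under the square root. A minimal Python sketch for Problem 13.41 (the function name `mean_confidence_interval` is hypothetical; the t value 5.841 comes from the table):

```python
import math


def mean_confidence_interval(x0, b0, b1, se, n, sum_x, sum_x2, t_crit):
    """Confidence interval for the conditional mean E(y | x0)."""
    x_bar = sum_x / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    y_hat = b0 + b1 * x0
    half_width = t_crit * se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


# Problem 13.41: 99% level, t_{.005,3} = 5.841 from the table
low, high = mean_confidence_interval(10, b0=15.46, b1=-0.715, se=2.575,
                                     n=5, sum_x=41, sum_x2=421, t_crit=5.841)
print(f"{low:.2f} <= E(y10) <= {high:.2f}")  # 0.97 <= E(y10) <= 15.65
```

Because the prediction interval adds 1 inside the square root, an interval for the mean of y is always narrower than the prediction interval for a single y at the same \(x_0\).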
13.43 x y
53 5
47 5
41 7
50 4
58 10
62 12
45 3
60 11
\(\sum x = 416 \quad \sum x^2 = 22{,}032\)
\(\sum y = 57 \quad \sum y^2 = 489 \quad b_1 = 0.355\)
\(\sum xy = 3{,}106 \quad n = 8 \quad b_0 = -11.335\)
a) \(\hat{y} = -11.335 + 0.355x\)
b) \(\hat{y}\) (Predicted Values) \((y - \hat{y})\) Residuals
7.48 –2.48
5.35 –0.35
3.22 3.78
6.415 –2.415
9.255 0.745
10.675 1.325
4.64 –1.64
9.965 1.035
c) \((y - \hat{y})^2\)
6.1504
.1225
14.2884
5.8322
.5550
1.7556
2.6896
1.0712
SSE = 32.4649
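d) \(s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{32.4649}{6}} = 2.3261\)
e) \(r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{32.4649}{489 - \frac{(57)^2}{8}} = .608\)
f) \(H_0: \beta_1 = 0 \quad \alpha = .05\)
\(H_a: \beta_1 \ne 0\)
Two-tailed test, \(\alpha/2 = .025\), df = n – 2 = 8 – 2 = 6
\(t_{.025,6} = \pm 2.447\)
\(s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = \frac{2.3261}{\sqrt{22{,}032 - \frac{(416)^2}{8}}} = 0.116305\)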