Chapter 13

Simple Regression Analysis

LEARNING OBJECTIVES

The overall objective of this chapter is to give you an understanding of bivariate regression and correlation analysis, thereby enabling you to:

1. Compute the equation of a simple regression line from a sample of data and interpret the slope and intercept of the equation.

2. Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.

3. Compute a standard error of the estimate and interpret its meaning.

4. Compute a coefficient of determination and interpret it.

5. Test hypotheses about the slope of the regression model and interpret the results.

6. Estimate values of y using the regression model.

CHAPTER OUTLINE

13.1 Introduction to Simple Regression Analysis

13.2 Determining the Equation of the Regression Line

13.3 Residual Analysis

Using Residuals to Test the Assumptions of the Regression Model

Using the Computer for Residual Analysis

13.4 Standard Error of the Estimate

13.5 Coefficient of Determination

Relationship between r and r²

13.6 Hypothesis Tests for the Slope of the Regression Model and Testing the Overall Model

Testing the Slope

Testing the Overall Model

13.7 Estimation

Confidence Intervals to Estimate the Conditional Mean of y: μy|x

Prediction Intervals to Estimate a Single Value of y

13.8 Interpreting the Output

KEY WORDS

coefficient of determination (r²)

confidence interval

dependent variable

deterministic model

heteroscedasticity

homoscedasticity

independent variable

least squares analysis

outliers

prediction interval

probabilistic model

regression analysis

residual

residual plot

scatter plot

simple regression

standard error of the estimate (se)

sum of squares of error (SSE)

STUDY QUESTIONS

1. The process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable is ______.

2. Bivariate linear regression is often termed ______ regression.

3. In regression, the variable being predicted is usually referred to as the ______ variable.

4. In regression, the predictor is called the ______ variable.

5. The first step in simple regression analysis often is to graph or construct a ______.

6. In regression analysis, β1 represents the population ______.

7. In regression analysis, b0 represents the sample ______.

8. A researcher wants to develop a regression model to predict the price of gold by the prime interest rate. The dependent variable is ______.

9. In an effort to develop a regression model, the following data were gathered:

x: 2, 9, 11, 19, 21, 25

y: 26, 17, 18, 15, 15, 8

The slope of the regression line determined from these data is ______.

The y intercept is ______.

10. A researcher wants to develop a regression line from the data given below:

x: 12, 11, 5, 6, 9

y: 31, 25, 14, 12, 16

The equation of the regression line is ______.

11. In regression, the value of y – ŷ is called the ______.

12. Data points that lie apart from the rest of the points are called ______.

13. The regression assumption of constant error variance is called ______.

If the error variances are not constant, it is called ______.

14. Suppose the following data are used to determine the equation of the regression line given below:

x: 2, 5, 11, 24, 31

y: 12, 13, 16, 14, 19

ŷ = 12.224 + 0.1764 x

The residual for x = 11 is ______.

15. The total of the residuals squared is called the ______.

16. The standard deviation of the error of the regression model is called the ______ and is denoted by ______.

17. Suppose a regression model is developed for ten pairs of data resulting in S.S.E. = 1,203. The standard error of the estimate is ______.

18. A regression analysis results in the following data:

Σx = 276 Σx² = 12,014 Σxy = 2,438

Σy = 77 Σy² = 1,183 n = 7

The value of S.S.E. is ______.

19. The value of se computed from the data of question 18 is ______.

20. Suppose a regression model results in a value of se = 27.9. 95% of the residuals should fall within ______.

21. The coefficient of determination is denoted by ______.

22. ______ is the proportion of variability of the dependent variable accounted for or explained by the independent variable.

23. The value of r² always falls between ______ and ______ inclusive.

24. Suppose a regression analysis results in the following:

b1 = .19364 Σy = 1,019

b0 = 59.4798 Σy² = 134,451

n = 8 Σxy = 378,932

The value of r² for this regression model is ______.

25. Suppose the data below are used to determine the equation of a regression line:

x: 18, 14, 9, 6, 2

y: 14, 25, 22, 23, 27

The value of r² associated with this model is ______.

26. A researcher has developed a regression model from sixteen pairs of data points. He wants to test to determine if the slope is significantly different from zero. He uses a two-tailed test and α = .01. The critical table t value is ______.

27. The following data are used to develop a simple regression model:

x: 22, 20, 15, 15, 14, 9

y: 31, 20, 12, 9, 10, 6

The observed t value used to test the slope of this regression model is ______.

28. If α = .05 and a two-tailed test is being conducted, the critical table t value to test the slope of the model developed in question 27 is ______.

29. The decision reached about the slope of the model computed in question 27 is to ______ the null hypothesis.

ANSWERS TO STUDY QUESTIONS

1. Regression

2. Simple

3. Dependent

4. Independent

5. Scatter Plot

6. Slope

7. y Intercept

8. Price of Gold

9. –0.626, 25.575

10. ŷ = –1.253 + 2.425 x

11. Residual

12. Outliers

13. Homoscedasticity, Heteroscedasticity

14. 1.8356

15. Sum of Squares of Error

16. Standard Error of the Estimate, se

17. 12.263

18. 20.015

19. 2.00

20. 0 ± 55.8

21. r²

22. Coefficient of Determination

23. 0, 1

24. .900

25. .578

26. 2.977

27. 4.72

28. ±2.776

29. Reject

SOLUTIONS TO ODD-NUMBERED PROBLEMS IN CHAPTER 13

13.1 x y

12 17

21 15

28 22

8 19

20 24

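Σx = 89 Σy = 97 Σxy = 1,767

Σx² = 1,833 Σy² = 1,935 n = 5

b1 = [Σxy – (Σx)(Σy)/n] / [Σx² – (Σx)²/n] = [1,767 – (89)(97)/5] / [1,833 – (89)²/5] = 0.162

b0 = Σy/n – b1(Σx/n) = 97/5 – b1(89/5) = 16.5

ŷ = 16.5 + 0.162 x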
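As an illustration only, the computation in 13.1 can be sketched in Python. The function name fit_line_from_sums and its argument names are assumptions introduced here, not part of the text. The sketch applies the computational formulas for the slope and intercept to the column sums of the five data pairs.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (b0, b1) of the least squares line from the usual column sums."""
    ss_xy = sum_xy - sum_x * sum_y / n   # sum(xy) - sum(x)*sum(y)/n
    ss_xx = sum_x2 - sum_x ** 2 / n      # sum(x^2) - (sum(x))^2/n
    b1 = ss_xy / ss_xx                   # slope
    b0 = sum_y / n - b1 * sum_x / n      # intercept: mean(y) - b1*mean(x)
    return b0, b1


# Data of problem 13.1
x = [12, 21, 28, 8, 20]
y = [17, 15, 22, 19, 24]

b0, b1 = fit_line_from_sums(
    n=len(x),
    sum_x=sum(x),
    sum_y=sum(y),
    sum_xy=sum(xi * yi for xi, yi in zip(x, y)),
    sum_x2=sum(xi * xi for xi in x),
)
print(f"y-hat = {b0:.2f} + {b1:.4f} x")
```

Carrying one more digit than the rounded equation gives b0 ≈ 16.51 and b1 ≈ 0.1624, the values substituted in the SSE computation of 13.19.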

13.3 (Advertising) x (Sales) y

12.5 148

3.7 55

21.6 338

60.0 994

37.6 541

6.1 89

16.8 126

41.2 379

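Σx = 199.5 Σy = 2,670 Σxy = 107,610.4

Σx² = 7,667.15 Σy² = 1,587,328 n = 8

b1 = [Σxy – (Σx)(Σy)/n] / [Σx² – (Σx)²/n] = [107,610.4 – (199.5)(2,670)/8] / [7,667.15 – (199.5)²/8] = 15.24

b0 = Σy/n – b1(Σx/n) = 2,670/8 – b1(199.5/8) = –46.29

ŷ = –46.29 + 15.24 x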

13.5 Starts (x) Failures (y)

233,710 57,097

199,091 50,361

181,645 60,747

158,930 88,140

155,672 97,069

164,086 86,133

166,154 71,558

188,387 71,128

168,158 71,931

170,475 83,384

166,740 71,857

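Σx = 1,953,048 Σy = 809,405 Σx² = 351,907,107,960

Σy² = 61,566,568,203 Σxy = 141,238,520,688 n = 11

b1 = [Σxy – (Σx)(Σy)/n] / [Σx² – (Σx)²/n] = [141,238,520,688 – (1,953,048)(809,405)/11] / [351,907,107,960 – (1,953,048)²/11]

b1 = –0.48042194

b0 = Σy/n – b1(Σx/n) = 809,405/11 – (–0.48042194)(1,953,048/11) = 158,881.1

ŷ = 158,881.1 – 0.48042194 x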
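Because the sums in 13.5 run to twelve digits, the numerator and denominator of the slope formula are each small differences of very large numbers. The sketch below is one possible way to check the arithmetic without intermediate rounding. It uses Python's standard fractions module and an algebraically equivalent form of the slope formula in which numerator and denominator are both multiplied by n. The variable names are assumptions made for this illustration.

```python
from fractions import Fraction

# Column sums reported in 13.5 (x = starts, y = failures)
n = 11
sum_x, sum_y = 1_953_048, 809_405
sum_x2 = 351_907_107_960
sum_xy = 141_238_520_688

# Multiplying numerator and denominator of the slope formula by n keeps
# every intermediate value an exact integer.
b1 = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
b0 = Fraction(sum_y, n) - b1 * Fraction(sum_x, n)

print(f"b1 = {float(b1):.8f}")
print(f"b0 = {float(b0):.1f}")
```

Ordinary floating-point arithmetic is also adequate for sums of this size. The exact fractions simply remove any doubt about where rounding enters.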

13.7 Steel (x) New Orders (y)

99.9 2.74

97.9 2.87

98.9 2.93

87.9 2.87

92.9 2.98

97.9 3.09

100.6 3.36

104.9 3.61

105.3 3.75

108.6 3.95

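Σx = 994.8 Σy = 32.15 Σx² = 99,293.28

Σy² = 104.9815 Σxy = 3,216.652 n = 10

b1 = [Σxy – (Σx)(Σy)/n] / [Σx² – (Σx)²/n] = [3,216.652 – (994.8)(32.15)/10] / [99,293.28 – (994.8)²/10] = 0.05557

b0 = Σy/n – b1(Σx/n) = 32.15/10 – b1(994.8/10) = –2.31307

ŷ = –2.31307 + 0.05557 x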

13.9 x y Predicted (ŷ) Residuals (y – ŷ)

12 17 18.4582 –1.4582

21 15 19.9196 –4.9196

28 22 21.0563 0.9437

8 19 17.8087 1.1913

20 24 19.7572 4.2428

ŷ = 16.5 + 0.162 x
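The predicted values and residuals in 13.9 can be reproduced with a short sketch such as the following. It assumes the coefficients are carried at full precision rather than rounded to 16.5 and 0.162, which is what the four-decimal table entries imply. The variable names are introduced here for illustration only.

```python
# Data of problem 13.1, reused in 13.9
x = [12, 21, 28, 8, 20]
y = [17, 15, 22, 19, 24]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares coefficients from deviations about the means
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]

for xi, yi, yh, e in zip(x, y, predicted, residuals):
    print(f"{xi:>3} {yi:>3} {yh:9.4f} {e:9.4f}")

# For a least squares line the residuals sum to zero (up to rounding error).
print(f"sum of residuals = {sum(residuals):.6f}")
```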

13.11 x y Predicted (ŷ) Residuals (y – ŷ)

12.5 148 144.2053 3.7947

3.7 55 10.0953 44.9047

21.6 338 282.8873 55.1127

60.0 994 868.0945 125.9055

37.6 541 526.7236 14.2764

6.1 89 46.6708 42.3292

16.8 126 209.7364 –83.7364

41.2 379 581.5868 –202.5868

ŷ = –46.29 + 15.24 x

13.13 x y Predicted (ŷ) Residuals (y – ŷ)

5 47 42.2756 4.7244

7 38 38.9836 –0.9836

11 32 32.3996 –0.3996

12 24 30.7537 –6.7537

19 22 19.2317 2.7683

25 10 9.3558 0.6442

ŷ = 50.5056 – 1.6460 x

No apparent violation of assumptions

13.15

The error terms appear to be nonindependent.

13.17

The relationship between x and y appears to be nonlinear.

13.19 SSE = Σy² – b0Σy – b1Σxy = 1,935 – (16.51)(97) – 0.1624(1,767) = 46.5692

se = √(SSE/(n – 2)) = √(46.5692/3) = 3.94

Approximately 68% of the residuals should fall within ±1se.

3 out of 5, or 60%, of the actual residuals in 13.1 fell within ±1se.
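As a check on 13.19, the sketch below computes SSE two ways for the data of 13.1: directly as the sum of squared residuals, and with the shortcut formula Σy² – b0Σy – b1Σxy. It is an illustration under the assumption that the coefficients are kept unrounded. In that case both routes give about 46.64, slightly different from the 46.5692 obtained above with the rounded values 16.51 and 0.1624, and se still rounds to 3.94. The variable names are hypothetical.

```python
import math

# Data of problem 13.1
x = [12, 21, 28, 8, 20]
y = [17, 15, 22, 19, 24]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# SSE as the sum of squared residuals
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# SSE from the shortcut formula (sensitive to rounding of b0 and b1)
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy

# Standard error of the estimate
se = math.sqrt(sse_direct / (n - 2))

# Number of residuals lying within one standard error of zero
within_one_se = sum(1 for xi, yi in zip(x, y) if abs(yi - (b0 + b1 * xi)) <= se)

print(f"SSE (direct)   = {sse_direct:.4f}")
print(f"SSE (shortcut) = {sse_shortcut:.4f}")
print(f"se = {se:.2f}, residuals within 1 se: {within_one_se} of {n}")
```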

13.21 SSE = Σy² – b0Σy – b1Σxy = 1,587,328 – (–46.29)(2,670) – 15.24(107,610.4) = 70,940

se = √(SSE/(n – 2)) = √(70,940/6) = 108.7

Six of the eight sales estimates (75%) are within $108.7 million (±1se) of the actual values.

13.23 (y – ŷ) (y – ŷ)²

4.7244 22.3200

–0.9836 .9675

–0.3996 .1597

–6.7537 45.6125

2.7683 7.6635

0.6442 .4150

Σ(y – ŷ)² = 77.1382

SSE = Σ(y – ŷ)² = 77.1382

se = √(SSE/(n – 2)) = √(77.1382/4) = 4.391

13.25 Volume (x) Sales (y)

728.6 10.5

497.9 48.1

439.1 64.8

377.9 20.1

375.5 11.4

363.8 123.8

276.3 89.0

n = 7 Σx = 3,059.1 Σy = 367.7

Σx² = 1,464,071.97 Σy² = 30,404.31 Σxy = 141,558.6

b1 = –.1504 b0 = 118.257

ŷ = 118.257 – .1504 x

SSE = Σy² – b0Σy – b1Σxy

= 30,404.31 – (118.257)(367.7) – (–0.1504)(141,558.6) = 8,211.6245

se = √(SSE/(n – 2)) = √(8,211.6245/5) = 40.5256

This is a relatively large standard error of the estimate given the sales values (ranging from 10.5 to 123.8).

13.27 r² = 1 – SSE / [Σy² – (Σy)²/n] = .972

This is a high value of r².

13.29 r² = 1 – SSE / [Σy² – (Σy)²/n] = .685

This is a modest value of r².

68.5% of the variation of y is accounted for by x, but 31.5% is unaccounted for.

13.31 CCI (y) Median Income (x)

116.8 37.415

91.5 36.770

68.5 35.501

61.6 35.047

65.9 34.700

90.6 34.942

100.0 35.887

104.6 36.306

125.4 37.005

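Σx = 323.573 Σy = 824.9 Σx² = 11,640.93413

Σy² = 79,718.79 Σxy = 29,804.4505 n = 9

b1 = [Σxy – (Σx)(Σy)/n] / [Σx² – (Σx)²/n] = [29,804.4505 – (323.573)(824.9)/9] / [11,640.93413 – (323.573)²/9]

b1 = 19.2204

b0 = Σy/n – b1(Σx/n) = 824.9/9 – b1(323.573/9) = –599.3674

ŷ = –599.3674 + 19.2204 x

SSE = Σy² – b0Σy – b1Σxy =

79,718.79 – (–599.3674)(824.9) – 19.2204(29,804.4505) = 1,283.13435

se = √(SSE/(n – 2)) = √(1,283.13435/7) = 13.539

r² = 1 – SSE / [Σy² – (Σy)²/n] = 1 – 1,283.13435 / [79,718.79 – (824.9)²/9] = .688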
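The coefficient of determination in 13.31 can be checked from the raw data. The sketch below is a minimal illustration that treats median income as x and CCI as y, which is the assignment implied by the reported sums (Σx = 323.573, Σy = 824.9). It computes r² both as 1 – SSE/SSyy and from the sums of squares directly, and recovers r by giving √r² the sign of the slope. The names ss_xx, ss_yy and ss_xy are shorthand introduced here.

```python
import math

# Problem 13.31: x = median income, y = CCI
income = [37.415, 36.770, 35.501, 35.047, 34.700, 34.942, 35.887, 36.306, 37.005]
cci = [116.8, 91.5, 68.5, 61.6, 65.9, 90.6, 100.0, 104.6, 125.4]
n = len(income)

x_bar = sum(income) / n
y_bar = sum(cci) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in income)
ss_yy = sum((yi - y_bar) ** 2 for yi in cci)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(income, cci))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(income, cci))

r2_from_sse = 1 - sse / ss_yy                 # proportion of variation explained
r2_from_ss = ss_xy ** 2 / (ss_xx * ss_yy)     # equivalent form
r = math.copysign(math.sqrt(r2_from_sse), b1) # r takes the sign of the slope

print(f"b1 = {b1:.4f}, r^2 = {r2_from_sse:.3f}, r = {r:.3f}")
```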

13.33 sb = se / √[Σx² – (Σx)²/n] = .068145

b1 = –0.898

H0: β1 = 0 α = .01

Ha: β1 ≠ 0

Two-tailed test, α/2 = .005 df = n – 2 = 7 – 2 = 5

t.005,5 = ±4.032

t = (b1 – β1)/sb = (–0.898 – 0)/.068145 = –13.18

Since the observed t = –13.18 < t.005,5 = –4.032, the decision is to reject the null hypothesis.

13.35 sb = se / √[Σx² – (Σx)²/n] = .27963

b1 = –0.715

H0: β1 = 0 α = .05

Ha: β1 ≠ 0

For a two-tailed test, α/2 = .025 df = n – 2 = 5 – 2 = 3

t.025,3 = ±3.182

t = (b1 – β1)/sb = (–0.715 – 0)/.27963 = –2.56

Since the observed t = –2.56 > t.025,3 = –3.182, the decision is to fail to reject the null hypothesis.
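The decision rule used in 13.33 and 13.35 can be written as a small function. This is only a sketch: the function slope_t_test and its parameter names are assumptions made for illustration, and the critical values are the table values quoted above rather than values computed by the code.

```python
def slope_t_test(b1, s_b, t_critical, beta1_null=0.0):
    """Two-tailed t test of H0: beta1 = beta1_null.

    Returns the observed t and whether H0 is rejected, given the
    standard error of the slope (s_b) and a table critical value.
    """
    t_observed = (b1 - beta1_null) / s_b
    reject = abs(t_observed) > t_critical
    return t_observed, reject


# 13.33: b1 = -0.898, sb = .068145, t(.005, 5) = 4.032
t_33, reject_33 = slope_t_test(b1=-0.898, s_b=0.068145, t_critical=4.032)
# 13.35: b1 = -0.715, sb = .27963, t(.025, 3) = 3.182
t_35, reject_35 = slope_t_test(b1=-0.715, s_b=0.27963, t_critical=3.182)

print(f"13.33: t = {t_33:.2f}, reject H0: {reject_33}")
print(f"13.35: t = {t_35:.2f}, reject H0: {reject_35}")
```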

13.37 F = 8.26 with a p-value of .021. The overall model is significant at α = .05 but not at α = .01. For simple regression,

t = √F = 2.8674

t.05,5 = 2.015 but t.01,5 = 3.365. The slope is significant at α = .05 but not at α = .01.

13.39 x0 = 100 For 90% confidence, α/2 = .05

df = n – 2 = 7 – 2 = 5 t.05,5 = ±2.015

x̄ = Σx/n = 571/7 = 81.57143

Σx = 571 Σx² = 58,293 se = 7.377

ŷ = 144.414 – 0.898(100) = 54.614

ŷ ± tα/2,n–2 se √[1 + 1/n + (x0 – x̄)² / (Σx² – (Σx)²/n)] =

54.614 ± 2.015(7.377) √[1 + 1/7 + (100 – 81.57143)² / (58,293 – (571)²/7)] =

54.614 ± 2.015(7.377)(1.08252) = 54.614 ± 16.091

38.523 ≤ y ≤ 70.705

For x0 = 130, ŷ = 144.414 – 0.898(130) = 27.674

ŷ ± tα/2,n–2 se √[1 + 1/n + (x0 – x̄)² / (Σx² – (Σx)²/n)] =

27.674 ± 2.015(7.377) √[1 + 1/7 + (130 – 81.57143)² / (58,293 – (571)²/7)] =

27.674 ± 2.015(7.377)(1.1589) = 27.674 ± 17.227

10.447 ≤ y ≤ 44.901

The interval for y at x0 = 130 is wider than the interval for y at x0 = 100 because x0 = 100 is nearer to x̄ = 81.57 than is x0 = 130.
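The effect of the distance between x0 and x̄ on the interval width in 13.39 can be seen numerically. The sketch below is an illustration that assumes the slope b1 = –0.898 from 13.33 and the table value t.05,5 = 2.015. It evaluates the interval for a single value of y at x̄, at 100 and at 130. The function name prediction_interval is hypothetical.

```python
import math

# Quantities reported in 13.39
n, sum_x, sum_x2 = 7, 571, 58_293
se, t_crit = 7.377, 2.015
b0, b1 = 144.414, -0.898

x_bar = sum_x / n
ss_xx = sum_x2 - sum_x ** 2 / n   # sum(x^2) - (sum(x))^2/n


def prediction_interval(x0):
    """Interval for a single value of y at x0."""
    y_hat = b0 + b1 * x0
    half_width = t_crit * se * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


for x0 in (x_bar, 100, 130):
    lower, upper = prediction_interval(x0)
    print(f"x0 = {x0:7.2f}: {lower:7.3f} <= y <= {upper:7.3f} "
          f"(width {upper - lower:.3f})")
```

The width is smallest at x0 = x̄ and grows as x0 moves away from it in either direction.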

13.41 x0 = 10 For 99% confidence, α/2 = .005

df = n – 2 = 5 – 2 = 3 t.005,3 = 5.841

x̄ = Σx/n = 41/5 = 8.20

Σx = 41 Σx² = 421 se = 2.575

ŷ = 15.46 – 0.715(10) = 8.31

ŷ ± tα/2,n–2 se √[1/n + (x0 – x̄)² / (Σx² – (Σx)²/n)]

8.31 ± 5.841(2.575) √[1/5 + (10 – 8.2)² / (421 – (41)²/5)] =

8.31 ± 5.841(2.575)(.488065) = 8.31 ± 7.34

0.97 ≤ E(y10) ≤ 15.65

If the prime interest rate is 10%, we are 99% confident that the average bond rate is between 0.97% and 15.65%.
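The interval in 13.41 is for the conditional mean of y, so the term under the square root has no leading 1. As a hedged illustration, the sketch below computes that interval from the quantities reported above and, for comparison, the wider interval that would apply to a single value of y at the same x0. The variable names (including core) are introduced here and do not appear in the text.

```python
import math

# Quantities reported in 13.41
n, sum_x, sum_x2 = 5, 41, 421
se, t_crit = 2.575, 5.841
b0, b1 = 15.46, -0.715
x0 = 10

x_bar = sum_x / n
ss_xx = sum_x2 - sum_x ** 2 / n
y_hat = b0 + b1 * x0

# Term shared by both intervals
core = 1 / n + (x0 - x_bar) ** 2 / ss_xx

half_mean = t_crit * se * math.sqrt(core)        # conditional mean of y
half_single = t_crit * se * math.sqrt(1 + core)  # single value of y

print(f"mean of y:   {y_hat - half_mean:.2f} to {y_hat + half_mean:.2f}")
print(f"single y:    {y_hat - half_single:.2f} to {y_hat + half_single:.2f}")
```

With only five observations and a 99% confidence level, both intervals are wide. The interval for a single value is always the wider of the two.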

13.43 x y

53 5

47 5

41 7

50 4

58 10

62 12

45 3

60 11

Σx = 416 Σx² = 22,032

Σy = 57 Σy² = 489 b1 = 0.355

Σxy = 3,106 n = 8 b0 = –11.335

a) ŷ = –11.335 + 0.355 x

b) ŷ (Predicted Values) (y – ŷ) residuals

7.48 –2.48

5.35 –0.35

3.22 3.78

6.415 –2.415

9.255 0.745

10.675 1.325

4.64 –1.64

9.965 1.035

c) (y – ŷ)²

6.1504

.1225

14.2884

5.8322

.5550

1.7556

2.6896

1.0712

SSE = 32.4649

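d) se = √(SSE/(n – 2)) = √(32.4649/6) = 2.3261

e) r² = 1 – SSE / [Σy² – (Σy)²/n] = 1 – 32.4649 / [489 – (57)²/8] = .608

f) H0: β1 = 0 α = .05

Ha: β1 ≠ 0

Two-tailed test, α/2 = .025 df = n – 2 = 8 – 2 = 6

t.025,6 = ±2.447

sb = se / √[Σx² – (Σx)²/n] = 2.3261 / √[22,032 – (416)²/8] = 0.116305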