STAT 2607 Review Problems Midterm
1. A life insurance company manager believes that the number of weekly sales concluded by his salesmen can be predicted from the number of follow-up phone calls they make to their clients. He collected the following data on 20 salesmen.
Calls: 66 43 57 32 18 59 61 32 48 39 58 54 48 37 29 21 43 62 51 44
Sales: 20 15 18 12 2 21 18 8 14 12 17 16 13 9 9 5 12 18 17 14
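where Calls is the number of follow-up phone calls made per week and Sales is the number of weekly sales concluded.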
a) What is the response variable in this problem? What is the explanatory (predictor) variable?
b) State the simple linear regression model for this problem, and give all the other assumptions needed for a complete regression analysis.
c) Find the least squares fitted line. Which of the assumptions in (b), if any, are required for this step?
d) Give the interpretation of b0 and b1 for this problem.
e) Why would it not be appropriate, in this problem, to say that b0 estimates the average value of Y when X equals 0?
f) What does it mean when we say that b1 is an unbiased estimator of β1? Which of the assumptions given in (b) are needed for this to be true?
Assuming that the SLR model hypothesized in (b) holds with no violation of the assumptions:
g) Set up the ANOVA table.
h) Test whether there is a significant linear relationship between sales and follow-up phone calls. Use the F-test with α = .01.
i) What is the estimated average increase in weekly sales for each extra follow-up phone call?
j) What are the corresponding t-test statistic and critical region? Show that the F-test and the t-test are equivalent by (i) proving that F = t², and (ii) showing that, in this problem, the calculated value of t² equals the calculated value of F and that the critical value of F(1, n - 2) equals the square of the critical value of t(n - 2).
k) What does σ² measure? Give the estimate of σ² for this problem.
l) What proportion of the variability in weekly sales can be attributed to changes in the number of follow-up phone calls?
m) Give a 99% C.I. estimate for β1.
n) If appropriate, estimate, using a 99% C.I., the average number of weekly sales for salesmen making 30 follow-up calls per week. When would it not be appropriate?
o) If appropriate, estimate, with 99% confidence, the number of weekly sales that would be concluded by Mr. Smith if he made 30 follow-up calls.
p) If you were asked to estimate the number of weekly sales of a particular salesman if he were to make 100 follow-up calls per week, what would you answer?
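As a hedged illustration of the arithmetic behind parts (c), (g), (h) and (j), the following Python sketch fits the least squares line to the data above using only the standard library. The function name fit_slr and the dictionary keys are hypothetical choices made for this sketch, not part of the original problem. Critical values still have to be taken from the F and t tables.

```python
import math

calls = [66, 43, 57, 32, 18, 59, 61, 32, 48, 39,
         58, 54, 48, 37, 29, 21, 43, 62, 51, 44]
sales = [20, 15, 18, 12, 2, 21, 18, 8, 14, 12,
         17, 16, 13, 9, 9, 5, 12, 18, 17, 14]


def fit_slr(x, y):
    """Least squares fit of y on x plus the ANOVA quantities (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    b1 = sxy / sxx                             # slope
    b0 = y_bar - b1 * x_bar                    # intercept
    ssr = b1 * sxy                             # regression sum of squares
    sse = tss - ssr                            # error sum of squares
    mse = sse / (n - 2)                        # estimate of the error variance
    se_b1 = math.sqrt(mse / sxx)               # standard error of the slope
    return {
        "b0": b0, "b1": b1,
        "ssr": ssr, "sse": sse, "tss": tss, "mse": mse,
        "F": ssr / mse,                        # MSR / MSE with 1 and n - 2 df
        "t": b1 / se_b1,                       # t statistic for H0: beta1 = 0
        "r2": ssr / tss,
        "df_error": n - 2,
    }


fit = fit_slr(calls, sales)
for name, value in fit.items():
    print(f"{name:>8}: {value:.4f}")
```

The printed F and t values can be compared directly: F should equal the square of t, which is the numerical half of part (j).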
2. In a SLR problem you want to test whether there is evidence of a positive slope.
a) Set up the appropriate null and alternative hypotheses.
b) Would both the F-statistic and the t-statistic be useful for this test? Give reasons for your answer.
3. Why is there a difference in the variance of the error for predicting an individual value of Y at X = Xp and the variance of the error for estimating the expected value of Y at X = Xp?
4. Explain what "residual analysis" is, and what we are looking for when we perform it in a regression problem.
5. If a medical researcher finds a strong positive linear relationship between blood pressure (Y) and age (X) with an R² value of 0.97 and there are no violations of the required assumptions,
a) Can the researcher use the fitted regression equation to predict blood pressure based on age? Why or why not?
b) Can the researcher conclude that an increase in age causes an increase in blood pressure? Why or why not?
6. Give the assumptions for a complete linear regression analysis. What can be done to check these assumptions?
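The contrast in Problem 3 can be sketched numerically. The Python snippet below is a minimal illustration rather than a full answer: it reuses the calls and sales data from Problem 1, takes Xp = 30 as in Problems 1(n) and 1(o), and compares the estimated variance of the fitted mean with the estimated variance of the prediction error for one new observation. The function name variances_at is a hypothetical choice.

```python
calls = [66, 43, 57, 32, 18, 59, 61, 32, 48, 39,
         58, 54, 48, 37, 29, 21, 43, 62, 51, 44]
sales = [20, 15, 18, 12, 2, 21, 18, 8, 14, 12,
         17, 16, 13, 9, 9, 5, 12, 18, 17, 14]


def variances_at(x, y, xp):
    """Estimated variances at X = xp for a simple linear regression (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    sse = tss - sxy ** 2 / sxx
    mse = sse / (n - 2)
    # Uncertainty in the fitted line at xp (grows as xp moves away from x_bar).
    line_term = 1 / n + (xp - x_bar) ** 2 / sxx
    var_mean = mse * line_term          # estimating the expected value of Y at xp
    var_pred = mse * (1 + line_term)    # predicting one individual Y at xp
    return var_mean, var_pred, mse


var_mean, var_pred, mse = variances_at(calls, sales, 30)
print(f"variance for the mean response: {var_mean:.4f}")
print(f"variance for an individual Y:   {var_pred:.4f}")
print(f"difference (equals MSE):        {var_pred - var_mean:.4f}")
```

The two variances differ by exactly MSE, the estimate of the variance of a single observation about the regression line.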
7. In order to predict Y based on 3 explanatory variables X1, X2 and X3, 10 sample points were collected giving the results shown below. Complete the following ANOVA table.
Source / df / SS / MS / F
regression / 582.83
error
______
Total / 2300.9
8. If ΣX = 34, ΣY = 308, ΣX² = 162, ΣY² = 12264, ΣXY = 935:
The Least Squares fitted line of Y on X is ______
SSR = ______  TSS = ______
9. A labour union wishes to determine whether hourly wages can be predicted as a first order linear function of years of schooling and years of experience on the job. For a random sample of workers the union obtained the following data:
= 103 = 58 = 133.5 = 983 = 360 = 557 = 1644.82 = 1264.45 = 735.7
where Y = hourly wage, X1 = years of schooling, X2 = years of job experience
a) Give the hypothesized model and state all assumptions necessary for a complete regression analysis.
b) Give the matrix form of the normal equations for finding the estimated regression parameters for this problem (using the values given above).
c) Given
find the fitted regression equation.
d) Set up the ANOVA table and test whether hourly wages are linearly related to years of schooling and years of job experience. Use α = .05.
e) Under what conditions could you give an interpretation of b2? Assuming these conditions hold, what is the interpretation of b2 in this question?
f) What is the value of the coefficient of determination? What does it measure in this problem?
g) What is MSE estimating?
h) Does X1 make a significant contribution to a model that includes X2? Test at the .05 level of significance.
i) Find a point estimate for the average wage of all employees with 10 years of schooling and 5 years of experience on the job.
10. Explain what you would have to do to carry out a residual analysis for the above problem and what you would be checking for.
11. Explain the Least Squares Method used for estimating the regression coefficients in a linear regression model.
12. A test question asked students to state the statistical model for a simple linear regression. The answers given included all of the following:
i)
ii)
iii)
iv)
Why are answers (i) to (iii) incorrect?
13. The Least Squares fitted line of Y on X is
where
ΣX = 34, ΣY = 308, ΣX² = 162, ΣY² = 12264, ΣXY = 935,
SSR = 1556.749, TSS = 1723.556.
Find the sample correlation coefficient between X and Y. Do the calculation in two different ways.
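The two routes to the correlation coefficient can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the five sums are read, in order, as ΣX, ΣY, ΣX², ΣY² and ΣXY, and the sample size is taken to be n = 9. Neither is stated explicitly above; they are assumed here because this reading reproduces the given TSS and, up to rounding, the given SSR.

```python
import math

# Assumed labelling of the five sums and assumed sample size (see lead-in).
n = 9
sum_x, sum_y = 34, 308
sum_x2, sum_y2, sum_xy = 162, 12264, 935

# Values stated in the problem.
ssr_given, tss_given = 1556.749, 1723.556

# Corrected sums of squares and cross-products.
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n      # should reproduce the given TSS
sxy = sum_xy - sum_x * sum_y / n

# Way 1: directly from the definition of the sample correlation.
r_direct = sxy / math.sqrt(sxx * syy)

# Way 2: r squared equals SSR / TSS, and r takes the sign of the slope b1.
b1 = sxy / sxx
r_from_anova = math.copysign(math.sqrt(ssr_given / tss_given), b1)

print(f"Syy = {syy:.3f} (given TSS = {tss_given})")
print(f"r from the definition: {r_direct:.4f}")
print(f"r from SSR / TSS:      {r_from_anova:.4f}")
```

Under these assumptions both routes give a correlation of about -0.95. The second route needs the sign of the slope, because SSR / TSS only determines r².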
14. The SAS output for a SLR of Y on X is given below:
a) What is the fitted equation for model 1?
b) Which model assumption seems to be violated here?
c) Give the model that was postulated next (don't forget the assumptions).
d) Does it seem that the violation of part (b) has been remedied? Why or why not?
e) What is the new fitted equation?
f) Compare the coefficient of determination for the equations. Which equation explains more of the variation in the response variable?
g) Compare the values of S = SY/X = √MSE for the 2 equations. Which is smaller? Does this mean that that equation is a better fit to the data? Why or why not?
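data netert1;
  infile 'a:ch01ta01.dat';
  input x y;
  invy = 1/y;                  /* reciprocal of the response */
run;

proc reg data=netert1;
  var x y invy;
  /* Model 1: y on x, residuals plotted against predicted values */
  model y = x;
  plot r.*p.='*' / vplots=2;
run;
  /* Model 2: 1/y on x, same residual plot */
  model invy = x;
  plot r.*p.='*';
run;
quit;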
Model: MODEL1
Dependent Variable: Y

Analysis of Variance

                    Sum of        Mean
Source     DF      Squares      Square    F Value    Prob>F
Model       1      0.00005     0.00005     43.892    0.0001
Error      23      0.00003     0.00000
C Total    24      0.00008

Root MSE     0.00111    R-square    0.6562
Dep Mean     0.00379    Adj R-sq    0.6412
C.V.        29.36478

Parameter Estimates

                  Parameter      Standard      T for H0:
Variable   DF      Estimate         Error    Parameter=0    Prob > |T|
INTERCEP    1      0.007460    0.00059688         12.498        0.0001
X           1  -0.000052414    0.00000791         -6.625        0.0001
[Plot: residuals (RESIDUAL) versus predicted values (PRED) for MODEL1; residual axis from -0.004 to 0.004, PRED axis from 0.001 to 0.007]
Model: MODEL2
Dependent Variable: 1/Y

Analysis of Variance

                        Sum of            Mean
Source     DF          Squares          Square    F Value    Prob>F
Model       1     252377.58081    252377.58081    105.876    0.0001
Error      23      54825.45919      2383.71562
C Total    24     307203.04000

Root MSE     48.82331    R-square    0.8215
Dep Mean    312.28000    Adj R-sq    0.8138
C.V.         15.63447

Parameter Estimates

                 Parameter       Standard      T for H0:
Variable   DF     Estimate          Error    Parameter=0    Prob > |T|
INTERCEP    1    62.365859    26.17743389          2.382        0.0259
X           1     3.570202     0.34697216         10.290        0.0001
[Plot: residuals (RESIDUAL) versus predicted values (PRED) for MODEL2; residual axis from -100 to 100, PRED axis from 100 to 500]
15. When we talk about the multiple linear regression model, to what does the "linear" refer?
16.How does the interpretation of vary in models (a) and (b) below?
a) b)
17.Does the interpretation of in 16(b) hold if ? Why or why not?
18. In order to predict the demand for a new brand of detergent based on how expensive it was, the difference between its price and that of the old brand, and the type of advertising used to promote it, the manufacturer used SAS to fit the model Y = β0 + β1X1 + β2X2 + β3X3 + ε, where
X1 (pricedif) = difference in price between new brand and old brand
X2 (expenses) = how much it costs
X3 (adtype) = type of advertising used
Use the output given below to help answer the following questions.
a) What is the fitted regression equation?
b) What is the s.e.(b2)?
c) What is the calculated value of the statistic for testing whether there is a significant linear relationship between Y and the explanatory variables?
d) What are the null and alternative hypotheses for the test in part (c)? (not given by SAS)
e)What is the calculated value of the t-statistic for testing vs ?
f) What proportion of the variation in the demand values is accounted for by the fitted regression equation on X1, X2, X3?
SAS
Model: MODEL1
Dependent Variable: DEMAND

Analysis of Variance

                    Sum of        Mean
Source     DF      Squares      Square    F Value    Prob>F
Model       3     12.53279     4.17760    117.323    0.0001
Error      26      0.92580     0.03561
C Total    29     13.45859

Root MSE    0.18870    R-square    0.9312
Dep Mean    8.38267    Adj R-sq    0.9233
C.V.        2.25107

Parameter Estimates

                Parameter      Standard      T for H0:
Variable   DF    Estimate         Error    Parameter=0    Prob > |T|
INTERCEP    1    4.008725    0.57998807          6.912        0.0001
PRICEDIF    1    1.560061    0.23717256          6.578        0.0001
EXPENSES    1    0.592391    0.09452865          6.267        0.0001
ADTYPE      1    0.311760    0.07545413          4.132        0.0003
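As a reading aid for the listing above, the short Python sketch below recomputes the summary statistics from the sums of squares and the parameter table. It only illustrates how the printed entries relate to one another. The variable names are hypothetical, and the numbers are copied from the SAS output.

```python
import math

# Values copied from the SAS listing above.
ss_model, df_model = 12.53279, 3
ss_error, df_error = 0.92580, 26
ss_total, df_total = 13.45859, 29

# (estimate, standard error) pairs from the Parameter Estimates table.
coefficients = {
    "INTERCEP": (4.008725, 0.57998807),
    "PRICEDIF": (1.560061, 0.23717256),
    "EXPENSES": (0.592391, 0.09452865),
    "ADTYPE": (0.311760, 0.07545413),
}

ms_model = ss_model / df_model             # mean square for regression
ms_error = ss_error / df_error             # mean square error (MSE)
f_value = ms_model / ms_error              # overall F statistic
root_mse = math.sqrt(ms_error)             # "Root MSE"
r_square = ss_model / ss_total             # coefficient of determination
adj_r_square = 1 - ms_error / (ss_total / df_total)

# Each t statistic is the estimate divided by its standard error.
t_values = {name: est / se for name, (est, se) in coefficients.items()}

print(f"F = {f_value:.3f}, Root MSE = {root_mse:.5f}")
print(f"R-square = {r_square:.4f}, Adj R-sq = {adj_r_square:.4f}")
for name, t in t_values.items():
    print(f"t({name}) = {t:.3f}")
```

The recomputed values agree with the printed ones to the displayed precision, which is a quick check when reading an answer off the output.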
19. By what percentage is the total sum of squares of the deviations of the observed Yi values about their mean reduced by using the least squares equation rather than Ȳ as a predictor of E(Y|X)?
20. For what parameter is s² = MSE an unbiased estimator? Explain how this parameter enters into the description of the statistical model.
21. If, in a SLR, b1 = 1.87, s.e.(b1) = 0.63, n = 25, α = 0.05,
test .
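The mechanics of such a test can be sketched in Python. Since the hypotheses are not legible above, the sketch assumes the usual two-sided test of H0: β1 = 0 against Ha: β1 ≠ 0. That choice is an assumption, and a one-sided alternative would use a different critical value. The critical value 2.069 is the tabulated upper 0.025 point of the t distribution with 23 degrees of freedom.

```python
# Quantities given in the problem.
b1, se_b1, n, alpha = 1.87, 0.63, 25, 0.05

# Assumed hypotheses: H0: beta1 = 0 versus Ha: beta1 != 0 (two-sided).
df = n - 2                      # error degrees of freedom in a SLR
t_stat = (b1 - 0) / se_b1       # test statistic under H0
t_crit = 2.069                  # t table: upper alpha/2 = 0.025 point, 23 df

reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f} on {df} df, critical value = {t_crit}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Under the assumed two-sided alternative the statistic exceeds the critical value, so H0 would be rejected at the 0.05 level.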
22. The results from fitting an MLR model are given below
a) Write the model and all extra assumptions.
b)What is the value of ?
c) What was the sample size?
d)What are the values of , , ?
23.In fitting the model: , 28 observations were used and the following ANOVA table produced.
Source / df / SS / MS / F
regression / 126.30
error
total / 395.4
a) Complete the ANOVA table.
b) Find the coefficient of determination.
c) At α = 0.01, test whether there is a linear relation between Y and the explanatory variables.
24. Write a first order multiple regression model relating a response variable Y to three explanatory variables.
25. Write a second order multiple regression model relating a response Y to two explanatory variables. Include all possible terms (including cross-products).