SOLUTIONS TO FINAL EXAM
VERSION 2
1)
A) No, there are some problems here. The pattern seems somewhat curved (nonlinear), and the variability of revenues seems to increase with housing starts. But overall, revenues seem to increase on the average as housing starts increase.
B) The slope is positive, and the right-tailed p-value on the slope is .000/2. Random chance alone would be able to produce such a large estimated slope less than 2.5 times out of 10000. So there is a statistically significant positive relationship. Also, the $R^2$ is 50.4%, indicating a reasonably strong linear relationship.
C) We estimate that each additional million housing starts adds 2269.8 million dollars (that is, 2.3 billion dollars!!) to the expected quarterly revenue.
2)
A) No, the coefficient of Mortgage Rate in the fitted model was obtained under the assumption that Housing Starts is also in the model, whereas the slope of the line fitted to the data in Figure 2 would be the coefficient of Mortgage Rate assuming that Mortgage Rate is the only predictor variable in the model.
B) This p-value is 0.000, indicating that the regression model is useful (that is, not all of the true slopes in the true response surface are zero). No, it does not indicate any problem with the model.
C) Using the equation for the fitted model, if mortgage rate increases by 1 and housing starts changes by -.5, the fitted value changes by (-.5)(1596.9) + (1)(-974.2) = -1772.65. Thus, these changes would lead to an estimated decrease of 1.773 billion dollars in expected revenues.
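The arithmetic in 2C can be checked with a short Python sketch. The two coefficients are the ones quoted in the solution; the variable names are illustrative only.

```python
# Fitted coefficients quoted in the solution to 2C (millions of dollars).
coef_housing_starts = 1596.9   # per additional million housing starts
coef_mortgage_rate = -974.2    # per one-unit increase in mortgage rate

# Changes in the predictors considered in 2C.
delta_housing_starts = -0.5
delta_mortgage_rate = 1.0

# The intercept cancels when two fitted values are differenced,
# so only the slopes matter.
delta_fitted = (coef_housing_starts * delta_housing_starts
                + coef_mortgage_rate * delta_mortgage_rate)
print(round(delta_fitted, 2))  # -1772.65
```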
D) The right-tailed p-value on the intercept is .042/2=.021, so there is moderate to strong evidence that the true intercept is positive.
3)
A) The residuals vs. fitted values plot (Figure 3) shows a wedge shape, indicating that the variability of the errors increases with the fitted value. Thus, there is evidence of nonconstant variance.
B) Figure 1 shows the variability of revenues increasing with housing starts, while Figure 2 shows the variability of revenues decreasing with mortgage rates. Since in the multiple regression, the estimated coefficient of housing starts is positive while the estimated coefficient of mortgage rates is negative, we would expect that as the fitted value goes up, the variability goes up.
C) Yes. If we had used logs of revenues, the variability would have been much more stable in Figures 1 and 2. This is in part due to the fact that revenues suffer from size-dependent variability. That is, in times when revenues are large, there is going to be much more variability than there is in times when revenues are low.
4)
The $R^2$ would not change at all. Let’s use the formula $R^2 = 1 - SSE/SST$. First, notice that SST is not going to change, since SST is just (n-1) times the sample variance of the y’s, which will not change if we add 1/2 to all the y values. SSE (the sum of squared residuals) is not going to change either, since each residual in the new data set is the same as the corresponding residual from the old data set. Why? Because each fitted value $\hat{y}$ in the new data set will be 1/2 plus the corresponding fitted value from the old data set. (By the nature of the least squares method, the fitted line just shifts up by 1/2, parallel to the original line.) And each y value in the new data set is 1/2 plus the corresponding y value from the old data set. So each residual $y - \hat{y}$ will remain unchanged, and SSE will remain unchanged.
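The invariance argued in problem 4 can be illustrated numerically. The following is a minimal sketch, assuming a simple (one-predictor) least-squares line; the data values and the helper name `r_squared` are made up for illustration and are not from the exam.

```python
def r_squared(x, y):
    """Return R^2 = 1 - SSE/SST for the least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # SSE: sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # SST: sum of squared deviations of y around its mean.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up illustrative data (assumption, not exam data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]
y_shifted = [yi + 0.5 for yi in y]   # add 1/2 to every y value

r2_original = r_squared(x, y)
r2_shifted = r_squared(x, y_shifted)

# The two values agree up to floating-point rounding.
print(abs(r2_original - r2_shifted) < 1e-9)
```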
5)
The statement is incorrect. Since there is nothing random about the null hypothesis, we cannot talk about the probability that it is true. (We also cannot talk about the probability that the alternative hypothesis is true). A correct statement would be: “The larger the p-value, the weaker the evidence against the null hypothesis”. We could also say: “The larger the p-value, the weaker the evidence in favor of the alternative hypothesis”.
6) Since the probability of losing is , the expected profit is . Solving for p yields p=2/3. Answer is B.
7) No, when you include a new variable in the regression, the SSE cannot increase, and therefore $R^2 = 1 - SSE/SST$ cannot decrease. (The SST will stay the same since it only depends on the y values, which do not change). Answer is B.
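As a small illustration of problem 7, the sketch below compares the simplest nested pair of models: an intercept-only model (whose SSE equals SST) and the same model with one predictor added. The data are made up for illustration; the general multi-predictor case behaves the same way but is not shown here.

```python
# Made-up illustrative data (assumption, not exam data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.2, 2.9, 4.1, 3.8, 5.0, 4.4]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# SST depends only on the y values.
sst = sum((yi - y_bar) ** 2 for yi in y)

# Model without the predictor: every fitted value is y_bar, so SSE = SST.
sse_without = sst

# Model with the predictor: ordinary least-squares line.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse_with = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r2_without = 1 - sse_without / sst
r2_with = 1 - sse_with / sst

print(sse_with <= sse_without, r2_with >= r2_without)
```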
8) We have , which becomes Solving for SSE yields SSE=75. Answer is A.
9) Since we have a left-tailed alternative hypothesis and the sample size is large, the p-value is (to a good approximation) the probability that a standard normal random variable is less than 2.2. From the Normal table, this is .5+.4861=.9861. Answer is C.
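The table lookup in problem 9 can be reproduced with Python's standard library. This is just a check of the normal probability, using `statistics.NormalDist` in place of the printed Normal table.

```python
from statistics import NormalDist

z = 2.2  # observed test statistic from problem 9

# Left-tailed p-value: P(Z < 2.2) for a standard normal Z.
p_value = NormalDist().cdf(z)
print(round(p_value, 4))  # 0.9861
```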
10) Since we have a two-tailed alternative hypothesis, the rejection region is $|t| > t_{\alpha/2}$. Using DF=n-1=9 and $\alpha/2 = .025$, we find from the t-table that $t_{.025} = 2.262$. So the rejection region is |t|>2.262. Answer is C.
11) Remember that the 95% CI contains all the “non-rejectable” values of the parameter for a two-tailed hypothesis test at level .05. Since the 95% CI does not contain zero, we know that zero is rejectable, so we can reject the null hypothesis at level .05. Since the 99% CI does contain zero, we cannot reject the null hypothesis at level .01. Answer is D.
12) Since the variance of the binomial random variable is npq and the mean is np, we have 2.4/6=npq/np=q, so that q=.4. Therefore, p=.6. Now, the mean is 6=np=n(.6). So n=6/.6=10. Answer is D.
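The steps in problem 12 can be traced in a few lines of Python. The mean and variance are the values used in the solution; the variable names are illustrative.

```python
mean = 6.0       # np, from the problem
variance = 2.4   # npq, from the problem

# variance / mean = npq / np = q
q = variance / mean
p = 1 - q

# mean = np, so n = mean / p
n = mean / p

# Rounded to absorb floating-point error.
print(round(p, 3), round(n))  # 0.6 10
```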
13) Based on the rule, we are going to reject the null hypothesis whenever either t>1.645 or t< -1.645. If the null hypothesis is true, since the sample size is large, we can assume that t has a standard normal distribution, so the probability of rejecting the null hypothesis is .05+.05=.10. So it’s actually twice the desired level of .05. That is why, in order for the test to be valid, we must decide on the alternative hypothesis before looking at the data. Answer is D.
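The doubled error rate in problem 13 can be confirmed numerically. This sketch assumes, as the solution does, that t is approximately standard normal under the null hypothesis.

```python
from statistics import NormalDist

cutoff = 1.645  # one-tailed critical value for level .05

# Probability of landing in one tail beyond the cutoff.
upper_tail = 1 - NormalDist().cdf(cutoff)

# The rule rejects in either tail, so the two tail areas add.
prob_reject = 2 * upper_tail
print(round(prob_reject, 2))  # 0.1
```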
14) The width of the CI is proportional to the sample standard deviation $s$. Since in principle the two sample standard deviations for the two samples could be anything, we cannot be sure of which CI will be wider. Answer is B.
15) Since the errors in the linear regression model are independent and normally distributed with zero mean (and hence symmetric about zero), each data point has a probability of 1/2 to be above the true regression line, and the number of points above the true line has a binomial distribution with n=4 and p=1/2. The probability that exactly two points lie above the true line is therefore
$\binom{4}{2} (1/2)^2 (1/2)^2 = 6/16 = .375$.
Answer is C.
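The binomial probability in problem 15 can be computed directly. This is a minimal check using the n=4 and p=1/2 stated in the solution.

```python
from math import comb

n, k, p = 4, 2, 0.5  # four points, exactly two above the true line

# Binomial probability: C(n, k) * p^k * (1 - p)^(n - k)
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(prob)  # 0.375
```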