# Solutions to Final Exam


##### VERSION 1

1)

A) The model is $\text{Company Performance Rank}_i = \alpha + \beta\,(\text{Compensation Rank}_i) + \varepsilon_i$ for $i = 1, \ldots, 50$. The only two things that are known are Company Performance Rank and Compensation Rank, since we have all 50 cases of both of these variables in our dataset. The unknown quantities are the true intercept α and slope β, as well as the errors $\varepsilon_i$, which are assumed to be independent of each other and normally distributed with mean zero and constant variance.

B) If β is positive, then as compensation rank becomes a higher number (so that compensation goes down), the expected value of the company performance rank increases (so that expected company performance deteriorates). So positive values of β would tend to suggest that the CEOs earned their compensation, in an overall sense. Of course, many other factors besides the compensation of (and decisions made by) the CEO in the given year would also affect the expected company performance rank for that year. Furthermore, even if β>0, the correlation between CEO compensation and company performance does not establish a cause-and-effect relationship.

2)

A) If we take β>0 to mean that the CEOs earned their compensation (see above), then we need to test the null hypothesis that β=0 versus the alternative hypothesis that β>0. Since the estimated slope is positive, we can get the p-value for this situation by dividing Minitab’s p-value by 2, resulting in p=.047/2=.0235. Since this is less than 5% we can reject the null hypothesis and declare that there is evidence at the 5% level of significance that the CEOs earned their compensation.

B) Yes, the results are also significant at the 4% level, since the (right-tailed) p-value of .0235 is less than .04.
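The halving rule used in parts A and B can be sketched in a few lines of Python. This is only an illustration: the function name `one_tailed_p` and its arguments are hypothetical, and it assumes a symmetric reference distribution (t or normal) and a null hypothesis of β=0. Only the sign of the slope estimate matters, so a stand-in value of 1.0 is used for the positive estimate.

```python
def one_tailed_p(two_tailed_p, estimate, alternative="greater"):
    """Convert a two-tailed p-value for H0: beta = 0 into a one-tailed p-value.

    Assumes a symmetric reference distribution (t or normal).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_tailed_p / 2
    # If the estimate points in the direction of the alternative, the
    # one-tailed p-value is half the two-tailed one; otherwise it is 1 - half.
    agrees = (estimate > 0) if alternative == "greater" else (estimate < 0)
    return half if agrees else 1 - half


# Two-tailed p-value .047 from the Minitab output; the estimated slope is
# positive, and 1.0 is a stand-in that carries only its sign.
p_right = one_tailed_p(0.047, estimate=1.0, alternative="greater")
print(round(p_right, 4))               # 0.0235
print(p_right < 0.05, p_right < 0.04)  # True True
```

If the estimated slope had been negative, the same right-tailed test would have produced a p-value of 1 − .0235 = .9765, which is why the sign of the estimate has to be checked before dividing by 2.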

C) The scatterplot shows a fairly weak positive linear relationship. This impression is reinforced by the $R^2$ value of 8%, indicating that 8% of the variation in company performance rank is explained by CEO compensation rank.

D) Plugging compensation rank 10 into the equation for the fitted line, we obtain the fitted value $\hat{y}$. There is no need to round this number. Note that the fitted value can also be viewed as an estimate of the true regression function, that is, the expected value of company performance rank for a CEO with compensation rank 10. There is no reason why such an expected value needs to be a whole number.

E) The residual is $-29.51 = y - \hat{y}$, where $\hat{y}$ is the fitted value from part D above, so the performance rank for the company is $y = \hat{y} - 29.51$.

3) We want to test β=1 versus β≠1. (We take a two-tailed alternative hypothesis since the question did not imply any desired direction for the alternative hypothesis.) We obtain the test statistic $t = (\hat{\beta} - 1)/\mathrm{SE}(\hat{\beta})$. Since the residual DF (50 − 2 = 48) exceeds 30, we can use the normal critical value and reject the null hypothesis at level .05 if |t|>1.96. Since in fact we have |t|<1.96, we do not reject the null hypothesis. So we do not have evidence that β differs from 1.

4)

A) Multiplying the coefficient by 1 million (I’m just doing this for ease of interpretability, but of course you don’t have to do this to get credit for the problem), we estimate that each additional $1 million in base salary, holding compensation rank fixed, is associated with a decrease of 6.19 in the expected company performance rank. (Remember that a decrease in rank corresponds to improvement, so the negative sign on the estimated coefficient makes sense.)

B) If the true coefficient of base salary is $\beta_2$, then the alternative hypothesis is $\beta_2 < 0$. (As base salary increases, holding the other variable fixed, the expected company performance rank goes down, corresponding to improved performance.) Of course, for many CEOs the base salary is just a small fraction of total compensation, but the left-tailed alternative hypothesis still seems appropriate here.

C) The p-value is .316/2 = .158, which is not less than .05, so the coefficient is not statistically significant.

D) One of the fitted values is much less than the others, but this does not indicate any particular problem with the model. There is no particular sign of nonconstant variance, or any nonlinear pattern. Nor does there seem to be any systematic tendency of the residuals to increase (or decrease) with the fitted value. Overall, no obvious problems. So the multiple linear regression model seems OK to use for this dataset.

5)

A) The p-value is 0.086. If the explanatory variables were both useless, so that the true coefficients of compensation rank and base salary were both zero, then we would find an F-statistic at least as large as the one we got here 8.6% of the time. This p-value is not less than 5%, so at the 5% level of significance we cannot reject the null hypothesis that both predictors are useless.

B) No, this p-value refers to the model as a whole (i.e., the null hypothesis that the true coefficients for both predictors are zero), not to one particular coefficient.

C) The two-tailed p-value for the coefficient of compensation rank in the simple regression was .047, indicating that compensation rank is a valid predictor at the 5% level of significance. The p-value for the F-statistic in the multiple regression model is .086, which is not less than 5% and therefore would not allow us to reject the null hypothesis that both true coefficients in this model are zero (at the same 5% level of significance). So there is an apparent contradiction here. But remember that the two models are different. The true coefficient β of compensation rank in the simple regression need not have the same meaning, interpretation, or numerical value as the true coefficient of compensation rank in the multiple regression model.

6) A key piece of information is the value of $R^2 = .099$. Since $R^2 = \text{SSR}/\text{SST}$ and SST=105916, we get SSR=(105916)(.099)=10485.68. Next, we have SSE=SST−SSR=105916−10485.68=95430.32. The degrees of freedom are 2 for Regression and 47 for Residual. The mean squares are MSR=SSR/2=(10485.68)/2=5242.84 and MSE=SSE/47=(95430.32)/47=2030.43. Finally, we get F=MSR/MSE=5242.84/2030.43=2.58. So the answer is 2.58. The complete table is given below.

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 10485.68 | 5242.84 | 2.58 | 0.086 |
| Residual Error | 47 | 95430.32 | 2030.43 | | |
| Total | 49 | 105916.00 | | | |
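The arithmetic in problem 6 can be checked with a short Python sketch. The function names `anova_from_r2` and `f_pvalue_two_numerator_df` are hypothetical helpers, not part of any library. The p-value uses a closed form that holds only when the numerator degrees of freedom equal 2, which happens to be the case here; for other degrees of freedom a statistics package would be needed.

```python
def anova_from_r2(sst, r2, n, k):
    """Rebuild the ANOVA quantities from SST, R^2, sample size n, and k predictors."""
    ssr = r2 * sst              # regression sum of squares, since R^2 = SSR/SST
    sse = sst - ssr             # residual sum of squares
    df_reg = k                  # regression degrees of freedom
    df_res = n - k - 1          # residual degrees of freedom
    msr = ssr / df_reg
    mse = sse / df_res
    return {
        "SSR": ssr, "SSE": sse,
        "DF_reg": df_reg, "DF_res": df_res,
        "MSR": msr, "MSE": mse,
        "F": msr / mse,
    }


def f_pvalue_two_numerator_df(f, df_res):
    """Right-tail probability of an F(2, df_res) variable.

    Closed form valid ONLY for numerator df = 2:
    P(F > f) = (1 + 2 f / df_res) ** (-df_res / 2).
    """
    return (1 + 2 * f / df_res) ** (-df_res / 2)


table = anova_from_r2(sst=105916, r2=0.099, n=50, k=2)
p_value = f_pvalue_two_numerator_df(table["F"], table["DF_res"])

for name in ("SSR", "SSE", "MSR", "MSE", "F"):
    print(name, round(table[name], 2))
print("P", round(p_value, 3))
```

Up to rounding, the printed values agree with the entries in the table, including the p-value of 0.086 discussed in problem 5.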

7) The statement is false. It has several flaws. Although the statement talks about rejecting a null hypothesis, it does not say what the rule is for performing the hypothesis test, and it does not mention the significance level of the test. It does not state the assumption that the null hypothesis is true, an assumption that we need to make in order to compute the p-value. A correct interpretation of the above situation is: “If the null hypothesis is true, then we would find evidence against it that is at least as strong as what we obtained here in 2.3% of all random samples that could be collected from the given population.” Since working with p-values allows for much more flexibility than we get by performing formal hypothesis tests, nothing needs to be said here about whether we would reject the null hypothesis, or about how often we would reject the null hypothesis, since two different analysts could make different decisions based on the same p-value.

8) Yes, since the p-value is the smallest significance level at which we could do a (formal) hypothesis test and still reject the null hypothesis, and since we are assuming that the null hypothesis can be rejected at the 1% level of significance, we know that the p-value is less than (or equal to) .01. Therefore, the p-value must be less than .05, so it follows that the null hypothesis can also be rejected at the 5% level of significance.

9)

A) The population mean μ would represent the expected value of the household net savings rate for the United States, assuming that this is a constant over time.

B) Since the sample size n=10 is small, we use the t distribution with DF=9 and find from the t-table that $t_{.025,9} = 2.262$. The estimated standard error for the sample mean is $s/\sqrt{n} = .7381$. Thus, the 95% CI for μ is 1.72 ± (2.262)(.7381) = (.05042, 3.3896). It’s a reasonably good bet that the true population mean lies in this interval. Such an interval would contain the true population mean in 95% of all samples that could be collected.
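The interval in part B can be reproduced with a minimal Python sketch. The helper name `t_confidence_interval` is hypothetical; the inputs (sample mean 1.72, estimated standard error .7381, and table value 2.262 for DF=9) are taken from the solution above rather than recomputed from the raw data.

```python
def t_confidence_interval(mean, std_error, t_crit):
    """Return (lower, upper) for the interval mean ± t_crit * std_error."""
    margin = t_crit * std_error
    return mean - margin, mean + margin


# Values from part B: sample mean, estimated standard error of the mean,
# and the t-table value for a 95% interval with 9 degrees of freedom.
lower, upper = t_confidence_interval(mean=1.72, std_error=0.7381, t_crit=2.262)
print(f"({lower:.5f}, {upper:.5f})")  # (0.05042, 3.38958)
```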

C) The values show patterns which suggest a lack of independence over time (assuming that the population mean is fixed). For example, the first three observations are all above the sample mean while the last four observations are all below the sample mean. This suggests that it is not reasonable to consider this data to have come from a random sample from a population with a fixed mean. Therefore, we should not trust the confidence interval from B) above.

10) There are two scenarios that yield XY>0. Scenario 1: {X>0 and Y>0}. Scenario 2: {X<0 and Y<0}. These two scenarios are mutually exclusive, so we can use the special addition rule for mutually exclusive events. We have

Prob{XY>0}=Prob{X>0 and Y>0}+Prob{X<0 and Y<0}.

Since X and Y are independent, we can use the special multiplication rule for independent events. Furthermore, since X and Y are standard normal, we have Prob{X>0}=Prob{X<0}=Prob{Y>0}=Prob{Y<0}=1/2. In the end, we get

Prob{XY>0}=Prob{X>0}Prob{Y>0}+Prob{X<0}Prob{Y<0}=(1/2)(1/2)+(1/2)(1/2)=1/2.
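As a sanity check on the answer to problem 10, the probability can be approximated by simulation. The sketch below is an illustration only, using Python's standard `random` module; the function name `estimate_prob_xy_positive`, the number of draws, and the seed are arbitrary choices and not part of the exam solution.

```python
import random


def estimate_prob_xy_positive(n_draws, seed=0):
    """Monte Carlo estimate of Prob{XY > 0} for independent standard normals X, Y."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_draws):
        x = rng.normalvariate(0.0, 1.0)
        y = rng.normalvariate(0.0, 1.0)
        # XY > 0 exactly when X and Y have the same sign.
        if x * y > 0:
            hits += 1
    return hits / n_draws


estimate = estimate_prob_xy_positive(200_000)
print(estimate)  # should be close to the exact answer 1/2
```

With 200,000 draws the standard error of the estimate is about .001, so the simulated proportion should land very near the exact value of 1/2.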