252y0771 11/27/07 (Page layout view!)
ECO252 QBA2 Name ______Key______
THIRD EXAM Student number______
November 29, 2007 Class Day and hour______
Version 1
I. (8 points) Do all the following (2 points each unless noted otherwise). Make diagrams! Show your work! All probabilities must be between zero and (positive) 1.
1.
For $P(-0.43 \le z \le 0.86)$ make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the entire area between -0.43 and 0.86. Because this area is on both sides of zero, we must add the area between -0.43 and zero to the area between zero and 0.86. If you wish, make a completely separate diagram for $P(20 \le x \le 38)$. Draw a Normal curve with a mean at 26. Indicate the mean by a vertical line! Shade the entire area between 20 and 38. This area is on both sides of the mean (26), so we add to get our answer.
2.
For $P(z \ge -1.86)$ make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the entire area above -1.86. Because this area is on both sides of zero, we must add the area between -1.86 and zero to the area above zero. If you wish, make a completely separate diagram for $P(x \ge 0)$. Draw a Normal curve with a mean at 26. Indicate the mean by a vertical line! Shade the entire area above zero, remembering that zero is below the mean. This area is on both sides of the mean (26), so we add to get our answer.
3.
For $P(0.43 \le z \le 3.43)$ make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line! Shade the area between 0.43 and 3.43. Because this area is on one side of zero, we must subtract the area between zero and 0.43 from the larger area between zero and 3.43. If you wish, make a completely separate diagram for $P(32 \le x \le 76)$. Draw a Normal curve with a mean at 26. Indicate the mean by a vertical line! Shade the area between 32 and 76. This area is on one side of the mean (26), so we subtract to get our answer.
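If you want to check problems 1 through 3 by computer, the sketch below uses Python's standard-library `statistics.NormalDist`. It assumes $x \sim N(26, 14)$: the standard deviation of 14 is not stated in this section and is inferred from the z values above (for example, $(20 - 26)/14 \approx -0.43$). Because it uses the exact Normal CDF instead of a table with z rounded to two decimals, its answers may differ from table answers in the third decimal place.

```python
from statistics import NormalDist

# Assumption: x ~ N(26, 14). Sigma is not stated in this section; 14 is inferred
# from the z values (e.g. (20 - 26) / 14 = -0.43 after rounding).
MU, SIGMA = 26.0, 14.0
x_dist = NormalDist(mu=MU, sigma=SIGMA)


def z_score(x: float) -> float:
    """Standardize x with z = (x - mu) / sigma."""
    return (x - MU) / SIGMA


def prob_between(lo: float, hi: float) -> float:
    """P(lo <= x <= hi) as a difference of two CDF values."""
    return x_dist.cdf(hi) - x_dist.cdf(lo)


def prob_above(lo: float) -> float:
    """P(x >= lo) = 1 - F(lo)."""
    return 1.0 - x_dist.cdf(lo)


p1 = prob_between(20, 38)  # problem 1: straddles the mean
p2 = prob_above(0)         # problem 2: 0 lies below the mean
p3 = prob_between(32, 76)  # problem 3: both limits above the mean

for label, p in [("P(20 <= x <= 38)", p1), ("P(x >= 0)", p2), ("P(32 <= x <= 76)", p3)]:
    print(f"{label} = {p:.4f}")
```

A difference of CDF values covers both the "add" and the "subtract" cases in one formula, so the code never needs to know which side of the mean each limit falls on; the diagrams are still what tells you whether the answer is reasonable.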
4. (Do not try to use the t table to get this.) For $z_{.075}$ make a diagram. Draw a Normal curve with a mean at 0. $z_{.075}$ is the value of $z$ with 7.5% of the distribution above it. Since 100 – 7.5 = 92.5, it is also the 92.5th percentile. Since 50% of the standardized Normal distribution is below zero, your diagram should show that the probability between zero and $z_{.075}$ is 92.5% - 50% = 42.5%, or $P(0 \le z \le z_{.075}) = .4250$. The closest we can come to this is $P(0 \le z \le 1.44) = .4251$. So $z_{.075} \approx 1.44$ (or something slightly smaller). To get from $z_{.075}$ to $x_{.075}$, use the formula $x = \mu + z\sigma$, which is the opposite of $z = \frac{x - \mu}{\sigma}$. If you wish, make a completely separate diagram for $x_{.075}$. Draw a Normal curve with a mean at 26. Show that 50% of the distribution is below the mean (26). If 7.5% of the distribution is above $x_{.075}$, it must be above the mean and have 42.5% of the distribution between it and the mean.
Check:
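The same assumption ($\mu = 26$, $\sigma = 14$, with $\sigma$ inferred rather than stated) gives a quick computer check of problem 4. `NormalDist().inv_cdf` returns the 92.5th percentile of the standard Normal directly, so there is no table interpolation; it should land slightly below 1.44, as the key predicts.

```python
from statistics import NormalDist

# Assumption: mu = 26 and sigma = 14 (sigma inferred from problems 1-3).
MU, SIGMA = 26.0, 14.0

# z_.075 has 7.5% of the standard Normal above it: the 92.5th percentile.
z_075 = NormalDist().inv_cdf(1 - 0.075)

# Back to the x scale with x = mu + z * sigma (the inverse of z = (x - mu) / sigma).
x_075 = MU + z_075 * SIGMA

# Check: the area above x_.075 should be 7.5%.
tail = 1 - NormalDist(MU, SIGMA).cdf(x_075)
print(f"z_.075 = {z_075:.4f}, x_.075 = {x_075:.2f}, tail = {tail:.4f}")
```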
II. (22+ points) Do all the following (2 points each unless noted otherwise). Do not answer a question ‘yes’ or ‘no’ without giving reasons. Show your work when appropriate. Use a 5% significance level except where indicated otherwise. Note that this is extremely long and that no one will do all the problems, so look them over!
1. Turn in your computer problems 2 and 3 marked as requested in the Take-home. (5 points, 2 point penalty for not doing.)
2. In an ordinary 1-way ANOVA, if the computed F statistic is below the value from the F table at the given significance level, we can
a. Reject the null hypothesis because the difference between the means is not significant
b. Reject the null hypothesis because there is evidence of a significant difference between some of the means.
c. *Not reject the null hypothesis because the difference between the means is not significant.
d. Not reject the null hypothesis because the difference between the means is significant.
e. Not reject the null hypothesis because the difference between the variances is not significant.
f. Not reject the null hypothesis because the difference between the variances is significant.
g. None of the above. [7]
3. After an analysis of variance, you would use the Tukey-Kramer procedure or similar confidence intervals to check
a. For Normality
b. For equality of variances
c. For independence of error terms
d. *For pairwise differences in means
e. For all of the above
f. For none of the above
4. If an ordinary one-way ANOVA has 25 columns 17 rows and , the degrees of freedom for the F test are
a. 400 and 24
b. 408 and 16
c. *24 and 400
d. 16 and 408
e. 400 and 424
f. 408 and 424
g. 424 and 400
h. 424 and 408
i. 16 and 24
j. None of the above. The correct answer is ______.
Explanation: This is a one-way ANOVA. The total number of observations is $n = 25 \times 17 = 425$ and the number of columns is $m = 25$. This means there are 425 - 1 = 424 total degrees of freedom and 25 - 1 = 24 degrees of freedom between the columns. This leaves 424 - 24 = 400 degrees of freedom for the error (within) term. The degrees of freedom are filled in below.
Source    SS   DF    MS   F
Between        24
Within         400
Total          424
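The degree-of-freedom bookkeeping in problem 4 can be written as a small function. This is a sketch assuming a balanced design (every column has the same number of rows), which is what 25 × 17 = 425 observations implies; the function name is illustrative.

```python
def one_way_anova_df(columns: int, rows: int) -> tuple:
    """Return (between, within, total) degrees of freedom for a one-way ANOVA.

    Assumes a balanced design: every column holds `rows` observations.
    """
    n = columns * rows                  # total number of observations
    df_total = n - 1
    df_between = columns - 1
    df_within = df_total - df_between   # the same as n - columns
    return df_between, df_within, df_total


print(one_way_anova_df(25, 17))  # (24, 400, 424)
```

The F test uses (between, within), in that order, which is why answer c is right and answer a (the same numbers reversed) is wrong.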
5. Assuming that your answer to 4 is correct and that the significance level is 5%, the correct value of F from the table is _1.54_____. (This may have to be approximate. If so, what did you use?) (1) [12]
Note: I will check your answer against what you said in the previous question. The answer above is wrong if you did not say something close to $F^{.05}_{(24,400)}$.
Exhibit 1: A realtor believes that the selling price of a home (in $ thousands) is related to the condition of the home (on a 1 to 10 scale) and the size of the home (in hundreds of square feet). He runs the data below on Minitab and gets the following.
Row Price Size Condition
1 360 23 5
2 200 11 2
3 340 20 9
4 280 17 3
5 280 15 8
6 330 21 4
7 380 24 7
8 250 13 6
MTB > regress c1 2 c2 c3
Regression Analysis: Price versus Size, Condition
The regression equation is
Price = 64.5 + 11.7 Size + 4.88 Condition
Predictor Coef SE Coef T P
Constant 64.539 4.228 15.27 0.000
Size 11.7282 0.2317 50.62 0.000
Condition 4.8826 0.4494 ______
S = 2.75997 R-Sq = 99.9% R-Sq(adj) = 99.8%
Analysis of Variance
Source DF SS MS F P
Regression 2 25712 12856 1687.70 0.000
Residual Error 5 38 8
Total 7 25750
Source DF Seq SS
Size 1 24813
Cond 1 899
The sum of the Price column is 2420 and the sum of the squared numbers in the Price column is not needed.
The sum of the 'Size' column is 144 and the sum of the squared numbers in the Size column is 2750.
The sum of the ‘Condition’ column is 44 and the sum of the squared numbers in the Condition column is 284.
If Price is the dependent variable and Size and Condition are the independent variables, we have found that the sum of x1y is 45540 and the sum of x1x2 is 818. The sum of x2y has not been computed.
6 and 7. In the multiple regression, are the coefficients of size and condition significant at the 5% significance level? Give reasons. Do not do unneeded computations. (3) [15]
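Solution: The coefficient of ‘Size,’ 11.7282, has a p-value (0.000) below .05 and thus must be significant.
For the coefficient of ‘Condition,’ we can compute $t = \frac{b_2}{s_{b_2}} = \frac{4.8826}{0.4494} = 10.86$. There are $n - k - 1 = 8 - 2 - 1 = 5$ degrees of freedom, and our table says that $t^{5}_{.025} = 2.571$. Since the computed value of t exceeds the table value, we reject the null hypothesis $H_0: \beta_2 = 0$ and say that this coefficient is significant.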
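The significance check for ‘Condition’ is just a ratio and a comparison. In this sketch the coefficient and standard error are copied from the printout, and the critical value 2.571 is typed in from a printed t table (an assumption of the sketch, not computed).

```python
# Values copied from the Minitab printout in Exhibit 1.
coef_condition = 4.8826
se_condition = 0.4494

n, k = 8, 2          # observations and independent variables
df = n - k - 1       # 5 degrees of freedom for the coefficient t tests
t_table = 2.571      # t_.025 with 5 df, typed in from a t table

t_calc = coef_condition / se_condition
significant = abs(t_calc) > t_table
print(f"df = {df}, t = {t_calc:.2f}, significant: {significant}")
```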
8. Assuming that the coefficients in the multiple regression are correct, what price would we predict for a home with 20 (hundred) square feet and a condition score of 9? (1)
Price = 64.539 + 11.7282 Size + 4.8826 Condition = 64.539 + 11.7282(20) + 4.8826(9) = 64.539 + 234.564 + 43.943 = 343.046 (thousand)
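Problem 8 can be written as a function of the two predictors, using the printout's coefficients. The function name `predict_price` is hypothetical, chosen for illustration.

```python
# Coefficients copied from the Minitab printout in Exhibit 1.
B0, B_SIZE, B_COND = 64.539, 11.7282, 4.8826


def predict_price(size: float, condition: float) -> float:
    """Predicted price ($ thousands); size is in hundreds of square feet."""
    return B0 + B_SIZE * size + B_COND * condition


print(f"{predict_price(20, 9):.3f}")  # 343.046
```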
9. Using the information in the multiple regression printout, make your result in 8) into a rough prediction interval. (2)
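Solution: The outline says that an approximate prediction interval is $Y_0 = \hat{Y}_0 \pm t s_e$. Remember $t^{n-k-1}_{.025} = t^{5}_{.025} = 2.571$. In the printout S = 2.75997 is $s_e$. So $Y_0 = 343.046 \pm 2.571(2.75997) = 343.046 \pm 7.096$, or 335.950 to 350.142.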
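A sketch of the rough interval in problem 9, assuming the outline's approximation $\hat{Y}_0 \pm t s_e$. That approximation ignores how far the new home's Size and Condition lie from their means, so an exact prediction interval would be a little wider. The t value 2.571 is a table value typed in by hand.

```python
y_hat = 64.539 + 11.7282 * 20 + 4.8826 * 9   # point prediction from problem 8
s_e = 2.75997                                 # S from the printout
t_table = 2.571                               # t_.025 with n - k - 1 = 5 df (table value)

half_width = t_table * s_e
lower, upper = y_hat - half_width, y_hat + half_width
print(f"{y_hat:.3f} +/- {half_width:.3f}: {lower:.3f} to {upper:.3f}")
```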
10. Using the information in the printout, what is the value of R-squared for a regression of ‘Price’ against ‘Size’ alone? (2) [20]
Solution: Looking at the sequential sums of squares, the regression sum of squares is 24813 for ‘Size’ alone. The total sum of squares is 25750, so we have $R^2 = \frac{SSR}{SST} = \frac{24813}{25750} = .9636$.
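Because ‘Size’ was entered first, its sequential sum of squares is exactly the regression sum of squares for ‘Size’ alone. The sketch below uses the printout's numbers. The same trick would not give $R^2$ for ‘Condition’ alone, since its sequential SS (899) measures only what ‘Condition’ adds after ‘Size’ is already in.

```python
seq_ss_size = 24813   # sequential SS for Size (entered first)
seq_ss_cond = 899     # extra SS from adding Condition after Size
sst = 25750           # total sum of squares

r2_size_alone = seq_ss_size / sst
r2_both = (seq_ss_size + seq_ss_cond) / sst   # 25712 / 25750, the full model
print(f"R^2 (Size alone) = {r2_size_alone:.4f}")
print(f"R^2 (Size and Condition) = {r2_both:.4f}")
```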
11. Do a simple regression of ‘Price’ against ‘Condition’ alone. Before you do something ridiculous, see 252blunders!
a) Compute the sum that you will need for this regression. Show your work! (2) Don’t compute stuff that has already been done for you!
Solution: The only column that you should have computed is the x2y column below.
Row Price Size Cond Ysq x1sq x2sq x1y x2y x1x2
1 360 23 5 129600 529 25 8280 1800 115
2 200 11 2 40000 121 4 2200 400 22
3 340 20 9 115600 400 81 6800 3060 180
4 280 17 3 78400 289 9 4760 840 51
5 280 15 8 78400 225 64 4200 2240 120
6 330 21 4 108900 441 16 6930 1320 84
7 380 24 7 144400 576 49 9120 2660 168
8 250 13 6 62500 169 36 3250 1500 78
Sum 2420 144 44 757800 2750 284 45540 13820 818
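The column sums in the table can be verified with a few lines of Python; the list and key names are illustrative. Only the `X2Y` sum is new work; the others were given in the exhibit.

```python
price = [360, 200, 340, 280, 280, 330, 380, 250]   # Y
size = [23, 11, 20, 17, 15, 21, 24, 13]            # X1
cond = [5, 2, 9, 3, 8, 4, 7, 6]                    # X2

sums = {
    "Y": sum(price),
    "X1": sum(size),
    "X2": sum(cond),
    "Ysq": sum(y * y for y in price),
    "X1sq": sum(x * x for x in size),
    "X2sq": sum(x * x for x in cond),
    "X1Y": sum(x * y for x, y in zip(size, price)),
    "X2Y": sum(x * y for x, y in zip(cond, price)),  # the only sum not given
    "X1X2": sum(a * b for a, b in zip(size, cond)),
}
print(sums)
```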
b) It says that you do not need to know the sum of squares in the Price column. You do, however, need the spare part $SS_y = \sum Y^2 - n\bar{Y}^2$. Without doing any computing, tell what its value is. (1)
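Solution: The ANOVA in the computer output says that the total sum of squares is 25750, so $SS_y = SST = 25750$.
Of course, if you like to waste time, $SS_y = \sum Y^2 - n\bar{Y}^2 = 757800 - 8(302.5)^2 = 757800 - 732050 = 25750$.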
c) Compute the coefficients of the equation to predict the value of ‘Price’ on the basis of ‘Condition.’ (4) [27]
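Solution: First copy $n = 8$, $\sum Y = 2420$, $\sum X = 44$, $\sum X^2 = 284$ and $\sum XY = 13820$ (you computed $\sum XY$); $\sum Y^2$ is not needed. (It’s 757800.) Here $X$ is ‘Condition.’
Then compute means: $\bar{Y} = \frac{2420}{8} = 302.5$ and $\bar{X} = \frac{44}{8} = 5.5$.
The ‘Spare Parts’ are as follows:
$SS_x = \sum X^2 - n\bar{X}^2 = 284 - 8(5.5)^2 = 284 - 242 = 42$
$S_{xy} = \sum XY - n\bar{X}\bar{Y} = 13820 - 8(5.5)(302.5) = 13820 - 13310 = 510$
You already found $SS_y = 25750$ (Total Sum of Squares).
So $b_1 = \frac{S_{xy}}{SS_x} = \frac{510}{42} = 12.1429$ and $b_0 = \bar{Y} - b_1\bar{X} = 302.5 - 12.1429(5.5) = 235.714$, which means $\hat{Y} = 235.714 + 12.1429X$ or Price = 235.714 + 12.1429 Condition.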
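The spare-parts arithmetic in parts b) and c) translates directly into code. This is a sketch using the sums given in the exhibit; `x` here is ‘Condition’ and the variable names are illustrative.

```python
n = 8
sum_y, sum_x = 2420, 44        # Price and Condition column sums
sum_x2, sum_xy = 284, 13820    # sum of Condition^2 and of Condition * Price
sum_y2 = 757800                # only needed to check SS_y

y_bar, x_bar = sum_y / n, sum_x / n
ss_x = sum_x2 - n * x_bar ** 2       # 284 - 8(5.5)^2 = 42
s_xy = sum_xy - n * x_bar * y_bar    # 13820 - 8(5.5)(302.5) = 510
ss_y = sum_y2 - n * y_bar ** 2       # 757800 - 8(302.5)^2 = 25750, i.e. SST

b1 = s_xy / ss_x
b0 = y_bar - b1 * x_bar
print(f"SSx = {ss_x}, Sxy = {s_xy}, SSy = {ss_y}")
print(f"Price = {b0:.3f} + {b1:.4f} Condition")
```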
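d) Compute $R^2$. (3)
Solution: $SSR = b_1 S_{xy} = 12.142857(510) = 6192.86$. We can say $R^2 = \frac{SSR}{SST} = \frac{6192.86}{25750} = .2405$ or $R^2 = \frac{S_{xy}^2}{SS_x SS_y} = \frac{510^2}{42(25750)} = .2405$.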
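Both routes to $R^2$ in part d) can be checked side by side; they must agree because $SSR = b_1 S_{xy} = S_{xy}^2/SS_x$. The inputs are the spare parts from part c).

```python
ss_x, s_xy, ss_y = 42.0, 510.0, 25750.0   # spare parts from part c)
b1 = s_xy / ss_x

ssr = b1 * s_xy                          # regression sum of squares
r2_from_ssr = ssr / ss_y
r2_direct = s_xy ** 2 / (ss_x * ss_y)    # same quantity without b1
print(f"SSR = {ssr:.3f}, R^2 = {r2_from_ssr:.4f} = {r2_direct:.4f}")
```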
e) Is the slope of the simple regression significant at the 5% level? Do not answer this question without appropriate calculations! (4)
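Solution: We can compute $SSE = SST - SSR = 25750 - 6192.86 = 19557.14$. Then $s_e^2 = \frac{SSE}{n-2} = \frac{19557.14}{6} = 3259.52$, or $s_e = 57.09$. So $s_{b_1}^2 = \frac{s_e^2}{SS_x} = \frac{3259.52}{42} = 77.608$ and $s_{b_1} = 8.810$. The outline says that to test $H_0: \beta_1 = 0$ we use $t = \frac{b_1 - 0}{s_{b_1}}$, and if we reject the null hypothesis in that case, we say that $b_1$ is significant. So our ‘do not reject’ zone is between $\pm t^{6}_{.025} = \pm 2.447$. Here $t = \frac{12.1429}{8.810} = 1.378$. Our calculated $t$ is between these two values, so we cannot reject the null hypothesis and must conclude that the coefficient is insignificant.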
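Part e) in code: a sketch that rebuilds $s_e$ and $s_{b_1}$ from the spare parts and compares $t$ with the 6-degree-of-freedom table value 2.447, which is typed in by hand rather than computed.

```python
import math

n = 8
ss_x, s_xy, ss_y = 42.0, 510.0, 25750.0
b1 = s_xy / ss_x

sse = ss_y - b1 * s_xy              # SSE = SST - SSR
s_e2 = sse / (n - 2)                # residual variance on n - 2 = 6 df
s_b1 = math.sqrt(s_e2 / ss_x)       # standard error of the slope
t_calc = (b1 - 0) / s_b1            # test of H0: beta1 = 0
t_table = 2.447                     # t_.025 with 6 df, typed in from a t table

print(f"s_e = {math.sqrt(s_e2):.4f}, s_b1 = {s_b1:.4f}, t = {t_calc:.3f}")
print("reject H0" if abs(t_calc) > t_table else "do not reject H0")
```

The computed $s_e$ and $s_{b_1}$ match the S and SE Coef values in the Minitab output for this regression.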
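f) Predict the price of an average home with a condition of 9 and make your estimate into an appropriate 99% interval. (4)
Solution: We found $\hat{Y}_0 = 235.714 + 12.1429(9) = 345.000$. The outline says that the Confidence Interval is $\mu_{Y_0} = \hat{Y}_0 \pm t s_{\hat{Y}}$, where $s_{\hat{Y}}^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right)$, $X_0 = 9$ and $\bar{X} = 5.5$. So
$s_{\hat{Y}}^2 = 3259.52\left(\frac{1}{8} + \frac{(9 - 5.5)^2}{42}\right) = 3259.52(0.41667) = 1358.13$ and $s_{\hat{Y}} = 36.853$. We will use $n - 2 = 6$ degrees of freedom and $t^{6}_{.005} = 3.707$. So $\mu_{Y_0} = 345.000 \pm 3.707(36.853) = 345.000 \pm 136.61$, or 208.387 to 481.615.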
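The 99% interval in part f) as a sketch. The table value $t^{6}_{.005} = 3.707$ is typed in, and the coefficients are the rounded ones from part c), so the last digit may wobble. For a single home rather than an average home, the prediction interval would add 1 inside the parentheses and be much wider.

```python
import math

n = 8
x_bar, ss_x = 5.5, 42.0
b0, b1 = 235.714, 12.1429     # rounded coefficients from part c)
s_e2 = 19557.143 / 6          # residual mean square from the simple regression
t_table = 3.707               # t_.005 with 6 df (99%, two-sided), table value

x0 = 9
y_hat = b0 + b1 * x0
s_yhat = math.sqrt(s_e2 * (1 / n + (x0 - x_bar) ** 2 / ss_x))
half = t_table * s_yhat
print(f"{y_hat:.3f} +/- {half:.3f}: {y_hat - half:.3f} to {y_hat + half:.3f}")
```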
g) Do an analysis of variance using your SST, SSE and SSR for this equation, or using 1, $R^2$ and $1 - R^2$. What have you already done that makes this table redundant? If you don’t know what redundant means, ask! (3) [43]
Solution: We actually have almost all of this done. We have already found $SST = 25750$, $SSR = 6192.859$ and $SSE = 19557.141$. So our ANOVA table will be as below.
Source       SS          DF   MS          F
Regression   6192.859    1    6192.859    1.900
Error        19557.141   6    3259.5235
Total        25750       7
If we recall $R^2 = .2405$ for this regression, we can rewrite the table as below.
Source       SS/SST      DF   ‘MS’        F
Regression   0.2405      1    0.2405      1.900
Error        0.7595      6    0.12658
Total        1.0000      7
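A quick check of the part g) table, including the identity $F = t^2$ that makes it redundant. The $t$ of 1.378 is the value computed for the slope in part e).

```python
ssr, sse, sst = 6192.859, 19557.141, 25750.0
df_reg, df_err = 1, 6

msr = ssr / df_reg
mse = sse / df_err
f_calc = msr / mse
r2 = ssr / sst
print(f"MSR = {msr:.3f}, MSE = {mse:.4f}, F = {f_calc:.3f}, R^2 = {r2:.4f}")

# With a single independent variable, F equals the square of the slope's t ratio.
t_slope = 1.378
print(f"t^2 = {t_slope ** 2:.3f}")
```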
Just for reassurance, here is the Minitab output
Regression Analysis: Price versus Condition
The regression equation is
Price = 236 + 12.1 Condition
Predictor Coef SE Coef T P
Constant 235.71 52.49 4.49 0.004
Condition 12.143 8.810 1.38 0.217
S = 57.0922 R-Sq = 24.0% R-Sq(adj) = 11.4%
Analysis of Variance
Source DF SS MS F P
Regression 1 6193 6193 1.90 0.217
Residual Error 6 19557 3260
Total 7 25750
This is redundant because we have already shown that the coefficient of ‘Condition’ is insignificant. Because there is only one independent variable, the F test shows the same thing: $F = t^2$, and $1.378^2 \approx 1.90$.
h) Using the information on Regression Sums of Squares (or $R^2$ and $1 - R^2$) in the ANOVA that you just did and from the multiple regression, do an F test to see if adding ‘Size’ to the regression of ‘Price’ against ‘Condition’ is worthwhile. Do not waste our time by repeating stuff that has already been done. (3) [46]
Solution: In view of the original printout, we can rewrite our ANOVA tables above for the multiple regression.
So our ANOVA table will be as below.
Source       SS       DF   MS       F
Regression   25712    2    12856    1687.70
Error        38       5    7.6
Total        25750    7
If we recall $R^2 = .9985$ for this regression, we can rewrite the table as below.
Source       SS/SST   DF   ‘MS’     F
Regression   0.9985   2    0.49925  1642
Error        0.0015   5    0.0003
Total        1.0000   7
For the regression against ‘Condition’ alone, we had $SSR = 6193$ and $R^2 = .2405$. So if we itemize the regressions above, we get the tables below.
Source       SS       DF   MS       F
Regression   25712    2
  Condition  6193     1
  Size       19519    1    19519    2568
Error        38       5    7.6
Total        25750    7
If we use $R^2$ instead for this regression, we can rewrite the table as below.
Source       SS/SST   DF   ‘MS’     F
Regression   0.9985   2
  Condition  0.2405   1
  Size       0.7580   1    0.7580   2527
Error        0.0015   5    0.0003
Total        1.0000   7
In spite of the gigantic rounding error in the table using $R^2$, the results are the same as in the t test on the coefficient of ‘Size’ in the multiple regression: the calculated F is far above the table value $F^{.05}_{(1,5)} = 6.61$, so adding ‘Size’ significantly improves the results.
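The part h) test is a partial F test: the extra regression sum of squares from adding ‘Size’, divided by the full model's error mean square. In this sketch the table value $F^{.05}_{(1,5)} = 6.61$ is typed in by hand. As a consistency check, $t^2$ for ‘Size’ in the multiple regression is $50.62^2 \approx 2562$, close to the computed F.

```python
ssr_full = 25712    # regression SS, Price on Size and Condition
ssr_cond = 6193     # regression SS, Price on Condition alone
sse_full = 38       # residual SS of the full model
df_err_full = 5     # n - k - 1 = 8 - 2 - 1
q = 1               # number of variables added (Size)
f_table = 6.61      # F_.05 with (1, 5) df, typed in from an F table

ss_added = ssr_full - ssr_cond
f_calc = (ss_added / q) / (sse_full / df_err_full)
print(f"SS for Size = {ss_added}, F = {f_calc:.1f}, significant: {f_calc > f_table}")
```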
Exhibit 2 (Groebner): A product is being produced on 3 different lines using 3 different layouts for the lines. A sample of 36 observations is taken on various days over a period of four weeks, so that there are 12 observations of daily output for each line, evenly divided among the three possible layouts. Assume .
MTB > Twoway c4 c2 c3;
SUBC> Means c2 c3.
Two-way ANOVA: output 1 versus line, layout
Source DF SS MS F P
line 2 187.1 93.5 0.22 0.804
layout 2 28263.4 14131.7 33.21 0.000
Interaction ______
Error __ 11489.0 425.5
Total 35 41874.6
S = 20.63 R-Sq = 72.56% R-Sq(adj) = 64.43%
Individual 95% CIs For Mean Based on
Pooled StDev
line Mean ------+------+------+------+---
1 132.583 (------*------)
2 128.167 (------*------)
3 127.417 (------*------)
------+------+------+------+---
120.0 128.0 136.0 144.0
Individual 95% CIs For Mean Based on
Pooled StDev
layout Mean ----+------+------+------+-----
1 116.667 (----*----)
2 168.250 (----*----)
3 103.250 (----*----)
----+------+------+------+-----
100 125 150 175
12. Fill in the missing degrees of freedom, the missing sum of squares and the missing mean square. (2)
Solution: We can find the interaction degrees of freedom by multiplying the degrees of freedom for the factors that interact (2 × 2 = 4). The error degrees of freedom are whatever is needed to make the DF column add up (35 - 2 - 2 - 4 = 27). The interaction sum of squares is whatever makes the SS column add up (41874.6 - 187.1 - 28263.4 - 11489.0 = 1935.1). Each mean square is its sum of squares divided by its degrees of freedom (1935.1/4 = 483.78). The corrected table reads as below.
Two-way ANOVA: output 1 versus line, layout
Source DF SS MS F P
line 2 187.1 93.5 0.22 0.804
layout 2 28263.4 14131.7 33.21 0.000
Interaction 4 1935.1 483.78 1.136 _____
Error 27 11489.0 425.5
Total 35 41874.6
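The fill-in rules for problem 12, written out as a sketch using the known entries of the Minitab table. Because it divides by the unrounded error mean square (11489.0/27 ≈ 425.52), the computed F may differ from the table above in the last digit.

```python
# Known entries from the Minitab two-way ANOVA in Exhibit 2.
ss = {"line": 187.1, "layout": 28263.4, "error": 11489.0, "total": 41874.6}
df = {"line": 2, "layout": 2, "total": 35}

# Interaction df is the product of the interacting factors' df.
df["interaction"] = df["line"] * df["layout"]
# Error df and interaction SS are whatever makes each column add up.
df["error"] = df["total"] - df["line"] - df["layout"] - df["interaction"]
ss["interaction"] = ss["total"] - ss["line"] - ss["layout"] - ss["error"]

ms_interaction = ss["interaction"] / df["interaction"]
ms_error = ss["error"] / df["error"]
f_interaction = ms_interaction / ms_error
print(f"df = {df['interaction']} and {df['error']}, SS = {ss['interaction']:.1f}, "
      f"MS = {ms_interaction:.3f}, F = {f_interaction:.3f}")
```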
S = 20.63 R-Sq = 72.56% R-Sq(adj) = 64.43%
13. Is there significant interaction between ‘line’ and ‘layout’? Don’t answer unless you can tell me what the evidence is. (2)
Solution: We can look up $F^{.05}_{(4,27)} = 2.73$. It is larger than the computed F of 1.136. This means that we cannot reject the null hypothesis of no interaction, so the interaction is not significant.
14. Is the difference between lines significant? Why? (1)
Solution: If we like to work, we can look up $F^{.05}_{(2,27)} = 3.35$. It is larger than the computed F of 0.22. Or we can simply note that since the p-value of 0.804 is well above any significance level that we are likely to use, we cannot reject the null hypothesis that the line means are equal, so the difference between lines is not significant.
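Problems 13 and 14 reduce to comparing each computed F with its 5% table value. The table values in this sketch (2.73 and 3.35) are typed in from a printed F table, and `f_test_decision` is a hypothetical helper name.

```python
def f_test_decision(f_calc: float, f_crit: float) -> str:
    """Reject H0 only when the computed F exceeds the table value."""
    return "reject H0" if f_calc > f_crit else "do not reject H0"


# name: (computed F, numerator df, denominator df, 5% table value)
tests = {
    "interaction": (1.136, 4, 27, 2.73),
    "line": (0.22, 2, 27, 3.35),
}
decisions = {}
for name, (f_calc, df1, df2, f_crit) in tests.items():
    decisions[name] = f_test_decision(f_calc, f_crit)
    print(f"{name}: F = {f_calc} vs F.05({df1},{df2}) = {f_crit} -> {decisions[name]}")
```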