Final—Form A Economics 173 Name______

Spring 2003 Instructor: Petry SSN______

Before beginning the exam, please verify that you have 18 pages with 50 questions in your exam booklet. You should also have a decision-tree and formula sheet provided by your TA. Please include your full name, social security number and Net-ID on your bubble sheets. Good luck!

Use the following information to answer the next eight questions (#1-8).

You are interested in understanding the home run hitting ability of young major league baseball players. You decide to run a regression with the dependent variable: HRs--Number of Homeruns Hit by the Player in the Most Recently Completed Major League Season. You identify three independent variables:

Minor HR--Number of Homeruns the Player Hit in Last Season as a Minor Leaguer.

Age--the player’s age.

Years Pro--Number of Years the Player has been a Professional Ball Player.

SUMMARY OUTPUT
Regression Statistics
Multiple R / 0.59256
R Square / 0.351128
Adjusted R Square / 0.335172
Standard Error / 6.992105
Observations / 126
ANOVA
df / SS / MS / F / Significance F
Regression / 3 / 3227.612245 / 1075.871 / ______ / 1.85592E-11
Residual / ______ / 5964.522676 / 48.88953
Total / 125 / 9192.134921
Coefficients / Standard Error / t Stat / P-value / Lower 95% / Upper 95%
Intercept / -1.96998 / 9.547049398 / -0.20634 / 0.836866 / -20.86933228 / 16.92938
Minor HR / 0.665838 / 0.087149184 / 7.640212 / 5.46E-12 / 0.493317598 / 0.838359
Age / 0.135728 / 0.524087215 / 0.258979 / 0.796088 / -0.901756157 / 1.173212
Years Pro / 1.176371 / 0.670625334 / 1.75414 / 0.081917 / -0.151200086 / 2.503942

1. The test statistic for testing the model’s overall significance is:

a.  0.0454

b.  22.006

c.  7.640

d.  0.524

e.  0.671

2. The degrees of freedom for the test statistic named above is:

a.  3

b.  122

c.  3 and 122

d.  122 and 125

e.  122 and 126
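
As a study aid, the overall significance test in questions 1 and 2 can be sketched in a few lines of Python. This is only an illustration using the sums of squares printed in the ANOVA table above; the variable names are made up for the example, and the residual degrees of freedom are assumed to follow the usual n - k - 1 rule.

```python
# Values read from the ANOVA table in the regression output above.
ss_regression = 3227.612245
ss_residual = 5964.522676
n_observations = 126
k_predictors = 3  # Minor HR, Age, Years Pro

# Degrees of freedom: k for the regression, n - k - 1 for the residual.
df_regression = k_predictors
df_residual = n_observations - k_predictors - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# The overall F statistic is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual
print(f"F = {f_statistic:.3f} on ({df_regression}, {df_residual}) degrees of freedom")
```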

3. The conclusion from the test performed above is:

a.  none of the three independent variables are significant

b.  all three independent variables are significant

c.  the dependent variable is significant

d.  at least one of the independent variables is significant

e.  all of the above

4. Based on the t-tests for individual significance, and at a 5% level of significance, the variable(s) that DO have a significant impact on homerun hitting ability is (are):

a.  Minor HR

b.  Age

c.  Years Pro

d.  All the above

e.  Both a and c

5. Ignoring the results from any significance tests conducted on the model, the estimated number of HRs hit by a player who hit 22 HRs in his last season in the minor leagues, is 20 years old and has 3 years of professional ball playing experience is:

a.  19

b.  15

c.  12

d.  10

e.  5
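
The point prediction asked for in question 5 is just the fitted equation evaluated at the given values. A minimal sketch, assuming the coefficients are taken directly from the output above (the dictionary and function names are hypothetical):

```python
# Coefficients copied from the regression output for the home run model.
coefficients = {
    "Intercept": -1.96998,
    "Minor HR": 0.665838,
    "Age": 0.135728,
    "Years Pro": 1.176371,
}

def predict_home_runs(minor_hr, age, years_pro):
    """Evaluate the fitted regression equation at the given inputs."""
    return (
        coefficients["Intercept"]
        + coefficients["Minor HR"] * minor_hr
        + coefficients["Age"] * age
        + coefficients["Years Pro"] * years_pro
    )

estimate = predict_home_runs(minor_hr=22, age=20, years_pro=3)
print(f"Estimated HRs: {estimate:.2f}")
```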

6. Suppose that in this study, you generated a correlation matrix for the 3 independent variables, as given below. Based SOLELY on this correlation matrix, which of these problems would you suspect?

Minor HR / Age / Years Pro
Minor HR / 1
Age / 0.035416 / 1
Years Pro / -0.03916 / 0.837398 / 1

a.  non-normality

b.  autocorrelation

c.  heteroskedasticity

d.  multicollinearity

e.  none of the above

7. In order to fix the problem identified in the previous question, you would:

a.  drop either Minor HR or Age from the model

b.  drop either Minor HR or Years Pro from the model

c.  drop either Years Pro or Age from the model

d.  include the log of age in the model

e.  do nothing since there is no problem

8. Assume that it was appropriate to conduct a Durbin-Watson test on the regression output, and that the DW test statistic was 2.36. The DW critical values are: dL=1.61 and dU=1.74. What should you conclude from this test?

a.  heteroskedasticity is present

b.  homoskedasticity is present

c.  multicollinearity is present

d.  autocorrelation is present

e.  the test proves inconclusive
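
The Durbin-Watson decision rule in question 8 compares the statistic against five regions defined by dL and dU. The sketch below assumes the standard two-sided layout of those regions (below dL, between dL and dU, between dU and 4 - dU, between 4 - dU and 4 - dL, above 4 - dL); the function name is hypothetical.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the usual five regions."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no evidence of autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"

# Values given in the question: DW = 2.36, dL = 1.61, dU = 1.74.
decision = durbin_watson_decision(2.36, d_lower=1.61, d_upper=1.74)
print(decision)
```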


Use the following information to answer the next two questions (#9-10).

In a study of income differentials, data was collected for 100 subjects on their Incomes (in thousands of dollars), Years of Education, Age, and Number of Children They Had. Then, the natural log of income was taken as the y-variable, and regressed on the three independent variables named above. The output is given below:

SUMMARY OUTPUT
Regression Statistics
Multiple R / 0.759978
R Square / 0.577566
Adjusted R Square / 0.564365
Standard Error / 0.264954
Observations / 100
ANOVA
df / SS / MS / F
Regression / 3 / 9.214115 / 3.071372 / 43.75147
Residual / 96 / 6.739241 / 0.0702
Total / 99 / 15.95336
Coefficients / Standard Error / t Stat / P-value
Intercept / 2.189232 / 0.156791 / 13.9627 / 7.58E-25
Education / 0.092041 / 0.008131 / 11.31922 / 2.26E-19
Age / 0.001391 / 0.002276 / 0.611137 / 0.542553
Children / -0.01082 / 0.020423 / -0.5299 / 0.597402

9. According to this output, for every additional child, income

a.  decreases by 0.01082 dollars

b.  decreases by 10.82 dollars

c.  decreases by 0.989 dollars

d.  decreases by 989 dollars

e.  increases by 989 dollars

10. Disregarding any tests on the significance of individual independent variables, the estimated income, in thousands of dollars, for someone with 17 years of education, 42 years of age, with 3 children would be:

a.  3.78

b.  3779.9

c.  43.81

d.  24.6

e.  51.12
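
Because the dependent variable is the natural log of income, a prediction from this model has to be exponentiated to get back to thousands of dollars. A rough sketch of that step for question 10, assuming a simple back-transformation with no retransformation correction (the names are illustrative):

```python
import math

# Coefficients copied from the log-income regression output above.
intercept = 2.189232
b_education = 0.092041
b_age = 0.001391
b_children = -0.01082

def predict_income(education, age, children):
    """Return predicted income in thousands of dollars."""
    log_income = (
        intercept
        + b_education * education
        + b_age * age
        + b_children * children
    )
    # The model predicts ln(income), so exponentiate to undo the log.
    return math.exp(log_income)

income = predict_income(education=17, age=42, children=3)
print(f"Estimated income: {income:.2f} thousand dollars")
```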


11. In a multiple regression model the subjects’ ethnicities are to be represented as independent variables. All subjects fall into five ethnic groups: Caucasian, African-American, Asian, Native-American and Hispanic. How many dummy variables must be constructed to adequately represent all five groups?

a.  1

b.  2

c.  3

d.  4

e.  5

Use the following information to answer the next two questions (#12-13).

Following is the output from a regression of Used Car Prices on Car Color and Odometer Reading. Car Color is a qualitative variable, with levels White, Silver and Other Colors, and is therefore represented in the model via dummy variables.

SUMMARY OUTPUT
Regression Statistics
Multiple R / 0.8354822
R Square / 0.6980304
Adjusted R Square / 0.6885939
Standard Error / 142.27105
Observations / 100
ANOVA
df / SS / MS / F
Regression / 3 / 4491749.241 / 1497250 / 73.97095
Residual / 96 / 1943140.949 / 20241.05
Total / 99 / 6434890.19
Coefficients / Standard Error / t Stat / P-value
Intercept / 6350.3231 / 92.16652879 / 68.90053 / 1.5E-83
White / 45.240979 / 34.08443045 / 1.327321 / 0.187551
Silver / -147.73801 / 38.18498973 / -3.869007 / 0.000199
Odometer / -0.0277698 / 0.002368579 / -11.7242 / 3.14E-20

12. On average, holding odometer reading constant, a White colored car would sell for how much more (or less) than a Silver colored car?

a.  192.98

b.  -192.98

c.  102.50

d.  -102.50

e.  45.24


13. According to this model, the estimated average selling price of a car that is of a Color other than White or Silver (neither White nor Silver), and has an Odometer reading of 27,125 miles would be:

a.  6350.32

b.  5642.31

c.  5597.07

d.  5449.33

e.  4200.91
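
With dummy variables, each color coefficient measures the gap from the omitted category (Other Colors), so comparing two included colors means differencing their coefficients. The following sketch illustrates that reading for questions 12 and 13; the function and argument names are assumptions made for the example.

```python
# Coefficients copied from the used car regression output above.
intercept = 6350.3231
b_white = 45.240979
b_silver = -147.73801
b_odometer = -0.0277698

def predict_price(odometer, white=0, silver=0):
    """Predicted price; white=0 and silver=0 means the base color group."""
    return intercept + b_white * white + b_silver * silver + b_odometer * odometer

# White versus Silver at the same odometer reading: the odometer term cancels.
white_minus_silver = b_white - b_silver

# A car in the base category (Other Colors) with 27,125 miles.
other_color_price = predict_price(odometer=27125)

print(f"White minus Silver: {white_minus_silver:.2f}")
print(f"Other color at 27,125 miles: {other_color_price:.2f}")
```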

Use the following information to answer the next five questions (#14-18).

These questions are based on Project 2. You are expected to be able to recall that entire scenario. Provided below are the ANOVA tables from the full and reduced model regressions, respectively:

From full model:

ANOVA
df / SS / MS / F / Significance F
Regression / 15 / 2386035 / 159069 / 24.08315 / 3.63E-42
Residual / 284 / 1875818 / 6604.991
Total / 299 / 4261852

From reduced model:

ANOVA
df / SS / MS / F / Significance F
Regression / 8 / 2318693.212 / 289836.7 / 43.40481 / 2.06E-45
Residual / 291 / 1943159.21 / 6677.523
Total / 299 / 4261852.422

14. Going from the full model to the reduced model, how many variables get dropped?

a.  6

b.  7

c.  8

d.  9

e.  10

15. In order to test if the variables dropped are significant as a group, which statistical test should be conducted?

a.  t-test, two sample, assuming equal variances

b.  t-test, two sample, assuming unequal variances

c.  chi square test for variance

d.  F-test for overall significance

e.  Partial F-test


16. The calculated value of the test statistic for the test referred to above is:

a.  1.227

b.  1.117

c.  1.025

d.  1.457

e.  cannot be calculated due to insufficient information

17. The degrees of freedom for the test statistic calculated above is (are):

a.  6 and 284

b.  7 and 284

c.  6

d.  7

e.  284

18. Given that the relevant critical value for this test is 2.04, your conclusion should be:

a.  fail to reject the null hypothesis, therefore choose the reduced model

b.  fail to reject the null hypothesis, therefore choose the full model

c.  reject the null hypothesis, therefore choose the reduced model

d.  reject the null hypothesis, therefore choose the full model

e.  the test proves inconclusive
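
The partial F-test behind questions 14 through 18 compares the residual sums of squares of the two models. A minimal sketch, assuming the usual form F = ((SSE_reduced - SSE_full) / q) / MSE_full, where q is the number of dropped variables; the variable names are illustrative and the critical value is the one quoted in question 18.

```python
# Residual rows copied from the two ANOVA tables above.
sse_full, df_resid_full = 1875818, 284
sse_reduced, df_resid_reduced = 1943159.21, 291

# Each dropped variable returns one degree of freedom to the residual.
variables_dropped = df_resid_reduced - df_resid_full

# Extra unexplained variation per dropped variable, relative to full-model MSE.
mse_full = sse_full / df_resid_full
partial_f = ((sse_reduced - sse_full) / variables_dropped) / mse_full

critical_value = 2.04  # given in question 18
reject_null = partial_f > critical_value

print(f"Dropped: {variables_dropped}, partial F = {partial_f:.3f}, "
      f"df = ({variables_dropped}, {df_resid_full}), reject H0: {reject_null}")
```

Failing to reject the null here means the dropped variables do not add significant explanatory power as a group.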

19. The range of values that R² can possibly take is from -1 to 1.

a.  True

b.  False

20. While R² can go down if irrelevant variables are included in the model, adjusted R² always goes up upon the inclusion of new variables.

a.  True

b.  False


Use the following information to answer the next two questions (#21-22).

MON / 35
TUE / 42
WED / 56
THU / 46
FRI / 67
SAT / 51
SUN / 39

21. After doing a centered 2-period moving average on this column, the moving averages for Friday, Saturday and Sunday are (in that order):

a.  50, 53.75, 57.75

b.  53.75, 57.75, 52

c.  57.75, 52, not available

d.  52, not available, not available

e.  67, 51, 39

22. The exponentially smoothed value (use a smoothing constant of 0.4) for Sunday is:

a.  not available

b.  54.07

c.  52.84

d.  47.30

e.  39
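
Both smoothing methods in questions 21 and 22 can be sketched on the daily series above. The code assumes two conventions that the exam does not spell out: the centered 2-period average is the mean of two adjacent 2-period averages, and exponential smoothing is initialised at the first observation with S_t = w * y_t + (1 - w) * S_(t-1). Function names are hypothetical.

```python
days = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
values = [35, 42, 56, 46, 67, 51, 39]

def centered_two_period_ma(series):
    """Average adjacent 2-period means so each result lines up with a period."""
    pair_means = [(a + b) / 2 for a, b in zip(series, series[1:])]
    centered = [None] * len(series)  # first and last periods have no value
    for i in range(1, len(series) - 1):
        centered[i] = (pair_means[i - 1] + pair_means[i]) / 2
    return centered

def exponential_smoothing(series, w):
    """Start at the first observation, then blend each new value with weight w."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

centered = centered_two_period_ma(values)
smoothed = exponential_smoothing(values, w=0.4)

for day, ma, s in zip(days, centered, smoothed):
    ma_text = "not available" if ma is None else f"{ma:.2f}"
    print(f"{day}: centered MA = {ma_text}, smoothed = {s:.2f}")
```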

Use the following information to answer the next two questions (#23-24).

The following table gives you the actual observations from a time series (y) and the corresponding residuals obtained after fitting a trend to the series.

Observation / Y / Residuals
1 / 6.9 / -1.89763
2 / 7.6 / -1.59915
3 / 8.5 / -1.10068
4 / 11.3 / 1.297798
5 / 12.7 / 2.296273
6 / 10.9 / 0.094749
7 / 11.9 / 0.693224
8 / 11.6 / -0.0083
9 / 10.2 / -1.80982
10 / 11.5 / -0.91135

23. The percent of trend value for period 7 is:

a.  1.06

b.  0.94

c.  17.17

d.  8.25

e.  cannot be calculated from the information provided

24. After obtaining the percent trends for all periods, you plotted them against time. This procedure is designed to reveal the presence of which time-series component?

a.  trend

b.  seasonal

c.  cyclical

d.  random

e.  irregular
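
For question 23, the trend value is not listed directly but can be recovered from the residual. The sketch below assumes the residual is defined as actual minus fitted trend, and expresses percent of trend as the ratio of actual to trend (the form the answer options use); names are illustrative.

```python
# (observation, y, residual) rows copied from the table above.
rows = [
    (1, 6.9, -1.89763), (2, 7.6, -1.59915), (3, 8.5, -1.10068),
    (4, 11.3, 1.297798), (5, 12.7, 2.296273), (6, 10.9, 0.094749),
    (7, 11.9, 0.693224), (8, 11.6, -0.0083), (9, 10.2, -1.80982),
    (10, 11.5, -0.91135),
]

def percent_of_trend(y, residual):
    """Ratio of the actual value to the fitted trend value."""
    trend = y - residual  # assumes residual = actual - fitted
    return y / trend

ratios = {obs: percent_of_trend(y, res) for obs, y, res in rows}
print(f"Period 7: {ratios[7]:.2f}")
```

Plotting these ratios against time removes the trend, so any remaining wave-like pattern above and below 1 is easier to see.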

Use the following information to answer the next five questions (#25-29).

Following is some output you might expect to be generated when dealing with a seasonal time series. In this case we have quarterly data, for which the trend regression output is given below:

Coefficients / Standard Error / t Stat / P-value
Intercept / 23.67601329 / 2.256404591 / 10.4928 / 3.51E-13
Time / 0.310413772 / 0.089331913 / 3.474836 / 0.001221

Based on this model, percent of trend values were calculated, and an attempt was made to construct seasonal indices. The results from this initial attempt are given below:

Q1 0.78

Q2 1.01

Q3 0.88

Q4 1.36

25. Making any necessary adjustments to these initial results, the final seasonal index for the third quarter would be:

a.  0.774

b.  1.002

c.  1.350

d.  0.873

e.  0.88

26. Given that the actual value of the series (y) for period 19, which happens to be a third quarter, is 23.43, what is the seasonal plus trend based forecast (that is the forecast that takes into account BOTH the trend and the seasonal components) for that period?

a.  29.57

b.  25.82

c.  20.46

d.  26.03

e.  26.82

27. Relying on all the information above, what should the seasonally adjusted (deseasonalized) value of the series be for period 19?

a.  29.57

b.  25.83

c.  20.46

d.  26.03

e.  26.84
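
Questions 25 through 27 chain three steps: rescale the raw indices so they average to 1, multiply the trend value by the index to forecast, and divide the actual value by the index to deseasonalize. A sketch under those assumptions follows; the index is rounded to three decimals because the answer options appear to be based on the rounded figure, and all names are hypothetical.

```python
# Initial seasonal indices and trend coefficients copied from the output above.
raw_indices = {"Q1": 0.78, "Q2": 1.01, "Q3": 0.88, "Q4": 1.36}
trend_intercept = 23.67601329
trend_slope = 0.310413772

# Four quarterly indices should sum to 4, so rescale by 4 / (current sum).
scale = len(raw_indices) / sum(raw_indices.values())
final_indices = {q: round(v * scale, 3) for q, v in raw_indices.items()}

period = 19          # stated in question 26 to be a third quarter
actual_value = 23.43

trend_value = trend_intercept + trend_slope * period
forecast = trend_value * final_indices["Q3"]        # trend times seasonal index
deseasonalized = actual_value / final_indices["Q3"]  # actual divided by index

print(f"Final Q3 index: {final_indices['Q3']:.3f}")
print(f"Trend and seasonal forecast: {forecast:.2f}")
print(f"Seasonally adjusted value: {deseasonalized:.2f}")
```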

The SAME time series discussed above is now analyzed by representing the quarters by indicator variables. The output is given below:

ANOVA
df / SS / MS / F
Regression / 4 / 2438.89821 / 609.7246 / 63.33698
Residual / 38 / 365.8136645 / 9.626675
Total / 42 / 2804.711874
Coefficients / Standard Error / t Stat / P-value
Intercept / 34.6197333 / 1.291752195 / 26.8006 / 2.76E-26
Time / 0.30514848 / 0.038191454 / 7.989968 / 1.17E-09
Q1 / -17.5114879 / 1.356199997 / -12.9122 / 1.8E-15
Q2 / -10.4739091 / 1.355662143 / -7.72605 / 2.62E-09
Q3 / -14.3417848 / 1.356199997 / -10.575 / 7.06E-13

28. Given that this is sales data, name the quarter that clearly outperforms ALL other quarters:

a.  Quarter 1

b.  Quarter 2

c.  Quarter 3

d.  Quarter 4

e.  Cannot be determined from the given information

29. Forecast the sales value for period 20, according to this indicator variable model:

a.  34.62

b.  40.72

c.  26.08

d.  42.10

e.  Cannot be determined from the given information
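
In the indicator variable model, Quarter 4 is the omitted category, so its forecast uses only the intercept and the time term. The sketch below assumes period 20 is a fourth quarter, which follows from period 19 being described as a third quarter; the function name is made up for the example.

```python
# Coefficients copied from the indicator variable regression output above.
intercept = 34.6197333
b_time = 0.30514848
quarter_effects = {"Q1": -17.5114879, "Q2": -10.4739091, "Q3": -14.3417848, "Q4": 0.0}

def forecast_sales(period, quarter):
    """Intercept plus time trend plus the quarter's shift from Quarter 4."""
    return intercept + b_time * period + quarter_effects[quarter]

# Period 19 is a third quarter, so period 20 is taken to be a fourth quarter.
period_20_forecast = forecast_sales(period=20, quarter="Q4")
print(f"Forecast for period 20: {period_20_forecast:.2f}")
```

All three quarter coefficients are negative, which is why the base quarter sits above the others at any given point in time.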

30. Which of the following statements is FALSE?

a.  SSE should be used for model selection when it is important to avoid any large errors.

b.  Autoregressive models are based on regressing a time series on its past values.

c.  In Autoregressive models some observations are lost, more so if the order of the model is high.

d.  MAD is a criterion for model selection when several forecasting techniques are available.

e.  When using MAD for model selection, the model with the largest MAD statistic should be chosen.


Use the following information to answer the next four questions (#31-34).

Recently a member of the United Nations was sent to Baghdad. He was assigned to compare the prices that looters are charging for hospital equipment on the black market (population 1) as compared to the prices that are charged by the regular manufacturers (population 2). The United Nations representative claims that the black market prices are lower than the regular market prices. Assume that because of the volatile situation in Baghdad, the looters have a drastically higher variation in prices as compared to manufacturers.