I. Answer the following multiple choice questions. Mark your answers in your blue book.
1.(3) The scatterplot below plots, for each of the 50 states, the infant mortality rate X (deaths per 1,000) in 1990 against the percent Y of 18-year-olds in the state who graduated from high school in 1990.
The correlation between X and Y is –0.54. If, instead of plotting these variables for each of the 50 states, we plotted the values of these variables for each county in the United States, we would expect the value of the correlation to be
(a) About the same
(b) Somewhat less than –0.54 (i.e., closer to –1)
(c) Somewhat higher than –0.54 (i.e., closer to 0)
(d) Much higher and probably near 1 because there are many more counties than states.
2.(3) Referring to the previous question, the least-squares regression line was fit to the data in the scatterplot and the residuals computed. A plot of the residuals versus the 1990 population in the state is given below.
This plot suggests
(a) high infant mortality rates imply low nutrition and hence higher drop-out rates later in life, but only for states with large populations
(b) high infant mortality rates imply low nutrition and hence higher drop-out rates later in life, but only for states with small populations
(c) population may be a lurking variable in understanding the association between infant mortality rate and percent graduating from high school.
(d) None of the above.
3.(3) The owner of a chain of supermarkets notices that there is a positive correlation between the sales of beer and the sales of ice cream over the course of the previous year. In seasons when sales of beer were above average, sales of ice cream also tended to be above average. Likewise, in seasons when sales of beer were below average, sales of ice cream also tended to be below average. Which of the following would be a valid conclusion from these facts?
(a) Sales records must be in error. There should be no association between beer and ice cream sales.
(b) Evidently, for a significant proportion of customers of these supermarkets, drinking beer causes a desire for ice cream or eating ice cream causes a thirst for beer.
(c) A scatterplot of monthly ice cream sales versus monthly beer sales would show that a straight line describes the pattern in the plot, but it would have to be a horizontal line.
(d) None of the above.
4.(3) Consider the following scatterplot.
From this plot, we can conclude
(a) there is evidence of a modest cause-and-effect relation between X and Y, with increases in X causing increases in Y
(b) there is an outlier in the plot
(c) there is a strongly influential point in the plot
(d) all of the above
5.(3) An article in the student newspaper of a large university had the headline “A’s swapped for evaluations?” The article included the following.
“According to a new study, teachers may be more inclined to give higher grades to students, hoping to gain favor with the university administrators who grant tenure. The study examined the average grade and teaching evaluation in a large number of courses given in 1997 in order to investigate the effect of grade inflation on evaluations. ‘I am concerned with the student evaluations because instruction has become a popularity contest for some teachers,’ said Professor Smith, who recently completed the study. Results showed higher grades directly corresponded to more positive evaluation.”
Which of the following would be a valid conclusion to draw from the study?
(a) A teacher can improve their teaching evaluations by giving good grades.
(b) A good teacher, as measured by teaching evaluations, helps students learn better, resulting in higher grades.
(c) Teachers of courses in which the mean grade is above average apparently tend to have above-average teaching evaluations.
(d) All of the above.
6.(3) Which of the following would provide evidence that a power law model describes the relationship between a response y and an explanatory variable x?
(a) A scatterplot of versus looks approximately linear.
(b) A scatterplot of versus looks approximately linear.
(c) A scatterplot of versus looks approximately linear.
(d) A scatterplot of versus looks approximately linear.
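For reference, taking logarithms of a power law y = a·x^b gives log y = log a + b·log x, a straight line in (log x, log y) with slope b. The sketch below uses hypothetical constants a = 3 and b = 1.5 and compares slopes between consecutive points on the log-log scale and on the log-y-only scale.

```python
import math

a, b = 3.0, 1.5  # hypothetical power-law constants
xs = [1, 2, 4, 8, 16, 32]
ys = [a * x**b for x in xs]
log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

# Slopes between consecutive points with log y on the vertical axis.
loglog_slopes = [(log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i]) for i in range(len(xs) - 1)]
semilog_slopes = [(log_y[i + 1] - log_y[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

print(loglog_slopes)   # all equal to b: a straight line
print(semilog_slopes)  # shrinking: log y against x is curved for a power law
```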
7.(3) A variable grows exponentially over time if
(a) The variable increases by a fixed amount whenever time increases by a fixed amount.
(b) The variable increases by squaring its value whenever time increases by a fixed amount.
(c) The variable is multiplied by a fixed factor whenever time increases by a fixed amount.
(d) The variable increases by the logarithm of its value whenever time increases by a fixed amount.
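The distinction between additive and multiplicative growth can be checked numerically. This sketch uses a hypothetical starting value of 100 and a hypothetical growth factor of 1.05 per time step: the ratios of successive values stay fixed while the differences keep growing.

```python
start, factor = 100.0, 1.05  # hypothetical starting value and growth factor per time step
values = [start * factor**t for t in range(6)]

ratios = [later / earlier for earlier, later in zip(values, values[1:])]
differences = [later - earlier for earlier, later in zip(values, values[1:])]

print([round(r, 6) for r in ratios])       # constant ratio between steps
print([round(d, 2) for d in differences])  # the amount added grows each step
```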
II. Below is a scatterplot of years of schooling completed (x) and annual income in thousands of dollars (y) for a sample of 18 men, each 40 years old. Suppose we fit the simple linear regression model
$y_i = \beta_0 + \beta_1 x_i + e_i$,
where the deviations $e_i$ are assumed to be independent and normally distributed with mean 0 and standard deviation $\sigma$. The least squares line is found to be .
[Figure: Bivariate Fit of Income By Years]
(a)(4) Which of the following statements is best supported by the scatterplot (choose only one)?
(i) There is no striking evidence in the plot that the assumptions for simple linear regression are violated.
(ii) There appears to be an outlier and/or influential observations in the plot, suggesting that the least squares line must be interpreted with caution.
(iii) The plot contains dramatic evidence that the standard deviation of the response about the true regression line is not even approximately the same everywhere.
(iv) The plot suggests that the relationship between years of schooling and income is highly nonlinear and that a straight-line regression function is inappropriate.
(b)(4) The analysis of variance table is given below. What null hypothesis is tested by the ANOVA F statistic? What does this hypothesis say in practical terms?
Analysis of Variance
Source / DF / Sum of Squares / Mean Square / F Ratio / Prob > F
Model / 1 / 101626374 / 101626374 / 1.5191 / 0.2356
Error / 16 / 1070373626 / 66898352
C. Total / 17 / 1172000000
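The entries of the ANOVA table are linked by simple arithmetic. This sketch recomputes the mean squares, the F ratio, and R² from the sums of squares and degrees of freedom shown (in simple linear regression the F ratio also equals the square of the slope's t statistic). The p-value is not recomputed, since that needs an F distribution routine outside the standard library.

```python
ss_model, df_model = 101_626_374, 1
ss_error, df_error = 1_070_373_626, 16

ms_model = ss_model / df_model                 # mean square = SS / DF
ms_error = ss_error / df_error                 # estimate of sigma^2
f_ratio = ms_model / ms_error                  # F = MS(Model) / MS(Error)
r_squared = ss_model / (ss_model + ss_error)   # share of total SS explained by the line
root_mse = ms_error ** 0.5                     # estimate of sigma

print(f"F = {f_ratio:.4f}, R^2 = {r_squared:.4f}, root MSE = {root_mse:.0f}")
```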
(c)(4) A 95% confidence interval for the mean earnings of an individual who obtains 12 years of schooling is ($22,750, $31,750). What can you say about the lower endpoint of a 95% prediction interval for predicting the earnings of an individual who obtains 12 years of schooling?
- it will be lower than $22,750
- it will be higher than $22,750
- it will equal $22,750
- it cannot be determined from the information given
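A prediction interval adds the scatter of individuals around the line (σ²) to the uncertainty in the estimated mean. The sketch below illustrates this, assuming the ANOVA table and the confidence interval are on the same dollar scale, and taking t(0.975, 16 df) ≈ 2.120 from a t table; the standard error of the estimated mean is backed out of the given interval.

```python
import math

T_975_DF16 = 2.120                         # t critical value, 95%, 16 df (from a t table)
ci_low, ci_high = 22_750, 31_750           # given 95% CI for the mean at 12 years
center = (ci_low + ci_high) / 2            # fitted mean income at 12 years
se_fit = (ci_high - ci_low) / 2 / T_975_DF16   # SE of the estimated mean, backed out of the CI
root_mse = math.sqrt(66_898_352)           # estimate of sigma: Mean Square Error from the ANOVA table

# Prediction SE combines individual scatter and uncertainty in the mean.
se_pred = math.sqrt(root_mse**2 + se_fit**2)
pi_low = center - T_975_DF16 * se_pred
pi_high = center + T_975_DF16 * se_pred
print(f"PI approx ({pi_low:,.0f}, {pi_high:,.0f})")
```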
III. The U.S. government is interested in predicting the amount of corn that will be produced next year for planning purposes. The government has available a long-range weather forecast of the expected rainfall for next year. The following table lists the corn yield (in bushels per acre) and average rainfall (in inches per year) in the U.S. from 1890 to 1927. A statistician for the government estimates the relationship between corn yield and average rainfall using the regression model below.
Year / Corn / Rain / Year / Corn / Rain / Year / Corn / Rain
1890 / 24.5 / 9.6 / 1903 / 30.2 / 14.1 / 1916 / 29.7 / 9.3
1891 / 33.7 / 12.9 / 1904 / 32.4 / 10.6 / 1917 / 35 / 9.4
1892 / 27.9 / 9.9 / 1905 / 36.4 / 10 / 1918 / 29.9 / 8.7
1893 / 27.5 / 8.7 / 1906 / 36.9 / 11.5 / 1919 / 35.2 / 9.5
1894 / 21.7 / 6.8 / 1907 / 31.5 / 13.6 / 1920 / 38.3 / 11.6
1895 / 31.9 / 12.5 / 1908 / 30.5 / 12.1 / 1921 / 35.2 / 12.1
1896 / 36.8 / 13 / 1909 / 32.3 / 12 / 1922 / 35.5 / 8
1897 / 29.9 / 10.1 / 1910 / 34.9 / 9.3 / 1923 / 36.7 / 10.7
1898 / 30.2 / 10.1 / 1911 / 30.1 / 7.7 / 1924 / 26.8 / 13.9
1899 / 32 / 10.1 / 1912 / 36.9 / 11 / 1925 / 38 / 11.3
1900 / 34 / 10.8 / 1913 / 26.8 / 6.9 / 1926 / 31.7 / 11.6
1901 / 19.4 / 7.8 / 1914 / 30.5 / 9.5 / 1927 / 32.6 / 10.4
1902 / 36 / 16.2 / 1915 / 33.3 / 16.5
Bivariate Fit of Corn Yield By Rainfall
Linear Fit
Corn Yield = 23.552102 + 0.7755493 Rainfall
Summary of Fit
RSquare / 0.16211
RSquare Adj / 0.138835
Root Mean Square Error / 4.049471
Mean of Response / 31.91579
Observations (or Sum Wgts) / 38
Analysis of Variance
Source / DF / Sum of Squares / Mean Square / F Ratio / Prob > F
Model / 1 / 114.21474 / 114.215 / 6.9651 / 0.0122
Error / 36 / 590.33578 / 16.398
C. Total / 37 / 704.55053
Parameter Estimates
Term / Estimate / Std Error / t Ratio / Prob>|t|
Intercept / 23.552102 / 3.236462 / 7.28 / <.0001
Rainfall / 0.7755493 / 0.293864 / 2.64 / 0.0122
[Figure: Distributions, Residuals Corn Yield]
(a)(6) Do the regression diagnostics indicate any problems with this regression model? Comment on both the residual plot and the plots of the distribution of the residuals.
(b)(5) The statistician believes that advances in technology over time may be a lurking variable. Suggest a plot that the statistician could make from the data in the table above to diagnose whether advances in technology are a lurking variable, and state what the statistician should look for in this plot.
For the remaining questions, ignore any problems with the regression assumptions that the diagnostics may indicate. That is, assume that the simple linear regression model holds even if you think the model could be improved.
(c)(6) Is there strong evidence that the mean corn yield changes as the average rainfall increases? State this question in terms of a hypothesis test and report the results of the test at the 0.01 significance level.
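The test of whether mean yield changes with rainfall is the t test of H0: slope = 0 against a two-sided alternative. This sketch takes the estimate and standard error from the Parameter Estimates table and uses t(0.995, 36 df) ≈ 2.719 from a t table as the 0.01-level critical value; comparing the reported p-value (0.0122) with 0.01 gives the same decision.

```python
# Values from the Parameter Estimates table above.
slope, se_slope = 0.7755493, 0.293864
T_995_DF36 = 2.719  # two-sided 0.01 critical value for t with 36 df (from a t table)
ALPHA, P_VALUE = 0.01, 0.0122

t_stat = slope / se_slope
reject_by_t = abs(t_stat) > T_995_DF36
reject_by_p = P_VALUE < ALPHA
print(f"t = {t_stat:.2f}; reject H0 at 0.01? {reject_by_t}")
```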
(d)(5) Find a 95% confidence interval for the slope of the regression line.
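A 95% confidence interval for the slope has the form estimate ± t* × SE. The sketch below uses the reported estimate and standard error and takes t(0.975, 36 df) ≈ 2.028 from a t table.

```python
slope, se_slope = 0.7755493, 0.293864  # from the Parameter Estimates table
T_975_DF36 = 2.028                     # 95% two-sided critical value, t with 36 df (from a t table)

margin = T_975_DF36 * se_slope
ci = (slope - margin, slope + margin)
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because this interval excludes 0 while the 0.01-level test does not reject, the two results are consistent with a p-value between 0.01 and 0.05.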
(e)(5) The government forecasts that the average rainfall will be 10 inches next year. The government wants to be reasonably certain that the corn yield will be at least 30 bushels per acre, because otherwise there will be a shortage, in which case the government wants to offer farmers incentives to increase corn production. Which is more relevant to the government’s decision about whether to offer farmers incentives to increase corn production: (i) a confidence interval for the mean corn yield when the average rainfall is 10 inches, or (ii) a prediction interval for the corn yield in a year with an average rainfall of 10 inches?
IV. (a)(6) World population grew approximately exponentially between 1950 and 2002.
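Both intervals at a rainfall of 10 inches can be computed from the reported output. This is a sketch assuming the standard simple linear regression formulas and t(0.975, 36 df) ≈ 2.028 from a t table. The mean rainfall and Sxx are not printed, so they are backed out: the line passes through the point of means, and SE(slope) = s/√Sxx.

```python
import math

# Quantities reported in the JMP output above.
b0, b1 = 23.552102, 0.7755493
root_mse, se_b1 = 4.049471, 0.293864
mean_y, n = 31.91579, 38
T_975_DF36 = 2.028  # from a t table

# Rainfall summaries implied by the output.
x_bar = (mean_y - b0) / b1          # y_bar = b0 + b1 * x_bar
sxx = (root_mse / se_b1) ** 2       # SE(b1) = s / sqrt(Sxx)

x0 = 10.0
y_hat = b0 + b1 * x0
se_mean = root_mse * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)      # for the mean yield
se_pred = root_mse * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # for a single year's yield
ci = (y_hat - T_975_DF36 * se_mean, y_hat + T_975_DF36 * se_mean)
pi = (y_hat - T_975_DF36 * se_pred, y_hat + T_975_DF36 * se_pred)
print(f"fit {y_hat:.2f}; CI ({ci[0]:.2f}, {ci[1]:.2f}); PI ({pi[0]:.2f}, {pi[1]:.2f})")
```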
[Figure: Bivariate Fit of log population By Year]
The least squares equation for the regression of log population on time (year) for the data from 1950 to 2002 is the following:
Based on the least squares equation, what would you predict the world population to be in 2010?
(b)(6) A demographer claims that world population doubles every 20 years. Did world population grow more slowly or faster between 1950 and 2002 than the demographer claims? Justify your answer.
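Since the fitted coefficients are not shown here, the sketch below takes them as parameters. It assumes the model ln(population) = a + b·year with a natural logarithm; if the output uses base-10 logs, replace `math.exp` with `10 **` and `math.log(2)` with `math.log10(2)`. Under that model a prediction is back-transformed by exponentiating, and population doubles every ln(2)/b years. The helper names `predict_population` and `doubling_time` are hypothetical.

```python
import math

def predict_population(a: float, b: float, year: float) -> float:
    """Hypothetical helper: back-transform a prediction from ln(pop) = a + b * year."""
    return math.exp(a + b * year)

def doubling_time(b: float) -> float:
    """Hypothetical helper: years for exp(b * t) to reach 2, i.e. ln(2) / b."""
    return math.log(2) / b

# Growth rate (per year, natural-log scale) implied by doubling every 20 years.
claimed_rate = math.log(2) / 20
print(round(claimed_rate, 4))  # 0.0347
```

A fitted slope larger than `claimed_rate` would mean faster growth (a doubling time under 20 years); a smaller slope would mean slower growth.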
V.
(a)(6) A preschool program attempts to boost children’s IQs. All of the children in a large school district are enrolled in the program. The children are tested when they enter the program at age 4 and when they leave it at age 5. Among all children in the population, the mean IQ score at age 4 is 100 with a standard deviation of 15, and the mean IQ score at age 5 is also 100 with a standard deviation of 15. Among children in the program, the mean IQ score at age 4 is 100 with a standard deviation of 15, and the mean IQ score at age 5 is 110 with a standard deviation of 15. The children in the program who had very low IQ scores at age 4 remained below average but improved by age 5, while the children who had very high IQ scores at age 4 remained above average at age 5 but did not do as well relative to the rest of the children in the program as they had before. Is there strong evidence that the program is effective in boosting IQs, or can the above data be explained by the regression effect? Explain briefly.
(b)(6) Another program focuses only on children who had low IQ scores at age 4. Only children who had an IQ score below 90 at age 4 are enrolled in the program (a large number of children are still enrolled). The mean IQ score of children enrolled in the program improves to 95 by age 5. Does this provide strong evidence that the program is effective in boosting IQs, or can this be explained by the regression effect? Explain briefly.
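A simulation with no program effect at all shows what the regression effect does and does not produce. It is a sketch under assumptions: scores are bivariate normal with mean 100 and SD 15 at both ages, and the age-4/age-5 correlation is set to a hypothetical 0.7 (the problem does not give it, and the size of the effect depends on it). The overall mean stays near 100, while children selected for scores below 90 move toward 100 a year later.

```python
import random
import statistics

random.seed(42)
MEAN, SD = 100.0, 15.0
R = 0.7        # hypothetical correlation between age-4 and age-5 scores
N = 100_000    # hypothetical number of children

age4, age5 = [], []
for _ in range(N):
    z4 = random.gauss(0, 1)
    # Age-5 score: same mean and SD as at age 4, correlated with it, and no program effect.
    z5 = R * z4 + (1 - R**2) ** 0.5 * random.gauss(0, 1)
    age4.append(MEAN + SD * z4)
    age5.append(MEAN + SD * z5)

low4 = [a for a in age4 if a < 90]                  # children selected for scores below 90
low5 = [b for a, b in zip(age4, age5) if a < 90]    # the same children one year later
print(f"all children:     {statistics.fmean(age4):.1f} -> {statistics.fmean(age5):.1f}")
print(f"IQ below 90 at 4: {statistics.fmean(low4):.1f} -> {statistics.fmean(low5):.1f}")
```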
VI.
(a)(4) Consider a multiple linear regression model for personal income (y) as a function of years of education (e) and IQ (iq):
$y = \beta_0 + \beta_e e + \beta_{iq}\, iq + \epsilon$
Answer true or false; if false, correct it.
(i) The coefficient $\beta_{iq}$ may be interpreted as the average amount of income that a randomly chosen person with an IQ of 101 earns more than a randomly chosen person with an IQ of 100.
(ii) A randomly chosen person with an IQ of 100 who obtained a college degree (and no further education) earns on average $4\beta_e$ more income than a randomly chosen person with an IQ of 100 who only obtained a high school degree.
(b)(4) A sociologist computed a regression of mobility on family income:
Then she realized that family size was also relevant, so she calculated the multiple regression:
Under what conditions would you expect the coefficient of family income to be approximately the same in the two regressions?
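For least squares fits there is an exact identity relating the two coefficients: the simple-regression slope on income equals the multiple-regression coefficient on income plus (coefficient on family size) × (slope from regressing family size on income). The sketch below checks this on simulated data; all variable names, coefficients, and the `link` parameter controlling how strongly family size depends on income are hypothetical.

```python
import random

def cross_sum(xs, ys):
    """Sum of cross-products of deviations from the means (S_xy)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def income_coefficients(income, size, mobility):
    """Least squares coefficient on income: (simple regression, multiple regression with size)."""
    s11, s22, s12 = cross_sum(income, income), cross_sum(size, size), cross_sum(income, size)
    s1y, s2y = cross_sum(income, mobility), cross_sum(size, mobility)
    simple = s1y / s11
    multiple = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12**2)
    return simple, multiple

def simulate(link, n=20_000, seed=0):
    """Hypothetical data: size = link*income + noise; mobility = 2*income - 1.5*size + noise."""
    rng = random.Random(seed)
    income = [rng.gauss(0, 1) for _ in range(n)]
    size = [link * x + rng.gauss(0, 1) for x in income]
    mobility = [2 * x - 1.5 * s + rng.gauss(0, 1) for x, s in zip(income, size)]
    return income_coefficients(income, size, mobility)

print(simulate(link=0.0))  # income and family size unrelated
print(simulate(link=0.8))  # income and family size related
```

The gap between the two coefficients is the product in the identity, so it is small when family size is nearly uncorrelated with income or has little effect on mobility.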
VII. (8) Advocates of a broad-based health care program in a developing country claim that it will raise mean birthweight in villages where birthweight was too low and lower mean birthweight in villages where birthweight was too high. The following JMP output shows, for 11 villages, a scatterplot of the mean birthweight in a village before the program was implemented (before program) and after the program was implemented (after program), the least squares regression line for the response variable after program on the explanatory variable before program, and distributional summaries of the after program and before program variables. Does the output provide strong evidence that the program had its intended effect of raising mean birthweight in villages where birthweight was too low and lowering mean birthweight in villages where birthweight was too high? Justify your answer.
Bivariate Fit of after program By before program
Linear Fit
after program = 3.9027273 + 0.4218182 before program
Summary of Fit
RSquare / 0.867569
RSquare Adj / 0.852855
Root Mean Square Error / 0.14404
Mean of Response / 6.75
Observations (or Sum Wgts) / 11
Analysis of Variance
Source / DF / Sum of Squares / Mean Square / F Ratio / Prob > F
Model / 1 / 1.2232727 / 1.22327 / 58.9601 / <.0001
Error / 9 / 0.1867273 / 0.02075
C. Total / 10 / 1.4100000
Parameter Estimates
Term / Estimate / Std Error / t Ratio / Prob>|t|
Intercept / 3.9027273 / 0.373343 / 10.45 / <.0001
before program / 0.4218182 / 0.054935 / 7.68 / <.0001
Distributions
before program
Moments
Mean / 6.75
Std Dev / 0.8291562
Std Err Mean / 0.25
upper 95% Mean / 7.3070347
lower 95% Mean / 6.1929653
N / 11
after program
Moments
Mean / 6.75
Std Dev / 0.3754997
Std Err Mean / 0.1132174
upper 95% Mean / 7.0022641
lower 95% Mean / 6.4977359
N / 11
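The least squares slope can be written as b1 = r × (SD of after program) / (SD of before program). This sketch plugs in the reported R² and standard deviations to separate the two ingredients of the fitted slope of 0.42: the correlation r, which by itself would give a slope below 1 (a regression effect), and the ratio of the spreads after and before.

```python
import math

# Values reported in the JMP output above.
r_squared = 0.867569
sd_before, sd_after = 0.8291562, 0.3754997

r = math.sqrt(r_squared)          # the slope is positive, so r is the positive root
sd_ratio = sd_after / sd_before   # spread after the program relative to before
slope_from_parts = r * sd_ratio   # should reproduce the fitted slope

# With equal spreads before and after, the slope would equal r alone.
print(f"r = {r:.3f}, SD ratio = {sd_ratio:.3f}, r * SD ratio = {slope_from_parts:.4f}")
```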