Practice Midterm Stat 112 D. Small

(For second midterm, scheduled November 13th, 3:00-4:20 p.m.)

Instructions: Closed book. Calculators and two (two-sided) pages of notes allowed. Write answers on the test pages along with your work. Use the back of the test or extra pages as necessary. If a question says to explain your answer, you will get no credit without some explanation. When performing hypothesis tests, include a statement of the null and alternative hypotheses. Time=80 minutes. No questions will be entertained during the exam.

1. Multiple choice questions

(i) A regression of the number of crimes committed in a day on volume of ice cream sales in the same day showed that the coefficient of ice cream sales was positive and significantly differed from zero. Which of the following is the most likely explanation?

(a)  The content of ice cream (probably the sugar) encourages people to commit crimes

(b)  Successful criminals celebrate by eating ice cream

(c)  A pathological desire for ice cream is triggered in a certain percentage of individuals by certain environmental conditions (such as warm days), and these individuals will stop at nothing to satisfy their craving

(d)  Another variable, such as temperature, is associated with both crime and ice cream sales

(ii) The owner of a chain of supermarkets notices that there is a positive correlation between the sales of beer and the sales of ice cream over the course of the previous year. During seasons when sales of beer were above average, sales of ice cream also tended to be above average. Likewise, during seasons when sales of beer were below average, sales of ice cream also tended to be below average. Which of the following would be a valid conclusion from these facts?

(a)  Sales records must be in error. There should be no association between beer and ice cream sales.

(b)  Evidently, for a significant proportion of customers of these supermarkets, drinking beer causes a desire for ice cream or eating ice cream causes a thirst for beer.

(c)  A scatterplot of monthly ice cream sales versus monthly beer sales would show that a straight line describes the pattern in the plot, but it would have to be a horizontal line.

(d)  None of the above.

(iii) In a regression analysis, the residuals represent the

(a)  difference between the actual y values and their predicted values

(b)  difference between the actual x values and their predicted values

(c)  square root of the coefficient of determination

(d)  change in y per unit change in x

The next two questions refer to the following setup. Below is a scatterplot of years of schooling completed (x) and annual income in thousands of dollars (y) for a sample of 18 forty-year-old men. Suppose we fit the simple linear regression model

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$,

where the deviations $\varepsilon_i$ are assumed to be independent and normally distributed with mean 0 and standard deviation $\sigma$. The least squares line is found to be .

Bivariate Fit of Income By Years

(iv) Which of the following statements is best supported by the scatterplot (choose only one)?

(a)  There is no striking evidence in the plot that the assumptions for simple linear regression are violated.

(b)  There appears to be an outlier and/or influential observations in the plot suggesting that the least squares line must be interpreted with caution.

(c)  The plot contains dramatic evidence that the standard deviation of the response about the true regression line is not even approximately the same everywhere.

(d)  The plot suggests that the relationship between years of schooling and income is highly nonlinear and that a straight line regression function is inappropriate.

(v) A 95% confidence interval for the mean earnings of an individual who obtains 12 years of schooling is ($22,750, $31,750). What can you say about the lower endpoint of a 95% prediction interval for predicting the earnings of an individual who obtains 12 years of schooling?

(a)  it will be lower than $22,750

(b)  it will be higher than $22,750

(c)  it will equal $22,750

(d)  it cannot be determined from the information given

2. For each data problem below, write a letter from the following list corresponding to the most appropriate tool for answering the question of interest

A.  One-way analysis of variance F-test

B.  Planned comparisons (Each pair, Student’s t in JMP) t-test for comparing 2 of several means

C.  Bonferroni-adjusted t-test for comparing 2 of several means

D.  Tukey-Kramer-adjusted t-tests for comparing each mean with each other mean

E.  A t-test for the hypothesis that the intercept in a simple linear regression model is zero

F.  A confidence interval for the intercept in a simple regression model

G.  A t-test for the hypothesis that the slope in a simple regression model is zero.

H.  A confidence interval for the slope in a simple regression model.

I.  A confidence interval for the mean of a response at some value of the explanatory variable

J.  A prediction interval for values of the response at some value of the explanatory variable

(a) x = amount of newspaper coverage devoted to a publicized suicide (in square inches), y = number of suicides in the week after the newspaper coverage. Is the mean of y associated with x (from 15 observed x, y pairs)?

Tool for answering question: _____

(b) Each of 30 cats was randomly assigned to receive one of 5 diet treatments. The weight gain was measured for each cat after completion of the diet. Are the mean weight gains the same for all 5 diets?

Tool for answering question: _____

(c) (Continuation of (b)) Which diets differ from which others (with respect to mean weight gain)?

Tool for answering question: _____

(d) At a bolt manufacturing plant, 20 bolts were selected at each of days 0, 5, 10, 15 and 20 after machine maintenance, and their diameters were measured. It is believed that the mean diameter decreases linearly with day. What can we expect the diameter to be for any particular bolt made on day 12 after machine maintenance?

Tool for answering question: _____

3. The U.S. Vocational Rehabilitation Act of 1973 prohibits discrimination against people with physical disabilities. A study explored how physical handicaps affect people’s perception of employment qualifications.[1]

The researchers prepared five videotaped job interviews, using the same two male actors for each. A set script was designed to reflect an interview with an applicant of average qualifications. The tapes differed only in that the applicant appeared with a different handicap. In one, he appeared in a wheelchair; in a second, he appeared on crutches; in another, his hearing was impaired; in a fourth, he appeared to have one leg amputated; and in the final tape, he appeared to have no handicap.

Seventy undergraduate students from a U.S. university were randomly assigned to view the tapes, fourteen to each tape. After viewing the tape, each subject rated the qualifications of the applicant on a 0- to 10-point scale. JMP output is given at the end of the question.

(a) (4) What evidence is there that subjects systematically evaluate qualifications differently according to the candidate’s handicap? State your null and alternative hypotheses and justify your conclusion.


(b) (3) Using Tukey’s multiple comparison method, list all significant differences (at the 5% significance level) between handicap types in terms of subjects’ evaluations of the applicant’s qualifications. (Be sure to state which handicap types are rated significantly higher than which other handicap types.)

(c) (3) Amputee, crutches and wheelchair can be considered handicaps of mobility, and hearing can be considered a handicap of communication. How would you reanalyze the data to test whether the average of the mean evaluations for the three handicaps of mobility differs from the mean evaluation for the handicap of communication? Just describe how you would do it in words; you don’t need to do any calculations. Be specific about what test you would use. Note that you are not allowed to collect any additional data; you must reanalyze the existing data.

JMP Output for 3

Oneway Analysis of Qualification Score By Handicap

Analysis of Variance

Source / DF / Sum of Squares / Mean Square / F Ratio / Prob > F /
Handicap / 4 / 3052.143 / 763.036 / 2.8616 / 0.0301
Error / 65 / 17332.143 / 266.648
C. Total / 69 / 20384.286

Means for Oneway Anova

Level / Number / Mean / Std Error / Lower 95% / Upper 95% /
Amputee / 14 / 44.2857 / 4.3642 / 35.570 / 53.002
Crutches / 14 / 59.2143 / 4.3642 / 50.498 / 67.930
Hearing / 14 / 40.5000 / 4.3642 / 31.784 / 49.216
None / 14 / 49.0000 / 4.3642 / 40.284 / 57.716
Wheelchair / 14 / 53.4286 / 4.3642 / 44.713 / 62.144

Std Error uses a pooled estimate of error variance
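
As a reading aid for the output above, the following minimal sketch shows how the F Ratio and the pooled Std Error relate to the sums of squares in the Analysis of Variance table. It assumes the usual one-way ANOVA definitions (mean square = sum of squares / DF, standard error of a group mean = sqrt(MSE / n)); the variable names are illustrative and are not part of the JMP output.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_handicap, df_handicap = 3052.143, 4
ss_error, df_error = 17332.143, 65
n_per_group = 14  # fourteen subjects viewed each tape

# Mean square = sum of squares divided by its degrees of freedom.
ms_handicap = ss_handicap / df_handicap
ms_error = ss_error / df_error  # pooled estimate of the error variance

# F Ratio compares between-group to within-group variability.
f_ratio = ms_handicap / ms_error

# Standard error of each group mean, using the pooled error variance.
pooled_se = math.sqrt(ms_error / n_per_group)

print(f"MS Handicap = {ms_handicap:.3f}, MS Error = {ms_error:.3f}")
print(f"F Ratio = {f_ratio:.4f}, pooled Std Error = {pooled_se:.4f}")
```

Because every group has 14 subjects, the same pooled standard error applies to all five means, which is why the Std Error column is constant.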

Means Comparisons

Dif=Mean[i]-Mean[j] / Crutches / Wheelchair / None / Amputee / Hearing /
Crutches / 0.000 / 5.786 / 10.214 / 14.929 / 18.714
Wheelchair / -5.786 / 0.000 / 4.429 / 9.143 / 12.929
None / -10.214 / -4.429 / 0.000 / 4.714 / 8.500
Amputee / -14.929 / -9.143 / -4.714 / 0.000 / 3.786
Hearing / -18.714 / -12.929 / -8.500 / -3.786 / 0.000

Alpha= 0.05

Comparisons for all pairs using Tukey-Kramer HSD

q* = 2.80582
Abs(Dif)-LSD / Crutches / Wheelchair / None / Amputee / Hearing /
Crutches / -17.317 / -11.532 / -7.103 / -2.389 / 1.397
Wheelchair / -11.532 / -17.317 / -12.889 / -8.174 / -4.389
None / -7.103 / -12.889 / -17.317 / -12.603 / -8.817
Amputee / -2.389 / -8.174 / -12.603 / -17.317 / -13.532
Hearing / 1.397 / -4.389 / -8.817 / -13.532 / -17.317

Positive values show pairs of means that are significantly different.
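
The Abs(Dif)-LSD table can be reproduced from the other numbers in the output. The sketch below assumes that, in this output, q* multiplies the standard error of a difference between two group means, sqrt(MSE (1/n_i + 1/n_j)); that assumption is consistent with the diagonal entries of the table. The helper name `abs_dif_minus_lsd` is hypothetical.

```python
import math

q_star = 2.80582     # q* from the Tukey-Kramer output
ms_error = 266.648   # Error mean square from the ANOVA table
n_i = n_j = 14       # subjects per tape

# Smallest absolute difference in means that is declared significant.
lsd = q_star * math.sqrt(ms_error * (1 / n_i + 1 / n_j))

# Group means copied from the "Means for Oneway Anova" table.
means = {
    "Crutches": 59.2143,
    "Wheelchair": 53.4286,
    "None": 49.0000,
    "Amputee": 44.2857,
    "Hearing": 40.5000,
}

def abs_dif_minus_lsd(a, b):
    """Entry of the Abs(Dif)-LSD table for groups a and b (hypothetical helper)."""
    return abs(means[a] - means[b]) - lsd

print(f"LSD = {lsd:.3f}")
print(f"Wheelchair vs None: {abs_dif_minus_lsd('Wheelchair', 'None'):.3f}")
```

A diagonal entry compares a group with itself, so it equals minus the LSD.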

4. Answer the following questions based on the setting described below and the JMP output.

A service firm has experienced rapid growth. Because of this growth, some of the employees who handle customer calls have had to work additional hours (overtime). The firm is concerned that over-worked employees are less productive and handle fewer calls per hour than employees who work less demanding schedules. Most employees who work the “conventional” schedule put in 30-40 hours a week, depending upon demand. The firm constructed the regression model shown next relating the number of hours worked (X) to the number of calls serviced per hour (Y) for 60 employees.

Bivariate Fit of Calls Per Hour By Hours Worked

Linear Fit

Calls Per Hour = 22.141429 - 0.1967268 Hours Worked

Summary of Fit

RSquare / 0.34405
RSquare Adj / 0.33274
Root Mean Square Error / 1.594723
Mean of Response / 14.23977
Observations (or Sum Wgts) / 60

Analysis of Variance

Source / DF / Sum of Squares / Mean Square / F Ratio / Prob > F /
Model / 1 / 77.36585 / 77.3659 / 30.4214 / <.0001
Error / 58 / 147.50225 / 2.5431
C. Total / 59 / 224.86811

Parameter Estimates

Term / Estimate / Prob>|t| / Lower 95% / Upper 95% /
Intercept / 22.141429 / <.0001 / 19.244283 / 25.038576
Hours Worked / -0.196727 / <.0001 / -0.268123 / -0.12533
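
As a reading aid, the following sketch shows how the Summary of Fit quantities follow from the Analysis of Variance table. It assumes the standard simple linear regression definitions (RSquare = SS Model / SS Total, Root Mean Square Error = sqrt(SS Error / DF Error)); the variable names are illustrative only.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_model, ss_error, ss_total = 77.36585, 147.50225, 224.86811
df_error, df_total = 58, 59

# Proportion of the variation in the response accounted for by the line.
r_square = ss_model / ss_total

# Adjusted R-square penalizes for the degrees of freedom used.
r_square_adj = 1 - (1 - r_square) * df_total / df_error

# Root mean square error: estimated SD of the deviations about the line.
ms_error = ss_error / df_error
rmse = math.sqrt(ms_error)

# The model has 1 degree of freedom, so its mean square equals its sum of squares.
f_ratio = ss_model / ms_error

print(f"RSquare = {r_square:.5f}, RSquare Adj = {r_square_adj:.5f}")
print(f"RMSE = {rmse:.6f}, F Ratio = {f_ratio:.4f}")
```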

(a) Does the number of hours worked impact the number of calls handled per hour, or can we explain the declining pattern seen in the plot as simply a random coincidence? Explain.

(b) From the model, how many calls on average does an employee who works a 30 hour week process?

(c) For each additional 10 hours of work, how many fewer calls are processed per hour, on average? Might the drop be as large as 2 calls per hour?

(d) How would you interpret the intercept in this fitted model?

(e) When the model was used to predict the number of calls handled by a new employee, the model’s prediction of the number of calls per hour was 3 calls per hour more than the actual productivity. Can we conclude whether the performance of the new employee is consistent with the performance of the employees used to build this model?

(f) This fitted model uses data from 60 employees. If the sample size were increased to 240 (i.e., fit to a sample that is four times larger), then

(i) What would happen to the confidence interval for the slope?

(ii) Would the predictive accuracy of the model improve, especially when used to predict the performance within the range of prior experience?

(iii) What would happen to the estimates of the slope and intercept on average?

(iv) Would the model’s $R^2$ goodness-of-fit index increase on average?

(g) Which of the following is the best characterization of the $R^2$ statistic for this model?

(i) The model explains about 34% of the variation in productivity.

(ii) The model predicts about 34% of the employees accurately.

(iii) The model predicts about 66% of the employees accurately.

(iv) Only 20% of the observations lie within 2 RMSEs of the fitted line.

5. World population grew approximately exponentially between 1950 and 2002.

Bivariate Fit of log population By Year

The least squares equation for the regression of log population on time (year) for the data from 1950 to 2002 is the following:

Based on the least squares equation, what would you predict the world population to be in 2010?

[1] Data from S.J. Cesare, R.J. Tannenbaum and A. Dalessio, “Interviewers’ Decisions Related to Applicant Handicap Type and Rater Empathy,” Human Performance 3(3)(1990): 157-171.