6N216: Data and Decisions

Homework 4

1. (a) A credit card company regularly sells the list of its cardholders to direct-mail advertisers. To promote its list, the company often needs to estimate specialized characteristics of its cardholders. For example, an automobile manufacturer is interested in the average amount spent by the cardholders on their most recent new-car purchase. For 200 cardholders chosen at random, the amount spent on the most recent new-car purchase has sample mean X-bar = $20,000 and sample standard deviation S_X = $6,000. Use this data to compute a 99% confidence interval for the amount an average cardholder spends on a new car.

(b) Assume that nationally the average price paid on most-recent new car purchases is $19,000. Is the difference between the sample value of $20,000 and the national average of $19,000 statistically significant at the 5% level? Explain.
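The arithmetic behind problem 1 can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed solution method: it assumes the normal approximation is acceptable because n = 200 is large, and it treats part (b) as a two-sided test. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n = 200            # sample size from part (a)
x_bar = 20_000.0   # sample mean, dollars
s_x = 6_000.0      # sample standard deviation, dollars
mu_0 = 19_000.0    # national average from part (b)

# Standard error of the sample mean.
std_err = s_x / sqrt(n)

# Part (a): two-sided 99% interval using the normal approximation
# (assumption: n is large enough to use z rather than t).
z_99 = NormalDist().inv_cdf(1 - 0.01 / 2)
ci_99 = (x_bar - z_99 * std_err, x_bar + z_99 * std_err)

# Part (b): two-sided z-test of H0: mean = mu_0 at the 5% level
# (assumption: the question is read as a two-sided test).
z_stat = (x_bar - mu_0) / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_h0 = p_value < 0.05

print(f"99% CI: ({ci_99[0]:.0f}, {ci_99[1]:.0f})")
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The same conclusion for part (b) can be reached by checking whether $19,000 falls inside a 95% confidence interval built the same way.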

2. (a) Assume that car thefts in a certain city historically average 5 thefts per week throughout the year. In an effort to reduce car thefts, the police department conducts an extensive “awareness campaign” to educate citizens on ways to avoid car theft. In the six weeks following the campaign, the numbers of thefts in each week are 3, 6, 4, 2, 5, and 4. It is reasonable to assume that the number of thefts per week is approximately normally distributed. Use the data for the 6 weeks to construct a 95% confidence interval for the new average number of thefts per week.

(b) Is the difference between the sample mean from the six weeks of data and the historical mean of 5 thefts per week statistically significant at the 5% level? Explain.
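With only six observations, a t-based interval is the natural tool for problem 2. The sketch below is one possible way to organize the calculation. The critical value is hard-coded from a standard t table rather than computed, and the test in part (b) is assumed to be two-sided; both are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

thefts = [3, 6, 4, 2, 5, 4]   # weekly counts from the problem statement
mu_0 = 5.0                    # historical weekly average

n = len(thefts)
x_bar = mean(thefts)
s = stdev(thefts)             # sample standard deviation (n - 1 divisor)
std_err = s / sqrt(n)

# Assumption: two-sided 5% critical value of Student's t with
# n - 1 = 5 degrees of freedom, read from a standard t table.
T_CRIT_DF5 = 2.571

# Part (a): 95% confidence interval for the new weekly mean.
ci_95 = (x_bar - T_CRIT_DF5 * std_err, x_bar + T_CRIT_DF5 * std_err)

# Part (b): two-sided t-test of H0: mean = 5.
t_stat = (x_bar - mu_0) / std_err
reject_h0 = abs(t_stat) > T_CRIT_DF5

print(f"mean = {x_bar}, s = {s:.3f}, 95% CI = ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```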

3. The weight data in hw4_data.xls provides information on the weights of a sample of US males aged 20-39.

a. What is the average weight of men in their 20’s? In their 30’s?

b. Based on this sample, can you conclude (at the 95% confidence level) that the average weight of men in their 30’s is higher than the average weight of men in their 20’s?

c. What are the standard deviations of the weights of men in the two groups?

d. Can you conclude (at the 95% level) that the standard deviation of weights of men in their 20’s differs from that of men in their 30’s?

e. Can you conclude (at the 95% level) that the standard deviation of weights of men in their 30’s is greater than 15?

f. Can you conclude (at the 95% level) that the average weight of men in either their 20’s or 30’s (use the entire sample) is greater than 120?
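The test statistics that parts (b), (d), (e), and (f) call for can be written as small helper functions. The sketch below is illustrative only: the function names are hypothetical, the two-sample statistic uses the unequal-variance (Welch) form, which may differ from the version used in class, and the weights are placeholder numbers, not the values in hw4_data.xls. Critical values depend on the actual sample sizes and are left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t statistic for H0: mean_a == mean_b, not assuming equal variances."""
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


def variance_ratio(sample_a, sample_b):
    """F statistic s_a^2 / s_b^2 with (n_a - 1, n_b - 1) degrees of freedom."""
    return variance(sample_a) / variance(sample_b)


def variance_chi_sq(sample, sigma_0):
    """Chi-square statistic (n - 1) s^2 / sigma_0^2 for H0: sigma == sigma_0."""
    return (len(sample) - 1) * variance(sample) / sigma_0 ** 2


def one_sample_t(sample, mu_0):
    """t statistic for H0: mean == mu_0."""
    return (mean(sample) - mu_0) / sqrt(variance(sample) / len(sample))


# Placeholder weights for illustration only, NOT the data in hw4_data.xls.
twenties = [150.0, 160.0, 170.0]
thirties = [160.0, 175.0, 190.0]

print("part (b) statistic:", welch_t(thirties, twenties))
print("part (d) statistic:", variance_ratio(thirties, twenties))
print("part (e) statistic:", variance_chi_sq(thirties, 15.0))
print("part (f) statistic:", one_sample_t(twenties + thirties, 120.0))
```

Each statistic is then compared with the appropriate critical value: one-sided for (b), (e), and (f), two-sided for (d).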

4. A survey was conducted to evaluate viewer preferences for network news coverage. A cross-tabulation of viewer preference with age results in the following table:

Age / ABC / NBC / CBS
Under 20 / 30 / 20 / 20
20-30 / 60 / 70 / 80
30-40 / 100 / 110 / 70
Over 40 / 110 / 100 / 80

(a) In one or two sentences, describe what appears to be the relationship between age and network preference, if any, which is suggested by the above data.

(b) Run a chi-square test of independence on the data to evaluate the hypothesis that network preference is independent of viewer age. Is the departure from independence statistically significant at the 5% level?
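The mechanics of the chi-square test in part (b) can be sketched as follows. Expected counts are computed as (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The critical value is hard-coded from a standard chi-square table, which is an assumption of this sketch; a spreadsheet or statistics package would normally supply it.

```python
# Observed counts from the table above; columns are ABC, NBC, CBS.
observed = {
    "Under 20": [30, 20, 20],
    "20-30": [60, 70, 80],
    "30-40": [100, 110, 70],
    "Over 40": [110, 100, 80],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_sq = 0.0
for r, row in enumerate(rows):
    for c, obs in enumerate(row):
        expected = row_totals[r] * col_totals[c] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(rows) - 1) * (len(col_totals) - 1)

# Assumption: 5% critical value for chi-square with 6 degrees of
# freedom, read from a standard table.
CHI2_CRIT_DF6 = 12.592
reject_independence = chi_sq > CHI2_CRIT_DF6

print(f"chi-square = {chi_sq:.2f}, df = {dof}, reject: {reject_independence}")
```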

5. Refer to the MBA job placement data in Homework 1 (repeated in hw4_data.xls), and run chi-square tests to determine the independence of GPA and having a job offer and also of the GMAT score and having a job offer. What conclusions can you draw? Are these conclusions consistent with those of Homework 1? (Hint: Aggregate the GPA data into two intervals, each of size 1, with the first starting at 2, and also aggregate the GMAT data into two intervals, each of size 250, with the first starting at 350. In addition, if the relevant data is not present for one or more students, you may ignore those students in your analysis.)
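The aggregation described in the hint can be sketched as a binning step followed by a count of (interval, job offer) pairs. Everything named here is hypothetical: the field names `gpa`, `gmat`, and `offer`, the treatment of boundary values (a GPA of exactly 3 or a GMAT of exactly 600 goes in the upper interval), and the student records, which are placeholders rather than rows from hw4_data.xls.

```python
from collections import Counter


def gpa_bin(gpa):
    """Assumed intervals: [2, 3) and [3, 4]."""
    return "2-3" if gpa < 3.0 else "3-4"


def gmat_bin(gmat):
    """Assumed intervals: [350, 600) and [600, 850]."""
    return "350-600" if gmat < 600 else "600-850"


def cross_tab(records, field, binner):
    """Count (interval, offer) pairs, skipping records with missing data."""
    counts = Counter()
    for rec in records:
        value = rec.get(field)
        offer = rec.get("offer")
        if value is None or offer is None:
            continue  # the hint allows ignoring students with missing data
        counts[(binner(value), offer)] += 1
    return counts


# Placeholder records for illustration only, NOT the homework data.
students = [
    {"gpa": 3.4, "gmat": 640, "offer": True},
    {"gpa": 2.7, "gmat": 560, "offer": False},
    {"gpa": 3.1, "gmat": None, "offer": True},
    {"gpa": None, "gmat": 610, "offer": False},
]

gpa_table = cross_tab(students, "gpa", gpa_bin)
gmat_table = cross_tab(students, "gmat", gmat_bin)
print(dict(gpa_table))
print(dict(gmat_table))
```

Each resulting 2×2 table can then be fed to a chi-square test of independence with (2 − 1)(2 − 1) = 1 degree of freedom.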

6. Read “The Avocado Case.” The analysis in the case indicates that considerable improvement in the Avocado’s German market performance should be possible. The data in Exhibit 2 of the case is in the worksheet avocado.xls.

a) Repeat the regression described in the article, following the in-class examples (scatter diagram, trendline, full regression analysis including residuals and residual plot, etc.). However, restrict your attention to just the regression of EATEN on AWARE; ignore the variable BOUGHT. Briefly describe the logic behind the conclusion, based on this regression, that the avocado is under-performing relative to other produce items. Interpret the intercept and/or slope, and examine the residuals. In particular, does your plot of the residuals on AWARE reveal any clear patterns?

b) One concern, looking at the data, is the combination of very well known produce items, like oranges, bananas, etc., with more exotic items like the avocado. A reasonable approach is to remove the very common items (say those with AWARE levels of 98% or more) from the regression analysis, and restrict it to the lesser-known items. How do the results of this analysis compare with the original analysis based on the full data set? What effect does the change have on the conclusion about the avocado’s performance?
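The two regressions in problem 6 differ only in which rows are included, so the workflow can be sketched as one fitting function applied to the full data and to a filtered subset. This is a minimal sketch under stated assumptions: the (AWARE, EATEN) pairs are placeholders, not the values in avocado.xls, and the helper names are hypothetical. A spreadsheet regression tool would report the same intercept, slope, and residuals along with standard errors.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values, in the same order as the inputs."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Placeholder (AWARE, EATEN) percentages, NOT the data in avocado.xls.
produce = [(40.0, 10.0), (60.0, 30.0), (80.0, 50.0), (99.0, 95.0)]

# Part a): regression on the full data set.
aware = [a for a, _ in produce]
eaten = [e for _, e in produce]
a_full, b_full = fit_line(aware, eaten)
resid_full = residuals(aware, eaten, a_full, b_full)

# Part b): drop items with AWARE of 98% or more, then refit.
lesser_known = [(a, e) for a, e in produce if a < 98.0]
aware_sub = [a for a, _ in lesser_known]
eaten_sub = [e for _, e in lesser_known]
a_sub, b_sub = fit_line(aware_sub, eaten_sub)

print(f"full data:    EATEN = {a_full:.2f} + {b_full:.3f} * AWARE")
print(f"lesser-known: EATEN = {a_sub:.2f} + {b_sub:.3f} * AWARE")
```

An item under-performs relative to the fitted line when its residual is negative, so comparing the avocado's residual under the two fits shows how sensitive the conclusion is to the common items.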