Homework 3, Statistics 112, Fall 2004

This homework is due Thursday, September 30th at the beginning of class.

1. Moore and McCabe, Exercise 1.104.

Solution: (a) 50%. (b) Z = -1.33, so the probability of less than this is 0.0918. (c) Z = +2.67. The probability of more than this is 0.38%. (d) For 120, Z = +1.33. The probability between Z = 0 and Z = 1.33 is 0.9082 - 0.5 = 0.4082.
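The table lookups in parts (b) through (d) can be checked numerically. The sketch below is only an illustration: it takes the Z scores stated in the solution as given and evaluates the standard normal CDF with the error function from Python's standard library. The helper name normal_cdf is made up for this example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, P(Z < z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Z scores are copied from the solution text, not recomputed from the exercise.
p_b = normal_cdf(-1.33)                    # part (b): P(Z < -1.33)
p_c = 1.0 - normal_cdf(2.67)               # part (c): P(Z > 2.67)
p_d = normal_cdf(1.33) - normal_cdf(0.0)   # part (d): P(0 < Z < 1.33)

print(round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

The printed values should agree with the normal table to about three decimal places; small differences come from table rounding.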

2. Moore and McCabe, Exercise 2.54. The data is stored in farmpopulation.JMP.

Solution: (a) Graph is below. The regression line is Population = 1167 - 0.587 Year, with population measured in millions. (b) The decline was about 587,000 people per year. The percent of variation explained is given by R^2 = 97.7%. (c) The prediction for 1990 is -1.13 million people. The result is not reasonable because a population cannot be negative.
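The arithmetic behind the 1990 prediction can be reproduced in a few lines. This is a sketch that uses the rounded coefficients quoted in the solution, so it matches the stated answer only to the precision of those coefficients. The function name predicted_population is hypothetical.

```python
# Rounded coefficients as reported in the solution (population in millions).
INTERCEPT = 1167.0
SLOPE = -0.587

def predicted_population(year):
    """Fitted farm population, in millions, for a given year."""
    return INTERCEPT + SLOPE * year

pred_1990 = predicted_population(1990)
print(f"Predicted 1990 farm population: {pred_1990:.2f} million")
```

A negative value here is a sign of extrapolating the straight line beyond the range of the data, not of an arithmetic mistake.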

3. All Canadians have government-funded health insurance, which pays for any medical care they require. However, when traveling out of the country, Canadians usually acquire supplementary health insurance to cover the difference between the costs incurred for emergency treatment and what the government program pays. In the United States, this cost differential can be prohibitive. Until recently, private insurance companies (such as Blue Cross) charged everyone the same weekly rate, regardless of age. However, because of rising costs and the realization that older people frequently incur greater medical emergency expenses, insurers decided to change their premium plans. They decided to offer rates that depend on the age of the customer. To help determine new rates, one insurance company gathered data concerning the age and mean daily medical expenses of a random sample of 1,348 Canadians during the previous 12-month period. The data are stored in the file healthinsurance.JMP. Assume that the simple linear regression model holds for this data.

(a) Fit a simple linear regression model to the data.

Solution:

Bivariate Fit of Expense By Age

Linear Fit

Expense = -5.96618 + 0.225734 Age

Summary of Fit

RSquare / 0.064659
RSquare Adj / 0.063964
Root Mean Square Error / 12.97593
Mean of Response / 6.674755
Observations (or Sum Wgts) / 1348

Analysis of Variance

Source / DF / Sum of Squares / Mean Square / F Ratio / Prob > F
Model / 1 / 15666.93 / 15666.9 / 93.0480 / <.0001
Error / 1346 / 226632.43 / 168.4
C. Total / 1347 / 242299.36

Parameter Estimates

Term / Estimate / Std Error / t Ratio / Prob>|t|
Intercept / -5.96618 / 1.357287 / -4.40 / <.0001
Age / 0.225734 / 0.023401 / 9.65 / <.0001
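The numbers in the JMP output are tied together by standard identities, which gives a quick way to check that the tables were copied correctly. The sketch below recomputes RSquare, the root mean square error, the F ratio, and the t ratio for Age from the sums of squares and the parameter estimates shown above. The variable names are chosen for this illustration only.

```python
import math

# Values copied from the Analysis of Variance table.
ss_model, ss_error, ss_total = 15666.93, 226632.43, 242299.36
df_model, df_error = 1, 1346

r_square = ss_model / ss_total                # share of variation explained
ms_error = ss_error / df_error                # mean square error
rmse = math.sqrt(ms_error)                    # root mean square error
f_ratio = (ss_model / df_model) / ms_error    # F ratio for the model

# Values copied from the Parameter Estimates table.
t_age = 0.225734 / 0.023401                   # estimate divided by its standard error

print(f"RSquare = {r_square:.6f}")
print(f"RMSE    = {rmse:.5f}")
print(f"F ratio = {f_ratio:.4f}")
print(f"t (Age) = {t_age:.2f}, t squared = {t_age ** 2:.2f}")
```

In simple linear regression the squared t ratio for the slope equals the F ratio up to rounding, so the last two printed lines should nearly agree.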

(b) Interpret the slope of the regression model.

Solution:

The slope is 0.225734. When age increases by 1 year, mean daily medical expenses increase by about 23 cents.

(c) Based on the regression model, how much more should the insurance company charge 70 year olds as compared to 60 year olds for each day of insurance?

Solution:

They should charge

0.225734*(70-60)=2.25734

dollars (about $2.26) more per day.
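The difference depends only on the slope, because the intercept cancels when two fitted values are subtracted. A short sketch using the fitted coefficients from part (a) makes this explicit. The function name mean_daily_expense is hypothetical.

```python
# Coefficients from the fitted line: Expense = -5.96618 + 0.225734 Age.
INTERCEPT = -5.96618
SLOPE = 0.225734

def mean_daily_expense(age):
    """Fitted mean daily medical expense, in dollars, at a given age."""
    return INTERCEPT + SLOPE * age

# Extra daily charge for a 70 year old relative to a 60 year old.
extra = mean_daily_expense(70) - mean_daily_expense(60)
print(f"Extra daily charge: ${extra:.2f}")
```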

4. Problem 3 continued.

(a) A Canadian friend has come to visit you with his 60 year old father for the day (assume they will be staying for 24 hours). You do not know anything more about your friend’s father other than his age, so you can think of him as a randomly chosen 60 year old Canadian. Predict the medical expenses that will be incurred by your friend’s father during his one-day visit to you.

Solution:

Based on the regression model, his mean daily medical expense in dollars is

-5.96618 + 0.225734*60 = 7.57786.

(b) Would you be surprised if your friend’s father incurred $20 of expenses? Explain your reasoning.

Solution:

I would not be surprised. The RMSE is 12.97593, and $20 is within one RMSE of the predicted expense of $7.58, since 20 - 7.58 = 12.42 is less than 12.98.

(c) Assume that the estimated simple linear regression model is the true model. What is the probability that your friend’s father will incur expenses of more than $30?

Solution:

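From the regression, the predicted expense for a 60 year old is $7.57786, so $30 exceeds the prediction by

$30-$7.57786=$22.42214.

Taking the RMSE as the standard deviation of the errors, the Z score is: 22.42214/12.97593=1.728.

Using the normal table, we find that the probability of expenses of more than $30 is approximately 1-0.9582=0.0418.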
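The calculations in parts (b) and (c) can be reproduced numerically. The sketch below follows the same assumptions as the solution: the fitted line is treated as the true model, and the errors are treated as normally distributed with standard deviation equal to the RMSE. The helper name normal_cdf is made up for this example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, P(Z < z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

predicted = -5.96618 + 0.225734 * 60   # fitted mean expense at age 60
rmse = 12.97593                        # used as the error standard deviation (assumption)

z_20 = (20 - predicted) / rmse         # part (b): how unusual is $20?
z_30 = (30 - predicted) / rmse         # part (c): Z score for $30
p_over_30 = 1.0 - normal_cdf(z_30)     # P(expense > $30)

print(f"Z for $20 = {z_20:.3f}")
print(f"Z for $30 = {z_30:.3f}")
print(f"P(expense > $30) = {p_over_30:.4f}")
```

The computed probability can differ from the table answer in the fourth decimal place, because the table is read at Z = 1.73 rather than at the unrounded Z score.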