MAR 5621 – In-Class Project 2
Multiple Linear Regression
1) Download the U.S. County Retail Data. The columns represent: county name, per capita retail sales, per capita retail establishments, per capita income, per capita federal expenditures, and males per 100 females.
a) Fit the model based on the population of 845 counties, relating per capita retail sales (Y) to per capita retail establishments (X1), per capita income (X2), per capita federal expenditures (X3), and males per 100 females (X4). Give the parameters β0, β1, β2, β3, β4, and σ.
b) Give a histogram of the errors, ε = Y-(β0+β1X1+β2X2+β3X3+β4X4). Are the errors approximately normally distributed?
c) Plot the residuals versus the mean (fitted) values. Does the error variance appear to be constant?
d) What proportion of the total variation in per capita retail sales is “explained” by the set of 4 predictors?
e) Take a random sample of n = 40 counties: generate a column of random numbers, sort the data set on that column, and keep only the top 40 rows (deleting the remaining 805).
f) Fit the model based on the sample of 40 counties, relating per capita retail sales (Y) to the same four predictors (X1, X2, X3, X4). Give the estimates b0, b1, b2, b3, b4, and SYX.
g) Obtain 95% confidence limits for β0, …, β4. Do the intervals contain the true parameters from part a)?
h) Give a histogram of the residuals, e = Y-(b0+b1X1+b2X2+b3X3+b4X4). Are the residuals approximately normally distributed?
i) Plot the residuals versus the mean (fitted) values. Does the error variance appear to be constant?
j) What proportion of the total variation in per capita retail sales is “explained” by the set of 4 predictors in the sample model?
k) Use the Cp statistic and stepwise regression to select the model that best fits your sample data.
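For parts a) and e) through g), the computations can be sketched in Python with NumPy as below. This is a minimal sketch under stated assumptions, not the tool the course requires: the 845-row data set is simulated (its coefficients are invented) because the county file is not reproduced here, `fit_ols` is a hypothetical helper, and the t critical value 2.0301 is hardcoded for 40 - 4 - 1 = 35 degrees of freedom (with SciPy installed, `scipy.stats.t.ppf(0.975, 35)` returns the same quantile). The helper also returns R², the proportion of variation explained that parts d) and j) ask about.

```python
import numpy as np

# Simulated stand-in for the 845-county file; replace with the real columns.
# Predictor means/spreads and coefficients below are invented for illustration.
rng = np.random.default_rng(5621)
N = 845
X_pop = rng.normal(loc=[5.0, 20.0, 6.0, 97.0], scale=[1.0, 3.0, 1.5, 4.0], size=(N, 4))
y_pop = 1.0 + X_pop @ np.array([0.8, 0.2, 0.1, 0.05]) + rng.normal(0.0, 1.0, N)


def fit_ols(X, y):
    """Least-squares fit of y on an intercept plus the columns of X.

    Returns (coefficients b0..bk, standard error of the estimate, R^2, standard errors of b).
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])        # design matrix [1, X1, ..., Xk]
    b, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ b
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    s_yx = np.sqrt(sse / (n - k - 1))           # S_YX with n - k - 1 degrees of freedom
    cov_b = s_yx**2 * np.linalg.inv(A.T @ A)    # estimated covariance matrix of b
    return b, s_yx, 1.0 - sse / sst, np.sqrt(np.diag(cov_b))


# Part a): fit on all 845 rows. sigma is computed like S_YX here; with 845 rows,
# dividing by N - 5 instead of N makes little difference.
beta, sigma, r2_pop, _ = fit_ols(X_pop, y_pop)

# Part e): attach a column of random numbers, sort on it, keep the top 40 rows.
keys = rng.random(N)
sample = np.argsort(keys)[:40]
X_s, y_s = X_pop[sample], y_pop[sample]

# Parts f), g), j): fit on the sample; 95% limits are b_j +/- t * se(b_j).
b, s_yx, r2, se = fit_ols(X_s, y_s)
T_975_DF35 = 2.0301  # t quantile (0.975) for 40 - 4 - 1 = 35 degrees of freedom
lower, upper = b - T_975_DF35 * se, b + T_975_DF35 * se
covered = (lower <= beta) & (beta <= upper)

print("population betas:", np.round(beta, 3), " sigma:", round(float(sigma), 3))
print("sample b:", np.round(b, 3), " S_YX:", round(float(s_yx), 3), " R^2:", round(float(r2), 3))
for j in range(5):
    print(f"beta{j}: [{lower[j]:.3f}, {upper[j]:.3f}]  contains part a) value: {covered[j]}")
```

To use the real data, replace `X_pop` and `y_pop` with the four predictor columns and the retail-sales column from the county file. Sorting on a column of random numbers and keeping the first 40 rows is a draw of 40 rows without replacement, which is what `np.argsort(keys)[:40]` does.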
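The residual checks in parts b), c), h), and i) are usually read off a histogram and a scatter plot. The sketch below, assuming NumPy and a simulated 40-row placeholder sample with invented coefficients, prints a text histogram of the residuals, their sample skewness, and the residual standard deviation within the low, middle, and high thirds of the fitted values; roughly equal spreads are consistent with constant error variance. A plotting library such as matplotlib (`plt.hist`, `plt.scatter`) would draw the actual charts; text output is used here only to keep the example dependency-light.

```python
import numpy as np

# Simulated placeholder sample (40 rows, 4 predictors); replace with the real data.
rng = np.random.default_rng(40)
n = 40
X = rng.normal(size=(n, 4))
y = 2.0 + X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(0.0, 1.0, n)

A = np.column_stack([np.ones(n), X])        # intercept plus X1..X4
b, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ b
resid = y - fitted                          # e = Y - (b0 + b1X1 + ... + b4X4)

# Text histogram (parts b/h): look for a single, roughly symmetric hump.
counts, edges = np.histogram(resid, bins=6)
for c, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:7.2f} to {right:7.2f} | {'*' * int(c)}")

# Sample skewness; values near 0 are consistent with symmetric errors.
z = (resid - resid.mean()) / resid.std()
print("skewness:", round(float(np.mean(z**3)), 3))

# Residuals vs fitted (parts c/i): compare spread across low/middle/high fitted values.
groups = np.array_split(np.argsort(fitted), 3)
spreads = [resid[g].std(ddof=1) for g in groups]
print("residual SD by fitted-value third:", np.round(spreads, 3))
```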
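For part k), Mallows' Cp compares each subset of predictors with the full four-predictor model: Cp = SSE_p / MSE_full - (n - 2p), where p counts the parameters including the intercept and MSE_full = SSE_full / (n - 5). Subsets with Cp close to or below p are candidates. The sketch below assumes NumPy and a simulated sample in which, by construction, only X1 and X2 matter; it computes Cp for all 15 subsets. Stepwise regression itself is not sketched here and would normally be run in a statistics package.

```python
import itertools

import numpy as np

# Simulated placeholder sample (40 rows); only X1 and X2 truly matter, by construction.
rng = np.random.default_rng(7)
n = 40
X = rng.normal(size=(n, 4))
y = 3.0 + 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0.0, 1.0, n)


def sse(cols):
    """Residual sum of squares of y regressed on an intercept plus the given columns."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ b
    return float(r @ r)


full = (0, 1, 2, 3)
mse_full = sse(full) / (n - len(full) - 1)  # error mean square of the full model

results = []
for k in range(1, 5):
    for cols in itertools.combinations(full, k):
        p = k + 1                            # parameters including the intercept
        cp = sse(cols) / mse_full - (n - 2 * p)
        results.append((cols, p, cp))

# List every subset in order of Cp; compare each Cp with its p.
for cols, p, cp in sorted(results, key=lambda t: t[2]):
    names = "+".join(f"X{c + 1}" for c in cols)
    print(f"{names:12s} p={p}  Cp={cp:6.2f}")
```

By construction the full model always has Cp = p = 5, so it is a reference point rather than evidence in its own favor.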
2) Download the advertising expenditure data.
a) Plot the impressions (Y) versus the expenditures (X), together with the least-squares regression line.
b) Fit a simple linear regression model relating Y to X.
c) Plot the residuals versus the fitted values. Comment on any patterns.
d) Repeat parts a) – c) for a second-order (quadratic) polynomial regression model.
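For problem 2, the straight-line and quadratic fits can be compared numerically. The sketch below assumes NumPy and uses simulated expenditures and impressions with invented diminishing returns, since the advertising file is not reproduced here; `fit_poly` is a hypothetical helper around `np.polyfit`, which fits each polynomial by least squares. Listing the residual signs in order of fitted value exposes curvature that a straight line misses: a run of minus signs, then plus, then minus, is the kind of pattern part c) asks you to comment on.

```python
import numpy as np

# Simulated placeholder data: expenditures x and impressions y with diminishing returns.
# The curve and noise level are invented; replace with the advertising data.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 100.0, 21))
y = 20.0 + 1.2 * x - 0.008 * x**2 + rng.normal(0.0, 3.0, x.size)


def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns coefficients (highest power first), fitted values, R^2."""
    coef = np.polyfit(x, y, degree)
    fitted = np.polyval(coef, x)
    resid = y - fitted
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, fitted, r2


for degree in (1, 2):
    coef, fitted, r2 = fit_poly(x, y, degree)
    resid = y - fitted
    # A systematic sign pattern along the fitted values signals missed curvature;
    # a random mix of signs around 0 does not.
    signs = "".join("+" if r > 0 else "-" for r in resid[np.argsort(fitted)])
    print(f"degree {degree}: coef={np.round(coef, 4)}  R^2={r2:.3f}")
    print("  residual signs by fitted value:", signs)
```

Because the quadratic model contains the straight line as a special case, its R² can never be lower; the residual pattern, not R² alone, is what shows whether the extra term is needed.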