Statistics 103: Probability and Statistical Inference
Instructions for Lab 9

Lab Objective


Practice with multiple linear regression, residual diagnostics, and dummy variables.


Lab Procedures

Many macroeconomic studies use cross-sectional data (i.e., data from the same time frame) from countries around the world. Of particular interest are the factors related to Gross National Product (GNP), which is essentially the amount of money the country produces from all sources.


Open the data set countries.JMP. It contains economic data for 97 countries from around the world. All monetary values are expressed in U.S. dollars. The variables include:


GNP (per capita): the GNP divided by the number of people in the country.
Birth Rate (per 1000): the number of births per 1000 people in the country.
Death Rate (per 1000): the number of deaths per 1000 people in the country.
Infant Deaths (per 1000): the number of infant deaths per 1000 people in the country.
Life Expectancy (Males): average age at death for men.
Life Expectancy (Females): average age at death for women.
Region: a code for the country's region, where
    1 = Eastern European and former Soviet Union countries
    2 = South American and Central American countries
    3 = "Western" countries (e.g., France, Japan, USA)
    4 = Middle Eastern countries
    5 = South Asian countries
    6 = African countries
Country: name of the country.
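
The Region codes are labels rather than quantities, so they should enter a regression as indicator (dummy) variables and not as the numbers 1 through 6. The following Python sketch illustrates that coding outside of JMP. The function names and the five example codes are made up for illustration and are not taken from countries.JMP.

```python
# Region codes as listed in the variable description above.
REGION_LABELS = {
    1: "Eastern Europe / former Soviet Union",
    2: "South and Central America",
    3: "Western",
    4: "Middle East",
    5: "South Asia",
    6: "Africa",
}


def western_indicator(region_code):
    """Return 1 for a Western country (Region == 3), otherwise 0."""
    if region_code not in REGION_LABELS:
        raise ValueError(f"unknown region code: {region_code}")
    return 1 if region_code == 3 else 0


def region_dummies(region_code, baseline=3):
    """Return one 0/1 indicator per region, leaving out the baseline region."""
    if region_code not in REGION_LABELS:
        raise ValueError(f"unknown region code: {region_code}")
    return {
        code: int(region_code == code)
        for code in sorted(REGION_LABELS)
        if code != baseline
    }


# Hypothetical region codes for five rows (not real data).
example_codes = [3, 6, 1, 3, 5]
example_western = [western_indicator(code) for code in example_codes]
print(example_western)  # [1, 0, 0, 1, 0]
```

With six regions, only five indicators are needed alongside an intercept; the omitted region serves as the comparison group.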

·  Fitting a multiple linear regression model

1.  Analyze → Fit Model

2.  Select the response variable and add to the Y box.

3.  Select each predictor and add to the Construct Model Effects box.

4.  To add an interaction, highlight a variable in the Select Columns list and a variable in the Construct Model Effects box, then click Cross. You should see the interaction term in the Construct Model Effects box.

5.  Click Run Model.
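
For readers who want to see what the Cross button adds to the model, the steps above can be sketched in Python. This is a minimal illustration, not JMP's implementation: it builds a design row with an intercept, two predictors, and their product, then solves the least-squares normal equations. All function names, the data, and the "true" coefficients are invented for the example. JMP's own parameterization of crossed terms may differ (for example, through centering), so its coefficients need not match term by term.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, y):
    """Return least-squares coefficients for the given design rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def design_row(x1, x2):
    """Intercept, two main effects, and their product (the crossed term)."""
    return [1.0, x1, x2, x1 * x2]


# Made-up data: x1 is a continuous predictor, x2 is a 0/1 indicator.
x1_values = [1, 2, 3, 4, 5, 6]
x2_values = [0, 0, 0, 1, 1, 1]
true_coefs = [2.0, 0.5, -1.0, 0.25]  # invented, used only to generate y

rows = [design_row(a, b) for a, b in zip(x1_values, x2_values)]
y = [sum(c * v for c, v in zip(true_coefs, row)) for row in rows]

estimated = fit_least_squares(rows, y)
print([round(c, 6) for c in estimated])
```

Because the responses are generated without noise, the fit recovers the invented coefficients. The last coefficient is the difference in the x1 slope between the group with x2 = 1 and the group with x2 = 0.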


Questions

1.  Does a normal curve describe the distribution of per capita GNP well?

2.  What is the regression equation for predicting per capita GNP (Y) from birth rate (X)?

3.  By how much do the per capita GNPs typically deviate from the regression line?

4.  Click on the red arrow beside Linear Fit and select Plot Residuals. Does the plot of residuals versus the predictor suggest any violations of the regression assumptions? If so, what are they?

5.  Let's do the regression using the (natural) logarithm of per capita GNP as the dependent variable.

a.  What is the regression equation for predicting the logarithm of per capita GNP (Y) from birth rate (X)?

b.  Also plot the residuals versus predictor. Are the linear model assumptions more appropriate?

6.  Interpret the effect of birth rate on LogGNP and give a 90% confidence interval for the true regression slope.

7.  Fit a linear model for LogGNP and include both birth rate and death rate simultaneously.

a.  Interpret the slope coefficient for birth rate.

b.  How does the slope of birth rate change between the models with and without death rate as a predictor?

c.  Describe the relationship between birth rate and death rate.

For the following questions, use the indicator (dummy) variables for country region to fit an appropriate model. Write down the model and the estimated regression coefficients. Justify your conclusion by either reporting a confidence interval or carrying out a hypothesis test.

8.  After controlling for birth rate and death rate, is there evidence that Western countries have higher log GNP than the other countries?

9.  After controlling for death rate, is there evidence that the association between birth rate and log GNP differs between Western countries and the other countries?
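
As a cross-check on the quantities that questions 2 through 6 ask for, the following Python sketch computes a simple regression on the log scale, its residuals, the root mean square error, and a confidence interval for the slope. It is only an illustration under stated assumptions: the five data points are made up (they are not from countries.JMP), the function names are hypothetical, and the t critical value is supplied by hand from a t table.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y on x with residuals, RMSE, and slope SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    rmse = math.sqrt(sse / (n - 2))  # n - 2 degrees of freedom
    se_slope = rmse / math.sqrt(sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "rmse": rmse,
        "se_slope": se_slope,
    }


def slope_interval(fit, t_crit):
    """Confidence interval: slope plus or minus t_crit times its SE."""
    half_width = t_crit * fit["se_slope"]
    return fit["slope"] - half_width, fit["slope"] + half_width


# Made-up values for illustration only.
birth_rate = [10, 20, 30, 40, 50]
gnp = [20000, 9000, 3000, 1200, 400]
log_gnp = [math.log(g) for g in gnp]  # natural logarithm

fit = simple_regression(birth_rate, log_gnp)

# 90% interval: upper 5% point of t with n - 2 = 3 df, from a t table.
low, high = slope_interval(fit, t_crit=2.353)

# On the log scale, a one-unit increase in x multiplies the predicted
# response on the original scale by exp(slope).
multiplier = math.exp(fit["slope"])
print(fit["slope"], (low, high), multiplier)
```

For the lab data, the degrees of freedom would be 97 - 2 = 95 if no values are missing, for which the upper 5% point of the t distribution is about 1.661. Plotting the residuals against the predictor is what reveals curvature or non-constant spread.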