ST3900/4950 HOMEWORK/LAB 5 Spring, 2015
Material covered: Correlation Analysis and Simple Linear Regression
ST 3900 Lessons 31, 32, 33
ST 4950 Chapter 5 excluding Section G
______
Question 1
Betsy is interested in relating the quality of college professors' teaching to the quality of their research. She has access to a sample of 50 social science professors who taught at the same university over a 10-year period. During that period, the professors were rated on a 5-point scale for their quality as instructors and for the quality of their courses. Betsy averaged these ratings to obtain, for each professor, an overall quality rating as an instructor (rating_1) and an overall course quality rating (rating_2). She also has the number of articles each professor published during this period (num_pubs) and the number of times those articles were cited by other authors (cites). The data file, qn1.txt, is provided on the class website.
(a) Conduct a correlation analysis to investigate the relationships among these variables. Identify the following on the output:
- P-value for the correlation between rating_1 and rating_2
- Correlation between cites and num_pubs
- Correlation between cites and rating_1
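A minimal Python sketch of part (a), assuming qn1.txt is whitespace-delimited with a header row naming the columns (for example rating_1, rating_2, num_pubs, cites); that layout is an assumption, so check the file first. For each pair it computes Pearson's r and the two-sided p-value of the usual test of rho = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The function names are hypothetical, and the p-value comes from numerical (Simpson's rule) integration of the t distribution rather than a statistics library, so treat your course software's output as the reference.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom.

    Substituting x = sqrt(df) * tan(theta) turns P(0 <= T <= |t|) into
    K * integral_0^theta_t cos(theta)**(df - 1) d(theta) over a bounded
    interval, which Simpson's rule handles well (steps must be even).
    """
    theta_t = math.atan(abs(t) / math.sqrt(df))  # atan(inf) = pi / 2
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta_t / steps
    total = 1.0 + math.cos(theta_t) ** (df - 1)  # endpoints; cos(0) ** (df - 1) = 1
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    half_area = k * total * h / 3  # approximates P(0 <= T <= |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * half_area))


def pearson(x, y):
    """Return (r, two-sided p-value) for H0: rho = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))
    if abs(r) == 1.0:
        return r, 0.0  # perfect linear relationship
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t_two_sided_p(t, n - 2)


def load_columns(path):
    """Read a whitespace-delimited file whose first line names the columns.

    The layout of qn1.txt is assumed here, not confirmed.
    """
    with open(path) as fh:
        names = fh.readline().split()
        cols = {name: [] for name in names}
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            for name, value in zip(names, fields):
                cols[name].append(float(value))
    return cols


def correlation_table(cols):
    """Print r and its p-value for every pair of columns."""
    names = list(cols)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r, p = pearson(cols[a], cols[b])
            print(f"{a:>9} vs {b:<9} r = {r:+.4f}  p = {p:.4g}")
```

Usage, under the same file-layout assumption: `correlation_table(load_columns("qn1.txt"))`. The tangent substitution keeps the integration interval bounded, so very large |t| values do not need special handling.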
(b) What is the relationship (linear, non-linear, or none) between the number of articles published and the overall quality rating of the instructor? Provide a scatterplot to support your answer.
(c) Fit a regression line of overall instructor quality (rating_1) on the number of articles published (num_pubs). Interpret the slope in practical terms, and check the regression assumptions.
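A minimal sketch of the least-squares fit in part (c), assuming num_pubs and rating_1 have already been read into equal-length Python lists; the function names are hypothetical. The slope b1 estimates the change in the mean instructor rating for each additional published article, and the residuals are what you would plot to check linearity, constant variance, and approximate normality.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx          # slope: change in mean y per unit increase in x
    b0 = my - b1 * mx       # intercept: the line passes through (mean x, mean y)
    return b0, b1


def residual_summary(x, y):
    """Fitted values and residuals for the usual assumption checks.

    Plot residuals against fitted values (look for curvature or a funnel
    shape) and inspect a normal probability plot of the residuals.
    """
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    return fitted, resid
```

For example, `fit_line(num_pubs, rating_1)` returns the intercept and slope of rating_1 on num_pubs.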
(d) (ST3900 only) Create a scatterplot matrix showing the relationships among the four variables.
(e) Write a Results/Summary section based on your analysis of these data.
Question 2
Given the following data:
X    Y    Z
1    3    15
7    13   7
8    12   5
3    4    14
4    7    10
(a) Create a correlation matrix for the three variables. Identify the Pearson correlation coefficients between X and Y and between X and Z. Which pair(s) are significantly linearly associated? Report the p-values that support your answer.
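The significance question in part (a) rests on the standard t-test of H0: rho = 0 for a Pearson correlation, which is what most statistical packages report alongside r. Written out, with r_{XY} the sample correlation of X and Y:

```latex
r_{XY} = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}, \qquad
S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2

t = \frac{r_{XY}\sqrt{n-2}}{\sqrt{1 - r_{XY}^{2}}} \sim t_{n-2}
\quad \text{under } H_0\colon \rho = 0
```

With n = 5 there are 3 degrees of freedom, so at the two-sided 0.05 level the test rejects when |t| > 3.182, which is equivalent to |r| > 0.878.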
For the remaining parts, regress Y on X.
(b) Draw a scatterplot of Y vs. X with a regression line.
(c) Compute the regression line of Y on X (i.e., Y is the response variable and X is the explanatory variable). What are the slope and intercept? Are they significantly different from 0?
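Testing whether the slope and intercept differ from 0 means a t-test on each coefficient with n - 2 degrees of freedom. The plain-Python sketch below (the function name slr_coefficients is made up for illustration) computes the estimates, their standard errors, and the t statistics on the Question 2 data; compare each |t| with t(0.975, 3) = 3.182, or read the p-values from your statistical software.

```python
import math

# Question 2 data (Z is not used in the regression of Y on X).
X = [1, 7, 8, 3, 4]
Y = [3, 13, 12, 4, 7]


def slr_coefficients(x, y):
    """Least-squares slope and intercept with standard errors and t statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))                  # residual standard error
    se_b1 = s / math.sqrt(sxx)                    # SE of the slope
    se_b0 = s * math.sqrt(1 / n + mx ** 2 / sxx)  # SE of the intercept
    return {
        "b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
        "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1,
        "df": n - 2,
    }


coef = slr_coefficients(X, Y)
for name in ("b0", "b1"):
    print(f"{name} = {coef[name]:.4f}, SE = {coef['se_' + name]:.4f}, "
          f"t = {coef['t_' + name]:.3f} on {coef['df']} df")
```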
(d) Does this regression line fit the data well? Provide a statistic to justify your answer. Check the assumptions of the regression model; do they hold here?
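The statistic most often quoted for goodness of fit is the coefficient of determination. In simple linear regression it equals the square of the Pearson correlation r_{XY} between X and Y:

```latex
R^{2} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
      = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad R^{2} = r_{XY}^{2}
```

For the assumption checks, the residuals e_i = y_i - \hat{y}_i should show no pattern against the fitted values, have roughly constant spread, and look approximately normal; with only five observations these checks have very little power, so treat them as rough guides.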
(e) Create a new variable LY, the natural log of Y. Repeat (c) and (d) using LY as the response.
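Part (e) changes only the response, so a quick comparison is to recompute R^2 and the residuals for Y and for LY = ln(Y) on the same X values. This is a hedged sketch with illustrative function names. R^2 values computed on different response scales are not strictly comparable, so the residual patterns deserve as much weight as R^2 when answering (f).

```python
import math

# Question 2 data; LY is the natural log of Y.
X = [1, 7, 8, 3, 4]
Y = [3, 13, 12, 4, 7]
LY = [math.log(v) for v in Y]


def r_squared(x, y):
    """R^2 of the least-squares line of y on x (equals the squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def residuals(x, y):
    """Residuals y - (b0 + b1 * x) from the least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


for label, response in (("Y", Y), ("LY", LY)):
    print(label, "R^2 =", round(r_squared(X, response), 4))
    print("  residuals:", [round(e, 3) for e in residuals(X, response)])
```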
(f) Which model would you use: the model for Y from (c) and (d), or the model for LY from (e)? Why?