ST3900/4950 HOMEWORK/LAB 5 Spring, 2015

Covered materials: Correlation Analysis & Simple Linear Regression

ST3900: Lessons 31, 32, and 33

ST4950: Chapter 5, excluding Section G

______

Question 1

Betsy is interested in relating the quality of teaching to the quality of research among college professors. She has access to a sample of 50 social science professors who taught at the same university over a 10-year period. During this period, the professors were evaluated on a 5-point scale on their quality as instructors and on the quality of their courses. Betsy averaged these ratings to obtain, for each professor, an overall quality rating as an instructor (rating_1) and an overall quality rating of the course (rating_2). She also has the number of articles each professor published during this period (num_pubs) and the number of times these articles were cited by other authors (cites). The data file, qn1.txt, is provided on the class website.

(a)  Conduct a correlation analysis to investigate the relationships among these variables. Identify the following on the output:

  1. P-value for the correlation between rating_1 and rating_2
  2. Correlation between cites and num_pubs
  3. Correlation between cites and rating_1
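A minimal stdlib-only Python sketch of part (a), offered as a way to check your software output rather than as the required method. It assumes qn1.txt is whitespace-delimited with a header row naming the columns rating_1, rating_2, num_pubs, and cites; that layout is an assumption, so check it against the actual file. For each pair it computes Pearson's r and a two-sided p-value for H0: rho = 0, using t = r*sqrt(n-2)/sqrt(1-r^2) with n-2 degrees of freedom. The p-value integrates the t density numerically with Simpson's rule, so it is an approximation.

```python
import math
from itertools import combinations


def load_columns(path):
    """Read a whitespace-delimited file whose first row holds column names (ASSUMED layout)."""
    with open(path) as f:
        names = f.readline().split()
        cols = {name: [] for name in names}
        for line in f:
            values = line.split()
            if not values:
                continue  # skip blank lines
            for name, v in zip(names, values):
                cols[name].append(float(v))
    return cols


def pearson_r(x, y):
    """Sample Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2 * total * h / 3)


def correlation_table(cols):
    """Return {(name1, name2): (r, p)} for every pair of columns."""
    table = {}
    for a, b in combinations(cols, 2):
        n = len(cols[a])
        r = pearson_r(cols[a], cols[b])
        if abs(r) >= 1.0:
            p = 0.0  # perfect correlation: t is infinite
        else:
            t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
            p = two_sided_p(t, n - 2)
        table[(a, b)] = (r, p)
    return table
```

Usage, assuming the file layout above: `correlation_table(load_columns("qn1.txt"))[("rating_1", "rating_2")]` gives (r, p) for item 1; the keys follow the column order in the header row.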

(b)  What is the relationship (linear, non-linear, or none) between the number of articles published and the overall quality of the instructor? Provide a scatterplot to support your answer.

(c)  Fit the regression line of the overall quality of the instructor (rating_1) on the number of articles published (num_pubs). Interpret the slope in practical terms. Check the regression assumptions.
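A short sketch of the least-squares computation behind part (c), using the textbook formulas b1 = Sxy/Sxx and b0 = ybar - b1*xbar. It returns fitted values and residuals because the assumption checks (residuals against fitted values, a normal probability plot of the residuals) are built from them. Here x would be num_pubs and y would be rating_1, loaded from qn1.txt however you read the file. On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept and can serve as a cross-check.

```python
def fit_line(x, y):
    """Least-squares simple linear regression of y on x.

    Returns (b0, b1, fitted, residuals), with b1 = Sxy / Sxx and b0 = ybar - b1 * xbar.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]  # observed minus fitted
    return b0, b1, fitted, residuals
```

The slope b1 is the estimated change in mean rating_1 for each additional article published, which is the practical interpretation part (c) asks for.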

(d)  (ST3900 only) Create a scatterplot matrix to show the relationships among the four variables.

(e)  Write a Results/Summary section based on your analysis of these data.


Question 2

Given the following data:

X    Y    Z
1    3    15
7    13   7
8    12   5
3    4    14
4    7    10

(a)  Create a correlation matrix for the three variables. Identify the Pearson correlation coefficients between X and Y and between X and Z. Which pair(s) are significantly linearly associated? Report the p-values that support your answer.
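A self-contained Python sketch for checking part (a) by hand, with the five observations typed in from the table. It prints the 3 x 3 Pearson correlation matrix, then r and an approximate two-sided p-value for X-Y and X-Z, using the test statistic t = r*sqrt(n-2)/sqrt(1-r^2) with n-2 = 3 degrees of freedom. The p-value comes from numerically integrating the t density, so treat it as an approximation to what a statistics package reports.

```python
import math

# Data from the table in Question 2 (n = 5).
data = {
    "X": [1, 7, 8, 3, 4],
    "Y": [3, 13, 12, 4, 7],
    "Z": [15, 7, 5, 14, 10],
}


def pearson_r(x, y):
    """Sample Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2 * total * h / 3)


def r_and_p(x, y):
    """Pearson r and two-sided p-value for H0: rho = 0 (assumes |r| < 1)."""
    n = len(x)
    r = pearson_r(x, y)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return r, two_sided_p(t, n - 2)


names = list(data)
# Correlation matrix; diagonal entries are 1 by definition.
matrix = {(a, b): 1.0 if a == b else pearson_r(data[a], data[b]) for a in names for b in names}
for a in names:
    print(a, " ".join(f"{matrix[(a, b)]:+.3f}" for b in names))

for a, b in [("X", "Y"), ("X", "Z")]:
    r, p = r_and_p(data[a], data[b])
    print(f"r({a},{b}) = {r:+.3f}, two-sided p = {p:.4f}")
```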

Parts (b) through (d) concern the regression of Y on X.

(b)  Draw a scatterplot of Y vs. X with a regression line.

(c)  Compute the regression line of Y on X, i.e., Y is the response variable and X is the explanatory variable. What are the slope and intercept? Are they significantly different from 0?
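One way to check part (c) is sketched below in plain Python. It fits Y on X by least squares and tests H0: b0 = 0 and H0: b1 = 0 with t = estimate/SE on n-2 degrees of freedom, using SE(b1) = s/sqrt(Sxx) and SE(b0) = s*sqrt(1/n + xbar^2/Sxx), where s is the residual standard error. The p-values come from numerical integration of the t density and are approximate. In simple linear regression the slope t statistic equals the t statistic for testing the correlation, so the slope p-value should match the X-Y p-value from part (a).

```python
import math

X = [1, 7, 8, 3, 4]
Y = [3, 13, 12, 4, 7]


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2 * total * h / 3)


def regression_tests(x, y):
    """Fit y = b0 + b1*x by least squares; t-test H0: b0 = 0 and H0: b1 = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + mx ** 2 / sxx)
    df = n - 2
    return {
        "intercept": (b0, se_b0, two_sided_p(b0 / se_b0, df)),
        "slope": (b1, se_b1, two_sided_p(b1 / se_b1, df)),
    }


for name, (est, se, p) in regression_tests(X, Y).items():
    print(f"{name:>9}: estimate = {est:.4f}, SE = {se:.4f}, p = {p:.4f}")
```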

(d)  Does this regression line fit the data well? Provide a statistic to justify your answer. Check the assumptions of the regression model; do they hold here?
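A small sketch of the fit statistic and residual check for part (d). It computes R^2 = 1 - SSE/SST for the line of Y on X and lists residuals ordered by fitted value, which is the information a residual-versus-fitted plot displays. With only five observations, patterns in the residuals (curvature, a funnel shape) are hard to judge, so read this as a rough check rather than a formal test.

```python
# Question 2 data
X = [1, 7, 8, 3, 4]
Y = [3, 13, 12, 4, 7]


def fit_summary(x, y):
    """Return (b0, b1, r_squared, fitted, residuals) for the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    sse = sum(e ** 2 for e in residuals)  # error sum of squares
    sst = sum((b - my) ** 2 for b in y)  # total sum of squares
    return b0, b1, 1 - sse / sst, fitted, residuals


b0, b1, r2, fitted, resid = fit_summary(X, Y)
print(f"R^2 = {r2:.4f}")
# Residuals ordered by fitted value: look for curvature or changing spread.
for f, e in sorted(zip(fitted, resid)):
    print(f"fitted = {f:6.3f}  residual = {e:+.3f}")
```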

(e)  Create a new variable LY, the natural log of Y. Repeat (c) and (d) using LY as the response.
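A sketch of the transformation in part (e): it builds LY = ln(Y) with `math.log` (natural log) and fits both Y and LY on X so the estimates and R^2 can be placed side by side. Note that R^2 for LY measures explained variation on the log scale, so it is not directly comparable to R^2 for Y; the residual checks matter as much as R^2 when answering (f). Under the log model, exp(b1) is the estimated multiplicative change in Y per unit increase in X.

```python
import math

X = [1, 7, 8, 3, 4]
Y = [3, 13, 12, 4, 7]
LY = [math.log(v) for v in Y]  # natural log of Y


def fit_summary(x, y):
    """Return (b0, b1, r_squared, fitted, residuals) for the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    fitted = [b0 + b1 * a for a in x]
    residuals = [b - f for b, f in zip(y, fitted)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((b - my) ** 2 for b in y)
    return b0, b1, 1 - sse / sst, fitted, residuals


# Fit the original and the log-transformed response against the same X.
for label, resp in [("Y", Y), ("LY", LY)]:
    b0, b1, r2, fitted, resid = fit_summary(X, resp)
    print(f"{label:>2}: intercept = {b0:.4f}, slope = {b1:.4f}, R^2 = {r2:.4f}")
```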

(f)  Which model would you use: the model for Y from (c) and (d), or the model for LY from (e)? Why?