Correlation
1) A researcher has a large number of (age, height) data pairs for humans from birth to 70 years of age. He computes a correlation coefficient.
- Would you expect it to be positive or negative? Why?
- What would you suggest is a major problem with this approach?
Answer:
- Positive, since in general people's height increases with age.
- The underlying relationship is not linear. During the first few years of life, height increases rapidly; after the teenage years, height is essentially constant. A correlation coefficient measures the scatter about a straight line. The most commonly used correlation statistic, the Pearson correlation coefficient, measures both the strength and the direction of the linear relationship between two variables, so a single r computed over birth to 70 years summarizes this curved relationship poorly.
A better plan would be to restrict the data set to children only. Since older children are generally taller than younger children, we would expect the dots on the plot to roughly approximate a straight line (a linear relationship between the variables) and that the line will slope upward (since age and height tend to increase at the same time).
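To see why a straight-line measure struggles here, the sketch below computes Pearson's r on a hypothetical, stylized growth curve (not real data): height rises linearly from 50 cm at birth to 170 cm at age 18, then stays flat. The curve, the values 50, 170 and 18, and the helper names `pearson_r` and `stylized_height` are illustrative assumptions; real childhood growth is faster in the first years and not exactly linear.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation: co-variation of x and y scaled by both spreads
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def stylized_height(age):
    # HYPOTHETICAL growth curve (not real data): 50 cm at birth,
    # rising linearly to 170 cm at age 18, then constant
    return 50 + (120 / 18) * min(age, 18)

ages_all = list(range(0, 71))       # birth to 70 years
ages_children = list(range(0, 19))  # birth to 18 years only

r_all = pearson_r(ages_all, [stylized_height(a) for a in ages_all])
r_children = pearson_r(ages_children, [stylized_height(a) for a in ages_children])
print(f"r (0-70 years) = {r_all:.3f}, r (0-18 years) = {r_children:.3f}")
```

Over 0 to 18 years the stylized curve is exactly linear, so r is 1 up to rounding; over 0 to 70 years the plateau pulls r well below 1 even though height never decreases with age.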
2) A sample of size 40 on two variables produces a correlation coefficient of r = 0.682.
a. What is the point estimate for the population correlation coefficient, ρ?
b. Construct a 95% confidence interval for ρ.
Answer:
a. ρ̂ = r = 0.682.
b. s_r = 0.11864, df = 38, t_c = 2.024; 95% confidence interval: r ± t_c·s_r = (0.442, 0.922).
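The numbers in part b can be reproduced with a short stdlib-only sketch, assuming the standard error s_r = √((1 − r²)/(n − 2)) and the interval form r ± t_c·s_r, which match the values given in the answer. The critical value t_c = 2.024 (df = 38, 95% two-tailed) is taken from the answer rather than computed.

```python
import math

r, n = 0.682, 40
df = n - 2
t_c = 2.024  # 95% two-tailed critical t for df = 38, as used in the answer

s_r = math.sqrt((1 - r**2) / df)  # standard error of r
margin = t_c * s_r
lower, upper = r - margin, r + margin
print(f"s_r = {s_r:.5f}, 95% CI = ({lower:.3f}, {upper:.3f})")
# prints: s_r = 0.11864, 95% CI = (0.442, 0.922)
```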
3) A sample of size 50 produces a correlation coefficient r = 0.297.
Test the hypotheses:
H0: ρ = 0.
HA: ρ ≠ 0.
α=0.05.
What is the test statistic? What do you conclude?
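Answer: df = 48, s_r = 0.13782, t = r/s_r = 2.1549, two-tailed p-value ≈ 0.036 (the one-tailed value is 0.0181); since p-value < α (equivalently, |t| > t_c = 2.011), reject H0. There is evidence at the 5% level of a nonzero linear correlation.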
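A stdlib-only sketch of the test follows. It computes t = r/s_r with s_r = √((1 − r²)/(n − 2)) and gets the two-tailed p-value by integrating the Student's t density with Simpson's rule; the helper names `t_pdf` and `two_tailed_p` and the step count are illustrative choices, not part of the original problem. With SciPy available, `2 * scipy.stats.t.sf(abs(t), df)` would give the same p-value directly.

```python
import math

def t_pdf(x, df):
    # Student's t density; lgamma keeps the normalizing constant stable
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    # Simpson's rule for P(0 < T < |t|); steps must be even
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3
    return 2 * (0.5 - area)  # probability in both tails beyond |t|

r, n = 0.297, 50
df = n - 2
s_r = math.sqrt((1 - r**2) / df)  # standard error of r
t = r / s_r                       # test statistic
p = two_tailed_p(t, df)
print(f"s_r = {s_r:.5f}, t = {t:.4f}, two-tailed p = {p:.4f}")
```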
4) Some data are given as:
X / Y
1 / 16
2 / 23
4 / 35
3 / 28
5 / 44
6 / 40
3 / 22
8 / 61
9 / 82
a. Sketch a scatterplot.
b. Compute the correlation coefficient, r.
c. Compute the coefficients of the linear regression line, ŷ = b1x + b0.
d. What is the estimated value of y for x = 7?
Answer: a. [Scatterplot titled "Some Data": Y on the vertical axis, X on the horizontal axis.]
b. r = 0.963.
c. b1 = 7.540, b0 = 4.651; therefore, ŷ = 7.540x + 4.651.
d. For x = 7, the point estimate for y is 7.540(7) + 4.651 = 57.43.
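Parts b through d can be checked with a minimal stdlib-only sketch that uses the nine (x, y) pairs from the data above and the usual least-squares formulas b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. The variable names are illustrative.

```python
import math

# (x, y) pairs from problem 4
x = [1, 2, 4, 3, 5, 6, 3, 8, 9]
y = [16, 23, 35, 28, 44, 40, 22, 61, 82]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson correlation coefficient
b1 = s_xy / s_xx                   # least-squares slope
b0 = mean_y - b1 * mean_x          # least-squares intercept
y_hat_7 = b1 * 7 + b0              # point estimate at x = 7

print(f"r = {r:.3f}, y = {b1:.3f}x + {b0:.3f}, y_hat(7) = {y_hat_7:.2f}")
# prints: r = 0.963, y = 7.540x + 4.651, y_hat(7) = 57.43
```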
5) Suppose that you have decided to buy an ice cream truck and go into the ice cream business this summer instead of getting a summer job. While working for an ice cream company last summer, you recorded the temperature (in °F) and the sales (in dollars) for each day as research for your new business. Fitting a regression line to these data gives:
Sales = −762 + 18.53 × Temperature, R² = 47.1%
A) Which of the following is the proper interpretation of the slope?
a) For every one dollar increase in Sales, Temperature will increase on average by 18.53 degrees.
b) For every one degree increase in Temperature, Sales will increase on average by 18.53 dollars.
c) When the Temperature is 0 degrees, Sales will be 18.53 dollars, on average.
d) When the Sales are 0 dollars, Temperature will be 18.53 degrees, on average.
B) What is the correlation between these two variables?
a) 0.6862944
b) −0.6862944
c) 0.221841
d) −0.221841
Answer: A) b). This follows from the definition of slope: for every one-unit increase in x, the predicted y changes by b1 units on average.
B) a). r = √R² = √0.471 = 0.6862944. The slope is positive, so r is also positive.
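Both answers can be checked numerically with a small sketch, assuming simple (one-predictor) regression, where |r| = √R² and r takes the sign of the slope. The temperatures 80 °F and 81 °F are arbitrary illustrative inputs, and `predicted_sales` is a hypothetical helper name.

```python
import math

b0, b1 = -762, 18.53  # intercept and slope of the fitted line
r_squared = 0.471     # R² = 47.1%

def predicted_sales(temp_f):
    # Fitted line: Sales = -762 + 18.53 * Temperature
    return b0 + b1 * temp_f

# Slope: one extra degree raises predicted sales by b1 dollars
slope_effect = predicted_sales(81) - predicted_sales(80)

# Simple regression: |r| = sqrt(R²), with the sign of the slope
r = math.copysign(math.sqrt(r_squared), b1)
print(f"change per degree = {slope_effect:.2f} dollars, r = {r:.7f}")
# prints: change per degree = 18.53 dollars, r = 0.6862944
```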