AP Stats – Chap 27
Inferences for Regression
Finally, we’re interested in examining how slopes of regression lines vary from sample to sample. Each sample will have its own slope, b1. These are all estimates of the “true” slope, β1. When the sample slopes are standardized, (b1 – β1) / SE(b1), their distribution follows a t-model with (n – 2) degrees of freedom.
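To see the sample-to-sample variation for yourself, here is a minimal simulation sketch. Everything in it is made up for illustration (the "true" slope of 2, the intercept of 5, the noise level, the sample size of 10); none of it comes from the chapter's examples.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Least-squares slope b1 for paired data."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(true_slope=2.0, true_intercept=5.0, sigma=1.0,
                    n=10, reps=2000, seed=1):
    """Draw `reps` samples of size n from a hypothetical linear
    population and return the fitted slope b1 from each sample."""
    rng = random.Random(seed)
    xs = list(range(n))
    slopes = []
    for _ in range(reps):
        ys = [true_intercept + true_slope * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return slopes

slopes = simulate_slopes()
print("mean of sample slopes:", round(statistics.fmean(slopes), 3))
print("SD of sample slopes:  ", round(statistics.stdev(slopes), 3))
```

The sample slopes should pile up around the true slope, and their spread is what SE(b1) estimates from a single sample.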
Things you will need to recall about regression from Chapters 7-9:
- how do we make a scatterplot? Enter data into L1 and L2
- what do we look for in a scatterplot? A straight-line pattern
- how do we find the correlation? r, from STAT → CALC → LinReg(a+bx)
- what does it mean? How closely the data follow a straight line (the strength and direction of the linear association).
- what does R² mean? The % of the variation in y that is explained by the linear model with x
- how do we create a linear model (regression equation)?
- which variable goes where in the equation, and which gets the hat on it?
Y is the response variable and gets the hat because it is the predicted value
X is the explanatory variable
- how do we know if a linear model is appropriate? We can tell by looking at the residual plot. It has to be boring and random.
- how do we create a residuals plot? Once you have run LinReg(a+bx), make a scatterplot with Xlist: L1 and Ylist: RESID
- what does the slope mean…in context?
The predicted dependent variable changes by ______ for every one-unit increase in the independent variable.
- what does the y-intercept mean…in context?
If the independent variable were 0, the model predicts that the dependent variable would be ___.
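If you want to check the calculator's LinReg(a+bx) output by hand, the recall items above can be sketched in a few lines of Python. The data below are hypothetical, and the names L1, L2 and linear_model are just illustrative stand-ins for the calculator lists and command.

```python
import math

def linear_model(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope
    a = y_bar - b * x_bar            # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation
    return a, b, r

# Hypothetical data, not from any example in these notes
L1 = [1, 2, 3, 4, 5]
L2 = [2, 4, 5, 4, 5]

a, b, r = linear_model(L1, L2)
# residual = observed y minus predicted y-hat
residuals = [y - (a + b * x) for x, y in zip(L1, L2)]

print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}, R^2 = {r * r:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
```

Plotting the residuals against L1 gives the residuals plot; for a least-squares line they always sum to zero.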
Regression Slope t-Test
HYPOTHESIS
null: …within the population, there is no association between the variables (that we see in the example).
…that the ideal regression line is plain and boring: β1 = 0 (a horizontal line).
alternative: …there is some relationship between the variables.
…β1 ≠ 0
MODEL
conditions must be checked in order
Straight Enough Condition – is the scatterplot of the original data straight enough? Check the residuals plot! You may need to re-express.
Independence Condition – this is nearly impossible to check, so check for Randomization. Often, the fact that the individuals are a representative sample of the population is the best that can be done.
Does the Plot Thicken? Condition – the spread of the data around the regression “line” should be nearly constant. No fan shape! No growing or shrinking tendencies. Again…look at the residuals plot!
Nearly Normal Condition – make a histogram of the residuals. It needs to be symmetric and unimodal enough.
If all four conditions are true, the ideal regression line would look like:
“With the conditions having been met, we can use a regression model for the distribution and a linear regression t-test.”
MECHANICS
if you have the individual data:
- enter data into L1 and L2
- STAT
- TESTS
- LinRegTTest
- Xlist:L1
- Ylist:L2
- Freq:1
- choose the two-tailed (≠) option
- RegEQ:
- CALCULATE
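For anyone curious what LinRegTTest is doing behind the screen, here is a rough sketch in plain Python. It is not the calculator's actual routine: the p-value is approximated by numerically integrating the t-density (Simpson's rule), and the data and the function names (t_pdf, two_sided_p, lin_reg_t_test) are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-model with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t|.
    The area is approximated with Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_0_to_t)

def lin_reg_t_test(xs, ys):
    """Slope t-test of H0: beta1 = 0 against HA: beta1 != 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))     # spread of the residuals
    se_b1 = s / math.sqrt(sxx)       # standard error of the slope
    t = b1 / se_b1
    df = n - 2
    return {"b0": b0, "b1": b1, "se_b1": se_b1,
            "t": t, "df": df, "p": two_sided_p(t, df)}

# Hypothetical data, not from any example in these notes
L1 = [1, 2, 3, 4, 5]
L2 = [2, 4, 5, 4, 5]
result = lin_reg_t_test(L1, L2)
for name, value in result.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Note that the degrees of freedom are n – 2, matching the t-model described at the start of the chapter.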
if you have a computer regression analysis:
- if the t-value is not given in the analysis, you’ll need to calculate it: t = (b1 – 0) / SE(b1)
draw a t-curve and shade it
list the p-value
list the regression equation (found in Y1)
CONCLUSION
reject / fail to reject
“There is evidence that…” (provide context!)
confidence interval:
- if you have a TI-84/84+:
- STAT
- TESTS
- LinRegTInt
- if you don’t have a TI-84/84+:
- find the t* value as before. (here, the df = n – 2)
- interval is b1 ± t* × SE(b1)
“We are ___% confident that the average __(dependent variable)__ increases/decreases/rises/falls/faster/slower/etc. between _(low)_ and _(high)_ __(units)__ for each additional __(independent variable)__.”
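As a quick sketch of the by-hand route: the interval is the sample slope plus or minus t* times the standard error of the slope. The function name and the numbers below are hypothetical; the t* of 3.182 is the usual table value for 95% confidence with df = 3 and is supplied by hand, just as you would look it up.

```python
def slope_confidence_interval(b1, se_b1, t_star):
    """Return (low, high) for b1 +/- t* x SE(b1).
    t_star must be looked up for df = n - 2 and the chosen confidence level."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

# Hypothetical regression output: b1 = 0.6, SE(b1) = 0.2828, n = 5 (so df = 3)
low, high = slope_confidence_interval(b1=0.6, se_b1=0.2828, t_star=3.182)
print(f"95% CI for the slope: ({low:.2f}, {high:.2f})")
```

If the interval contains 0, that agrees with failing to reject the null in the two-tailed test at the matching significance level.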
Example #1
High Stakes Test
New state requirements force students to take a “high stakes” math
test in order to graduate from high school. Faced with such a
pressure-laden situation, many students become very nervous, which
may interfere with their ability to perform well. Concerned about “test
anxiety,” a researcher enlists 24 student volunteers for a study.
A psychologist interviews them before the math test, assessing their
anxiety levels on a scale from 1 to 10. The table shows the anxiety
levels and exam scores.
1. Sketch a scatterplot.
2. Does there appear to be an association between anxiety level and
test score? Describe what you see in the scatterplot.
There is a weak, negative, linear association between anxiety level and math test score. Generally, lower anxiety levels are associated with higher test scores.
3. Find the correlation. What does it indicate? A correlation of -0.525 indicates that there is a weak, negative linear relationship between anxiety level and test score.
4. Interpret the R² in context. About 28% of the variation in math test scores is explained by the linear model with anxiety level.
5. Create the linear model.
6. Is this linear model appropriate? Sketch and discuss the residuals plot.
The data are roughly linear, and the residuals plot shows random scatter with no clear pattern, so a linear model is appropriate.
7. Interpret the slope of this line in context. For each additional unit of anxiety, the model predicts the student’s score to be about 4.49 points lower.
8. Interpret the y-intercept of this line in context. If a student had no anxiety at all, the model would predict a test score of 91.66.
9. Is there evidence of an association between anxiety levels and student performance?
(Perform a test.)
Hypothesis:
H0: There is no linear association between anxiety level and math test score (β1 = 0).
HA: There is a linear association between anxiety level and math test score (β1 ≠ 0).
Model
Straight enough condition: There is a somewhat linear pattern in the original data.
Independence condition: Test scores from individual students can be assumed to be independent.
Does the plot thicken? condition: The residuals plot is fairly scattered. The residuals plot and the scatterplot both show relatively consistent spread.
Nearly normal: The histogram of the residuals is unimodal and symmetric enough.
Under these conditions, we can use a regression model and perform a linear regression t-test.
Mechanics:
Conclusion: With a low p-value, we reject the null. There is evidence of a linear association between anxiety level and math test score.
10. Provide a 95% confidence interval.
We are 95% confident that the average math test score decreases by between 1.27 and 7.7 points for each additional unit of anxiety.
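Since the calculator screens for this example are not reproduced here, a rough consistency check is possible from the numbers that are given. The sketch below uses two standard facts for simple regression: the slope's t-value can be found from r and n, and a confidence interval can be "unpacked" into its center (the slope) and its standard error. The function names are made up, and the t* of 2.074 (95%, df = 22) is a table value supplied as an assumption, not something stated in the example.

```python
import math

def t_from_r(r, n):
    """t-statistic for the slope computed from the correlation and sample size."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def slope_and_se_from_interval(low, high, t_star):
    """Recover the slope (center) and SE (half-width / t*) from an interval."""
    b1 = (low + high) / 2
    se_b1 = (high - low) / (2 * t_star)
    return b1, se_b1

# Values quoted in Example #1: r = -0.525, n = 24, interval from -7.7 to -1.27
t_r = t_from_r(r=-0.525, n=24)
b1, se_b1 = slope_and_se_from_interval(low=-7.7, high=-1.27, t_star=2.074)
t_ci = b1 / se_b1

print(f"t from r and n:        {t_r:.2f}")
print(f"slope from interval:   {b1:.3f}")
print(f"t from interval:       {t_ci:.2f}")
```

The two t-values should agree closely, and the recovered slope should match the 4.49-point drop quoted in question 7, up to rounding.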
Example #2
Electricity Usage
Investigate the association between average monthly temperature and electrical usage (kilowatt hours) for a home.
Original data – avg temp (x) v. kwh (y)
Residual plot – avg temp (x) v. residuals (y)
Histogram of residuals
Is there evidence of an association between average monthly temperature and electrical usage?
Hypothesis
H0: There is no linear association between average temp and electrical usage (β1 = 0).
HA: There is a linear association between average temp and electrical usage (β1 ≠ 0).
Model
Straight enough condition: There is a somewhat linear pattern in the original data.
Independence condition: The months comprise a full year, which we can assume to be representative of all years, and the usage in one month is assumed to be independent of the usage in another month.
Does the plot thicken? condition: The residuals plot is fairly scattered. The residuals plot and the scatterplot both show relatively consistent spread. There is no thickening.
Nearly normal: The histogram of the residuals is unimodal and symmetric enough.
Under these conditions, we can use a regression model and perform a linear regression t-test.
Mechanics:
Conclusion: With a low p-value of less than 0.0001, we reject the null. There is strong evidence that electrical usage decreases as the average monthly temperature increases.
Explain the association using a 95% confidence interval.
We are 95% confident that each one-degree increase in average monthly temperature is associated with a decrease of between 39.81 and 62 kWh in average electrical usage.
Example #3
GPAs
Ten students in a graduate program were randomly selected. Their grade point averages (GPAs) when they entered the program were between 3.5 and 4.0. The students’ GPAs on entering the program and their current GPAs were recorded. Use the regression analysis below to answer the questions.
1. Create the linear model.
2. Interpret the p-value. With a high p-value of 0.121739, we fail to reject the null. There is not enough evidence of a linear association between students’ entering and current GPAs. (Failing to reject the null is not evidence that there is no association.)
3. Find a 95% confidence interval for the slope of the regression line.
We are 95% confident that for each 1-point increase in entering GPA, a student’s current GPA changes, on average, by between -0.56 and 0.62 points. Because this interval contains 0, a slope of 0 is plausible.
Example #4
Heights and Weights
Is the height of a man related to his weight?
The regression analysis from a sample of 26
men is shown.
- How many degrees of freedom are there? 26 – 2 = 24
- What is the t-value?
- Find a 98% confidence interval for the slope of the regression line.
We are 98% confident that for every one-inch increase in height, the weight increases, on average, by between 5.47 and 12 pounds.