AP Stats – Chap 27

Inferences for Regression

Finally, we’re interested in examining how the slopes of regression lines vary from sample to sample. Each sample has its own slope, b1. These are all estimates of the “true” population slope, β1. When the conditions below are met, the standardized slope, (b1 – β1)/SE(b1), follows a Student’s t-model with (n – 2) degrees of freedom.
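In symbols, using the usual textbook formulas (s_e is the residual standard deviation and s_x is the standard deviation of the x-values), the sampling model for the slope is:

```latex
t = \frac{b_1 - \beta_1}{SE(b_1)}, \qquad df = n - 2

SE(b_1) = \frac{s_e}{s_x\sqrt{n - 1}}, \qquad s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}
```

Under the null hypothesis β1 = 0, the test statistic reduces to t = b1 / SE(b1).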

Things you will need to recall about regression from Chapters 7-9:

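  • how do we make a scatterplot? Enter the data into L1 and L2, then graph it with STAT PLOT (ZoomStat to fit the window).
  • what do we look for in a scatterplot? A straight-line (linear) form, plus the direction, strength, and any outliers.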
  • how do we find the correlation, r?

STAT → CALC → LinReg(a+bx) (if r is not displayed, turn on DiagnosticOn from the CATALOG first)

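  • what does it mean? r measures the direction and strength of the linear relationship; the closer |r| is to 1, the more closely the points follow a straight line.
  • what does R² mean? R² is the percentage of the variation in y that is explained by the linear model on x.
  • how do we create a linear model (regression equation)? Use LinReg(a+bx) to get ŷ = a + bx (a is the y-intercept, b is the slope).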
  • which variable goes where in the equation, and which gets the hat on it?

y is the response variable; it gets the hat (ŷ) because it is the predicted value.

x is the explanatory variable.

  • how do we know if a linear model is appropriate? Look at the residuals plot. It has to be boring: random scatter with no pattern.
  • how do we create a residuals plot? After running LinReg(a+bx), make a scatterplot with Xlist: L1

Ylist: RESID (2nd LIST → NAMES → RESID)

  • what does the slope mean…in context?

The dependent variable is predicted to change by ______ (units), on average, for each one-unit increase in the independent variable.

  • what does the y-intercept mean…in context?

If the independent variable were 0, the model would predict the dependent variable to be ___.
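The LinReg(a+bx) outputs reviewed above (slope, intercept, r, R², and residuals) can be sketched in plain Python using the standard least-squares formulas. This is a minimal sketch for checking calculator work; the function name `fit_line` and the toy data are hypothetical, not from these notes.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line y-hat = b0 + b1*x, with r, R^2, and residuals.

    Hypothetical helper: roughly what LinReg(a+bx) reports (a = b0, b = b1),
    plus the residuals the calculator stores in RESID.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept: the line passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)      # correlation
    # Residual = observed y - predicted y-hat
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return {"b0": b0, "b1": b1, "r": r, "r_squared": r * r, "residuals": residuals}


# Toy data, made up for illustration
fit = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit["b0"], fit["b1"], fit["r_squared"])
```

With this toy data the fit is b1 = 0.6, b0 = 2.2, and R² = 0.6 (up to floating-point rounding), and the residuals sum to 0, as least-squares residuals always do.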

Regression Slope t-Test

HYPOTHESIS

null: …within the population, there is no linear association between the two variables in the problem.

…that the true (population) regression line is plain and boring: β1 = 0 (a horizontal line).

alternative: …there is some linear relationship between the variables.

…β1 ≠ 0

MODEL

conditions must be checked in order

Straight Enough Condition – is the scatterplot of the original data straight enough? Check the residuals plot, too! You may need to re-express the data.

Independence Condition – this is nearly impossible to check, so check for Randomization. Often, the fact that the individuals are a representative sample of the population is the best that can be done.

Does the Plot Thicken? Condition – the spread of the data around the regression line should be nearly constant: no fan shape, no growing or shrinking spread. Again… look at the residuals plot!

Nearly Normal Condition – make a histogram of the residuals. It needs to be unimodal and symmetric enough.

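If all four conditions are true, the idealized regression line is μy = β0 + β1x: at each value of x, the y-values are Normally distributed around this line, all with the same standard deviation.

“With the conditions having been met, we can use a t-model with n – 2 degrees of freedom for the slope and perform a linear regression t-test.”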

MECHANICS

if you have the individual data:

  • enter data into L1 and L2
  • STAT
  • TESTS
  • LinRegTTest
  • Xlist:L1
  • Ylist:L2
  • Freq:1
  • β & ρ: choose ≠0 (the two-tailed option)
  • RegEQ: Y1 (VARS → Y-VARS → FUNCTION → Y1), to store the regression equation
  • CALCULATE
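To check calculator work, the numbers LinRegTTest displays (t, p, df, a, b, s) can be reproduced roughly in Python, assuming the two-tailed β ≠ 0 option. This is a sketch, not the calculator's algorithm: the function names are hypothetical, and the P-value comes from numerically integrating the t density with Simpson's rule instead of a statistics library, so treat it as an approximation.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(u, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + u * u / df))


def two_sided_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on the density over [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the curve between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def lin_reg_t_test(x, y):
    """Two-tailed t-test of H0: beta1 = 0 (hypothetical helper, mirrors LinRegTTest)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of the same length, with at least 3 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    df = n - 2
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / df)      # residual standard deviation ("s" on the calculator)
    se_b1 = s_e / sqrt(sxx)   # equals s_e / (s_x * sqrt(n - 1))
    t = (b1 - 0) / se_b1      # hypothesized slope is 0
    return {"a": b0, "b": b1, "s": s_e, "se_b1": se_b1,
            "t": t, "df": df, "p": two_sided_p(t, df)}


# Toy data, made up for illustration
result = lin_reg_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result["t"], result["df"], result["p"])
```

For this made-up data the sketch gives b = 0.6, t ≈ 2.12, df = 3, and p ≈ 0.12, so we would fail to reject H0 at the 5% level.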

if you have a computer regression analysis:

  • if the t-value is not given in the analysis, you’ll need to calculate it: t = b1 / SE(b1), with df = n – 2

draw a t-curve and shade the tail areas beyond ±t

list the p-value

list the regression equation (found in Y1)

CONCLUSION

reject / fail to reject

“There is evidence that…” (provide context!)

confidence interval:

  • if you have a TI-84/84+:
  • STAT
  • TESTS
  • LinRegTInt
  • if you don’t have a TI-84/84+:
  • find the t* value as before (here, df = n – 2)
  • the interval is b1 ± t* × SE(b1)
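The interval formula can be sketched as a tiny Python function. `slope_interval` is a hypothetical name, and t* is assumed to come from a t-table (or the calculator's invT, on models that have it) using df = n – 2 at the chosen confidence level.

```python
def slope_interval(b1, se_b1, t_star):
    """Confidence interval b1 +/- t* x SE(b1) for the true slope beta1.

    Hypothetical helper. t_star is assumed to be the critical value for
    df = n - 2 at the chosen confidence level (t-table or invT).
    """
    margin = t_star * se_b1  # margin of error
    return b1 - margin, b1 + margin


# Illustration with made-up numbers: b1 = 0.6, SE(b1) = 0.283, df = 3 (t* = 3.182 for 95%)
low, high = slope_interval(0.6, 0.283, 3.182)
print(round(low, 2), round(high, 2))
```

These made-up numbers give roughly (–0.30, 1.50). An interval that contains 0, like this one, goes along with failing to reject H0: β1 = 0 at the matching significance level.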

“We are ___% confident that the average __(dependent variable)__ increases/decreases/rises/falls/faster/slower/etc. by between _(low)_ and _(high)_ __(units)__ for each additional unit of __(independent variable)__.”

Example #1

High Stakes Test

New state requirements force students to take a “high stakes” math test in order to graduate from high school. Faced with such a pressure-laden situation, many students become very nervous, which may interfere with their ability to perform well. Concerned about “test anxiety,” a researcher enlists 24 student volunteers for a study. A psychologist interviews them before the math test, assessing their anxiety levels on a scale from 1 to 10. The table shows the anxiety levels and exam scores.

1. Sketch a scatterplot.

2. Does there appear to be an association between anxiety level and test score? Describe what you see in the scatterplot.

There is a weak, negative, linear association between anxiety level and math test score. Generally, lower anxiety levels are associated with higher test scores.

3. Find the correlation. What does it indicate? A correlation of -0.525 indicates a weak, negative linear relationship between anxiety level and test score.

4. Interpret the R² in context. About 28% of the variation in math test scores is explained by the linear model on anxiety level.

5. Create the linear model. Predicted test score = 91.66 – 4.49(anxiety level)

6. Is this linear model appropriate? Sketch and discuss the residuals plot.

The scatterplot is roughly linear, and the residuals plot shows fairly random scatter with no clear pattern, so a linear model is appropriate.

7. Interpret the slope of this line in context. For each additional unit of anxiety, the student’s test score is predicted to decrease by 4.49 points, on average.

8. Interpret the y-intercept of this line in context. If a student had no anxiety at all (level 0), the model would predict a test score of 91.66 points. Since the anxiety scale runs from 1 to 10, this is an extrapolation.

9. Is there evidence of an association between anxiety levels and student performance? (Perform a test.)

Hypothesis:

H0: There is no linear association between anxiety level and math test score (β1 = 0).

HA: There is a linear association between anxiety level and math test score (β1 ≠ 0).

Model

Straight enough condition: There is a somewhat linear pattern in the original data.

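Independence condition: The students are volunteers rather than a random sample, so we must assume that their test scores are independent and representative of students who take this test.

Residual plot: The residuals plot shows fairly random scatter. Both the residuals plot and the scatterplot show fairly consistent spread.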

Nearly normal: The histogram of the residuals is unimodal and symmetric enough.

With these conditions met, we can use a t-model with n – 2 = 22 degrees of freedom and perform a linear regression t-test.

Mechanics:

Conclusion: With a low P-value, we reject the null hypothesis. There is evidence of a linear association between anxiety level and math test score.

10. Provide a 95% confidence interval.

We are 95% confident that the average math test score decreases by between 1.27 and 7.7 points for each additional one-unit increase in anxiety level.
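As a rough arithmetic check, the reported numbers in Example #1 hang together. This sketch assumes the slope is –4.49 (negative, matching r = –0.525) and uses t* ≈ 2.074 for 95% confidence with df = 22; it back-calculates SE(b1) and t from the rounded interval, so those values are approximations and are not taken from the original analysis.

```python
# Reported values from Example #1 (n = 24 students)
n = 24
r = -0.525
b1 = -4.49                      # assumed negative, matching the negative association
ci_low, ci_high = -7.7, -1.27   # reported 95% interval for the slope
t_star = 2.074                  # t* for 95% confidence, df = n - 2 = 22 (t-table)

r_squared = r ** 2                        # about 0.276, i.e. the "28%"
midpoint = (ci_low + ci_high) / 2         # should land near b1
se_b1 = (ci_high - ci_low) / 2 / t_star   # implied SE(b1), about 1.55
t = b1 / se_b1                            # implied t-statistic, about -2.9

print(f"R^2 = {r_squared:.3f}, midpoint = {midpoint:.3f}, SE(b1) = {se_b1:.2f}, t = {t:.2f}")
```

An implied |t| near 2.9 with 22 degrees of freedom is consistent with the low P-value in the conclusion.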

Example #2

Electricity Usage

Investigate the association between average monthly temperature and electrical usage (kilowatt-hours, kWh) for a home.

[Figures: original data, kWh (y) vs. average temperature (x); residuals plot, residuals (y) vs. average temperature (x); histogram of the residuals]

Histogram of residuals

Is there evidence of an association between average monthly temperature and electrical usage?

Hypothesis

H0: There is no linear association between average monthly temperature and electrical usage (β1 = 0).

HA: There is a linear association between average monthly temperature and electrical usage (β1 ≠ 0).

Model

Straight enough condition: There is a somewhat linear pattern in the original data.

Independence condition: The months make up a full year that we can assume to be representative of all years, and we assume the usage in one month is independent of the usage in another month.

Residual plot: The residuals plot shows fairly random scatter, and both it and the scatterplot show fairly consistent spread. There is no thickening.

Nearly normal: The histogram of the residuals is unimodal and symmetric enough.

With these conditions met, we can use a t-model with n – 2 degrees of freedom and perform a linear regression t-test.

Mechanics:

Conclusion: With a P-value of less than 0.0001, we reject the null hypothesis. There is strong evidence that electrical usage decreases as the average monthly temperature increases.

Explain the association using a 95% confidence interval.

We are 95% confident that for each additional degree of average monthly temperature, the home’s average electrical usage decreases by between 39.81 and 62 kWh.


Example #3

GPAs

Ten students in a graduate program were randomly selected. Their grade point averages (GPAs) when they entered the program were between 3.5 and 4.0. The students’ GPAs on entering the program and their current GPAs were recorded. Use the regression analysis below to answer the questions.

1. Create the linear model.

2. Interpret the p-value. With a high P-value of 0.121739, we fail to reject the null hypothesis. There is not enough evidence of a linear association between students’ entering GPAs and their current GPAs. (Failing to reject H0 does not prove that there is no association.)

3. Find a 95% confidence interval for the slope of the regression line.

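We are 95% confident that for each 1-point increase in entering GPA, the students’ current GPA changes, on average, by between -0.56 and 0.62 points. Because this interval includes 0, it is consistent with finding no evidence of an association.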

Example #4

Heights and Weights

Is the height of a man related to his weight?

The regression analysis from a sample of 26 men is shown.

  1. How many degrees of freedom are there? df = 26 – 2 = 24
  2. What is the t-value?
  3. Find a 98% confidence interval for the slope of the regression line.

We are 98% confident that for every one-inch increase in height, the average weight increases by between 5.47 and 12 pounds.