Topics for Today
Interpreting the Linear Regression
Using Linear Regression for Prediction
Assessing the quality of the linear relationship
Linear Regression, so what?
So … we can construct a scatterplot to visually demonstrate a relationship.
… and we can calculate the ‘best fit line’ (i.e., the ______ regression line).
Now what?
There are two things we can do with the results of our linear regression:
- make ______ statements about the relationship between X and Y
- make ______ for Y based on X
Interpreting Linear Regression
Let’s first recall what our slope and intercept are, then interpret them in a specific example.
Interpreting the Slope
The slope indicates the number of units Y will ______ for each unit X ______.
Let’s look at an actual example. Back to the teen pregnancy and poverty example.
The regression line was:
(Teen Pregnancy) = _____*(Poverty Rate) + ______
So, for every one unit increase in the Poverty Rate, we expect a _____ unit increase in the Teen Pregnancy rate.
… we have to be careful about what this implies.
‘Change in X means change in Y’
For observational studies, like the poverty and pregnancy data, we obviously can’t ______ a state’s Poverty Rate (at will, anyway).
What we mean is that if State A has a 1 unit ______ Poverty Rate than State B, we then expect that State A also has a ______ Pregnancy Rate.
So, the slope tells us how to compare (potentially unknown) values of Y for two individuals with different values of X.
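The comparison reading of the slope can be checked numerically. The sketch below uses the fitted line quoted in the Prediction part of this handout, (Teen Pregnancy) = 1.116*(Poverty Rate) + 28.335. The function name and the two Poverty Rates are illustrative assumptions, not values from the study.

```python
# Fitted line quoted in the Prediction part of this handout.
SLOPE = 1.116
INTERCEPT = 28.335


def predicted_teen_pregnancy(poverty_rate):
    """Predicted Teen Pregnancy rate for a given Poverty Rate."""
    return SLOPE * poverty_rate + INTERCEPT


# Two hypothetical states whose Poverty Rates differ by exactly 1 unit.
state_b_poverty = 12.0
state_a_poverty = state_b_poverty + 1.0

difference = (predicted_teen_pregnancy(state_a_poverty)
              - predicted_teen_pregnancy(state_b_poverty))

# The gap between the two predictions equals the slope,
# no matter which starting Poverty Rate is chosen.
print(round(difference, 3))
```

Nothing here changes a state's Poverty Rate; the code only compares the predictions for two different states.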
Interpreting the Intercept
Let’s think about what the intercept is.
The value of Y when X is ____.
(Teen Pregnancy) = ______
Now, plug a Poverty Rate of zero into this linear model.
(Teen Pregnancy) = ______
So, the intercept tells us the Teen Pregnancy rate the model predicts for a state with nobody living in poverty.
An important note! You usually can’t see the y-intercept on a scatterplot!
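Plugging X = 0 into a line always returns the intercept, whatever the slope is. A minimal sketch, using the coefficients quoted in the Prediction part of this handout (the function and variable names are assumptions made for illustration):

```python
def predict(x, slope, intercept):
    """Value of Y on the line Y = slope * X + intercept."""
    return slope * x + intercept


# Coefficients of the poverty / teen pregnancy line quoted in this handout.
slope = 1.116
intercept = 28.335

# With a Poverty Rate of zero the slope term vanishes,
# so the prediction is just the intercept.
at_zero = predict(0, slope, intercept)
print(at_zero == intercept)
```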
Another Example to Interpret
Back to the study of alcohol consumption and heart disease.
Recall, there is a ______ relationship between alcohol consumption and heart disease.
Let’s find the least-squares regression line using SPSS.
But … the first question: what is Y and what is X?
Let’s say we’d like to predict the level of ______ from the ______.
So, the equation of the regression line for predicting the number of HeartDiseaseDeaths (Y) from Alcohol Consumed (X) is:
______ = ______
So, the slope is ______ … meaning that for every additional litre of alcohol consumed per person, the number of heart disease deaths (per 100,000 people) drops by almost __!
What does the intercept mean in this case?
The intercept is the value of Y if X is ____.
This means that if no alcohol were consumed, there would still be over ___ deaths per 100,000 people.
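SPSS reports the slope and intercept directly. As a rough sketch of what the least-squares calculation involves, the Python below uses the standard formulas slope = Sxy / Sxx and intercept = mean(Y) - slope * mean(X). The five data points are made up for illustration; they are not the alcohol and heart disease data, and the function name is an assumption.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for Y on X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: how X and Y vary together; Sxx: how much X varies on its own.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up illustrative data (NOT from the study).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
print(f"Y = {slope:.3f}*X + {intercept:.3f}")  # Y = 0.600*X + 2.200
```

Swapping which variable is passed as `xs` and which as `ys` gives a different line, which is why the choice of X and Y has to come first.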
Prediction
Regression allows us to make a guess at a value of _ for an individual based only on their value of _.
Let’s consider the poverty and teen pregnancy example again.
This study included a sample of 8 states, but demonstrated a very strong linear relationship.
Now, if we know of another state that has a poverty rate of __, what’s a reasonable guess at the teen pregnancy rate in that state?
Beyond using the graph, we can formalize this using the regression formula.
(Teen Pregnancy) = 1.116*(Poverty Rate) + 28.335
So, for a state with a Poverty Rate of 10, the Teen Pregnancy Rate for that state can be estimated as:
(Teen Pregnancy) = ______ = ______
= ______
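The same arithmetic can be sketched in a few lines of Python. The coefficients are the ones in the regression formula above; the function name is an assumption made for illustration.

```python
def estimate_teen_pregnancy(poverty_rate):
    """Estimate from the line (Teen Pregnancy) = 1.116*(Poverty Rate) + 28.335."""
    return 1.116 * poverty_rate + 28.335


# A state with a Poverty Rate of 10, as in the example above.
estimate = estimate_teen_pregnancy(10)
print(round(estimate, 1))  # about 39.5
```

The result is an estimate for a state we only know the Poverty Rate of, not an observed value.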
Now let’s do a prediction using the alcohol and heart disease data.
Recall that this data is a random sample of 19 countries. Canada was not one of these countries.
If the average amount of alcohol consumed is ______, what would we expect the number of heart disease deaths per 100,000 people to be?
Let’s approximate this from the graph.
And now, precisely from the equation:
(HeartDiseaseDeaths) = ______
= ______
= ______
= ______
Assessing the Strength of the Linear Relationship
Doesn’t the correlation do this?
Almost.
Mathematically, the precise way to do this is with the correlation squared.
r² is called the ______.
- it actually represents the ______ in Y that is explained by X
- it is simply the ______ of the Pearson correlation!
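Both descriptions of r² give the same number, which can be checked on a small example. In the sketch below the data are made up for illustration (they are not from either study) and the function names are assumptions. One function squares the Pearson correlation; the other fits the least-squares line and measures the share of the variability in Y that the line accounts for.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between X and Y."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def explained_share(xs, ys):
    """1 - (unexplained variability) / (total variability) for the fitted line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Squared distances of the points from the line (unexplained) ...
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # ... and from the mean of Y (total).
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Made-up illustrative data (NOT from the studies in this lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r ** 2, 3), round(explained_share(xs, ys), 3))  # 0.6 0.6
```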
It automatically comes out in the SPSS regression analysis:
Going back to the poverty and pregnancy example:
The R is the Pearson correlation, and the R-squared is the coefficient of determination.
In this case, ____% of the variability in Teen Pregnancy Rates is explained by the Poverty rate.
… so this is an ______ regression model.
From the alcohol and heart disease example:
So, __% of the variability in Deaths due to Heart Disease is explained by the consumption of alcohol.
Not as strong a regression model as the poverty/teen pregnancy model, but still ______.
Note: R in the regression output is the absolute value of the Pearson correlation coefficient, so it does not tell you whether the relationship is positive or negative. You should still run a separate correlation analysis to get the sign.
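A short sketch of why the sign gets lost: squaring r and then taking the square root always gives a non-negative number. The data below are made up to show a decreasing relationship, and `reported_R` is a hypothetical name that mimics the R described in the note; it is not an SPSS function.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between X and Y."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data with a decreasing trend (NOT from the study).
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 5, 4, 2]

r = pearson_r(xs, ys)          # negative: Y tends to fall as X rises
reported_R = sqrt(r ** 2)      # hypothetical stand-in for the R in the output

print(r < 0, reported_R > 0)   # True True
```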
New Topics Covered Today
Interpreting slope and intercept
- slope is the amount of change in Y for a 1 unit change in X
- intercept is the value of Y when X is zero
Prediction
- we can use the regression equation to generate potential values for Y when we only know an individual’s value of X
Coefficient of Determination
- squaring the Pearson correlation coefficient tells us the percent of variability in Y which is explained by X
Reading:
Chapter 11 up to Inference in Regression
Stat 203
Fall 2011 – Week 11, Lecture 2