Topics for Today

Interpreting the Linear Regression

Using Linear Regression for Prediction

Assessing the quality of the linear relationship

Linear Regression, so what?

So … we can construct a scatterplot to visually demonstrate a relationship.

… and we can calculate the ‘best fit line’ (i.e., the ______ regression line).
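As a reminder of what the software is doing when it finds the best fit line, here is a minimal sketch in Python of the standard least-squares formulas. The function name `least_squares_line` and the data are hypothetical, made up for illustration only; they are not from the course data sets or from SPSS.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = least_squares_line(xs, ys)
print(f"Y = {slope:.3f} * X + {intercept:.3f}")
```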

Now what?

There are two things we can do with the results of our linear regression:

  1. make ______ statements about the relationship between X and Y
  2. make ______ for Y based on X

Interpreting Linear Regression

Let’s first recall what our slope and intercept are, then interpret them in a specific example.

The slope indicates the number of units Y will ______ for each unit X ______.

Interpreting the Slope

Let’s look at an actual example: back to the teen pregnancy and poverty example.

The regression line was:

(Teen Pregnancy) = _____*(Poverty Rate) + ______

So, for every one unit increase in the Poverty Rate, we expect a _____ unit increase in the Teen Pregnancy rate.

… we have to be careful about what this implies.

‘Change in X means Change in Y’

For observational studies, like the poverty and pregnancy data, we obviously can’t ______ a state’s Poverty Rate (not at will, anyway).

What we mean is that if State A has a 1 unit ______ Poverty Rate than State B, we then expect that State A also has a ______ Pregnancy Rate.

So, the slope tells us how to compare (potentially unknown) values of Y for two individuals with different values of X.
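To see this comparison in numbers, here is a small sketch in Python. It uses the regression equation printed in the Prediction part of these notes (slope 1.116, intercept 28.335). The two poverty rates are hypothetical values chosen only for illustration, and the function name `predict_teen_pregnancy` is an assumption of this sketch.

```python
# Coefficients from the regression equation printed in these notes.
SLOPE = 1.116
INTERCEPT = 28.335

def predict_teen_pregnancy(poverty_rate):
    """Predicted Teen Pregnancy rate for a given Poverty Rate."""
    return SLOPE * poverty_rate + INTERCEPT

# Hypothetical poverty rates: State A is exactly 1 unit higher than State B.
state_b = 12.0
state_a = state_b + 1.0

# The gap between the two predictions is the slope, whatever State B's rate is.
difference = predict_teen_pregnancy(state_a) - predict_teen_pregnancy(state_b)
print(round(difference, 3))
```

Changing `state_b` to any other value leaves the difference unchanged, which is what it means for the slope to compare two individuals one unit apart in X.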

Interpreting the Intercept

Let’s think about what the intercept is.

The value of Y when X is ____.

(Teen Pregnancy) = ______

Now, plug a Poverty Rate of zero into this linear model.

(Teen Pregnancy) = ______

So, the intercept tells us the Teen Pregnancy rate the model predicts for a state with nobody living in Poverty.

An important note! You usually can’t see the y-intercept on a scatterplot!
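One likely reason the y-intercept is off the plot is that X = 0 is usually outside the range of the observed data. The sketch below (Python) plugs zero into the equation printed in these notes and checks whether zero is inside the observed range. The list `observed_poverty` is hypothetical, made up for illustration only, and the helper `is_extrapolation` is an assumption of this sketch.

```python
# Coefficients from the regression equation printed in these notes.
SLOPE = 1.116
INTERCEPT = 28.335

# Hypothetical observed poverty rates, made up for illustration only.
observed_poverty = [10.3, 12.5, 14.8, 16.0, 18.2, 20.1]

def predict_teen_pregnancy(poverty_rate):
    """Predicted Teen Pregnancy rate for a given Poverty Rate."""
    return SLOPE * poverty_rate + INTERCEPT

def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of X values seen in the data."""
    return x < min(observed_xs) or x > max(observed_xs)

at_zero = predict_teen_pregnancy(0.0)
print(at_zero)  # equals the intercept
print(is_extrapolation(0.0, observed_poverty))
```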


Another Example to Interpret

Back to the study of alcohol consumption and heart disease.

Recall, there is a ______ relationship between alcohol consumption and heart disease.

Let’s find the least-squares regression line using SPSS.

But first, the question: what is Y and what is X?

Let’s say we’d like to predict the level of ______ from the ______.



So, the equation of the regression line for predicting the number of HeartDiseaseDeaths (Y) from Alcohol Consumed (X) is:

______= ______

So, the slope is ______ … meaning that for every additional litre of alcohol consumed per person, the number of heart disease deaths (per 100,000 people) drops by almost __!

What does the intercept mean in this case?

The intercept is the value of Y if X is ____.

This means that if no alcohol were consumed, the model predicts there would still be over ___ deaths per 100,000 people.

Prediction

Regression allows us to make a guess at a value of _ for an individual based only on their value of _.

Let’s consider the poverty and teen pregnancy example again.

This study included a sample of 8 states, but demonstrated a very strong linear relationship.

Now, if we know of another state that has a poverty rate of __, what’s a reasonable guess at the teen pregnancy rate in that state?

Beyond using the graph, we can formalize this using the regression formula.

(Teen Pregnancy) = 1.116*(Poverty Rate) + 28.335

So, for a state with a Poverty Rate of 10, the Teen Pregnancy Rate for that state can be estimated as:

(Teen Pregnancy) = ______ = ______

= ______
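The same plug-in calculation can be checked with a few lines of Python. This is only a sketch: the function name `predict` is hypothetical, and the coefficients are the ones in the equation above.

```python
def predict(slope, intercept, x):
    """Plug a value of X into the regression line to estimate Y."""
    return slope * x + intercept

# Coefficients from the regression equation above.
slope = 1.116
intercept = 28.335

# Poverty Rate of the new state, as in the example above.
poverty_rate = 10

estimate = predict(slope, intercept, poverty_rate)
print(f"Estimated Teen Pregnancy rate: {estimate:.3f}")
```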

Now let’s do a prediction using the alcohol and heart disease data.

Recall that this data is a random sample of 19 countries. Canada was not one of these countries.

If the average amount of alcohol consumed is ______, what would we expect the number of heart disease deaths per 100,000 people to be?

Let’s approximate this from the graph.

And now, precisely from the equation:

(HeartDiseaseDeaths) =______

=______

=______

=______

Assessing the Strength of the Linear Relationship

Doesn’t the correlation do this?

Almost.

Mathematically, the precise way to do this is with the correlation squared.

r² is called the ______.

- it actually represents the ______ in Y that is explained by X

- it is simply the ______ of the Pearson correlation!
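To see that these two descriptions agree, the sketch below (Python) computes r² two ways: by squaring the Pearson correlation, and as 1 minus the leftover (residual) variation divided by the total variation in Y. The data and function names are hypothetical, made up for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def explained_fraction(xs, ys):
    """Fraction of the variation in Y explained by the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line (squared residuals).
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of Y around its mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(xs, ys)
r_squared = r ** 2
explained = explained_fraction(xs, ys)
print(f"r = {r:.3f}, r squared = {r_squared:.3f}, explained = {explained:.3f}")
```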

It automatically comes out in the SPSS regression analysis:

Going back to the poverty and pregnancy example:

The R is the Pearson correlation, and the R-squared is the coefficient of determination.

In this case, ____% of the variability in Teen Pregnancy Rates is explained by the Poverty rate.

… so this is an ______ regression model.

From the alcohol and heart disease example:

So, __% of the variability in Deaths due to Heart Disease is explained by the consumption of alcohol.

Not as strong a regression model as the poverty/teen pregnancy model, but still ______.

Note: R in the regression output is the absolute value of the Pearson correlation coefficient, so it does not show the direction of the relationship. You should still do a separate correlation analysis to get the sign.
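A short sketch in Python of why the sign gets lost: taking the square root of r² always gives a non-negative number, even when the relationship is negative. The data are hypothetical, made up for illustration only, and `big_r` is just this sketch's name for the R reported in the regression output.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a negative relationship (Y tends to fall as X rises).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 4.0, 5.0, 2.0, 1.0]

r = pearson_r(xs, ys)        # keeps the sign
big_r = math.sqrt(r ** 2)    # what R amounts to: the absolute value of r
print(f"Pearson r = {r:.3f}, R = {big_r:.3f}")
```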

New Topics Covered Today

Interpreting slope and intercept

  • slope is the amount of change in Y for a 1 unit change in X
  • intercept is the value of Y when X is zero

Prediction

  • we can use the regression equation to generate potential values for Y when we only know an individual’s value of X

Coefficient of Determination

  • squaring the Pearson correlation coefficient tells us the percent of variability in Y which is explained by X

Reading:

Chapter 11 up to Inference in Regression

Stat203, Fall 2011 – Week 11, Lecture 2