Statistics 1
Agenda for February 15

·  Please get out your What’s the Correlation? homework for a short discussion. I will collect it afterward.

·  Give me your portfolio revisions in your manila folder.

·  There is no homework over break.

·  Extra help is available until 2:50 today. If you are behind in portfolio work, you need to come talk to me. I want you to do well in this course.


Homework Discussion

Diamonds is the recorder.
Whole Class Discussion—Line of Best Fit

I want you to come away from this class knowing that you can use statistics to understand your world. Let’s apply that to something studied in our school’s MSAN class: the achievement gap.

What is it?

How can we use ideas of statistics to look at the achievement gap? The word “predict” is very important here. Bivariate statistics is all about making predictions. But “predict” is not always a good thing. One way to look at the achievement gap through the lens of statistics is to ask:

·  Does your race or income level predict your achievement in school? If it does, we have an achievement gap.

As a school system we are trying to get to the point where race and income are not predictors. It shouldn’t matter who you are. We should see no correlation between those variables.

But we do. The correlation between income and SAT scores is remarkably strong. Let’s see that in Fathom:

http://www.arps.org/users/hs/kochn/Statistics1/Unit5/SAT_Income.ftm

Notice the positive slope on the line. Notice how close the points are to that line. This shows a strong positive correlation between the two variables.
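The SAT/income data lives in the Fathom file and isn’t reproduced in these notes, so as an illustration this minimal sketch computes the correlation coefficient r for the small x/f(x) data set from the residuals table in this lesson instead. It uses the standard Pearson formula; a value close to +1 means a strong positive linear relationship, the same pattern described above.

```python
import math

# x-values and actual data values f(x) from this lesson's residuals table
xs = list(range(14))
ys = [-5, -3, -3, 0, 2, 3, 6, 8, 11, 13, 15, 16, 17, 20]

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(xs, ys)
print(round(r, 3))  # prints 0.996: a very strong positive correlation
```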

Where does that line come from? How does the computer draw it?

Let’s go back to Tuesday’s class activity, Which Line Fits Better? You had some data points and tried to fit a line to them with your ruler and pencil. How do we decide whether you made a good line? For each data point, we compare your line’s prediction, g(x), against the actual value in the data set, f(x), and see how far off the prediction is. This difference, g(x) – f(x), is called a residual. You computed all of the residuals for your model, and we can use them to decide how good a fit we have.

Who thinks they might have a pretty good model? What’s your equation?

Let’s look at your residuals:

x     f(x) (original data)     j(x) (your line)     j(x) – f(x) (residual)
0     -5
1     -3
2     -3
3     0
4     2
5     3
6     6
7     8
8     11
9     13
10    15
11    16
12    17
13    20
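As a sketch of how the empty columns get filled in, this code uses a hypothetical student line j(x) = 2x − 6 (made up for illustration, not anyone’s actual answer) and computes each residual with this lesson’s convention, prediction minus actual.

```python
# Data from the table: x-values and actual values f(x)
xs = list(range(14))
f = [-5, -3, -3, 0, 2, 3, 6, 8, 11, 13, 15, 16, 17, 20]

def j(x):
    # Hypothetical student line, for illustration only
    return 2 * x - 6

# Residual in this lesson's convention: j(x) - f(x)
residuals = [j(x) - fx for x, fx in zip(xs, f)]

# One row per data point: x, f(x), j(x), residual
for x, fx, res in zip(xs, f, residuals):
    print(x, fx, j(x), res)
```

Every residual for this line is −1, 0 or 1, which is what a close fit looks like.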

What do we hope to see with those residuals? What would be true of the residuals for the best possible model of this data?


Why do we want them to be small? Why do we want them to average to zero?
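Averaging to zero is not enough on its own. As a sketch, take a hypothetical model that is a flat line drawn at the mean of the f(x) values: its residuals add up to zero, yet several of them miss by more than 10, so we also need the residuals to be small.

```python
xs = list(range(14))
f = [-5, -3, -3, 0, 2, 3, 6, 8, 11, 13, 15, 16, 17, 20]

mean_f = sum(f) / len(f)  # 100 / 14, about 7.14

def flat(x):
    # Hypothetical model: a horizontal line at the mean of the data
    return mean_f

residuals = [flat(x) - fx for x, fx in zip(xs, f)]

print(abs(sum(residuals)) < 1e-9)      # True: the residuals average to zero
print(max(abs(r) for r in residuals))  # about 12.86: some misses are large
```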

To see if they are small or not, we need to ignore the sign of each residual. We can use absolute value or squaring to help us do that.
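A minimal sketch of the two ways to ignore the sign, again using the hypothetical line j(x) = 2x − 6 on the table data: add up the absolute values, or add up the squares.

```python
xs = list(range(14))
f = [-5, -3, -3, 0, 2, 3, 6, 8, 11, 13, 15, 16, 17, 20]

def j(x):
    # Hypothetical student line, for illustration only
    return 2 * x - 6

residuals = [j(x) - fx for x, fx in zip(xs, f)]

# Two ways to remove the sign before adding the residuals up
sum_abs = sum(abs(r) for r in residuals)
sum_sq = sum(r ** 2 for r in residuals)
print(sum_abs, sum_sq)  # prints 8 8
```

Here the two scores match only because every residual is −1, 0 or 1; once a residual is bigger than 1, squaring makes it count for more.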

Then we need to consider the role of outliers.

If we used absolute value instead of squaring, these two lines would get the same “score” in terms of goodness of fit. But the one on the left is really a much better fit.
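The two lines from class aren’t reproduced in these notes, so this sketch uses two made-up lists of residuals instead: line A misses every point by a little, and line B is exact on five points but misses one point badly. Absolute value gives them the same score; squaring does not.

```python
# Made-up residuals for two hypothetical lines fit to the same six points
line_a = [1, -1, 1, -1, 1, -1]  # small misses everywhere
line_b = [0, 0, 0, 0, 0, 6]     # exact on five points, one big miss

def sum_abs(res):
    return sum(abs(r) for r in res)

def sum_sq(res):
    return sum(r ** 2 for r in res)

print(sum_abs(line_a), sum_abs(line_b))  # prints 6 6: a tie
print(sum_sq(line_a), sum_sq(line_b))    # prints 6 36: the big miss is penalized heavily
```

Because squaring grows quickly, one large residual (often caused by an outlier) dominates the squared score.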

http://www.arps.org/users/hs/kochn/Statistics1/Unit5/BestFitIntro.ftm

The actual formula looks like this. It looks scary, but it’s really just telling you to do what you are already doing.
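The formula shown in class isn’t reproduced in these notes. As a hedged reconstruction, here is the standard least-squares criterion in this lesson’s notation, with the line written as g(x) = mx + b and f(x_i) the actual data values: choose m and b to make S as small as possible. The slope and intercept that do this have a closed form, where \bar{x} is the mean of the x-values and \bar{y} is the mean of the f(x_i) values.

```latex
S(m, b) = \sum_{i=1}^{n} \bigl(g(x_i) - f(x_i)\bigr)^2
        = \sum_{i=1}^{n} \bigl(m x_i + b - f(x_i)\bigr)^2

m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\bigl(f(x_i) - \bar{y}\bigr)}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
```

The first line is exactly the “square each residual, then add them up” procedure; the second line is just the answer to “which m and b make that total smallest?” For the table data in this lesson it gives m ≈ 1.982 and b ≈ −5.743.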


Small Group Class Activity—

You will now do a small group activity, Pick The Winner, to apply what you have learned about the least-squares method.
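The details of Pick The Winner are on the handout. As a hypothetical sketch of the same idea, this code scores a few made-up candidate lines on the table data by their sum of squared residuals, adds the least-squares line computed from the closed-form formula, and picks the lowest score.

```python
xs = list(range(14))
f = [-5, -3, -3, 0, 2, 3, 6, 8, 11, 13, 15, 16, 17, 20]

def sse(m, b):
    """Sum of squared residuals for the line g(x) = m*x + b."""
    return sum((m * x + b - fx) ** 2 for x, fx in zip(xs, f))

def least_squares(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical candidate lines as (slope, intercept), made up for illustration
candidates = {
    "2x - 6": (2, -6),
    "2x - 5": (2, -5),
    "1.5x - 3": (1.5, -3),
    "least squares": least_squares(xs, f),
}

# Lower score = better fit under the least-squares rule
scores = {name: sse(m, b) for name, (m, b) in candidates.items()}
winner = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("winner:", winner)
```

Because the closed-form line minimizes the sum of squared residuals, it always scores at least as well as any other candidate. Here it only narrowly beats 2x − 6 (about 7.64 versus 8), which shows how close a careful ruler-and-pencil line can get.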