Analysis of Variance (Anova) Overview

What We Already Know

The z test and the t test are inferential statistics. The z test is used to compare a sample mean with a population mean. The example that we discussed in class compared the anxiety scores of a sample of 20 combat veterans to anxiety scores from the general population (making the unlikely assumption that combat vets are a random sample of the general population). The t test, which is in widespread use, is used to compare one sample mean with another sample mean. For the z test, the null hypothesis (H0) is that the sample mean was drawn from the general population (e.g., μVETS = μGP) – or whatever population is relevant for the problem that is being studied. For the t test, the null hypothesis (H0) is that the two sample means were drawn from the same underlying population (i.e., μ1 = μ2). The z and t test formulas are almost identical.

z = (x̅ - μ) / σx̅

t = (x̅1 - x̅2) / sx̅1-x̅2

(We use sx̅1-x̅2 instead of σx̅1-x̅2 because it is estimated from sample data; i.e., it is not a population parameter.)

Both statistics are fractions, and in both cases the numerator is the difference between the two means that are being compared (x̅-μ or x̅1-x̅2). In both cases, the denominator is a measure of sampling error (σx̅ or sx̅1-x̅2). In both cases, the measure of sampling error is the standard deviation of the underlying theoretical sampling experiment that we discussed in class; i.e., the sampling distribution of means (z test) or the sampling distribution of mean differences (t test). (These are the histograms that we talked about in class with the yellow pad, blue science pen, etc. As we discussed, these sampling experiments do not actually need to be done since math nerds figured out how they would turn out if they were done.) These sampling distributions tell us directly how likely it is that sampling errors of different sizes will occur. For example, in one of the examples we did in class, we got a sample mean of 110 from a sample of 20 vets, while the mean for the general population is 100 (σ = 15, σx̅ = σ/√n = 15/4.47 = 3.35).

z = (x̅ - μ) / σx̅ = (110 - 100)/3.35 = 10/3.35 = 2.98

The value that was calculated is an ordinary z score, and when the probability is looked up in a table, you find that the chance of getting a sample mean of 110 or greater from the null distribution (a distribution in which μ=100) is very low (0.0014 or 0.14%), meaning that it is unlikely that the sample mean of 110 came from the general population. This, in turn, means that we reject H0, with a probability of error of 0.14%.
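If you want to check that arithmetic, here is a minimal Python sketch of the example above (the variable names are mine, not part of any formula; the one-tailed probability comes from the standard normal curve):

    import math

    mu, sigma, n, xbar = 100, 15, 20, 110    # population and sample values from the example

    se = sigma / math.sqrt(n)                # standard error of the mean: 15/4.47 = 3.35
    z = (xbar - mu) / se                     # (110 - 100)/3.35 = 2.98
    p = 0.5 * math.erfc(z / math.sqrt(2))    # P(Z >= z) under the null: about 0.0014

    print(round(z, 2), round(p, 4))          # 2.98 0.0014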

The t test uses the same logic, but adjusted for the fact that two sample means are being compared. (See your lecture notes.) Notice that both the z test and the t test are fractions. In both cases, the number on top (numerator) reflects how different the two means are (x̅-μ or x̅1-x̅2), while the denominator is a direct measure of sampling error (σx̅ or sx̅1-x̅2).

Anova

The logic underlying analysis of variance (Anova) is the same as it is for the z test and the t test, although it may not appear that way at first. An important limitation of the t test is that it can only be used to compare two sample means. Suppose you want to compare the scores of 1st, 3rd, and 5th graders on some task, or to compare normal-hearing listeners, hearing aid users, and cochlear implant users. Anova was developed for this purpose, and it is capable of comparing any number of sample means.

We begin by comparing the sample frequency distributions A and B at the top of the last page. Each of the histograms shows highly idealized sample distributions for three groups. Notice that in example A, the three sample distributions have means that are quite close to one another. However, in example B the three means are much better separated. The other thing to notice is that the variability within each of the sample distributions is the same. For which of these examples, A or B, is the Null least likely to be true? We should agree that it is B; i.e., it is less likely that the three samples in B – with well separated means – came from an underlying population with a single population mean (that’s what H0 says for the Anova case: μ1 = μ2 = μ3 … = μk; all of the sample means were drawn from the same underlying population). For the z and t tests, we measured how different the means were just by subtracting (x̅-μ or x̅1-x̅2). Simple subtraction will not work when there are three or more means being compared.

Many solutions to this problem might be imagined, but the solution that is used in Anova is to calculate something called between-group variance (s2b),* which simply tells us whether the three (or four or five or …) means are all packed together or spread apart. s2b is a completely ordinary variance calculation – the same s2 formula we learned earlier in the semester – using the three (or four or five or …) sample means as data. The only difference between the s2 we learned earlier and s2b is that we use the sample means as data rather than individual scores. The basic idea is simple: If the means (no matter how many there are) are very different from one another (e.g., x̅1=10, x̅2=50, x̅3=90), s2b will be a large number, exactly analogous to a large x̅1-x̅2 in a t test. On the other hand, if the three means are very similar to one another (e.g., x̅1=49, x̅2=50, x̅3=52), s2b will be a smaller number. That is pretty much it for s2b.
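As a quick illustration of that description – s2b is just an ordinary variance computed with the group means as the data – here is a short Python sketch using the two hypothetical sets of means from the paragraph above:

    import statistics

    spread_out = [10, 50, 90]   # well-separated means
    clumped    = [49, 50, 52]   # means packed close together

    print(statistics.variance(spread_out))   # 1600.0 -- a large s2b
    print(statistics.variance(clumped))      # about 2.33 -- a small s2b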

With those ideas in mind, which of the histogram examples on the last page, A or B, will have the larger s2b?** (See footnote below.)

______

*For reasons that we don’t need to worry about, in Anova the variance rather than the standard deviation is used to measure variability/sampling error.

**Answer: B; i.e., the three means are more spread out in B than in A, where they are all clumped close to one another.

At this point, we have found a numerator for Anova that corresponds exactly to the numerators in the t and z tests (x̅-μ and x̅1-x̅2), except that it is not limited to just two means. All else being equal, the larger s2b is, the less likely it is that H0 is true.

Now turn your attention to examples C and D on the last page. The three sample distributions in C have exactly the same means as the three sample distributions in D; i.e., in each case, the three sample means are 99, 100, and 102. Despite this, the three sample distributions in D overlap more than the three sample distributions in C; i.e., even with the same three means for C and D, the three sample distributions are more distinct from one another in C than in D. This is because the variability within the groups is greater in D than in C. The variability within the groups will become the denominator in Anova. It is called s2w, or within-group variance, and the calculation does exactly what the name says. It is an ordinary variance calculation. You begin calculating the variance using the scores in group 1 and the mean of group 1, then you continue the calculation using the scores in group 2 and the mean of group 2, then you continue the calculation using the scores in group 3 and the mean of group 3. That’s all there is.
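Here is a minimal Python sketch of that within-group calculation, following the description above: pool the squared deviations of each score from its own group’s mean, then divide by the within-group degrees of freedom, n-k (the same df2 that appears in the table look-up below). The scores and the function name are made up for illustration:

    import statistics

    def within_group_variance(groups):
        # Squared deviation of each score from its own group's mean, pooled across groups
        ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
        n = sum(len(g) for g in groups)   # total number of scores
        k = len(groups)                   # number of groups
        return ss_within / (n - k)        # divide by the within-group df

    print(within_group_variance([[98, 100, 99], [101, 100, 102], [103, 101, 102]]))   # 1.0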

*********

Note: I will not ask you to calculate either s2w or s2b. I will give you these values. However, you should have a basic understanding of what these calculations are doing in the Anova formula.

*********

Remember again that z and t are the same sort of fraction: numerator = difference between the means, denominator = sampling error. The F ratio that is calculated in Anova is the same sort of thing: numerator = s2b, denominator = s2w. Why is s2w a measure of sampling error? There are two factors that affect sampling error: sample size and the variability in the population that you’re sampling from. s2w captures the variability factor, except that it is estimated from sample statistics rather than population parameters (that’s why we use s2w instead of σ2w). The only thing that is left is to take sample size into account, but that will be done when we look up the F ratio in a table, which will have different threshold F ratios (aka critical values) for different sample sizes. (As we saw for the t test, sample sizes are actually represented as degrees of freedom rather than n, but that’s a technical nicety that has little to do with the fundamental ideas.)

The bottom line is that t and F are variations on the z score, which measures the number of standard deviation units from the mean of the distribution (e.g., z = 1.5 means 1.5 sd units above the mean). A t value of 1.5 means 1.5 sd units above the mean, where the mean is the center of the Null distribution (μ1 = μ2 or, equivalently, μ1-μ2 = 0). An F value from Anova is also like a z score, except that the variance is used as the unit instead of the standard deviation (e.g., an F value of 4.0 means 4 variance units – rather than sd units – above the mean of the Null distribution, which says that all of the sample means came from the same underlying population).

Example and Anova Table Lookup

A lip-reading test is given to normal-hearing subjects (n=22), hearing-impaired subjects (n=20) with pure-tone averages between 75 and 85 dB, and deaf subjects with pure-tone averages above 100 dB (n=21). Between-group variance (s2b) = 80, within-group variance (s2w) = 23. Is there a significant effect for hearing status (normal hearing vs. hearing impaired vs. deaf)?

Step 1: F = s2b / s2w = 80/23 = 3.48

Step 2: Look the F value up in an Anova table. Unlike the t test, there are two df values for Anova: the first df value (df1) is associated with the number of groups (3 in this example), and the second df value (df2) is associated with the number of subjects (22+20+21 = 63). The system is simple:

df1 = k-1, where k = the number of groups

df2 = n-k, where n is the total number of subjects

For this example:

df1 = k-1 = 3-1 = 2

df2 = n-k = 63-3 = 60

In Anova lingo, df = 2,60

The table look-up is pretty straightforward: (1) find the column with the df1 value of 2, then (2) go to the row with the df2 value of 60. You should find a threshold value (aka critical value) of 3.15. The F value that you calculated (3.48) needs to be at least that large. It is.

Step 3: Make a decision: (a) reject H0, or (b) fail to reject H0.

The F ratio needs to be equal to or greater than the threshold value in the table, and ours is (3.48 ≥ 3.15), so we reject H0, with a probability of error that is less than 0.05 (p < 0.05 in stats lingo). Equivalently, the effect of hearing status is significant – not likely to be due to chance. In a journal article, the Anova findings would be summarized as “F (2,60) = 3.48, p < 0.05”, with the “2,60” being the two df values.
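For anyone who wants to verify the look-up, here is a short Python sketch of the whole example; it uses SciPy’s F distribution in place of a printed Anova table (SciPy is my choice here, not something the handout requires):

    from scipy import stats

    s2b, s2w = 80, 23            # the given between- and within-group variances
    k, n = 3, 22 + 20 + 21       # number of groups, total number of subjects

    F = s2b / s2w                        # 80/23 = 3.48
    df1, df2 = k - 1, n - k              # 2 and 60
    crit = stats.f.ppf(0.95, df1, df2)   # critical value at alpha = .05: about 3.15

    print(F >= crit)                     # True -> reject H0, p < 0.05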

------

Addendum

Imagine that all of the facts are the same as the example above, except that between-group variance (s2b) = 60 and within-group variance (s2w) = 21. What, if anything, changes?

F = s2b / s2w = 60/21 = 2.86

We still look to the Anova table for column 2, row 60 (df1, df2). The threshold value, of course, is still 3.15. What’s the decision? Our F value of 2.86 is not equal to or greater than the threshold value, so we: (a) accept H0, (b) prove H0, or (c) fail to reject H0? It’s that last one: we fail to reject H0. Equivalently, the effect of hearing status is not significant. Have we proven the Null? No.
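The same sketch as before, with only the variances changed, reaches the addendum’s decision:

    from scipy import stats

    F = 60 / 21                          # 2.86
    crit = stats.f.ppf(0.95, 2, 60)      # still about 3.15
    print(F >= crit)                     # False -> fail to reject H0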

What z, t, and F Have in Common

z, t, and F (Anova) are all fractions; t and F are variations on a z score. In all three cases, the numerator measures how different the means are.

z numerator: x̅-μ

t numerator: x̅1- x̅2

F numerator: s2b (ordinary s2 calculated using the sample means as data)

In all three cases, the denominator measures sampling error.

z denominator (std error of the mean): σx̅ (= σ/√n)

t denominator (std error of the diff betw means): sx̅1-x̅2 (= a long thing we didn’t discuss)

F denominator (within-group variance): s2w (ordinary s2 calculated within the groups)

So, all three formulas have the same form:

          difference between (or among) the means
z, t, F = ________________________________________
                      sampling error