Chapter 11 - Experimental Design and ANOVA
A. Basic Concepts of Experimental Design
1. response variable – this is your dependent variable.
2. factor(s) – these are the independent variables. We generally want to see how the factors affect our response variable. Each level (setting) of a factor that we apply is called a treatment.
Example: Test score versus study time and study method. Test score is the dependent variable, while study time and method are the factors. If we looked at study times of 30 min, 1 hr, and 2 hrs, then these would be our treatments.
Note: we can get data from either experiments or observational studies.
3. Designed vs. Randomized Experiment
a. designed – when the experimenter controls how the treatments are applied to the items (or experimental units) in the study.
b. randomized – when the treatments are applied to the items/experimental units in a random fashion.
Note: if the treatments are applied in a random fashion we say that the response variables are independent of one another.
B. One Way ANOVA (analysis of variance)
Before, when we performed tests of difference, we only had two samples. Many times we want to test for differences among more than 2 groups. In this case we can generally perform an ANOVA to compare the means of 3 or more groups.
1. One factor One-Way ANOVA – a study in which we test the effects of p treatments (numbered 1…p) on a response variable. We use ANOVA to test the difference between the means of the groups.
We will let μi and σi² be the mean and variance of the response variable given treatment i, with i = 1…p.
2. Assumptions for One-Way ANOVA
a. Constant variance – we assume that the p treatments all have the same variance. So we assume σ1² = σ2² = … = σp²
b. normality – all of the populations with the treatments are assumed normal.
c. Independence – the items/experimental units must all be independent of one another.
**The assumption that is generally the most important is constant variance. If we don’t have this, then we generally assume the ANOVA results are invalid and should not run the test. If the largest sample variance is no more than twice as large as the smallest one, we can generally assume this condition holds.
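The rule of thumb for constant variance can be checked in a few lines. The sketch below is only an illustration: it borrows the test scores from Table 1 (introduced later in this chapter), and the variable names are made up for the example.

```python
from statistics import variance

# Test scores from Table 1 later in this chapter (one list per study method).
samples = {
    "A": [56, 67, 63, 61],
    "B": [72, 81, 67, 76],
    "C": [54, 62, 65, 59],
}

# Sample variance (n - 1 in the denominator) of each treatment group.
sample_vars = {name: variance(scores) for name, scores in samples.items()}

# Rule of thumb from the notes: largest sample variance at most twice the smallest.
ratio = max(sample_vars.values()) / min(sample_vars.values())
rule_holds = ratio <= 2

print(sample_vars)
print(f"largest / smallest = {ratio:.2f}, rule of thumb satisfied: {rule_holds}")
```

For these scores the ratio comes out below 2, so the constant variance assumption looks reasonable for the worked example.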
3. ANOVA Basics
a. When we perform the ANOVA the hypotheses are as follows
Ho: μ1 = μ2 = … = μp (i.e. the treatment means are all equal, so there is no difference between them)
Ha: at least two of the means differ from one another (possibly more)
b. Two Types of Variability
i. Within treatments variability – this measures the variability that occurs within each treatment
ii. Between treatments variability – this measures the differences that we see between the treatment means (i.e. across the treatments)
**If the between treatments variability is large relative to the within treatments variability, we have reasonable evidence to think that there is in fact a difference in the means of the treatments.
4. Sums of Squares – ANOVA Calculations
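a. Treatment Sums of Squares – SST = $\sum_{i=1}^{p} n_i (\bar{x}_i - \bar{x})^2$
note: $n_i$ = number of observations for treatment i, $\bar{x}_i$ = sample mean for treatment i, $\bar{x}$ = overall mean of all observations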
Example: Suppose we wanted to test the effects of 3 study techniques on test scores, and we had treatments A, B, and C with the following test scores.
Table 1: Test Scores With 3 Study Methods
Person\Treatment / A / B / C
1 / 56 / 72 / 54
2 / 67 / 81 / 62
3 / 63 / 67 / 65
4 / 61 / 76 / 59
Avg / 61.75 / 74 / 60
Overall mean: $\bar{x}$ = 65.25
SST = $\sum_{i=1}^{p} n_i (\bar{x}_i - \bar{x})^2$ = 4(61.75 – 65.25)² + 4(74 – 65.25)² + 4(60 – 65.25)² = 465.5
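The SST arithmetic can be reproduced directly from the raw scores. This is a minimal sketch using the Table 1 data; the names `samples`, `grand_mean`, and `sst` are illustrative only.

```python
# Test scores from Table 1, one list per study method (treatment).
samples = {
    "A": [56, 67, 63, 61],
    "B": [72, 81, 67, 76],
    "C": [54, 62, 65, 59],
}

# Overall mean of all 12 observations.
all_scores = [x for scores in samples.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Treatment sum of squares: n_i * (treatment mean - overall mean)^2,
# summed over the treatments.
sst = 0.0
for scores in samples.values():
    treatment_mean = sum(scores) / len(scores)
    sst += len(scores) * (treatment_mean - grand_mean) ** 2

print(f"overall mean = {grand_mean}")  # 65.25
print(f"SST = {sst}")                  # 465.5
```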
b. Error Sum of Squares – SSE = $\sum_{i=1}^{p} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
***So what you are doing is taking each observation, subtracting its treatment mean, squaring the difference, and adding up the results.
Example: Continuing with Table 1 above we find that:
SSE = [(56-61.75)² + (67-61.75)² + (63-61.75)² + (61-61.75)²] + [(72-74)² + (81-74)² + (67-74)² + (76-74)²] +
[(54-60)² + (62-60)² + (65-60)² + (59-60)²]
= 62.75 + 106 + 66 = 234.75
c. Sums of Squares Total – SSTO = SST + SSE
So if we use our previous examples, then our overall sum of squares is SSTO = 465.5 + 234.75 = 700.25
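As a check on the hand calculation, the sketch below recomputes SSE from the Table 1 scores and also computes SSTO directly from the overall mean, so that the identity SSTO = SST + SSE can be confirmed numerically. Variable names are illustrative.

```python
# Test scores from Table 1, one list per study method (treatment).
samples = {
    "A": [56, 67, 63, 61],
    "B": [72, 81, 67, 76],
    "C": [54, 62, 65, 59],
}
all_scores = [x for scores in samples.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

sst = 0.0  # between treatments variation
sse = 0.0  # within treatments variation
for scores in samples.values():
    mean_i = sum(scores) / len(scores)
    sst += len(scores) * (mean_i - grand_mean) ** 2
    sse += sum((x - mean_i) ** 2 for x in scores)

# Total sum of squares computed directly from the overall mean.
ssto = sum((x - grand_mean) ** 2 for x in all_scores)

print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSTO = {ssto:.2f}")
print("SST + SSE equals SSTO:", abs((sst + sse) - ssto) < 1e-9)
```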
5. Mean Square – ANOVA Calculations – this is going back and taking your sums of squares values and dividing each one by its degrees of freedom.
a. Treatment Mean Square – MST = SST/(p-1); where p = number of treatments
b. Error Mean Square – MSE = SSE/(n-p); where n is the total number of observations across all treatments
So from our previous example we have the following:
MST = 465.5/(3-1) = 465.5/2 = 232.75
MSE = 234.75/(12-3) = 234.75/9 ≈ 26.08
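The mean squares follow mechanically once the sums of squares and degrees of freedom are known. A small sketch for Table 1, where `mean_square` is a hypothetical helper named for this example and n counts all 12 observations:

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """Divide a sum of squares by its degrees of freedom."""
    return sum_of_squares / degrees_of_freedom

# Table 1 scores: one inner list per treatment (A, B, C).
samples = [[56, 67, 63, 61], [72, 81, 67, 76], [54, 62, 65, 59]]

p = len(samples)                  # number of treatments
n = sum(len(s) for s in samples)  # total number of observations
grand_mean = sum(sum(s) for s in samples) / n
means = [sum(s) / len(s) for s in samples]

sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

mst = mean_square(sst, p - 1)  # treatment mean square
mse = mean_square(sse, n - p)  # error mean square

print(f"MST = {mst:.2f}, MSE = {mse:.2f}")
```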
6. Hypothesis Test for ANOVA
a. Set up Hypotheses – use the same ones mentioned above
b. Find your critical value – for ANOVA we use the F distribution (tables A.5-A.8) with p-1 numerator and n-p denominator degrees of freedom. Make sure to note the level of significance
c. Find your test stat: Here our test stat is F = MST/MSE
d. Conclusion: If your test stat lies in the tail (i.e. it exceeds the critical value) then reject Ho and conclude at least one mean is different from the others. If not, then there is not enough evidence to suggest the means are different.
Example: Use Table 1 above and the values obtained in the previous sections to test if there is a significant difference in mean test score with the 3 study methods. Test at the 5% level of significance.
(1) Ho: μA = μB =μC (ie there is no difference in the mean test score for each study method)
Ha: At least one of the means is different
(2) Critical Value: Taken from table A.6 with p-1 = 2 numerator degrees of freedom and n-p = 12-3 = 9 denominator degrees of freedom – 4.26
(3) Test Stat – F = MST/MSE = 232.75/26.08 ≈ 8.92
(4) Conclusion – Since 8.92 > 4.26 (i.e. it is in the tail) we reject Ho and conclude there is enough evidence to suggest that there is a difference in test scores between the 3 study methods.
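The whole test can be sketched as one small function. `one_way_f` is a name invented for this illustration, and the critical value is hard-coded as an assumption: it is the 5% F table entry for 2 and 9 degrees of freedom, not something the code computes.

```python
def one_way_f(samples):
    """Return (F statistic, numerator df, denominator df) for a one-way ANOVA."""
    p = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (sst / (p - 1)) / (sse / (n - p)), p - 1, n - p

# Table 1 scores: one inner list per study method (A, B, C).
table1 = [[56, 67, 63, 61], [72, 81, 67, 76], [54, 62, 65, 59]]

# Assumption: looked up in an F table for alpha = 0.05 with 2 and 9 df.
F_CRITICAL = 4.26

f_stat, df_num, df_den = one_way_f(table1)
reject_h0 = f_stat > F_CRITICAL

print(f"F = {f_stat:.2f} with {df_num} and {df_den} degrees of freedom")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

Passing the critical value in as a constant keeps the sketch free of extra libraries; a statistics package could supply it instead.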
7. ANOVA Table – we can formalize all the previous information in a table format as follows:
Source of Variation / Degrees of Freedom / Sum of Squares / Mean Square / F-Stat
Treatments / p-1 / SST / MST / MST/MSE
Error / n-p / SSE / MSE
Total / n-1 / SSTO
8. Pairwise Comparison: If we find that there is a significant difference in at least one of the means (i.e. we do in fact reject Ho), then we can run Tukey’s pairwise test to find out which means are different. We will not cover this in class, but be aware of what test you would run and what you can do.
For example: If we did reject Ho above and we wanted to see which study methods’ mean test scores differ, we could compare the averages of methods A, B, and C as follows:
Compare A vs. B, A vs. C, and B vs. C to see which ones are different.
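The list of comparisons grows quickly with the number of treatments, so it can help to generate it. The sketch below only enumerates the pairs and their raw differences in sample means from Table 1. It does not perform Tukey’s test, which would also need a critical value from the studentized range distribution.

```python
from itertools import combinations

# Treatment averages from Table 1.
treatment_means = {"A": 61.75, "B": 74.0, "C": 60.0}

# Every unordered pair of treatments: A vs. B, A vs. C, B vs. C.
pairs = list(combinations(treatment_means, 2))

# Raw difference in sample means for each pair (first minus second).
differences = {(a, b): treatment_means[a] - treatment_means[b] for a, b in pairs}

for (a, b), diff in differences.items():
    print(f"{a} vs. {b}: difference in means = {diff:+.2f}")
```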
C. Randomized Block Design
Suppose that we thought there might actually be differences that occur within the experimental units themselves. Then a common technique is to use a block design rather than a completely randomized design. If we do this we can test to see if there is a difference in both the blocks and the treatments.
For example: In the previous example we wanted to see if there were differences in the study methods. Suppose we still thought there were, but because we had a completely randomized design with different people taking the tests, person-to-person differences distorted our results. What we could do is block on the person and have each person take a test using each of the 3 study methods. So a ‘block’ is essentially the experimenter holding some part of the experiment constant so there is no unmeasured variation from that source. One important result that can come from this is understanding that the experimental units themselves (people, animals, plants, land, etc.) can respond differently as we use different treatments.
When performing this analysis we not only need the treatment means, but also the block means.
1. Calculation of each sum of square
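Note: $x_{ij}$ – the observation for treatment i and block j
$\bar{x}_{i\cdot}$ = the mean for treatment i; it averages b values because there are b blocks
$\bar{x}_{\cdot j}$ = the mean of block j; it averages p values since there are p treatments
$\bar{x}$ = the overall mean of all pb observations
a. Sums of Square Total – SSTO = SST + SSB + SSE = $\sum_{i=1}^{p} \sum_{j=1}^{b} (x_{ij} - \bar{x})^2$, where SST is the treatment sum of squares as before, SSB is the sum of squares due to blocks, and SSE is the error sum of squares that remains after both are removed.
b. SST = $b \sum_{i=1}^{p} (\bar{x}_{i\cdot} - \bar{x})^2$
c. SSB = $p \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{x})^2$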
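Since SSTO = SST + SSB + SSE, the error sum of squares is usually found by subtraction. As a cross-check, and assuming exactly one observation for each treatment and block combination, it can also be written directly. Here $\bar{x}_{i\cdot}$ is the mean of treatment i, $\bar{x}_{\cdot j}$ is the mean of block j, and $\bar{x}$ is the overall mean:

```latex
\mathrm{SSE} = \sum_{i=1}^{p} \sum_{j=1}^{b}
  \left( x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x} \right)^2
```

Each term is what is left of an observation after removing its treatment effect and its block effect, which is why SSE shrinks when the blocks explain real variation.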
2. ANOVA Table- We can apply the same technique as before and construct an ANOVA table. The hypothesis test we run is exactly the same as before except now we can use the F-stat from the block and the treatments to test to see if there is a difference in the blocks and if there is a difference between the treatments. We can also run a Tukey’s pairwise test should we find there is at least one difference in the means.
Source of Variation / Degrees of Freedom / Sum of Squares / Mean Square / F-Stat
Treatments / p-1 / SST / MST = SST/(p-1) / MST/MSE
Blocks / b-1 / SSB / MSB = SSB/(b-1) / MSB/MSE
Error / (p-1)(b-1) / SSE / MSE = SSE/[(p-1)(b-1)]
Total / pb-1 / SSTO
3. Example using Table 1 – now we will extend our table above to also test to see if there is a difference in the people when they use the different study methods.
Table 2: Test Scores With 3 Study Methods and Blocks
Person\Treatment / A / B / C / Block Avg
1 / 56 / 72 / 54 / 60.67
2 / 67 / 81 / 62 / 70
3 / 63 / 67 / 65 / 65
4 / 61 / 76 / 59 / 65.33
Treatment Avg / 61.75 / 74 / 60 / 65.25
p = number of treatments and b = number of blocks (i.e. people)
SST = $b \sum (\bar{x}_{i\cdot} - \bar{x})^2$ = 4[(61.75-65.25)² + (74-65.25)² + (60-65.25)²] = 4(116.375) = 465.5
SSB = $p \sum (\bar{x}_{\cdot j} - \bar{x})^2$ = 3[(60.67-65.25)² + (70-65.25)² + (65-65.25)² + (65.33-65.25)²] = 3(43.64) ≈ 130.92 (using the unrounded block averages)
SSTO = $\sum \sum (x_{ij} - \bar{x})^2$ = [(56-65.25)² + (72-65.25)² + (54-65.25)² + … + (61-65.25)² + (76-65.25)² + (59-65.25)²] = 700.25
**this is simply taking the difference of each value from the overall mean, squaring it, and summing
So SSTO = SST + SSB + SSE, which gives SSE = SSTO – SST – SSB = 700.25 – 465.5 – 130.92 = 103.83
**a good way to check your work is to note that SSE can never be negative since it is a squared value. So if you got a negative value here you made a mistake.
So putting our results into the table we find that:
Table 3:
Source of Variation / Degrees of Freedom / Sum of Squares / Mean Square / F-Stat
Treatments / p-1 = 3-1 = 2 / SST = 465.5 / MST = SST/(p-1) = 465.5/2 = 232.75 / MST/MSE = 232.75/17.31 ≈ 13.45
Blocks / b-1 = 4-1 = 3 / SSB = 130.92 / MSB = SSB/(b-1) = 130.92/3 ≈ 43.64 / MSB/MSE = 43.64/17.31 ≈ 2.52
Error / (p-1)(b-1) = 6 / SSE = 103.83 / MSE = SSE/[(p-1)(b-1)] = 103.83/6 ≈ 17.31
Total / pb-1 = 11 / SSTO = 700.25
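The entries of a randomized block ANOVA table can be recomputed from the raw scores in Table 2. The sketch below assumes one score per person and study method; the variable names and the slash-separated print layout are choices made for this illustration.

```python
# Rows are blocks (people 1-4); columns are treatments (study methods A, B, C).
scores = [
    [56, 72, 54],
    [67, 81, 62],
    [63, 67, 65],
    [61, 76, 59],
]
b = len(scores)     # number of blocks
p = len(scores[0])  # number of treatments

grand_mean = sum(map(sum, scores)) / (p * b)
treatment_means = [sum(row[i] for row in scores) / b for i in range(p)]
block_means = [sum(row) / p for row in scores]

sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ssb = p * sum((m - grand_mean) ** 2 for m in block_means)
ssto = sum((x - grand_mean) ** 2 for row in scores for x in row)
sse = ssto - sst - ssb  # what is left after treatments and blocks

mst = sst / (p - 1)
msb = ssb / (b - 1)
mse = sse / ((p - 1) * (b - 1))
f_treatments = mst / mse
f_blocks = msb / mse

rows = [
    ("Treatments", p - 1, sst, mst, f_treatments),
    ("Blocks", b - 1, ssb, msb, f_blocks),
    ("Error", (p - 1) * (b - 1), sse, mse, None),
    ("Total", p * b - 1, ssto, None, None),
]
for source, df, ss, ms, f in rows:
    ms_text = f"{ms:.2f}" if ms is not None else ""
    f_text = f"{f:.2f}" if f is not None else ""
    print(f"{source:<10} / {df:>2} / {ss:>7.2f} / {ms_text:>7} / {f_text:>6}")
```

Because the block averages are kept unrounded, the printed values may differ in the last digit from a hand calculation that uses rounded averages.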
If we go through the entire hypothesis procedure for blocks using α=0.05 once again we have the following:
BLOCKS:
(1) Ho: μ1= μ2=μ3 = μ4 (i.e. there is no difference in the mean test score for each person)
Ha: At least one of the means is different
(2) Critical Value: Taken from table A.6 with b-1 = 3 numerator degrees of freedom and (b-1)(p-1) =6 denominator degrees of freedom – 4.76
(3) Test Stat – F = MSB/MSE = 43.64/17.31 ≈ 2.52
(4) Conclusion – Since 2.52 < 4.76 (i.e. it is not in the tail) we fail to reject Ho and conclude there is not enough evidence to suggest that there is any difference in test scores between the 4 people.
****But now if we do the same thing for treatments as before making sure to block out the variation in each person we find the following for the differences in treatments (i.e. the study methods)
TREATMENTS:
(1) Ho: μA = μB =μC (i.e. there is no difference in the mean test score for each study method)
Ha: At least one of the means is different
(2) Critical Value: Taken from table A.6 with p-1 = 2 numerator degrees of freedom and (b-1)(p-1) =6 denominator degrees of freedom – 5.14
(3) Test Stat – F = MST/MSE = 232.75/17.31 ≈ 13.45
(4) Conclusion – Since 13.45 > 5.14 (i.e. it is located very far along in the tail) we reject Ho and conclude, at the 5% level of significance, that there is enough evidence to suggest a difference in test scores between the 3 study methods. So at least one of the study methods is different from the others.
We could now run a pairwise test (i.e. Tukey’s pairwise procedure) to see which one is different.
But we see that it is important to control for block differences if we suspect variation may come from them. Without blocking we found that the study methods produced a statistically significant difference. After we controlled for the possible block variation we found that the study methods still had significantly different results, and the evidence was even stronger than before given our larger test stat, which here results in a smaller p-value.
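The effect of blocking on the p-value can be illustrated numerically. The sketch below relies on an assumption worth stating loudly: when the numerator has exactly 2 degrees of freedom, the right-tail area of the F distribution has the closed form (1 + 2F/d)^(-d/2), where d is the denominator degrees of freedom. That shortcut does not apply to other numerator degrees of freedom, and `f_p_value_two_numerator_df` is a name invented for this example.

```python
def f_p_value_two_numerator_df(f_stat, df_den):
    """Right-tail p-value for an F statistic with exactly 2 numerator df.

    Uses the closed form P(F > f) = (1 + 2f/d)^(-d/2), which is valid
    only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / df_den) ** (-df_den / 2)

# Rows are people (blocks); columns are study methods A, B, C (treatments).
scores = [[56, 72, 54], [67, 81, 62], [63, 67, 65], [61, 76, 59]]
b, p = len(scores), len(scores[0])
n = b * p

grand_mean = sum(map(sum, scores)) / n
treatment_means = [sum(row[i] for row in scores) / b for i in range(p)]
block_means = [sum(row) / p for row in scores]

sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ssb = p * sum((m - grand_mean) ** 2 for m in block_means)
ssto = sum((x - grand_mean) ** 2 for row in scores for x in row)

# One-way analysis: block variation stays inside the error term.
f_one_way = (sst / (p - 1)) / ((ssto - sst) / (n - p))
p_one_way = f_p_value_two_numerator_df(f_one_way, n - p)

# Randomized block analysis: block variation is removed from the error term.
df_error = (p - 1) * (b - 1)
f_blocked = (sst / (p - 1)) / ((ssto - sst - ssb) / df_error)
p_blocked = f_p_value_two_numerator_df(f_blocked, df_error)

print(f"one-way: F = {f_one_way:.2f}, p-value = {p_one_way:.4f}")
print(f"blocked: F = {f_blocked:.2f}, p-value = {p_blocked:.4f}")
```

Blocking lowers the error sum of squares but also costs error degrees of freedom, so a larger F does not always mean a smaller p-value; for this data set it does.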