Name ______ Date ______

AP Biology Mr. Collea

Probability & Chi Square: Goodness-of-Fit Test

Background Information:

Statistical analysis is one of the cornerstones of modern science. For instance, Mendel’s great insights about the behaviors of inherited factors were founded upon his understanding of mathematics and the laws of probability. Today, we still apply those mathematical principles to the analysis of genetic information, as well as to virtually any other kinds of numerical data which might be collected.

In this lab investigation, we will be examining one of these applications, the Chi-Square (χ²) Goodness-of-Fit Test. Collected data rarely conform exactly to prediction, so it is important to determine if the deviation between the expected values (based upon the hypothesis) and the actual results is SIGNIFICANT enough to discredit the original hypothesis. This need has led to the development of a variety of statistical tools (such as the chi-square test) designed to test the collected data. We will be examining this procedure using several simple examples of hypotheses and data collection.

Remember that the purpose of this test is to determine if the actual results are different enough from the predicted results to suggest that the hypothesis is not correct.

A Note about probability: Probabilities are predictions. We make predictions of this kind all the time. For example, “There’s a fifty percent chance that baby will be a boy,” is a probability statement, based on the hypothesis that half of human births produce boys and half produce girls (which is in turn based upon understanding about X and Y chromosomes, and about sperm and eggs). In formal mathematical language, probabilities are expressed as decimals between zero (no chance) and one (certainty). So the prediction above would be expressed as “the probability for a boy is 0.5.” Expressed as a mathematical “sentence,” it would be P(boy) = 0.5.

The difficulty with working with probabilities is knowing when to conclude that an occurrence is NOT due to random chance. Values far from the mean in a distribution can occur, but will occur with low probability in accordance with the Normal Distribution Curve below.

We are therefore essentially testing the hypothesis that the observed data fit a particular distribution.

In a coin flip example, we are testing whether our results fit those expected from the distribution of a fair coin. We therefore need a cutoff point beyond which we conclude that our results are not part of the distribution we are testing. But how do you decide that a data set no longer fits a distribution when random chance always plays a role? You have to make an arbitrary decision, and statisticians set the precedent long ago. Given that about 95% of the values in a normal distribution fall within two standard deviations of the mean (0.05 or 5% do not), statisticians decided that if a result falls outside this range, you can conclude that your data do not fit the distribution you are testing. In other words, if your result has a 5% or smaller chance of belonging to a particular distribution, you can conclude with 95% confidence (meaning it lies outside the two-standard-deviation interval) that it is not part of that distribution. Because probabilities are expressed as proportions, a result is "statistically significant" if its probability of occurrence (p-value) is ≤ 0.05.

This leads to our statistical "rule of thumb":

Whenever a statistical test returns a probability value (or "p-value") equal to or less than 0.05, we reject, with 95% confidence, the hypothesis that our results fit the distribution we expect to get. The standard practice in such comparisons is to use a null hypothesis (written as "H0"), which states that there is NO difference: the data fit the expected distribution, and any deviation is not statistically significant.

H0 : The data fit the assigned distribution (no difference) with 95% confidence, and any difference is not statistically significant.
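The rule of thumb above can be written as a small decision function. This is only a sketch: the function name `decide` and the sample p-values are made up for illustration, and it uses the handout's "accept or reject" wording.

```python
ALPHA = 0.05  # significance cutoff used throughout this handout


def decide(p_value, alpha=ALPHA):
    """Apply the rule of thumb: reject H0 when the p-value is <= alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "accept H0"


# Illustrative p-values (hypothetical, not taken from any experiment)
for p in (0.30, 0.05, 0.002):
    print(f"p-value = {p}: {decide(p)}")
```

Note that a p-value of exactly 0.05 is rejected, because the rule says "equal to or less than 0.05."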

To practice your interpretation of p-values, decide if each of the p-values below indicates that you should accept or reject your null hypothesis by circling the correct answer.

A. p-value = 0.11 Accept or Reject H0?

B. p-value = 0.56 Accept or Reject H0?

C. p-value = 0.01 Accept or Reject H0?

D. 0.9 > p-value > 0.7 Accept or Reject H0?

E. p-value < 0.005 Accept or Reject H0?

Part I. The Coin Toss: A Case Study

After losing a close game in overtime, a local high school football coach accuses the officials of using a "loaded" coin during the pre-overtime coin toss. He claims that the coin was altered to come up heads when flipped, his opponents knew this, won the coin toss, and consequently won the game on their first possession in overtime. He wants the local high school athletic association to investigate the matter. You are assigned the task of determining if the coach's accusation stands up to scrutiny. Well, you know that a "fair" coin should land on heads 50% of the time, and on tails 50% of the time. So how can you test if the coin in question is doctored? If you flip it ten times and it comes up heads six times, does that validate the accusation? What if it comes up heads seven times? What about eight times? Does coming up heads five times prove that it ISN'T a rigged coin? To reach a conclusion, you need to know the probability of these occurrences.

To examine the potential outcomes of coin flipping, we will use a Binomial Distribution. This distribution describes the probabilities for events when you have two possible outcomes (heads or tails) and independent trials (one flip of the coin does not influence the next flip). The binomial distribution for ten flips of a “fair” coin is shown below.

Note that a ratio of 5 heads : 5 tails is the most probable, and the probabilities of other combinations decline as you approach greater numbers of heads or tails. The figure demonstrates two important points. One, it shows that the expected outcome is the most probable – in this case a 5 : 5 ratio of heads to tails. Two, it shows that unlikely events can happen due solely to random chance (e.g., getting 0 heads and 10 tails), but that they have a very low probability of occurring.
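The probabilities in a distribution like this one can be computed directly from the binomial formula. The sketch below assumes a fair coin (probability of heads = 0.5); the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n independent flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 10  # number of coin flips
for heads in range(n + 1):
    print(f"{heads:2d} heads : {n - heads:2d} tails  ->  {binomial_pmf(heads, n):.4f}")
```

With ten flips, 5 heads : 5 tails has a probability of 252/1024 (about 0.25), while 0 heads : 10 tails has a probability of only 1/1024 (about 0.001).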

Also note that the binomial distribution is rather "jagged" when only ten coin flips are performed. As the number of trials (coin flips) increases, the shape of the distribution begins to smooth out and resemble a normal curve. Note how the shape of the curve with 50 trials is much smoother than the curve for 10 trials, and more representative of a normal curve as seen below.

These normal curves are often referred to as bell curves.

Normal curves are useful because they allow us to make statistical conclusions about the likelihood of being a certain distance from the center (mean) of the distribution. In a normal distribution, there are probabilities associated with differing distances from the mean. Recall from Algebra 2/Trigonometry that about 68% of the values in a sample showing normal distribution are within one standard deviation of the mean, about 95% of values are within two standard deviations of the mean, and about 99.7% of the values are within three standard deviations of the mean.
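The percentages in this rule are rounded. As a quick check, the fraction of a normal distribution lying within k standard deviations of the mean can be computed with the error function from Python's standard library. The helper name `within_k_sd` is made up for this sketch.

```python
from math import erf, sqrt


def within_k_sd(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sd(k) * 100:.2f}%")
```

The values come out to roughly 68.27%, 95.45%, and 99.73%, which is why "two standard deviations" and "95%" are treated as nearly interchangeable.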

So back to our coin test... The test compares our result to the expected distribution of a fair coin. To test the coin, you opt to flip it 50 times, tally the number of heads and tails, and compare your results to the fair coin distribution.

Our null hypothesis (H0) would be as follows:

H0 : The coin is balanced; that is, there is no difference between this coin and any other coin, and with 95% confidence we would expect equal numbers of heads and tails when it is flipped repeatedly.

You obtain the results: 33 Heads / 17 Tails

So what does this mean? Referencing the distribution, we see that a ratio of 33 heads to 17 tails would occur less than 1% of the time if the coin were indeed fair. As this is less than 5% (p < 0.05), we can reject our hypothesis that the data fit the expected distribution, and we have discovered something about the coin.
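The "less than 1%" figure can be checked with the binomial formula. The sketch below assumes a fair coin and also computes the chance of a result at least as lopsided as 33 : 17, which is another common way to state a p-value. The variable names are illustrative.

```python
from math import comb


def prob_heads(k, n):
    """Probability of exactly k heads in n flips of a fair coin."""
    return comb(n, k) / 2**n


n, observed_heads = 50, 33

# Chance of exactly 33 heads and 17 tails
exact = prob_heads(observed_heads, n)

# Chance of 33 or more heads
upper_tail = sum(prob_heads(k, n) for k in range(observed_heads, n + 1))

# A fair coin is symmetric, so "17 or fewer heads" is equally likely;
# doubling gives the chance of a result at least this lopsided either way.
two_sided = 2 * upper_tail

print(f"exactly 33 heads:        {exact:.4f}")
print(f"33 or more heads:        {upper_tail:.4f}")
print(f"at least this lopsided:  {two_sided:.4f}")
```

All three probabilities fall below 0.05, so the conclusion is the same whichever one is used.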

In other words, we reject, with 95% confidence, our null hypothesis that the coin was fair and would give a 50% heads : 50% tails ratio. We were testing the distribution of a fair coin, so this suggests the coin was not fair, and the coach's accusation has merit. Again, by rejecting the null hypothesis we are discovering something about the coin, which would indicate that further tests should be conducted or the number of trials (coin flips) should be increased so a more definitive conclusion can be reached. Either way…we discovered something about the coin!

Man, I love a good controversy...

Stating Conclusions

Once you have collected your data and analyzed them to get your p-value, you are ready to decide whether to accept or reject the null hypothesis. If the p-value in your analysis is 0.05 or less, the observed results are unlikely to be due to chance alone, and the data do not support your null hypothesis at the 95% confidence level.

So, as a scientist, you would state your "unacceptable" results in this way:

"The differences observed in the data were statistically significant at the 0.05 level."

You could then add a statement like,

"Therefore, with 95% confidence, the data do not support the hypothesis that..."

This is how a scientist would state "acceptable" results:

"The differences observed in the data were not statistically significant at the 0.05 level."

You could then add a statement like,

"Therefore, with 95% confidence, the data support the hypothesis that..."

And you will see that over and over again in the conclusions of research papers.


Chi-Square Analysis

The Chi-square test is a statistical test that compares the data collected in an experiment (observed) with the data you expected to find. It can be used whenever you want to compare expected results with experimental or observed data.

Variability is always present in the real world. If you toss a coin 10 times, you will often get a result different from 5 heads and 5 tails. The Chi-square test is a way to evaluate this variability: it helps you decide if the difference between your observed results and your expected results is probably due to random chance alone, or if there is some other factor influencing the results (like our unbalanced coin).

In other words, it determines what our p-value is!

The Chi-square test will not, in fact, prove or disprove if random chance is the only thing causing observed differences, but it will give an estimate of the likelihood that chance alone is at work.

Determining the Chi-square Value

Chi-square is calculated based on the formula below.

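$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed count and E is the expected count for each category, and the sum is taken over all categories.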
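As a worked example, the standard chi-square calculation (the sum of (observed − expected)² / expected over all categories) can be applied to the 33 heads / 17 tails result from Part I. This is a sketch only: the function name `chi_square` is made up here, and the p-value shortcut in the code applies only to a test with two categories (one degree of freedom).

```python
from math import erfc, sqrt


def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [33, 17]  # heads, tails from the 50-flip case study
expected = [25, 25]  # a fair coin should give 25 of each

stat = chi_square(observed, expected)  # (8^2 / 25) + (8^2 / 25) = 5.12

# Assumption: with only two categories there is 1 degree of freedom,
# and for that case the p-value equals erfc(sqrt(chi-square / 2)).
p_value = erfc(sqrt(stat / 2))

print(f"chi-square = {stat:.2f}, p-value = {p_value:.3f}")
```

A chi-square of 5.12 is larger than 3.841, the critical value for one degree of freedom at the 0.05 level, so the p-value is below 0.05 and the null hypothesis is rejected.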

We will fill out a table for the first go-around so you can get familiar with how to use it. Use the following procedure to test the hypothesis that any given coin is evenly balanced and we would expect to get the same number of heads (50) and tails (50) when flipped 100 times.

Activity

1. State your null hypothesis for this activity below:

______

______

______

______

______

______

2. Each team of two students will toss a pair of coins exactly 100 times and record the results in Table 1. The only outcomes can be: both heads (H/H), one heads and one tails (H/T), or both tails (T/T). Based on the laws of probability (that we learned in math years ago), these outcomes have a 25%, 50%, and 25% chance of happening, respectively. Each team must check their results to be certain that they have exactly 100 tosses.
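The 25% / 50% / 25% split comes from listing the four equally likely ways two fair coins can land. The sketch below enumerates them and converts the probabilities into expected counts for 100 tosses. The variable names are illustrative, and both coins are assumed to be fair.

```python
from collections import Counter
from itertools import product

# The four equally likely results for two coins: HH, HT, TH, TT
ways = Counter()
for first, second in product("HT", repeat=2):
    if first != second:
        ways["H/T"] += 1  # HT and TH both count as "one heads, one tails"
    else:
        ways[f"{first}/{second}"] += 1

tosses = 100
total_ways = sum(ways.values())  # 4

for outcome in ("H/H", "H/T", "T/T"):
    probability = ways[outcome] / total_ways
    expected_count = tosses * probability
    print(f"{outcome}: probability {probability:.2f}, expected {expected_count:.0f} of {tosses}")
```

H/T is twice as likely as either of the other outcomes because it can happen two ways (heads then tails, or tails then heads).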

Table 1: Team Data for Coin Flip Test.

Toss / H/H / H/T / T/T / Toss / H/H / H/T / T/T / Toss / H/H / H/T / T/T
1 / 35 / 69
2 / 36 / 70
3 / 37 / 71
4 / 38 / 72
5 / 39 / 73
6 / 40 / 74
7 / 41 / 75
8 / 42 / 76
9 / 43 / 77
10 / 44 / 78
11 / 45 / 79
12 / 46 / 80
13 / 47 / 81
14 / 48 / 82
15 / 49 / 83
16 / 50 / 84
17 / 51 / 85
18 / 52 / 86
19 / 53 / 87
20 / 54 / 88
21 / 55 / 89
22 / 56 / 90
23 / 57 / 91
24 / 58 / 92
25 / 59 / 93
26 / 60 / 94
27 / 61 / 95
28 / 62 / 96
29 / 63 / 97
30 / 64 / 98
31 / 65 / 99
32 / 66 / 100
33 / 67
34 / 68 / TOTAL


Table 2: Class Data for Coin Flip Test.