Three Intuitive Examples of p-values

At the request of some students, I’ve tried here to explain the concept of the p-value in a more intuitive manner than I usually do in class. As you recall, the p-value may be defined in the following three ways:

1. It is the significance level α at which we would be indifferent between accepting or rejecting the null hypothesis (for any smaller α we would accept H0; for any larger α we would reject it).

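2. From the previous definition, we can arrive at the one we actually use to calculate the p-value: it is the area under the sampling distribution of the test statistic (assuming H0 is true) that is cut off by the observed value of the test statistic, that is, the area in the tail beyond it.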

3. Lastly, it may be defined as the probability of obtaining, just by chance, a sample result at least as extreme as the one actually obtained, if the null hypothesis is actually true.
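To make definition 2 concrete, here is a small Python sketch. It assumes, purely for illustration, that the test statistic is a z score whose sampling distribution is standard normal when H0 is true; the function names and the value z = 1.645 are hypothetical, not taken from any particular class example. The p-value is simply the tail area beyond z.

```python
import math

def upper_tail_p_value(z: float) -> float:
    """Area under the standard normal curve to the right of z.

    Assumes (hypothetically) that the test statistic z is standard
    normal when H0 is true, and that the test is one-sided (upper tail).
    """
    # For a standard normal Z, P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p_value(z: float) -> float:
    """Area in both tails beyond |z|, for a two-sided test."""
    return 2 * upper_tail_p_value(abs(z))

# Hypothetical observed test statistic, chosen only for illustration
z = 1.645
print(round(upper_tail_p_value(z), 3))  # 0.05
print(round(two_tailed_p_value(z), 3))  # 0.1
```

With z = 1.645 the one-sided p-value is about .05, so someone testing at α = .05 would be right on the fence between accepting and rejecting H0, which is definition 1 at work.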

Example 1

I’m a fan of old movies and once watched one called “A Damsel in Distress,” starring Fred Astaire, George Burns and Gracie Allen (of the Burns and Allen TV show, but you’re probably too young to remember) and Joan Fontaine. Two things struck me about Miss Fontaine. First, she was absolutely beautiful, and second, she looked familiar, even though I couldn’t recall seeing her before. As I thought about it, I realized that she bore a strong resemblance to Olivia de Havilland (she was Miss Melanie in “Gone with the Wind”), and wondered if they might be sisters. I was home at the time, wasn’t hooked up to the web, and fortunately it was too late to find a library open. However, we have a World Almanac in the house, and one of the neat things you can look up in the Almanac is the birthplaces of famous personalities. I reasoned that if they were both born in the same city, that would be pretty good evidence that they were in fact sisters.

Now, before going any further, this situation may be expressed as a test of hypotheses with the null and alternate being:

H0: Olivia de Havilland and Joan Fontaine are not sisters

Ha: they are sisters

The following reasoning is somewhat tortuous, but it’s exactly what we do in hypothesis testing. If we find that they weren’t born in the same city, that doesn’t mean they aren’t sisters, but it also provides no evidence that they are, so we have to accept H0, the null hypothesis. On the other hand, if they aren’t sisters, the probability that they would have been born in the same city is low. We can’t actually calculate this probability, but common sense tells us that it’s fairly low. This is a p-value! Now suppose we find that they were born in the same city. Estimating the p-value in this case would depend a lot on where they were born. If they were both born in New York City, for example, the p-value would be fairly low, but not extremely so, since lots of people are born in New York. We would probably reject the null, but we couldn’t be all that sure they’re sisters.

Guess what: they were both born in Tokyo, Japan. Think about that for a second. If H0 is true, it’s conceivable that they could both have been born in Tokyo, but extremely unlikely (unless they were Japanese, of course). This means the p-value for the test is extremely small, and our conclusion is that they are in fact sisters.

By the way, the argument doesn’t work for just any two Americans who happen to have been born in Tokyo. If you randomly search the backgrounds of a large number of Americans and find two who were born in Tokyo, Warsaw, or some other unlikely place, that’s not particularly good evidence that they’re related. It is, however, if you had some reason ahead of time to suspect that they are.
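A rough way to see why advance suspicion matters, sketched in Python under loudly hypothetical assumptions: suppose a fraction q of people are born in some unlikely city and birthplaces are independent from person to person. For two people singled out in advance, the chance that both were born there is q squared. If instead you comb through n people after the fact, the chance that at least two of them turn up from that city is far larger. The values q = 0.001 and n = 10,000, and the function names, are made up for illustration.

```python
def p_specific_pair(q: float) -> float:
    # Two people named in advance, birthplaces independent:
    # probability that both were born in the city
    return q ** 2

def p_at_least_two(q: float, n: int) -> float:
    # n people searched after the fact; the count born in the city is
    # Binomial(n, q), so P(at least two) = 1 - P(none) - P(exactly one)
    return 1 - (1 - q) ** n - n * q * (1 - q) ** (n - 1)

q = 0.001   # hypothetical share of people born in the city
n = 10_000  # hypothetical number of backgrounds searched
print(p_specific_pair(q))
print(p_at_least_two(q, n))
```

With these made-up numbers, the pair you suspected in advance has about a one-in-a-million chance of sharing that birthplace, while the after-the-fact search finds at least two people from the city with probability above .99.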

I know, this is kind of a dumb example, but it really is a perfect illustration of a p-value. If you’re wondering, I looked it up on the web and they were in fact sisters. Their dad was a patent attorney in Tokyo at the time of their births. They didn’t get along very well, but sisters are like that.

Example 2

Let’s suppose someone is accused of stealing a car. The evidence against him is that he was caught speeding in the car after it was reported stolen. In addition, his prints were found on the club used to knock out the owner of the car. If that isn’t enough, a witness comes forward willing to testify he saw the defendant club the owner and drive off in the car. The null and alternate hypotheses are:

H0: the accused is innocent

Ha: he’s guilty

These are always the null and alternate in a jury trial in this country. The accused’s defense is that he bought the car unknowingly from the real thief. Also, the thief stole the defendant’s club to commit the crime, and that’s why the defendant’s fingerprints are on it. As for the witness, the defendant’s story is that the witness is ticked off because the defendant had an affair with the witness’s wife.

Now it’s possible that any one of these three explanations could be true, but it’s very unlikely that all three are. If they’re all true, the defendant must be about the unluckiest guy in the world. If the accused is innocent, all three explanations have to be true, and the probability of that is very small, small enough to establish guilt beyond the reasonable doubt required in jury trials. This probability is also a p-value, and even though we can’t calculate it, as in the first example, we know it’s very small, so we have to reject the null hypothesis and convict.

Example 3

You’re a gambler and your favorite game is betting on the flip of a coin. To simplify things, let’s suppose you always bet on tails. You seem to lose more often than you would expect, so you suspect your gambling partner of playing with an unfair coin. To test this, you beat him up, take his coin, and toss it ten times. It turns up heads in all ten tosses. What’s your conclusion?

This is also a test of hypotheses as follows:

H0: the coin is fair

Ha: it’s unfair (weighted toward heads)

In more mathematical terms, if we let p represent the true proportion of heads, then:

H0: p = .5

Ha: p > .5

Unlike the other two examples, we can actually calculate the p-value in this case. Because successive coin flips are independent of one another, and the coin is fair if H0 is true, the probability of getting a head is .5 on every toss. Since we are asking for the joint probability of all ten tosses coming up heads, we multiply .5 by itself ten times, that is, raise .5 to the 10th power. So the probability of ten heads in a row is .5^10, or about .00098. This is a p-value. Since this p-value is very low, you reject the null and conclude your friend is cheating.
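The same arithmetic in Python, as a minimal sketch; exact fractions avoid any rounding until the very end. The function name is just for illustration.

```python
from fractions import Fraction

def p_all_heads(n: int) -> Fraction:
    # Under H0 (fair coin) the tosses are independent with P(head) = 1/2,
    # so P(n heads in n tosses) = (1/2) ** n
    return Fraction(1, 2) ** n

p = p_all_heads(10)
print(p)                    # 1/1024
print(round(float(p), 5))   # 0.00098
```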

The reasoning involved in rejecting the null goes like this: If the coin is fair (p = .5), the probability of getting ten heads in ten tosses is very small (p-value = .00098). Because the alternate points toward heads, only a run of heads, not tails, counts as extreme here. Because this probability is so small, we reject the null and conclude that the coin is, in fact, unfair.

Two further comments: You can probably see that you would want to toss the coin more than just ten times; the reason is to increase the power of the test, that is, the probability that it detects an unfair coin when the coin really is unfair. But the calculations are a little more involved, so I used this example to keep things simple. Also, and I can’t resist this, if you’re a gambler you probably won’t be able to follow either the math or the reasoning in the above analysis, but that’s the way it goes.
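For readers who want to see why more tosses help, here is a hedged Python sketch of that more involved calculation, using an exact one-sided binomial test. It assumes α = .05 and a hypothetical unfair coin that lands heads 60% of the time (p_true = 0.6); both numbers and the function names are illustrative choices, not part of the example above. For each number of tosses n, it finds the smallest number of heads that would lead us to reject H0, then the probability that the unfair coin produces at least that many heads, which is the power.

```python
from math import comb

def upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); equals 0 when k > n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_value(n: int, alpha: float) -> int:
    """Smallest number of heads k whose p-value under H0 (p = .5) is <= alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1  # no outcome is extreme enough to reject at this alpha

def power(n: int, alpha: float, p_true: float) -> float:
    """Probability of rejecting H0 when the true proportion of heads is p_true."""
    k = critical_value(n, alpha)
    return upper_tail(n, k, p_true)

# Hypothetical settings: alpha = .05, true heads probability 0.6
for n in (10, 50, 100, 200):
    print(n, critical_value(n, 0.05), round(power(n, 0.05, 0.6), 3))
```

With only ten tosses you need at least nine heads to reject at α = .05, and a coin that comes up heads 60% of the time produces that many less than 5% of the time; the power climbs substantially as n grows.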