Module II

Probability

Recall that our eventual goal in this course is to go from the random sample to the population. The theory that allows for this transition is the theory of probability.


Words that usually come to mind when you hear the word “probability”:

Chance

Likelihood

Uncertainty

Odds

Gambling

Risk

Precipitation Probability

There are three ways in which to define probability.


Definition One

Axiomatic Probability (more commonly known as the classical, or equally-likely-outcomes, definition)

Define an “experiment” as a situation with an uncertain outcome.

For example: You flip a fair coin.

You roll a fair die.

You pick a card from a well-shuffled deck.

Define an “outcome” of an experiment as one of the possible things that could occur.

For example:

Experiment      Outcomes
Flip a Coin     Heads or Tails
Roll a Die      1, 2, 3, 4, 5, or 6
Pick a Card     Heart, Spade, Club, or Diamond
                or Ace, King, Queen, Jack, 10, 9, etc.

To define the probability of an outcome of an experiment using the Axiomatic Definition, you follow a three-step process:

1) List all of the possible outcomes (say there are K of them);

2) Assume each outcome is as likely as any other outcome;

3) Assign the probability (1/K) to each possible outcome.


In our cases then, we would have:

Experiment -- Flip a coin

Outcome     Probability
Heads       .5 = ½
Tails       .5

Experiment -- Roll a die

Outcome     Probability
1           .1667 = 1/6
2           .1667
3           .1667
4           .1667
5           .1667
6           .1667

Experiment -- Pick a card from a well-shuffled deck

Outcome     Probability
Hearts      .25 = ¼
Spades      .25
Clubs       .25
Diamonds    .25

or

Outcome     Probability
Ace         .077 = 1/13
King        .077
Queen       .077
Jack        .077
Ten         .077
Nine        .077
Eight       .077
Seven       .077
Six         .077
Five        .077
Four        .077
Three       .077
Two         .077
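The three-step process behind the tables above can be sketched in a few lines of Python. This is only an illustration: the function name `equally_likely` is hypothetical (it is not part of the course materials), and exact fractions are used so that 1/K is not rounded.

```python
from fractions import Fraction

def equally_likely(outcomes):
    """Assign probability 1/K to each of the K listed outcomes."""
    outcomes = list(outcomes)   # Step 1: list all of the possible outcomes
    k = len(outcomes)
    # Steps 2 and 3: treat every outcome as equally likely and give each 1/K
    return {outcome: Fraction(1, k) for outcome in outcomes}

coin = equally_likely(["Heads", "Tails"])
die = equally_likely(range(1, 7))
print(coin["Heads"], die[3])   # prints: 1/2 1/6
```

Whatever list is supplied, the assigned probabilities add up to 1, because K outcomes each receive 1/K.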

To obtain the probability of more complicated situations (called events), one simply adds together the probabilities of the outcomes that make up the event.

For example, the probability of rolling an even number can be found by realizing that an even number is either a 2, a 4, or a 6. Therefore:

Probability (even number) = 1/6 + 1/6 + 1/6 = 3/6 = ½ = .5

In the card experiment,

Probability (Red Suit) = ¼ + ¼ = ½ = .5

since Diamonds and Hearts are both red suits.
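The addition rule for events can be checked with a short Python sketch. The helper name `event_probability` is an assumption made for this illustration; it simply adds the probabilities of the outcomes that belong to the event.

```python
from fractions import Fraction

def event_probability(outcome_probs, event):
    """Add the probabilities of the outcomes that make up the event."""
    return sum((p for outcome, p in outcome_probs.items() if outcome in event),
               Fraction(0))

die = {face: Fraction(1, 6) for face in range(1, 7)}
suits = {suit: Fraction(1, 4) for suit in ["Hearts", "Spades", "Clubs", "Diamonds"]}

p_even = event_probability(die, {2, 4, 6})
p_red = event_probability(suits, {"Hearts", "Diamonds"})
print(p_even, p_red)   # prints: 1/2 1/2
```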

Some events require a bit more thought. For example, what is the probability that we pick a King or a Spade?

The difficulty here is that the two lists we constructed for the card-picking experiment are insufficient to answer the question. If we simply added the probability of picking a King (= 1/13) to the probability of picking a Spade (= ¼), we would overestimate the answer, because the King of Spades belongs to both outcomes and would be counted twice. To solve this problem, you need to construct a new list with 52 outcomes (one for each combination of suit and rank), each with probability 1/52, and then add up the outcomes that make up the event: the 13 Spades plus the 3 Kings of other suits give a probability of 16/52, or .3077. We shall make this procedure more explicit later.
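The 52-outcome list described above can be built and counted directly. The following Python sketch is one way to do it; the variable names are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import product

ranks = ["Ace", "King", "Queen", "Jack", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
suits = ["Hearts", "Spades", "Clubs", "Diamonds"]

deck = list(product(ranks, suits))   # 52 equally likely (rank, suit) outcomes

# Each card appears once in the list, so the King of Spades is counted once.
king_or_spade = [card for card in deck if card[0] == "King" or card[1] == "Spades"]

p = Fraction(len(king_or_spade), len(deck))
print(len(king_or_spade), p, round(float(p), 4))   # prints: 16 4/13 0.3077
```

Adding the two separate probabilities instead would give 1/13 + 1/4 = 17/52, which is too large by exactly the 1/52 belonging to the King of Spades.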

Other, deeper problems exist with the Axiomatic Method. For example, what is the probability that it will rain tomorrow? Using the Axiomatic Method, we could say that the two possible outcomes are Rain and No Rain and assign each of them a probability of .5. Clearly there is not a .5 chance of rain each day (unless you live in Seattle).

Also, the axiomatic definition can sometimes be interpreted in several ways. For example, consider the following experiment:

I have two coins: one is a double-headed coin and the other is a regular coin with a head and a tail. I mix them up without looking, pick one of them, place it on a desk, and observe that a head is showing. What is the probability that the other side of the coin is a head?

Consider the following three arguments:

1) The probability is .5 since it is either the double-headed coin or it is not;

2) The probability is .75 since three times out of four a side will be heads;

3) The probability is 2/3 since the double-headed coin has twice the chance of showing a head face up.

All of these arguments are “logical” but only one of them is correct. Before you go to the next page pick the one you think makes the most sense to you.


If you picked 2/3 you are correct, and if you picked one of the other answers you probably still don’t believe it is true.
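One way to make argument 3 concrete is to list the faces that could end up showing. The Python sketch below assumes each of the four faces (two on each coin) is equally likely to be the one facing up; that assumption is ours and is stated here only for illustration.

```python
from fractions import Fraction

# Each entry is (coin, face showing, face hidden). The four faces are assumed
# equally likely to be the one showing.
faces = [
    ("double-headed", "H", "H"),
    ("double-headed", "H", "H"),   # the second head of the same coin
    ("regular", "H", "T"),
    ("regular", "T", "H"),
]

head_showing = [f for f in faces if f[1] == "H"]
other_side_head = [f for f in head_showing if f[2] == "H"]

p = Fraction(len(other_side_head), len(head_showing))
print(p)   # prints: 2/3
```

Of the three faces that show a head, two belong to the double-headed coin, which is why the answer is 2/3 rather than .5.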

How does one resolve a situation like the one above, or apply probability to events like rain or no rain? The answer is to use the Frequency Definition of Probability.


Definition Two

Probability as Frequency of Occurrence

The way to resolve which of the three arguments given above is correct is to actually perform the experiment a large number of times and measure how often the event occurs. That is, we could take a coin and mark both sides as heads, and also select a normal coin. We place them into a cup, shake the cup, and pick a coin out at random. We then place the coin on the desk and observe which side is up. If a tail shows, we stop and repeat the experiment; if a head shows, we look at the other side and keep a count of the number of times the other side is also a head. Dividing this count by the number of times a head showed gives the proportion of times that the other side of the coin was a head. This process is called simulation and forms the basis of the frequency definition of probability.

Let A be an event, and assume that you have performed an experiment n times, so that n is the number of times A could have occurred. Further, let n_A be the number of times that A did occur. Then define:

P(A) = n_A / n.
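The ratio above is straightforward to compute by simulation. In the Python sketch below, the function `relative_frequency` and the example event (a fair die showing an even number) are assumptions made for illustration; the seed is fixed only so the run can be repeated.

```python
import random

def relative_frequency(event_occurred, n, seed=None):
    """Estimate P(A) as n_A / n by running the experiment n times."""
    rng = random.Random(seed)
    n_a = sum(1 for _ in range(n) if event_occurred(rng))   # times A occurred
    return n_a / n

# Event A: a fair die shows an even number.
def rolled_even(rng):
    return rng.randint(1, 6) % 2 == 0

estimate = relative_frequency(rolled_even, n=10_000, seed=1)
print(estimate)   # close to 0.5; the exact value depends on the seed
```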


It is possible to criticize this definition by pointing out that if I flipped a coin 100 times and counted 52 heads, I would say that the probability of heads is .52. If you then flipped the same coin 100 times and got 47 heads, you would say the probability of heads is .47. How can the same coin have different probabilities of coming up heads?

Notice that if we pooled my results and your results, we would have 99 heads in 200 flips, giving a probability of heads of .495. It can be shown that as the coin is flipped a larger and larger number of times (i.e., as n increases), the proportion of heads tends to deviate less and less from .5 (assuming the coin is “fair”). Therefore, using the concept of the limit from calculus, this objection can be removed by the definition:

P(A) = \lim_{n \to \infty} n_A / n

From this definition of probability we see immediately that for any event A,

0 \le P(A) \le 1

since it is impossible for A to occur less than 0% of the time, and it is impossible for A to occur more than 100% of the time.
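The effect of increasing n can be seen by recording the proportion of heads at a few checkpoints during a long run of simulated flips. This Python sketch assumes a fair coin and uses a fixed seed (an arbitrary choice) so that the run is repeatable; the checkpoint values are also arbitrary.

```python
import random

rng = random.Random(2024)   # fixed seed so the run can be repeated

heads = 0
proportions = {}            # n -> proportion of heads after n flips
checkpoints = (100, 1_000, 10_000, 100_000)

for n in range(1, 100_001):
    heads += rng.random() < 0.5   # True counts as 1, False as 0
    if n in checkpoints:
        proportions[n] = heads / n

# Print n, the proportion of heads, and its absolute deviation from .5
for n, p in proportions.items():
    print(n, round(p, 4), round(abs(p - 0.5), 4))
```

Every recorded proportion lies between 0 and 1, and the deviation from .5 is typically much smaller at the last checkpoint than at the first, although it need not shrink at every step.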

Now if you wished to find the Probability of Rain, you could look through meteorological records to find days with similar weather conditions and determine the proportion of times it rained on such days.

Similarly, you wouldn’t have to assume that a coin was fair; you could flip it many times and see whether the proportion of heads moved toward .5.

Of course physical simulation of experiments takes a great deal of time. Most of the time it is done by computer.


Computer Simulation

In order to simulate a random situation on the computer, we must have the computer perform the same steps that would be performed if you were simulating the situation physically. Consider the steps involved in physically simulating the double-headed coin problem:

1) Pick one of the two coins at random

2) Determine if a head is showing when you put it on the desk

3) Determine if the other side of the coin is also a head.

The computer must simulate each of the above steps. One way to do this is to use the RAND() command, which we used in EXCEL when we were taking a random sample. Since RAND() generates pseudo-random numbers that are approximately equally likely anywhere on the range from 0 to 1, we could establish the rule that if the random number is less than or equal to .5 then the two-headed coin is picked; otherwise the regular coin is picked. This would accomplish Step 1 above.

To perform Step 2, realize that if the two-headed coin is picked, it will always show a head. On the other hand, if the regular coin is picked, there is only a .5 chance that a head will show, so we have to generate another random number to decide whether the head or the tail is showing. We could use the rule that if the second random number is less than or equal to .5, a head will show; otherwise it will be a tail.

Step 3 is automatic since if a head is showing, only the double-headed coin has the flip side as a head.
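The three steps can also be written out in a general-purpose language. The Python sketch below is an illustration, not the worksheet used in this module: `random.Random.random()` stands in for RAND(), and the function name `simulate_trial` is hypothetical.

```python
import random

def simulate_trial(rng):
    """Run one trial; return (head_showing, other_side_is_head)."""
    # Step 1: pick one of the two coins at random.
    double_headed = rng.random() <= 0.5
    # Step 2: the double-headed coin always shows a head; the regular coin
    # shows a head half of the time, decided by a second random number.
    head_showing = True if double_headed else rng.random() <= 0.5
    # Step 3: with a head showing, the other side is a head only for the
    # double-headed coin.
    other_side_is_head = head_showing and double_headed
    return head_showing, other_side_is_head

rng = random.Random(7)   # fixed seed so the run can be repeated
trials = [simulate_trial(rng) for _ in range(10)]
print(trials)
```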


Notice that except for the RAND() command, the rest of the simulation is a series of “if” statements.

EXCEL also has a command called “IF”. It has the form:

=IF(condition, result if true, result if false)

Notice that there are three arguments inside the parentheses. The first is the condition we are evaluating. For example in order to pick a coin, we would put in the condition

RAND()<=.5

If it is true, we will put in a “1” to indicate that the double-headed coin was picked; if it is false, we will put in a “0” to indicate that it is the regular coin.

The whole statement would look like this:

=IF(RAND()<=.5, 1, 0).

Open up an EXCEL file and put this statement into cell B7.


For the second step, we want to determine whether a head is showing. We can use the “IF” statement again. For the condition we use:

B7 = 1

If this is true, then it is the double headed coin and a head is showing. Indicate this with a “1”.

If it is false, then a head will show only half the time, so we will put in the following result in the false area:

IF(RAND()<=.5,1,0).

You will notice that this is exactly the command we used in cell B7 because again we are simulating an event with a probability of occurrence of .5. The whole statement would look like this:

=IF(B7=1, 1, IF(RAND()<=.5, 1, 0)) .

Type the above line into cell D7. Now copy cells B7, C7 and D7 down the sheet for one hundred rows. This is equivalent to doing the picking 100 times.
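For readers who know a programming language, the two formulas map directly onto conditional expressions. The Python sketch below mirrors them; the function names `cell_b7` and `cell_d7` are hypothetical labels for the worksheet cells, and `random.random()` plays the role of RAND().

```python
import random

def cell_b7():
    # =IF(RAND()<=.5, 1, 0)
    return 1 if random.random() <= 0.5 else 0

def cell_d7(b7):
    # =IF(B7=1, 1, IF(RAND()<=.5, 1, 0))
    return 1 if b7 == 1 else (1 if random.random() <= 0.5 else 0)

b7 = cell_b7()      # 1 = double-headed coin picked, 0 = regular coin
d7 = cell_d7(b7)    # 1 = head showing, 0 = tail showing
print(b7, d7)
```

As in the worksheet, the inner condition is only consulted when the outer one is false, so a picked double-headed coin always yields a 1 in the second value.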


You should observe something like the following (which I have reproduced from EXCEL worksheet “simhead.xls” which can be found in folder MBA Part II).


Now add up all the values in Column B and Column D using the EXCEL command “SUM”. Enter the following command in cell B108:

=SUM(B7:B106)

This will add the 100 values you entered in column B. Enter a similar sum in Column D using the expression:

=SUM(D7:D106).

The result should look something like the following:


The value “84” is the sum of the values in column D and indicates that 84 times out of 100, the side showing was a head.

The value “56” is the sum of the values in column B and indicates that 56 times out of 100 we picked the two-headed coin.

Therefore, of the 84 times a head was showing, 56 times it was the two-headed coin, so the flip side was also a head. This is a proportion of:

56 / 84 = .6667.

This is our estimate of the probability that the other side of a coin showing a head is also a head.
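The same column sums and ratio can be reproduced outside EXCEL. The Python sketch below mimics 100 rows of columns B and D; the seed is an arbitrary choice made for repeatability, so the sums it produces will generally differ from the 56 and 84 in the worksheet shown above.

```python
import random

rng = random.Random(42)   # fixed seed; a worksheet recalculation has no seed

rows = []
for _ in range(100):                      # like rows 7 through 106
    b = 1 if rng.random() <= 0.5 else 0   # column B: 1 = two-headed coin picked
    d = 1 if b == 1 else (1 if rng.random() <= 0.5 else 0)   # column D: 1 = head showing
    rows.append((b, d))

sum_b = sum(b for b, _ in rows)   # like =SUM(B7:B106)
sum_d = sum(d for _, d in rows)   # like =SUM(D7:D106)
estimate = sum_b / sum_d          # share of showing heads whose other side is a head
print(sum_b, sum_d, round(estimate, 4))
```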

The graph below indicates how the probability that the other side of the coin is a head given that a head is showing approaches its correct value as the number of trials increases from 1 to 100. You will notice that when n is small, the value of the probability varies from its correct value, but as n gets large it approaches its correct limiting value of .66667.


The result does not always come out exactly equal to 2/3. On your worksheet, press the F9 key (the F9 function key is usually on the top row of your laptop keyboard). You will notice that all the numbers change and you get new sums. Pressing that one key is equivalent to performing the experiment 100 more times. I did this 50 times (equivalent to performing the simulation 5,000 times) with the following results summarized in histogram form:


As you can see, I never got a value of .5, and rarely went over .75. The average probability was .657 and the median was .667. This strongly suggests that the correct answer is 2/3, or about .667.
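Repeating the 100-pick simulation 50 times, as done with the F9 key above, can be sketched in Python as follows. The function name and the seed are assumptions made for this illustration, and the mean and median it prints will not match the .657 and .667 reported above exactly.

```python
import random
import statistics

def estimate_from_100_trials(rng):
    """One recalculation of the sheet: 100 picks, then sum(B) / sum(D)."""
    two_headed = 0
    heads_showing = 0
    for _ in range(100):
        b = rng.random() <= 0.5        # two-headed coin picked
        d = b or rng.random() <= 0.5   # head showing
        two_headed += b
        heads_showing += d
    return two_headed / heads_showing

rng = random.Random(2024)   # fixed seed so the run can be repeated
estimates = [estimate_from_100_trials(rng) for _ in range(50)]
print(round(statistics.mean(estimates), 3), round(statistics.median(estimates), 3))
print(round(min(estimates), 3), round(max(estimates), 3))
```

The spread between the smallest and largest estimate gives a sense of how much a single run of 100 picks can vary around 2/3.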


Although the frequency definition is very useful and simulation allows one to quickly get answers without a great deal of symbolic manipulation, the definition is still not complete.