Introduction to Probability (ASW Chapter 4)

MGMT 201: Statistics

What is probability?

Probability is a numerical representation of the likelihood that something will occur.
We typically specify the probability space so that the probability lies between zero and one (inclusive).
A probability of zero indicates that the event is impossible.
A probability of one indicates that the event is certain.
A probability of, say, 0.5 indicates that if we could repeat the circumstance over and over again, the event would occur half the time.
Note that zero probability events can occur! In fact, they occur all the time when the quantity being measured is continuous. For example, what is the probability that the temperature outside will be exactly 46.181284706520961520975 degrees tomorrow at 1:00 PM? When we consider that we could measure the temperature out to an infinite number of decimals, it becomes clear that the probability that any given temperature will occur is zero. Yet tomorrow at 1:00 PM some temperature will occur. This contradicts our intuition, but it is not a mathematical contradiction.

Defining a Probability Space

A probability space is simply a description of all possible events along with the probabilities of their occurrence.
example: Rolling a die

Event    Probability
1        1/6
2        1/6
3        1/6
4        1/6
5        1/6
6        1/6

When conducting an experiment, we refer to the set of all possible events as the sample space. A particular event is called a sample point. Above, the set {1,2,3,4,5,6} is the sample space and a roll of 4, for example, is a sample point.
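The die example can also be written down in a few lines of Python. This is only a sketch: the variable names (`die`, `sample_space`, `total_probability`) are chosen for illustration and do not come from the text.

```python
from fractions import Fraction

# Probability space for one roll of a fair die:
# each sample point (face) is mapped to its probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

sample_space = set(die)                 # the set {1, 2, 3, 4, 5, 6}
total_probability = sum(die.values())   # should add up to exactly 1

print(sample_space)
print(die[4])             # probability of the sample point "roll a 4"
print(total_probability)
```

Using `Fraction` instead of decimals keeps 1/6 exact, so the probabilities sum to exactly one.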

A Simple Case: Counting the Outcomes

When events are equally likely to occur, counting is very useful. In fact, we can simply count the number of ways an event can occur and divide by the total number of ways that all events can occur. This will be the probability.
example: What is the probability of drawing two consecutive aces from a deck of cards?
# of ways to draw two aces:

1 / Spades, Hearts
2 / Spades, Diamonds
3 / Spades, Clubs
4 / Hearts, Diamonds
5 / Hearts, Clubs
6 / Diamonds, Clubs

# of ways to draw two cards?
There are 52 ways to draw the first card and 51 ways to draw the second card given that we have drawn the first card (picture a table with the first draw on one axis and the second on the other). But half of these ways are duplicates. One way, for example, would involve drawing the 3 of spades followed by the 4 of diamonds. A second way is to draw them in the reverse order.
Total ways = 52×51/2 = 1326.
Probability of drawing two aces = 6/1326 = 0.004525. Said differently, roughly 4.5 out of every thousand two-card draws will result in two aces.
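As a check on the counting argument, we can have a computer list every two-card hand and count the ones made up of two aces. The sketch below assumes a standard 52-card deck; the card labels and variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# combinations() ignores order, so each pair of cards is counted once.
two_card_hands = list(combinations(deck, 2))
two_ace_hands = [hand for hand in two_card_hands
                 if all(rank == "A" for rank, _ in hand)]

p_two_aces = Fraction(len(two_ace_hands), len(two_card_hands))

print(len(two_ace_hands))    # 6
print(len(two_card_hands))   # 1326
print(float(p_two_aces))
```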

In general, the number of ways to choose n items from a set of N items, when order does not matter, is N!/(n!(N–n)!).
The symbol ‘!’ represents the factorial: k! = k×(k–1)×(k–2)×…×3×2×1.
By definition, 0! = 1.
In the previous example, the number of ways to draw two cards = 52!/(2!50!) = 1326.
We would typically use the phrase “N choose n” to describe the calculation (here, “52 choose 2 is 1326”).
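The factorial formula is easy to verify numerically. The helper `n_choose_k` below is a hypothetical name introduced for this sketch; `math.comb` and `math.factorial` are part of the Python standard library (`comb` requires Python 3.8 or later).

```python
from math import comb, factorial

def n_choose_k(total, chosen):
    """Number of ways to choose `chosen` items from `total`, ignoring order."""
    return factorial(total) // (factorial(chosen) * factorial(total - chosen))

print(n_choose_k(52, 2))   # 1326, from the factorial formula
print(comb(52, 2))         # 1326, from the built-in function
print(factorial(0))        # 1, by definition
```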
In other cases, tree diagrams are useful in counting.
example: Suppose we are interested in developing a new product. We begin with a development phase in which the cost of producing the product is determined. Based on that cost, we must decide whether to market the product to everyone, to targeted markets, or to license the technology. The consumer response to the product will then determine our profits. Suppose we provide general classifications as follows:
cost ∈ {high, low}
response ∈ {strong, mediocre, weak}
Event Tree:

We could then assign probabilities to each of the branches along with profits to the final outcomes. This would allow us to make the appropriate decision.
How many outcomes are possible? Counting, we see that there are 18. The simple rule is to multiply the number of branches at each stage. We begin with two branches (cost). Each of them can branch three ways (the marketing decision), and each of those can branch three ways again (the consumer response). So, 2×3×3 = 18.
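The multiplication rule can be checked by listing every path through the tree. In this sketch the three marketing labels are shorthand for the options named in the example (market to everyone, market to targeted groups, license the technology); the variable names are illustrative.

```python
from itertools import product

costs = ["high", "low"]
marketing = ["everyone", "targeted", "license"]
responses = ["strong", "mediocre", "weak"]

# product() builds every (cost, marketing, response) path through the tree.
outcomes = list(product(costs, marketing, responses))

print(len(outcomes))   # 18 = 2 x 3 x 3
print(outcomes[0])     # ('high', 'everyone', 'strong')
```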
The important point, here, is for us to be logical in our approach to situations. Notice that we have complete control over the marketing decision, but not the other two. Suppose…
…there is a 40% probability of the product being a high cost item and a 60% probability of low cost.
…there is a 30% chance of a strong consumer response, a 40% chance of a mediocre response, and a 30% chance of a weak response.
…the profit projections are as follows:

If we must commit to our marketing strategy today, what should we do?
One approach is to calculate the expected profits from a given action. We will consider this problem in chapter 5.

Probabilities

At this point, we want to be more precise in our definitions of a sample point and of an event.
sample point: a distinct individual outcome.
An implication of the definition is that a sample point cannot be subdivided.
For example, consider the case where we draw an ace from a deck of cards. “Ace” is not typically a sample point because there are four aces in the deck. The “ace of spades” is a sample point because it cannot be broken down any further.
event: a set of sample points.
An event may consist of any number of sample points, including one. Drawing an ace is an event, as is drawing the ace of spades.
Another distinction, here, is that nature determines the sample points, so to speak, while we choose events based on what we are interested in examining.
Because sample points are distinct (i.e., non-overlapping), the probability of any event is equal to the sum of the probabilities of the sample points in the event.
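To make the last point concrete, the sketch below treats each card in a standard deck as a sample point with probability 1/52 and adds up the sample points in an event. The function name `event_probability` is an assumption made for this illustration.

```python
from fractions import Fraction

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]

# Each individual card is a sample point with probability 1/52.
point_probability = {(rank, suit): Fraction(1, 52)
                     for rank in ranks for suit in suits}

def event_probability(event):
    """Sum the probabilities of the sample points that make up the event."""
    return sum(point_probability[point] for point in event)

ace = {point for point in point_probability if point[0] == "A"}   # four sample points
ace_of_spades = {("A", "spades")}                                 # one sample point

print(event_probability(ace))             # 1/13
print(event_probability(ace_of_spades))   # 1/52
```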
Determining the Probability Space
Classical Method: When each outcome is equally likely, the probability of getting 1 of the n possible outcomes is 1/n. (e.g., roll of die)
Relative Frequency Method: When the outcome likelihoods are unknown, we can use sample data (assuming we can get it) to estimate the probabilities. We simply create a relative frequency distribution and use the values given there.
Subjective Method: When all else fails, we can take an educated guess.
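The relative frequency method is the one that lends itself most directly to computation. The sketch below uses made-up observations (ten hypothetical deliveries recorded as "on time" or "late"); the data and names are purely illustrative and do not come from the text.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sample data, invented for this illustration.
observations = ["on time", "late", "on time", "on time", "late",
                "on time", "on time", "on time", "late", "on time"]

counts = Counter(observations)
n = len(observations)

# Relative frequency = count of each outcome / number of observations.
relative_frequency = {outcome: Fraction(count, n)
                      for outcome, count in counts.items()}

for outcome, estimate in relative_frequency.items():
    print(outcome, estimate)
```

The estimates are only as good as the sample: a larger sample would generally give more reliable probabilities.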
Rules
P(A) + P(A^c) = 1, where A^c is the complement of A. That is, A^c includes all of the sample points that are not in A.
The addition law: P(A∪B) = P(A) + P(B) – P(A∩B), where ∪ denotes “union” and ∩ denotes “intersection”.
A∪B is the set of all sample points that are in A or B.
A∩B is the set of all sample points that are in A and B.
e.g. Suppose we consider the roll of a die and define A = {1,2,3} and B={2,3,4}.
P(A) = 1/6 + 1/6 + 1/6 = ½.
P(B) = 1/6 + 1/6 + 1/6 = ½.
We are tempted to say that the probability of A or B is ½ + ½ =1. If we do this, we are double-counting the probabilities of “2” and “3”. So, we must subtract them to obtain 1 – 1/6 – 1/6 = 4/6 = 2/3. That is the logic behind the addition law.
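The die example can be verified with Python sets, where `|` is union and `&` is intersection. This is a sketch for a fair die only; the helper name `prob` is an assumption.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event on a fair die: favorable points / total points."""
    return Fraction(len(event), len(sample_space))

A = {1, 2, 3}
B = {2, 3, 4}

direct = prob(A | B)                              # count the union directly
addition_law = prob(A) + prob(B) - prob(A & B)    # remove the double-counted overlap

print(direct)         # 2/3
print(addition_law)   # 2/3
```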
Mathematicians often use Venn diagrams to depict situations. The diagrams are quite useful for complex situations.
Consider the last example: We draw a rectangle to depict the sample space and circles (or other shapes if need be) to depict events. In this case, we draw the circles so that they overlap. This indicates that A∩B ≠ ∅. Here, ∅ is the empty set. When we say that the intersection of A and B is not equal to the empty set, we mean that there are sample points in A that are also in B.

The overlapping area depicts A∩B. The area of any region depicts the probability of something in that region occurring.

If we have mutually exclusive events, A∩B = ∅ and the Venn diagram is drawn as follows:

Since A and B do not overlap, we know that A∩B = ∅.

Conditional Probabilities
We are interested in estimating the probability that one event will occur given that another has occurred. For example, on election night, we probably all wondered how likely it was that Bush would win given that Pennsylvania had been won by Gore. Mathematically, we would express this as P(A|B) and say “the probability of A given B”. Here, A={Bush wins election} and B={Gore wins Pennsylvania}.
Graphically, once we know that B has occurred, we can eliminate the portion of A that does not intersect with B.

Now that we have eliminated the portion of A that is not in B, we see that the probability of getting A given that B occurs is P(A∩B)/P(B).
This gives us the rule for conditional probabilities: P(A|B) = P(A∩B)/P(B) or P(A∩B) = P(B)P(A|B). This is the multiplication rule for conditional probabilities.
This makes sense intuitively. Consider our die example above and suppose I told you that B has occurred. That is, either a 2, 3, or 4 was rolled. What is the probability that A occurred? For A to have occurred, either a 2 or a 3 must have been rolled, so the probability that A occurred must be 2/3 (two out of three chances). Using our notation above, P(A∩B) = 1/3 and P(B) = 1/2, so P(A∩B)/P(B) = 2/3.
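The same calculation in Python, as a sketch for a fair die. The function names `prob` and `conditional` are assumptions made for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event on a fair die."""
    return Fraction(len(event), len(sample_space))

def conditional(a, b):
    """Probability of a given b: P(a and b) / P(b). Requires P(b) > 0."""
    return prob(a & b) / prob(b)

A = {1, 2, 3}
B = {2, 3, 4}

print(conditional(A, B))   # 2/3
```

Notice that conditioning on B simply shrinks the sample space to {2, 3, 4}, of which two points (2 and 3) are in A.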

Conditional Probabilities for Independent Events
Now, suppose that A and B are independent. This specifically means that anything we learn when B occurs tells us nothing about the likelihood that A will occur.
Said differently, P(AB) = P(A) for independent events.
Note that this does not mean that AB = . In fact, A and B must overlap if they are independent. The area of the overlap is precisely the amount needed so that P(AB)/P(B) = P(A).
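A quick way to test for independence is to check whether the probability of the overlap equals P(A) times P(B), which is the same condition rearranged. In the sketch below, the events `even` and `at_most_four` are hypothetical examples chosen for this illustration; A and B are the events from the earlier die example.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event on a fair die."""
    return Fraction(len(event), len(sample_space))

def is_independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
A = {1, 2, 3}
B = {2, 3, 4}

print(is_independent(even, at_most_four))   # True:  1/3 == 1/2 * 2/3
print(is_independent(A, B))                 # False: 1/3 != 1/2 * 1/2
```

Both pairs of events overlap, but only the first overlap has exactly the right size.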
Bayes’ Rule (Theorem)
Above, we conditioned on another event occurring. In another context, we might want to calculate a posterior probability based on a prior probability belief.
The basic idea is the following. Suppose there are two ways for something to occur. You believe there is a 10% chance that the first way will occur and a 30% chance that the second will occur. If that something does happen, what is the probability that it occurred the second way? It seems reasonable that the probability would be 75% (i.e., 30/(10+30)). This is the basis for Bayes’ Rule.
How would this be depicted with Venn diagrams?
Said differently, if there are two possible “ways” that B can occur, P(A1|B) = P(A1∩B)/(P(A1∩B) + P(A2∩B)).
From our multiplication rule for conditional probabilities, we know that P(A1∩B) = P(A1)P(B|A1) and P(A2∩B) = P(A2)P(B|A2). Substituting gives
P(A1|B) = P(A1)P(B|A1) / (P(A1)P(B|A1) + P(A2)P(B|A2)).
This is Bayes’ Rule for two events.
Bayes’ Rule for more than two events? Suppose that there are three ways that something can occur. The first way occurs with probability 10%, the second with probability 20%, and the third with probability 30%. Given that the “something” has occurred, what is the probability that it was the third way? Again, it seems reasonable that it is 30%/(10%+20%+30%) = 0.5. This is the logic behind the general Bayes’ Rule:
P(Ai|B) = P(Ai)P(B|Ai) / (P(A1)P(B|A1) + P(A2)P(B|A2) + … + P(An)P(B|An)).
Here, n is the number of possible ways that the event B can occur.
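The three-way example can be reproduced with a short function. This sketch treats the quoted percentages as the probabilities that the "something" happens in each particular way, and divides each by their total; the function name `posteriors_from_joint` is an assumption.

```python
from fractions import Fraction

def posteriors_from_joint(joint):
    """Given the probability of each way occurring, return the probability
    of each way given that the event has occurred."""
    total = sum(joint)                  # probability that the event occurs at all
    return [p / total for p in joint]

ways = [Fraction(10, 100), Fraction(20, 100), Fraction(30, 100)]
posterior = posteriors_from_joint(ways)

print(posterior[2])      # 1/2, the probability that it was the third way
print(sum(posterior))    # 1, since the event must have occurred in some way
```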
example: Suppose you work for a bank that issues home mortgages. We will call borrowers who repay their loans “good” and those who default “bad”.
Suppose you screen an applicant and he passes (i.e., the screening suggests that he is good). What is the probability that he is a good borrower?
Additional information
Historical evidence suggests that if you do no screening, 76% of the population will repay the loan.
You have a screening process that accurately identifies good borrowers with probability 98% and accurately identifies bad borrowers with probability 80%.
Said differently, we know that…
P(g) = 0.76 and P(b) = 0.24
P(p|g) = 0.98 and P(f|g) = 0.02.
I.e., the probability that the borrower will pass the screening given that he is good is 0.98.
P(p|b) = 0.20 and P(f|b) = 0.8.
I.e., 20% of bad borrowers will pass the screening.
The bottom line is that we want to update our probability assessment based on the new information we have learned (he passed the screen).  We are interested in P(g|p).
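Using Bayes’ Rule,
P(g|p) = P(g)P(p|g) / (P(g)P(p|g) + P(b)P(p|b)) = (0.76×0.98) / (0.76×0.98 + 0.24×0.20) = 0.7448/0.7928 ≈ 0.939.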
So, we are 93.9% confident that the borrower will repay the loan.
We say that our prior belief was that the client was good with probability 0.76. Our posterior belief is that he is good with probability 0.939.
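The mortgage calculation can be checked in a few lines. The sketch below uses the probabilities given in the example; the function name `bayes_posterior` and the dictionary layout are assumptions made for illustration.

```python
def bayes_posterior(priors, likelihoods, target):
    """Posterior probability of `target` given the evidence.

    priors[h]      : prior probability of hypothesis h
    likelihoods[h] : probability of the evidence given h
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    return joint[target] / sum(joint.values())

priors = {"good": 0.76, "bad": 0.24}        # P(g), P(b)
pass_given = {"good": 0.98, "bad": 0.20}    # P(p|g), P(p|b)

p_good_given_pass = bayes_posterior(priors, pass_given, "good")
print(round(p_good_given_pass, 3))   # 0.939
```

The same function handles more than two hypotheses: add more keys to both dictionaries.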