Statistics 510: Notes 12

Reading: Sections 4.8-4.9

I. Review: Poisson Distribution

Arises in two settings:

(1) The Poisson distribution provides an approximation to the binomial distribution when n is large, p is small, and np is moderate.

(2) The Poisson distribution is used to model the number of events that occur in a time period of length t when

(a) the probability of exactly one event occurring in a given small time interval of length h is approximately λh for some constant λ > 0;

(b) the probability of two or more events occurring in a given small time interval of length h is negligible compared to h;

(c) the numbers of events occurring in non-overlapping time intervals are independent.

When (a), (b) and (c) are satisfied, the number of events occurring in a time period of length t has a Poisson(λt) distribution. The parameter λ is called the rate of the Poisson distribution; λ is the mean number of events that occur in a time period of length 1. The mean number of events that occur in a time period of length t is λt, and the variance of the number of events is also λt.

Sketch of proof for Poisson distribution under (a)-(c):

For a large value of n, we can divide the time period t into n non-overlapping intervals of length t/n. The number of events occurring in time period t is then approximately Binomial(n, λt/n). Using the Poisson approximation to the binomial, the number of events occurring in time period t is approximately Poisson(n · λt/n) = Poisson(λt). Taking the limit as n → ∞ yields the result.
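This convergence is easy to see numerically. The sketch below (not part of the original notes; the rate λ = 2 and period t = 1.5 are arbitrary illustrative choices) compares the Binomial(n, λt/n) pmf with the Poisson(λt) pmf as n grows:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    # P(K = k) for K ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pois_pmf(k, mu):
    # P(K = k) for K ~ Poisson(mu)
    return exp(-mu) * mu**k / factorial(k)

lam, t = 2.0, 1.5   # illustrative rate and time period
mu = lam * t        # Poisson mean for the whole period

# As n grows, the Binomial(n, lambda*t/n) pmf approaches Poisson(lambda*t).
for n in (10, 100, 1000):
    gap = max(abs(binom_pmf(k, n, mu / n) - pois_pmf(k, mu)) for k in range(15))
    print(f"n = {n:5d}: max pmf difference = {gap:.6f}")
```

The printed gaps shrink toward 0 as n increases, which is the limit taken in the proof sketch.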

Number of events occurring in space: The Poisson distribution also applies to the number of events occurring in space. Instead of intervals of length t, we have regions of area or volume t. Assumptions (a)-(c) become:

(a′) the probability of exactly one event occurring in a given small region of area or volume h is approximately λh;

(b′) the probability of two or more events occurring in a given small region of area or volume h is negligible compared to h;

(c′) the numbers of events occurring in non-overlapping regions are independent.

The parameter λ of the Poisson distribution for the number of events occurring in space is called the intensity.

Example 1: During World War II, London was heavily bombed by V-2 guided ballistic rockets. These rockets, luckily, were not particularly accurate at hitting targets. The number of direct hits in the southern section of London has been analyzed by splitting the area up into 576 sectors measuring one quarter of a square kilometer each. The average number of direct hits per sector was 0.9323. The fit of a Poisson distribution with λ = 0.9323 to the observed frequencies is excellent:

Hits        Actual Frequency    Expected Frequency = 576 · P(X = k), X ~ Poisson(0.9323)
0           229                 226.74
1           211                 211.39
2            93                  98.54
3            35                  30.62
4             7                   7.14
5 or more     1                   1.57
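The expected-frequency column can be reproduced directly from the Poisson pmf. A quick check in Python (not part of the original notes; the last digit may differ slightly from the table due to rounding):

```python
from math import exp, factorial

lam = 0.9323     # average hits per sector, from the data
sectors = 576

def pois_pmf(k, mu):
    # P(X = k) for X ~ Poisson(mu)
    return exp(-mu) * mu**k / factorial(k)

# Expected counts for 0-4 hits, then the tail "5 or more".
for k in range(5):
    print(f"{k} hits: expected {sectors * pois_pmf(k, lam):.2f}")
tail = 1 - sum(pois_pmf(k, lam) for k in range(5))
print(f"5 or more: expected {sectors * tail:.2f}")
```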

R Commands for Poisson distribution:

The command rpois(n,lambda) simulates n Poisson random variables with parameter lambda.

The command dpois(x,lambda) computes the probability that a Poisson random variable with parameter lambda equals x.

The command ppois(x,lambda) computes the probability that a Poisson random variable with parameter lambda is less than or equal to x.
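For readers without R, the three commands can be mimicked with the Python standard library. This is a sketch, not a standard Python API; the function names below simply copy the R names, and rpois uses cdf inversion, which is fine for moderate lambda:

```python
import random
from math import exp, factorial

def dpois(x, lam):
    # P(X = x) for X ~ Poisson(lam), like R's dpois(x, lambda)
    return exp(-lam) * lam**x / factorial(x)

def ppois(x, lam):
    # P(X <= x), like R's ppois(x, lambda)
    return sum(dpois(k, lam) for k in range(x + 1))

def rpois(n, lam, rng=random):
    # n draws from Poisson(lam), like R's rpois(n, lambda),
    # simulated by inverting the cdf
    draws = []
    for _ in range(n):
        u, k, cum = rng.random(), 0, dpois(0, lam)
        while u > cum:
            k += 1
            cum += dpois(k, lam)
        draws.append(k)
    return draws

random.seed(1)
sample = rpois(10000, 2.0)
print(sum(sample) / len(sample))   # sample mean should be near lam = 2.0
```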

II. Geometric Random Variable (Section 4.8.1)

Suppose that independent trials, each having probability p, 0 < p < 1, of being a success, are performed until a success occurs. Let X be the random variable that denotes the number of trials required. The probability mass function of X is

P(X = n) = (1 − p)^(n−1) p,  n = 1, 2, …    (1.1)

The pmf follows because in order for X to equal n, it is necessary and sufficient that the first n-1 trials are failures and the nth trial is a success.

A random variable that has the pmf (1.1) is called a geometric random variable with parameter p.

The expected value and variance of a geometric(p) random variable are

E[X] = 1/p,  Var(X) = (1 − p)/p².

Example 2: A fair die is tossed. What is the probability that the first six occurs on the fourth roll? What is the expected number of tosses needed to toss the first six?
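Example 2 is a direct application of the geometric pmf (1.1) with p = 1/6; a quick numerical check (not part of the original notes):

```python
p = 1 / 6                      # probability of rolling a six

# P(first six on the 4th roll) = (1 - p)^3 * p, by the geometric pmf (1.1)
prob_fourth = (1 - p) ** 3 * p
print(round(prob_fourth, 4))   # about 0.0965

# Expected number of tosses needed: E[X] = 1/p
print(1 / p)                   # 6.0
```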

III. Negative Binomial Distribution (Section 4.8.2)

Suppose that independent trials, each having probability p, 0 < p < 1, of being a success, are performed until a total of r successes occur. Let X be the random variable that denotes the number of trials required. The probability mass function of X is

P(X = n) = C(n − 1, r − 1) p^r (1 − p)^(n−r),  n = r, r + 1, …    (1.2)

A random variable whose pmf is given by (1.2) is called a negative binomial random variable with parameters (r, p).

Note that the geometric random variable is a negative binomial random variable with parameters (1, p).

The expected value and variance of a negative binomial random variable with parameters (r, p) are

E[X] = r/p,  Var(X) = r(1 − p)/p².

Example 3: Suppose that an underground military installation is fortified to the extent that it can withstand up to four direct hits from air-to-surface missiles and still function. Enemy aircraft can score direct hits with these particular missiles with probability 0.7. Assume all firings are independent. What is the probability that a plane will require fewer than 8 shots to destroy the installation? What is the expected number of shots required to destroy the installation?
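Since the installation can withstand four hits, it is destroyed on the fifth, so Example 3 is negative binomial with r = 5 and p = 0.7. A numerical check using (1.2) (not part of the original notes):

```python
from math import comb

r, p = 5, 0.7   # destroyed on the 5th direct hit; hit probability 0.7

def negbin_pmf(n, r, p):
    # Negative binomial pmf (1.2): P(X = n) = C(n-1, r-1) p^r (1-p)^(n-r)
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

# P(fewer than 8 shots) = P(X <= 7), summing over n = 5, 6, 7
prob = sum(negbin_pmf(n, r, p) for n in range(r, 8))
print(round(prob, 4))      # about 0.6471

print(r / p)               # expected number of shots, E[X] = r/p ≈ 7.14
```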

IV. Hypergeometric Random Variables (Section 4.8.3)

Suppose that a sample of size n is to be chosen randomly (without replacement) from an urn containing N balls, of which m are white and N − m are black. If we let X be the random variable that denotes the number of white balls selected, then

P(X = i) = C(m, i) C(N − m, n − i) / C(N, n),  i = 0, 1, …, n    (1.3)

A random variable X whose pmf is given by (1.3) is said to be a hypergeometric random variable with parameters (n, N, m).

The expected value and variance of a hypergeometric random variable with parameters (n, N, m) are

E[X] = nm/N,  Var(X) = n (m/N)(1 − m/N)(N − n)/(N − 1).

Example 4: A Scrabble set consists of 54 consonants and 44 vowels. What is the probability that your initial draw (of seven letters) will be all consonants? six consonants and one vowel? five consonants and two vowels?
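With "white balls" taken to be consonants, Example 4 is hypergeometric with N = 98, m = 54, and n = 7. A numerical check using (1.3) (not part of the original notes):

```python
from math import comb

N_cons, N_vow = 54, 44       # consonant and vowel tiles in the set
N, n = N_cons + N_vow, 7     # 98 tiles total, draw 7 without replacement

def p_consonants(i):
    # Hypergeometric pmf (1.3): i consonants and n - i vowels drawn
    return comb(N_cons, i) * comb(N_vow, n - i) / comb(N, n)

print(round(p_consonants(7), 4))   # all consonants, about 0.0128
print(round(p_consonants(6), 4))   # six consonants and one vowel, about 0.0821
print(round(p_consonants(5), 4))   # five consonants and two vowels, about 0.2163
```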

V. Zeta (or Zipf) distribution

A random variable is said to have a zeta (sometimes called the Zipf) distribution with parameter α > 0 if its probability mass function is given by

P(X = k) = C / k^(α+1),  k = 1, 2, …

for some value of C > 0.

Since the sum of the foregoing probabilities must equal 1, it follows that

C = 1 / Σ_{k=1}^{∞} (1/k)^(α+1).

The zeta distribution has been used to model the distribution of family incomes.
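The normalizing constant C can be approximated by truncating the series. The sketch below (not part of the original notes; α = 1 is an illustrative choice) checks it against the known value Σ 1/k² = π²/6:

```python
from math import pi

alpha = 1.0   # illustrative choice of the zeta parameter

# Approximate C = 1 / sum_{k>=1} k^(-(alpha+1)) by truncating at a large K.
K = 200000
partial = sum(k ** -(alpha + 1) for k in range(1, K + 1))
C = 1 / partial
print(C)

# For alpha = 1 the series is sum 1/k^2 = pi^2/6, so C should be near 6/pi^2.
print(6 / pi**2)
```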

VI. The Cumulative Distribution Function (Section 4.9)

The cumulative distribution function (cdf) of a random variable X is the function F(x) = P(X ≤ x), −∞ < x < ∞.

Example 5: Let X denote the number of aces a poker player receives in a five card hand. Graph the cdf of X.
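In Example 5, X is hypergeometric with parameters (5, 52, 4), so its cdf is a step function on {0, 1, 2, 3, 4}. The values to plot can be computed as follows (not part of the original notes):

```python
from math import comb

# X = number of aces in a 5-card hand: hypergeometric with
# N = 52 cards, m = 4 aces, n = 5 cards drawn
def pmf(i):
    return comb(4, i) * comb(48, 5 - i) / comb(52, 5)

def cdf(x):
    # F(x) = P(X <= x); X takes values 0, 1, 2, 3, 4
    if x < 0:
        return 0.0
    return sum(pmf(i) for i in range(0, min(int(x), 4) + 1))

for x in range(5):
    print(f"F({x}) = {cdf(x):.4f}")
```

The cdf is flat between integers and jumps by pmf(i) at each value i, reaching 1 at x = 4.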

All probability questions about X can be answered in terms of the cdf F. For example, for all a < b,

P(a < X ≤ b) = F(b) − F(a).

This can be seen by writing the event {X ≤ b} as the union of the mutually exclusive events {X ≤ a} and {a < X ≤ b}. That is,

{X ≤ b} = {X ≤ a} ∪ {a < X ≤ b},

so

P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b).

The probability that X < b can be computed as

P(X < b) = P(lim_{n→∞} {X ≤ b − 1/n}) = lim_{n→∞} P(X ≤ b − 1/n) = lim_{n→∞} F(b − 1/n).

For the justification of the second equality, see Section 2.6 on the continuity property of probability.
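For a discrete X the distinction between P(X < b) and P(X ≤ b) matters: they differ by the jump of F at b. A small illustration using the ace-count variable of Example 5 (not part of the original notes):

```python
from math import comb

# Number of aces in a 5-card hand (hypergeometric), as in Example 5.
def pmf(i):
    return comb(4, i) * comb(48, 5 - i) / comb(52, 5)

def F(x):
    # cdf at an integer x >= 0
    return sum(pmf(i) for i in range(0, int(x) + 1))

# P(X < 1) is F evaluated "just below" 1, which for this discrete X is F(0);
# it differs from P(X <= 1) = F(1) by the jump pmf(1).
print(F(0))          # P(X < 1)
print(F(1))          # P(X <= 1)
print(F(1) - F(0))   # the jump, equal to pmf(1)
```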
