4/19/99

1. Reminder - Quiz Friday
Ch 11- The Normal distribution

(Omit Sec 52 for the quiz)

Ch 12 – Success/failure trials when sampling without replacement: the Hypergeometric model

Ch 13- The Poisson model

Sec 60, 61, and example 1, Sec 62.

2. Revised Problem List for Chapter 12:

# 6, 8-11, 13, 24, 25, 31-33

3. More applications of HYPG.

Applications of HYPG model

1. Comparison of proportions:

Example (i) Compare the effectiveness of two treatments for a rare blood disease.

The experiment: 15 patients were used:

7 were randomly assigned to treatment 1

the remaining 8 received treatment 2

Observed results:

          cured   uncured   total
tr #1       4        3        7
tr #2       1        7        8
total       5       10       15

Problem: test the hypothesis that the two treatments are equally effective.

Note: The sample sizes are too small for either the normal approximation or the chi-squared test (Ch 15).

Let $p_i$ = prob of cure for treatment $i$ ($i$ = 1, 2).

To test $H_0$: $p_1 = p_2$ (or $p_1 \le p_2$) vs $H_1$: $p_1 > p_2$ (one-sided alt.)

Reasoning: If $H_0$ is true, then the 5 cures and 10 non-cures would have occurred no matter how the patients were allocated. So the resulting allocation was a random sample of size 7 from a population of size 15 with two types of items:

5 cures and 10 non-cures.

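The test statistic we will use is X, the number of cured patients in the treatment 1 group: $H_0$ will be rejected if X is too large. The observation is X = 4.

Under $H_0$, X has a HYPG dist: $P[X=k] = \binom{5}{k}\binom{10}{7-k} / \binom{15}{7}$, k = 0, 1, ..., 5.

To decide if X = 4 is ‘large’ we calculate $P[X \ge 4]$ under $H_0$:

$P[X \ge 4] = P[X=4] + P[X=5] = \left[ \binom{5}{4}\binom{10}{3} + \binom{5}{5}\binom{10}{2} \right] / \binom{15}{7}$

$= (600 + 45)/6435 = .10.$

There’s a 10% chance of getting a result this extreme. This may not be small enough to reject $H_0$.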
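The tail probability in Example (i) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `hypg_pmf` and its argument names are chosen here for illustration and are not from the text.

```python
from math import comb

def hypg_pmf(k, special, other, n):
    """P[X = k]: k special items in a random sample of size n drawn
    without replacement from `special` special and `other` other items."""
    return comb(special, k) * comb(other, n - k) / comb(special + other, n)

# Example (i): 5 cures (special), 10 non-cures, 7 patients on treatment 1.
p_tail = sum(hypg_pmf(k, 5, 10, 7) for k in range(4, 6))  # k = 4, 5
print(f"P[X >= 4] = {p_tail:.4f}")  # 645/6435, prints 0.1002
```

The sum stops at k = 5 because there are only 5 cures in the population.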

Example (ii) A discrimination problem.

90 officers took a test for advancement:

26 had Spanish surnames; the rest, 64, did not.

          passed   failed   total
hisp         3       23       26
others      14       50       64
total       17       73       90

The proportion of Hispanic officers who passed was 3/26 = .115.

The proportion of others who passed was 14/64 = .219, about double.

If the test is unbiased, the 17 officers who passed would be a random selection from the 90, regardless of ethnicity. Do the data support a claim of bias?

We use the same method: X is the number that passed among the Hispanic officers; the observation is X = 3.

Calculate $P[X \le 3]$ under the assumption of random selection:

$P[X \le 3] = \sum_{k=0}^{3} \binom{17}{k}\binom{73}{26-k} / \binom{90}{26}$

= (after tedious calculations) approximately .20.

Not nearly low enough to support a claim of bias.
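The ‘tedious calculations’ in Example (ii) are easy to hand to a computer. A minimal sketch, assuming the same set-up as Example (i): the 26 Hispanic officers are treated as a random sample from 90 officers, of whom 17 (the ones who passed) are the special items. The name `hypg_cdf` is illustrative only.

```python
from math import comb

def hypg_cdf(x, special, other, n):
    """P[X <= x] for the number of special items in a sample of size n
    drawn without replacement from `special` + `other` items."""
    total = comb(special + other, n)
    ways = sum(comb(special, k) * comb(other, n - k) for k in range(0, x + 1))
    return ways / total

# Example (ii): 17 passed (special), 73 failed, 26 Hispanic officers sampled.
p_low = hypg_cdf(3, 17, 73, 26)
print(f"P[X <= 3] = {p_low:.3f}")
```

Python integers are exact, so the large binomial coefficients cause no overflow; only the final division is done in floating point.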

2. Lotteries:

(i) Lotto:

You choose 6 integers from the set {1, 2, ..., 49}.

Your choices are the ‘special items’.

6 integers are then randomly drawn from {1, ..., 49}.

X is the number of integers that the two sets (yours and theirs) have in common.

You win big if X = 6: $P[X=6] = \binom{6}{6}\binom{43}{0} / \binom{49}{6} = 1/13{,}983{,}816$

$= 7.15 \times 10^{-8}$

Becomes more attractive when the pot builds up.
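The whole distribution of X for Lotto follows from the same HYPG formula, with your 6 picks as the special items and the other 43 integers as the rest. A minimal sketch; the function name `lotto_pmf` and its defaults are illustrative, not from the text.

```python
from math import comb

def lotto_pmf(k, picks=6, pool=49):
    """P[X = k]: k of your `picks` numbers appear among the `picks`
    numbers drawn without replacement from 1..pool."""
    return comb(picks, k) * comb(pool - picks, picks - k) / comb(pool, picks)

probs = [lotto_pmf(k) for k in range(7)]
for k, p in enumerate(probs):
    print(f"P[X = {k}] = {p:.3e}")
# The last line is P[X = 6] = 1/13,983,816, about 7.15e-08.
```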

(ii) Daily: You choose a 3-digit number – one is chosen at random – you win if they match. P[Win] = 1/1000.

Or, choose 4 digits, etc. You win more if you match 4 than if you match 3. P[Win] = 1/10000.

The 3-digit and 4-digit Daily games illustrate Bernoulli trials: each digit is S (match) or F (no match).

3. An odd legal problem:

Story: Police made large cocaine bust – 496 packets alleged cocaine. Conviction of traffickers requires proof that packets contained drug. Police lab tested 4 at random – all positive – got conviction. End of part I.

Part II. Police decided to use remaining 492 packets in a sting operation. 2 were randomly selected – sold by police to (new) defendant. Between sale and arrest, defendant got rid of the evidence.

Q. Beyond reasonable doubt – did defendant buy cocaine?

Historical note: Some busted collections contain both positive and negative packets.

Defense arguments: (i) possible inside heist – cocaine replaced by inert powder – not a statistical issue – didn’t fly.

(ii) Scenario re population values: suppose N positive, M negative to begin with (N + M = 496).

Then: 4 out of 4 from the N positive for 1st draw, 2 out of 2 from the M negative for 2nd draw.

Conclusion: 4 out of 6 were positive – suggests 2/3 positive, 1/3 negative, or N = 331 and M = 165. Then P[++++--] = .022.

This is not ‘beyond a reasonable doubt’
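The defense’s figure can be reproduced as a product of two HYPG probabilities: all 4 lab packets positive, then both sold packets negative from the 492 left. A minimal sketch under the defense’s assumed values N = 331 and M = 165; the variable names are illustrative.

```python
from math import comb

TOTAL = 496               # packets seized
N_POS, M_NEG = 331, 165   # defense's assumed split (about 2/3 positive)

# First draw: 4 packets tested, all from the N positive.
p_first = comb(N_POS, 4) / comb(TOTAL, 4)

# Second draw: 2 packets sold, both from the M negative,
# drawn from the 492 packets left after the lab test.
p_second = comb(M_NEG, 2) / comb(TOTAL - 4, 2)

p_defense = p_first * p_second
print(f"P[++++--] = {p_defense:.3f}")  # prints 0.022
```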

Prosecution countered: proposed experiment with the remaining 490 packets:

Choose a new sample of size s at random. The value of s was such that if all packets in the new sample were cocaine, then the probability of the sequence suggested by the defense would be < 1/1000.

The mathematics to find s was similar to the defense’s – but longer – required several iterations with a computer program.

It happened – found guilty – virtue triumphs.
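The notes do not give the prosecution’s calculation, so the following is only a sketch of one plausible reading, not the method actually used in court. Assumption: for each possible number M of negative packets among the 496, compute the probability of 4 positive lab packets, 2 negative sold packets, and s positive packets in the new sample from the remaining 490; then take the worst case over M and look for the smallest s that pushes it below 1/1000. All function names are hypothetical.

```python
from math import comb

TOTAL = 496    # packets seized
TESTED = 4     # lab sample, all positive
SOLD = 2       # sting sample, negative under the defense scenario

def scenario_prob(m_neg, s):
    """P[4 tested positive, 2 sold negative, s new all positive]
    when the 496 packets contain m_neg negatives (assumed model)."""
    n_pos = TOTAL - m_neg
    p_lab = comb(n_pos, TESTED) / comb(TOTAL, TESTED)
    p_sold = comb(m_neg, SOLD) / comb(TOTAL - TESTED, SOLD)
    remaining = TOTAL - TESTED - SOLD          # 490 packets left
    p_new = comb(n_pos - TESTED, s) / comb(remaining, s)
    return p_lab * p_sold * p_new

def worst_case(s):
    """Largest scenario probability over all feasible values of M."""
    return max(scenario_prob(m, s) for m in range(SOLD, TOTAL - TESTED + 1))

def smallest_sample(threshold=1 / 1000):
    """Smallest s whose worst-case probability is below the threshold."""
    s = 0
    while worst_case(s) >= threshold:
        s += 1
    return s

s_needed = smallest_sample()
print("sample size under this reading:", s_needed)
```

With s = 0 the worst case is at least the defense’s .022, so the search must increase s; each extra positive packet makes a population with many negatives less believable.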