NORMAL DISTRIBUTION

A normal distribution describes sets of data where:

- most measurements are near the middle

- there are a few extreme values above and below the mean

Normal distributions are described by the mean and standard deviation.

The Normal Distribution Curve

For a normal distribution:

- 68% of the data lies within 1 standard deviation of the mean. It is likely (probable) that a value will fall in this region.

- 95% of the data lies within 2 standard deviations of the mean. It is very likely (very probable) that a value will fall in this region.

- 99.7% of the data lies within 3 standard deviations of the mean. It is almost certain that a value will fall in this region.
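As a check on these percentages, the exact proportions can be computed with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a minimal sketch; `within_k_sd` is a hypothetical helper name used only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
Z = NormalDist(mu=0, sigma=1)


def within_k_sd(k: float) -> float:
    """Proportion of any normal distribution lying within k standard deviations of its mean."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.2%}")
```

This prints about 68.27%, 95.45% and 99.73%, which is where the familiar rounded 68%, 95% and 99.7% figures come from.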

e.g. The weights of school bags are normally distributed with a mean of 4.2 kg and a standard deviation of 0.9 kg.

a) What percentage of bags weigh between 2.4 kg and 6.0 kg? 95%

b) What percentage of bags weigh more than 3.3 kg? 34 + 50 = 84% (3.3 kg is 1 standard deviation below the mean)

c) Below what weight will a bag almost certainly be? 4.2 + 0.9 × 3 = 6.9 kg

d) 400 bags are weighed. Estimate how many will weigh less than 2.4 kg.

- 2.4 kg is 2 standard deviations below the mean, so 2.5% (half of the 5% lying outside 2 standard deviations) of the bags are estimated to weigh less than 2.4 kg.

- 2.5% of 400: 2.5 ÷ 100 × 400 = 10 bags
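The answers above use the rounded rule-of-thumb percentages. A sketch using `statistics.NormalDist` gives the exact normal-model values for comparison; only the mean (4.2 kg) and standard deviation (0.9 kg) come from the example.

```python
from statistics import NormalDist

# School bag weights: mean 4.2 kg, standard deviation 0.9 kg
bags = NormalDist(mu=4.2, sigma=0.9)

# a) between 2.4 kg and 6.0 kg (mean ± 2 sd); the rule gives 95%
p_a = bags.cdf(6.0) - bags.cdf(2.4)
# b) more than 3.3 kg (mean – 1 sd); the rule gives 84%
p_b = 1 - bags.cdf(3.3)
# c) "almost certainly below": mean + 3 sd
upper = bags.mean + 3 * bags.stdev
# d) expected number of 400 bags under 2.4 kg; the rule gives 10
n_d = 400 * bags.cdf(2.4)

print(round(p_a, 4), round(p_b, 4), round(upper, 1), round(n_d, 1))
```

The exact values (about 95.4%, 84.1% and 9.1 bags) differ slightly from the rule-of-thumb answers only because the rule's percentages are rounded.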

The Standard Normal Distribution

An ordinary normal curve has a mean of µ and a standard deviation of σ. The standard normal curve has a mean of 0 and a standard deviation of 1.

To convert an ordinary normal measurement (x) to a standard normal measurement (z) use:

z = (x – µ) / σ

The z-value tells us how many standard deviations x is away from the mean.
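The conversion is a one-line function. A minimal sketch; `z_value` is a hypothetical helper name, and the school bag figures are reused only as test values.

```python
def z_value(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from the mean (negative means below the mean)."""
    return (x - mu) / sigma


# School bag example: 2.4 kg with mean 4.2 kg and sd 0.9 kg is about 2 sd below the mean
print(z_value(2.4, 4.2, 0.9))
```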

The Standard Normal Curve

Standard Normal Tables

e.g. Calculate the following probabilities:

a) P(0 < Z < 1.32) = 0.4066 (from the table)

Symmetrical properties of the curve can be used to calculate other probabilities:

b) P(-1.32 < Z < 0) = P(0 < Z < 1.32) = 0.4066 (as the curve is symmetrical about the middle)

c) P(Z > -1.32) = 0.4066 + 0.5 = 0.9066

d) P(Z < -1.32) = 0.5 – 0.4066 = 0.0934

e) P(Z < 1.32) = 0.5 + 0.4066 = 0.9066
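In place of the printed table, `NormalDist().cdf(z)` gives P(Z < z) directly, so each symmetry result can be checked. A sketch; the table values are rounded to 4 d.p., so the comparisons in practice need a small tolerance.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1


def p_between(a: float, b: float) -> float:
    """P(a < Z < b) for the standard normal."""
    return Z.cdf(b) - Z.cdf(a)


table_value = p_between(0, 1.32)   # a) the table entry, about 0.4066
p_b = p_between(-1.32, 0)          # b) same area, by symmetry
p_c = 1 - Z.cdf(-1.32)             # c) P(Z > -1.32) = 0.5 + 0.4066
p_d = Z.cdf(-1.32)                 # d) P(Z < -1.32) = 0.5 - 0.4066
p_e = Z.cdf(1.32)                  # e) P(Z < 1.32)  = 0.5 + 0.4066
print(round(table_value, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4), round(p_e, 4))
```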

- Using the differences columns

e.g. Find P(0 < Z < 0.473) = 0.1808 + 0.0011 (table value for z = 0.47, plus the differences-column entry for a third decimal place of 3)

= 0.1819
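A sketch of what the differences column does, assuming it behaves like linear interpolation between neighbouring two-decimal table entries (the printed table may be built slightly differently). `area_from_mean` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()


def area_from_mean(z: float) -> float:
    """P(0 < Z < z), the quantity listed in the standard normal tables."""
    return Z.cdf(z) - 0.5


# Two-decimal table entries either side of z = 0.473
low, high = area_from_mean(0.47), area_from_mean(0.48)
# Move 3/10 of the way from 0.47 to 0.48 (linear interpolation)
interpolated = low + 0.3 * (high - low)
exact = area_from_mean(0.473)
print(round(low, 4), round(interpolated, 4), round(exact, 4))
```

Over such a short step the curve is nearly a straight line, so the interpolated and exact values agree to 4 d.p.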

Simple Applications

e.g. A normally distributed variable X has a mean of 50 and a standard deviation of 12.

Calculate the probability that X has a value less than 70.
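The working for this example is not shown above. A sketch, assuming X is normally distributed (the example sits in the normal distribution section):

```python
from statistics import NormalDist

# Assumes X ~ Normal(mean 50, sd 12)
X = NormalDist(mu=50, sigma=12)

z = (70 - X.mean) / X.stdev   # standardise: z = (x - mu) / sigma
p = X.cdf(70)                 # same as NormalDist().cdf(z)
print(round(z, 3), round(p, 4))
```

This gives z ≈ 1.667 and a probability of about 0.952.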

e.g. A flock of two-year-old sheep has a mean weight of 32.5 kg and a standard deviation of 5.5 kg. Find the probability that a randomly selected sheep weighs:

a) between 32.5 kg and 40 kg

b) over 40 kg

c) There are 400 sheep in the flock. How many would you expect to weigh under 37 kg?
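A sketch for the sheep question, assuming the weights are normally distributed (the question gives only the mean and standard deviation):

```python
from statistics import NormalDist

# Assumed: sheep weights ~ Normal(mean 32.5 kg, sd 5.5 kg)
sheep = NormalDist(mu=32.5, sigma=5.5)

p_a = sheep.cdf(40) - sheep.cdf(32.5)   # a) between 32.5 kg and 40 kg
p_b = 1 - sheep.cdf(40)                 # b) over 40 kg
n_c = 400 * sheep.cdf(37)               # c) expected number of 400 under 37 kg
print(round(p_a, 4), round(p_b, 4), round(n_c))
```

Under that assumption the answers come out at roughly 0.414, 0.086 and 317 sheep.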

Inverse Normal

- using a known probability to calculate a z-value or x-value

e.g. Calculate the value of k for which P(0 < Z < k) = 0.3485

k = 1.03

- again the differences column may be needed

e.g. Calculate the value of k for which P(0 < Z < k) = 0.165

k = 0.426 (table value 0.1628 for z = 0.42, plus 0.0022 from the differences column)
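The inverse lookup can be sketched with `NormalDist().inv_cdf`, which returns the z with P(Z < z) equal to a given probability. Since the tables list P(0 < Z < k), the sketch adds 0.5 first; `k_for_area_from_mean` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()


def k_for_area_from_mean(area: float) -> float:
    """Solve P(0 < Z < k) = area for k, where 0 <= area < 0.5."""
    # P(0 < Z < k) = P(Z < k) - 0.5, so P(Z < k) = 0.5 + area
    return Z.inv_cdf(0.5 + area)


k1 = k_for_area_from_mean(0.3485)   # first example
k2 = k_for_area_from_mean(0.165)    # second example
print(round(k1, 2), round(k2, 3))
```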

Inverse Normal Applications

e.g. The weights of school bags are normally distributed with a mean of 4.2 kg and a standard deviation of 0.9 kg.

A bag is considered to be heavy if it is in the top 10% of weights. What weight marks the start of this top 10%?

P(X > x_heavy) = 0.1
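The working for this example is not shown above. A sketch using `NormalDist.inv_cdf`, which needs a "less than" probability, so P(X > x_heavy) = 0.1 is rewritten as P(X < x_heavy) = 0.9:

```python
from statistics import NormalDist

bags = NormalDist(mu=4.2, sigma=0.9)

# P(X > x_heavy) = 0.1  is the same as  P(X < x_heavy) = 0.9
x_heavy = bags.inv_cdf(0.9)
z = (x_heavy - bags.mean) / bags.stdev   # standard deviations above the mean
print(round(z, 4), round(x_heavy, 2))
```

This gives z ≈ 1.2816, so under this model bags heavier than about 5.35 kg count as heavy.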

Sums and Differences of Normally Distributed Random Variables

For T = X + Y, the following results apply:

E(T) = E(X + Y) = E(X) + E(Y)

And, if X and Y are independent,

VAR(T) = VAR(X + Y) = VAR(X) + VAR(Y)

e.g. Containers are loaded onto a truck in pairs from a stockpile. The weights are normally distributed, with mean 2000 kg, standard deviation 200 kg. A truck is licensed to carry a total load of 4500 kg. If two containers are chosen at random and loaded onto the truck, what is the probability that the truck is overloaded?

E(T) = 2000 + 2000 = 4000 kg

VAR(T) = 200² + 200² = 80000

SD(T) = √80000 = 282.84 kg

P(T > 4500) = P(Z > (4500 – 4000) / 282.84)

= P(Z > 1.768)

= 0.5 – 0.4614

= 0.0386
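The container calculation as a sketch: the sum rules give the mean and variance of the total T, and because a sum of independent normal variables is itself normal, `NormalDist` then gives the tail probability directly.

```python
import math
from statistics import NormalDist

# Each container: mean 2000 kg, sd 200 kg; two chosen independently
mu, sd = 2000, 200
mean_T = mu + mu                 # E(T) = E(X) + E(Y)
var_T = sd**2 + sd**2            # VAR(T) = VAR(X) + VAR(Y), needs independence
sd_T = math.sqrt(var_T)

T = NormalDist(mu=mean_T, sigma=sd_T)
p_overload = 1 - T.cdf(4500)     # P(T > 4500)
print(mean_T, var_T, round(sd_T, 2), round(p_overload, 4))
```

The exact tail agrees with the table-based answer to within rounding of the table entries.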

Harder examples involve linear combinations of random variables. The expectation results that apply here are:

E(aX + bY) = aE(X) + bE(Y)

And, if X and Y are independent,

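VAR(aX + bY) = a²VAR(X) + b²VAR(Y)

Note: a and b are squared whenever a single variable is multiplied by a constant (usually a price, as in the example below). This is different from adding several independent copies of a variable: for the total weight of 3 separately chosen items, VAR(X₁ + X₂ + X₃) = 3VAR(X), not 3²VAR(X).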
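A seeded simulation can separate two cases that are easy to confuse: multiplying one variable by a constant (3X) versus adding three independent copies (X₁ + X₂ + X₃). The mean 2000 kg and sd 200 kg are borrowed from the container example; the sample size and seed are arbitrary choices.

```python
import random
from statistics import pvariance

rng = random.Random(42)          # fixed seed so the run is repeatable
MU, SD, N = 2000, 200, 100_000

# 3X: one random weight multiplied by 3 (theory: VAR = 3**2 * SD**2 = 360000)
scaled = [3 * rng.gauss(MU, SD) for _ in range(N)]
# X1 + X2 + X3: three independent weights added (theory: VAR = 3 * SD**2 = 120000)
summed = [rng.gauss(MU, SD) + rng.gauss(MU, SD) + rng.gauss(MU, SD) for _ in range(N)]

print(round(pvariance(scaled)), round(pvariance(summed)))
```

Both totals have the same mean (6000 kg), but the scaled version varies about three times as much, because one random value is stretched rather than three values partly cancelling out.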

e.g.

A delicatessen has two products on special – it sells Parmesan cheese for $16 per kg, and sundried tomatoes for $12 per kg. The demands each day for these products are independent and are normally distributed.

The mean weight of Parmesan cheese sold is 4.45 kg, with s.d. 0.61 kg.

The mean weight of sundried tomatoes sold is 3.79 kg, with s.d. 0.88 kg.

Calculate the probability that more than $120 is spent on these two products on a given day.

P = weight of Parmesan, S = weight of tomatoes, M = money spent

E(M) = E(16P + 12S)

= 16E(P) + 12E(S)

= 16 × 4.45 + 12 × 3.79

= $116.68

VAR(M) = VAR(16P + 12S)

= 16²VAR(P) + 12²VAR(S)

= 256 × 0.61² + 144 × 0.88²

= 206.77

SD(M) = √206.77

= $14.38

P(M > 120) = P(Z > (120 – 116.68) / 14.38)

= P(Z > 0.231)

= 0.5 – 0.0914

= 0.4086
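The delicatessen working as a sketch, assuming (as the question states) that the two demands are independent and normal, so the money spent M is also normal:

```python
import math
from statistics import NormalDist

# Prices ($ per kg) and daily demand (kg)
a, b = 16, 12
mean_P, sd_P = 4.45, 0.61
mean_S, sd_S = 3.79, 0.88

mean_M = a * mean_P + b * mean_S            # E(aP + bS) = aE(P) + bE(S)
var_M = a**2 * sd_P**2 + b**2 * sd_S**2     # VAR(aP + bS) = a²VAR(P) + b²VAR(S)
sd_M = math.sqrt(var_M)

p_over_120 = 1 - NormalDist(mean_M, sd_M).cdf(120)
print(round(mean_M, 2), round(var_M, 2), round(sd_M, 2), round(p_over_120, 4))
```

The result matches the hand working to within table rounding.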

BINOMIAL DISTRIBUTION

Formula: P(X = x) = nCx × πˣ × (1 – π)ⁿ⁻ˣ, for x = 0, 1, 2, …, n

  • n and p (or π) uniquely determine the distribution
  • The parameters for the binomial distribution are n and p
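The formula translates directly into code using `math.comb` (Python 3.8+). A minimal sketch; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p): nCx * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Sanity check: the probabilities over x = 0..n add up to 1
total = sum(binomial_pmf(x, 10, 0.3) for x in range(11))
print(total)
```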

Criteria for the Binomial Distribution of X

1. A fixed number of trials

2. Two outcomes per trial (e.g. T/F, S/F, Y/N)

3. Trials are independent

4. The probability π (or p) of success remains fixed for each trial

Examples Using the Tables

1. When Debbie plays Colin at squash, she has a probability of 0.4 of winning any particular game.

a) What is the probability she wins 3 out of 4 games they play?

n = 4, x = 3, π = 0.4

P(X = 3) = 0.1536

b) What is the probability that she wins at least 1 game?

P(X ≥ 1) = 1 – P(X = 0)

= 1 – 0.1296

= 0.8704

2. In 15 throws of a die, what is the probability of exactly 4 sixes turning up?

n = 15, x = 4, π = 1/6

P(X = 4) = 0.1418

3. Hospital records show that of patients suffering from a certain complaint, 75% die of it. What is the probability that, of 6 randomly selected patients, all will recover?

Probability patient recovers = 0.25

n = 6, x = 6, π = 0.25

P(X = 6) = 0.0002
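The three table look-ups can be reproduced from the formula. A sketch with a hypothetical `binomial_pmf` helper; answers are rounded to 4 d.p. to match the tables.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


debbie_3_of_4 = binomial_pmf(3, 4, 0.4)          # 1a)
debbie_at_least_1 = 1 - binomial_pmf(0, 4, 0.4)  # 1b) complement of winning no games
four_sixes = binomial_pmf(4, 15, 1 / 6)          # 2)
all_recover = binomial_pmf(6, 6, 0.25)           # 3) "success" = recovery, p = 0.25
print(round(debbie_3_of_4, 4), round(debbie_at_least_1, 4),
      round(four_sixes, 4), round(all_recover, 4))
```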

Examples Using the Formula

1. X is a binomial random variable with parameters n = 5 and p = 0.2.

Find P(X = 3). P(X = 3) = 5C3 × (0.2)³ × (0.8)²

= 0.0512

2. What proportion of families with exactly 6 children should be expected to have at least 3 boys? (Assume a boy and a girl are equally likely, so p = 0.5.)

P(X ≥ 3) = 6C3 × (0.5)³ × (0.5)³ + 6C4 × (0.5)⁴ × (0.5)² + 6C5 × (0.5)⁵ × (0.5)¹ + 6C6 × (0.5)⁶ × (0.5)⁰

= 0.3125 + 0.234375 + 0.09375 + 0.015625

= 0.65625
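Both formula examples as a sketch; the second assumes, as above, that boys and girls are equally likely. `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p_three = binomial_pmf(3, 5, 0.2)                                # Example 1
# Example 2: add the terms for 3, 4, 5 and 6 boys
at_least_3_boys = sum(binomial_pmf(x, 6, 0.5) for x in range(3, 7))
print(round(p_three, 4), at_least_3_boys)
```

With p = 0.5 every term is a multiple of 1/64, so the second answer is exactly 42/64 = 0.65625.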

Probabilities of Success Greater Than 0.5

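Tables usually list p only up to 0.5, so change successes to failures: x successes with probability p is the same event as n – x failures with probability 1 – p.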

e.g. X is a binomial random variable with parameters n = 10 and p = 0.7.

Find P(X = 4).

Use n = 10, p = 0.3 and find P(X = 6)

From the table: P(X = 6) = 0.0368
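A quick check that swapping successes for failures gives the same probability. A sketch with a hypothetical `binomial_pmf` helper:

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


direct = binomial_pmf(4, 10, 0.7)        # 4 successes with p = 0.7
as_failures = binomial_pmf(6, 10, 0.3)   # the same event seen as 6 failures with p = 0.3
print(round(direct, 4), round(as_failures, 4))
```

The two agree because 10C4 = 10C6 and the powers of 0.7 and 0.3 simply swap places.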

Examples of Cumulative Probabilities

1. If n = 10 and p = 0.5, find P(X ≥ 3).

From the table: P(X ≥ 3) = 1 – P(X < 3)

= 1 – (0.0439 + 0.0098 + 0.0010)

= 0.9453

2. A marksman hits a bull with probability 0.2. If he fires 10 shots, what is the probability that:

a) at least 4 bulls are scored?

P(X ≥ 4) = 1 – P(X < 4)

= 1 – (0.2013 + 0.3020 + 0.2684 + 0.1074)

= 0.1209

b) fewer than 8 shots miss the target?

Fewer than 8 misses means more than 2 hits, so:

P(X ≥ 3) = 1 – P(X < 3)

= 1 – (0.3020 + 0.2684 + 0.1074)

= 0.3222
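The cumulative examples all use the complement 1 – P(X ≤ x). A sketch with hypothetical `binomial_pmf` and `binomial_cdf` helpers:

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ Bin(n, p)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


ex1 = 1 - binomial_cdf(2, 10, 0.5)          # 1) P(X >= 3) = 1 - P(X <= 2)
at_least_4 = 1 - binomial_cdf(3, 10, 0.2)   # 2a) P(X >= 4)
# 2b) fewer than 8 misses  <=>  more than 2 hits  <=>  X >= 3
fewer_than_8_misses = 1 - binomial_cdf(2, 10, 0.2)
print(round(ex1, 4), round(at_least_4, 4), round(fewer_than_8_misses, 4))
```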

Inverse Problems

In quality control, samples of 10 items are selected at random. Experience shows that approx. 60% of such samples contain no defectives.

Find, as closely as possible, the probability that any randomly chosen item is defective.

P(X = 0) = 0.6, n = 10

10C0 × p⁰ × q¹⁰⁻⁰ = 0.6

q¹⁰ = 0.6

q = 0.6^(1/10) = 0.9502

Therefore p = 1 – 0.9502

p = 0.0498

P(chosen item is defective) = 0.0498
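Solving q¹⁰ = 0.6 is a single tenth root. A minimal sketch:

```python
# P(X = 0) = q**10 = 0.6, so q is the 10th root of 0.6
q = 0.6 ** (1 / 10)
p = 1 - q
print(round(q, 4), round(p, 4))
```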

Mean and Variance for the Binomial Distribution

For a binomial random variable X with parameters n and p:

Note: q = 1 – p

1. The mean or expected value: E(X) = µ = np

2. The variance: VAR(X) = σ² = npq

3. The standard deviation: SD(X) = σ = √(npq)

e.g. A gardener plants 50 corn seeds, each of which has an individual probability of germinating of 0.75. Find the mean and standard deviation of the number that germinate.

Mean = 50 × 0.75

= 37.5

s.d. = √(50 × 0.75 × 0.25)

= 3.062 (3 d.p.)
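The mean and standard deviation formulas as a sketch; `binomial_mean_sd` is a hypothetical helper name.

```python
import math


def binomial_mean_sd(n, p):
    """Return (np, sqrt(npq)) for X ~ Bin(n, p)."""
    q = 1 - p
    return n * p, math.sqrt(n * p * q)


mean, sd = binomial_mean_sd(50, 0.75)   # corn seed example
print(mean, round(sd, 3))
```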

POISSON DISTRIBUTION

  • The Poisson distribution is often known as the distribution of rare events.
  • A Poisson process is one in which discrete events occur in a continuous but finite interval of time or space.

Criteria for the Poisson Distribution of X

1. For a small interval, the probability of the event occurring is proportional to the size of the interval.

2. Events must not occur simultaneously.

3. Each event must occur independently.

4. Events must occur at random.

e.g. The number of phone calls through the school office in one hour.

The number of typing errors per page.

The number of red corpuscles per ml of blood.

Formula:

Let X be the number of successes occurring in a given time period or a specified region. Then

P(X = x) = e^(–λ) × λˣ / x!

where x = 0, 1, 2, 3, … and λ is the mean number of events occurring per interval.

Note: Theoretically there is no upper limit on x, but as x increases the probabilities quickly become negligibly small.
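The Poisson formula as a minimal sketch; `poisson_pmf` is a hypothetical helper name. The sum over the first 30 values illustrates that, although x has no upper limit, the remaining terms are negligible.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e**(-lam) * lam**x / x!  for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# With lam = 2, the terms for x = 0..29 already sum to 1 up to floating-point rounding
total = sum(poisson_pmf(x, 2) for x in range(30))
print(total)
```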

Examples Using Table

1. The owner of a garage finds that, on average, 8 cars per hour arrive during weekdays. What is the probability that, during a randomly chosen 15-minute period:

a) no cars arrive?

λ = 2 (8 cars per hour ÷ four 15-minute periods per hour)

P(X = 0) = 0.1353

b) at least 3 cars arrive?

P(X ≥ 3) = 1 – P(X ≤ 2)

= 1 – 0.6767

= 0.3233
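The garage example from the formula rather than the table. A sketch with a hypothetical `poisson_pmf` helper:

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 8 / 4                                   # 8 cars per hour -> 2 per 15 minutes
p_none = poisson_pmf(0, lam)                  # a) P(X = 0)
p_at_least_3 = 1 - sum(poisson_pmf(x, lam) for x in range(3))   # b) 1 - P(X <= 2)
print(round(p_none, 4), round(p_at_least_3, 4))
```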

2. The average number of field mice per hectare in a wheat field is estimated to be 10. Find the probability that a given hectare contains more than 15 mice.

λ = 10

P(X > 15) = 0.0217 + 0.0128 + 0.0071 + 0.0037 + 0.0019 + 0.0009 + 0.0004 + 0.0002 + 0.0001 (table values for x = 16 to 24)

= 0.0488
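Summing the upper tail by hand stops at the last non-zero table entry. A sketch that instead uses the complement, so no infinite tail needs to be summed; `poisson_pmf` is a hypothetical helper name.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 10
# P(X > 15) = 1 - P(X <= 15)
p_more_than_15 = 1 - sum(poisson_pmf(x, lam) for x in range(16))
print(round(p_more_than_15, 4))
```

This gives about 0.0487; the small difference from 0.0488 comes from rounding each table entry to 4 d.p. before adding.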

Example Using Formula

e.g. Calculate the probability for Question 1a above using the formula,

i.e. p(0, 2), the probability of x = 0 when λ = 2

P(X = 0) = e^(–2) × 2⁰ / 0!

= 0.1353

Inverse Problem

i.e. given a probability, find λ

e.g.In samples of milk taken from a bulk transportation vehicle, 40% proved to contain no bacterial spores. Estimate the mean number of spores per sample, and hence find the probability of a randomly chosen sample containing 2 spores.

λ = unknown

P(X = 0) = 0.4

e^(–λ) × λ⁰ / 0! = 0.4

e^(–λ) = 0.4

–λ = ln 0.4 = –0.916

λ = 0.916

Therefore the mean number of spores per sample is 0.916.
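The question also asks for the probability of 2 spores, which is not worked above. A sketch that takes λ from the logarithm step and substitutes it into the Poisson formula:

```python
import math

# P(X = 0) = e**(-lam) = 0.4, so lam = -ln(0.4)
lam = -math.log(0.4)
# Second part of the question: P(X = 2) = e**(-lam) * lam**2 / 2!
p_two = math.exp(-lam) * lam**2 / math.factorial(2)
print(round(lam, 3), round(p_two, 4))
```

Because e^(–λ) is exactly 0.4 here, P(X = 2) simplifies to 0.4 × λ² / 2, about 0.168.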

Mean and Variance for the Poisson Distribution

The mean and variance of the Poisson distribution are both λ, i.e. E(X) = VAR(X) = λ, therefore SD(X) = √λ.

Approximating One Distribution With Another

Continuity Corrections

Note: A continuity correction is used when a discrete distribution (such as the binomial or Poisson) is approximated by a continuous one, or when data are rounded to the nearest whole number.

e.g. Heights of students are measured to the nearest cm. Heights are normally distributed with mean 162 cm and s.d. 4 cm.

Find the probability that a randomly selected student measures:

a) less than 160 cm

P(X < 159.5) = 0.26598552 (using calculator)

b) at least 165 cm

P(X > 164.5) = 0.26598552 (using calculator)
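A sketch of the continuity correction for rounded measurements: each recorded whole-cm value stands for the half-centimetre band around it, so the cut-offs move by 0.5.

```python
from statistics import NormalDist

heights = NormalDist(mu=162, sigma=4)

# Recorded "less than 160" means recorded 159 or less, i.e. a true height below 159.5
p_a = heights.cdf(159.5)
# Recorded "at least 165" includes 165 itself, i.e. a true height of 164.5 or more
p_b = 1 - heights.cdf(164.5)
print(round(p_a, 4), round(p_b, 4))
```

Both cut-offs lie 2.5 cm from the mean on opposite sides, which is why the two answers are equal.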

1. Normal Approximation to the Binomial Distribution

To approximate, n should be large and p should not be too close to 0 or 1.

The conditions that ensure this are:

np ≥ 5 AND nq ≥ 5 (or n(1 – p) ≥ 5)

To use the normal distribution, we need: µ = np and σ = √(npq)

e.g. Mary is absent from class 40% of the time. Her teacher checks the register on 25 randomly chosen dates. What is the probability that Mary was absent on fewer than 6 of these?

n = 25, p = 0.4, q = 0.6 (np = 10 ≥ 5 and nq = 15 ≥ 5)

µ = 25 × 0.4 = 10

σ = √(25 × 0.4 × 0.6) = 2.45

P(X < 6) = P(X < 5.5) (a continuity correction is needed because a discrete distribution is being approximated by a continuous one)

= 0.03312453 (using calculator)
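A sketch comparing the normal approximation with the exact binomial probability, to show how close the approximation is in this case. The exact sum is added only for comparison; the notes use the approximation.

```python
import math
from statistics import NormalDist

n, p = 25, 0.4
q = 1 - p
assert n * p >= 5 and n * q >= 5   # conditions for the normal approximation

# P(X < 6) -> P(Y < 5.5) with Y ~ Normal(np, sqrt(npq))
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q)).cdf(5.5)
# Exact binomial P(X <= 5), for comparison
exact = sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(6))
print(round(approx, 4), round(exact, 4))
```

The approximation (about 0.033) is close to, but slightly above, the exact value (about 0.029); the gap narrows as n grows.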

2. Normal Approximation to the Poisson

Used to approximate when the mean of the Poisson distribution is large (say greater than 15) and several individual items need to be added together.

This situation will not arise often as the Poisson is a ‘rare events’ distribution.

To use the normal distribution we need: µ = λ and σ = √λ

Remember to apply a continuity correction.

e.g. A cobalt solution, when pumped into a patient, emits 20 millirads per second on average. Find the probability that a random dose of the solution emits more than 17 millirads per second.

µ = λ = 20, σ = √λ = √20

P(X > 17) = P(X > 17.5) (using continuity correction)

= 0.71192493 (using calculator)
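A sketch comparing the normal approximation with the exact Poisson tail; the exact value is computed only for comparison.

```python
import math
from statistics import NormalDist

lam = 20
# P(X > 17) -> P(Y > 17.5) with Y ~ Normal(lam, sqrt(lam))
approx = 1 - NormalDist(mu=lam, sigma=math.sqrt(lam)).cdf(17.5)
# Exact Poisson: P(X > 17) = 1 - P(X <= 17)
exact = 1 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(18))
print(round(approx, 4), round(exact, 4))
```

The approximation (about 0.712) is within about 0.01 of the exact Poisson value.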

3. Poisson Approximation to the Binomial

Used to approximate when p is very small (and n is large).

To use the Poisson distribution we need: λ = np

Note: If p is close to 1, the problem can be rewritten in terms of failures, so that q = 1 – p is small.

e.g. 2% of the items in a mass production process are defective. What is the probability that a randomly chosen sample of 200 items contains 3 defectives?

λ = n × p

= 200 × 0.02

= 4

P(X = 3) = 0.19536681 (using calculator)
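A sketch comparing the Poisson approximation with the exact binomial probability it replaces:

```python
import math

n, p, x = 200, 0.02, 3
lam = n * p                                               # λ = np = 4
poisson = math.exp(-lam) * lam**x / math.factorial(x)     # Poisson approximation
binomial = math.comb(n, x) * p**x * (1 - p) ** (n - x)    # exact, for comparison
print(round(poisson, 4), round(binomial, 4))
```

With p this small the two differ only in the third decimal place (about 0.195 versus 0.196).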

SUMMARY OF DISTRIBUTIONS

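Binomial → Poisson: approx. when p is small (close to 0) or q = 1 – p is small; use λ = np

Binomial → Normal: approx. when n is large, i.e. np ≥ 5 and nq ≥ 5 (or n(1 – p) ≥ 5); use µ = np, σ = √(npq)

Poisson → Normal: approx. when λ > 15; use µ = λ, σ = √λ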

Note: A continuity correction is required when moving from a discrete distribution to a continuous distribution