7 - THE NORMAL DISTRIBUTION

Examples: Alpha fetoprotein levels of mothers carrying a fetus with spina bifida.

The smooth, bell-shaped, symmetric curve is called the Normal p.d.f. curve, or just the Normal curve.

If a random variable, X, has a Normal distribution with mean μ and standard deviation σ, we write: X ~ N(μ, σ)

The Normal distribution is important because:

·  it fits a lot of data reasonably well;

·  it can be used to approximate other distributions;

·  it is an important assumption in statistical inference (see later work).

The shape is determined solely by two parameters, μ and σ: the population mean μ controls where the Normal curve is centered, and the population standard deviation σ controls the spread about μ.

Example: Alpha fetoprotein levels found in the urine of mothers carrying a foetus with spina bifida.

Let X = alpha fetoprotein level in the urine of a mother carrying a foetus with spina bifida. We will assume that alpha fetoprotein levels have a normal distribution.

The sample mean AFP level, moles/liter, and the sample standard deviation, s = 3.92 moles/liter. These are the sample-based estimates of ______ and ______ respectively.

Approximately ______% of the mothers in this population will have AFP levels within 1 standard deviation of the mean, i.e. we estimate that approximately ______% of this population of mothers will have AFP levels:

between ______ and ______

= between ______ and ______

Diagram here:


Approximately ______% of the mothers in this population will have AFP levels within 2 standard deviations of the mean, i.e. we estimate that approximately ______% of this population of mothers will have AFP levels:

between ______ and ______

= between ______ and ______

Approximately ______% of the mothers in this population will have AFP levels within 3 standard deviations of the mean, i.e. we estimate that approximately ______% of this population of mothers will have AFP levels:

between ______ and ______

= between ______ and ______

For the Normal Distribution:

A random observation has approximately: 68% chance of falling within 1σ of μ;

95% chance of falling within 2σ of μ;

99.7% chance of falling within 3σ of μ.

or

In a normal distribution, approximately: 68% of observations are within 1σ of μ;

95% of observations are within 2σ of μ;

99.7% of observations are within 3σ of μ.
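
The 68%, 95% and 99.7% figures can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_within` is an assumption made for illustration and is not part of these notes.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The result depends only on k, not on the particular mu and sigma.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s) of the mean: {prob_within(k):.4f}")
```

The printed values are approximately 0.6827, 0.9545 and 0.9973, which round to the percentages quoted above.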

OBTAINING PROBABILITIES

To find probabilities associated with a normal distribution with mean μ and standard deviation σ, we need a mechanism for finding areas beneath the normal curve. Because there are infinitely many means and standard deviations we might be interested in, we need a standard process by which we can find areas associated with any normal distribution.

The Standard Normal Distribution and Using the Standard Normal Table

Fact: If X ~ N(μ, σ) and we define a new random variable Z = (X − μ)/σ, then Z ~ N(0,1), i.e. we create a new random variable Z whose observed values are the z-scores for the random variable X.

Recall the process of converting a random variable X to z-scores is called standardization. Once standardized, we can find probabilities/areas of interest using a standard normal table.
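
As a small illustration of why standardization works, the sketch below checks that a probability computed on the original scale equals the probability computed from the z-score on the standard normal scale. The values of `mu`, `sigma` and `x` are hypothetical and chosen only for illustration; they are not taken from the AFP data.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: the number of standard deviations x lies from mu."""
    return (x - mu) / sigma

# Hypothetical values, for illustration only.
mu, sigma = 100.0, 15.0
x = 120.0

z = z_score(x, mu, sigma)
p_original = NormalDist(mu, sigma).cdf(x)  # P(X <= x) on the original scale
p_standard = NormalDist(0, 1).cdf(z)       # P(Z <= z) on the standard scale

print(f"z = {z:.4f}")
print(f"P(X <= {x}) = {p_original:.4f}")
print(f"P(Z <= z)   = {p_standard:.4f}")
```

The two probabilities agree, which is why a single standard normal table is enough for every choice of mean and standard deviation.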

The standard normal table in the appendix of most texts gives P(Z ≤ z), i.e. lower-tail probabilities for a standard normal distribution (shaded). We can also use the Normal Probability Calculator in JMP, found in the Tutorials section of the website.

Most tables give shaded area = P(Z ≤ z)

Basic method for obtaining probabilities

1.  Sketch a Normal curve, marking on the mean and values of interest.

2.  Shade the area under the curve corresponding to the required probability.

3.  Convert all values in original scale to their corresponding z-scores.

4.  Obtain the desired probability from the lower-tail areas, P(Z ≤ z), provided by a standard normal table.
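
When only lower-tail areas P(Z ≤ z) are available, as with the table described earlier, upper-tail and interval probabilities follow by complement and by subtraction. The sketch below shows this with Python's `statistics.NormalDist` standing in for the table; the function names and the z-values used are assumptions chosen for illustration.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard normal distribution

def lower_tail(z):
    """P(Z <= z): the area a standard normal table reports."""
    return Z.cdf(z)

def upper_tail(z):
    """P(Z > z), obtained as 1 minus the lower-tail area."""
    return 1.0 - lower_tail(z)

def between(z_lo, z_hi):
    """P(z_lo < Z < z_hi), obtained as a difference of lower-tail areas."""
    return lower_tail(z_hi) - lower_tail(z_lo)

# Illustrative z-values only.
print(round(lower_tail(1.0), 4))     # 0.8413
print(round(upper_tail(1.0), 4))     # 0.1587
print(round(between(-0.5, 1.0), 4))  # 0.5328
```

Sketching and shading the curve first (steps 1 and 2) makes it clear which of these three cases applies.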

Z = standard normal random variable
Z ~ Normal(0,1).

Find the following standard normal probabilities:

a) P(Z > 2.25)

b) P(Z < 1.28)

c) P(Z > .50)

d) P(Z < -2.33)

e) P(-1.96 < Z < 1.96)

h) Find z so that P(Z < z) = .90, i.e. what is the 90th percentile of the standard normal distribution?

Spina Bifida Example (continued)

X = AFP level of a randomly selected mother carrying a foetus with spina bifida. Let's assume that

X ~ Normal(μ = 23.05, σ = 4.08), using the sample mean and sample standard deviation.

Find the following:

a) P(X < 15.00)

b) P(X < 20.00)

c) P(20.00 < X < 25.00)

d) P(X > 30.00)

e) Find the 90th percentile.

f) Find the 25th percentile.
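
Percentile questions run the basic method in reverse: find the z-score with the required lower-tail area, then convert back to the original scale with x = μ + zσ. The sketch below illustrates this with hypothetical values of `mu` and `sigma` (not the AFP values), using `NormalDist.inv_cdf` in place of a reverse table lookup.

```python
from statistics import NormalDist

def percentile(p, mu, sigma):
    """Value x with P(X <= x) = p for X ~ Normal(mu, sigma), via x = mu + z*sigma."""
    z = NormalDist(0, 1).inv_cdf(p)  # z-score whose lower-tail area is p
    return mu + z * sigma

# Hypothetical parameters, for illustration only.
mu, sigma = 100.0, 15.0
x75 = percentile(0.75, mu, sigma)
print(f"75th percentile: {x75:.2f}")

# Check: the lower-tail area at x75 recovers the requested probability.
print(f"P(X <= x75) = {NormalDist(mu, sigma).cdf(x75):.4f}")
```

Percentiles below the 50th give a negative z, so the result lies below the mean.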

Original Problem: Spina Bifida Diagnosis

Recall: For normal foetuses μ = 15.73, σ = 0.72, and for foetuses with spina bifida μ = 23.05 and σ = 4.08. Assume the threshold for detecting spina bifida is set at 17.8. (A foetus would be diagnosed as not having spina bifida if the fetoprotein level is below 17.8.)

a)  What is the probability that a foetus suffering from spina bifida is correctly diagnosed? Incorrectly diagnosed?

b)  What is the probability that a foetus without spina bifida is correctly diagnosed? Incorrectly diagnosed?

c) If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level should they set the threshold T?
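
The structure of a threshold problem like this one can be sketched in code. The example below uses hypothetical distributions and a hypothetical threshold rather than the AFP figures, and assumes the same rule as above: a case is classed as unaffected when its level is below the threshold. All names are illustrative.

```python
from statistics import NormalDist

# Hypothetical groups, for illustration only.
unaffected = NormalDist(mu=50.0, sigma=5.0)
affected = NormalDist(mu=70.0, sigma=10.0)

def correct_rates(threshold):
    """Return (P(affected at or above threshold), P(unaffected below threshold))."""
    p_affected_correct = 1.0 - affected.cdf(threshold)
    p_unaffected_correct = unaffected.cdf(threshold)
    return p_affected_correct, p_unaffected_correct

def threshold_for_detection(target):
    """Threshold T such that P(affected level >= T) equals target."""
    return affected.inv_cdf(1.0 - target)

p_aff, p_unaff = correct_rates(60.0)
print(f"affected correctly diagnosed:   {p_aff:.4f}")
print(f"unaffected correctly diagnosed: {p_unaff:.4f}")
print(f"threshold for 99% detection:    {threshold_for_detection(0.99):.2f}")
```

The probability of an incorrect diagnosis in each group is 1 minus the corresponding correct rate. Lowering the threshold raises the detection rate for the affected group at the cost of more incorrect diagnoses in the unaffected group.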

Standard Normal Table – P(Z ≤ z)

Table for negative z-scores, i.e. z < 0

Standard Normal Table – P(Z ≤ z)

Table for positive z-scores, i.e. z ≥ 0
