EXPLORING NORMAL DISTRIBUTIONS
Running head: EXPLORING NORMAL DISTRIBUTIONS 1
Exploring Normal Distributions:
Investigating the behavior of sample means and sample standard deviations obtained by repeated sampling from a normal population
Everlast Chigoba
Ridge View High School
Introduction
The purpose of this special problem is to investigate the behavior of the means and standard deviations of repeated samples of size 25 taken from a normal distribution. The statistics software package Minitab can generate values of variables from various distributions, including normal distributions with a specified mean and standard deviation. For this investigation I obtained 25 observations from an N(100, 15) distribution and calculated the mean and standard deviation of these observations. I repeated this procedure twenty times and then studied the results.
Analysis
Procedure: In this investigation, I simulated the random selection of a sample of 25 observations from a normal population with an N(100, 15) distribution. Then I calculated various one-variable statistics on the sample, including the sample mean and sample standard deviation. The following are some descriptive statistics from the first 2 samples:
Descriptive Statistics: 1, 2

                                       Sum of
Variable    Mean   StDev  Variance    Squares  Minimum     Q1  Median      Q3
1         102.77   14.20    201.70  268867.03    75.82  92.84  102.90  112.55
2          98.90   11.85    140.52  247893.56    78.74  91.06   97.40  105.87

Variable  Maximum    IQR
1          131.63  19.71
2          130.18  14.81
I repeated the procedure until I obtained 20 samples.
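The sampling procedure described above can be sketched outside Minitab. The following is a minimal Python illustration, not the Minitab session used for this report: the function name `simulate` and the seed value are assumptions made for the example, and because the draws are random the output will not reproduce the specific values reported here.

```python
import random
import statistics

POP_MEAN = 100.0   # mean of the parent population, N(100, 15)
POP_SD = 15.0      # standard deviation of the parent population
SAMPLE_SIZE = 25   # observations per sample
NUM_SAMPLES = 20   # number of repeated samples


def simulate(seed=1):
    """Return (means, stdevs) for NUM_SAMPLES simulated samples.

    The seed is arbitrary (an assumption of this sketch) and is used
    only so that repeated runs give the same numbers.
    """
    rng = random.Random(seed)
    means, stdevs = [], []
    for _ in range(NUM_SAMPLES):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
        means.append(statistics.mean(sample))
        stdevs.append(statistics.stdev(sample))  # sample SD (n - 1 divisor)
    return means, stdevs


means, stdevs = simulate()
for i, (m, s) in enumerate(zip(means, stdevs), start=1):
    print(f"{i:2d}  mean = {m:7.2f}  stdev = {s:6.2f}")
```

Changing the seed gives a different set of 20 samples, which is a quick way to see how much the sample means and standard deviations vary from run to run.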
Next I entered the means and standard deviations from the 20 samples into a Minitab worksheet and obtained the following data:
Descriptive Statistics: means

Variable    Mean  StDev  Variance  Minimum      Q1  Median       Q3  Maximum
means     99.448  3.211    10.313   94.470  97.208  99.265  102.460  105.300

Variable   Range    IQR
means     10.830  5.252
Descriptive Statistics: 1, 2, 3, 4, 5, 6, 7, 8, ...
Variable Mean
1 102.77
2 98.90
3 100.53
4 97.55
5 99.71
6 97.18
7 102.13
8 102.04
9 97.17
10 94.49
11 103.45
12 97.29
13 98.19
14 105.30
15 94.48
16 102.57
17 98.03
18 94.47
19 103.08
20 99.63
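The summary statistics reported for the 20 sample means can be checked directly from the values listed above. This is a small Python sketch rather than the Minitab calculation itself; the variable names are chosen for illustration. For this data set, the default "exclusive" method of `statistics.quantiles` gives quartiles that agree with the Q1, median, and Q3 values in the Minitab output.

```python
import statistics

# The 20 sample means listed above.
sample_means = [
    102.77, 98.90, 100.53, 97.55, 99.71, 97.18, 102.13, 102.04, 97.17, 94.49,
    103.45, 97.29, 98.19, 105.30, 94.48, 102.57, 98.03, 94.47, 103.08, 99.63,
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)              # n - 1 divisor
q1, median, q3 = statistics.quantiles(sample_means, n=4)  # quartile cut points
value_range = max(sample_means) - min(sample_means)

print(f"mean   = {mean_of_means:.3f}")
print(f"stdev  = {sd_of_means:.3f}")
print(f"Q1     = {q1:.3f}")
print(f"median = {median:.3f}")
print(f"Q3     = {q3:.3f}")
print(f"range  = {value_range:.3f}")
print(f"IQR    = {q3 - q1:.3f}")
```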
Stem-and-Leaf Display: Mean
Stem-and-leaf of Mean N = 20
Leaf Unit = 0.10
3 94 444
3 95
3 96
7 97 1125
10 98 019
10 99 67
8 100 5
7 101
7 102 0157
3 103 04
1 104
1 105 3
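For readers without Minitab, the stem-and-leaf display above can be rebuilt with a short Python sketch. The function name `stem_and_leaf` is an assumption of this example, and the sketch omits Minitab's leading depth column. With a leaf unit of 0.1, each mean is truncated to one decimal place; the integer part is the stem and the tenths digit is the leaf.

```python
from collections import defaultdict

# The 20 sample means listed earlier in the report.
sample_means = [
    102.77, 98.90, 100.53, 97.55, 99.71, 97.18, 102.13, 102.04, 97.17, 94.49,
    103.45, 97.29, 98.19, 105.30, 94.48, 102.57, 98.03, 94.47, 103.08, 99.63,
]


def stem_and_leaf(values):
    """Map each stem (integer part) to a string of leaves (tenths digits)."""
    rows = defaultdict(str)
    for x in sorted(values):
        # Convert to whole hundredths first to avoid floating-point surprises,
        # then drop the hundredths digit (truncate, do not round).
        tenths = round(x * 100) // 10
        stem, leaf = divmod(tenths, 10)
        rows[stem] += str(leaf)
    return rows


rows = stem_and_leaf(sample_means)
for stem in range(min(rows), max(rows) + 1):
    print(f"{stem:4d} | {rows.get(stem, '')}")
```

Empty stems such as 95 and 96 are printed on purpose, because the gaps are part of the shape of the distribution.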
The histogram of the 20 sample means shows that the distribution is roughly symmetric, with center at 99.45 and a standard deviation of 3.21. Compare this with the parent population, N(100, 15): as expected, the sample means vary much less around the center than individual observations do. The normal probability plot is fairly linear, which suggests that the sample means are approximately normally distributed.
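One way to put a number on "fairly linear" is to compute the correlation between the ordered sample means and the normal scores they would be plotted against. The sketch below is an illustration under stated assumptions, not the method Minitab uses: the plotting positions (i - 0.375) / (n + 0.25) are one common convention chosen for this example, and the helper names `normal_scores` and `pearson_r` are hypothetical. A correlation close to 1 corresponds to a nearly straight normal probability plot.

```python
import statistics

# The 20 sample means listed earlier in the report.
sample_means = [
    102.77, 98.90, 100.53, 97.55, 99.71, 97.18, 102.13, 102.04, 97.17, 94.49,
    103.45, 97.29, 98.19, 105.30, 94.48, 102.57, 98.03, 94.47, 103.08, 99.63,
]


def normal_scores(n):
    """Standard-normal quantiles for ranks 1..n (assumed plotting positions)."""
    std_normal = statistics.NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


ordered = sorted(sample_means)
scores = normal_scores(len(ordered))
r = pearson_r(scores, ordered)
print(f"correlation between normal scores and ordered means: {r:.3f}")
```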
Similar data for the standard deviations were as follows:
Variable StDev
C1 14.20
C2 11.85
C3 16.44
C4 14.81
C5 12.47
C6 16.36
C7 16.38
C8 18.03
C9 19.06
C10 13.69
C11 14.68
C12 14.50
C13 18.32
C14 13.36
C15 14.21
C16 15.15
C17 12.07
C18 13.97
C19 13.51
C20 13.64
MTB > Stem-and-Leaf 'StDev'.
Stem-and-Leaf Display: StDev
Stem-and-leaf of StDev N = 20, Leaf Unit = 0.10
1 11 8
3 12 04
8 13 35669
(5) 14 22568
7 15 1
6 16 334
3 17
3 18 03
1 19 0
The histogram of the sample standard deviations shows results similar to those for the sample means, supporting the notion that the distribution is roughly symmetric, with center at 14.84 and a small standard deviation of 2.027.
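The center and spread quoted for the sample standard deviations can be recomputed from the 20 values listed above. This is a minimal Python sketch with illustrative variable names. It also prints the median, which is not reported in the text, so that the reader can compare it with the mean as an informal check on symmetry.

```python
import statistics

# The 20 sample standard deviations (C1 to C20) listed above.
sample_sds = [
    14.20, 11.85, 16.44, 14.81, 12.47, 16.36, 16.38, 18.03, 19.06, 13.69,
    14.68, 14.50, 18.32, 13.36, 14.21, 15.15, 12.07, 13.97, 13.51, 13.64,
]

mean_sd = statistics.mean(sample_sds)
sd_of_sds = statistics.stdev(sample_sds)    # n - 1 divisor
median_sd = statistics.median(sample_sds)   # extra check, not in the report

print(f"mean of sample SDs   = {mean_sd:.3f}")
print(f"stdev of sample SDs  = {sd_of_sds:.3f}")
print(f"median of sample SDs = {median_sd:.3f}")
```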
Conclusion and Summary
Because the mean of the sample means (99.45) is very close to the median (99.265), we judge that the distribution of the sample means is quite symmetric. The normal probability plot tells us that the distribution is approximately normal. Moreover, we note that the mean of the sample means is fairly close to the population mean of 100. The distribution of sample means has a fairly small spread (a standard deviation of 3.211, compared with 15 for the population).
We also judge that the distribution of the sample standard deviations is fairly symmetric, because its normal probability plot is fairly linear. In addition, the mean of the sample standard deviations, 14.84, is very close to the population standard deviation, 15.
We conclude that if you sample repeatedly from a normal population and then inspect the center and spread of the derived distribution of sample means, its center will approximate the center of the parent population, but its spread will be smaller.
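How much smaller the spread should be can be stated with the standard result for the standard deviation of a sample mean (the standard error), which this report does not derive. For samples of size n = 25 from a population with standard deviation 15:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3
```

The observed standard deviation of the 20 sample means, 3.211, is close to this theoretical value.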
All of this is quite remarkable given the randomness in the selection of the observations for each sample. In other words, despite the randomness of the selection process, there is a surprising amount of order and symmetry in the results.