Rectangularity: Sampling Distribution of the Sample Mean
Purpose:
This activity is intended to illustrate properties of the sampling distribution of a sample mean.
The Population of Rectangles Sheet shows a population of size 100 consisting of rectangles of varying areas. Each square counts as one unit towards a rectangle’s area. The true average (mean) area of the rectangles in the population is μ = 6.26. The true standard deviation of the areas of the rectangles in the population is denoted by σ. If we did not know μ and wished to estimate it, we could draw a simple random sample of n rectangles from the population and use the mean area of the sampled rectangles to estimate μ. The sample mean, x̄, will vary from sample to sample. The distribution of the x̄ values for many simple random samples of size n is called the sampling distribution of the statistic x̄.
Instructions:
0. Enter the population of 100 rectangle areas into R by the command:
> areas=c(3,10,1,6,1,14,1,1,8,3,1,22,15,6,3,2,5,1,12,10,1,3,8,2,3,
21,8,1,6,8,2,18,1,6,6,14,2,4,9,1,1,5,12,1,2,1,3,4,1,2,3,
18,12,7,24,1,6,3,1,4,8,8,6,3,1,2,7,1,10,1,24,1,4,20,8,10,
12,10,2,10,5,1,2,2,9,3,2,6,7,12,1,1,3,9,5,11,4,10,7,18)
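As a quick check on the data entry, the following sketch confirms the population size and mean stated above and computes the population standard deviation. Note that R's built-in sd() divides by N - 1, so the population version below divides by N instead. The names N, mu, and pop.sd are introduced here for illustration only.

```r
areas <- c(3,10,1,6,1,14,1,1,8,3,1,22,15,6,3,2,5,1,12,10,1,3,8,2,3,
           21,8,1,6,8,2,18,1,6,6,14,2,4,9,1,1,5,12,1,2,1,3,4,1,2,3,
           18,12,7,24,1,6,3,1,4,8,8,6,3,1,2,7,1,10,1,24,1,4,20,8,10,
           12,10,2,10,5,1,2,2,9,3,2,6,7,12,1,1,3,9,5,11,4,10,7,18)

N  <- length(areas)   # number of rectangles; should be 100
mu <- mean(areas)     # population mean; should be 6.26

# Population standard deviation: average squared deviation uses N, not N - 1
pop.sd <- sqrt(mean((areas - mu)^2))

print(N)
print(mu)
print(pop.sd)
```

If N is not 100 or mu is not 6.26, a value was probably mistyped when entering the areas.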
1. Select two different simple random samples of size 5 from the population areas (sample with replacement -- so that it is possible to select the same rectangle more than once) using R as follows:
> n=5
> set.seed(123) # Do not use 123; pick your own seed!
> x=sample(areas,n,replace=TRUE)
> x
[1] 6 2 1 7 5
> mean(x)
[1] 4.2
> x=sample(areas,n,replace=TRUE)
> x
[1] 1 12 12 1 1
> mean(x)
[1] 5.4
For each sample, list the areas, and then calculate x̄, the average of the 5 areas. Complete the tables below. After you have completed the tables, write your two values for x̄ on the data collection sheet given at the end under Sample Size n = 5. Complete the n = 5 column on the data collection sheet.
Random Sample 1 / Random Sample 2
Area / Area
6 / 1
2 / 12
1 / 12
7 / 1
5 / 1
x̄ = 4.2 / x̄ = 5.4
Write the averages 4.2 and 5.4 (your own values will differ) into the following “Data Collection Sheet”, then repeat the sampling until you have a total of 30 sample means:
Sample Means
Sample Number / n = 5
1 / 4.2
2 / 5.4
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
2. Select two different simple random samples of size 15 from the population (sample with replacement). For each sample, list the areas, and then calculate the value of x̄. Complete the tables below. After you have completed the tables, write your two values for x̄ on the data collection sheet given at the end under Sample Size n = 15. Complete the n = 15 column on the data collection sheet.
Random Sample 1 / Random Sample 2
Area / Area
x̄ = c / x̄ = d
3. Select two different simple random samples of size 25 from the population (sample with replacement). For each sample, list the areas, and then calculate the value of x̄. Complete the tables below. After you have completed the tables, write your two values for x̄ in the column under Sample Size n = 25. Complete the n = 25 column on the data collection sheet.
Random Sample 1 / Random Sample 2
Area / Area
x̄ = e / x̄ = f
Write the averages c, d, e, f (the numerical values from your own work) into the following “Data Collection Sheet”, then repeat the sampling until you have a total of 30 sample means for each sample size. (Copy the n = 5 column from the sheet you completed in step 1.)
Data Table. Sample Means
Sample Number / n = 5 / n = 15 / n = 25
1 / 4.2 / c / e
2 / 5.4 / d / f
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
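If you would rather let R fill in the data collection sheet than repeat steps 1 to 3 by hand, a minimal sketch is shown below. It assumes the areas vector from step 0. The helper name sample_means, the data frame name sheet, and the seed value are made up for this illustration, so substitute your own seed.

```r
areas <- c(3,10,1,6,1,14,1,1,8,3,1,22,15,6,3,2,5,1,12,10,1,3,8,2,3,
           21,8,1,6,8,2,18,1,6,6,14,2,4,9,1,1,5,12,1,2,1,3,4,1,2,3,
           18,12,7,24,1,6,3,1,4,8,8,6,3,1,2,7,1,10,1,24,1,4,20,8,10,
           12,10,2,10,5,1,2,2,9,3,2,6,7,12,1,1,3,9,5,11,4,10,7,18)

set.seed(2024)  # arbitrary seed for illustration; pick your own

# Draw `reps` samples of size n (with replacement) and return their means
sample_means <- function(n, reps = 30) {
  replicate(reps, mean(sample(areas, n, replace = TRUE)))
}

# One row per sample number, one column per sample size
sheet <- data.frame(
  Sample.Number = 1:30,
  n5  = sample_means(5),
  n15 = sample_means(15),
  n25 = sample_means(25)
)
print(sheet)
```

Each column is drawn independently, so the rows do not pair up in any meaningful way. Only the columns are compared.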
4. Instead of repeating the sampling just 30 times, we would like to draw 10,000 SRSs of size n = 5 automatically, using the following R code:
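set.seed(123) # Use the same seed as in step 1 to reproduce your first 30 values for n=5.
repeat.time=10000
results.5 = numeric(repeat.time) # preallocate storage for the sample means
for(i in 1:repeat.time) {
  X=sample(areas,5,replace=TRUE) # one sample of size 5, drawn with replacement
  results.5[i] = mean(X) # store the mean of this sample
}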
The results are stored in the vector named results.5.
Modify the code so that you can draw 10,000 SRSs of size n = 15, and also 10,000 SRSs of size n = 25.
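One possible way to make this modification, offered as a sketch and not as the required answer, is to wrap the loop in a function whose argument is the sample size. The function name simulate_means is hypothetical; results.15 and results.25 simply follow the naming pattern of results.5.

```r
areas <- c(3,10,1,6,1,14,1,1,8,3,1,22,15,6,3,2,5,1,12,10,1,3,8,2,3,
           21,8,1,6,8,2,18,1,6,6,14,2,4,9,1,1,5,12,1,2,1,3,4,1,2,3,
           18,12,7,24,1,6,3,1,4,8,8,6,3,1,2,7,1,10,1,24,1,4,20,8,10,
           12,10,2,10,5,1,2,2,9,3,2,6,7,12,1,1,3,9,5,11,4,10,7,18)

# Repeat the sampling `repeat.time` times for a given sample size n
simulate_means <- function(n, repeat.time = 10000) {
  results <- numeric(repeat.time)            # preallocate storage
  for (i in 1:repeat.time) {
    X <- sample(areas, n, replace = TRUE)    # one sample of size n
    results[i] <- mean(X)                    # store its mean
  }
  results
}

set.seed(123)  # seed taken from the example; use your own
results.5  <- simulate_means(5)
results.15 <- simulate_means(15)
results.25 <- simulate_means(25)
```

Copying the original loop twice and changing 5 to 15 and 25 works equally well. The function only avoids the duplication.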
Questions:
Answer the following questions using the 10,000 simulated sample means from step 4 for each sample size.
1) For the 10,000 values with sample size n = 5, compute the
a. mean of the 10,000 values.
b. standard deviation of the 10,000 values.
2) Find the mean and standard deviation of the 10,000 values with sample size n = 15, and then for n = 25.
3) For each sample size n = 5, 15, and 25 construct a histogram of the 10,000 sample mean values and describe the shape of the distribution.
4) Compare the shape of the 3 distributions of the x̄ values to the shape of the distribution of the population. Which looks more normal?
5) Based on your 3 histograms, what do you think is the relationship between the sample size n and the shape of the distribution of the sample mean?
6) For which sample size is the standard deviation the largest and for which sample size is the standard deviation the smallest? Why do you suppose this happens?
7) How does the standard deviation of the x̄ values compare to the standard deviation of the population as n increases? What does this tell you about the spread of the x̄ values compared to the spread of the population values as n increases?
8) Find an expression for the population mean of the sample means, μ_x̄, as a function of the population mean of the 100 rectangle areas, μ.
9) Try to develop a formula to relate the standard deviation of the sample means, σ_x̄, to the population standard deviation, σ, and the sample size, n. (Hint: the formula involves √n.)
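The numerical summaries and histograms asked for in questions 1 to 3 can be obtained with mean(), sd(), and hist(). The sketch below assumes only the areas vector from step 0 and regenerates the 10,000 sample means for each n with replicate(), so it runs on its own. The object names sizes, sims, and summary_df are illustrative. It only produces the numbers and plots; interpreting them is left to you.

```r
areas <- c(3,10,1,6,1,14,1,1,8,3,1,22,15,6,3,2,5,1,12,10,1,3,8,2,3,
           21,8,1,6,8,2,18,1,6,6,14,2,4,9,1,1,5,12,1,2,1,3,4,1,2,3,
           18,12,7,24,1,6,3,1,4,8,8,6,3,1,2,7,1,10,1,24,1,4,20,8,10,
           12,10,2,10,5,1,2,2,9,3,2,6,7,12,1,1,3,9,5,11,4,10,7,18)

set.seed(123)  # seed taken from the example; use your own
sizes <- c(5, 15, 25)

# For each sample size, simulate 10,000 sample means
sims <- lapply(sizes, function(n) {
  replicate(10000, mean(sample(areas, n, replace = TRUE)))
})
names(sims) <- paste0("n", sizes)

# Mean and standard deviation of the simulated sample means, by n
summary_df <- data.frame(
  n             = sizes,
  mean.of.means = sapply(sims, mean),
  sd.of.means   = sapply(sims, sd)
)
print(summary_df)

# One histogram per sample size, side by side
par(mfrow = c(1, 3))
for (k in seq_along(sizes)) {
  hist(sims[[k]], main = paste("n =", sizes[k]), xlab = "Sample mean area")
}
par(mfrow = c(1, 1))
```

Drawing the three histograms in one row makes it easier to compare their shapes and spreads across sample sizes.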
Population of 100 Rectangles:
(The population of rectangles sheet is adapted from Scheaffer et al. 1996.)
Histogram and Frequency Table of the Areas of the Rectangles in the Population:
Histogram of the Areas of the Rectangles in the Population:
Frequency Table of the Areas of the Rectangles in the Population:
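The population histogram and frequency table can be reproduced from the areas vector entered in step 0. This is a sketch: the bin breaks, title, and axis labels are choices made here and may not match the original figure.

```r
areas <- c(3,10,1,6,1,14,1,1,8,3,1,22,15,6,3,2,5,1,12,10,1,3,8,2,3,
           21,8,1,6,8,2,18,1,6,6,14,2,4,9,1,1,5,12,1,2,1,3,4,1,2,3,
           18,12,7,24,1,6,3,1,4,8,8,6,3,1,2,7,1,10,1,24,1,4,20,8,10,
           12,10,2,10,5,1,2,2,9,3,2,6,7,12,1,1,3,9,5,11,4,10,7,18)

# Frequency table: how many rectangles have each area
freq <- table(areas)
print(freq)

# Histogram with one bin per integer area (areas range from 1 to 24)
hist(areas,
     breaks = seq(0.5, 24.5, by = 1),
     main   = "Areas of the 100 rectangles",
     xlab   = "Area (squares)",
     ylab   = "Frequency")
```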