Student Notes - Prep Session Topic: Sampling Distributions

Content

The AP Statistics topic outline contains the following list of items related to sampling distributions. (Items (4), (5), and (8) will not be covered in this session.)

1.  Sampling distribution of a sample proportion

2.  Sampling distribution of a sample mean

3.  Central Limit Theorem

4.  Sampling distribution of a difference between two independent sample proportions

5. Sampling distribution of a difference between two independent sample means

6.  Simulation of sampling distributions

7.  t-distribution

8.  Chi-square distribution

Sampling distributions are an extension of probability, so many free-response questions on sampling distributions also include parts that draw on material reviewed in the earlier prep session on probability.

Be sure you understand ---

1.  The difference between a parameter and a statistic

2.  What we mean by the sampling distribution of a statistic (that is, the distribution of the values of that statistic obtained from all possible samples of a given size from a given population)

3.  What we mean by an unbiased statistic

4.  The formulas for $\sigma_{\bar{x}}$ and $\sigma_{\hat{p}}$ should be used only when the population is at least 10 times as large as the sample

5.  The sampling distribution of $\hat{p}$ is approximately normal when the sample size is large (your textbook will have a definition of large, for example $np \ge 10$ and $n(1-p) \ge 10$)

6.  The sampling distribution of $\bar{x}$ is normally distributed, regardless of sample size, if the underlying population is normally distributed

7.  The sampling distribution of $\bar{x}$ is approximately normally distributed, regardless of the shape of the underlying population, when the sample size is large (according to the Central Limit Theorem). In this case $n \ge 30$ is usually sufficiently large.

8.  The CLT is a statement about shape. It says that the sampling distribution of sample means becomes more normally distributed as the sample size increases.
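Points 7 and 8 can be checked with a short simulation. The sketch below is only an illustration: the exponential population (mean 1, standard deviation 1), the sample sizes, the number of repetitions, and the helper names `sample_means` and `skewness` are all assumptions chosen for the example, not part of the AP outline.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers


def sample_means(n, reps=5000):
    """Simulate `reps` sample means, each from n draws of an exponential
    population with mean 1 and standard deviation 1 (strongly right-skewed)."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]


def skewness(values):
    """Average cubed z-score: near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


results = {}
for n in (2, 10, 40):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    print(f"n={n:>2}  mean of x-bar={results[n][0]:.3f}  "
          f"sd of x-bar={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={1 / n ** 0.5:.3f}  skewness={results[n][2]:.2f}")
```

In a run like this, the mean of the simulated sample means stays near the population mean for every n, the standard deviation tracks $\sigma/\sqrt{n}$, and the skewness shrinks toward 0 as n grows. Only the last of these is the shape statement made by the CLT.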

Formulas

You will want to be familiar with the probability formulas that are provided on the exam. A partial list of formulas related to probability on the exam formula sheet is provided here. Note that several relate to the sampling distribution of sample means and sample proportions:

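If X has a binomial distribution with parameters n and p, then:

$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

$\mu_X = np$

$\sigma_X = \sqrt{np(1-p)}$

$\mu_{\hat{p}} = p$

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$

If $\bar{x}$ is the mean of a random sample of size n from an infinite population with mean $\mu$ and standard deviation $\sigma$, then:

$\mu_{\bar{x}} = \mu$

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$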

Multiple Choice Questions from 1997 Exam

Questions 19 and 20 refer to the following information:

Every Thursday, Matt and Dave’s Video Venture has “roll-the-dice” day. A customer may choose to roll two fair dice and rent a second movie for an amount (in cents) equal to the numbers uppermost on the dice, with the larger number first. For example, if the customer rolls a two and a four, a second movie may be rented for $0.42. If a two and a two are rolled, a second movie may be rented for $0.22. Let X represent the amount paid for a second movie on roll-the-dice day. The expected value of X is $0.47 and the standard deviation of X is $0.15.

19. If a customer rolls the dice and rents a second movie every Thursday for 20 consecutive weeks, what is the total amount that the customer would expect to pay for these second movies?

(A)  $0.45 (B) $0.47 (C) $0.67 (D) $3.00 (E) $9.40

20. If a customer rolls the dice and rents a second movie every Thursday for 30 consecutive weeks, what is the approximate probability that the total amount paid for these second movies will exceed $15.00?

(A)  0 (B) 0.09 (C) 0.14 (D) 0.86 (E) 0.91
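One way to check the arithmetic behind Questions 19 and 20 is sketched below. It assumes the 30 weekly amounts are independent and that their total is approximately normal; the variable names are made up for this example. The first few lines also confirm the stated mean and standard deviation of X by listing all 36 rolls.

```python
from itertools import product
from math import sqrt
from statistics import NormalDist

# Price in cents for each of the 36 equally likely rolls: larger die first.
prices = [10 * max(a, b) + min(a, b) for a, b in product(range(1, 7), repeat=2)]

mean_x = sum(prices) / 36                                  # close to 47 cents
sd_x = sqrt(sum((x - mean_x) ** 2 for x in prices) / 36)   # close to 15 cents

# Question 19: expected total over 20 weeks, using the rounded $0.47.
expected_total_20 = 20 * 0.47

# Question 20: total over 30 weeks has mean 30(0.47) and sd 0.15*sqrt(30),
# and is treated here as approximately normal (an assumption).
total_30 = NormalDist(mu=30 * 0.47, sigma=0.15 * sqrt(30))
p_exceeds_15 = 1 - total_30.cdf(15.00)

print(f"E(X) = {mean_x:.2f} cents, SD(X) = {sd_x:.2f} cents")
print(f"Expected total for 20 weeks: ${expected_total_20:.2f}")
print(f"P(total for 30 weeks > $15.00) is about {p_exceeds_15:.2f}")
```

Note that the standard deviation of a total of n independent amounts is $\sigma\sqrt{n}$, not $\sigma/\sqrt{n}$; the latter applies to the sample mean.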

Multiple Choice Questions from 2002 Exam

18. Which of the following statements is (are) true about the t-distribution with k degrees of freedom?

I. The t-distribution is symmetric.

II. The t-distribution with k degrees of freedom has a smaller variance than the t-distribution with k + 1 degrees of freedom.

III. The t-distribution has a larger variance than the standard normal (z) distribution.

(A) I only (B) II only (C) III only (D) I and II (E) I and III
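Statements II and III can be examined with a standard fact that is not stated in these notes: for k > 2, the t-distribution with k degrees of freedom has variance k/(k - 2). The helper name `t_variance` below is hypothetical, used only for this sketch.

```python
def t_variance(k):
    """Variance of the t-distribution with k degrees of freedom.

    Uses the standard result k / (k - 2), which is valid only for k > 2.
    """
    if k <= 2:
        raise ValueError("the variance formula k/(k-2) requires k > 2")
    return k / (k - 2)


# The standard normal (z) distribution has variance 1 for comparison.
for k in (3, 4, 10, 30, 100):
    print(f"df = {k:>3}: variance = {t_variance(k):.3f}")
```

The printed variances decrease as the degrees of freedom increase and stay above 1, which bears directly on statements II and III.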

30. The population {2, 3, 5, 7} has mean $\mu = 4.25$ and standard deviation $\sigma = 1.92$. When sampling with replacement, there are 16 different possible ordered samples of size 2 that can be selected from this population. The mean of each of these 16 samples is computed. For example, 1 of the 16 samples is (2, 5), which has a mean of 3.5. The distribution of the 16 sample means has its own mean $\mu_{\bar{x}}$ and its own standard deviation $\sigma_{\bar{x}}$. Which of the following statements is true?

A) and

B) and

C) and

D)

E)
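Because the population in Question 30 is so small, the full sampling distribution can be listed directly. The sketch below enumerates the 16 ordered samples and computes the mean and standard deviation of their means; the variable names are chosen for this example.

```python
from itertools import product
from statistics import fmean, pstdev

population = [2, 3, 5, 7]

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))
sample_means = [fmean(s) for s in samples]

mu = fmean(population)           # population mean
sigma = pstdev(population)       # population standard deviation
mu_xbar = fmean(sample_means)    # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)  # standard deviation of the 16 sample means

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"mu_xbar = {mu_xbar:.2f}, sigma_xbar = {sigma_xbar:.2f}")
print(f"sigma / sqrt(2) = {sigma / 2 ** 0.5:.2f}")
```

The enumeration agrees with the formula sheet: the mean of the sample means equals the population mean, and their standard deviation equals $\sigma/\sqrt{2}$, which is smaller than $\sigma$.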

36. An urn contains exactly three balls numbered 1, 2, and 3, respectively. Random samples of two balls are drawn from the urn with replacement. The average, $\bar{X} = \dfrac{X_1 + X_2}{2}$, where $X_1$ and $X_2$ are the numbers on the selected balls, is recorded after each drawing. Which of the following describes the sampling distribution of $\bar{X}$?

A) [graph not reproduced]
B) [graph not reproduced]
C) [graph not reproduced]
D) [graph not reproduced]
E) It cannot be determined from the information given.
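The sampling distribution in Question 36 can be worked out by listing the 9 equally likely ordered draws. The sketch below does this with exact fractions; the names `counts` and `distribution` are chosen for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

balls = [1, 2, 3]

# Count how many of the 9 ordered draws (with replacement) give each average.
counts = Counter(Fraction(x1 + x2, 2) for x1, x2 in product(balls, repeat=2))

# Convert counts to probabilities, ordered by the value of the average.
distribution = {mean: Fraction(c, 9) for mean, c in sorted(counts.items())}

for mean, prob in distribution.items():
    print(f"P(X-bar = {float(mean):.1f}) = {prob}")
```

The result is a symmetric, mound-shaped distribution on the values 1, 1.5, 2, 2.5, and 3, peaked at 2.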

38. Suppose that public opinion in a large city is 65 percent in favor of increasing taxes to support the public school system and 35 percent against such an increase. If a random sample of 500 people from this city are interviewed, what is the approximate probability that more than 200 of these people will be against increasing taxes?

A) B)

C) D) E)
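For Question 38, the setup can be sketched as follows, assuming the normal approximation to the binomial count of people against the increase (n = 500, p = 0.35) with no continuity correction. The variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

n = 500
p_against = 0.35

# Number against the increase: mean np and sd sqrt(np(1 - p)).
count_against = NormalDist(mu=n * p_against,
                           sigma=sqrt(n * p_against * (1 - p_against)))

# Standardize the cutoff of 200 people, then take the upper-tail area.
z = (200 - count_against.mean) / count_against.stdev
p_more_than_200 = 1 - count_against.cdf(200)

print(f"mean = {count_against.mean:.1f}, sd = {count_against.stdev:.2f}")
print(f"z = {z:.2f}, P(more than 200 against) is about {p_more_than_200:.4f}")
```

The same probability can be set up with proportions instead of counts, using $\hat{p} > 0.40$ with mean 0.35 and standard deviation $\sqrt{(0.35)(0.65)/500}$; the z-score is identical.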

AP Exam Free Response Questions for Practice and Discussion

2008, Form B, #2

Four different statistics have been proposed as estimators of a population parameter. To investigate the behavior of these estimators, 500 random samples are selected from a known population and each statistic is calculated for each sample. The true value of the population parameter is 75. The graphs below show the distribution of the values for each statistic.

(a) Which of the statistics appear to be unbiased estimators of the population parameter?

How can you tell?

(b) Which of the statistics A or B would be a better estimator of the population parameter?

Explain your choice.

(c) Which of the statistics C or D would be a better estimator of the population parameter?

Explain your choice.

1998, #1

Consider the sampling distribution of a sample mean obtained by random sampling from an infinite population. This population has a distribution that is highly skewed toward the larger values.

(a) How is the mean of the sampling distribution related to the mean of the population?

(b) How is the standard deviation of the sampling distribution related to the standard deviation of the population?

(c) How is the shape of the sampling distribution affected by the sample size?

2004, Form B #3

Trains carry bauxite from a mine in Canada to an aluminum processing plant in northern New York State in hopper cars. Filling equipment is used to load ore into the hopper car. When functioning properly, the actual weights of ore loaded into each car by the filling equipment at the mine are approximately normally distributed with a mean of 70 tons and a standard deviation of 0.9 ton. If the mean is greater than 70 tons, the loading mechanism is overfilling.

(a) If the filling equipment is functioning properly, what is the probability that the weight of the ore in a randomly selected car will be 70.7 tons or more? Show your work.

(b) Suppose that the weight of ore in a randomly selected car is 70.7 tons. Would that fact make you suspect that the loading mechanism is overfilling the cars? Justify your answer.

(c) If the filling equipment is functioning properly, what is the probability that a random sample of 10 cars will have a mean weight of 70.7 tons or more? Show your work.

(d) Based on your answer in part (c), if a random sample of 10 cars had a mean ore weight of 70.7 tons, would you suspect that the loading mechanism was overfilling the cars? Justify your answer.
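The two probabilities in parts (a) and (c) differ only in which standard deviation is used. The sketch below contrasts them; it is a numerical check under the stated model, not a substitute for showing work on the exam, and the variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 70, 0.9  # tons, when the equipment is functioning properly

# Part (a): a single randomly selected car uses the population sd.
single_car = NormalDist(mu, sigma)
p_single = 1 - single_car.cdf(70.7)

# Part (c): the mean of 10 cars uses sd = sigma / sqrt(n).
n = 10
mean_of_10 = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - mean_of_10.cdf(70.7)

print(f"P(one car >= 70.7)     is about {p_single:.4f}")
print(f"P(mean of 10 >= 70.7)  is about {p_mean:.4f}")
```

A single car at 70.7 tons is not unusual, while a sample mean of 70.7 tons from 10 cars would be rare if the equipment were working properly. That contrast is the basis for the answers to parts (b) and (d).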

Sampling distribution of $\hat{p}$ problem

Imagine a very large candy machine filled with orange, brown, and yellow candies. The company that fills the machine says that 45% of the candies in the machine are orange. Assume for the moment that this claim is true, and that the machine has just been filled. When you insert money, the machine dispenses a random sample of 25 candies. Let $\hat{p}$ = the proportion of orange candies in your sample.

(a) What is the mean of the sampling distribution of the sample proportion $\hat{p}$? Explain.

(b) Find the standard deviation of the sampling distribution of $\hat{p}$. Show your work.

(c) Explain why it would be appropriate to use a Normal distribution to approximate the sampling distribution of $\hat{p}$ in this setting.

(d) Use a Normal distribution to find the approximate probability that the proportion of orange candies in your sample will be less than or equal to 0.36. Show your method clearly.

(e) If your sample actually contained 9 orange candies, would that make you suspect that the company isn’t putting enough orange candies in the machine? Justify your answer.
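The calculations for parts (a) through (d) can be sketched as below. The large-sample check uses the $np \ge 10$ and $n(1-p) \ge 10$ rule of thumb, which is an assumption here since textbooks differ on the cutoff; the variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.45, 25

# (a) and (b): center and spread of the sampling distribution of p-hat.
mean_p_hat = p
sd_p_hat = sqrt(p * (1 - p) / n)

# (c): large-sample check (the cutoff of 10 is a common rule of thumb).
expected_orange = n * p            # 11.25
expected_other = n * (1 - p)       # 13.75
normal_ok = expected_orange >= 10 and expected_other >= 10

# (d): P(p-hat <= 0.36) under the Normal approximation.
p_hat = NormalDist(mean_p_hat, sd_p_hat)
prob_at_most_036 = p_hat.cdf(0.36)

print(f"mean = {mean_p_hat}, sd = {sd_p_hat:.4f}, Normal OK: {normal_ok}")
print(f"P(p-hat <= 0.36) is about {prob_at_most_036:.4f}")
```

Since 9 orange candies out of 25 is a sample proportion of 0.36, the probability from part (d) is the number to weigh in part (e).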


2007, #3

Big Town Fisheries recently stocked a new lake in a city park with 2,000 fish of various sizes. The distribution of the lengths of these fish is approximately normal.

(a) Big Town Fisheries claims that the mean length of the fish is 8 inches. If this claim is true, which of the following would be more likely?

·  A random sample of 15 fish having a mean length that is greater than 10 inches

Or

·  A random sample of 50 fish having a mean length that is greater than 10 inches

Justify your answer.

(b) Suppose the standard deviation of the sampling distribution of the sample mean for random samples of size 50 is 0.3 inch. If the mean length of the fish is 8 inches, use the normal distribution to compute the probability that a random sample of 50 fish will have a mean length less than 7.5 inches.

(c) Suppose the distribution of fish lengths in this lake was nonnormal but had the same mean and standard deviation. Would it still be appropriate to use the normal distribution to compute the probability in (b)? Justify your answer.
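Part (b) is a direct normal calculation, since the standard deviation of the sampling distribution is given. A minimal sketch, with variable names chosen for this example:

```python
from statistics import NormalDist

# Sampling distribution of the sample mean for n = 50, as given in part (b).
x_bar = NormalDist(mu=8, sigma=0.3)

z = (7.5 - x_bar.mean) / x_bar.stdev
p_less_than_7_5 = x_bar.cdf(7.5)

print(f"z = {z:.2f}, P(x-bar < 7.5) is about {p_less_than_7_5:.4f}")
```

For part (a), no calculation is needed: the smaller sample has the larger value of $\sigma/\sqrt{n}$, so its sample mean is more likely to fall far from 8 inches.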


2007, Form B #2

The graph below shows the relative frequency distribution for X, the total number of dogs and cats owned per household, for the households in a large suburban area. For instance, 14 percent of the households own 2 of those pets.

(a) According to local law, each household in this area is prohibited from owning more than 3 of these pets. If a household in this area is selected at random, what is the probability that the selected household will be in violation of this law? Show your work.

(b) If 10 households in this area are selected at random, what is the probability that exactly 2 of them will be in violation of this law? Show your work.

(c) The mean and standard deviation of X are 1.65 and 1.851, respectively. Suppose that 150 households in this area are to be selected at random and $\bar{X}$, the mean number of dogs and cats per household, is to be computed. Describe the sampling distribution of $\bar{X}$, including its shape, center, and spread.
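The center and spread asked for in part (c) follow from the formula-sheet results for a sample mean. A minimal sketch of the arithmetic, with variable names chosen for this example:

```python
from math import sqrt

mu_x, sigma_x, n = 1.65, 1.851, 150

center = mu_x                  # mean of the sampling distribution of the sample mean
spread = sigma_x / sqrt(n)     # standard deviation of the sample mean

print(f"center = {center}, spread = {spread:.4f}")
```

The shape is not computed here: with n = 150, the Central Limit Theorem is the justification for describing the sampling distribution as approximately normal.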

Solution, 2008 Form B Question 2

(a) Statistics A, C, and D appear to be unbiased. This is indicated by the fact that the mean of the estimated sampling distribution for each of these statistics is about 75, the value of the population parameter.

Note: No other characteristic should be mentioned in the response. Students must clearly demonstrate an understanding of the term unbiased.

(b) Statistic A would be a better choice because it appears to be unbiased (or centered at 75). Although the variability of the two estimated sampling distributions is similar, statistic A would produce estimates that tend to be closer to the true population parameter value of 75 than would statistic B.

(c) Statistic C would be a better choice because it has smaller variability. Although both statistic C and statistic D appear to be unbiased, statistic C would produce estimates that tend to be closer to the true population parameter value of 75 than would statistic D.

Solution, 1998 Question 1

(a) The mean of the sampling distribution is equal to the mean of the population.

Note: There were a number of papers that had responses containing “the sample mean is close to” or “gets close to the population mean as n increases,” or other rewordings of the law of large numbers. These statements, while true, do not answer the question posed.

(b) The standard deviation of the sampling distribution is equal to the standard deviation of the population divided by the square root of the sample size.

OR

Clearly states that the standard deviation of the sampling distribution decreases as n increases.

(c) The equivalent of the following two statements must be included: