P2010 Lecture Notes

Sampling, Sampling Distributions Ch 5

Samples vs. Populations

Population: A complete set of observations or measurements about which conclusions are to be drawn.

Sample: A subset or part of a population.

Not necessarily random

Statistics vs. Parameters

Parameter: A summary characteristic of a population.

Summary of Central tendency, variability, shape, correlation

E.g., Population mean, Population Standard Deviation, Population Median, Proportion of population of registered voters voting for Bush, Population correlation between Systolic & Diastolic BP

Statistic: A summary characteristic of a sample. Any of the above computed from a sample taken from the population.

E.g., Sample mean, Sample Standard Deviation, median, correlation coefficient

Inferential Statistics

We take a sample and compute a description of a characteristic of the sample – central tendency (usually), variability or shape. That is, we compute the value of a sample statistic.

We use the sample statistic to make an educated guess about the corresponding population parameter.

The basic concept is easy. The devil is in the details.

Types of sampling techniques

Random Sampling

Every element of the population must have the same probability of being selected, and every combination of elements must have the same probability of being selected.

Usually done by having a computer program generate a “random” order for selection of participants.

Very difficult to achieve in practice.
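A minimal sketch of how a computer program can do the random selection, assuming the whole population is available as a list. The sampling frame of 200 ID numbers and the function name simple_random_sample are made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement.

    random.sample gives every element, and every combination of n elements,
    the same chance of being chosen.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers for 200 people.
frame = list(range(1, 201))
chosen = simple_random_sample(frame, n=10, seed=1)
print(chosen)
```

The hard part in practice is not this step but getting a complete list of the population to draw from.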

Systematic Sampling.

Every Kth element of a population. The first person is selected arbitrarily.

xxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxkxxxxk . . .
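The pattern above (third element first, then every 5th) can be sketched in code. This is an illustration only; the population of 50 ID numbers and the function name systematic_sample are assumptions.

```python
def systematic_sample(population, k, start):
    """Take the element at position `start` (0-based), then every k-th one after it."""
    return population[start::k]

# Hypothetical population: ID numbers 1 through 50.
frame = list(range(1, 51))

# start=2 picks the 3rd person first, matching the "xxk" at the start of the diagram.
picked = systematic_sample(frame, k=5, start=2)
print(picked)  # [3, 8, 13, 18, 23, 28, 33, 38, 43, 48]
```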

Stratified Sampling

Stratum: A subgroup of a population.

When different strata of a population may give different responses to a survey question, survey researchers will usually attempt to make sure that each stratum is represented in a sample. Such sampling is called stratified sampling.

Typical strata: Gender groups, Ethnic groups, political groups, likelihood of voting groups.
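One way to make sure each stratum is represented is to draw a separate random sample from every stratum. The sketch below takes the same fraction from each one (proportional allocation), which is only one of several possible schemes; the population of 60 likely and 40 unlikely voters is invented for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Randomly sample the same fraction of units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: (stratum, person ID) pairs.
people = [("likely voter", i) for i in range(60)] + \
         [("unlikely voter", i) for i in range(40)]

sample = stratified_sample(people, stratum_of=lambda p: p[0], fraction=0.1, seed=3)
counts = Counter(stratum for stratum, _ in sample)
print(counts)
```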

Convenience Sampling

Taking whoever is available, without any attempt to randomly pick from a population or to stratify.

Most samples in psychology are convenience samples.

The Researcher’s Curse: Variation of sample statistics from sample to sample

Research involves taking samples and making decisions based on the sample results.

Unfortunately, sample characteristics vary from one sample to the next.

So, my decision based on a sample I took might be different from your decision based on a sample you took.

This means that to perform research, we have to know something about how sample characteristics vary from sample to sample.

Sampling Distributions (Should be called Sample Statistic Distributions)

Consider a population of IQ scores. (Illustrated on Corty p. 139)

Here’s part of the population . . .

86 99 96 95 72 73 95 125 97 95 95 83 121 87 93 73 77 115 111 109 100 87 123 96 100 120 97 95 110 85 100 116 79 96 88 95 83 117 120 82 99 106 100 106 102 95 112 101 101 105 82 64 112 116 106 85 93 135 90 93 116 115 83 126 107 90 98 117 116 68 126 93 107 99 79 113 93 86 70 111 94 88 87 69 93 71 74 106 111 92 106 125 101 111 80 84 85 97 104 81 126 89 81 106 104 85 116 97 92 122 110 100 113 123 96 75 91 112 93 77 93 103 81 92 106 97 104 108 61 95 104 102 113 77 98 104 106 121 83 108 103 101 123 98 93 78 105 54 106 107 109 89 97 83 81 73 89 92 102 111 116 93 83 111 114 78 110 98 95 105 121 79 121 118 131 108 92 115 135 72 109 82 88 99 102 96 80 91 119 101 133 93 83 88 115 123 101 89 110 93 . . .

Now consider taking a sample of size 4 from that population.

Compute the mean of that sample.

Now repeat the above steps thousands upon thousands of times.

The result is a population of sample means.

The frequency distribution of the sample means is called the Sampling Distribution of Means.
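The steps just described can be sketched in Python. This is not the SPSS program used in class; the population here is a made-up set of 10,000 IQ-like scores (generated with mean 100 and SD 15, both assumptions), and the number of repetitions is arbitrary.

```python
import random
import statistics
from collections import Counter

rng = random.Random(2010)  # fixed seed so the run is repeatable

# Assumption: a made-up population of 10,000 IQ-like scores.
population = [round(rng.gauss(100, 15)) for _ in range(10_000)]

SAMPLE_SIZE = 4
N_SAMPLES = 10_000

# Take a sample of size 4, compute its mean, and repeat many times.
sample_means = []
for _ in range(N_SAMPLES):
    sample = rng.sample(population, SAMPLE_SIZE)
    sample_means.append(sum(sample) / SAMPLE_SIZE)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print("population mean:", round(pop_mean, 2), " population SD:", round(pop_sd, 2))
print("mean of sample means:", round(mean_of_means, 2))
print("SD of sample means:", round(sd_of_means, 2))

# Crude frequency distribution of the sample means, in bins 5 points wide.
bins = Counter(5 * int(m // 5) for m in sample_means)
for low in sorted(bins):
    print(f"{low:3d} to <{low + 5:3d} {'*' * (bins[low] // 50)}")
```

The printed stars give a rough picture of the sampling distribution of means for samples of size 4.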

Simulating taking samples from a population . . .

Open and run the Syntax file “Input program to simulate sampling disltribution of means.sps”.

Dot plot of population . . .

A few means of samples of size 4 . . .

Report (variable y): means of eight samples of size 4

Sample   Mean     N   Std. Deviation
1         88.25   4   11.815
2        111.25   4   19.873
3         95.00   4   23.721
4         97.50   4    8.347
5        109.50   4   12.897
6         94.00   4   16.793
7        100.00   4   12.884
8         95.50   4   14.012

Three theoretical facts and one practical fact about the distribution of sample means . . .

The theoretical facts are about 1) central tendency, 2) variability, and 3) shape . . .

1. The mean of the population of sample means will be the same as the mean of the population from which the samples were taken. The mean of the means is the mean. µM = µ. (From Corty, p. 140.)

Implication: The sample mean is an unbiased estimate of the population mean. Across random samples, sample means do not systematically fall above or below the population mean. When the sampling distribution is symmetric, a sample mean is just as likely to be smaller than the population mean as it is to be larger.
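Fact 1 can be checked exactly on a tiny example by listing every possible sample. The four-score population below is invented for illustration, and the samples are of size 2 drawn without replacement.

```python
from itertools import combinations

# Hypothetical (and deliberately lopsided) population of four scores.
population = [1, 2, 3, 10]
pop_mean = sum(population) / len(population)

# Every possible sample of size 2, each equally likely under random sampling.
all_samples = list(combinations(population, 2))
sample_means = [sum(s) / len(s) for s in all_samples]
mean_of_means = sum(sample_means) / len(sample_means)

print(sample_means)             # [1.5, 2.0, 5.5, 2.5, 6.0, 6.5]
print(mean_of_means, pop_mean)  # 4.0 4.0
```

No single sample mean equals 4, but the sample means average out to exactly the population mean.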

2. The standard deviation of the population of sample means, called the standard error of the mean, will be equal to the original population's standard deviation divided by the square root of N, the size of each sample. (Corty, Eq. 5.1, p. 142)

In Corty’s notation,

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

The standard deviation (σM) is called the standard error of the mean.

Implication: Means are less variable than individual scores. Means are likely to be closer to the population mean than individual scores. You can make a sample mean as close as you want to the population mean if you can afford a large sample.
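A short worked example of the standard error of the mean, assuming a population standard deviation of 15 (a common convention for IQ scores, not a value given in these notes).

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by the square root of N."""
    return sigma / math.sqrt(n)

SIGMA = 15  # assumed population standard deviation

results = {n: standard_error(SIGMA, n) for n in (1, 4, 16, 100)}
for n, se in results.items():
    print(f"N = {n:3d}   standard error = {se}")
# N =   1   standard error = 15.0
# N =   4   standard error = 7.5
# N =  16   standard error = 3.75
# N = 100   standard error = 1.5
```

Quadrupling the sample size only halves the standard error, which is why very precise estimates require large samples.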

3. The shape of the distribution of the population of sample means will be normal if the original distribution is normal, and will approach the normal as N gets larger in all other cases. This fact is called the Central Limit Theorem. It is the foundation upon which most of modern-day inferential statistics rests. See Corty, p. 141.
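A rough simulation of this fact, using a strongly skewed population. The exponential population, the sample sizes, and the use of skewness as a stand-in for "shape" are all choices made for this sketch; a skewness near 0 is consistent with a symmetric, normal-like shape.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: 0 for a symmetric distribution, positive for right skew."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def simulate_sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a right-skewed (exponential) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(5)  # fixed seed so the run is repeatable
skew_by_n = {}
for n in (1, 4, 30):
    means = simulate_sample_means(n, reps=5000, rng=rng)
    skew_by_n[n] = skewness(means)
    print(f"N = {n:2d}   skewness of the sample means = {skew_by_n[n]:.2f}")
```

The skewness should shrink toward 0 as N grows, even though the individual scores remain heavily skewed.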

Why do we care about #3? Because we’ll need to compute probabilities associated with sample means when doing inferential statistics. To compute those probabilities, we need a probability distribution.

Practical fact

4. The distribution of Z's computed from each sample, using the formula

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}}$$

will be or approach (as sample size gets large) the Standard Normal Distribution with mean = 0 and SD = 1.
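A worked example of the Z computation for the first sample mean listed earlier (88.25, N = 4). The population mean of 100 and standard deviation of 15 are assumptions made for this sketch; the notes do not state the population values.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(sample_mean, mu, sigma, n):
    """Z = (sample mean - population mean) / (population SD / sqrt(N))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

MU, SIGMA = 100, 15  # assumed population mean and standard deviation

z = z_for_sample_mean(88.25, MU, SIGMA, n=4)
print(round(z, 2))  # -1.57

# Probability of a sample mean this low or lower, from the standard normal distribution.
p_below = NormalDist().cdf(z)
print(round(p_below, 3))
```

This is the kind of probability that Fact 3 makes it possible to compute.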

Another test question: What are three facts about the distribution of sample means: a fact about central tendency, a fact about variability, and a fact about the shape of the distribution of sample means?

Biderman’s P201 Handouts Topic 10: Probability and Sampling Distributions - 11/22/2018