Notes For Lecture 2, MAT 120

1.3 Wherein we consider some details concerning the process of collecting sample data, or sampling.

Ex: Say that a pollster wants to know who the voters feel should be the next leader of the free world. It seems impractical to ask each member of the population. Instead, we should choose a sample, as discussed in Lecture 1.

The question that arises: how many people should we choose to be in the sample, and how should we choose them? Several strategies are available.

» Random sampling is the process of using chance to select individuals from a population to be included in the sample.

'Randomness' in sampling is an important concept, one that links the processes in the field of statistics to its underlying (and legitimizing) mathematical foundations. But consider the voter example above. How do you choose a 'random' person?

Ex: A math teacher wants to get a feel for what percentage of his students use smartphones (iPhone/Android). He wants to take a random sample of 30 of his students. He has a database with all of his students' names. To obtain the random sample, he can:

(1) Assign each student a number, ranging from 1 to (say) 170.

(2) Use a random number generator to find 30 different random numbers in that range.

(3) Interrogate the students whose assigned numbers were generated.
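The three steps above can be sketched in Python. This is a minimal illustration, assuming a roster of 170 students numbered 1 to 170 (the "(say) 170" from step 1); the helper name `choose_students` is hypothetical. Python's `random.sample` draws without replacement, so no student number is picked twice.

```python
import random

def choose_students(roster_size, sample_size, seed=None):
    """Return a simple random sample of student numbers (hypothetical helper).

    Step 1: students are numbered 1..roster_size.
    Step 2: draw sample_size distinct numbers at random.
    """
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    numbers = range(1, roster_size + 1)
    return sorted(rng.sample(numbers, sample_size))

# Step 3: these are the students to interrogate.
selected = choose_students(170, 30, seed=27)
print(selected)
```

Sorting is only for readability; the randomness is entirely in which 30 numbers `random.sample` returns.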

Ex: I'm going to find 5 random numbers between 1 and 50 on the TI-84. Note: most software-based random number generators are really pseudorandom. The algorithm takes a starting number called a seed, then bounces it around through a series of arithmetic operations to calculate each new random number, so the same seed always produces the same sequence. The first step in the process is selecting a seed. I'm going to pick the number 27: type 27 and hit STO→.

Then press MATH and move right to the PRB menu. Now hit 1 (rand), so the screen reads 27→rand, and press ENTER.

This sets the seed. Next, we generate the numbers themselves.

We're going to use the function randInt( off of the PRB menu again: press MATH, move right to PRB, and choose option 5. Now I need to provide the range that I wish the random numbers to fall within. As noted before, I want them to be between 1 and 50, so I type 1,50) to complete the command randInt(1,50). Then hit ENTER 5 times, getting a new random integer each time.
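To see what "takes a seed, then bounces it around" means, here is a toy pseudorandom generator in Python. It is a linear congruential generator with the constants a = 1664525, c = 1013904223, m = 2^32 (a well-known textbook choice). It is *not* the algorithm the TI-84 uses, and the names `ToyGenerator` and `rand_int` are hypothetical stand-ins for the calculator's rand and randInt(. The point is only that resetting the seed reproduces the same numbers.

```python
class ToyGenerator:
    """Toy linear congruential generator (illustration only, not the TI-84's algorithm)."""

    A = 1664525
    C = 1013904223
    M = 2 ** 32

    def __init__(self, seed):
        # Storing the seed plays the role of 27→rand on the calculator.
        self.state = seed % self.M

    def next_state(self):
        # "Bounce the number around": multiply, add, and keep the remainder mod M.
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

    def rand_int(self, low, high):
        # Scale the state into [low, high] using its high-order bits,
        # loosely mimicking randInt(low,high).
        span = high - low + 1
        return low + (self.next_state() * span) // self.M


gen = ToyGenerator(27)
first_run = [gen.rand_int(1, 50) for _ in range(5)]

gen = ToyGenerator(27)  # reset to the same seed
second_run = [gen.rand_int(1, 50) for _ in range(5)]

print(first_run)
print(first_run == second_run)  # True: same seed, same sequence
```

Like randInt(, this generator can return the same number twice; when the numbers are used to pick sample members, a repeated number is simply skipped and another one generated.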

Try: #12, #16, pp. 28-29

1.4 Here we take a look at 'Other Effective Sampling Methods'. It is important to note that random sampling still plays a crucial role.

» A stratified sample is obtained by separating the population into nonoverlapping groups called strata, and then obtaining a simple random sample from each stratum. The individuals in each stratum should be homogeneous (or similar) in some way.

Ex: An example of a stratified sample would be an IRS auditing strategy that selects a certain percentage of taxpayers from different income ranges to audit. The strata could be income groups like '20,000 and below', '20,001-35,000', '35,001-70,000', etc. The percentages could vary from group to group, for example to reflect each group's share of the population.

Read: Ex. 1, p. 30. Work: #26, p. 37.

Note that within each stratum, the individual members must be chosen randomly for the study. Note also that comparisons can be made between the results in different strata, one of the advantages of stratified sampling.
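A minimal Python sketch of stratified sampling, assuming the population has already been split into strata (here, three hypothetical income groups with made-up taxpayer ID numbers). Within each stratum a simple random sample is drawn with `random.sample`. The per-stratum percentages are hypothetical inputs, echoing the IRS example where the percentage can differ by group; `stratified_sample` is a hypothetical helper name.

```python
import random

def stratified_sample(strata, fractions, seed=None):
    """Draw a simple random sample from each stratum (hypothetical helper).

    strata:    dict mapping stratum name -> list of individuals
    fractions: dict mapping stratum name -> fraction of that stratum to sample
    """
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(fractions[name] * len(members))  # how many to take from this stratum
        sample[name] = rng.sample(members, k)      # random selection *within* the stratum
    return sample

# Hypothetical taxpayer IDs grouped by income range.
strata = {
    "low income": list(range(1, 401)),       # 400 taxpayers
    "middle income": list(range(401, 701)),  # 300 taxpayers
    "upper income": list(range(701, 901)),   # 200 taxpayers
}
fractions = {"low income": 0.01, "middle income": 0.02, "upper income": 0.05}

audit = stratified_sample(strata, fractions, seed=1)
for name, chosen in audit.items():
    print(name, len(chosen))
```

Because each stratum's results are kept separate in the returned dictionary, they can also be compared with one another, which is the advantage noted above.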

» A systematic sample is obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.

Read: Ex. 2, p. 31.

Note that the choice of k depends on both the size of the population and the desired size of the sample. There is a step-by-step method on p. 32 for determining k.
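A minimal Python sketch of a systematic sample. It assumes one common way of choosing k: divide the population size N by the desired sample size n and round down (the text's own step-by-step method on p. 32 takes precedence). The helper name `systematic_sample` and the sizes used are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the positions of a systematic sample (hypothetical helper).

    Assumes k = N // n, i.e. N divided by n, rounded down.
    """
    k = population_size // sample_size    # take every kth individual
    rng = random.Random(seed)
    start = rng.randint(1, k)             # random starting point between 1 and k
    return [start + i * k for i in range(sample_size)]

picks = systematic_sample(population_size=1000, sample_size=50, seed=27)
print(picks[:5])  # the first five individuals selected
```

Only the starting point is random; every later pick is forced by the step size k, and rounding k down guarantees the last pick never runs past the end of the population.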

Work: #28, p. 37 (additional exercise: find the first five numbers of a systematic sample).

» A cluster sample is obtained by selecting all individuals within a randomly selected collection or group of individuals.

Read: Ex. 3, p. 33.
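A minimal Python sketch of cluster sampling, assuming the population is already divided into groups (here, six hypothetical class sections of 25 students each). Unlike stratified sampling, we randomly choose whole groups and then keep *everyone* in the chosen groups. The names `cluster_sample` and `sections` are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly pick whole clusters and return every individual in them (hypothetical helper)."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), num_clusters)  # random choice of groups
    return {name: clusters[name] for name in chosen_names}      # keep all members

# Hypothetical: 6 class sections of 25 students each.
sections = {f"Section {s}": [f"S{s}-{i}" for i in range(1, 26)] for s in range(1, 7)}

sample = cluster_sample(sections, num_clusters=2, seed=3)
print(sorted(sample))                        # which sections were chosen
print(sum(len(v) for v in sample.values()))  # 50 students in total
```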

» A convenience sample is a sample in which the individuals are easily obtained and not based on randomness.

Internet polls are a great example: the people who feel most strongly (e.g., Ron Paul supporters) are the ones who vote, even though they are not representative of the population as a whole. This points out the crucial role of randomness in sampling.
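To see why self-selection is a problem, here is a small Python simulation. Every number in it is hypothetical: a population of 10,000 in which 10% support a candidate, supporters answer an internet poll with probability 0.5, and everyone else with probability 0.05. A simple random sample of 500 is compared with the poll's self-selected respondents.

```python
import random

rng = random.Random(2012)

# Hypothetical population: True = supports the candidate (1,000 of 10,000 people).
population = [True] * 1000 + [False] * 9000

# Simple random sample: chance alone decides who is asked.
random_sample = rng.sample(population, 500)
random_share = sum(random_sample) / len(random_sample)

# Internet poll: strong supporters are far more likely to respond (hypothetical rates).
respondents = [p for p in population if rng.random() < (0.5 if p else 0.05)]
poll_share = sum(respondents) / len(respondents)

print(f"true share:    {0.10:.1%}")
print(f"random sample: {random_share:.1%}")
print(f"internet poll: {poll_share:.1%}")
```

With these assumed response rates, the random sample should land near the true 10%, while the internet poll should report roughly half of respondents as supporters, even though its respondents far outnumber the random sample.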

Finally, on p. 35, the text makes a note about sample size considerations. How big should a sample be if it is to properly reflect trends in the general population? There are rules for that. In general, the bigger the sample, the better it portrays the population, but larger samples cost more time and money, so you want to strike a balance between size and efficiency. As with many aspects of statistics, the sample size is usually set according to the margin of error within which you would like to work.
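As a preview of how a margin of error leads to a sample size, here is one standard formula, assuming we are estimating a population proportion with 95% confidence (z ≈ 1.96) and have no prior guess for the proportion, so we use the most conservative value p = 0.5: n = p(1 - p)(z/E)^2, rounded up. This is not necessarily the rule given on p. 35, and `sample_size` is a hypothetical helper name.

```python
import math

def sample_size(margin_of_error, p=0.5, z=1.96):
    """Sample size needed to estimate a proportion within +/- margin_of_error.

    Uses n = p(1 - p)(z / E)^2, rounded up to the next whole person.
    """
    n = p * (1 - p) * (z / margin_of_error) ** 2
    return math.ceil(n)

print(sample_size(0.05))  # 385
print(sample_size(0.03))  # 1068
```

Because n grows with 1/E^2, halving the margin of error roughly quadruples the required sample, which is the size-versus-efficiency trade-off described above.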