SW 981

Sampling and Randomness

Random selection (sampling) vs. Random assignment (to groups)

Random assignment serves to equalize groups; random selection serves to make the sample representative, which is what underwrites the statistical (inferential) function.

Larger samples are more representative in the sense of yielding a more precise (point) estimate.

Kinds of Samples:

Probability Samples - use some form of random sampling in one or more of their stages (the four schemes below are sketched in code after this list)

Random Sampling - each member of the population has an equal chance of being selected.

Stratified Sampling (Blocking) - divide the population into strata, then sample randomly within each stratum.

Cluster Sampling - successive random sampling of units, or sets and subsets.

Systematic Sampling (Interval Sampling) - every kth unit is selected.

Nonprobability Samples - all fail to use a random process, making formal statistical inference improper.
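A minimal sketch of the four probability-sampling schemes in Python, assuming the population is held as a list of unit IDs; all names, sizes, and strata here are illustrative, not from the course materials:

    import random

    population = list(range(1000))   # illustrative population of unit IDs
    n = 100                          # desired sample size

    # Simple random sampling: every unit has an equal chance of selection.
    srs = random.sample(population, n)

    # Systematic (interval) sampling: every kth unit after a random start.
    k = len(population) // n
    start = random.randrange(k)
    systematic = population[start::k]

    # Stratified sampling: divide into strata, sample randomly within each.
    strata = {"low": population[:500], "high": population[500:]}
    stratified = [u for s in strata.values()
                  for u in random.sample(s, n // len(strata))]

    # Cluster sampling: randomly select whole clusters, then units within them.
    clusters = [population[i:i + 50] for i in range(0, len(population), 50)]
    cluster_sample = [u for c in random.sample(clusters, 4)
                      for u in random.sample(c, n // 4)]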

Types of Distributions

1. Sample distribution - frequency distribution summarizing a given set of data, based on a randomly selected subset of a population.

2. Population distribution - theoretical distribution which describes the relative frequency associated with each of the values of a numerical variable, into which an entire set of possible observations may be mapped.

3. Sampling distribution - theoretical probability distribution which relates various values of some sample statistic to their probabilities of occurrence over all possible samples of size N, given a specific population distribution and some probability structure underlying the selection of samples.

We use (1) to estimate (2) based on what we know about (3).

Sampling distribution is generally not the same as the distribution of the random variable for the population.

The sampling distribution of the mean approaches a normal distribution as N increases, regardless of the underlying distribution of the random variable for the population (Central Limit Theorem).
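A quick simulation makes the theorem concrete: draw repeated samples from a strongly skewed (exponential) population and watch the distribution of sample means tighten and grow more symmetric as N increases. The population and sample sizes below are illustrative choices:

    import random
    import statistics

    def sample_mean(n):
        # Mean of n draws from a skewed population (exponential, mean 1).
        return statistics.mean(random.expovariate(1.0) for _ in range(n))

    for n in (1, 5, 30, 200):
        means = [sample_mean(n) for _ in range(5000)]
        # The means stay centered near 1 while their spread shrinks (~1/sqrt(n))
        # and the shape approaches the normal curve.
        print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))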

The theory of sampling distributions permits one to judge the probability that a given value of some statistic arose by chance from some particular population distribution.

Population (Sample space) - parameters (Greek Letters)

Sample - Statistics

We estimate parameters using statistics.

Desirable Properties of Estimators:

Maximum likelihood estimate: The principle of maximum likelihood says to choose, as our estimate of the population parameter, the value that maximizes the probability of observing the obtained sample.
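As a concrete illustration: for independent normal observations with known σ, the likelihood is maximized by setting the mean parameter equal to the sample mean. A brute-force check, with purely illustrative data and grid:

    import math
    import statistics

    data = [4.2, 5.1, 3.8, 4.9, 5.4]   # illustrative sample

    def log_likelihood(mu, sigma=1.0):
        # Log-probability of observing the sample, as a function of mu.
        return sum(-0.5 * ((x - mu) / sigma) ** 2
                   - math.log(sigma * math.sqrt(2 * math.pi)) for x in data)

    candidates = [i / 100 for i in range(300, 700)]
    mle = max(candidates, key=log_likelihood)
    print(mle, statistics.mean(data))   # both approximately 4.68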

Unbiased estimate: E(M) = µ. The expected value of the sample mean equals the population mean, i.e., the sample mean is an unbiased estimate of the population mean.

Note: E(V) 2

E(V) = 2  (N-1) / N

for an unbiased estimate use s2 = V N / (N-1)
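A short simulation verifies both facts; the population parameters below are arbitrary choices:

    import random

    mu, sigma, N = 0.0, 2.0, 5     # population values, so sigma**2 = 4
    reps = 20000

    v_sum = s2_sum = 0.0
    for _ in range(reps):
        sample = [random.gauss(mu, sigma) for _ in range(N)]
        m = sum(sample) / N
        V = sum((x - m) ** 2 for x in sample) / N   # divisor N: biased
        v_sum += V
        s2_sum += V * N / (N - 1)                   # corrected: unbiased

    print(v_sum / reps)    # close to sigma**2 * (N - 1) / N = 3.2
    print(s2_sum / reps)   # close to sigma**2 = 4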

Consistent estimate: prob(|G − θ| < ε) → 1, as N → ∞

Relative Efficiency: σ²H ÷ σ²G = efficiency of G relative to H

The more efficient estimator has the smaller sampling variance.

Sufficiency: If G is a sufficient statistic, our estimate of θ cannot be improved by considering any other aspect of the data not already included in G itself.

In inferential statistics, our main interest is in the sampling distribution of the mean. The mean of the sampling distribution of means is the same as the population mean.

However, the variance of the mean is σ²M = σ²/N.

Think of the two extremes: at N = 1 the sampling distribution of the mean is just the population distribution; at N = the entire population, every sample yields µ exactly.

Estimation of the Standard Error of the Mean:

σ²M = σ²/N. However, we don't know σ².

Instead we use an estimate of σ²: s² = (N/(N − 1)) · V, where V = sample variance.

Substituting s² for σ² yields est. σ²M = s²/N = V/(N − 1).

σM = standard error = standard deviation of the sampling distribution of the mean
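In code, with illustrative data:

    import math

    sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]   # illustrative observations
    N = len(sample)
    m = sum(sample) / N

    V = sum((x - m) ** 2 for x in sample) / N   # sample variance (divisor N)
    est_var_of_mean = V / (N - 1)               # estimated sigma-squared of M
    standard_error = math.sqrt(est_var_of_mean)
    print(m, standard_error)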

Sample Size Calculations

The need to distinguish between statistical and substantive significance disappears if one attends to the issue in the design of the research.

Power analysis requires that we specify what difference we want to be able to detect (i.e., what is substantively important).

The power of a test of a mean always depends on four things:

1. The particular alternative hypothesis - The larger the departure of H0 from the true situation, H1, the more powerful is the test of H0, other things being equal.

2. The value of alpha chosen by the researcher - smaller alpha, less power.

3. The size of the sample - larger N leads to more power.

4. The variability of the population under study - less variability, more power.

[For graphic demonstration of above see Hays, Figure 7.9.1.]
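For a rough numerical illustration (a normal-approximation sketch, not the course's tables), here is how the four factors enter the power of a one-sided, single-sample z-test; all numbers are made up:

    from math import sqrt
    from statistics import NormalDist

    def power(mu0, mu1, sigma, N, alpha):
        # One-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0.
        z_crit = NormalDist().inv_cdf(1 - alpha)
        shift = (mu1 - mu0) * sqrt(N) / sigma   # standardized departure from H0
        return 1 - NormalDist().cdf(z_crit - shift)

    print(power(100, 105, 15, 25, 0.05))   # baseline: about .51
    print(power(100, 110, 15, 25, 0.05))   # larger departure -> more power
    print(power(100, 105, 15, 100, 0.05))  # larger N -> more power
    print(power(100, 105, 15, 25, 0.01))   # smaller alpha -> less power
    print(power(100, 105, 30, 25, 0.05))   # more variability -> less power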

Need some estimate of variance - this can be a stumbling block and frequently requires a separate pilot study.

Example: Single Sample t-test

H0: Drinking coffee is safe.

H1: Drinking coffee is detrimental.

Calculation steps:

1. Calculate Δ (Glass's effect size) from the appropriate formula in the Summary Table. This is the effect (mean difference, correlation, etc.) that you care about and wish to detect if it exists.

2. Calculate the critical effect size, again using the appropriate formula in the Summary Table.

3. Set alpha and beta based on your willingness to be wrong in either direction. (Sample size constraints may influence the choice of beta.)

4. Obtain ν (nu) from the Master Table.

5. Calculate n (sample size) from the formula in the Summary Table.
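The Summary Table and Master Table are course handouts, so as a stand-in here is a normal-approximation sketch of the last step for a one-sided single-sample test, using N ≈ ((z(1 − alpha) + z(1 − beta)) / Δ)², where Δ is the standardized effect size you wish to detect:

    from math import ceil
    from statistics import NormalDist

    def required_n(delta, alpha, beta):
        # delta: standardized effect size (mu1 - mu0) / sigma worth detecting.
        z_alpha = NormalDist().inv_cdf(1 - alpha)
        z_beta = NormalDist().inv_cdf(1 - beta)
        return ceil(((z_alpha + z_beta) / delta) ** 2)

    # To detect a half-SD effect with alpha = .05 and beta = .20 (power .80):
    print(required_n(0.5, 0.05, 0.20))   # about 25 cases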