
UNIT TWO

Sampling; Sampling Distributions;

Interval Estimation; Hypothesis Testing

Why Sample?

Recall that inferential statistics as a body of knowledge refers to approaches for drawing conclusions about populations based on samples drawn from those populations. Let’s say you desire to gain information about a population. If you can access the entire population fairly quickly and cheaply, and without undue harm, then you would likely opt to examine the population in its entirety and forego sampling. For many populations, however, one would draw a sample from the population for some combination of the following reasons:

(1) save time

(2) save money

(3) limit the destruction of entities in those situations where, to measure the variable of interest, you must destroy the entity being measured (sometimes referred to as destructive sampling)

(4) if the population comprises observations arising from a process (e.g., the widths of semiconductor chips made using a particular process), the process can operate indefinitely, thereby yielding an unending (infinite) population of observations that could never be examined in its entirety

(5) theory exists which enables us to draw conclusions--in which we can have reasonably high confidence--about populations from samples

Types of Probability Samples

A probability sample is a sample of units drawn from a population in such a way--invariably involving random selection--that one can draw (based on the sample) a statistically sound conclusion about the population. Conclusions about populations based on non-probability samples (e.g., samples of convenience or samples involving self-selection) are suspect. Several methods for obtaining probability samples are described below.

Simple Random Sampling: A simple random sample of size n drawn from a finite population is a sample of n units drawn in such a way that every sample of that same size has the same chance of being selected. To draw a simple random sample from a finite population, you must be able to assign each unit in the population a unique number (with the same number of digits as the numbers assigned to the other units). A table of random digits is used to determine which units are to be included in the sample. A simple random sample can be drawn with replacement (where each time you select a unit to go in the sample, you "replace" the unit in the population, thus giving it a chance to be picked again) or without replacement (where each time you select a unit to go in the sample, you "don't replace" the unit in the population, thus denying it a chance to be picked again).
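To make the two selection schemes concrete, here is a minimal Python sketch (the population of numbered units and the sample size are made up for illustration); Python's random module plays the role of the table of random digits:

    import random

    population = list(range(1, 501))   # units numbered 1 through 500
    n = 10                             # desired sample size
    random.seed(42)                    # fixed seed so the example is reproducible

    # Without replacement: a selected unit cannot be picked again.
    srs_without = random.sample(population, n)

    # With replacement: a selected unit goes back and may be picked again.
    srs_with = random.choices(population, k=n)

    print("Without replacement:", sorted(srs_without))
    print("With replacement:   ", sorted(srs_with))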

Stratified Sampling: In stratified sampling, the population is divided into non-overlapping subpopulations called strata which together comprise the entire population. Then a sample is drawn from each stratum. If a simple random sample is drawn from each stratum, one has a stratified random sample. Stratified random sampling is more precise (i.e., leads to less variable results from sample to sample for any given sample size) than simple random sampling when the units within each stratum are more homogeneous (similar to one another with respect to the variable of interest) than the population as a whole. Stratified sampling permits you to gather information about the individual subpopulations as well as the entire population. Example of stratified sampling: to estimate the mean cash holdings of financial institutions in Georgia as of September 1, you randomly sample some small, some medium-sized, and some large financial institutions in Georgia.
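A minimal Python sketch of stratified random sampling (the three strata, their sizes, and the cash-holding figures are all made up for illustration); a simple random sample is drawn from each stratum separately:

    import random

    random.seed(1)
    # Hypothetical cash holdings (in $millions) by institution size.
    strata = {
        "small":  [random.gauss(2, 0.5) for _ in range(200)],
        "medium": [random.gauss(10, 2.0) for _ in range(100)],
        "large":  [random.gauss(50, 8.0) for _ in range(50)],
    }

    # Draw a simple random sample of 10% from each stratum.
    for name, units in strata.items():
        s = random.sample(units, k=len(units) // 10)
        print(f"{name}: {len(s)} units sampled, stratum sample mean = {sum(s)/len(s):.2f}")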

Cluster Sampling (single-stage): In cluster sampling, the population is divided into non-overlapping subpopulations called clusters which together comprise the entire population. Then a simple random sample of clusters is selected, and each unit in each selected cluster is examined. Cluster sampling is appropriate when each cluster is deemed to be as heterogeneous (varied with respect to the variable of interest) as the population as a whole. Often, populations naturally exist in clusters (e.g., people clustered in neighborhoods, products clustered on shelves), allowing one to increase the sample size at considerably less cost than with simple random sampling. Example of single-stage cluster sampling: in a marketing research study to estimate the proportion of adults who feel a new cereal is highly nutritious and the proportion who feel the new cereal has an excellent taste, a cereal manufacturer distributes through the post office small boxes of the new cereal to all the homes in randomly selected zip codes within a metropolitan area. (Self-addressed, stamped post cards are distributed with the cereal to capture reactions to the cereal.)

Systematic Sampling: To draw a systematic sample, you must be able to number the units in the population sequentially. To select a systematic sample, a unit is selected at random from the first k units, and then every kth unit thereafter is selected. (A systematic sample is equivalent to cluster sampling in which a single cluster is randomly selected.) Example of systematic sampling: to estimate the mean level of customer satisfaction with the service rendered, the manager of a car dealership--after randomly selecting the number 4 out of the counting numbers 1 through 10--interviews the 4th, 14th, 24th, 34th, etc. customers entering the dealership over a period of one week.
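A minimal Python sketch of the systematic selection mechanics (the interval k and the population size are made up for illustration):

    import random

    k = 10                         # sampling interval: one unit out of every k
    N = 97                         # population size, units numbered 1 through N

    random.seed(5)
    start = random.randint(1, k)   # random starting unit among the first k
    selected = list(range(start, N + 1, k))
    print("Selected units:", selected)   # start, start+k, start+2k, ...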

The Two Basic Approaches to Inferential Statistics

Interval Estimation

Recall that a parameter is a summary measure describing a population. Examples of parameters include the mean of a population, the variance of a population, and the proportion of a population falling in a certain category. One basic form of inferential statistics is to obtain--via sampling--an interval estimate of (confidence interval for) some parameter of a population. For example, you could select a random sample of 400 paid employees in the U.S. and--based on the proportion of that sample of employees who changed jobs within the last year--get an interval estimate of the proportion of ALL paid employees in the U.S. who changed jobs within the last year. Interval estimation can also be employed to obtain--via sampling--an interval estimate of (confidence interval for) the difference or ratio between two parameters of two distinct populations.
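To make the job-changing example concrete, here is a minimal Python sketch using the usual large-sample interval for a population proportion, p̂ ± z·√(p̂(1 - p̂)/n); the count of 72 job changers is made up for illustration:

    import math

    n = 400                  # sample size
    x = 72                   # hypothetical number who changed jobs in the last year
    p_hat = x / n            # point estimate of the population proportion

    z = 1.96                 # reliability factor for 95% confidence
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p-hat
    margin = z * se          # margin of error

    print(f"95% CI for p: ({p_hat - margin:.3f}, {p_hat + margin:.3f})")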

Hypothesis Testing

Another basic form of inferential statistics is to choose--via sampling--between two competing hypotheses about a population. It is customary to call the two competing hypotheses the null hypothesis (typically denoted H0) and the alternative hypothesis (typically denoted H1), respectively. For example, it could be that the mean fill amount of 2-liter bottles filled by a filling machine is the intended 2.05 liters (H0: μ = 2.05 liters). Alternatively, it could be that the mean fill amount differs from 2.05 liters (H1: μ ≠ 2.05 liters). You could select a random sample of 52 2-liter bottles filled by the machine and--based on the fill amounts for that sample of bottles--assess whether H0 should be called into question, i.e., assess whether H1 is supported. Hypothesis testing can also be employed to choose--via sampling--between two competing hypotheses about multiple populations.
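To make the bottle-filling example concrete, here is a minimal Python sketch that computes the usual t test statistic for H0: μ = 2.05 from a made-up sample of 52 fill amounts (the comment shows how SciPy, if available, returns the p-value directly):

    import math
    import random
    import statistics

    random.seed(2)
    # Hypothetical fill amounts (liters) for 52 bottles.
    fills = [random.gauss(2.06, 0.04) for _ in range(52)]

    mu0 = 2.05                                     # mean fill amount under H0
    xbar = statistics.mean(fills)
    s = statistics.stdev(fills)                    # sample std dev (divisor n - 1)
    t = (xbar - mu0) / (s / math.sqrt(len(fills)))
    print(f"x-bar = {xbar:.4f}, t = {t:.3f} with {len(fills) - 1} df")

    # If SciPy is installed, the two-sided p-value comes directly:
    #   from scipy.stats import ttest_1samp
    #   t_stat, p_value = ttest_1samp(fills, popmean=mu0)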

Sampling Distributions

Recall that a statistic is a summary measure describing a sample. Examples of statistics include the mean of a sample, the variance of a sample, and the proportion of a sample falling in a certain category. A sampling distribution is (by definition) the distribution of some statistic across all samples of the same size and type that can be drawn from a population.

Following is an elaboration of that definition. Consider any population. Consider every possible sample of a particular size and type that can be drawn from that population. Consider determining—for EACH of those samples--the value of some statistic. The distribution of that statistic across all the samples (i.e., the “pattern” in the collection of measurements, with one measurement per sample) is called a sampling distribution.
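The definition can be verified directly for a tiny population. The Python sketch below (population values made up for illustration) enumerates EVERY possible sample of size 2 drawn without replacement and records each sample's mean; the collection of those means is the sampling distribution of the sample mean:

    from itertools import combinations
    from statistics import mean

    population = [2, 4, 6, 8, 10]   # a tiny made-up population
    n = 2                           # sample size

    # One mean per possible sample (10 samples in all).
    sample_means = [mean(s) for s in combinations(population, n)]
    print("Sample means:", sorted(sample_means))
    print("Mean of the sampling distribution:", mean(sample_means))   # 6.0
    print("Population mean:                  ", mean(population))     # 6.0

Note that the mean of the sampling distribution equals the population mean, a fact that foreshadows the discussion of unbiased estimators below.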

Procedures for doing inferential statistics—whether interval estimation or hypothesis testing--rely on knowledge about sampling distributions. The theory about sampling distributions that we will address in this course provides the rationale behind various procedures for doing inferential statistics that we will learn in this course.

Estimators and Point Estimates

The type of statistic that one uses to estimate a population parameter is called an estimator. (For example, x̄, the sample mean, is an estimator for μ, the population mean.) A specific value of an estimator obtained from a specific sample is called a point estimate for the population parameter. (For example, the mean of an actual sample drawn from a population would be a point estimate for μ, the population mean.)

Characteristics of a "good" estimator include:

(1) being unbiased, which means that the mean of the estimator over all possible samples of the same size and type equals the parameter being estimated.

(2) being consistent, which means that as the sample size increases, the variance of the estimator approaches 0.

(3) being efficient, which means that the variance of the estimator (over all possible samples of a given size and type) is small; estimator 1 is more efficient than estimator 2 if--across all samples of the same size and type--the variance of estimator 1 is smaller than that of estimator 2.

The sample mean, sample variance (using the divisor n - 1), and sample proportion (proportion of a sample falling in a particular category) are unbiased and consistent estimators of the population mean, population variance, and population proportion (proportion of the population falling in a particular category), respectively.
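The unbiasedness of the n - 1 divisor can be checked by simulation. In the Python sketch below (the population, sample size, and number of repetitions are made up for illustration), the sample variance computed with divisor n - 1 averages out to the population variance, while the divisor-n version comes out too small:

    import random
    import statistics

    random.seed(7)
    # Population: normal with mean 0 and standard deviation 2, so variance = 4.
    var_n_minus_1, var_n = [], []
    for _ in range(20000):
        sample = [random.gauss(0, 2) for _ in range(5)]
        v = statistics.variance(sample)                     # divisor n - 1
        var_n_minus_1.append(v)
        var_n.append(v * (len(sample) - 1) / len(sample))   # divisor n

    print("Average, divisor n - 1:", round(statistics.mean(var_n_minus_1), 2))  # about 4.0
    print("Average, divisor n:    ", round(statistics.mean(var_n), 2))          # about 3.2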

Theory Underlying Inferential Statistics Procedures Addressed in This Unit

In the theorems below:

  • X denotes a quantitative variable.
  • x̄ denotes the mean of a sample of n observations of X.
  • p denotes the proportion of a population falling in a certain category.
  • p̂ denotes the proportion of a sample (of observations from a population) falling in a certain category.
  • The assumed sampling method is simple random sampling.
  • E(Q) denotes the expected value (or mean) of the variable Q.
  • VAR(Q) denotes the variance of the variable Q.
  • STDEV(Q) denotes the standard deviation of the variable Q.

Theorem 1. For any quantitative variable X having mean μ and variance σ²: (a) E(x̄) = μ; and (b) VAR(x̄) = σ²/n and STDEV(x̄) = σ/√n, unless sampling is done without replacement and n > .05N (where N is the population size), in which case VAR(x̄) = (σ²/n)((N - n)/(N - 1)) and STDEV(x̄) = (σ/√n)√((N - n)/(N - 1)).

Theorem 2. If X is normally distributed, then x̄ is normally distributed.

Theorem 3 (the CENTRAL LIMIT THEOREM). For X having a finite variance (and, implicitly, a non-normal distribution): as n approaches infinity, the distribution of x̄ approaches a normal distribution. Associated rule of thumb: if n ≥ 30, then x̄ has approximately a normal distribution.

Theorem 4. If X is normally distributed, then (x̄ - μ)/(s/√n), where s denotes the standard deviation of the sample, has the t-distribution with n - 1 degrees of freedom (df). [note: the family of t-distributions is discussed below]

Theorem 5. E(p̂) = p; and VAR(p̂) = p(1 - p)/n and STDEV(p̂) = √(p(1 - p)/n), unless sampling is done without replacement and n > .05N, in which case VAR(p̂) = (p(1 - p)/n)((N - n)/(N - 1)) and STDEV(p̂) = √(p(1 - p)/n)·√((N - n)/(N - 1)).

Another rule of thumb related to the Central Limit Theorem: for n so large that np ≥ 5 and n(1 - p) ≥ 5 (from which it follows that 5/n ≤ p ≤ 1 - (5/n)), p̂ has approximately a normal distribution.

Theorem 6. If X1 is normally distributed with mean μ1 and variance σ1² and X2 is normally distributed with mean μ2 and variance σ2², then across all independent selections of one observation of X1 and one observation of X2, aX1 - bX2 (for real numbers a and b not both 0) is normally distributed with mean aμ1 - bμ2 and variance a²σ1² + b²σ2² (from which it follows that X1 - X2 is normally distributed with mean μ1 - μ2 and variance σ1² + σ2²).

t-distribution

There is a family of what are called t-distributions; each member of the family has a particular number of degrees of freedom (v, where v is a positive integer). Every t-distribution in the family is roughly bell-shaped, has a mean of 0, and has a standard deviation greater than 1. As the number of degrees of freedom approaches infinity, the corresponding t-distribution approaches the standard normal distribution. (note: A t-distribution with 29 or more degrees of freedom is very similar to the standard normal distribution.)

Four t-distributions and the standard normal distribution are depicted in the figure below.

[Figure: five density curves, from "tallest" to "shortest" at their common peak: the standard normal distribution (legend label "infinity") and the t-distributions with 20, 5, 2, and 1 degrees of freedom (legend labels "20", "5", "2", and "1").]

Interval Estimates (Confidence Intervals)

A C% (e.g., 95%) confidence interval for a parameter is an interval obtained by a process which--in advance of being applied--has a probability of C/100 (e.g., .95) of yielding an interval containing the parameter. C% is called the confidence level, and C/100 is called the confidence coefficient. Commonly used confidence levels are 90%, 95% and 99%; their associated confidence coefficients are .90, .95, and .99, respectively. When we obtain a C% confidence interval for a parameter, we say that "we are C% confident that the parameter is between the endpoints of the interval." It would be technically incorrect to say the probability is C/100 that the parameter is in the interval because, once you obtain a specific interval, the parameter is either in the interval or not in the interval--there's no probability about it.

Many confidence intervals (interval estimates) for individual parameters are of the form:

point estimate ± (reliability factor)(standard error of estimator),

where:

(1) the point estimate is a single-valued estimate of the parameter,

(2) the reliability factor depends on the desired confidence level,

(3) the standard error of the estimator (which typically must be estimated) is the standard deviation of the estimator over repeated samples of size n, and

(4) the (reliability factor)(standard error of estimator) is called the margin of error.

There are two ways to decrease the width of a confidence interval: (1) take a larger sample; or (2) decrease the confidence level.
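A minimal Python sketch of the "point estimate ± (reliability factor)(standard error)" form, here for a population mean with made-up data (the reliability factor 2.262 is the t value for 9 degrees of freedom and 95% confidence, taken from a t table):

    import math
    import statistics

    data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.0]
    n = len(data)

    xbar = statistics.mean(data)                    # point estimate
    se = statistics.stdev(data) / math.sqrt(n)      # estimated standard error of x-bar
    t_star = 2.262                                  # reliability factor (t, 9 df, 95%)

    margin = t_star * se                            # margin of error
    print(f"95% CI for mu: {xbar:.3f} +/- {margin:.3f}")

Rerunning the sketch with a larger sample or a lower confidence level (smaller reliability factor) produces a narrower interval, illustrating the two width-reducing options just mentioned.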

Confidence intervals may be obtained for items other than individual population parameters. For example, one may obtain a confidence interval for the difference between two population parameters or obtain a confidence interval for the ratio between two population parameters.

See the formula sheet at the end of this unit packet for: (a) confidence interval formulas for μ (the mean of a population), μ1 - μ2 (the difference between two population means), p (the proportion of a population falling in a certain category), and p1 - p2 (the difference between two population proportions falling in a certain category); and (b) formulas for estimating how large a sample to draw--prior to obtaining a confidence interval for a population mean or proportion--in pursuit of pre-set confidence level and margin of error targets.

p-value Approach to Hypothesis Testing

The p-value approach to hypothesis testing comprises the following four steps:

(1) Indicate the two competing hypotheses about the population(s). The two hypotheses should be complements of one another (i.e., be non-overlapping yet together cover all the possibilities). The null hypothesis (H0) must be (or contain) a statement about the population(s) which permits you to specify what the probability distribution of some sample statistic (which we call the test statistic) would be like should that statement about the population(s) within the null hypothesis be true.

(2) Decide upon an appropriate test statistic, and--subsequent to drawing sample(s) from the population(s)--calculate the value of the test statistic for the sample(s) drawn. A test statistic is a particular sample measure to be computed from your sample data.

(3) Calculate the p-value, which is the (maximum) probability, should H0 be true, of obtaining a test statistic value as contrary to H0--or more contrary to H0--as the test statistic value you obtained from the sample(s) drawn. The p-value is in essence telling you how rare (in probability terms) it would be to obtain a sample(s) such as yours if H0 were true.

(4) State your conclusion. Standard conclusions associated with various p-values are:

p-value             standard conclusion
p ≥ .10             H0 may be true
.05 ≤ p < .10       marginal evidence that H1 is true
.01 ≤ p < .05       evidence that H1 is true
.005 ≤ p < .01      strong evidence that H1 is true
p < .005            very strong evidence that H1 is true

To further clarify this chart, consider the following scenario. Under the assumption that some particular H0 is true, you get a sample result so extreme (outlandish) that a result that extreme (or even more so) would have only a .003 chance of happening if H0 were true. You can respond in one of two ways at this point: (1) you reason that something unusual happened, “that’s all,” and you go along with H0; or (2) you reason that, hey, if H0 were true, this shouldn’t have happened, so you question H0, and go along with H1 instead. The standard practice is to respond in the latter fashion.
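To make the arithmetic of step (3) concrete, the Python sketch below computes a two-sided p-value for a made-up test of H0: p = .5 versus H1: p ≠ .5, using the large-sample normal approximation for a sample proportion; the resulting p-value of about .003 falls in the "very strong evidence" row of the chart, matching the scenario above:

    import math

    def normal_cdf(z):
        """Standard normal CDF, via the error function."""
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    n, x, p0 = 100, 65, 0.5      # made-up sample: 65 "successes" out of 100
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # test statistic under H0

    p_value = 2 * (1 - normal_cdf(abs(z)))            # two-sided p-value
    print(f"z = {z:.2f}, p-value = {p_value:.4f}")    # z = 3.00, p-value = 0.0027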

See the formula sheet at the end of this packet for specifications of test statistics for testing hypotheses about μ (a single population mean) or p (the proportion of a population falling in a certain category).

Significance Level Approach to Hypothesis Testing

With the significance level approach to hypothesis testing, one decides (the decision can be made in advance of obtaining any sample data) how inconsistent with H0 (in probability terms) the sample data would need to be in order to reject H0. The significance level α is that measure (in probability terms) of inconsistency with H0. If sample data that inconsistent (or more so) with H0 are obtained, H0 is rejected; otherwise, H0 is not rejected (we will say accepted). The following chart shows the implications of various choices of α:

Choice of Implications of that choice

.10H0 will be rejected if and only if marginal evidence contrary to H0 is found

.05H0 will be rejected if and only if evidence contrary to H0 is found

.01H0 will be rejected if and only if strong evidence contrary to H0 is found

.005H0 will be rejected if and only if very strong evidence contrary to H0 is found

The significance level is the maximum probability of rejecting H0 should H0 be true.
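That property can be checked by simulation. In the Python sketch below (the population, sample size, and α are made up for illustration), H0 is in fact true, and the proportion of samples leading to rejection of H0 comes out close to the significance level:

    import math
    import random
    import statistics

    def normal_cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    random.seed(11)
    alpha, trials, n = 0.05, 5000, 40
    rejections = 0
    for _ in range(trials):
        # H0 is TRUE here: the population really is normal with mean 0 (sigma = 1 known).
        sample = [random.gauss(0, 1) for _ in range(n)]
        z = statistics.mean(sample) * math.sqrt(n)    # z statistic for H0: mu = 0
        if 2 * (1 - normal_cdf(abs(z))) <= alpha:     # p-value <= alpha
            rejections += 1                           # a Type I error

    print("Observed Type I error rate:", rejections / trials)   # close to alpha = .05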

We will be taking the p-value approach to hypothesis testing. By reporting the p-value, we are conveying just how inconsistent with H0 the sample data turned out to be. A person desiring a particular significance level α would reject H0 if and only if the p-value (often referred to as the observed significance level) is ≤ α.

Type I and Type II Errors

Whenever you test a particular null hypothesis using the significance level approach, only one (but you don’t know which one) of the following two errors is possible:

Type I error: you reject a true H0 (i.e., in essence, you conclude H1 is true when in reality H0 is true)

Type II error: you accept a false H0 (i.e., in essence, you conclude H0 is true when in reality H1 is true)

In essence, for any fixed significance level, the larger the sample size you use, the less likely you are to commit a Type II error; the probability of a Type I error, meanwhile, is capped at the significance level α regardless of sample size.