Hypothesis Testing (ASW Chapter 9)

MGMT 201: Statistics

What is hypothesis testing?

We want to examine whether premises about a population are likely to be true (or false).
Specifically, we proceed as follows.
1. We establish a null hypothesis, which may be as simple as $H_0: \mu \le 10$.
2. We specify the alternative hypothesis, which is the complement of the null hypothesis: $H_a: \mu > 10$.
The alternative hypothesis is chosen to be our research hypothesis.
Suppose, for example, that we are interested in whether a company’s stock is performing better under new managers. Previously, the stock averaged 1.14% per month after adjusting for risk. Then $H_0: \mu \le 1.14\%$ and $H_a: \mu > 1.14\%$.
3. We examine a sample to determine whether it provides strong enough evidence against the null hypothesis to reject it.
Notes:
We will never be able to conclude something with certainty.
The null is given the benefit of the doubt. That is, we look for somewhat conclusive evidence that the null is incorrect.
Three cases
1. $H_0: \mu \le \mu_0$; $H_a: \mu > \mu_0$
2. $H_0: \mu \ge \mu_0$; $H_a: \mu < \mu_0$
3. $H_0: \mu = \mu_0$; $H_a: \mu \ne \mu_0$

Type I and Type II Errors

Suppose $H_0: \mu \le 10$ and $H_a: \mu > 10$.
Suppose that the true mean is $\mu = 8.6$. If we reject the null hypothesis, we are mistaken. Such an error is called a Type I Error. Specifically, a Type I error occurs when we incorrectly reject the null hypothesis.
Suppose instead that the true mean is $\mu = 12.1$. If we fail to reject the null hypothesis, we are mistaken. Such an error is called a Type II Error. Specifically, a Type II error occurs when we incorrectly fail to reject the null hypothesis.
$\alpha$ = probability of making a Type I error.
This is the same $\alpha$ we used in previous chapters and is called the level of significance.
$\beta$ = probability of making a Type II error.
In many cases, we do not know $\beta$. It is difficult to determine because we do not know the population parameters.
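One way to see what $\alpha$ means is to simulate many samples from a population in which the null hypothesis is true and count how often the test rejects it anyway. The sketch below is a hypothetical illustration, not part of the original example: it assumes a normal population with true mean exactly 10 (the boundary of $H_0: \mu \le 10$), an assumed known $\sigma = 2$, samples of size 40, and Python's standard-library `statistics.NormalDist` (Python 3.8+) in place of the normal table.

```python
import random
from statistics import NormalDist

# Hypothetical setup (not from the notes): H0: mu <= 10 vs Ha: mu > 10.
# Samples are drawn with the true mean exactly at the boundary mu = 10, so H0 is
# true and every rejection is, by definition, a Type I error.
MU_0 = 10.0
SIGMA = 2.0      # assumed known population standard deviation
N = 40           # sample size per simulated study
ALPHA = 0.05
TRIALS = 10_000

z_crit = NormalDist().inv_cdf(1 - ALPHA)   # upper-tail cutoff, about 1.645
se = SIGMA / N ** 0.5                      # standard error of the sample mean

rng = random.Random(42)   # fixed seed so the run is reproducible
rejections = 0
for _ in range(TRIALS):
    x_bar = sum(rng.gauss(MU_0, SIGMA) for _ in range(N)) / N
    z = (x_bar - MU_0) / se
    if z > z_crit:        # sample mean significantly above 10: reject H0
        rejections += 1

type_i_rate = rejections / TRIALS
print(f"simulated Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```

The rejection rate should land close to 0.05. If the samples were instead drawn with a true mean above 10 (so $H_0$ is false), the fraction of trials that fail to reject would estimate $\beta$ for that mean.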

Testing Hypotheses

One-Tailed Tests: Large Samples
When 1) $H_0: \mu \le \mu_0$; $H_a: \mu > \mu_0$ …or… 2) $H_0: \mu \ge \mu_0$; $H_a: \mu < \mu_0$, we must perform a one-tailed test. We reject the null only if the sample mean is significantly far from the hypothesized mean on a pre-specified side. For example, in case 1, we reject the null if and only if the sample mean is significantly greater than $\mu_0$. In case 2, we reject the null if and only if the sample mean is significantly less than $\mu_0$.
example: Suppose we ask whether a new automobile model is significantly safer than an older model. Historically, the old model resulted in 1.065 personal injury accidents per 1,000,000 miles driven. We have the following data from the new model:

Month / Miles Driven / # of Personal Injury Accidents / Accidents Per 1,000,000 Miles
November 1997 / 19,995,089 / 19 / 1.052
December 1997 / 10,239,863 / 11 / 0.931
January 1998 / 14,056,370 / 15 / 0.937
… / … / … / …
December 2000 / 19,799,440 / 18 / 1.100
January 2001 / 17,848,493 / 16 / 1.116
February 2001 / 14,446,470 / 12 / 1.204

$H_0: \mu \ge 1.065$; $H_a: \mu < 1.065$
How do we proceed? We simply calculate the z-score of $\bar{x}$ given the hypothesized $\mu$ (1.065 in this case) and the sampling distribution of $\bar{x}$.
In this case, n = 40, $\bar{x}$ = 1.024, s = 0.1379, and $\sigma_{\bar{x}} = s/\sqrt{n}$ = 0.0218.
Then, $z = (\bar{x} - \mu_0)/\sigma_{\bar{x}} = (1.024 - 1.065)/0.0218 \approx -1.878$. This tells us that the sample mean is 1.878 standard deviations below 1.065. Is this far enough away to conclude that the new accident rate is significantly below the old one?
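The same arithmetic as a minimal Python sketch, using only the summary statistics quoted above (the variable names are illustrative). Because the notes report $\bar{x}$ rounded to 1.024, this code gives z of about -1.880 rather than -1.878; the small gap presumably reflects that rounding.

```python
from math import sqrt

# Summary statistics quoted in the notes for the new model.
n = 40
x_bar = 1.024    # sample mean
s = 0.1379       # sample standard deviation
mu_0 = 1.065     # hypothesized mean under H0

se = s / sqrt(n)           # standard error of x-bar, about 0.0218
z = (x_bar - mu_0) / se    # how many standard errors x-bar lies from mu_0
print(f"se = {se:.4f}, z = {z:.3f}")
```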
We must first determine the level of significance for the test. Is it acceptable, for instance, to have a probability of type I error of 10%? 5%? What are we willing to accept?
Suppose $\alpha$ = 0.05 and consider the normal table. In doing so, we are relying on the CLT.
We have a one-tailed test, so we want the entire 5% error probability in one tail of the distribution. As such, we look up 0.45 in the table and find z = 1.645. Because $H_a$ points to the lower tail, we reject $H_0$ if z < -1.645. (Equivalently, we are 95% sure that the population mean is no greater than 1.645 standard deviations above the sample mean.)
In this case, z = -1.878 < -1.645 (the hypothesized mean is 1.878 standard deviations above the sample mean). We therefore reject the null hypothesis and conclude that the new model is indeed safer.
Notice that 95% does not correspond to a 1.96 z-score when doing a one-tailed test. Instead, it corresponds to a 1.645 z-score.
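The one-tailed and two-tailed cutoffs can be read off the standard normal distribution directly instead of from the table. This sketch assumes Python 3.8+ for `statistics.NormalDist` and plugs in the z-score of -1.878 from the automobile example.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()   # mean 0, standard deviation 1

# One-tailed (lower-tail) test: all of alpha sits in the left tail.
z_crit_one = std_normal.inv_cdf(alpha)           # about -1.645
# Two-tailed test: alpha is split evenly between the two tails.
z_crit_two = std_normal.inv_cdf(1 - alpha / 2)   # about 1.960

z = -1.878   # test statistic from the automobile example
reject = z < z_crit_one
print(f"one-tailed cutoff {z_crit_one:.3f}, two-tailed cutoff +/-{z_crit_two:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

Note that |-1.878| < 1.960, so the same z would not lead to rejection in a two-tailed test at the 5% level.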
An alternative representation is the p-value.
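p-value = observed level of significance: the probability, computed assuming the null hypothesis holds (here, at $\mu = \mu_0$), of obtaining a sample result at least as extreme as the one we observed.
In our example, we ask: what is $P(\bar{x} \le 1.024 \mid \mu = 1.065)$? This is the p-value.
Finding 1.88 in the normal table gives 0.4699. $P(\bar{x} \le 1.024) = 0.5 - 0.4699 = 0.0301$.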
So, there is a 3.01% chance that we would obtain a sample mean of 1.024 or less when the true mean is 1.065. We conclude that it is unlikely that the population mean is actually 1.065. It is likely to be less than 1.065, implying that the new model is safer than the old model.
Notice that everything is done relative to the hypothesized mean (1.065).
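A sketch of the p-value calculation, with `statistics.NormalDist().cdf` (Python 3.8+) standing in for the normal-table lookup. It uses z = -1.88, the rounded value looked up in the table above.

```python
from statistics import NormalDist

alpha = 0.05
z = -1.88   # z-score from the automobile example, rounded as in the table lookup

# Lower-tail p-value: P(Z <= z), the chance of a sample mean this low (or lower)
# if the true mean were 1.065.
p_value = NormalDist().cdf(z)
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```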

Steps in Hypothesis Testing
1. Determine H0 and Ha.
2. Choose an appropriate test statistic (z in our example).
3. Specify $\alpha$. It is important to do this prior to examining the sample so that we are not influenced by the sample.
4. Collect data and calculate the test statistic.
5. Interpret the test statistic.
Two-Tailed Tests: Large Samples
When 3) $H_0: \mu = \mu_0$; $H_a: \mu \ne \mu_0$, we must perform a two-tailed test. We reject the null if the sample mean is significantly far from the hypothesized mean on either side. Specifically, we reject the null if the sample mean is significantly greater than $\mu_0$ or if it is significantly less than $\mu_0$.
example: Suppose we are interested in examining the returns (percentage change in price) around earnings announcements. We might ask whether the mean return during weeks in which earnings are announced is significantly different from the mean return in other weeks.
Suppose that weekly returns average 0.235% during normal weeks.
$H_0: \mu = 0.235\%$; $H_a: \mu \ne 0.235\%$.
We choose to use z as our test statistic.
Suppose $\alpha$ = 5%.
We have the following data:

Observation / Weekly Return
1 / -0.201%
2 / 1.467%
3 / -6.287%
… / …
61 / 4.685%
62 / 8.164%
63 / -2.368%

Here, n = 63, $\bar{x}$ = 1.046%, s = 0.03657, and $\sigma_{\bar{x}} = s/\sqrt{n}$ = 0.004608.
For a two-tailed test with  = 5%, we need to determine the cutoff such that we have 2.5% in each tail. The corresponding z-score is 1.96. We would reject the null hypothesis if z > 1.96 or if z < -1.96.
In this case, $z = (0.01046 - 0.00235)/0.004608 \approx 1.76$, which lies between -1.96 and 1.96, so we cannot reject the null hypothesis. Said differently, we conclude that in the sample, $\bar{x}$ is not significantly different from 0.235%.
The p-value for a two-tailed test differs from that cited for a one-tailed test. We calculate the area in the tail beyond the observed z and then double it. In this case, z = 1.76 corresponds to an area of 0.5 - 0.4608 = 0.0392 in the tail. The p-value is then 0.0784. Since this is greater than $\alpha$ = 0.05, we cannot reject the null hypothesis.
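A minimal sketch of the two-tailed test on the earnings-announcement data, working from the summary statistics above and using `statistics.NormalDist` (Python 3.8+) in place of the table. The p-value doubles the tail area beyond |z|, as described in the text.

```python
from math import sqrt
from statistics import NormalDist

n = 63
x_bar = 0.01046    # mean announcement-week return (1.046%)
s = 0.03657        # sample standard deviation
mu_0 = 0.00235     # normal-week mean return under H0 (0.235%)
alpha = 0.05

se = s / sqrt(n)
z = (x_bar - mu_0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # double the tail area beyond |z|
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```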
Interval Estimation
We discussed interval estimation in the last chapter. It applies in the current setting because it enables us to establish rejection ranges. Suppose, for example, that we establish a 95% confidence interval in our example.
From before, we know that $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$ is our confidence interval for a level of significance of $\alpha$.
So, $0.01046 \pm 1.96 \times 0.004608 = [0.001428, 0.01949]$.
Because 0.235% is within the range, we cannot reject the null hypothesis.
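The interval check can be sketched the same way. This assumes the standard error of 0.004608 from the notes and takes $z_{\alpha/2}$ from `statistics.NormalDist` (Python 3.8+) rather than the table.

```python
from statistics import NormalDist

x_bar = 0.01046   # mean announcement-week return
se = 0.004608     # standard error from the notes
mu_0 = 0.00235    # hypothesized mean (0.235%)
alpha = 0.05

z_half = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
lower, upper = x_bar - z_half * se, x_bar + z_half * se
# With the same se and cutoff, the two-tailed z test rejects H0 exactly when
# mu_0 falls outside this interval.
inside = lower <= mu_0 <= upper
print(f"95% CI = [{lower:.6f}, {upper:.5f}]")
print("fail to reject H0" if inside else "reject H0")
```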

Small Samples
Suppose n<30. What do we do?
If the underlying distribution is unknown and not approximately normal, we are sunk.
If the underlying distribution is approximately normal, we can use the t distribution.
example: We are interested in the annual returns on the S&P500 index of stocks. Because the 1980s were a period of dramatic change in the investment world, we decide to look only at the annual returns beginning with 1990.

Year / Return
1990 / -6.559%
1991 / 26.307%
1992 / 4.464%
1993 / 7.055%
1994 / -1.539%
1995 / 34.111%
1996 / 20.264%
1997 / 31.008%
1998 / 26.669%
1999 / 19.526%
2000 / -10.139%

Long-term Treasury securities currently pay 5.28% per year. We are interested in whether stocks are significantly better in terms of returns.
$H_0: \mu \le 5.28\%$; $H_a: \mu > 5.28\%$.
test statistic: t
level of significance: 5%
What assumptions are we making if we use the data to answer that question?
The test statistic follows a t distribution with n-1 = 10 degrees of freedom. With 10 degrees of freedom and 5% in one tail, t = 1.812.
$\bar{x}$ = 13.742%; s = 15.687%; $\sigma_{\bar{x}} = s/\sqrt{n}$ = 4.730%
$t = (13.742\% - 5.28\%)/4.730\% \approx 1.789 < 1.812$. We conclude that, given our sample, stock returns are not significantly greater than long-term Treasury returns.
In reality, stocks have historically outperformed bonds by a wide margin on average. This example illustrates how difficult it is to find significant results with a small sample.
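The one-tailed t test can be reproduced from the raw annual returns in the table. Python's standard library has no t distribution, so this sketch hard-codes the table value 1.812 quoted in the notes; if SciPy is available, `scipy.stats.t.ppf(0.95, 10)` should give the same cutoff.

```python
from math import sqrt
from statistics import mean, stdev

# Annual S&P 500 returns (%) for 1990-2000, from the table above.
returns = [-6.559, 26.307, 4.464, 7.055, -1.539, 34.111,
           20.264, 31.008, 26.669, 19.526, -10.139]
mu_0 = 5.28      # long-term Treasury yield (%) under H0
t_crit = 1.812   # t table value: 10 df, 5% in the upper tail (from the notes)

n = len(returns)
x_bar = mean(returns)
s = stdev(returns)      # sample standard deviation (n - 1 denominator)
se = s / sqrt(n)
t = (x_bar - mu_0) / se
print(f"x-bar = {x_bar:.3f}, s = {s:.3f}, se = {se:.3f}, t = {t:.3f}")
print("reject H0" if t > t_crit else "fail to reject H0")
```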

Suppose, instead, that we ask whether stock returns differ significantly from bond returns on average.
$H_0: \mu = 5.28\%$; $H_a: \mu \ne 5.28\%$.
test statistic: t
level of significance: 20%
The test statistic follows a t distribution with n-1 = 10 degrees of freedom. With 10 degrees of freedom and 20% in two tails (10% in one tail), t=1.372.
1.789 > 1.372, so we conclude that stock returns are significantly different from bond returns.
The 80% confidence interval is $13.742\% \pm 1.372 \times 4.730\% = [7.253\%, 20.232\%]$. It excludes 5.28%, which is consistent with rejecting $H_0$.
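The two-tailed version and its 80% interval, sketched in Python. As before, the t cutoff (1.372 for 10 degrees of freedom and 10% in each tail) is hard-coded from the notes rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

returns = [-6.559, 26.307, 4.464, 7.055, -1.539, 34.111,
           20.264, 31.008, 26.669, 19.526, -10.139]
mu_0 = 5.28
t_crit = 1.372   # t table value: 10 df, 10% in each tail (from the notes)

n = len(returns)
x_bar = mean(returns)
se = stdev(returns) / sqrt(n)
t = (x_bar - mu_0) / se
reject = abs(t) > t_crit   # two-tailed: reject if t is far out on either side

# 80% confidence interval: x-bar +/- t_crit * se
lower, upper = x_bar - t_crit * se, x_bar + t_crit * se
print(f"t = {t:.3f}, reject H0: {reject}")
print(f"80% CI = [{lower:.3f}%, {upper:.3f}%]")
```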

Tests About Proportions
Dealing with proportions is similar to dealing with other random variables. The only real difference is that there is a specific form for the standard error of the sample proportion.
example: Historically, a product has had a failure rate of 1.2%. The company changed the manufacturing process in an effort to improve reliability. A recent sample of 1500 products found that 8 had failed. Can we conclude that the new process has significantly improved reliability?
$H_0: p \ge 1.2\%$; $H_a: p < 1.2\%$
test statistic: z
level of significance: Suppose $\alpha$ = 5%
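Recall that $\sigma_{\bar{p}} = \sqrt{p(1-p)/n}$, but what do we use for p? Since we are testing whether p is significantly different from 1.2%, we should use 1.2%.
So, with $\bar{p} = 8/1500 \approx 0.00533$,
$\sigma_{\bar{p}} = \sqrt{0.012(1 - 0.012)/1500} \approx 0.00281$ and $z = (0.00533 - 0.012)/0.00281 \approx -2.37$.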
For a one-tailed test with $\alpha$ = 5%, the cutoff for z is -1.645. Since $z \approx -2.37 < -1.645$, there is significant evidence that the new process has improved reliability.
Using the same standard error, the one-sided 95% upper confidence limit for p is $0.00533 + 1.645 \times 0.00281 \approx 0.00996$, which is below 1.2%.
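A sketch of the proportion test, assuming the usual normal approximation to the sampling distribution of $\bar{p}$ and, as in the notes, evaluating the standard error at the hypothesized 1.2%. It uses `statistics.NormalDist` (Python 3.8+) for the cutoff and p-value.

```python
from math import sqrt
from statistics import NormalDist

n = 1500        # products sampled
failures = 8    # failures observed in the sample
p_0 = 0.012     # historical failure rate under H0
alpha = 0.05

p_bar = failures / n                   # sample proportion, about 0.00533
se = sqrt(p_0 * (1 - p_0) / n)         # standard error evaluated at p_0, about 0.00281
z = (p_bar - p_0) / se
z_crit = NormalDist().inv_cdf(alpha)   # lower-tail cutoff, about -1.645
p_value = NormalDist().cdf(z)          # lower-tail p-value
print(f"p-bar = {p_bar:.5f}, se = {se:.5f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z < z_crit else "fail to reject H0")
```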

Understanding Type II Errors

Recall that a Type II error is one in which we incorrectly fail to reject the null hypothesis. In some circumstances, it is important to try to control $\beta$ (the probability of a Type II error). Specifically, if we plan to take some important action if the null is not rejected, we should carefully consider $\beta$.
Notice that if we knew $\mu$ and $\sigma$, we could easily calculate the probability of a Type II error using the appropriate distribution (typically normal or t).
Unfortunately, we do not know $\mu$, so the best we can do is choose a value for $\mu$ at which the alternative hypothesis is true and then calculate $\beta$ based on that choice. For $\sigma$, we might use an estimate from our sample, make an educated guess, or perhaps use the worst-case scenario.
Consider the example above. What is $\beta$ for our test about proportions?
By definition, $\beta = P(\text{do not reject } H_0 \mid p = C)$, where C is some value at which the alternative hypothesis is true. In this case, we consider C < 1.2%.
We reject $H_0$ when z < -1.645. Using $z = (\bar{p} - 0.012)/0.00281$, we see that the critical rejection value $\bar{p}_c$ satisfies $(\bar{p}_c - 0.012)/0.00281 = -1.645$, or $\bar{p}_c = 0.012 - 1.645(0.00281) \approx 0.00737$. In other words, given a sample size of 1500, we will not reject the null whenever $\bar{p} > 0.737\%$.
Now, suppose that the true population proportion (p) is 0.6%. What is $P(\bar{p} > 0.737\% \mid p = 0.6\%)$? [Note that this probability is $\beta$.]
$z = (0.00737 - 0.006)/0.00281 \approx 0.49$. From the normal table, we see that this corresponds to an area of 0.1879 between 0.6% and 0.737%. We would fail to reject the null for any values of $\bar{p}$ higher than 0.737% (i.e., values of $\bar{p}$ in the upper tail), so $P(\bar{p} > 0.737\% \mid p = 0.6\%) = \beta = 0.5 - 0.1879 = 0.3121$. This is the probability of a Type II error.
The number $1 - \beta$ is called the power of the test and is equal to 0.6879 in this example.
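The $\beta$ calculation above as a short sketch. It follows the notes in reusing the standard error computed at $p_0 = 1.2\%$ even when the true p is 0.6%; recomputing the standard error at the true p would give a somewhat different $\beta$. The normal approximation and `statistics.NormalDist` (Python 3.8+) are assumptions of the sketch.

```python
from math import sqrt
from statistics import NormalDist

n, p_0, alpha = 1500, 0.012, 0.05
true_p = 0.006   # assumed true proportion, at which Ha is true

se = sqrt(p_0 * (1 - p_0) / n)                    # about 0.00281, computed at p_0
p_crit = p_0 + NormalDist().inv_cdf(alpha) * se   # about 0.00737; fail to reject above this

# beta = P(p-bar > p_crit | p = 0.6%), reusing the same standard error as the notes do
beta = 1 - NormalDist(mu=true_p, sigma=se).cdf(p_crit)
power = 1 - beta
print(f"cutoff = {p_crit:.5f}, beta = {beta:.4f}, power = {power:.4f}")
```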
The steps for calculating power are as follows:
1. Find the values of $\bar{p}$ for which the null hypothesis is not rejected ($\bar{p} > 0.737\%$ in this example).
2. Choose a value for C such that if p = C, the alternative hypothesis is true.
3. Calculate the probability that we would fail to reject the null with our sample if p = C (this is $\beta$).
4. Calculate the power (this is $1 - \beta$).
5. Repeat steps 2-4 using different values for C.
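The five steps map naturally onto a small function. In this sketch, `power_at` is a hypothetical helper name; it assumes the normal approximation and, like the notes, reuses the standard error computed at 1.2% for every C.

```python
from math import sqrt
from statistics import NormalDist

def power_at(c, n=1500, p_0=0.012, alpha=0.05):
    """Power of the lower-tail test H0: p >= p_0 vs Ha: p < p_0 when the true p is c.

    Like the notes, this reuses the standard error computed at p_0 for every c.
    """
    se = sqrt(p_0 * (1 - p_0) / n)
    # Step 1: H0 is not rejected whenever p-bar exceeds this cutoff (about 0.737%).
    p_crit = p_0 + NormalDist().inv_cdf(alpha) * se
    # Step 3: beta = P(p-bar > p_crit | p = c).
    beta = 1 - NormalDist(mu=c, sigma=se).cdf(p_crit)
    # Step 4: power = 1 - beta.
    return 1 - beta

# Steps 2 and 5: pick values of C at which Ha is true (C < 1.2%) and repeat.
for c in (0.006, 0.008, 0.010):
    print(f"C = {c:.1%}: power = {power_at(c):.4f}")
```

`power_at(0.010)` comes out near 0.175 rather than 0.1736 because the table lookup rounds z to -0.94; the exact z is about -0.93.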
We might, for example, also calculate the power for p = 1.0%.
$z = (0.00737 - 0.010)/0.00281 \approx -0.94$. From the normal table, we see that this corresponds to an area of about 0.3264 between 0.737% and 1.0%.
We fail to reject $H_0$ for all values of $\bar{p}$ higher than 0.737%, so $\beta = P(\bar{p} > 0.737\% \mid p = 1.0\%) = 0.5 + 0.3264 = 0.8264$ and the power of the test is 1 - 0.8264 = 0.1736.
It is important to note where the cutoff is relative to the value of p we are using. That is what tells us whether to add the table area to 0.5 or to subtract it from 0.5.
We can repeat this process over and over for various values of p and plot the power ($1 - \beta$) versus p. Such a plot is called a power curve.
Note that in creating a power curve, we only consider values of p for which the null hypothesis is false.
The power curve for our example is as follows:
[Figure: power curve for the proportion test, power ($1 - \beta$) versus the true proportion p]
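The points behind a power curve like this can be tabulated with a few lines of code. This sketch assumes the same setup as the notes (n = 1500, $p_0$ = 1.2%, $\alpha$ = 5%, standard error evaluated at $p_0$, normal approximation) and prints a crude text plot instead of drawing a chart; the grid of p values is an arbitrary choice.

```python
from math import sqrt
from statistics import NormalDist

N, P_0, ALPHA = 1500, 0.012, 0.05
SE = sqrt(P_0 * (1 - P_0) / N)                   # standard error at p_0, as in the notes
P_CRIT = P_0 + NormalDist().inv_cdf(ALPHA) * SE  # reject H0 when p-bar <= P_CRIT

def power_at(c):
    """P(reject H0 | p = c) = P(p-bar <= P_CRIT) under a normal approximation."""
    return NormalDist(mu=c, sigma=SE).cdf(P_CRIT)

# Only values of p where H0 is false (p < 1.2%) belong on the curve.
grid = [k / 1000 for k in range(2, 12)]   # 0.2%, 0.3%, ..., 1.1%
curve = [(c, power_at(c)) for c in grid]

# Crude text plot: one '#' per 2.5 percentage points of power.
for c, pw in curve:
    print(f"p = {c:.1%}  power = {pw:.3f}  {'#' * round(40 * pw)}")
```

Power falls as the true p approaches 1.2%, which is the usual shape: small departures from the null are hard to detect.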

Determining the Sample Size
Section 9.9 of the text describes how we might adjust the sample size to control the probability of Type II error. You should read through this on your own, but you will not be responsible for it on the final exam.