Topics for Today
Introduction to Non-parametric Significance tests
1-Way Chi-square test
2-Way Chi-square test
Non-Parametric is a big word
But it makes life easier! (sometimes)
Recall, we had assumptions that were necessary to use the t-tests for comparing means and z-tests for comparing proportions.
We might need a non-parametric test if these assumptions (or conditions) are not met.
There are also some scientific questions about nominal or ordinal data that can’t be answered using the t-tests or z-tests.
First off, a recap of the assumptions/conditions necessary for the t-tests and z-tests we’ve looked at already
Assumptions: t-tests for means
Hypotheses involving means of interval or ratio level data.
Assumptions required to use a t-test for one or more samples:
-Test variable(s) normally distributed, or the sample(s) large enough (i.e. n > 50) that the sampling distribution of the mean is approximately normal
-Interval or Ratio level data
Assumptions: z-tests for proportions
Hypotheses involving 1 or 2 proportions.
Assumptions required to use a z-test:
-and(1-sample)
-and (2-samples)
1-Way Chi-Square Test
The objective of this test is to determine how similar an observed set of frequencies (or relative frequencies), fo, is to an expected set of frequencies, fe.
A typical research hypothesis would state that individuals are more (or less) likely than expected to select some categories.
… and the most common research hypothesis is that the relative frequency of responses is similar for all categories.
Which has the following statistical hypotheses:
H0: fo = fe
Ha: fo ≠ fe
… but what are fo and fe?
We’ve seen fo before: it’s just the observed frequency (or relative frequency) for each category of a nominal or ordinal variable!
The new item is fe… think of this as some expected relative frequency, or %. What does that mean?
-What is the expected relative frequency of men and women going into a men’s room?
-What is the expected relative frequency of heads and tails out of 100 flips of a fair coin?
-In Vancouver, what is the expected relative frequency of raining and sunny days?
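One way to make fe concrete: an expected frequency is just the expected relative frequency of a category multiplied by the number of observations. Below is a minimal Python sketch of that idea using the fair-coin question above. The function name `expected_frequencies` is made up for this illustration; it is not from the textbook or from SPSS.

```python
def expected_frequencies(n, proportions):
    """Turn expected relative frequencies into expected counts (fe) for n observations."""
    # The expected relative frequencies must cover every category, so they sum to 1.
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    return {category: n * p for category, p in proportions.items()}


# 100 flips of a fair coin: each side is expected half of the time.
coin = expected_frequencies(100, {"heads": 0.5, "tails": 0.5})
print(coin)  # {'heads': 50.0, 'tails': 50.0}
```

The other two questions work the same way once you decide what relative frequencies you expect.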
The Chi-Square Test Statistic
Here’s the formula that we’ll need to calculate the Chi-square test statistic:
χ² = Σ (fo − fe)² / fe, summed over all categories
…so it’s a bit more complicated than the t-statistic for means and the z-statistic for proportions.
Once we calculate this, we then look up the value in Table E. As with the t-distribution, though, we need a ‘degrees of freedom’ for this test statistic.
Unlike the t-test, the degrees of freedom for the 1-way Chi-square test is the number of categories (k) minus 1.
Example (1-way Chi-Square test): To determine whether dogs are colour blind, a student sets up an experiment in which she offers food to a dog in 4 differently coloured dishes and records the colour of the dish the dog eats from first. She does this for a total of 80 dogs, randomly ordering the dishes each time. If dogs are truly colour blind, each colour of dish should be selected about the same number of times.
The Chi-Square test allows us to formally test this research hypothesis.
Research Hypothesis:
Individuals:
Population:
Variable:
Parameter:
Statistical Hypotheses:
The observed frequency of each colour from the 80 dogs is below, as is the expected frequency if the dogs were colour blind:
Colour / fo / fe
Brown / 25 / 20
Orange / 18 / 20
Yellow / 19 / 20
Green / 18 / 20
And we have,
N = 80
k = 4 (# of categories)
Now, let’s calculate our test statistic:
p-value:
Reject H0 at α = 0.05?
Conclusion:
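If you want to check your hand calculation for the dog example, the arithmetic can be sketched in Python as below. This is only an illustration, not the method used in class (we look the value up in Table E). The function names are made up, and the p-value comes from a standard closed-form series for the Chi-square upper tail with a whole-number degrees of freedom, which stands in for the table lookup.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def chi_square_p_value(x, df):
    """Upper-tail area P(chi-square >= x) for a whole-number df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite series exp(-y) * sum of y^j / j!
        return math.exp(-y) * sum(y ** j / math.gamma(j + 1) for j in range(df // 2))
    # Odd df: normal-tail term plus a finite series in half-integer powers of y
    tail = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range((df - 1) // 2))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


observed = [25, 18, 19, 18]      # Brown, Orange, Yellow, Green
n = sum(observed)                # N = 80
k = len(observed)                # k = 4 categories
expected = [n / k] * k           # colour blind: every dish equally likely, fe = 20

chi2 = chi_square_statistic(observed, expected)
df = k - 1
p_value = chi_square_p_value(chi2, df)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.2f}")
```

Compare the printed p-value with α = 0.05 to decide whether to reject H0, then check it against the range you get from Table E.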
2-Way Chi-Square Test
We used the 1-way test to determine whether the observed relative frequency distribution was different from some ‘expected’ distribution.
Note the similarity to a 1-sample test for a mean or proportion, where we test whether the mean or proportion is different from some ‘null’ value.
We can use a 2-way test to determine whether the relative frequency distributions from two samples are the same or different from one another.
Note the similarity to a 2-sample test for means or proportions, where we test whether the means or proportions are different from one another.
Research questions that require a 2-way chi-square test are based on relative frequencies (like the 1-way test), but compare two populations (or samples).
-Is the relative frequency of sunny days in a year different between Vancouver and Seattle?
-Is the relative frequency of female students different between UBC and SFU?
-Is the relative frequency of job type (white vs blue vs service) the same for women and men?
Example (Q24, pg 338): A radio executive considering a switch in his station’s format collects data on the radio format preferences of 78 listeners from various age groups. Does radio format preference differ by age group?
Research Hypothesis:
Individuals:
Populations:
Variables:
Parameters:
Statistical Hypotheses:
The observed frequency (fo) of age group and radio format preference is below:
Format (by Age Group) / Young Adult / Middle Age / Older Adult / Total
Music / 14 / 10 / 3 / 27
News-talk / 4 / 15 / 11 / 30
Sports / 7 / 9 / 5 / 21
Total / 25 / 34 / 19 / 78
And we have,
N = 78
k = 9 (# of cells: 3 formats × 3 age groups)
Degrees of freedom = (rows − 1) × (columns − 1) = (3 − 1) × (3 − 1) = 4
but … we need fe! Here’s the formula for each cell, using the row and column totals associated with that cell:
fe = (row total × column total) / N
So, the table with fe is:
Format (by Age Group) / Young Adult / Middle Age / Older Adult / Total
Music / 25*27/78 / 34*27/78 / 19*27/78 / 27
News-talk / 25*30/78 / 34*30/78 / 19*30/78 / 30
Sports / 25*21/78 / 34*21/78 / 19*21/78 / 21
Total / 25 / 34 / 19 / 78
Which, after you do the arithmetic, gives:
Format (by Age Group) / Young Adult / Middle Age / Older Adult / Total
Music / 8.7 / 11.8 / 6.6 / 27
News-talk / 9.6 / 13.1 / 7.3 / 30
Sports / 6.7 / 9.2 / 5.1 / 21
Total / 25 / 34 / 19 / 78
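The fe table can be reproduced with a few lines of Python, as a check on the arithmetic. This is a sketch only; the variable names are chosen for this example, and each cell is computed as its row total times its column total divided by N.

```python
# Observed frequencies (fo): rows are formats, columns are age groups.
age_groups = ["Young Adult", "Middle Age", "Older Adult"]
observed = {
    "Music": [14, 10, 3],
    "News-talk": [4, 15, 11],
    "Sports": [7, 9, 5],
}

row_totals = {fmt: sum(counts) for fmt, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# fe for each cell = row total * column total / N
expected = {
    fmt: [row_totals[fmt] * col_total / n for col_total in col_totals]
    for fmt in observed
}

for fmt, row in expected.items():
    print(fmt, [round(fe, 1) for fe in row])
# Music [8.7, 11.8, 6.6]
# News-talk [9.6, 13.1, 7.3]
# Sports [6.7, 9.2, 5.1]
```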
… now that we have all the components, we can calculate our test statistic (try the arithmetic on your own):
χ² = Σ (fo − fe)² / fe = 10.9 (using the rounded fe values above)
p-value:
Reject H0 at α = 0.05?
Conclusion:
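To check your arithmetic for the radio example, here is a hedged Python sketch of the whole 2-way calculation. It is not the course's procedure (we use Table E for the p-value): the helper `upper_tail_even_df` is a made-up name, and it relies on the closed-form Chi-square upper tail that only works when the degrees of freedom is an even number, which happens to be the case for a 3 × 3 table.

```python
import math

# Observed frequencies (fo): rows are formats, columns are age groups.
observed = [
    [14, 10, 3],   # Music
    [4, 15, 11],   # News-talk
    [7, 9, 5],     # Sports
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Add up (fo - fe)^2 / fe over all 9 cells, with fe = row total * column total / N.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi2 += (fo - fe) ** 2 / fe

# Degrees of freedom for a 2-way table: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)


def upper_tail_even_df(x, df):
    """P(chi-square >= x); this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this shortcut only handles even degrees of freedom")
    y = x / 2.0
    return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))


p_value = upper_tail_even_df(chi2, df)
reject_h0 = p_value < 0.05
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Because the code keeps the unrounded fe values, it gives a test statistic of about 10.96 rather than the 10.9 you get from the rounded table; the decision at α = 0.05 is the same either way.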
One snag …
There is still an assumption necessary for us to be able to use any of the Chi-Square tests.
The expected frequency (fe) in every cell of the table (i.e. for every category) must be at least 5.
Other names for Chi-square tests:
1-Way:
- One-sample Chi-square
- Chi-square goodness of fit
2-Way:
- 2-sample Chi-square
- 2x2 Chi-square
- r by c Chi-square
- Chi-square test for independence
Nice page describing how to do Chi-Square tests in SPSS.
So, a decision tree for choosing hypothesis tests:
Single sample?
- Interval or Ratio data?
    - One-sample t-test
- Nominal or Ordinal data?
    - Proportion (i.e. 2 categories)?
        - 1-sample z-test for proportions
    - Distribution (i.e. several categories)?
        - 1-way Chi-square
Two samples?
- Interval or Ratio data?
    - Individuals measured twice/Matched?
        - Paired t-test
    - Variances equal?
        - 2-sample t-test w/equal variances
    - Variances not equal?
        - 2-sample t-test w/unequal variances
- Nominal or Ordinal data?
    - Proportion (i.e. 2 categories) & conditions met?
        - 2-sample z-test for proportions
    - Distribution (i.e. several categories)?
        - 2-way Chi-square
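The decision tree above can also be written as a small function, which some people find easier to follow. This is a sketch only: the function name `choose_test` and its parameters are invented for this illustration. One branch is my own reading rather than something the tree states: when there are two samples with 2 categories but the z-test conditions are not met, the sketch falls back to the 2-way Chi-square.

```python
def choose_test(samples, data_level, *, categories=2, paired=False,
                equal_variances=True, conditions_met=True):
    """Return the name of the test suggested by the decision tree."""
    if samples not in (1, 2):
        raise ValueError("the tree only covers 1 or 2 samples")

    if data_level in ("interval", "ratio"):
        if samples == 1:
            return "One-sample t-test"
        if paired:
            return "Paired t-test"
        if equal_variances:
            return "2-sample t-test w/equal variances"
        return "2-sample t-test w/unequal variances"

    if data_level in ("nominal", "ordinal"):
        if samples == 1:
            if categories == 2:
                return "1-sample z-test for proportions"
            return "1-way Chi-square"
        # Assumption: if the z-test conditions fail, use the Chi-square test instead.
        if categories == 2 and conditions_met:
            return "2-sample z-test for proportions"
        return "2-way Chi-square"

    raise ValueError("data_level must be nominal, ordinal, interval or ratio")


# The dog example: one sample, nominal data, 4 colour categories.
print(choose_test(1, "nominal", categories=4))   # 1-way Chi-square
# The radio example: format preference (3 categories) compared across groups.
print(choose_test(2, "nominal", categories=3))   # 2-way Chi-square
```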
Today’s Topics
Chi-Square tests
-for comparing distributions of nominal or ordinal data
-1-way: compares the distribution in a single sample to some expected distribution
-2-way: compares the distributions for two populations
New Reading
Chapter 10 up to pg 352
Stat203
Fall 2011 – Week 9 Lecture 3