Topics for Today

Introduction to Non-parametric Significance tests

1-Way Chi-square test

2-Way Chi-square test

Non-Parametric is a big word

But it makes life easier! (sometimes)

Recall, we had assumptions that were necessary to use the t-tests for comparing means and z-tests for comparing proportions.

We might need a non-parametric test if these assumptions (or conditions) are not met

There are also some scientific questions about nominal or ordinal data that can’t be answered using the t-tests or z-tests.

First off, a recap of the assumptions/conditions necessary for the t-tests and z-tests we’ve looked at already

Assumptions: t-tests for means

Hypotheses involving means of interval or ratio level data.

Assumptions required to use a t-test for one or more samples:

-Test variable(s) normally distributed, or the sample(s) large enough (ie: > 50) so that the sampling distribution of the mean is normally distributed

-Interval or Ratio level data

Assumptions: z-tests for proportions

Hypotheses involving 1 or 2 proportions.

Assumptions required to use a z-test:

-and(1-sample)

-and (2-samples)

1-Way Chi-Square Test

The objective of this test is to determine how similar an observed set of frequencies (or relative frequencies), fo, is to an expected set of frequencies, fe.

A typical research hypothesis would indicate that individuals select some categories more (or less) often than expected.

… and the most common research hypothesis is that the relative frequency of responses is similar for all categories.

Which has the following statistical hypotheses:

H0: fo = fe

Ha: fo≠fe

… but what are fo and fe?

We’ve seen fo before, it’s just the observed frequency (or relative frequency) for each category of a nominal or ordinal variable!

The new item is fe… think of this as some expected relative frequency, or %. What does that mean?

-What is the expected relative frequency of men and women going into a mens room?

-What is the expected relative frequency of heads and tails out of 100 flips of a fair coin?

-In Vancouver, what is the expected relative frequency of raining and sunny days?
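When the expected relative frequencies are known, each expected frequency fe is just the expected relative frequency times the total count. A minimal Python sketch for the fair-coin question above (the function name `expected_frequencies` is purely illustrative, not something from the course materials):

```python
def expected_frequencies(n, expected_props):
    """Turn expected relative frequencies into expected counts (fe)."""
    total = sum(expected_props.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("expected relative frequencies must sum to 1")
    return {category: n * p for category, p in expected_props.items()}


# 100 flips of a fair coin: each side is expected half of the time.
coin_fe = expected_frequencies(100, {"heads": 0.5, "tails": 0.5})
print(coin_fe)
```

The same function would work for the other two questions once an expected relative frequency has been chosen for each category.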

The Chi-Square Test Statistic

Here’s the formula that we’ll need to calculate the Chi-square test statistic:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

…so it’s a bit more complicated than the t-statistic for means and the z-statistic for proportions.

Once we calculate this, we then look up the value in Table E. As with the t-distribution, though, we need a ‘degrees of freedom’ for this test statistic.

Unlike the t-test, the degrees of freedom for the Chi-square test are the number of categories (k) minus 1.

Example (1-way Chi-Square test): To determine whether dogs are colour blind, a student sets up an experiment where she provides food to a dog in 4 differently coloured dishes and records the colour of the dish the dog chooses to eat from first. She does this for a total of 80 dogs, randomly ordering the dishes each time. If dogs are truly colour blind, each colour dish should be selected about the same number of times.

The Chi-Square test allows us to formally test this research hypothesis.

Research Hypothesis:

Individuals:

Population:

Variable:

Parameter:

Statistical Hypotheses:

The observed frequency of each colour from the 80 dogs is below, as is the expected frequency if the dogs were colour blind:

Colour / fo / fe
Brown / 25 / 20
Orange / 18 / 20
Yellow / 19 / 20
Green / 18 / 20

And we have,

N = 80

k = 4 (# of categories)

Now, let’s calculate our test statistic:
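One way to check the hand calculation is a short Python sketch. It assumes the usual Chi-square statistic, the sum over categories of (fo - fe)^2 / fe, and uses 7.815 as the standard tabulated critical value for 3 degrees of freedom at α = 0.05 (the notes look this up in Table E, which is not reproduced here). The function and variable names are illustrative only.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


colours = ["Brown", "Orange", "Yellow", "Green"]
f_o = [25, 18, 19, 18]      # observed frequencies from the table above

n = sum(f_o)                # N = 80 dogs
k = len(f_o)                # k = 4 categories
f_e = [n / k] * k           # colour-blind dogs: 20 expected per dish

chi2 = chi_square_statistic(f_o, f_e)
df = k - 1                  # degrees of freedom = k - 1

# Assumption: standard chi-square critical value for df = 3, alpha = 0.05.
CRITICAL_05_DF3 = 7.815
reject_h0 = chi2 > CRITICAL_05_DF3

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```

For these counts the sum is (25 + 4 + 1 + 4) / 20 = 1.7 on 3 degrees of freedom, which can be compared against the table value by hand.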

p-value:

Reject H0 at α = 0.05?

Conclusion:

2-Way Chi-Square Test

We used the 1-way test to determine whether the observed relative frequency distribution was different from some ‘expected’ distribution.

Note the similarity to a 1-sample test for a mean or proportion, where we are testing whether the mean or proportion is different from some ‘null’ value.

We can use a 2-way test to determine whether relative frequency distributions from two samples are the same or different from one another.

Note the similarity to a 2-sample test for means or proportions, where we are testing whether the means or proportions are different from one another.

Research questions that require a 2-way chi-square test are based on relative frequencies (like the 1-way test), but compare two populations (or samples).

-Is the relative frequency of sunny days in a year different between Vancouver and Seattle?

-Is the relative frequency of female students different between UBC and SFU?

-Is the relative frequency of job type (white vs blue vs service) the same for women and men?

Example (Q24, pg 338): A radio executive considering a switch in his station’s format collects data on the radio format preferences of 78 listeners in various age groups. Does radio format preference differ by age group?

Research Hypothesis:

Individuals:

Populations:

Variables:

Parameters:

Statistical Hypotheses:

The observed frequency (fo) of age group and radio format preference is below:

Age Group
Format / Young Adult / Middle Age / Older Adult / Total
Music / 14 / 10 / 3 / 27
News-talk / 4 / 15 / 11 / 30
Sports / 7 / 9 / 5 / 21
Total / 25 / 34 / 19 / 78

And we have,

N = 78

k = 9 (# of categories)

but … we need fe! Here’s the formula for each cell, with row and column totals associated with that cell:

$$f_e = \frac{(\text{row total}) \times (\text{column total})}{N}$$

So, the table with fe is:

Age Group
Format / Young Adult / Middle Age / Older Adult / Total
Music / 25*27/78 / 34*27/78 / 19*27/78 / 27
News-talk / 25*30/78 / 34*30/78 / 19*30/78 / 30
Sports / 25*21/78 / 34*21/78 / 19*21/78 / 21
Total / 25 / 34 / 19 / 78

Which, after you do the arithmetic, gives:

Age Group
Format / Young Adult / Middle Age / Older Adult / Total
Music / 8.7 / 11.8 / 6.6 / 27
News-talk / 9.6 / 13.1 / 7.3 / 30
Sports / 6.7 / 9.2 / 5.1 / 21
Total / 25 / 34 / 19 / 78
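The expected-frequency table can be reproduced with a few lines of Python. This is only a sketch of the row total × column total / N arithmetic shown in the tables above; the dictionary layout and variable names are illustrative choices.

```python
# Observed frequencies (fo): rows are formats, columns are age groups.
age_groups = ["Young Adult", "Middle Age", "Older Adult"]
observed = {
    "Music":     [14, 10, 3],
    "News-talk": [4, 15, 11],
    "Sports":    [7, 9, 5],
}

row_totals = {fmt: sum(counts) for fmt, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# fe for each cell = (row total * column total) / N
expected = {
    fmt: [row_totals[fmt] * col_total / n for col_total in col_totals]
    for fmt in observed
}

for fmt, row in expected.items():
    print(fmt, [round(fe, 1) for fe in row])
```

Rounded to one decimal place, the printed rows match the fe table above.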

… now that we have all the components, we can calculate our test statistic (try the arithmetic on your own):

χ² = 10.9
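To check the arithmetic, here is a hedged Python sketch of the 2-way calculation. Two things in it are not stated in the notes and are assumptions taken from the standard Chi-square test of independence: the degrees of freedom are (rows - 1) × (columns - 1), and 9.488 is the usual tabulated critical value for 4 degrees of freedom at α = 0.05.

```python
# Observed frequencies (fo): rows are Music, News-talk, Sports;
# columns are Young Adult, Middle Age, Older Adult.
observed = [
    [14, 10, 3],
    [4, 15, 11],
    [7, 9, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency for this cell
        chi2 += (fo - fe) ** 2 / fe

# Assumption (standard result, not given in the notes): df = (r - 1)(c - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: standard chi-square critical value for df = 4, alpha = 0.05.
CRITICAL_05_DF4 = 9.488
reject_h0 = chi2 > CRITICAL_05_DF4

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```

With unrounded expected frequencies the statistic comes out at about 10.96; using the fe values rounded to one decimal place, as in the table, gives the 10.9 shown above.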

p-value:

Reject H0 at α = 0.05?

Conclusion:

One snag …

There is still an assumption necessary for us to be able to use any of the Chi-Square tests.

All the cells in the table (ie: the expected frequency, fe, for all categories) must be at least 5.
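A quick way to check this condition for the radio example is sketched below. It assumes the rule is applied to the expected frequencies, fe (the usual convention for Chi-square tests); the helper name `cells_below_minimum` is illustrative only.

```python
def cells_below_minimum(expected, minimum=5):
    """Return (row, column, fe) for every cell whose fe is below the minimum."""
    return [
        (i, j, fe)
        for i, row in enumerate(expected)
        for j, fe in enumerate(row)
        if fe < minimum
    ]


# Row and column totals from the radio format example.
row_totals = [27, 30, 21]
col_totals = [25, 34, 19]
n = 78

expected = [[r * c / n for c in col_totals] for r in row_totals]
too_small = cells_below_minimum(expected)

print("Cells below 5:", too_small)
print("Smallest fe:", round(min(min(row) for row in expected), 1))
```

In this example the smallest expected frequency is about 5.1 (Sports, Older Adult), so no cell falls below 5 even though some observed counts do.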

Other names for Chi-square tests:

1-Way:

  • One-sample Chi-square
  • Chi-square goodness of fit

2-Way

  • 2-sample Chi-square
  • 2x2 Chi-square
  • r by c Chi-square
  • Chi-square test for independence

Nice page describing how to do Chi-Square tests in SPSS.

So, a decision tree for choosing hypothesis tests:

Single sample?

-Interval or Ratio data?

  • One-sample t-test

-Nominal or Ordinal data?

  • Proportion (ie: 2 categories)?
      • 1-sample z-test for proportions
  • Distribution (ie: several categories)?
      • 1-way Chi-square

Two samples?

-Interval or Ratio data?

  • Individuals measured twice/Matched?
      • Paired t-test
  • Variances equal?
      • 2-sample t-test w/equal variances
  • Variances not equal?
      • 2-sample t-test w/unequal variances

-Nominal or Ordinal data?

  • Proportion (ie: 2 categories) & conditions met?
      • 2-sample z-test for proportions
  • Distribution (ie: several categories)?
      • 2-way Chi-square
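The decision tree above can also be written as a small function. This is only an illustrative sketch: the function name, argument names and defaults are assumptions, and it does not check the conditions behind each test (for example, the z-test conditions or the minimum cell frequency).

```python
def choose_test(n_samples, data_level, two_categories=False,
                paired=False, equal_variances=True):
    """Follow the decision tree and return the name of the suggested test."""
    numeric = data_level in ("interval", "ratio")
    categorical = data_level in ("nominal", "ordinal")
    if not (numeric or categorical):
        raise ValueError("data_level must be nominal, ordinal, interval or ratio")

    if n_samples == 1:
        if numeric:
            return "One-sample t-test"
        if two_categories:
            return "1-sample z-test for proportions"
        return "1-way Chi-square"

    if n_samples == 2:
        if numeric:
            if paired:
                return "Paired t-test"
            if equal_variances:
                return "2-sample t-test w/equal variances"
            return "2-sample t-test w/unequal variances"
        if two_categories:
            return "2-sample z-test for proportions"
        return "2-way Chi-square"

    raise ValueError("the decision tree only covers one or two samples")


# The dog-dish example: one sample, nominal data, four categories.
print(choose_test(1, "nominal"))
# The radio format example: comparing distributions across groups.
print(choose_test(2, "nominal"))
```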

Today’s Topics

Chi-Square tests

-for comparing distributions of nominal or ordinal data

-1-way: compares the distribution in a single sample to some expected distribution

-2-way: compares the distributions for two populations

New Reading

Chapter 10 up to pg 352

Stat203

Fall 2011 – Week 9 Lecture 3