Chi-Square Analyses

______

  1. Discuss the conditions under which chi-square analyses are appropriate/required.
  2. Discuss the kinds of statistical inferences that chi-square analyses allow.
  3. Learn how to calculate an observed value for chi-square.
  4. Understand the limitations of chi-square analyses.

What band should I bring to Amherst?

I am a very wealthy man. I want to reward you guys for doing so well this semester, so I decide to host a free concert. I take a survey to find out which of the following bands you most want to see. If there is a clear preference, I will choose that band. Otherwise, get ready for Spice World!!!

a) Spice Girls

b)

c)

Obs
Exp

How do I choose?

Can I calculate a confidence interval?

Can I do a t-test?

Can I do ANOVA?

Can I do regression?

Multinomial Probability Distribution

Binomial Probability Distribution

  • Flip a coin n times, how many heads?
  • Shoot 10 free throws, how many do I make?
  • A drug is 60% effective; how many people will be cured if 250 take it?
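A minimal standard-library sketch of the binomial model for the drug example, using $P(x) = \binom{n}{x} p^x (1-p)^{n-x}$ with n = 250 and p = .60. The function name `binomial_pmf` is our own illustration, not part of any library.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials,
    each with the same success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 250, 0.60  # drug example: 250 patients, 60% cure rate

# Expected number cured is n * p.
expected_cured = n * p

# Probability that exactly 150 people are cured.
p_150 = binomial_pmf(150, n, p)

# Sanity check: probabilities over all possible outcomes 0..n sum to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print(f"Expected cured: {expected_cured:.0f}")
print(f"P(exactly 150 cured): {p_150:.4f}")
print(f"Sum of all probabilities: {total:.6f}")
```

Note that even the single most likely outcome (150 cured) has a fairly small probability; the distribution spreads its probability over many nearby counts.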

Properties of Binomial Distribution

  1. Experiment consists of n identical trials.
  2. Each trial has only 2 possible outcomes.
  3. The probability of a success is constant.
  4. The trials are independent.
  5. The variable we are interested in is x, the number of successes in n trials.

Properties of a Multinomial Distribution

  1. Experiment consists of n identical trials.
  2. Each trial has k possible outcomes.
  3. The probability $p_k$ of each outcome k is constant across trials, and the probabilities of all k outcomes sum to 1.
  4. The trials are independent.
  5. The variables we are interested in are the counts $n_k$, the number of occurrences of each outcome k. We refer to the different outcomes as cells.
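To make the multinomial model concrete, here is a small sketch of its probability function, $P(n_1,\dots,n_k) = \frac{n!}{n_1!\cdots n_k!}\,p_1^{n_1}\cdots p_k^{n_k}$. The function name and the six-student, three-band example are hypothetical, chosen only for illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` in each cell, given
    constant cell probabilities `probs` that sum to 1."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! ... n_k!); each division is exact.
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p**c for p, c in zip(probs, counts))

# Hypothetical example: 6 students choose among 3 bands with equal probability.
probs = [1/3, 1/3, 1/3]
print(multinomial_pmf([2, 2, 2], probs))

# With k = 2 cells the multinomial reduces to the binomial.
print(multinomial_pmf([3, 7], [0.5, 0.5]))
```

The second call illustrates that the binomial is simply the special case with two cells.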

Kinds of questions we can ask with Multinomial Data

One-dimensional data

  • Are responses distributed equally (randomly) across cells?
  • Does the distribution of responses conform to some experimental hypothesis?

Multi-dimensional data

  • Are the two variables independent?

Independence: the occurrence of one event gives you no information about the probability of a second event occurring.

Steps to solving the problem:

One-dimensional data

  1. Do the data conform to the multinomial distribution?
  2. Is the expected value for each cell ≥ 5?
  3. Create null and alternative hypotheses.

$H_0: p_1 = p_2 = \dots = p_k = 1/k$

$H_a$: At least one proportion exceeds $1/k$

  4. Calculate the expected value for each cell.

$E(n_k) = N p_k$

  5. Calculate the test statistic. Not F, r, or t, but χ²:

$\chi^2 = \sum \frac{[n_k - E(n_k)]^2}{E(n_k)}$, summed over all k cells

  6. Compare the observed value of χ² with a critical value, using df = k − 1.

Solve problem using their data

(unless they are lame)

Obs
Exp

$H_0: p_1 = p_2 = p_3 = .333$

$H_a$: At least one $p_k$ exceeds .333

Critical Value (df = 2, α = .05): 5.99147
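Where does 5.99147 come from? With df = 2 (three cells), the χ² distribution is an exponential distribution with mean 2, so $P(\chi^2 > x) = e^{-x/2}$ and the critical value at level α is $-2\ln\alpha$. The sketch below checks this with the standard library; it agrees with the tabled value to about four decimal places. For other df the quantile has no such simple closed form, so you would use a χ² table or a statistics library.

```python
import math

def chi2_critical_df2(alpha: float) -> float:
    """Upper-tail critical value of chi-square with 2 degrees of freedom.
    With df = 2, P(chi2 > x) = exp(-x / 2); solving exp(-x / 2) = alpha
    gives x = -2 * ln(alpha)."""
    return -2 * math.log(alpha)

crit = chi2_critical_df2(0.05)
print(f"{crit:.4f}")
```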

Ben & Jerry’s Internship

______

Based on my glowing letter of recommendation, Ben & Jerry hire you to do some statistical work over the summer. They are building a new factory and need to decide which flavors are the most popular. Based on their last survey, also conducted by a student from my course, they determined that the market shares for their top 4 selling flavors were as follows:

Peanut Butter Cookie Dough: 30%

Chunky Monkey: 25%

New York Super Fudge Chunk: 25%

Cherry Garcia: 20%

Ben & Jerry want you to determine if preference patterns have shifted so they know how to set up their factory. The results of your survey of 100 people nationwide are as follows:

Peanut Butter Cookie Dough: 34

Chunky Monkey: 22

New York Super Fudge Chunk: 26

Cherry Garcia: 18

Do these data suggest that preferences have changed?

Conducting the 2

______

n / E(n)
PBCD / 34 / 30
CM / 22 / 25
NYSFC / 26 / 25
CG / 18 / 20

Critical value for 2=7.8

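Because 1.13 is smaller than the critical value, we fail to reject the null (χ²(3) = 1.13, p > .05). The survey provides no evidence that preferences have shifted, so Ben & Jerry’s earlier market-share figures remain a reasonable basis for setting up the factory.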
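The arithmetic behind χ²(3) = 1.13 can be checked with a short standard-library sketch. Expected counts are 100 times the old market shares; the critical value 7.81 is copied from a χ² table rather than computed.

```python
# Survey of 100 people vs. the previous market shares (data from the slide).
flavors = ["PBCD", "CM", "NYSFC", "CG"]
observed = [34, 22, 26, 18]
old_shares = [0.30, 0.25, 0.25, 0.20]

n = sum(observed)
expected = [n * p for p in old_shares]  # E(n_k) = N * p_k

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(observed) - 1
critical = 7.81  # chi-square critical value, df = 3, alpha = .05 (from a table)

for name, c in zip(flavors, contributions):
    print(f"{name:6s} {c:.3f}")
print(f"chi2({df}) = {chi2:.2f}; reject H0: {chi2 > critical}")
```

Printing the per-cell contributions shows that no single flavor moves far from its old share.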

Parking

______

Last year, there was a big debate about where and whether a new student parking lot should be built. The first proposed lot was rejected because of its environmental impact. The question is: was the opposition broadly based, or was this a question of a few squeaky wheels getting the grease?

Pro / Neutral / Against
Observed / 95 / 122 / 83
Expected proportion / .33 / .33 / .33
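A sketch for checking the parking exercise. The table gives the expected values as proportions (.33 each, i.e., 1/3), so they must be converted to counts by multiplying by N before computing χ². Computing N/3 directly avoids the rounding in .33.

```python
observed = {"Pro": 95, "Neutral": 122, "Against": 83}

n = sum(observed.values())  # total respondents
k = len(observed)           # number of cells
expected = n / k            # H0: each proportion is 1/3

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1
print(f"N = {n}, expected per cell = {expected:.1f}")
print(f"chi2({df}) = {chi2:.2f}")
```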

Contingency Tables

______

Two variables: we want to determine whether they are independent.

  • Gender vs. Band Preference
  • Income vs. Party Affiliation

______

Create a table.

  • One Column Variable
  • One Row Variable

Variables are called dimensions

______

Procedure for calculating 2 doesnot change.

  • Calculate the expected value for each cell
  • Compare it with the observed value for each cell.
  • Same formula for 2

However, the nature of the inquiry does.

  • Are the two dimensions dependent or independent?

How to conduct a 2for Contingency Tables

(i.e., n-dimensional data)

______

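$H_0$: The two dimensions are independent.

$H_a$: The two dimensions are dependent.

Test Statistic:

$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{[n_{ij} - E(n_{ij})]^2}{E(n_{ij})}$

where

$E(n_{ij}) = \frac{R_i C_j}{n}$, with $R_i$ the total for row i, $C_j$ the total for column j, and $n$ the total sample size.

Rejection Region: Observed χ² > Critical χ²

  • df = (r − 1)(c − 1), where r and c are the numbers of rows and columns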
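Why does the expected count take the form (row total)(column total)/n? Under $H_0$ the two dimensions are independent, so the probability of landing in cell (i, j) is the product of the two marginal probabilities, each estimated from the table's totals. In this short derivation $R_i$ and $C_j$ denote the row and column totals and n the grand total.

```latex
\begin{aligned}
P(\text{row } i \cap \text{column } j)
  &= P(\text{row } i)\,P(\text{column } j)
  \approx \frac{R_i}{n} \cdot \frac{C_j}{n} \\
E(n_{ij})
  &= n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n}
  = \frac{R_i\,C_j}{n}
\end{aligned}
```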

Ben & Jerry’s by Area of the Country

______

PBCD / CM / NYSFC / CG / Row Tot
North / 28 / 30 / 50 / 42 / 150
South / 52 / 78 / 50 / 70 / 250
Col Tot / 80 / 108 / 100 / 112 / 400

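Calculating the expected value for each cell: $E(n_{ij}) = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}$; for example, North/PBCD: (150)(80)/400 = 30.0.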

Ben & Jerry’s by Area of the Country: Solution

______

Cells show the observed count, with the expected count in parentheses.

PBCD / CM / NYSFC / CG / Row Tot
North / 28 (30.0) / 30 (40.5) / 50 (37.5) / 42 (42.0) / 150
South / 52 (50.0) / 78 (67.5) / 50 (62.5) / 70 (70.0) / 250
Col Tot / 80 / 108 / 100 / 112 / 400

2=[(28 – 30.0)2 / 30.0] + [(52 – 50.0)2 / 50.0] +

[(32 – 40.5)2 / 40.5] + [(78 – 67.5)2 / 67.5] +

[(50 – 37.5)2 / 37.5] + [(50 – 62.5)2 / 62.5] +

[(40 – 42.0)2 / 42.0] + [(70 – 70.0)2 / 70.0]

2=.13 + 2.72 + 4.17 + 0.0 +

.08 + 1.63 + 2.50 + 0.0=11.24

Critical value for 2= 7.81

df= (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3
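The contingency-table calculation can be sketched in plain Python: expected counts come from the margins, df = (r − 1)(c − 1), and the statistic sums over every cell. The function name `contingency_chi2` is our own; the data are the North/South table above, and the critical value 7.81 is taken from a χ² table. Running it reproduces the expected counts and χ² ≈ 11.24 from the solution slide.

```python
def contingency_chi2(table):
    """Chi-square test of independence for an r x c table of counts.
    Returns (chi2, df, expected)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence, E(n_ij) = (row i total)(column j total) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Columns: PBCD, CM, NYSFC, CG
observed = [
    [28, 30, 50, 42],  # North
    [52, 78, 50, 70],  # South
]
chi2, df, expected = contingency_chi2(observed)
for label, row in zip(["North", "South"], expected):
    print(label, row)
print(f"chi2({df}) = {chi2:.2f}; reject H0 at .05: {chi2 > 7.81}")
```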

What do we do once we reject the null?

______

Eyeball Technique

Cells show the proportion of the row total, with the observed count in parentheses.

PBCD / CM / NYSFC / CG / Row Tot
North / .19 (28) / .20 (30) / .33 (50) / .28 (42) / 150
South / .21 (52) / .31 (78) / .20 (50) / .28 (70) / 250
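The eyeball technique divides each observed count by its row total and compares rows. A minimal sketch with the same North/South counts; the "gap" (North proportion minus South proportion) is our own shorthand for spotting the flavors where the regions differ most.

```python
flavors = ["PBCD", "CM", "NYSFC", "CG"]
observed = {
    "North": [28, 30, 50, 42],
    "South": [52, 78, 50, 70],
}

# Proportion of each region's respondents choosing each flavor.
row_props = {
    region: [count / sum(counts) for count in counts]
    for region, counts in observed.items()
}

print("       " + "  ".join(f"{f:>5s}" for f in flavors))
for region, props in row_props.items():
    print(f"{region:6s} " + "  ".join(f"{p:5.2f}" for p in props))

# Largest positive and negative North-minus-South gaps.
gaps = {f: nth - sth for f, nth, sth in zip(flavors, row_props["North"], row_props["South"])}
print("North favors:", max(gaps, key=gaps.get))
print("South favors:", min(gaps, key=gaps.get))
```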

Ice cream preference and area of the country are not independent (χ²(3) = 11.24, p < .05). Whereas Southerners show a preference for Chunky Monkey, Yankees tend to like New York Super Fudge Chunk.

I’m…uhm…With Matt Damon

______

It would seem the appeal of Matt Damon knows no bounds. Will Hunting…Jason Bourne…Linus Caldwell…Colin Sullivan…LaBoeuf…Loki… Charlie Dillon… ‘uncredited baseball fan at Fenway Park in Field of Dreams’. All iconic roles. But does Matt Damon really appeal to everyone, or is he more popular with the ladies/gents? Let’s find out. Here are data from your original questionnaire on whether people can name five Matt Damon films. Use these data to conduct a chi-square analysis to determine whether gender and Matt Damon knowledge are independent.

Can Name 5 Damon / Can’t Name 5 Damon / Row Tot
Females / 49 / 43 / 92
Males / 53 / 24 / 77
Col Tot / 102 / 67 / 169
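For a 2 × 2 table there is a well-known shortcut that gives the same χ² as summing over the four cells: with cells a, b in the first row and c, d in the second, $\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$. The sketch below computes it both ways so you can check a hand calculation; it does not apply Yates' continuity correction, which some software applies by default to 2 × 2 tables. Here df = (2 − 1)(2 − 1) = 1.

```python
# Rows: Females, Males; columns: can / can't name five Damon films.
a, b = 49, 43  # Females
c, d = 53, 24  # Males
n = a + b + c + d

# Shortcut formula for a 2 x 2 table (no continuity correction).
shortcut = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Same statistic from the general cell-by-cell formula.
observed = [[a, b], [c, d]]
rows = [a + b, c + d]
cols = [a + c, b + d]
general = sum(
    (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
    for i in range(2)
    for j in range(2)
)

print(f"shortcut = {shortcut:.3f}, general = {general:.3f}, df = 1")
```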

Cereal and Health

______

I love me some Frosted Mini-Wheats like it’s nobody’s business. Tammy ridicules my love affair with FMW, so I tell her that I eat them because they are healthy. She asks me to prove it, so I collect some data on people’s favorite cereal and how often they get colds. The data appear below. Use these data to conduct a chi-square analysis to determine whether favorite cereal and colds are independent.

CP / FW / PB / Row Tot
1 or less / 19 / 53 / 22 / 94
2 or more / 15 / 39 / 21 / 75
Col Tot / 34 / 92 / 43 / 169

Note: CP = Cocoa Puffs

FW = Frosted Mini-Wheats

PB = Peanut Butter Cap’n Crunch
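Before computing χ² for the cereal table, it is worth checking the same condition used for one-dimensional data: every expected count should be at least 5. The sketch below computes expected counts from the margins, checks that condition, and reports χ² with df = (2 − 1)(3 − 1) = 2. Because df = 2, the .05 critical value can be computed directly as −2 ln(.05) ≈ 5.99; that closed form does not hold for other df.

```python
import math

observed = [
    [19, 53, 22],  # 1 or less colds: CP, FW, PB
    [15, 39, 21],  # 2 or more colds: CP, FW, PB
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: (row total)(column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Condition check: every expected count should be at least 5.
smallest = min(min(row) for row in expected)
if smallest < 5:
    raise ValueError("an expected count is below 5; chi-square is questionable")

chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = -2 * math.log(0.05)  # closed form valid only for df = 2

print(f"smallest expected = {smallest:.2f}")
print(f"chi2({df}) = {chi2:.2f}, critical = {critical:.2f}")
```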