Chi-Square Analyses
______
- Discuss the conditions under which chi square analyses are appropriate/required.
- Discuss the kind of statistical inferences that chi square analyses allow.
- Learn how to calculate an observed value for chi square.
- Understand the limitations of chi-square analyses.
What band should I bring to Amherst?
I am a very wealthy man. I want to reward you guys for doing so well this semester, so I decide to host a free concert. I take a survey to find out which of the following bands you most want to see. If there is a clear preference, I will choose that band. Otherwise, get ready for Spice World!!!
a) Spice Girls
b)
c)
Obs / Exp
How do I choose?
Can I calculate a confidence interval?
Can I do a t-test?
Can I do ANOVA?
Can I do regression?
Multinomial Probability Distribution
Binomial Probability Distribution
- Flip a coin n times, how many heads?
- Shoot 10 free throws, how many do I make?
- A drug is 60 % effective, how many people will be cured if 250 take it?
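The examples above can be sketched numerically. This is a minimal illustration, not part of the original slides: the helper name `binomial_pmf` and the choice of 10 fair coin flips are assumptions made for the example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Drug example from the list above: n = 250 patients, p = 0.60.
n, p = 250, 0.60
expected_cured = n * p  # the mean of a binomial variable is n * p

# Coin example, assuming n = 10 flips of a fair coin (p = 0.5).
prob_five_heads = binomial_pmf(5, 10, 0.5)

print(expected_cured)
print(round(prob_five_heads, 4))
```

The expected count is only the long-run average; any single sample of 250 patients will usually land somewhat above or below it.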
Properties of Binomial Distribution
- Experiment consists of n identical trials.
- Each trial has only 2 possible outcomes.
- The probability of a success is constant.
- The trials are independent.
- The variable we are interested in is x, the number of successes in n trials.
Properties of a Multinomial Distribution
- Experiment consists of n identical trials.
- Each trial has k possible outcomes.
- The probability of each individual k occurring is constant and the sum of all pk = 1.
- The trials are independent.
- The variables we are interested in are nk, the number of occurrences for each k. We refer to the different values of k as cells.
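A small simulation may make these properties concrete. It is only a sketch: the cell labels, the equal probabilities, the sample size of 90, and the function name `simulate_multinomial` are all hypothetical choices made for the illustration.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=0):
    """Run n independent trials with constant cell probabilities; return cell counts."""
    assert abs(sum(probs.values()) - 1.0) < 1e-9  # the p_k must sum to 1
    rng = random.Random(seed)
    cells = list(probs)
    draws = rng.choices(cells, weights=[probs[c] for c in cells], k=n)
    tally = Counter(draws)
    return {c: tally.get(c, 0) for c in cells}

# Hypothetical experiment: k = 3 cells, each with probability 1/3, n = 90 trials.
probs = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
counts = simulate_multinomial(90, probs)
print(counts, sum(counts.values()))
```

The counts n_k always add up to n, but they rarely equal the expected 30 per cell exactly, which is why a test statistic is needed to judge how much departure is too much.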
Kinds of questions we can ask with Multinomial Data
One-dimensional data
- Are responses distributed equally/randomly across cells?
- Does the distribution of responses conform to some experimental hypothesis?
Multi-dimensional data
- Are the two variables independent?
Independence - The occurrence of one event gives you no information about the probability of a second event occurring.
Steps to solving the problem:
One dimensional data
- Do the data conform to the multinomial distribution?
- Is the expected value for each cell ≥ 5?
- Create null and alternative hypotheses
Ho: p1 = p2 = p3 = ... = pk = (1/k)
Ha: At least one proportion exceeds (1/k)
- Calculate expected value for each cell.
E(nk) = N(pk)
- Calculate our test statistic. Not F, r, or t … χ²
χ² = Σ [nk – E(nk)]² / E(nk)
- Compare the observed value of χ² with a critical value.
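The steps above can be sketched as a short function. This is an illustration under stated assumptions, not the course's own code: the function name `chi_square_gof`, the survey counts, and the sample size of 60 are hypothetical. The degrees of freedom for one-dimensional data are taken as k − 1.

```python
def chi_square_gof(observed, probs=None):
    """Return (chi2, df, expected) for one-dimensional count data."""
    k = len(observed)
    n = sum(observed)
    if probs is None:
        probs = [1 / k] * k              # Ho: p1 = p2 = ... = pk = 1/k
    expected = [n * p for p in probs]    # E(nk) = N * pk
    if min(expected) < 5:
        raise ValueError("expected count below 5 in at least one cell")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1, expected

# Hypothetical survey: 60 students choosing among three bands.
observed = [30, 20, 10]
chi2, df, expected = chi_square_gof(observed)
critical = 5.99147  # alpha = .05 critical value for df = 2
print(round(chi2, 2), df, chi2 > critical)
```

Passing `probs` explicitly covers the second kind of question, where the null hypothesis specifies unequal proportions.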
Solve problem using their data
(unless they are lame)
Obs / Exp
Ho: p1 = p2 = p3 = .333
Ha: At least one exceeds .333
Critical Value: 5.99147
Ben & Jerry’s Internship
______
Based on my glowing letter of recommendation, Ben & Jerry hire you to do some statistical work over the summer. They are building a new factory and need to decide which flavors are the most popular. Based on their last survey, also conducted by a student from my course, they determined that the market shares for their four top-selling flavors were as follows:
Peanut Butter Cookie Dough: 30%
Chunky Monkey: 25%
New York Super Fudge Chunk: 25%
Cherry Garcia: 20%
Ben & Jerry want you to determine if preference patterns have shifted so they know how to set up their factory. The results of your survey of 100 people nationwide are as follows:
Peanut Butter Cookie Dough: 34
Chunky Monkey: 22
New York Super Fudge Chunk: 26
Cherry Garcia: 18
Do these data suggest that preferences have changed?
Conducting the χ²
______
Flavor / n / E(n)
PBCD / 34 / 30
CM / 22 / 25
NYSFC / 26 / 25
CG / 18 / 20
Critical value for χ² = 7.81
Therefore, we fail to reject the null (χ²(3) = 1.13, p > .05). Ben & Jerry’s initial hypotheses about the popularity of their flavors appear to be valid.
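The arithmetic behind this result can be checked with a short script. It is a sketch using the survey counts and market shares given above; the variable names are illustrative only.

```python
flavors = ["PBCD", "CM", "NYSFC", "CG"]
observed = [34, 22, 26, 18]          # new survey, N = 100
shares = [0.30, 0.25, 0.25, 0.20]    # market shares from the earlier survey

n = sum(observed)
expected = [n * p for p in shares]   # E(nk) = N * pk
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)
df = len(observed) - 1
critical = 7.81                      # alpha = .05, df = 3

for name, o, e, t in zip(flavors, observed, expected, terms):
    print(f"{name:6s} n={o:3d}  E(n)={e:5.1f}  contribution={t:.3f}")
print(f"chi2({df}) = {chi2:.2f}; reject Ho: {chi2 > critical}")
```

Printing each cell's contribution shows that no single flavor departs much from its expected count, which is why the total stays well below the critical value.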
Parking
______
Last year, there was a big debate about where and whether a new student parking lot should be built. The first proposed lot was rejected because of its environmental impact. The question is: was the opposition broadly based, or was this a question of a few squeaky wheels getting the grease?
Response / Pro / Neutral / Against
Observed / 95 / 122 / 83
Expected / .33 / .33 / .33
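One way to work this example is sketched below. It assumes the .33 entries are the proportions expected under the null hypothesis of an even three-way split, so each expected count is one third of the 300 responses. The p-value shortcut used here is valid only for df = 2.

```python
from math import exp

observed = {"Pro": 95, "Neutral": 122, "Against": 83}
n = sum(observed.values())        # 300 responses in total
expected = n / len(observed)      # assumed equal split under Ho: 100 per cell

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
critical = 5.99147                # alpha = .05, df = 2
p_value = exp(-chi2 / 2)          # exact upper-tail probability only when df = 2

print(f"chi2({df}) = {chi2:.2f}, p = {p_value:.3f}, reject Ho: {chi2 > critical}")
```

Under these assumptions the statistic exceeds the critical value, so responses do not appear to be spread evenly across the three positions.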
Contingency Tables
______
2 variables – Want to determine whether they are independent.
- Gender vs. Band Preference
- Income vs. Party Affiliation
______
Create a table.
- One Column Variable
- One Row Variable
Variables are called dimensions
______
Procedure for calculating χ² does not change.
- Calculate the expected value for each cell
- Compare it with the observed value for each cell.
- Same formula for χ²
However, nature of inquiry does.
- Are the two dimensions dependent or independent?
How to conduct a χ² for Contingency Tables
(i.e., n-dimensional data)
______
Ho: The two dimensions are independent.
Ha: The two dimensions are dependent.
Test Statistic: χ² = Σ [nij – E(nij)]² / E(nij)
where E(nij) = (Row Tot i × Col Tot j) / n
Rejection Region: Obs χ² > Crit χ²
- df = (r-1)(c-1)
Ben & Jerry’s by Area of the Country
______
Region / PBCD / CM / NYSFC / CG / Row Tot
North / 28 / 30 / 50 / 42 / 150
South / 52 / 78 / 50 / 70 / 250
Col Tot / 80 / 108 / 100 / 112 / 400
Calculating the Expected value: E(nij) = (Row Tot i × Col Tot j) / n. For example, E(North, PBCD) = (150 × 80) / 400 = 30.0
Ben & Jerry’s by Area of the Country: Solution
______
Observed count, with expected value in parentheses
Region / PBCD / CM / NYSFC / CG / Row Tot
North / 28 (30.0) / 30 (40.5) / 50 (37.5) / 42 (42.0) / 150
South / 52 (50.0) / 78 (67.5) / 50 (62.5) / 70 (70.0) / 250
Col Tot / 80 / 108 / 100 / 112 / 400
χ² = [(28 – 30.0)² / 30.0] + [(52 – 50.0)² / 50.0] +
[(30 – 40.5)² / 40.5] + [(78 – 67.5)² / 67.5] +
[(50 – 37.5)² / 37.5] + [(50 – 62.5)² / 62.5] +
[(42 – 42.0)² / 42.0] + [(70 – 70.0)² / 70.0]
χ² = .13 + .08 + 2.72 + 1.63 +
4.17 + 2.50 + 0.0 + 0.0 = 11.24
Critical value for χ² = 7.81
df= (r – 1)(c – 1) = (2 – 1)(4 – 1) = 3
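The whole calculation can be reproduced from the observed counts in the North/South table. The sketch below is an illustration, not the course's own code; the function name `chi_square_independence` is hypothetical.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for a table of counts given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected count for each cell: (row total * column total) / n
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_tot) - 1) * (len(col_tot) - 1)  # df = (r - 1)(c - 1)
    return chi2, df, expected

table = [
    [28, 30, 50, 42],  # North: PBCD, CM, NYSFC, CG
    [52, 78, 50, 70],  # South
]
chi2, df, expected = chi_square_independence(table)
print(expected[0])
print(expected[1])
print(round(chi2, 2), df, chi2 > 7.81)
```

Because the function derives row and column totals from the counts themselves, it also serves as a check on hand-computed marginals.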
What do we do once we reject the null?
______
Eyeball Technique
Row proportion, with observed count in parentheses
Region / PBCD / CM / NYSFC / CG / Row Tot
North / .19 (28) / .20 (30) / .33 (50) / .28 (42) / 150
South / .21 (52) / .31 (78) / .20 (50) / .28 (70) / 250
Ice cream preference and area of the country are not independent (χ²(3) = 11.24, p < .05). Whereas Southerners show a preference for Chunky Monkey, Yankees tend to like New York Super Fudge Chunk.
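The eyeball technique amounts to converting each row to proportions and looking for the columns where the rows differ most. A minimal sketch, using the counts from the table and illustrative variable names:

```python
flavors = ["PBCD", "CM", "NYSFC", "CG"]
counts = {"North": [28, 30, 50, 42], "South": [52, 78, 50, 70]}

# Proportion of each region's respondents choosing each flavor.
proportions = {
    region: [c / sum(row) for c in row] for region, row in counts.items()
}
for region, props in proportions.items():
    cells = "  ".join(f"{f} {p:.2f}" for f, p in zip(flavors, props))
    print(f"{region}: {cells}")

# North-minus-South gap per flavor: large positive or negative gaps are the
# cells most responsible for the departure from independence.
gaps = [n - s for n, s in zip(proportions["North"], proportions["South"])]
north_favorite = flavors[gaps.index(max(gaps))]
south_favorite = flavors[gaps.index(min(gaps))]
print(north_favorite, south_favorite)
```

This is a descriptive follow-up rather than a formal test; it points to where the difference lies without attaching a significance level to any single cell.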
I’m…uhm…With Matt Damon
______
It would seem the appeal of Matt Damon knows no bounds. Will Hunting…Jason Bourne…Linus Caldwell…Colin Sullivan…LaBoeuf…Loki… Charlie Dillon… ‘uncredited baseball fan at Fenway Park in Field of Dreams’. All iconic roles. But does Matt Damon really appeal to everyone, or is he more popular with the ladies/gents? Let’s find out. Here are data from your original questionnaire looking at whether people can name five Matt Damon films. Use these data to conduct a chi-square analysis to determine whether gender and Matt Damon knowledge are independent.
Gender / Can Name 5 Damon / Can’t Name 5 Damon / Row Tot
Females / 49 / 43 / 92
Males / 53 / 24 / 77
Col Tot / 102 / 67 / 169
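One possible way to work this table is sketched below. It computes the statistic cell by cell and cross-checks it against the standard shortcut for 2 × 2 tables, n(ad – bc)² divided by the product of the four marginal totals. Two things here are not on the slides and are assumptions of this sketch: the shortcut itself, and the α = .05 critical value for df = 1 (about 3.84). No continuity correction is applied.

```python
# Observed counts: rows = Females, Males; columns = can name 5, can't name 5.
a, b = 49, 43
c, d = 53, 24

rows = [a + b, c + d]            # 92, 77
cols = [a + c, b + d]            # 102, 67
n = sum(rows)                    # 169

# Long form: expected = row total * column total / n for every cell.
observed = [[a, b], [c, d]]
chi2_long = sum(
    (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
    for i in range(2)
    for j in range(2)
)

# Shortcut for 2 x 2 tables (no continuity correction).
chi2_short = n * (a * d - b * c) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])

df = (2 - 1) * (2 - 1)
critical = 3.84                  # assumed alpha = .05 critical value for df = 1
print(round(chi2_long, 2), round(chi2_short, 2), df, chi2_long > critical)
```

Both routes give the same value, which is a useful sanity check when doing the calculation by hand.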
Cereal and Health
______
I love me some Frosted Mini-Wheats like it’s nobody’s business. Tammy ridicules my love affair with FMW, so I tell her that I eat them because they are healthy. She asks me to prove it so I collect some data on people’s favorite cereal and how often they get colds. The data appear below. Use these data to conduct a chi square analysis to determine if favorite cereal and colds are independent.
Colds / CP / FW / PB / Row Tot
1 or less / 19 / 53 / 22 / 94
2 or more / 15 / 39 / 21 / 75
Col Tot / 34 / 92 / 43 / 169
Note: CP = Cocoa Puffs
FW = Frosted Mini-Wheats
PB = Peanut Butter Cap’n Crunch
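A sketch of how this table might be analyzed follows. It first checks that every expected count is at least 5, then computes the statistic. The variable names are illustrative, and the critical value 5.99147 is the α = .05 value for df = 2 used earlier in these notes.

```python
cereals = ["CP", "FW", "PB"]
table = [
    [19, 53, 22],  # 1 or less colds
    [15, 39, 21],  # 2 or more colds
]

row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
n = sum(row_tot)

# Expected count under independence: (row total * column total) / n
expected = [[r * c / n for c in col_tot] for r in row_tot]
smallest_expected = min(min(row) for row in expected)

chi2 = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_tot))
    for j in range(len(col_tot))
)
df = (len(row_tot) - 1) * (len(col_tot) - 1)
critical = 5.99147  # alpha = .05, df = 2

print(f"smallest expected count = {smallest_expected:.2f}")
print(f"chi2({df}) = {chi2:.2f}; reject Ho: {chi2 > critical}")
```

With these counts the statistic falls far below the critical value, so the data give no evidence that favorite cereal and frequency of colds are related.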