Worksheet 7: Contingency analysis (frequencies) – answer key 2017

Example of Contingency Analysis

This example uses the Car Poll.CSV sample data table, which contains data collected from car polls. The data includes aspects about the individual polled, such as their sex, marital status, and age. The data also includes aspects about the car that they own, such as the country of origin, the size, and the type. Here we want to examine the relationship between car sizes (small, medium, and large) and the cars’ country of origin.

1) Graph the relationship between size of car (size) and the car's country of origin (country). Note that both are categorical variables, so you will be graphing the frequency of occurrence of each combination of size and country.

  1. Using Graph Builder, put country on the x-axis and size as an overlay variable.
  2. Make sure N is selected as the summary statistic.
  3. What is the hypothesis that is being tested (the null hypothesis)?
    H0: There is no association between size of car and the car's country of origin.
  4. What are the qualitative results?

Very few Japanese cars fall into the Large size category.
The majority of the European cars fall into the Small and Medium size categories.
The majority of the American cars fall into the Large and Medium size categories.
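The frequencies that Graph Builder plots with N as the summary statistic are counts of each country and size combination. A minimal Python sketch of that tally follows. The records are invented for illustration and are not the Car Poll data, and the helper name count_combinations is hypothetical.

```python
from collections import Counter

def count_combinations(records):
    """Count how often each (country, size) pair occurs."""
    return Counter((r["country"], r["size"]) for r in records)

# Toy records, invented for illustration only (NOT the Car Poll data).
toy_records = [
    {"country": "American", "size": "Large"},
    {"country": "American", "size": "Large"},
    {"country": "American", "size": "Medium"},
    {"country": "European", "size": "Small"},
    {"country": "Japanese", "size": "Small"},
    {"country": "Japanese", "size": "Small"},
    {"country": "Japanese", "size": "Medium"},
]

counts = count_combinations(toy_records)
for (country, size), n in sorted(counts.items()):
    print(f"{country:<9} {size:<7} N={n}")
```

With the real file, the records could come from csv.DictReader, assuming the columns are named country and size as in this worksheet.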

2) Test your hypothesis using the Contingency platform (Fit Y by X).

  1. To launch the Fit Y by X platform, select Analyze > Fit Y by X.
  2. Put size in the Y, Response role and country in the X, Factor role. Note that both are categorical.

  1. Look at the output
  2. The graph is a mosaic plot, which is a graphical representation of the two-way frequency table (contingency table). The plot is divided into rectangles. The height of each rectangle is proportional to the proportion of that level of the Y variable within the given level of the X variable. The width of each rectangle is proportional to the share of observations in that level of the X variable.

1) The proportions on the x-axis represent the number of observations for each level of the X variable, which is country.

2) The proportions on the y-axis at right represent the overall proportions of Small, Medium, and Large cars for the combined levels (American, European, and Japanese).

3) The scale of the y-axis at left shows the response probability, with the whole axis being a probability of one (representing the total sample).
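The three sets of proportions described above can be computed directly from a two-way table of counts. The sketch below assumes a toy table (invented numbers, not the Car Poll counts) with country as the X variable and size as the Y variable. The function name mosaic_geometry is hypothetical.

```python
def mosaic_geometry(table):
    """table: {x_level: {y_level: count}}.

    Returns (widths, heights, overall):
      widths[x]     share of all observations in X level x (rectangle width)
      heights[x][y] share of Y level y within X level x (rectangle height)
      overall[y]    share of Y level y over all X levels (right-hand axis)
    """
    grand_total = sum(sum(row.values()) for row in table.values())
    widths, heights, overall = {}, {}, {}
    for x, row in table.items():
        row_total = sum(row.values())
        widths[x] = row_total / grand_total
        heights[x] = {y: n / row_total for y, n in row.items()}
        for y, n in row.items():
            overall[y] = overall.get(y, 0.0) + n / grand_total
    return widths, heights, overall

# Toy counts, invented for illustration only (NOT the Car Poll data).
toy_table = {
    "American": {"Large": 6, "Medium": 3, "Small": 1},
    "European": {"Large": 1, "Medium": 2, "Small": 2},
    "Japanese": {"Large": 1, "Medium": 4, "Small": 5},
}

widths, heights, overall = mosaic_geometry(toy_table)
for x in toy_table:
    print(x, "width:", round(widths[x], 3), "heights:", heights[x])
print("overall Y proportions:", overall)
```

Each column of rectangles has heights that sum to one, which is why the left-hand axis runs from 0 to 1.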

  1. Now look at the Contingency table

Note the following about Contingency tables:

The Count, Total%, Col%, and Row% correspond to the data within each cell that has row and column headings (such as the cell under American and Large).

The last column contains the total counts for each row and percentages for each row.

The bottom row contains total counts for each column and percentages for each column.
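As a rough illustration of how the four entries in one cell relate to the table margins, the sketch below computes Count, Total %, Col %, and Row % for a single cell. The table is a toy example with invented counts (not the Car Poll data), laid out with country as rows and size as columns. The function name cell_statistics is hypothetical.

```python
def cell_statistics(table, row, col):
    """table: {row_level: {col_level: count}}. Returns the four cell entries."""
    grand_total = sum(sum(r.values()) for r in table.values())
    row_total = sum(table[row].values())
    col_total = sum(r[col] for r in table.values())
    count = table[row][col]
    return {
        "Count": count,
        "Total %": 100 * count / grand_total,  # share of all observations
        "Col %": 100 * count / col_total,      # share of the column total
        "Row %": 100 * count / row_total,      # share of the row total
    }

# Toy counts, invented for illustration only (NOT the Car Poll data).
toy_table = {
    "American": {"Large": 6, "Medium": 3, "Small": 1},
    "European": {"Large": 1, "Medium": 2, "Small": 2},
    "Japanese": {"Large": 1, "Medium": 4, "Small": 5},
}

stats = cell_statistics(toy_table, "American", "Large")
for name, value in stats.items():
    print(f"{name:<8} {value}")
```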

  1. Now look at the Tests

1) What is your conclusion concerning your null hypothesis?
The null hypothesis that there is no association between size of car and the car's country of origin is rejected. There is evidence of an association between size of car and country of origin.

  1. You know this by looking at the Likelihood Ratio and Pearson Chi Square p-values (both <0.0001).
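For reference, the two test statistics can be sketched with the standard textbook formulas: Pearson's chi-square sums (observed - expected)^2 / expected over the cells, and the likelihood ratio statistic sums 2 * observed * ln(observed / expected). This is an illustration on a toy table with invented counts, not a reproduction of the JMP output, and the function name chi_square_statistics is hypothetical.

```python
import math

def chi_square_statistics(table):
    """table: {row_level: {col_level: observed count}}.

    Returns (pearson, likelihood_ratio, df) for the null hypothesis
    of no association between the row and column variables.
    """
    rows = list(table)
    cols = list(table[rows[0]])
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    grand = sum(row_tot.values())

    pearson = 0.0
    likelihood_ratio = 0.0
    for r in rows:
        for c in cols:
            observed = table[r][c]
            expected = row_tot[r] * col_tot[c] / grand
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to this sum
                likelihood_ratio += 2 * observed * math.log(observed / expected)
    df = (len(rows) - 1) * (len(cols) - 1)
    return pearson, likelihood_ratio, df

# Toy counts, invented for illustration only (NOT the Car Poll data).
toy_table = {
    "American": {"Large": 6, "Medium": 3, "Small": 1},
    "European": {"Large": 1, "Medium": 2, "Small": 2},
    "Japanese": {"Large": 1, "Medium": 4, "Small": 5},
}

pearson, likelihood_ratio, df = chi_square_statistics(toy_table)
print(f"Pearson = {pearson:.3f}, Likelihood ratio = {likelihood_ratio:.3f}, df = {df}")
```

A 3 x 3 table such as size by country has (3 - 1) * (3 - 1) = 4 degrees of freedom.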

2) If the null hypothesis is rejected, what combination of country of origin and size of car contributes most to that conclusion?

Looking at the Contingency table with Deviation and Cell Chi Square values:

1) American cars are bigger than expected given the null hypothesis.

2) European cars have sizes consistent with the null hypothesis.

3) Japanese cars are smaller than expected given the null hypothesis.
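The Deviation and Cell Chi Square entries can be sketched as follows: the deviation is observed minus expected count, and the cell chi-square is the squared deviation divided by the expected count. The table below uses invented counts (not the Car Poll data), and the function name cell_contributions is hypothetical.

```python
def cell_contributions(table):
    """table: {row_level: {col_level: observed count}}.

    Returns {(row, col): (deviation, cell_chi_square)}, where the expected
    count under no association is row_total * col_total / grand_total.
    """
    rows = list(table)
    cols = list(table[rows[0]])
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    grand = sum(row_tot.values())

    result = {}
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / grand
            deviation = table[r][c] - expected
            result[(r, c)] = (deviation, deviation ** 2 / expected)
    return result

# Toy counts, invented for illustration only (NOT the Car Poll data).
toy_table = {
    "American": {"Large": 6, "Medium": 3, "Small": 1},
    "European": {"Large": 1, "Medium": 2, "Small": 2},
    "Japanese": {"Large": 1, "Medium": 4, "Small": 5},
}

contributions = cell_contributions(toy_table)
# The cell with the largest cell chi-square contributes most to the test.
largest = max(contributions, key=lambda cell: contributions[cell][1])
print("largest contribution:", largest, contributions[largest])
```

A positive deviation means more cars than expected in that cell, and a negative deviation means fewer. The sign is what supports statements such as "bigger than expected".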

Example of Generalized (Poisson) Regression

1)

2)

3)

Source        Nparm  DF  Wald ChiSquare  Prob > ChiSquare
country           2   2       59.981803           <.0001*
size*country      4   4       43.144129           <.0001*
size              2   2       35.680011           <.0001*
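As a quick check on the table above, the p-value for each Wald chi-square can be computed from the chi-square upper-tail probability. The sketch below uses the closed form that holds for even degrees of freedom, which covers the DF values of 2 and 4 reported here. The helper name chi2_sf_even_df is hypothetical, and odd DF would need a different method.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with EVEN df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# (Source, DF, Wald ChiSquare) copied from the effect tests table above.
effect_tests = [
    ("country", 2, 59.981803),
    ("size*country", 4, 43.144129),
    ("size", 2, 35.680011),
]

p_values = {}
for source, df, wald in effect_tests:
    p_values[source] = chi2_sf_even_df(wald, df)
    print(f"{source:<13} DF={df}  Wald={wald:.3f}  p<.0001: {p_values[source] < 0.0001}")
```

All three p-values fall well below 0.0001, in agreement with the reported <.0001 entries.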

4)

Use of KS Test to Compare Distributions

1)

2)

3)

4) A, B, and C (red is MPA): in all graphs the hypothesis is supported.
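For readers who want to see what the KS comparison measures, the two-sample KS statistic is the largest vertical distance between the two empirical cumulative distribution functions. The sketch below computes only that statistic, not a p-value. The samples are invented for illustration and do not correspond to the worksheet data, and the function name ks_statistic is hypothetical.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking every
    # pooled data point is enough to find the maximum distance.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Toy samples, invented for illustration only.
group_1 = [1, 2, 3, 4]
group_2 = [3, 4, 5, 6]
print("KS statistic:", ks_statistic(group_1, group_2))
```

A statistic near 0 means the two distributions look alike, and a statistic near 1 means they barely overlap.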