Trinity College, Dublin    Introduction to Statistics
Generic Skills Programme    Computer Laboratory 8
Trinity College, Dublin
Generic Skills Programme
Statistics for Research Students
Laboratory 8: Feedback
1 One-sample tests and confidence intervals for proportions
1.1 Assess target achievement
Heard of product:
Was the target achieved? Summarise the results in terms of estimated percentage achieved, confidence interval and significance test.
No. The estimated percentage achieved was 79%, 95% confidence interval was 76% to 82%. This does not cover 90%. Equivalently, the Z statistic value was -9.15, very highly significant.
Bought product
Was the target achieved? Summarise the results in terms of estimated percentage achieved, confidence interval and significance test.
No. The estimated percentage achieved was 50%, 95% confidence interval was 46% to 54%. This does not cover 60%. Equivalently, the Z statistic value was -5.12, very highly significant.
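The figures quoted in both answers can be reproduced with a short sketch. It assumes the usual normal approximation: the confidence interval uses a standard error based on the sample proportion, while the Z statistic uses a standard error based on the target value. The function name and the 1.96 multiplier are illustrative choices, not taken from the laboratory sheet.

```python
import math

def one_sample_proportion(successes, n, target, z_crit=1.96):
    """Return (estimate, (ci_low, ci_high), z) for a single proportion."""
    p_hat = successes / n
    # The confidence interval uses the standard error based on the sample proportion.
    se_ci = math.sqrt(p_hat * (1 - p_hat) / n)
    ci = (p_hat - z_crit * se_ci, p_hat + z_crit * se_ci)
    # The test statistic uses the standard error under the target (null) value.
    se_test = math.sqrt(target * (1 - target) / n)
    z = (p_hat - target) / se_test
    return p_hat, ci, z

heard = one_sample_proportion(515, 650, 0.90)   # 515 of 650 heard of the product
bought = one_sample_proportion(326, 650, 0.60)  # 326 of 650 bought the product

for label, (p_hat, (low, high), z) in (("Heard", heard), ("Bought", bought)):
    print(f"{label}: {p_hat:.0%}, 95% CI {low:.0%} to {high:.0%}, Z = {z:.2f}")
```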
1.2 Assess percentages that heard of product by Region
Make a simple summary of the regional breakdown.
Region / Sample Size / Heard of Product / Heard of Product, %
A / 200 / 164 / 82
B / 150 / 105 / 70
C / 300 / 246 / 82
Total / 650 / 515 / 79
Summarise the test results.
Region / Sample Size / Heard of Product, % / Z / Confidence Interval
A / 200 / 82 / –3.77 / 77 to 87
B / 150 / 70 / –8.16 / 63 to 77
C / 300 / 82 / –4.62 / 78 to 86
Total / 650 / 79 / –9.15 / 76 to 82
Compare the confidence interval widths, including that for the complete sample. Explain the differences in width.
B is widest (14), A is next (10), C is next (8), Total is narrowest (6).
These are in reverse order of sample size (150, 200, 300, 650): the sample size appears in the denominator of the standard error, so a larger sample gives a narrower interval.
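A minimal sketch of the width comparison, assuming the same normal-approximation interval with a multiplier of 1.96. The widths it prints are unrounded, so they differ slightly from the widths read off the rounded interval endpoints in the table, but the ordering is the same.

```python
import math

# (heard of product, sample size) for each region, from the summary table
samples = {"A": (164, 200), "B": (105, 150), "C": (246, 300), "Total": (515, 650)}

def ci_width(successes, n, z_crit=1.96):
    """Width of the 95% interval for a proportion, in percentage points."""
    p_hat = successes / n
    return 2 * z_crit * math.sqrt(p_hat * (1 - p_hat) / n) * 100

widths = {region: ci_width(x, n) for region, (x, n) in samples.items()}

# List regions from widest to narrowest interval.
ordered = sorted(widths, key=widths.get, reverse=True)
for region in ordered:
    print(f"{region}: n = {samples[region][1]}, width = {widths[region]:.1f}")
```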
Compare the sample proportions for Regions A and C, compare their z-values, explain.
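Proportions are the same, so deviations from 90 are the same, that is, the numerators of the Z statistics are the same. The Z value for Region C is larger in magnitude (–4.62 against –3.77) because the denominator of Z, the standard error, is smaller; the standard error is smaller because the sample size, which appears in its denominator, is larger (300 against 200).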
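Written out, assuming the standard error in the denominator is calculated from the target value of 90% (an assumption that reproduces the tabulated Z values), the two statistics are:

$$Z_A = \frac{0.82 - 0.90}{\sqrt{0.90 \times 0.10 / 200}} = \frac{-0.08}{0.0212} = -3.77, \qquad Z_C = \frac{0.82 - 0.90}{\sqrt{0.90 \times 0.10 / 300}} = \frac{-0.08}{0.0173} = -4.62$$

The numerators are identical; only the sample size under the square root differs.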
1.3 Graphical display
2 Chi-Square test of homogeneity of proportions
2.1 Testing the homogeneity of regional differences
Tabulated statistics: Region, Bought
Rows: Region Columns: Bought
N Y All
A 91 109 200
99.7 100.3 200.0
B 101 49 150
74.8 75.2 150.0
C 132 168 300
149.5 150.5 300.0
All 324 326 650
324.0 326.0 650.0
Cell Contents: Count
Expected count
Pearson Chi-Square = 23.961, DF = 2, P-Value = 0.000
Report on the statistical significance of the results; focus on Pearson Chi-Square.
$\chi^2 = 23.96 > \chi^2_{2,\,0.05} = 5.99$. p-value < 0.0005.
The result is highly statistically significant.
Note: The Pearson Chi-Square is the commonly used approach, based on a test statistic of the generic form $\sum (O - E)^2 / E$, where O and E are the observed and expected frequencies. The Likelihood Ratio Chi-Square reported along with the Pearson Chi-Square is an alternative that is approximately equivalent to the Pearson Chi-Square analysis, the approximation improving with increasing sample size. As the use of the Chi-Square frequency distribution for calculating critical values, p-values etc. is valid only for large samples, the use of the Likelihood Ratio method is redundant and is ignored here.
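The near-equivalence of the two statistics can be checked numerically. The sketch below computes both from the Region by Bought table; the likelihood ratio statistic is taken in its standard form, 2 times the sum of O ln(O / E), which is an assumption about what the software reports rather than something stated in the output above. Variable and function names are illustrative.

```python
import math

observed = [[91, 109], [101, 49], [132, 168]]  # rows A, B, C; columns N, Y

def expected_counts(table):
    """Expected counts under homogeneity: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)

# Pair each observed count with its expected count, cell by cell.
pairs = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

pearson = sum((o - e) ** 2 / e for o, e in pairs)
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in pairs)

print(f"Pearson = {pearson:.3f}, Likelihood Ratio = {likelihood_ratio:.3f}")
```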
Check that the Expected Buy frequencies are those shown in the Y column.
(326/650) x 200 = 100.308
(326/650) x 150 = 75.231
(326/650) x 300 = 150.462
Check that the Expected frequencies in each row add to the corresponding row sample size.
99.7 + 100.3 = 200
74.8 + 75.2 = 150
149.5 + 150.5 = 300
Check that the Expected frequencies in each column add to the corresponding column total.
99.7 + 74.8 + 149.5 = 324
100.3 + 75.2 + 150.5 = 326
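These checks are not a coincidence of this table. Writing $R_i$ for the row totals, $C_j$ for the column totals and $N$ for the overall sample size (notation introduced here for illustration only), the expected frequencies and their sums are:

$$E_{ij} = \frac{R_i \, C_j}{N}, \qquad \sum_j E_{ij} = \frac{R_i}{N} \sum_j C_j = R_i, \qquad \sum_i E_{ij} = \frac{C_j}{N} \sum_i R_i = C_j$$

For example, $E_{A,Y} = 200 \times 326 / 650 = 100.308$, and the Region A row sums to $200 \times (324 + 326) / 650 = 200$.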
Hence, explain the number of degrees of freedom associated with Chi-Square.
The calculation of Chi-Square involves the deviations of observed frequencies from expected frequencies. (The expected frequencies are, effectively, fitted values corresponding to the null hypothesis model. Thus, Observed – Expected correspond to Residuals). The degrees of freedom apply to these deviations.
In each row, Observed – Expected sum to 0; therefore, the deviations in the second column are determined by the deviations in the first column. Since the deviations in the first column also sum to 0, the first two of them determine the third. Hence, all six deviations are determined by these two (or, in fact, by any two), so there are 2 degrees of freedom.
More generally, this argument demonstrates that the deviations corresponding to any one row are determined by the deviations corresponding to the remaining rows, and the same for columns. Hence, with r rows and c columns, there are r – 1 "free" rows and c – 1 "free" columns, giving (r – 1) x (c – 1) "free" deviations.
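The argument can be illustrated with a small sketch that rebuilds all six deviations from two of them. The choice of the two "free" deviations (column N for Regions A and B) is arbitrary and made only for illustration.

```python
observed = [[91, 109], [101, 49], [132, 168]]  # rows A, B, C; columns N, Y

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Deviations (Observed - Expected) under the null hypothesis of homogeneity.
deviations = [
    [o - r * c / total for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]

# Treat two deviations as free: column N for Regions A and B.
d_a, d_b = deviations[0][0], deviations[1][0]
d_c = -(d_a + d_b)                            # column N deviations sum to 0
rebuilt = [[d, -d] for d in (d_a, d_b, d_c)]  # each row sums to 0

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print("degrees of freedom:", degrees_of_freedom)
for region, row in zip("ABC", rebuilt):
    print(region, [round(d, 1) for d in row])
```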
3 Two-sample tests of proportions
3.1 A two-sample test of regional differences
Test and CI for Two Proportions: Hear?, Region2
Event = Y
Region2 X N Sample p
AC 410 500 0.820000
B 105 150 0.700000
Difference = p (AC) - p (B)
Estimate for difference: 0.12
95% CI for difference: (0.0393028, 0.200697)
Test for difference = 0 (vs not = 0): Z = 3.18 P-Value = 0.001
Test and CI for Two Proportions: Bought, Region2
Event = Y
Region2 X N Sample p
AC 277 500 0.554000
B 49 150 0.326667
Difference = p (AC) - p (B)
Estimate for difference: 0.227333
95% CI for difference: (0.140550, 0.314117)
Test for difference = 0 (vs not = 0): Z = 4.88 P-Value = 0.000
Make a report of the test results.
The difference in percentages that heard of the product (bought the product) between Regions A and C and Region B is estimated to be 12% (23%); 95% confidence interval is 4% to 20%, (14% to 31%). In each case, the interval does not cover 0 and so the difference is statistically significant. Equivalently, the value of the Z statistic for testing the hypothesis of no difference between the percentages that heard of the product (bought the product) in Regions A and C and Region B is 3.18 (4.88). In each case, the calculated Z value exceeds the critical value of 2 (or 1.96), so that the result is statistically significant. Equivalently, in each case the corresponding p-value is 0.001 (<0.0005), smaller than 0.05, so that the result is statistically significant.
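The reported values can be reproduced with the following sketch. It assumes the confidence interval uses separate (unpooled) standard errors and the test uses a pooled standard error; that combination matches the Z values and intervals in the output above, but it is inferred from the numbers rather than stated in the output. The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_proportions(x1, n1, x2, n2, z_crit=1.96):
    """Return (difference, (ci_low, ci_high), z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Confidence interval: unpooled standard error of the difference.
    se_ci = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_ci, diff + z_crit * se_ci)
    # Test of no difference: pooled proportion in the standard error.
    pooled = (x1 + x2) / (n1 + n2)
    se_test = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_test
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, ci, z, p_value

hear = two_sample_proportions(410, 500, 105, 150)    # Regions A and C versus B
bought = two_sample_proportions(277, 500, 49, 150)

for label, (diff, (low, high), z, p) in (("Hear?", hear), ("Bought", bought)):
    print(f"{label}: difference = {diff:.3f}, "
          f"95% CI ({low:.3f}, {high:.3f}), Z = {z:.2f}, p = {p:.4f}")
```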
3.2 A Chi-Square two-sample test
Tabulated statistics: Region2, Hear?
Rows: Region2 Columns: Hear?
N Y All
AC 90 410 500
B 45 105 150
All 135 515 650
Cell Contents: Count
Pearson Chi-Square = 10.097, DF = 1, P-Value = 0.001
Demonstrate the equivalence of the 2-sample Z-test and the Pearson Chi-Square test (calculate the square root of the latter).
$\sqrt{10.097} = 3.178$, which agrees with Z = 3.18. P-value = 0.001 as before.
Identify the sample proportions of the 2-sample test with relevant entries in the 2x2 table.
p(AC) = 0.82 = 410 / 500
p(B) = 0.70 = 105 / 150
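The equivalence can also be verified directly from the 2x2 table. This sketch computes the Pearson Chi-Square from observed and expected counts and, separately, the two-sample Z statistic with a pooled standard error (the pooled form is an assumption, chosen because it reproduces Z = 3.18).

```python
import math

table = [[90, 410], [45, 105]]  # rows AC, B; columns N, Y

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
total = sum(row_totals)

# Pearson Chi-Square: sum of (Observed - Expected)^2 / Expected over the four cells.
chi_square = sum(
    (o - r * c / total) ** 2 / (r * c / total)
    for row, r in zip(table, row_totals)
    for o, c in zip(row, col_totals)
)

# Two-sample Z statistic for the Y column, using the pooled proportion.
p_ac = table[0][1] / row_totals[0]
p_b = table[1][1] / row_totals[1]
pooled = col_totals[1] / total
se = math.sqrt(pooled * (1 - pooled) * (1 / row_totals[0] + 1 / row_totals[1]))
z = (p_ac - p_b) / se

print(f"Chi-Square = {chi_square:.3f}, square root = {math.sqrt(chi_square):.3f}, Z = {z:.3f}")
```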
Explain the Chi-Square DF.
Once any single deviation of observed – expected is determined, the other three can be determined by subtraction from 0. More explicitly, calculating the expected frequencies as
(135/650) x 500 = 103.8 / (515/650) x 500 = 396.2
(135/650) x 150 = 31.2 / (515/650) x 150 = 118.8
leads to calculation of deviations of observed from expected as
90 – 103.8 = –13.8 / 410 – 396.2 = 13.8
45 – 31.2 = 13.8 / 105 – 118.8 = –13.8
showing just one value, 13.8, for the magnitude of the deviation.
The general formula (r – 1) x (c – 1) with r = 2 and c = 2 evaluates to 1.
4 Assessing homogeneity of patterns of proportions
Tabulated statistics: Region, Level
Rows: Region Columns: Level
B H N All
A 54.50 27.50 18.00 100.00
B 32.67 37.33 30.00 100.00
C 56.00 26.00 18.00 100.00
All 50.15 29.08 20.77 100.00
Cell Contents: % of Row
Summarise the variation between regional penetration patterns.
Regions A and C have almost identical penetration patterns: over half (about 55%) of the respondents had bought the product, over a quarter (about 27%) had heard of the product but had not bought it, and 18% had never heard of the product.
By contrast, just one third (33%) of respondents in Region B had bought the product, slightly more than a third (37%) had heard of the product but had not bought it, and slightly less than a third (30%) had never heard of the product.
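The row percentages in the table can be recovered from the underlying counts (the Region by Level counts used for the Chi-Square test in Section 4.2). A minimal sketch, with the level codes read as B = bought, H = heard but did not buy, N = never heard, as the summary above implies:

```python
levels = ["B", "H", "N"]
counts = {
    "A": [109, 55, 36],
    "B": [49, 56, 45],
    "C": [168, 78, 54],
}

def row_percentages(row):
    """Convert a row of counts to percentages of the row total."""
    total = sum(row)
    return [100 * value / total for value in row]

percentages = {region: row_percentages(row) for region, row in counts.items()}

# Overall pattern: add the counts column by column, then convert.
overall = row_percentages([sum(col) for col in zip(*counts.values())])

print("Region " + " ".join(f"{level:>6}" for level in levels))
for region, row in percentages.items():
    print(f"{region:<6} " + " ".join(f"{value:6.2f}" for value in row))
print("All    " + " ".join(f"{value:6.2f}" for value in overall))
```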
4.1 Graphical display
Discuss the variation patterns.
Regions A and C have almost identical profiles, whereas Region B has a much lower Bought percentage and much higher Heard but did not buy and Never heard percentages.
4.2 Chi-Square test
Tabulated statistics: Region, Level
Rows: Region Columns: Level
B H N All
A 109 55 36 200
B 49 56 45 150
C 168 78 54 300
All 326 189 135 650
Cell Contents: Count
Pearson Chi-Square = 24.608, DF = 4, P-Value = 0.000
Confirm the degrees of freedom for Chi-Square; explain. Calculate the 5% critical value. Report on the result of the Pearson Chi-Square test.
There are 3 rows and 3 columns. The deviations corresponding to one of each are determined by the remaining deviations, making up two rows and two columns, that is, four entries, accounting for 4 degrees of freedom. Equivalently, (r – 1) x (c – 1) = 2 x 2 = 4.
From the Calc menu, $\chi^2_{4,\,0.05}$ = 9.5.
The value of Chi-Square is 24.6, with 4 degrees of freedom. This exceeds the 5% critical value and so the result is statistically significant. Equivalently, the p-value is less than 0.0005 < 0.05. This is illustrated below.
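As a numerical cross-check of the critical value and p-value quoted above, independent of the Calc menu, the sketch below uses the closed form of the Chi-Square upper-tail probability that is available when the degrees of freedom are even; for 4 degrees of freedom it is exp(–x/2) x (1 + x/2). The critical value is then found by bisection. The function names are illustrative, and the approach only applies as written to 4 degrees of freedom.

```python
import math

def chi2_upper_tail_df4(x):
    """Upper-tail probability of the Chi-Square distribution with 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)

def chi2_critical_df4(alpha=0.05):
    """Find x with upper-tail probability alpha by bisection."""
    low, high = 0.0, 100.0
    for _ in range(200):
        mid = (low + high) / 2
        # The tail probability decreases as x increases.
        if chi2_upper_tail_df4(mid) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2

critical = chi2_critical_df4()
p_value = chi2_upper_tail_df4(24.608)

print(f"5% critical value = {critical:.2f}")
print(f"p-value for 24.608 = {p_value:.6f}")
```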