INDEPENDENT PROJECT
20 – some minor comments
Excellent!
1. Frequency distribution of a variable and bar graph of same variable. I chose the variable “Marital Status,” because it is a nominal variable.
Frequency table results for marital:
marital / Frequency / Relative Frequency / Percent / Cumulative Percent
Never married / 611 / 0.62860084 / 62.86008 / 62.9
Divorced / 152 / 0.1563786 / 15.63786 / 78.5
Separated / 123 / 0.12654321 / 12.654321 / 91.1
Married / 59 / 0.06069959 / 6.0699587 / 97.2
Widowed / 25 / 0.025720164 / 2.5720165 / 99.8
Don't know / 1 / 0.0010288066 / 0.10288066 / 99.9
Refused / 1 / 0.0010288066 / 0.10288066 / 100
I was able to put the categories into descending order for ease of viewing. The bar plot in StatCrunch, however, does not allow you to change the order of the categories of the chosen nominal variable. As you can see from the bar plot and the table, most of the participants in the study were never married (611, or 62.9%). 152 were divorced, 123 were separated, and only 59 were married. The widowed group totaled 25, which left 1 refusal and 1 person who didn’t know.
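As a cross-check on the table above, the following is a minimal sketch of how the relative frequency, percent, and cumulative percent columns can be recomputed from the raw counts. The variable names are illustrative and are not part of the StatCrunch output.

```python
# Counts copied from the frequency table above.
counts = {
    "Never married": 611,
    "Divorced": 152,
    "Separated": 123,
    "Married": 59,
    "Widowed": 25,
    "Don't know": 1,
    "Refused": 1,
}

total = sum(counts.values())  # number of respondents in the table

# Sort categories from most to least frequent, as in the table.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

rows = []
running = 0  # running count used for the cumulative column
for category, frequency in ordered:
    running += frequency
    rows.append({
        "category": category,
        "frequency": frequency,
        "relative": frequency / total,
        "percent": 100 * frequency / total,
        "cumulative_percent": 100 * running / total,
    })

for row in rows:
    print(f"{row['category']:<14} {row['frequency']:>4} "
          f"{row['relative']:.4f} {row['percent']:6.2f} "
          f"{row['cumulative_percent']:6.1f}")
```

The cumulative column is built from the running count rather than by adding rounded percents, so the last row comes out at exactly 100.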
2. Descriptives of a continuous variable. I chose BMI because it is a ratio variable for which we can determine the mean, median, mode, and standard deviation. As you can see from the HISTOGRAM, the distribution is unimodal, leptokurtic, and positively skewed. The red line represents a normal distribution. The mean is 29.22, the median is 28.03, the mode is 24.05, and the standard deviation is 7.40. The ordering mean > median > mode is consistent with a positive skew. For adults, an ideal BMI is between 18.5 and 24.9. A person with a BMI over 24.9 is considered overweight, and a person with a BMI under 18.5 is considered underweight. The mode of 24.05 is within the ideal BMI range, while the mean of 29.22 falls in the overweight range.
Summary statistics:
Column / Mean / Std. Dev. / Median / Mode
BMI / 29.22 / 7.40 / 28.03 / 24.05
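The positive skew can also be checked roughly from the summary statistics alone. The sketch below uses Pearson's first and second skewness coefficients, which are only approximate indicators and are not part of the StatCrunch output. The function name bmi_category is hypothetical, and its cutoffs are the ones quoted in the text.

```python
# Summary statistics copied from the table above.
mean, median, mode, std_dev = 29.22, 28.03, 24.05, 7.40

# Pearson's first (mode-based) and second (median-based) skewness
# coefficients; positive values suggest a right (positive) skew.
skew_mode = (mean - mode) / std_dev
skew_median = 3 * (mean - median) / std_dev


def bmi_category(bmi):
    """Classify an adult BMI using the cutoffs quoted in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi <= 24.9:
        return "ideal"
    return "overweight"


print(f"Skewness (mode-based):   {skew_mode:.2f}")
print(f"Skewness (median-based): {skew_median:.2f}")
print(f"Mode {mode} is {bmi_category(mode)}; mean {mean} is {bmi_category(mean)}")
```

Both coefficients are positive, which agrees with the shape described for the histogram. Kurtosis cannot be checked this way because it requires the raw data.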
3. Crosstabulation or contingency table of two variables. I chose Race/Ethnicity (nominal) as the independent variable (IV), which is in the columns. Poverty status, which is also nominal, is the dependent variable (DV) and is in the rows.
Contingency table results: Rows: Poverty status Columns: Race/Ethnicity
Cell format: Count (% of poverty-status row) (% of race/ethnicity column) (% of total)
Race/Ethnicity
Poverty Status / Black, not Hispanic / Hispanic / Not sure / Other / Refused / White, not Hispanic / Total
Above / 169 (77.52%) (21.78%) (17.48%) / 32 (14.68%) (26.23%) (3.309%) / 0 (0%) (0%) (0%) / 3 (1.376%) (21.43%) (0.3102%) / 0 (0%) (0%) (0%) / 14 (6.422%) (26.42%) (1.448%) / 218 (100.00%) (22.54%) (22.54%)
Below / 607 (81.04%) (78.22%) (62.77%) / 90 (12.02%) (73.77%) (9.307%) / 1 (0.1335%) (100%) (0.1034%) / 11 (1.469%) (78.57%) (1.138%) / 1 (0.1335%) (100%) (0.1034%) / 39 (5.207%) (73.58%) (4.033%) / 749 (100.00%) (77.46%) (77.46%)
Total / 776 (80.25%) (100.00%) (80.25%) / 122 (12.62%) (100.00%) (12.62%) / 1 (0.1034%) (100.00%) (0.1034%) / 14 (1.448%) (100.00%) (1.448%) / 1 (0.1034%) (100.00%) (0.1034%) / 53 (5.481%) (100.00%) (5.481%) / 967 (100.00%) (100.00%) (100.00%)
Per the table above, 77.46% of the women surveyed are below the poverty level. Of those in that category (spelling: category), 81.04% are Black, 12.02% are Hispanic, 5.21% are White, and the remaining 1.7% answered Other, Not sure, or Refused. Interestingly enough, the two women who refused or answered “not sure” were both below the poverty level.
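The three percentages in each cell of the contingency table can be reproduced from the counts. The following is a minimal sketch with illustrative names. It assumes the first percentage in each cell is the row percent (share of that poverty-status row), the second is the column percent (share of that race/ethnicity column), and the third is the percent of the grand total, which is what the arithmetic supports.

```python
# Observed counts copied from the contingency table (Total row/column omitted).
columns = ["Black, not Hispanic", "Hispanic", "Not sure",
           "Other", "Refused", "White, not Hispanic"]
observed = {
    "Above": [169, 32, 0, 3, 0, 14],
    "Below": [607, 90, 1, 11, 1, 39],
}

row_totals = {status: sum(cells) for status, cells in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())


def cell_percents(status, col_index):
    """Return (row %, column %, total %) for one cell of the table."""
    count = observed[status][col_index]
    return (
        100 * count / row_totals[status],
        100 * count / col_totals[col_index],
        100 * count / grand_total,
    )


# Percent of each race/ethnicity group that is below the poverty level.
for index, name in enumerate(columns):
    _, col_pct, _ = cell_percents("Below", index)
    print(f"{name:<22} {col_pct:6.2f}% below poverty")
```

The column percentages are the ones to compare when asking whether poverty status differs across groups, because each one is computed within a single race/ethnicity group.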
OK; take your analysis to the next step. Are there differences in poverty status for the various ethnic groups?
This student completed a Chi Square test with these results at the end of her project.
CHI SQUARE TESTING IS NOW REQUIRED IN THE INDEPENDENT PROJECT, QUESTION THREE (it was not required in this prior section)
Provide summary paragraphs of your contingency table, Chi Square testing, and results. For each and every question in this project: explain the results with technically correct statistics language, and then explain the results in plain English. What have we learned from this data?
4. ANOVA QUESTION
Background…the women’s physical and mental health status was measured using the Short Form 12 Health Survey, commonly referred to as the SF-12. The SF-12 is a 12-item scale providing a generic, multidimensional measure of health status, and has been used in numerous nursing and healthcare studies. Six of the 12 items measure physical health, and the remaining items measure mental health. The dataset includes the raw responses to all 12 items. The dataset includes a summary score for both physical and mental health. Scoring was based on standardization procedures in a national sample. These standard scores are based on a national average set to 50.0, with a standard deviation of 10.0. Thus, scores below 50 indicate a less favorable health status than that for a general adult population. The lower the score on the two SF-12 scales, the less favorable a person’s health or mental health status.
Comparison of the effect of three or more groups (single variable) on a single continuous variable.
Null Hypothesis: The population mean mental health scores are equal across all education levels.
Alternative Hypothesis: At least one education level has a population mean mental health score that differs from the others.
For this test, I will be using a one-way ANOVA. Assumptions of ANOVA: The dependent variable will be interval or ratio and will have a normal distribution within each group. The variances among the four groups will be the same.
I chose Education Level because I needed an independent variable with 3 or more independent groups, and I chose the mental health score because I needed a dependent variable that is interval or ratio. Degrees of freedom are 3 and 872. The p-value is 0.0025, which is less than 0.05 and therefore significant. The F statistic is 4.803, which is significant compared to the table F, which is somewhere close to 2.60. Therefore, I would reject the null hypothesis that there is no difference in the means of the populations. The mean of the group with a BA degree is higher than the others, although that group has only 5 women. So this could be an indication that higher education is associated with better outlooks for women as measured by mental health scores. Also noticeable is the fact that the mean scores increase as the education level increases.
Analysis of Variance results: Responses stored in Mental Health Score. Factors stored in Education Level. Factor means
Education Level / n / Mean / Std. Error
AA degree / 25 / 52.45752 / 1.8928332
BA degree / 5 / 53.443 / 1.8753874
Diploma or GED / 431 / 47.523396 / 0.52485675
No high school diploma / 415 / 45.79906 / 0.5228374
Excellent!
ANOVA table
Source / df / SS / MS / F-Stat / P-value
Treatments / 3 / 1656.3951 / 552.1317 / 4.8030834 / 0.0025
Error / 872 / 100239.54 / 114.9536
Total / 875 / 101895.93
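The ANOVA table can be partly reconstructed from the factor means. The sketch below is an illustration, not the StatCrunch procedure itself: it rebuilds the treatment sum of squares from the group sizes and means, takes the error sum of squares directly from the table (it cannot be recovered without the raw data), and recomputes the F statistic.

```python
# (n, mean) for each education level, copied from the factor means table.
groups = {
    "AA degree": (25, 52.45752),
    "BA degree": (5, 53.443),
    "Diploma or GED": (431, 47.523396),
    "No high school diploma": (415, 45.79906),
}
ss_error = 100239.54  # taken from the ANOVA table; needs raw data to recompute

n_total = sum(n for n, _ in groups.values())
grand_mean = sum(n * mean for n, mean in groups.values()) / n_total

# Between-groups (treatment) sum of squares: weighted squared deviations
# of each group mean from the grand mean.
ss_treatments = sum(n * (mean - grand_mean) ** 2 for n, mean in groups.values())

df_treatments = len(groups) - 1       # k - 1
df_error = n_total - len(groups)      # N - k

ms_treatments = ss_treatments / df_treatments
ms_error = ss_error / df_error
f_stat = ms_treatments / ms_error

print(f"SS treatments = {ss_treatments:.2f}, df = {df_treatments}")
print(f"MS error = {ms_error:.4f}, df = {df_error}")
print(f"F = {f_stat:.3f}")
```

The recomputed F can then be compared with the table F of roughly 2.60 cited in the write-up. A value well above that cutoff points to rejecting the null hypothesis at the 0.05 level.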
5. Scatterplot of two continuous variables. I chose a simple linear regression of mental health scores on age to see if there is a correlation. Both variables are continuous and measured at the interval or ratio level.
Simple linear regression results:
Dependent Variable: Mental Health
Independent Variable: age
Mental Health = 48.76867 - 0.05165268 age
Sample size: 876
R (correlation coefficient) = -0.0303
R-sq = 9.1892167E-4
Estimate of error standard deviation: 10.792525
Parameter estimates:
Parameter / Estimate / Std. Err. / Alternative / DF / T-Stat / P-Value
Intercept / 48.76867 / 2.1366122 / ≠ 0 / 874 / 22.825232 / <0.0001
Slope / -0.05165268 / 0.057610054 / ≠ 0 / 874 / -0.8965914 / 0.3702
Analysis of variance table for regression model:
Source / DF / SS / MS / F-stat / P-value
Model / 1 / 93.63438 / 93.63438 / 0.8038762 / 0.3702
Error / 874 / 101802.3 / 116.4786
Total / 875 / 101895.93
AGE VS. MENTAL HEALTH SCORES SCATTERPLOT
6. The results of the above scatterplot and simple linear regression are as follows: The F statistic is 0.804, which is not statistically significant compared to the table F of 3.92. The p-value is 0.3702, which is greater than 0.05, so the result is not significant. The correlation coefficient is r = -0.0303, a negative but very weak relationship. The coefficient of determination is 9.1892167E-4, so age explains less than 0.1% of the variance in mental health scores. Critical r for 500 = .088 and for 1000 = .062. The sample size is 876, which falls between the two. The absolute value of r is 0.03, which is below both critical values and therefore not significant. From these results we can conclude that there is no statistically significant linear correlation between age and mental health scores.
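The regression output is internally consistent, which can be shown with a short sketch. It uses standard identities for simple linear regression: R-sq is SS model over SS total, r carries the sign of the slope, the t statistic is the slope over its standard error, and F equals t squared. All inputs are copied from the output above; the variable names are illustrative.

```python
import math

# Values copied from the regression output above.
n = 876
slope, slope_se = -0.05165268, 0.057610054
ss_model, ss_total = 93.63438, 101895.93

# Coefficient of determination and correlation coefficient.
r_squared = ss_model / ss_total
r = math.copysign(math.sqrt(r_squared), slope)  # r takes the sign of the slope

# Two equivalent routes to the t statistic for the slope.
t_from_slope = slope / slope_se
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# With one predictor, the regression F statistic is the square of t.
f_stat = t_from_slope ** 2

print(f"R-sq = {r_squared:.6e}, r = {r:.4f}")
print(f"t (from slope) = {t_from_slope:.4f}, t (from r) = {t_from_r:.4f}")
print(f"F = {f_stat:.4f}")
```

Because the F test, the slope t test, and the test of r are the same test in simple linear regression, they share the single p-value of 0.3702 reported in the output.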
Question 6: You are required to analyze the direction, magnitude, and statistical significance of Pearson’s r value, the correlation coefficient. See pages 199 and 412.
CURIOSITY KILLED THE CAT! SEE TABLE BELOW. Null Hyp: There will be no difference among the groups of women with regard to education level and poverty. ALT HYPOTHESIS: EDUCATION LEVEL HAS AN EFFECT (not AFFECT; learn these two words) ON POVERTY LEVEL.
Contingency table results: Rows: Poverty Columns: Education
Cell format: Count (% of poverty-status row) (% of education-level column) (% of total) Expected count
EDUCATION LEVEL
POVERTY STATUS / BA degree / AA degree / Diploma or GED / No high school diploma / Total
Above poverty / 3 (1.376%) (60%) (0.3102%) 1.127 / 7 (3.211%) (25.93%) (0.7239%) 6.087 / 131 (60.09%) (27.35%) (13.55%) 108 / 77 (35.32%) (16.89%) (7.963%) 102.8 / 218 (100.00%) (22.54%) (22.54%)
Below poverty / 2 (0.267%) (40%) (0.2068%) 3.873 / 20 (2.67%) (74.07%) (2.068%) 20.91 / 348 (46.46%) (72.65%) (35.99%) 371 / 379 (50.6%) (83.11%) (39.19%) 353.2 / 749 (100.00%) (77.46%) (77.46%)
Total / 5 (0.5171%) (100.00%) (0.5171%) / 27 (2.792%) (100.00%) (2.792%) / 479 (49.53%) (100.00%) (49.53%) / 456 (47.16%) (100.00%) (47.16%) / 967 (100.00%) (100.00%) (100.00%)
Chi-Square test:
Statistic / DF / Value / P-value
Chi-square / 3 / 18.886633 / 0.0003
Critical Chi sq is 7.82 for 3 df. Stat Chi-sq is 18.89, which is statistically significant. WOW… The P-value of 0.0003 is less than 0.05, so it is significant. Half a percent of the total women have a BA, while 47% have no high school diploma or GED. The single largest group of women, 379, are below the poverty level and have no high school diploma or GED. How sad for women in this study and women in general. According to the analysis above, the null hypothesis should be rejected. Someone should share this study with the women who participated! The results may change for the better of some!
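The chi-square result can be reproduced by hand from the observed counts. The following is a minimal sketch using only the Python standard library. The helper chi_square_sf_df3 is a hypothetical name; it uses the closed-form upper-tail probability that holds for exactly 3 degrees of freedom, so it is not a general-purpose p-value routine.

```python
import math

# Observed counts (rows: above, below poverty; columns: BA, AA,
# Diploma or GED, No high school diploma), copied from the table above.
observed = [
    [3, 7, 131, 77],
    [2, 20, 348, 379],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi_square_sf_df3(chi_square)

# Count cells whose expected count falls below the common guideline of 5.
small_cells = sum(1 for row in expected for e in row if e < 5)

print(f"Chi-square = {chi_square:.4f}, df = {df}, p = {p_value:.4f}")
print(f"Cells with expected count below 5: {small_cells}")
```

One caution worth noting: both BA degree cells have expected counts below 5, and many textbooks suggest treating the chi-square approximation with care in that situation. Combining the BA and AA columns would be one possible way to address it, though that is a suggestion and not something done in the original analysis.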