My Data Choice

My Final

Destiny DelPapa

February 20, 2018

My Data Choice

I chose the depress data set to investigate the relationship between a participant's level of education and whether they were labeled as depressed, which is defined as having a CESD >= 16. The CESD is the combined score of each individual's self-reported answers to items C1-C20.
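As a minimal sketch of how that score and label are derived, the base R snippet below sums the twenty item scores and applies the CESD >= 16 cutoff. The item values are those of participants 7 and 9 as they appear in the str() output further down; the object names items, cesd_total, and depressed are illustrative and are not part of the data set.

```r
# Item scores c1..c20 for participants 7 and 9, copied from the str() output.
# The names `items`, `cesd_total`, and `depressed` are illustrative only.
items <- rbind(
  p7 = c(2, 1, 1, 2, 1, 0, 0, 2, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 1),
  p9 = c(0, 1, 1, 0, 0, 3, 0, 0, 0, 0, 0, 3, 0, 3, 2, 3, 0, 0, 0, 0)
)
colnames(items) <- paste0("c", 1:20)

# CESD is the row-wise sum of the twenty items.
cesd_total <- rowSums(items)

# A participant is labeled as depressed when CESD >= 16.
depressed <- as.integer(cesd_total >= 16)

print(data.frame(cesd = cesd_total, cases = depressed))
```

The totals come out to 15 and 16, which match the cesd column for those two rows, and the resulting labels (0 and 1) match the cases column.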

library(ggplot2)
library(dplyr)

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
## filter, lag

## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union

depress <- read.delim("C:/Users/Destiny/Documents/Math 130/depress_081217.txt", header = TRUE, sep = "\t")

How many participants were there?

str(depress)

## 'data.frame': 294 obs. of 37 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ sex : int 1 0 1 1 1 0 1 0 1 0 ...
## $ age : int 68 58 45 50 33 24 58 22 47 30 ...
## $ marital : Factor w/ 5 levels "Divorced","Married",..: 5 1 2 1 4 2 2 3 2 2 ...
## $ educat : Factor w/ 7 levels "<HS","BS","HS Grad",..: 7 6 3 3 3 3 7 3 3 7 ...
## $ employ : Factor w/ 7 levels "FT","Houseperson",..: 6 1 1 7 1 1 2 1 6 1 ...
## $ income : int 4 15 28 9 35 11 11 9 23 35 ...
## $ relig : int 1 1 1 1 1 1 1 1 2 4 ...
## $ c1 : int 0 0 0 0 0 0 2 0 0 0 ...
## $ c2 : int 0 0 0 0 0 0 1 1 1 0 ...
## $ c3 : int 0 1 0 0 0 0 1 2 1 0 ...
## $ c4 : int 0 0 0 0 0 0 2 0 0 0 ...
## $ c5 : int 0 0 1 1 0 0 1 2 0 0 ...
## $ c6 : int 0 0 0 1 0 0 0 1 3 0 ...
## $ c7 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ c8 : int 0 0 0 3 3 0 2 0 0 0 ...
## $ c9 : int 0 0 0 0 3 1 2 0 0 0 ...
## $ c10 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ c11 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ c12 : int 0 1 0 0 0 1 0 0 3 0 ...
## $ c13 : int 0 0 0 0 0 2 0 0 0 0 ...
## $ c14 : int 0 0 1 0 0 0 0 0 3 0 ...
## $ c15 : int 0 1 1 0 0 0 3 0 2 0 ...
## $ c16 : int 0 0 1 0 0 2 0 1 3 0 ...
## $ c17 : int 0 1 0 0 0 1 0 1 0 0 ...
## $ c18 : int 0 0 0 0 0 0 0 1 0 0 ...
## $ c19 : int 0 0 0 0 0 0 0 1 0 0 ...
## $ c20 : int 0 0 0 0 0 0 1 0 0 0 ...
## $ cesd : int 0 4 4 5 6 7 15 10 16 0 ...
## $ cases : int 0 0 0 0 0 0 0 0 1 0 ...
## $ drink : int 0 1 1 0 1 1 0 0 1 1 ...
## $ health : int 2 1 2 1 1 1 3 1 4 1 ...
## $ regdoc : int 1 1 1 1 1 1 1 0 1 1 ...
## $ treat : int 1 1 1 0 1 1 1 0 1 0 ...
## $ beddays : int 0 0 0 0 1 0 0 0 1 0 ...
## $ acuteill: int 0 0 0 0 1 1 1 1 0 0 ...
## $ chronill: int 1 1 0 1 0 1 1 0 1 0 ...

There were 294 participants.

What were the levels of education present?

depress$educat <- factor(depress$educat, levels = c("<HS", "Some HS", "HS Grad", "Some college", "BS", "MS", "PhD"))
ggplot(depress, aes(x = educat)) + geom_bar()

table(depress$educat)

##
##          <HS      Some HS      HS Grad Some college           BS
##            5           61          114           48           43
##           MS          PhD
##           14            9

educat.prop <- data.frame(prop.table(table(depress$educat)))
educat.prop

##           Var1       Freq
## 1          <HS 0.01700680
## 2      Some HS 0.20748299
## 3      HS Grad 0.38775510
## 4 Some college 0.16326531
## 5           BS 0.14625850
## 6           MS 0.04761905
## 7          PhD 0.03061224

High school graduates were the largest group, making up almost 39% of the participants, but every level of education was represented.
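To make the arithmetic behind that percentage explicit, here is a small base R sketch that rebuilds the proportions from the counts printed by table(depress$educat) above. The names educat_counts, educat_share, largest, and is_majority are my own; nothing else is assumed beyond those counts.

```r
# Counts copied from the table(depress$educat) output above.
educat_counts <- c("<HS" = 5, "Some HS" = 61, "HS Grad" = 114,
                   "Some college" = 48, "BS" = 43, "MS" = 14, "PhD" = 9)

# Total sample size and each level's share of it.
n_total <- sum(educat_counts)
educat_share <- educat_counts / n_total

# Level with the largest share, and whether that share exceeds one half.
largest <- names(which.max(educat_share))
is_majority <- max(educat_share) > 0.5

print(round(100 * educat_share, 1))
cat("Largest group:", largest, "- more than half:", is_majority, "\n")
```

HS Grad is the most common level at about 38.8%, which is still less than half of the 294 participants.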

What were the levels of CESD?

ggplot(depress, aes(x = cesd)) + geom_density(col = "purple") +
  geom_histogram(aes(y = ..density..), colour = "black", fill = NA)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Most participants appear to have had a CESD score between 0 and 10.
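One way to back up a reading like this with a number is to compute the share of scores that fall in the range. The sketch below does so for only the first ten cesd values shown in the str() output, so its result describes those ten rows and not the full sample; the names cesd_first10, in_range, and share_in_range are illustrative.

```r
# First ten cesd values, copied from the str() output (not the full data set).
cesd_first10 <- c(0, 4, 4, 5, 6, 7, 15, 10, 16, 0)

# TRUE where a score lies between 0 and 10 inclusive.
in_range <- cesd_first10 >= 0 & cesd_first10 <= 10

# The mean of a logical vector is the proportion of TRUE values.
share_in_range <- mean(in_range)
print(share_in_range)
```

On the full data, the same idea would be written as mean(depress$cesd >= 0 & depress$cesd <= 10).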

What is the average CESD score?

mean(depress$cesd)

## [1] 8.884354

The average CESD score is 8.88.

What is the relationship between the participants' levels of education and their CESD scores?

ggplot(depress, aes(x=educat, y=cesd, fill=educat)) + geom_violin(alpha=.1) +
geom_boxplot(alpha=.5, width=.2)

Summary

From these charts, I hypothesize that the participants who scored highest on the CESD overall primarily had some high school education, and that those who scored lowest had an education that ended before high school. Because only 5 participants fall in the <HS group, the pattern for that group is tentative.
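A numeric summary by education level could help check this hypothesis alongside the plots. The sketch below uses a small made-up data frame (toy, whose values are invented for illustration and are not from the depress data) to show the per-group size, median CESD, and share of participants with CESD >= 16. On the real data, depress would take the place of toy.

```r
# Made-up example values, NOT taken from the depress data set.
toy <- data.frame(
  educat = factor(c("Some HS", "Some HS", "Some HS", "HS Grad", "HS Grad", "BS"),
                  levels = c("Some HS", "HS Grad", "BS")),
  cesd   = c(20, 4, 18, 6, 16, 2)
)

# Per-level group size, median score, and share at or above the cutoff of 16.
by_educat <- data.frame(
  n           = as.vector(table(toy$educat)),
  median_cesd = as.vector(tapply(toy$cesd, toy$educat, median)),
  share_16up  = as.vector(tapply(toy$cesd >= 16, toy$educat, mean)),
  row.names   = levels(toy$educat)
)
print(by_educat)
```

Reporting n next to each median matters here, since the table of education levels shows only 5 participants in the <HS group.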