Bivariate Graphs
Beatrice Villegas, Eduardo Castro
9/26/2017
Intro
In these graphs we explore the relationships among smoking, college, and gender, as well as dependence on cigars.
## mydata$S1Q7D
## Frequency Percent Valid Percent
## 1 1000 2.3206 16.036
## 2 155 0.3597 2.486
## 3 691 1.6035 11.081
## 4 501 1.1626 8.034
## 5 805 1.8681 12.909
## 6 581 1.3482 9.317
## 7 416 0.9654 6.671
## 8 103 0.2390 1.652
## 9 347 0.8052 5.564
## 10 240 0.5569 3.849
## 11 245 0.5685 3.929
## 12 1152 2.6733 18.473
## NA's 36857 85.5290
## Total 43093 100.0000 100.000
# Count respondents by sex; SEX has no missing values, so na.omit() is unnecessary.
table(mydata$SEX)
##
## 1 2
## 18518 24575
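The counts above cover all 43093 respondents, including those with no recorded S1Q7D value. If the goal is to count sex only among respondents who have an S1Q7D value, one possible approach is to subset with `complete.cases()` first. This is a minimal sketch: the `toy` data frame is made up for illustration and is not the survey data.

```r
# Hypothetical stand-in for mydata; values are invented for illustration.
toy <- data.frame(
  SEX   = c(1, 2, 2, 1, 2, 1),
  S1Q7D = c(3, NA, 12, NA, 1, 5)
)

# Keep only rows where both variables are observed.
both_observed <- toy[complete.cases(toy[, c("SEX", "S1Q7D")]), ]

# Count sex among the remaining rows.
sex_counts <- table(both_observed$SEX)
print(sex_counts)
```

Swapping `toy` for `mydata` would apply the same filter to the real variables.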
cc <- table(mydata$SEX, mydata$S1Q7D)
barplot(cc,
        main = "quick side by side barchart using base graphics",
        beside = TRUE, col = rainbow(2), legend = rownames(cc),
        xlab = "Grade Level during 2000-2001 school year",
        ylab = "Number of students")
Description: This graph shows the number of students enrolled at each grade level, split by gender. Grade level runs from 1 (no formal schooling) to 12 (the highest level: a professional degree or anything above a master's degree). Red corresponds to males and blue to females. The graph shows that females are more likely than males to be enrolled in school, which is supported by the column percentages from our round(prop.table(table(mydata$SEX, mydata$S1Q7D), margin=2), 3) code.
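To illustrate what `margin=2` does in the `prop.table()` call mentioned above, here is a small sketch. The `toy` data frame is invented for the example and only borrows the variable names; it is not the survey data.

```r
# Hypothetical stand-in for mydata; values are invented for illustration.
toy <- data.frame(
  SEX   = c(1, 1, 2, 2, 2, 1, 2, 2),
  S1Q7D = c(1, 12, 1, 12, 12, 5, 5, 5)
)

counts <- table(toy$SEX, toy$S1Q7D)

# margin = 2 divides each cell by its column total, so every grade-level
# column sums to 1 and each cell is the share of that sex within the level.
col_props <- round(prop.table(counts, margin = 2), 3)
print(col_props)
```

Because each column sums to 1, a value above 0.5 in the row for sex 2 means that sex makes up more than half of that grade level.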
ggplot(mydata, aes(x = S3AQ2A1, y = AGE)) +
  geom_point() +
  geom_smooth() +
  ylab("Age") +
  xlab("Age when first full cig was smoked") +
  ggtitle("Age vs Age first smoked full cigarette")
## `geom_smooth()` using method = 'gam'
## Warning: Removed 25337 rows containing non-finite values (stat_smooth).
## Warning: Removed 25337 rows containing missing values (geom_point).
Description: This graph plots each respondent's age against the age at which they first smoked a full cigarette. The points cluster toward the younger side, meaning that most respondents smoked their first full cigarette at a young age. In the output of the round(prop.table(table(mydata$AGE, mydata$S3AQ2A1), margin=2), 3) code, many of the cells are 0 because no one of that age in this data set smoked their first full cigarette at the corresponding age.
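The warnings printed with the plot report that 25337 rows were dropped because of missing values. A minimal sketch of how such a count could be checked before plotting is shown here, using a made-up `toy` data frame rather than the survey data:

```r
# Hypothetical stand-in for mydata; values are invented for illustration.
toy <- data.frame(
  AGE     = c(25, 40, 33, 61, 19),
  S3AQ2A1 = c(16, NA, 18, NA, 15)
)

# A row cannot be plotted if either coordinate is missing or non-finite.
dropped <- !is.finite(toy$AGE) | !is.finite(toy$S3AQ2A1)
n_dropped <- sum(dropped)
print(n_dropped)

# Removing those rows up front gives the plotting functions complete data.
plot_data <- toy[!dropped, ]
```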
## na.omit(mydata$S3BD5Q2C)
## Frequency Percent
## 1 227 14.268
## 2 121 7.605
## 3 117 7.354
## 4 206 12.948
## 5 175 10.999
## 6 165 10.371
## 7 105 6.600
## 8 192 12.068
## 9 149 9.365
## 10 134 8.422
## Total 1591 100.000
ggplot(mydata, aes(x = AGE, y = S3BD5Q2C)) +
  geom_point() +
  ylab("How often used cannabis in the past 12 months") +
  xlab("AGE") +
  ggtitle("Usage of cannabis in relation to age")
The purpose of this graph is to show the relationship between cannabis dependence and age. Age is on the x-axis and dependence is on the y-axis. Dependence is coded from 1 (very dependent) to 10 (smoking only once a year). In our table(mydata$AGE, mydata$S3BD5Q2C) output, one can see that dependence on cannabis is most likely between the ages of 18 and 27, and that it becomes less frequent as age increases.
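One way to make the age pattern easier to read from a table is to group ages into bands before cross-tabulating. The sketch below assumes bands of 18-27, 28-44, and 45+, chosen only for illustration, and uses an invented `toy` data frame rather than the survey data.

```r
# Hypothetical stand-in for mydata; values are invented for illustration.
toy <- data.frame(
  AGE      = c(18, 22, 27, 35, 44, 52, 19, 63),
  S3BD5Q2C = c(1, 2, 1, 6, 9, 10, 3, NA)
)

# Group ages into bands; right = FALSE makes each band include its lower bound.
toy$age_group <- cut(toy$AGE,
                     breaks = c(18, 28, 45, Inf),
                     labels = c("18-27", "28-44", "45+"),
                     right = FALSE)

# Cross-tabulate age band against the frequency code.
# table() drops rows with a missing code by default.
by_group <- table(toy$age_group, toy$S3BD5Q2C)
print(by_group)
```

Row totals from `rowSums(by_group)` then show how many respondents with a recorded code fall in each age band.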
barplot(cc,
        main = "quick side by side barchart using base graphics",
        beside = TRUE, col = rainbow(2), legend = rownames(cc),
        xlab = "Use of cannabis in the past 12 months only, used cannabis before the past 12 months only",
        ylab = "Highest grade completed in school")