Homework 3
- Suppose you have the following data:
age: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80
weight: 30, 55, 105, 110, 115, 140, 190, 170, 120, 128, 165, 132, 174, 201, 133, 164
a) Create a data frame called “ageweight” in which the first column is age and the second column is weight. (20pts)
b) Create a new function called “newage” that creates a new variable, age_cat, defined as follows:
1 if age is 20 or less
2 if 20<age≤50
3 otherwise
(15pts)
c) Use the “newage” function on the age variable in the ageweight dataset to create the age_cat variable. (15pts)
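One possible sketch of parts a) to c) in base R. Here `newage` takes an age vector and returns the category vector, which is then assigned as a new column; a version that takes and returns the whole data frame would work equally well. Because `ifelse()` is vectorized, the function handles the whole column in one call.

```r
# Part a): build the data frame from the two vectors given above
age    <- seq(5, 80, by = 5)
weight <- c(30, 55, 105, 110, 115, 140, 190, 170, 120, 128, 165, 132, 174, 201, 133, 164)
ageweight <- data.frame(age = age, weight = weight)

# Part b): 1 if age <= 20, 2 if 20 < age <= 50, 3 otherwise.
# The inner ifelse() is only reached when age > 20, so "age <= 50" there means 20 < age <= 50.
newage <- function(age) {
  ifelse(age <= 20, 1,
         ifelse(age <= 50, 2, 3))
}

# Part c): apply the function to the age column to create age_cat
ageweight$age_cat <- newage(ageweight$age)
table(ageweight$age_cat)  # counts: 4 in category 1, 6 in category 2, 6 in category 3
```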
- For the following questions, you will use the fram_heart.csv dataset, which is described below:
Obesity coexists with a variety of cardiovascular risk factors and has been related to greater cardiovascular risk in a variety of observational studies. In this simulated data based on the Framingham Heart Study, we use BMI to determine the relationship between obesity and vascular disease end points. In this study, BMI and cardiovascular risk factors were measured at the initial examination; patients were then followed for 15 years to see whether any of the following vascular diseases occurred: angina pectoris, myocardial infarction, coronary heart disease, diabetes, and stroke. All patients included in this study were free of vascular disease at the initial examination.
Variable Name / Description
ID / Patient’s ID Number
Age / Patient’s age at initial examination
Weight / Patient’s weight at initial examination in kg, -99 = missing
Height / Patient’s height at initial examination in m, -9 = missing
Gender / 1=Female, 0=Male
Cardiovascular Risk Factors
Smoke / 1= Smoker, 0= Nonsmoker
Hypertension / 1= Yes, 0 = No
Hypercholesterolemia / 1= Yes, 0 = No
Vascular Diseases
Diabetes / 1= Developed during follow-up, 0= Did not develop, 999=missing
Angina Pectoris / 1= Developed during follow-up, 0= Did not develop
Myocardial Infarction / 1= Developed during follow-up, 0= Did not develop, 999=missing
Coronary Heart Disease / 1= Developed during follow-up, 0= Did not develop
Stroke / 1= Developed during follow-up, 0= Did not develop
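BMI is derived from the Weight and Height variables above with the standard formula BMI = weight (kg) / height (m)². A minimal R sketch, using made-up toy values rather than the real data. The missing-value codes (-99 for weight, -9 for height) must be converted to NA first; otherwise they produce meaningless BMI values instead of NA.

```r
# Toy values for illustration only (not taken from fram_heart.csv)
weight_kg <- c(70, 95, NA)      # third weight already recoded from -99 to NA
height_m  <- c(1.75, 1.80, 1.60)

# BMI = weight in kg divided by the square of height in m; NA propagates
bmi <- weight_kg / height_m^2
round(bmi, 1)  # 22.9 29.3 NA
```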
a) Read in the fram_heart dataset. (10pts)
b) Use conditional indexing to replace the missing-value codes (-99 for weight, -9 for height, 999 for diabetes and myocardial infarction) with NA. (Hint: this technique was used for the months in the airquality dataset in the inclasscode_3.R script.) (20pts)
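A sketch of parts a) and b) in base R. Since fram_heart.csv is not reproduced here, the code builds a small hypothetical stand-in; the column names Weight, Height, Diabetes and MI are assumptions, so check `names(fram_heart)` on the real file and adjust. The missing-value codes come from the variable table above.

```r
# Reading the real file would look like this (assumes it is in the working directory):
# fram_heart <- read.csv("fram_heart.csv")

# Hypothetical toy stand-in; column names are assumed, not taken from the real file
fram_heart <- data.frame(
  Weight   = c(80, -99, 65, 72),
  Height   = c(1.70, 1.82, -9, 1.65),
  Diabetes = c(0, 1, 999, 0),
  MI       = c(999, 0, 0, 1)
)

# Conditional indexing: the logical vector selects the entries holding the
# missing-value code, and those entries are overwritten with NA
fram_heart$Weight[fram_heart$Weight == -99]     <- NA
fram_heart$Height[fram_heart$Height == -9]      <- NA
fram_heart$Diabetes[fram_heart$Diabetes == 999] <- NA
fram_heart$MI[fram_heart$MI == 999]             <- NA

colSums(is.na(fram_heart))  # one NA per column in this toy data
```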
- Create a dataset called “cardio” that contains only the participants from the fram_heart dataset who had at least one of the cardiovascular risk factors (smoke, hypertension, or hypercholesterolemia). (Hint: You can either use an ifelse statement with multiple conditions OR first use the apply function to sum the smoke, hypertension and hypercholesterolemia variables and then keep only those rows where the sum is greater than 0, i.e., at least 1.) (20pts)
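A sketch of both approaches from the hint, again on a hypothetical toy data frame whose column names are assumptions. Because “any” risk factor means at least one indicator equals 1, the row sum must be greater than 0.

```r
# Hypothetical toy stand-in; column names are assumed, check names(fram_heart)
fram_heart <- data.frame(
  ID                   = 1:5,
  Smoke                = c(0, 1, 0, 1, 0),
  Hypertension         = c(0, 0, 1, 1, 0),
  Hypercholesterolemia = c(0, 0, 0, 1, 1)
)
risk_vars <- c("Smoke", "Hypertension", "Hypercholesterolemia")

# Approach 1: ifelse() with multiple conditions joined by | (logical OR)
any_risk  <- ifelse(fram_heart$Smoke == 1 |
                    fram_heart$Hypertension == 1 |
                    fram_heart$Hypercholesterolemia == 1, 1, 0)
cardio_if <- fram_heart[any_risk == 1, ]

# Approach 2: sum the three indicators row by row (MARGIN = 1) with apply()
n_risk <- apply(fram_heart[, risk_vars], 1, sum)
cardio <- fram_heart[n_risk > 0, ]

cardio$ID  # 2 3 4 5
```

With `n_risk > 1` instead, only ID 4 (who has all three factors) would be kept, which is why the threshold matters. `rowSums(fram_heart[, risk_vars])` is a vectorized equivalent of the `apply()` call.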
Extra Credit: On Blackboard and the website there is a document called 720_sim that details a simulation in R. If you are interested, read it, read it again, and then tell me what is happening in the R code. I will give 5 extra points for the explanation of each of the two operations shown in the code.