Assignment #9 – Biostatistics (36 points)
1. Leukemia Study and the Potential Confounding Effect of Sex
These data consist of remission survival times for 42 leukemia patients, half of whom received a new treatment therapy and the other half a standard treatment therapy. Two potential confounding factors are the log base 2 of the subject's white blood cell count (high counts are bad) and the sex of the subject. We looked at the treatment effect in class; you will now examine remission time differences between males and females. A + after a time marks a censored observation.
Males
5, 6+, 6, 6, 6, 7, 8, 8, 8, 9+, 10+, 10, 11+, 11, 11, 12, 12, 13, 15, 17+, 17, 19+, 22
Females
1, 1, 2, 2, 3, 4, 4, 5, 6, 8, 16, 20+, 22, 23, 23, 25+, 32+, 32+, 34+, 35+
a) Construct a Kaplan-Meier estimate of the survivor function, S(t), for the males only. Do this by constructing a table as done in class, then plot the survivor function estimate. (4 pts.)
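The product-limit table in part (a) can be sketched in base R as follows. This is a minimal illustration, not the layout used in class: km_table is a hypothetical helper name, and the 0/1 status coding (1 = event, 0 = censored) is an assumption made for the sketch.

```r
# Male remission times from the assignment; status 1 = event, 0 = censored ("+").
time   <- c(5, 6, 6, 6, 6, 7, 8, 8, 8, 9, 10, 10, 11, 11, 11, 12, 12, 13, 15, 17, 17, 19, 22)
status <- c(1, 0, 1, 1, 1, 1, 1, 1, 1, 0,  0,  1,  0,  1,  1,  1,  1,  1,  1,  0,  1,  0,  1)

# km_table is a hypothetical helper (not from any package).
# For each distinct event time it counts the subjects still at risk and the
# events, then multiplies the conditional survival proportions together.
km_table <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  n_risk  <- sapply(event_times, function(t) sum(time >= t))
  n_event <- sapply(event_times, function(t) sum(time == t & status == 1))
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event,
             surv = cumprod(1 - n_event / n_risk))
}

males <- km_table(time, status)
print(males)

# Step plot of the estimate, starting at S(0) = 1.
plot(c(0, males$time), c(1, males$surv), type = "s", ylim = c(0, 1),
     xlab = "Remission time", ylab = "Estimated S(t), males")
```

Subjects censored at a tied time (for example 6+) are counted as still at risk at that time, which is the usual convention.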
b) Enter these data into JMP or R and plot the survivor functions for males and females
on the same plot. (3 pts.)
c) Report the results of the log-rank test. What are the implications of this finding when
we consider the possible confounding effects of sex? (2 pts.)
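For parts (b) and (c), one possible R workflow uses survfit and survdiff from the survival package. The data frame name leuk and the 0/1 status coding are assumptions made for this sketch; the times are entered exactly as listed above.

```r
library(survival)

# Times as listed in the assignment; status 1 = event, 0 = censored ("+").
male_time     <- c(5, 6, 6, 6, 6, 7, 8, 8, 8, 9, 10, 10, 11, 11, 11, 12, 12, 13, 15, 17, 17, 19, 22)
male_status   <- c(1, 0, 1, 1, 1, 1, 1, 1, 1, 0,  0,  1,  0,  1,  1,  1,  1,  1,  1,  0,  1,  0,  1)
female_time   <- c(1, 1, 2, 2, 3, 4, 4, 5, 6, 8, 16, 20, 22, 23, 23, 25, 32, 32, 34, 35)
female_status <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  1,  0,  1,  1,  1,  0,  0,  0,  0,  0)

leuk <- data.frame(
  time   = c(male_time, female_time),
  status = c(male_status, female_status),
  sex    = factor(rep(c("Male", "Female"),
                      times = c(length(male_time), length(female_time))))
)

# Kaplan-Meier curves for both sexes on one plot.
km <- survfit(Surv(time, status) ~ sex, data = leuk)
plot(km, lty = 1:2, xlab = "Remission time", ylab = "Estimated S(t)")
legend("topright", legend = names(km$strata), lty = 1:2)

# Log-rank test; the p-value is recomputed from the chi-square statistic.
lr <- survdiff(Surv(time, status) ~ sex, data = leuk)
print(lr)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
p_value
```

The legend labels are taken from the fitted object, so each line type stays matched to its group regardless of factor level order.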
2. Mayo Clinic Lung Cancer Data
Description: Survival experience in patients with lung cancer at the Mayo Clinic. Performance scores rate how well the patient can perform usual daily activities.
Variable Names:
inst: Institution code - (DO NOT USE IN YOUR ANALYSIS)
time: Survival time in days
status: censoring status 1=censored, 2=dead
(has the same effect as 0=censor,1=dead)
age: Age in years
sex: Male=1 Female=2
ph.ecog: ECOG performance score (0=good 5=dead)
ph.karno: Karnofsky performance score (bad=0-good=100) rated
by physician
pat.karno: Karnofsky performance score rated by patient
meal.cal: Calories consumed at meals
wt.loss: Weight loss in last six months
Use the data frame lung from the survival package.
a) Perform a test to see if the survival experience differs by gender. Obtain and plot the Kaplan-Meier estimates for both genders. Discuss. Be sure to make sex a factor variable before beginning your analysis, i.e. sex = as.factor(sex). (4 pts.)
Also note that there is only one patient classified as having an ECOG performance score (ph.ecog) of 3 or more. We might want to dichotomize this covariate to those with a performance score of 2 or more vs. those with a score of 0 or 1.
> table(ph.ecog)
 0  1  2  3
47 81 38  1
> ph.ecog2 = ph.ecog > 1
> ph.ecog2   # USE THIS IN PLACE OF ph.ecog IN YOUR ANALYSES IN (b)
[1] FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE …
b) Fit a Cox proportional hazards model using the available predictors. Start by fitting the full model and eliminating potential covariates until the p-values are all below . Interpret the estimated parameters in your final model in terms of hazard ratios (HR) and include confidence intervals (CIs). For any continuous predictors in your final model, pick a reasonable increment c to use for estimating the HR. (10 pts.)
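The hazard ratio for a c-unit increase in a continuous predictor is exp(c × coefficient), and its confidence interval comes from exponentiating c times the interval for the coefficient. A minimal sketch follows; the three-covariate model and the increment of 10 years are arbitrary choices for illustration, not the intended final model.

```r
library(survival)

# Complete cases on the variables used here; lung2 is a name chosen for this sketch.
lung2 <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
lung2$sex      <- as.factor(lung2$sex)
lung2$ph.ecog2 <- lung2$ph.ecog > 1

# Illustrative model only (an assumption, not the result of model selection).
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog2, data = lung2)
summary(fit)

# HR and 95% CI for a c-year increase in age (c_inc = 10 is an assumed increment).
c_inc <- 10
b     <- coef(fit)["age"]
se    <- sqrt(vcov(fit)["age", "age"])
hr    <- exp(c_inc * b)
ci    <- exp(c_inc * (b + c(-1, 1) * qnorm(0.975) * se))
round(c(HR = unname(hr), lower = ci[1], upper = ci[2]), 3)
```

For the categorical terms, exp(coef) and its interval in the summary output are already hazard ratios relative to the reference level.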
c) Look at the three diagnostic methods discussed in class and also check the proportional hazards assumption for your final model. Discuss any model deficiencies/problems these plots and tests suggest. (3 pts.)
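The three diagnostic methods from class are not named here, so the sketch below shows only two commonly used checks: martingale residuals against a continuous covariate, and the cox.zph test of proportional hazards based on scaled Schoenfeld residuals. The model is an illustrative assumption, not the final model.

```r
library(survival)

lung2 <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
lung2$sex      <- as.factor(lung2$sex)
lung2$ph.ecog2 <- lung2$ph.ecog > 1

# Illustrative model only.
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog2, data = lung2)

# Martingale residuals vs. age; a clear trend in the smooth suggests the
# functional form of age may be misspecified.
mres <- residuals(fit, type = "martingale")
plot(lung2$age, mres, xlab = "Age", ylab = "Martingale residual")
lines(lowess(lung2$age, mres))

# Test of proportional hazards: one row per term plus a GLOBAL row.
zph <- cox.zph(fit)
print(zph)
plot(zph)   # one smoothed residual plot per term
```

Small p-values in the cox.zph table, or a clearly non-flat smooth in the corresponding plot, point to a possible violation for that term.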
d) Refit your “final” model using sex as a stratification variable. Do the parameter estimates for the other predictors in your final model change? (2 pts.)
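Stratifying on sex can be sketched with strata() in the model formula, which gives each sex its own baseline hazard while the other coefficients are shared. The covariates below are again an illustrative assumption.

```r
library(survival)

lung2 <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
lung2$sex      <- as.factor(lung2$sex)
lung2$ph.ecog2 <- lung2$ph.ecog > 1

# Sex as a covariate vs. sex as a stratification variable.
fit       <- coxph(Surv(time, status) ~ age + ph.ecog2 + sex, data = lung2)
fit_strat <- coxph(Surv(time, status) ~ age + ph.ecog2 + strata(sex), data = lung2)

# Side-by-side coefficients for the predictors common to both models.
comparison <- cbind(unstratified = coef(fit)[names(coef(fit_strat))],
                    stratified   = coef(fit_strat))
round(comparison, 4)
```

The stratified fit has no coefficient for sex, so only the remaining predictors can be compared.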
e) Construct a plot of the survival curves for different cohorts of patients. Choose these curves so that they illustrate the effect of the continuous covariates in your final model. Carefully label the plots so the cohorts are identifiable from the plots. For any particular set of predictor values in your model you will get two curves, one for males and one for females, so you will always get two cohorts determined by gender automatically. Briefly discuss each of your plots. (8 pts.)
3. University of Massachusetts – AIDS Research Unit (UMARU)
* IV Drug User Study Revisited *
Variable   Description                          Codes/Values
ID         Identification Code                  1 - 628
Age        Age at Enrollment                    Years
Beck       Beck Depression Score                0.000 - 54.000
Hercoc     Heroin/Cocaine Use During            1 = Heroin & Cocaine
           3 Months Prior to Admission          2 = Heroin Only
                                                3 = Cocaine Only
                                                4 = Neither Heroin nor Cocaine
ivhx       History of IV Drug Use               1 = Never
                                                2 = Previous
                                                3 = Recent
ndrugtx    Number of Prior Drug Treatments      0 - 40
race       Subject's Race                       0 = White
                                                1 = Non-White
treat      Treatment Randomization Assignment   0 = Short
                                                1 = Long
site       Treatment Site                       0 = A
                                                1 = B
LOS        Length of Stay in Treatment          Days
           (Admission Date to Exit Date)
Time       Time to Drug Relapse                 Days
           (Measured from Admission Date)
Status     Event for Treating Lost to           1 = Returned to Drugs or Lost to Follow-Up
           Follow-Up as Returned to Drugs       0 = Otherwise
a) Examine Kaplan-Meier estimates/plots and results of log-rank tests for the categorical covariates: Hercoc, ivhx, race, treat, site. Label your plots. Summarize your findings. (15 pts.)
b) Build a Cox model using the categorical covariates from part (a) along with the continuous covariates: Age, Beck, and ndrugtx. Transforming ndrugtx to the log base 2 scale is recommended.
> logdrug <- log(ndrugtx + 1, base=2)   # + 1 because ndrugtx can be 0 and log(0) is -Inf
Attempt to simplify your model by deleting terms that do not appear to be important. Justify your “final model” via a general chi-square test for comparing nested models. (5 pts.)
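The general chi-square test for nested Cox models compares twice the difference in maximized log partial likelihoods to a chi-square distribution, with degrees of freedom equal to the number of dropped coefficients. Because the UMARU data are not shipped with the survival package, the sketch below uses the lung data purely as a stand-in; the choice of full and reduced models is an assumption for illustration.

```r
library(survival)

# Both models must be fit to the same rows, so drop incomplete cases first.
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog", "wt.loss")])

full    <- coxph(Surv(time, status) ~ age + sex + ph.ecog + wt.loss, data = d)
reduced <- coxph(Surv(time, status) ~ sex + ph.ecog, data = d)

# Likelihood ratio statistic; loglik[2] is the log partial likelihood at the fitted coefficients.
lrt_stat <- 2 * (full$loglik[2] - reduced$loglik[2])
lrt_df   <- length(coef(full)) - length(coef(reduced))
p_value  <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)
c(statistic = lrt_stat, df = lrt_df, p.value = p_value)

# The same comparison via the anova method for coxph fits.
anova(reduced, full)
```

A large p-value indicates that the dropped terms add little, which supports the reduced model.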
c) Interpret each of the coefficients from your “final model” in terms of hazard ratios and provide confidence intervals. Discuss your findings in such a way that a drug rehab counselor with no statistics background could understand the importance of the various covariates. (12 pts.)
d) Check the proportional hazards assumption. Do any of the covariates suggest a possible violation of this assumption? Use α = .10. Fit the stratified model using the variable you identified as the stratification variable. (3 pts.)
e) Is there evidence to suggest that the stratified model with interactions for these data is necessary? Justify your answer with the appropriate test. (3 pts.)