Data Set Assignment – Harold Otieno and Karin Tanaka, 05 December 2008, Math 157

Part 1

1.  Dickson, E.R., P.M. Grambsch, et al. (1989). “Prognosis in Primary Biliary Cirrhosis: Model for Decision Making.” Hepatology 10: 1-7.

Markus, B.H., E.R. Dickson, et al. (1989). “Efficacy of Liver Transplantation in Patients with Primary Biliary Cirrhosis.” N Engl J Med 320: 1709-13.

2.  The goal of the study is to create a mathematical model for predicting survival for patients with primary biliary cirrhosis based on a small number of universally available measurements to aid in the selection of patients for liver transplantation. A model using five variables – age, serum bilirubin concentration, serum albumin concentration, prothrombin time, and a score for the severity of edema – was developed. It performed as well as the previously accepted model in cross-testing for validity using an independent set of patients.

3.  The data are based on a clinical trial as well as additional patients who declined participation in the trial. Patients in the clinical trial were randomized to (1) D-penicillamine or (2) placebo.

4.  Primary biliary cirrhosis patients already enrolled in either of two double-blind, placebo-controlled, randomized clinical trials at the Mayo Clinic were eligible for the study. They had previously met established criteria to be involved in these clinical trials (see Dickson et al. 1985). Of 424 total eligible patients, 312 agreed to participate in the study. The study was supported by a research grant from the National Institutes of Health.

5.  Because of the selection process described above, no additional exclusion criteria are mentioned, and none of the patients enrolled in the previous clinical trials were newly excluded for this study. Among the original exclusion criteria, patients taking additional anti-inflammatory or immunosuppressive medication were excluded.

Patients who were lost to follow-up or who underwent liver transplantation were censored at the time of loss or transplant, and their data were used. However, six of the 112 patients who declined to participate in the original trial were lost to follow-up within a short period of time and were not included in the cross-validation. The remaining 106, together with the 312 trial participants, account for the 418 patients in the data set.

6.  Variables (clinical variables taken at baseline, that is, upon entry to the study)

•  id: patient ID
•  fu.days: number of days between registration and death, transplantation, or study analysis time, whichever came first
•  status: 0 = still alive, 1 = had a transplant, 2 = died (note: “alive” and “transplant” both indicate censored observations)
•  drug: 1 = D-penicillamine, 2 = placebo
•  partic: 1 = participated in the randomized clinical trial, 0 = did not participate
•  age: number of days since birth
•  sex: 0 = male, 1 = female
•  edema: presence of edema; 0 = no edema and no diuretic therapy for edema, 0.5 = edema present without diuretics or edema resolved by diuretics, 1 = edema present despite diuretic therapy
•  bili: serum bilirubin in mg/dl
•  chol: serum cholesterol in mg/dl
•  albumin: serum albumin in g/dl
•  pro.time: prothrombin time in seconds

Note: Only the variables most relevant in understanding the structure of the study and the final model are given here. A total of 45 prognostic factors were measured in the study; most are omitted.
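The coding of status and edema described above can be written down as a small lookup. This is only a sketch: the names STATUS_LABELS, EDEMA_LABELS and is_censored are hypothetical and do not come from the study; the codes themselves are the ones in the variable list.

```python
# Hypothetical helper names; the numeric codes come from the variable list above.
STATUS_LABELS = {0: "alive", 1: "transplant", 2: "died"}
EDEMA_LABELS = {
    0.0: "no edema and no diuretic therapy",
    0.5: "edema without diuretics, or resolved by diuretics",
    1.0: "edema despite diuretic therapy",
}


def is_censored(status: int) -> bool:
    """Alive (0) and transplant (1) are censored; only death (2) is an event."""
    if status not in STATUS_LABELS:
        raise ValueError(f"unknown status code: {status!r}")
    return status != 2


for code, label in STATUS_LABELS.items():
    print(code, label, "censored" if is_censored(code) else "event")
```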

Part 2

Summary Statistics.

There were 374 female patients and 44 male patients. Of the 418 patients, 354 had no edema, 44 had edema without diuretics, and 20 had edema despite diuretic therapy. In total, 161 patients died, 232 survived, and 25 had a transplant.
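As a quick consistency check, each of the three breakdowns above should add up to the 418 patients in the data set. The sketch below uses only the counts quoted in this section and also expresses them as percentages; the dictionary layout and the function name are our own choices.

```python
# Counts quoted in the summary above; the structure and names are illustrative.
N_PATIENTS = 418
counts = {
    "sex": {"female": 374, "male": 44},
    "edema": {"none": 354, "without diuretics": 44, "despite diuretics": 20},
    "status": {"died": 161, "survived": 232, "transplant": 25},
}


def percentages(table):
    """Convert a table of counts to percentages of its own total."""
    total = sum(table.values())
    return {key: 100.0 * value / total for key, value in table.items()}


for name, table in counts.items():
    # Every breakdown should cover all patients exactly once.
    assert sum(table.values()) == N_PATIENTS, name
    print(name, {key: round(p, 1) for key, p in percentages(table).items()})
```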

Variable / Maximum / Minimum / Mean / Median / SD
Follow-up (days) / 4795 / 44 / 1917.782 / 1730 / 1104.673
Bilirubin conc. (mg/dl) / 28 / 0.3 / 3.220 / 1.4 / 4.407506
Albumin conc. (g/dl) / 4.64 / 1.96 / 3.49744 / 3.53 / 0.4249716
Age (days) / 28650 / 9598 / 18533.35 / 18628 / 3815.845
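Because age is recorded in days, the age row is easier to read once converted to years. The sketch below rescales the values from the table, assuming 365.25 days per year (our assumption, not something stated in the study); every summary shown, including the SD, scales by the same factor.

```python
# Age summaries in days, copied from the table above.
age_days = {
    "max": 28650,
    "min": 9598,
    "mean": 18533.35,
    "median": 18628,
    "sd": 3815.845,
}

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def to_years(days: float) -> float:
    """Rescale a duration (or its SD) from days to years."""
    return days / DAYS_PER_YEAR


age_years = {name: round(to_years(value), 2) for name, value in age_days.items()}
print(age_years)
```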


Descriptive Graphics

Box plots of the age of the patients.

From the plot, the median age of patients who survived appears slightly lower than the median age of patients who died.


Box plots of bilirubin concentration.

From the plot, the median bilirubin concentration of patients who survived seems lower than that of patients who died. This could be a significant variable in prediction.


Box plots of albumin concentration.

From the plot, the median albumin concentration of patients who survived appears higher than that of patients who died.


Inference:

We fit a logistic regression to the data in order to test whether age, edema, bilirubin concentration, albumin concentration, and prothrombin time are significant predictors of whether a patient dies or survives.

Call:
glm(formula = status ~ age + edema + bilirubin + albumin + prothtime, family = "binomial")

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1205  -0.8761   0.5560   0.7649   2.7551

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.878e+00  1.349e+00   2.134  0.03285 *
age         -1.431e-04  3.316e-05  -4.316 1.59e-05 ***
edema       -1.793e+00  5.883e-01  -3.047  0.00231 **
bilirubin   -3.047e-01  5.031e-02  -6.057 1.39e-09 ***
albumin      3.309e-01  3.016e-01   1.097  0.27256
prothtime    8.288e-03  7.514e-03   1.103  0.27001
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 554.34  on 415  degrees of freedom
Residual deviance: 426.59  on 410  degrees of freedom
  (2 observations deleted due to missingness)
AIC: 438.59

Number of Fisher Scoring iterations: 5
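The z values, p-values and AIC in the output above can be re-derived from the printed estimates and standard errors. The sketch below does this under the usual Wald (normal) approximation; the function names are ours, and small differences in the last digit are expected because the printed estimates are rounded.

```python
from statistics import NormalDist

# (estimate, standard error) pairs copied from the coefficient table above.
coefs = {
    "(Intercept)": (2.878e+00, 1.349e+00),
    "age": (-1.431e-04, 3.316e-05),
    "edema": (-1.793e+00, 5.883e-01),
    "bilirubin": (-3.047e-01, 5.031e-02),
    "albumin": (3.309e-01, 3.016e-01),
    "prothtime": (8.288e-03, 7.514e-03),
}


def wald_z(estimate: float, std_error: float) -> float:
    """Wald statistic: the estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


for name, (est, se) in coefs.items():
    z = wald_z(est, se)
    print(f"{name:12s} z = {z:7.3f}  p = {two_sided_p(z):.3g}")

# AIC = residual deviance + 2 * (number of estimated coefficients).
aic = 426.59 + 2 * len(coefs)
print(f"AIC = {aic:.2f}")
```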

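From the associated p-values, at the 5% significance level we reject the null hypotheses that the coefficients of age, edema, and bilirubin concentration are zero (p = 1.59e-05, 0.00231, and 1.39e-09, respectively). These three variables are significant predictors of survival.

On the other hand, we fail to reject the null hypotheses that the coefficients of albumin concentration and prothrombin time are zero (p = 0.27256 and 0.27001). Given the other variables in the model, we have no evidence that these two are significant predictors of survival.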
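To see what the fitted coefficients imply for an individual, the linear predictor can be passed through the logistic function. The sketch below assumes the response was coded so that the fitted probability is the probability of survival, as in our interpretation of the model. The patient values are hypothetical, chosen only for illustration, and are not taken from the data set.

```python
import math

# Estimates copied from the glm output above.
COEF = {
    "(Intercept)": 2.878e+00,
    "age": -1.431e-04,
    "edema": -1.793e+00,
    "bilirubin": -3.047e-01,
    "albumin": 3.309e-01,
    "prothtime": 8.288e-03,
}


def survival_probability(patient: dict) -> float:
    """Inverse logit of the linear predictor for one patient."""
    eta = COEF["(Intercept)"] + sum(COEF[name] * value for name, value in patient.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical patient (illustrative values only, not from the data set).
patient = {"age": 18250, "edema": 0.0, "bilirubin": 1.4, "albumin": 3.5, "prothtime": 10.6}
higher_bili = dict(patient, bilirubin=patient["bilirubin"] + 1.0)

print(round(survival_probability(patient), 3))
print(round(survival_probability(higher_bili), 3))
```

Raising bilirubin by one unit while holding everything else fixed multiplies the fitted odds by exp(-0.3047), which is why the second probability is lower than the first.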

Odds Ratio & Confidence Intervals:

We further calculated 95% confidence intervals for the odds ratios associated with a one-unit increase in age and in bilirubin concentration. We are 95% confident that the odds ratio of survival associated with a one-day increase in age is between 0.999792 and 0.9999219. Note that the interval lies entirely below one, so we can claim that the odds of survival decrease with age. The interval is also very close to one, but this is because the unit of age is a single day; the effect accumulates over years.

We are also 95% confident that the odds ratio associated with a one-unit (1 mg/dl) increase in bilirubin concentration is between 0.6681066 and 0.8137579. Since this interval also lies entirely below one, we can claim that the odds of survival go down as bilirubin concentration rises.
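The intervals above are consistent with Wald intervals of the form exp(estimate ± 1.96 × SE). The sketch below recomputes them from the rounded estimates printed in the regression output, so agreement is only to about four significant digits. The function names and the 365.25-day year used for the per-year odds ratio are our assumptions.

```python
import math


def odds_ratio_ci(estimate: float, std_error: float, z: float = 1.96):
    """Wald confidence interval for exp(coefficient), i.e. the odds ratio."""
    return math.exp(estimate - z * std_error), math.exp(estimate + z * std_error)


def rescaled_odds_ratio(estimate: float, units: float) -> float:
    """Odds ratio for an increase of `units` in the predictor."""
    return math.exp(estimate * units)


age_ci = odds_ratio_ci(-1.431e-04, 3.316e-05)   # per one-day increase in age
bili_ci = odds_ratio_ci(-3.047e-01, 5.031e-02)  # per 1 mg/dl increase in bilirubin
age_per_year = rescaled_odds_ratio(-1.431e-04, 365.25)

print("age (per day):  ", age_ci)
print("bilirubin:      ", bili_ci)
print("age (per year): ", age_per_year)
```

Expressed per year rather than per day, the odds ratio for age is clearly below one, which makes the size of the age effect easier to judge.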