STAT 405 – BIOSTATISTICS
Handout 18 – Survival Analysis: Kaplan-Meier Method

EXAMPLE: Survival of Leukemia Patients

Suppose a study is conducted to compare survival times of patients receiving two different types of leukemia treatments. The data are given in the file Leukemia.JMP.

The raw data for Treatment Group 1 only are presented in the following graphic. The horizontal axis represents survival time, and each of the horizontal lines represents a patient. An ‘X’ indicates that a death occurred at that point in time. An ‘O’ indicates the last point in time the patient was observed. The current survival status for those marked with an ‘O’ cannot be obtained because observation was terminated before death occurred, and we know only that these patients were alive at last observation.

The points marked with an ‘O’ are examples of right censored observations. We call this “right censored” because all we know about the survival time of these patients is that it is GREATER THAN some value.

Questions:

  1. Using “conventional” statistical methods, how might you approach the analysis of these data?
  1. What are the problems with these “conventional” methods? Explain.

Survival Analysis

Conventional methods discussed earlier in the semester are not appropriate for dealing with either censored data or time-dependent variables. In contrast, survival analysis consists of methods for studying both the occurrence and timing of events, and these methods allow for censoring. Several approaches to survival analysis exist, and we will start by discussing one known as the Kaplan-Meier method.

Kaplan-Meier Method

In biostatistics, the Kaplan-Meier (KM) estimator is the most widely used method for estimating survivor functions. When there are no censored data, the KM estimator is simple and intuitive. For example, suppose that five patients were observed for sixmonths, and the following was observed:

Patient / Month in Which Death Occurred
1 / 1
2 / 1
3 / 2
4 / 4
5 / 5

The survivor function, S(t), is the probability that an event time is greater than t, where t can be any nonnegative number. In this example, an event time refers to the time at which death occurs.

Therefore, since there is no censoring, the KM estimatoris simply the sample proportion of observations which are still alive at each time point:

i.e.

Time, t /
0
1
2
3
4
5

Kaplan-Meier estimator for censored data

When the data are censored and some of the censoring times are smaller than some event times, the observed proportion of cases with event times greater than t can be biased. This happens because cases that are censored before time t may have died before time t, unbeknownst to us.

To handle this can consider using the following from conditional probability. Suppose , then

where

So

Here we are assuming there are k distinct event times. At each of these event times tj, there are rj individuals who are said to be “at risk” (they have not yet experienced an event and they have not been censored prior to time tj). Finally, we let dj be the number who experience an event (death) at time tj and let cj denote the number of censored observations between the j-th and (j+1)-st observed event time.

Useful identities:

The product derived above is called the Kaplan-Meier estimator or product-limit estimator and is defined as:

EXAMPLE: Back to Survival of Leukemia Patients

For the 21 patients who received Treatment 1, the survival times of the patients are given as follows:

6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+

We can use the KM method to estimate the survivor function for Treatment Group 1:

Time
tj / # at risk
rj / # of events
dj / # censored
cj / Estimated Probability of Surviving
/ Estimated Survivor Function

Using JMP for Kaplan-Meier Estimation of the Survivor Function

First, note that the data set should be constructed as follows by specifying three columns of data. One column indicates the treatment the subject received, the event time, and a censoring indicator which denotes whether the event time corresponds to the actual event of interest or a censoring time.

As mention above for each case in the data set, there must be one variable in the data set which contains either the time that an event occurred or, for censored cases, the last time at which that case was observed. A second variable is necessary if some of the cases are censored. It is common to set this equal to 1 for uncensored cases and 0 for censored cases.

Specifically for our data set, the variable Time gives the time in months from the beginning of the study to either death or censoring. The variable Censor has a value of 1 for those who died and a value of 0 for those who were censored. It is not uncommon to code the censoring in reverse, i.e. 1 = censored observation and 0 = observed event. An additional variable, group, is an indicator variable for Treatment 1 vs. Treatment 2 patients. We enter these into JMP as show below.

To get the KM estimator for each treatment group separately we use Analyze > Reliability and Survival > Survival and set up the dialog box as shown below.

The following results are obtained for Treatment group 1.

Each of the lines in the above output corresponds to one of the unique time points (except for the first line, which is for time 0). Note that the estimated survivor function, , the failure function , , the , the number of failures andthe number censored cases are reported for each of the unique time points. The number of individuals at risk at the start of each time period is also reported.

Questions:

  1. What is the estimated probability that a patient will survive for 16 months or more?
  1. What is the estimated survival probability for any time from 10 months up to (but not including) 13 months?

The estimated survivor and failure functions are graphed below:

Here, the 25th percentile refers to the smallest event time such that the probability of dying earlier is greater than 25%. No value is reported for the 75th percentile because the KM estimator for these data never reaches a failure probability greater than 55.18% or a survival probability lower than 44.82%.

Note that the 50th percentile represents the median death time (here, it is 23 months). JMP also provides an estimated mean time of death, but this estimate is biased downward since there are censoring times greater than the largest event time. Even if this is not the case, when a substantial number of cases are censored, the median is a much preferred measure of central tendency for censored survival data.

Definition for Mean, Medians, and Quantiles based on the KM Estimators

  • Mean–
  • Median – by definition, this is the time such that However, in practice, it is defined as the smallest time such that The median is more appropriate for censored survival data than the mean.
  • Lower quartile – the smallest time ( such that
  • Upper quartile – the smallest time such that

Constructing Confidence Intervals for the KM estimators

Note that JMP also provided us with a standard error for the “survival” at each point in time. These standard errors can be used to construct confidence intervals, which are easily obtained by selecting that option from the Product-Limit… pull-down menu.

Estimating the

Uses the delta method which says, if Y then is approximately normally distributed with mean and variance .

For example if we consider then

or

and if then

Instead estimating the we can use the delta method to approximate the . The = and using independence of the we have

Now thus by the delta method again we have

thus,

This is called Greenwood’s Formula. The square root of this variance approximation gives approximate standard errors for the estimated survivor function, i.e. . Using this standard error we can compute a 95% CI for S(t) as:

This can yield values outside the range [0,1]. A better approach is to exploit the delta method even further to obtain a CI for . Since this quantity is unrestricted, the confidence interval will be in the proper range when we transform back.

Log-log Approach for Confidence Intervals

1)Define

2)Form a 95% CI for L(t)

3)Since , the 95% CI for S(t) based on the CI for L(t) is

given by

4)Substituting into the upper and lower bounds gives confidence limits

What is ? Using the delta method again…

and the standard error is the square root of this estimated variance.

Constructing Confidence Bands for the Survivor Function

The pointwise confidence interval for the survivor function S(t)discussed above is valid for ONLY a SINGLE FIXED TIME at which the inference is to be made. In some cases, it is of interest to find the upper and lower confidence bands that guarantee, with a given confidence level, that the survivor function falls within the band for all t in some interval. One such approach was proposed by Hall and Wellner and is also implemented in JMP.


Testing for Differences in Survivor Functions

Now, let’s consider the observations from both treatment groups 1 and 2. To determine whether the survivor functions differ across treatment, we test the following hypotheses:

Ho: S1(t) = S2(t) for all t

Ha: S1(t) ≠ S2(t) for all t

JMP provides two test statistics for this alternative hypothesis, the log-rank test and the Wilcoxon test, both of which yield a Chi-square statistic. The larger the chi-square statistic, the stronger the evidence against the null hypothesis that the survival experience of the two population of subjects is the same. Recall large chi-square statistics yield small p-values!

We also obtain a plot of both survivor functions with point-wise confidence intervals added.

We can also plot each curve with simultaneous CI’s.

Kaplan-Meier Estimation and Testing in R

For the 21 patients who received Treatment 1, the survival times of the patients are given as follows: 6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+

Time Censor

[1,] 6 1

[2,] 6 1

[3,] 6 1

[4,] 6 0

[5,] 7 1

[6,] 9 0

[7,] 10 1

[8,] 10 0

[9,] 11 0

[10,] 13 1

[11,] 16 1

[12,] 17 0

[13,] 19 0

[14,] 20 0

[15,] 22 1

[16,] 23 1

[17,] 25 0

[18,] 32 0

[19,] 32 0

[20,] 34 0

[21,] 35 0

fit = survfit(Surv(Time,Censor)~1,type="kaplan",conf.type="plain")

> summary(fit)

Call: survfit(formula = Surv(Time, Censor) ~ 1, type = "kaplan", conf.type = "plain")

time n.risk n.event survival std.err lower 95% CI upper 95% CI

6 21 3 0.857 0.0764 0.707 1.000

7 17 1 0.807 0.0869 0.636 0.977

10 15 1 0.753 0.0963 0.564 0.942

13 12 1 0.690 0.1068 0.481 0.900

16 11 1 0.627 0.1141 0.404 0.851

22 7 1 0.538 0.1282 0.286 0.789

23 6 1 0.448 0.1346 0.184 0.712

> plot(fit)

> title(main="Kaplan-Meier with Greenwood-Based CI's")

> fit = survfit(Surv(Time,Censor)~1,type="kaplan",conf.type="log-log")

> summary(fit)

Call: survfit(formula = Surv(Time, Censor) ~ 1, type = "kaplan", conf.type = "log-log")

time n.risk n.event survival std.err lower 95% CI upper 95% CI

6 21 3 0.857 0.0764 0.620 0.952

7 17 1 0.807 0.0869 0.563 0.923

10 15 1 0.753 0.0963 0.503 0.889

13 12 1 0.690 0.1068 0.432 0.849

16 11 1 0.627 0.1141 0.368 0.805

22 7 1 0.538 0.1282 0.268 0.747

23 6 1 0.448 0.1346 0.188 0.680

> plot(fit)

> title(main="Kaplan-Meier with Log-Log CI's")

Examining Difference in Survival Across Groups

> Leuk.fit = survfit(Surv(Time,Censor)~Group,data=leukemia)

> summary(Leuk.fit)

Call: survfit(formula = Surv(Time, Censor) ~ Group, data = leukemia)

Group=1

time n.risk n.event survival std.err lower 95% CI upper 95% CI

6 21 3 0.857 0.0764 0.720 1.000

7 17 1 0.807 0.0869 0.653 0.996

10 15 1 0.753 0.0963 0.586 0.968

13 12 1 0.690 0.1068 0.510 0.935

16 11 1 0.627 0.1141 0.439 0.896

22 7 1 0.538 0.1282 0.337 0.858

23 6 1 0.448 0.1346 0.249 0.807

Group=2

time n.risk n.event survival std.err lower 95% CI upper 95% CI

1 21 2 0.9048 0.0641 0.78754 1.000

2 19 2 0.8095 0.0857 0.65785 0.996

3 17 1 0.7619 0.0929 0.59988 0.968

4 16 2 0.6667 0.1029 0.49268 0.902

5 14 2 0.5714 0.1080 0.39455 0.828

8 12 4 0.3810 0.1060 0.22085 0.657

11 8 2 0.2857 0.0986 0.14529 0.562

12 6 2 0.1905 0.0857 0.07887 0.460

15 4 1 0.1429 0.0764 0.05011 0.407

17 3 1 0.0952 0.0641 0.02549 0.356

22 2 1 0.0476 0.0465 0.00703 0.322

23 1 1 0.0000 NaN NA NA

> plot(Leuk.fit,col=3:4,lty=1:2)

legend(locator(),legend=c("Maintained","Nonmaintained"),col=3:4,lty=1:2)

> plot(Leuk.fit,conf.int=T,col=3:4,lty=1:2)

legend(locator(),legend=c("Maintained","Nonmaintained"),col=3:4,lty=1:2)

> title(main="Survival Curves with Confidence Bands")

Log Rank Test for Comparing Survival Experience

> Leuk.test = survdiff(Surv(Time,Censor)~Group,data=leukemia)

> print(Leuk.test)

Call:

survdiff(formula = Surv(Time, Censor) ~ Group, data = leukemia)

N Observed Expected (O-E)^2/E (O-E)^2/V

Group=1 21 9 19.3 5.46 16.8

Group=2 21 21 10.7 9.77 16.8

Chisq= 16.8 on 1 degrees of freedom, p= 4.17e-05

Example 2: Oropharynx Cancer

Name: A clinical Trial in the Trt. of Carcinoma of the Oropharynx
SIZE: 195 observations, 13 variables.

SOURCE: The Statistical Analysis of Failure Time Data, by JD Kalbfleisch

& RL Prentice, (1980), Published by John Wiley & Sons

DESCRIPTIVE ABSTRACT:

These data from a part of a large clinical trialcarried out by the Radiation Therapy Oncology Group in the United States. The full study included patients with squamous carcinoma of 15 sites in the mouth and throat, with 16 participating institutions, though only data on three sites in the oropharynx reported by the six largest institutions are considered here. Patients entering the study were randomly assigned to one of two treatment groups, radiation therapy alone or radiation therapy together with a chemotherapeutic agent.
One objective of the study was to compare the two treatment policies with respect to patient survival.

LIST OF VARIABLES:

VariableDescription

______

CASE Case Number

INST Participating Institution

SEX 1=male, 2=female

TX Treatment: 1=standard, 2=test

GRADE 1=well differentiated, 2=moderately differentiated,

3=poorly differentiated, 9=missing

AGE In years at time of diagnosis

COND Condition: 1=no disability, 2=restricted work, 3=requires assistance

with self care, 4=bed confined, 9=missing

SITE 1=faucial arch, 2=tonsillar fossa, 3=posterior pillar,

4=pharyngeal tongue, 5=posterior wall

T_STAGE 1=primary tumor measuring 2 cm or less in largest diameter,

2=primary tumor measuring 2 cm to 4 cm in largest diameter with

minimal infiltration in depth, 3=primary tumor measuring more

than 4 cm, 4=massive invasive tumor

N_STAGE 0=no clinical evidence of node metastases, 1=single positive

node 3 cm or less in diameter, not fixed, 2=single positive

node more than 3 cm in diameter, not fixed, 3=multiple

positive nodes or fixed positive nodes

ENTRY_DT Date of study entry: Day of year and year, dddyy

STATUS 0=censored, 1=dead

TIME Survival time in days from day of diagnosis

______

STORY BEHIND THE DATA:

Approximately 30% of the survival times are censored owing primarily to

patients surviving to the time of analysis. Some patients were lost

to follow-up because the patient moved or transferred to an institution not

participating in the study, though these cases were relatively rare. From

a statistical point of view, an important feature of these data is the

considerable lack of homogeneity between individuals being studied.

Of course, as part of the study design, certain criteria for patient

eligibility had to be met which eliminated extremes in the extent of disease,

but still many factors are not controlled.

This study included measurements of many covariates which would be expected

to relate to survival experience. Six such variables are given in the data

(sex, T staging, N staging, age, general condition, and grade). The site

of the primary tumor and possible differences between participating

institutions require consideration as well.

The T,N staging classification gives a measure of the extent of the tumor at

the primary site and at regional lymph nodes. T=1, refers to a small primary

tumor, 2 centimeters or less in largest diameter, whereas T=4 is a massive

tumor with extension to adjoining tissue. T=2 and T=3 refer to intermediate

cases. N=0 refers to there being no clinical evidence of a lymph node

metastasis and N=1, N=2, N=3 indicate, in increasing magnitude, the extent of

existing lymph node involvement. Patients with classifications T=1,N=0;

T=1,N=1; T=2,N=0; or T=2,N=1, or with distant metastases were excluded

from study.

The variable general condition gives a measure of the functional capacity of

the patient at the time of diagnosis (1 refers to no disability whereas

4 denotes bed confinement; 2 and 3 measure intermediate levels). The variable

grade is a measure of the degree of differentiation of the tumor (the degree

to which the tumor cell resembles the host cell) from 1 (well differentiated)

to 3 (poorly differentiated)

In addition to the primary question whether the combined treatment mode is

preferable to the conventional radiation therapy, it is of considerable

interest to determine the extent to which the several covariates relate to

subsequent survival. It is also imperative in answering the primary question

to adjust the survivals for possible imbalance that may be present in the

study with regard to the other covariates. Such problems are similar to those

encountered in the classical theory of linear regression and the analysis of

covariance. Again, the need to accommodate censoring is an important

distinguishing point. In many situations it is also important to develop

nonparametric and robust procedures since there is frequently little empirical

or theoretical work to support a particular family of failure time

distributions.

Compare Survival Experience of Patients in the Treatment Groups

> oro.fit = survfit(Surv(TIME,STATUS)~as.factor(TX),data=Pharynx)

> plot(oro.fit,col=3:4,lty=1:2,conf.int=T)

> title(main="Comparison of Therapies for Oropharynx Cancer",xlab=”Time (days)",ylab="S(t)")

> legend(locator(),legend=c("Standard","Test"),col=3:4,lty=1:2)

> oro.diff = survdiff(Surv(TIME,STATUS)~as.factor(TX),data=Pharynx)

> print(oro.diff)

Call:

survdiff(formula = Surv(TIME, STATUS) ~ as.factor(TX), data = Pharynx)

N Observed Expected (O-E)^2/E (O-E)^2/V

as.factor(TX)=1 100 73 78.7 0.410 0.926

as.factor(TX)=2 95 69 63.3 0.509 0.926

Chisq= 0.9 on 1 degrees of freedom, p= 0.336

Use Treatment Regimen and Patient Gender to Define Cohorts

> oro.trtsex = survfit(Surv(TIME,STATUS)~as.factor(SEX)+as.factor(TX),data=Pharynx)

> plot(oro.trtsex,col=1:4,lty=1:4)

> oroTS.diff = survdiff(Surv(TIME,STATUS)~as.factor(SEX)+as.factor(TX),data=Pharynx)

> print(oroTS.diff)

Call:

survdiff(formula = Surv(TIME, STATUS) ~ as.factor(SEX) + as.factor(TX),

data = Pharynx)

N Observed Expected (O-E)^2/E (O-E)^2/V

as.factor(SEX)=1, as.factor(TX)=1 77 58 56.2 0.0558 0.093

as.factor(SEX)=1, as.factor(TX)=2 72 52 49.5 0.1290 0.199

as.factor(SEX)=2, as.factor(TX)=1 23 15 22.4 2.4715 2.976

as.factor(SEX)=2, as.factor(TX)=2 23 17 13.8 0.7170 0.800

Chisq= 3.4 on 3 degrees of freedom, p= 0.33

1