STAT 405 – BIOSTATISTICS
Handout 18 – Survival Analysis: Kaplan-Meier Method
EXAMPLE: Survival of Leukemia Patients
Suppose a study is conducted to compare survival times of patients receiving two different types of leukemia treatments. The data are given in the file Leukemia.JMP.
The raw data for Treatment Group 1 only are presented in the following graphic. The horizontal axis represents survival time, and each of the horizontal lines represents a patient. An ‘X’ indicates that a death occurred at that point in time. An ‘O’ indicates the last point in time the patient was observed. The current survival status for those marked with an ‘O’ cannot be obtained because observation was terminated before death occurred, and we know only that these patients were alive at last observation.
The points marked with an ‘O’ are examples of right censored observations. We call this “right censored” because all we know about the survival time of these patients is that it is GREATER THAN some value.
Questions:
- Using “conventional” statistical methods, how might you approach the analysis of these data?
- What are the problems with these “conventional” methods? Explain.
Survival Analysis
Conventional methods discussed earlier in the semester are not appropriate for dealing with either censored data or time-dependent variables. In contrast, survival analysis consists of methods for studying both the occurrence and timing of events, and these methods allow for censoring. Several approaches to survival analysis exist, and we will start by discussing one known as the Kaplan-Meier method.
Kaplan-Meier Method
In biostatistics, the Kaplan-Meier (KM) estimator is the most widely used method for estimating survivor functions. When there are no censored data, the KM estimator is simple and intuitive. For example, suppose that five patients were observed for sixmonths, and the following was observed:
Patient / Month in Which Death Occurred1 / 1
2 / 1
3 / 2
4 / 4
5 / 5
The survivor function, S(t), is the probability that an event time is greater than t, where t can be any nonnegative number. In this example, an event time refers to the time at which death occurs.
Therefore, since there is no censoring, the KM estimatoris simply the sample proportion of observations which are still alive at each time point:
i.e.
Time, t /0
1
2
3
4
5
Kaplan-Meier estimator for censored data
When the data are censored and some of the censoring times are smaller than some event times, the observed proportion of cases with event times greater than t can be biased. This happens because cases that are censored before time t may have died before time t, unbeknownst to us.
To handle this can consider using the following from conditional probability. Suppose , then
where
So
Here we are assuming there are k distinct event times. At each of these event times tj, there are rj individuals who are said to be “at risk” (they have not yet experienced an event and they have not been censored prior to time tj). Finally, we let dj be the number who experience an event (death) at time tj and let cj denote the number of censored observations between the j-th and (j+1)-st observed event time.
Useful identities:
The product derived above is called the Kaplan-Meier estimator or product-limit estimator and is defined as:
EXAMPLE: Back to Survival of Leukemia Patients
For the 21 patients who received Treatment 1, the survival times of the patients are given as follows:
6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+
We can use the KM method to estimate the survivor function for Treatment Group 1:
Timetj / # at risk
rj / # of events
dj / # censored
cj / Estimated Probability of Surviving
/ Estimated Survivor Function
Using JMP for Kaplan-Meier Estimation of the Survivor Function
First, note that the data set should be constructed as follows by specifying three columns of data. One column indicates the treatment the subject received, the event time, and a censoring indicator which denotes whether the event time corresponds to the actual event of interest or a censoring time.
As mention above for each case in the data set, there must be one variable in the data set which contains either the time that an event occurred or, for censored cases, the last time at which that case was observed. A second variable is necessary if some of the cases are censored. It is common to set this equal to 1 for uncensored cases and 0 for censored cases.
Specifically for our data set, the variable Time gives the time in months from the beginning of the study to either death or censoring. The variable Censor has a value of 1 for those who died and a value of 0 for those who were censored. It is not uncommon to code the censoring in reverse, i.e. 1 = censored observation and 0 = observed event. An additional variable, group, is an indicator variable for Treatment 1 vs. Treatment 2 patients. We enter these into JMP as show below.
To get the KM estimator for each treatment group separately we use Analyze > Reliability and Survival > Survival and set up the dialog box as shown below.
The following results are obtained for Treatment group 1.
Each of the lines in the above output corresponds to one of the unique time points (except for the first line, which is for time 0). Note that the estimated survivor function, , the failure function , , the , the number of failures andthe number censored cases are reported for each of the unique time points. The number of individuals at risk at the start of each time period is also reported.
Questions:
- What is the estimated probability that a patient will survive for 16 months or more?
- What is the estimated survival probability for any time from 10 months up to (but not including) 13 months?
The estimated survivor and failure functions are graphed below:
Here, the 25th percentile refers to the smallest event time such that the probability of dying earlier is greater than 25%. No value is reported for the 75th percentile because the KM estimator for these data never reaches a failure probability greater than 55.18% or a survival probability lower than 44.82%.
Note that the 50th percentile represents the median death time (here, it is 23 months). JMP also provides an estimated mean time of death, but this estimate is biased downward since there are censoring times greater than the largest event time. Even if this is not the case, when a substantial number of cases are censored, the median is a much preferred measure of central tendency for censored survival data.
Definition for Mean, Medians, and Quantiles based on the KM Estimators
- Mean–
- Median – by definition, this is the time such that However, in practice, it is defined as the smallest time such that The median is more appropriate for censored survival data than the mean.
- Lower quartile – the smallest time ( such that
- Upper quartile – the smallest time such that
Constructing Confidence Intervals for the KM estimators
Note that JMP also provided us with a standard error for the “survival” at each point in time. These standard errors can be used to construct confidence intervals, which are easily obtained by selecting that option from the Product-Limit… pull-down menu.
Estimating the
Uses the delta method which says, if Y then is approximately normally distributed with mean and variance .
For example if we consider then
or
and if then
Instead estimating the we can use the delta method to approximate the . The = and using independence of the we have
Now thus by the delta method again we have
thus,
This is called Greenwood’s Formula. The square root of this variance approximation gives approximate standard errors for the estimated survivor function, i.e. . Using this standard error we can compute a 95% CI for S(t) as:
This can yield values outside the range [0,1]. A better approach is to exploit the delta method even further to obtain a CI for . Since this quantity is unrestricted, the confidence interval will be in the proper range when we transform back.
Log-log Approach for Confidence Intervals
1)Define
2)Form a 95% CI for L(t)
3)Since , the 95% CI for S(t) based on the CI for L(t) is
given by
4)Substituting into the upper and lower bounds gives confidence limits
What is ? Using the delta method again…
and the standard error is the square root of this estimated variance.
Constructing Confidence Bands for the Survivor Function
The pointwise confidence interval for the survivor function S(t)discussed above is valid for ONLY a SINGLE FIXED TIME at which the inference is to be made. In some cases, it is of interest to find the upper and lower confidence bands that guarantee, with a given confidence level, that the survivor function falls within the band for all t in some interval. One such approach was proposed by Hall and Wellner and is also implemented in JMP.
Testing for Differences in Survivor Functions
Now, let’s consider the observations from both treatment groups 1 and 2. To determine whether the survivor functions differ across treatment, we test the following hypotheses:
Ho: S1(t) = S2(t) for all t
Ha: S1(t) ≠ S2(t) for all t
JMP provides two test statistics for this alternative hypothesis, the log-rank test and the Wilcoxon test, both of which yield a Chi-square statistic. The larger the chi-square statistic, the stronger the evidence against the null hypothesis that the survival experience of the two population of subjects is the same. Recall large chi-square statistics yield small p-values!
We also obtain a plot of both survivor functions with point-wise confidence intervals added.
We can also plot each curve with simultaneous CI’s.
Kaplan-Meier Estimation and Testing in R
For the 21 patients who received Treatment 1, the survival times of the patients are given as follows: 6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+
Time Censor
[1,] 6 1
[2,] 6 1
[3,] 6 1
[4,] 6 0
[5,] 7 1
[6,] 9 0
[7,] 10 1
[8,] 10 0
[9,] 11 0
[10,] 13 1
[11,] 16 1
[12,] 17 0
[13,] 19 0
[14,] 20 0
[15,] 22 1
[16,] 23 1
[17,] 25 0
[18,] 32 0
[19,] 32 0
[20,] 34 0
[21,] 35 0
fit = survfit(Surv(Time,Censor)~1,type="kaplan",conf.type="plain")
> summary(fit)
Call: survfit(formula = Surv(Time, Censor) ~ 1, type = "kaplan", conf.type = "plain")
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.707 1.000
7 17 1 0.807 0.0869 0.636 0.977
10 15 1 0.753 0.0963 0.564 0.942
13 12 1 0.690 0.1068 0.481 0.900
16 11 1 0.627 0.1141 0.404 0.851
22 7 1 0.538 0.1282 0.286 0.789
23 6 1 0.448 0.1346 0.184 0.712
> plot(fit)
> title(main="Kaplan-Meier with Greenwood-Based CI's")
> fit = survfit(Surv(Time,Censor)~1,type="kaplan",conf.type="log-log")
> summary(fit)
Call: survfit(formula = Surv(Time, Censor) ~ 1, type = "kaplan", conf.type = "log-log")
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.620 0.952
7 17 1 0.807 0.0869 0.563 0.923
10 15 1 0.753 0.0963 0.503 0.889
13 12 1 0.690 0.1068 0.432 0.849
16 11 1 0.627 0.1141 0.368 0.805
22 7 1 0.538 0.1282 0.268 0.747
23 6 1 0.448 0.1346 0.188 0.680
> plot(fit)
> title(main="Kaplan-Meier with Log-Log CI's")
Examining Difference in Survival Across Groups
> Leuk.fit = survfit(Surv(Time,Censor)~Group,data=leukemia)
> summary(Leuk.fit)
Call: survfit(formula = Surv(Time, Censor) ~ Group, data = leukemia)
Group=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
6 21 3 0.857 0.0764 0.720 1.000
7 17 1 0.807 0.0869 0.653 0.996
10 15 1 0.753 0.0963 0.586 0.968
13 12 1 0.690 0.1068 0.510 0.935
16 11 1 0.627 0.1141 0.439 0.896
22 7 1 0.538 0.1282 0.337 0.858
23 6 1 0.448 0.1346 0.249 0.807
Group=2
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 21 2 0.9048 0.0641 0.78754 1.000
2 19 2 0.8095 0.0857 0.65785 0.996
3 17 1 0.7619 0.0929 0.59988 0.968
4 16 2 0.6667 0.1029 0.49268 0.902
5 14 2 0.5714 0.1080 0.39455 0.828
8 12 4 0.3810 0.1060 0.22085 0.657
11 8 2 0.2857 0.0986 0.14529 0.562
12 6 2 0.1905 0.0857 0.07887 0.460
15 4 1 0.1429 0.0764 0.05011 0.407
17 3 1 0.0952 0.0641 0.02549 0.356
22 2 1 0.0476 0.0465 0.00703 0.322
23 1 1 0.0000 NaN NA NA
> plot(Leuk.fit,col=3:4,lty=1:2)
legend(locator(),legend=c("Maintained","Nonmaintained"),col=3:4,lty=1:2)
> plot(Leuk.fit,conf.int=T,col=3:4,lty=1:2)
legend(locator(),legend=c("Maintained","Nonmaintained"),col=3:4,lty=1:2)
> title(main="Survival Curves with Confidence Bands")
Log Rank Test for Comparing Survival Experience
> Leuk.test = survdiff(Surv(Time,Censor)~Group,data=leukemia)
> print(Leuk.test)
Call:
survdiff(formula = Surv(Time, Censor) ~ Group, data = leukemia)
N Observed Expected (O-E)^2/E (O-E)^2/V
Group=1 21 9 19.3 5.46 16.8
Group=2 21 21 10.7 9.77 16.8
Chisq= 16.8 on 1 degrees of freedom, p= 4.17e-05
Example 2: Oropharynx Cancer
Name: A clinical Trial in the Trt. of Carcinoma of the Oropharynx
SIZE: 195 observations, 13 variables.
SOURCE: The Statistical Analysis of Failure Time Data, by JD Kalbfleisch
& RL Prentice, (1980), Published by John Wiley & Sons
DESCRIPTIVE ABSTRACT:
These data from a part of a large clinical trialcarried out by the Radiation Therapy Oncology Group in the United States. The full study included patients with squamous carcinoma of 15 sites in the mouth and throat, with 16 participating institutions, though only data on three sites in the oropharynx reported by the six largest institutions are considered here. Patients entering the study were randomly assigned to one of two treatment groups, radiation therapy alone or radiation therapy together with a chemotherapeutic agent.
One objective of the study was to compare the two treatment policies with respect to patient survival.
LIST OF VARIABLES:
VariableDescription
______
CASE Case Number
INST Participating Institution
SEX 1=male, 2=female
TX Treatment: 1=standard, 2=test
GRADE 1=well differentiated, 2=moderately differentiated,
3=poorly differentiated, 9=missing
AGE In years at time of diagnosis
COND Condition: 1=no disability, 2=restricted work, 3=requires assistance
with self care, 4=bed confined, 9=missing
SITE 1=faucial arch, 2=tonsillar fossa, 3=posterior pillar,
4=pharyngeal tongue, 5=posterior wall
T_STAGE 1=primary tumor measuring 2 cm or less in largest diameter,
2=primary tumor measuring 2 cm to 4 cm in largest diameter with
minimal infiltration in depth, 3=primary tumor measuring more
than 4 cm, 4=massive invasive tumor
N_STAGE 0=no clinical evidence of node metastases, 1=single positive
node 3 cm or less in diameter, not fixed, 2=single positive
node more than 3 cm in diameter, not fixed, 3=multiple
positive nodes or fixed positive nodes
ENTRY_DT Date of study entry: Day of year and year, dddyy
STATUS 0=censored, 1=dead
TIME Survival time in days from day of diagnosis
______
STORY BEHIND THE DATA:
Approximately 30% of the survival times are censored owing primarily to
patients surviving to the time of analysis. Some patients were lost
to follow-up because the patient moved or transferred to an institution not
participating in the study, though these cases were relatively rare. From
a statistical point of view, an important feature of these data is the
considerable lack of homogeneity between individuals being studied.
Of course, as part of the study design, certain criteria for patient
eligibility had to be met which eliminated extremes in the extent of disease,
but still many factors are not controlled.
This study included measurements of many covariates which would be expected
to relate to survival experience. Six such variables are given in the data
(sex, T staging, N staging, age, general condition, and grade). The site
of the primary tumor and possible differences between participating
institutions require consideration as well.
The T,N staging classification gives a measure of the extent of the tumor at
the primary site and at regional lymph nodes. T=1, refers to a small primary
tumor, 2 centimeters or less in largest diameter, whereas T=4 is a massive
tumor with extension to adjoining tissue. T=2 and T=3 refer to intermediate
cases. N=0 refers to there being no clinical evidence of a lymph node
metastasis and N=1, N=2, N=3 indicate, in increasing magnitude, the extent of
existing lymph node involvement. Patients with classifications T=1,N=0;
T=1,N=1; T=2,N=0; or T=2,N=1, or with distant metastases were excluded
from study.
The variable general condition gives a measure of the functional capacity of
the patient at the time of diagnosis (1 refers to no disability whereas
4 denotes bed confinement; 2 and 3 measure intermediate levels). The variable
grade is a measure of the degree of differentiation of the tumor (the degree
to which the tumor cell resembles the host cell) from 1 (well differentiated)
to 3 (poorly differentiated)
In addition to the primary question whether the combined treatment mode is
preferable to the conventional radiation therapy, it is of considerable
interest to determine the extent to which the several covariates relate to
subsequent survival. It is also imperative in answering the primary question
to adjust the survivals for possible imbalance that may be present in the
study with regard to the other covariates. Such problems are similar to those
encountered in the classical theory of linear regression and the analysis of
covariance. Again, the need to accommodate censoring is an important
distinguishing point. In many situations it is also important to develop
nonparametric and robust procedures since there is frequently little empirical
or theoretical work to support a particular family of failure time
distributions.
Compare Survival Experience of Patients in the Treatment Groups
> oro.fit = survfit(Surv(TIME,STATUS)~as.factor(TX),data=Pharynx)
> plot(oro.fit,col=3:4,lty=1:2,conf.int=T)
> title(main="Comparison of Therapies for Oropharynx Cancer",xlab=”Time (days)",ylab="S(t)")
> legend(locator(),legend=c("Standard","Test"),col=3:4,lty=1:2)
> oro.diff = survdiff(Surv(TIME,STATUS)~as.factor(TX),data=Pharynx)
> print(oro.diff)
Call:
survdiff(formula = Surv(TIME, STATUS) ~ as.factor(TX), data = Pharynx)
N Observed Expected (O-E)^2/E (O-E)^2/V
as.factor(TX)=1 100 73 78.7 0.410 0.926
as.factor(TX)=2 95 69 63.3 0.509 0.926
Chisq= 0.9 on 1 degrees of freedom, p= 0.336
Use Treatment Regimen and Patient Gender to Define Cohorts
> oro.trtsex = survfit(Surv(TIME,STATUS)~as.factor(SEX)+as.factor(TX),data=Pharynx)
> plot(oro.trtsex,col=1:4,lty=1:4)
> oroTS.diff = survdiff(Surv(TIME,STATUS)~as.factor(SEX)+as.factor(TX),data=Pharynx)
> print(oroTS.diff)
Call:
survdiff(formula = Surv(TIME, STATUS) ~ as.factor(SEX) + as.factor(TX),
data = Pharynx)
N Observed Expected (O-E)^2/E (O-E)^2/V
as.factor(SEX)=1, as.factor(TX)=1 77 58 56.2 0.0558 0.093
as.factor(SEX)=1, as.factor(TX)=2 72 52 49.5 0.1290 0.199
as.factor(SEX)=2, as.factor(TX)=1 23 15 22.4 2.4715 2.976
as.factor(SEX)=2, as.factor(TX)=2 23 17 13.8 0.7170 0.800
Chisq= 3.4 on 3 degrees of freedom, p= 0.33
1