Statistical Methods in Epi II (171:242)

Cox Regression: Stratification

Brian J. Smith, Ph.D.

March 24, 2003


Introduction

Breast-Feeding Example:

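In Lab 8 the proportional hazards assumption was tested for each of the variables in the model

\[ h(t \mid Z) = h_0(t)\exp(\beta_1\,\text{White} + \beta_2\,\text{Black} + \beta_3\,\text{Poverty} + \beta_4\,\text{Smoke} + \beta_5\,\text{Education}). \]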

The following results were obtained:

Covariate / rho / chisq / p
White / 0.0141 / 0.179 / 0.672
Black / 0.02 / 0.36 / 0.5483
Poverty / -0.0131 / 0.161 / 0.6887
Smoke / 0.0139 / 0.173 / 0.6776
Education / 0.0818 / 6.084 / 0.0136
GLOBAL / NA / 8.897 / 0.1132
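
The p-values in the table can be checked directly from the chi-square statistics. The following is a minimal Python sketch (not the S-PLUS code used in Lab 8), assuming that each covariate's test has 1 degree of freedom and that the global test has 5, one per covariate. The function name `chisq_sf_odd_df` is introduced here for illustration only.

```python
import math

def chisq_sf_odd_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form handles odd degrees of freedom only")
    # df = 1: a chi-square(1) variate is a squared standard normal.
    q = math.erfc(math.sqrt(x / 2.0))
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
    k = 1
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

# (chi-square statistic, assumed degrees of freedom), copied from the table.
tests = {
    "White": (0.179, 1),
    "Black": (0.36, 1),
    "Poverty": (0.161, 1),
    "Smoke": (0.173, 1),
    "Education": (6.084, 1),
    "GLOBAL": (8.897, 5),
}

p_values = {name: chisq_sf_odd_df(x, df) for name, (x, df) in tests.items()}
for name, p in p_values.items():
    print(f"{name:10s} p = {p:.4f}")
```

Small differences in the last digit relative to the table are to be expected, because the tabulated statistics are rounded.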

Hence, one might be concerned about the proportionality of the hazards across levels of education. One solution is to include a time-dependent covariate in the model, such as the product of education and a specified function of time,

\[ \text{Education} \times g(t). \]

The two limitations of this approach are:

  1. A functional form of the interaction with time must be specified, and
  2. It is difficult to carry out in S-PLUS.

A second approach is to allow the baseline hazard to vary across levels of the covariate. That is, employ a stratified Cox regression model.

Stratified Cox Regression

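Suppose that we wish to allow the baseline hazard function to vary across M levels of a covariate, such that

\[ h_m(t \mid Z) = h_{0m}(t)\exp(\beta' Z), \]

where m = 1,...,M indexes the strata. This model assumes that the hazards are proportional within the mth stratum: for two subjects in stratum m with covariate vectors $Z_1$ and $Z_2$,

\[ \frac{h_m(t \mid Z_1)}{h_m(t \mid Z_2)} = \exp\{\beta'(Z_1 - Z_2)\}. \]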

However, the hazards need not be proportional between strata, because each stratum is allowed its own baseline hazard. Note that the regression parameters $\beta$ are the same across strata. In other words, the covariates in the model have the same multiplicative effect regardless of the baseline hazard.
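
To make the last point concrete, here is a small Python sketch that evaluates a stratified hazard at a single time point. The coefficients, baseline hazard values, and the helper `stratified_hazard` are hypothetical and chosen only for illustration; they are not estimates from the breast-feeding data.

```python
import math

def stratified_hazard(h0m, beta, z):
    """Hazard in stratum m at one time point: h_0m(t) * exp(beta'Z)."""
    return h0m * math.exp(sum(b * x for b, x in zip(beta, z)))

# Hypothetical values, for illustration only.
beta = [0.25, -0.30]                                  # common to all strata
baseline = {"stratum 1": 0.010, "stratum 2": 0.045}   # h_0m(t) at a fixed t
z_a, z_b = [1, 0], [0, 0]                             # two covariate patterns

# Within each stratum the hazard ratio depends only on beta and the covariates.
within_ratio = {
    m: stratified_hazard(h, beta, z_a) / stratified_hazard(h, beta, z_b)
    for m, h in baseline.items()
}

# Between strata the ratio also involves the two baseline hazards, which may
# change freely over time, so proportionality is not imposed.
between_ratio = (stratified_hazard(baseline["stratum 2"], beta, z_b)
                 / stratified_hazard(baseline["stratum 1"], beta, z_b))

print(within_ratio)
print(between_ratio)
```

In this sketch the within-stratum ratio is the same in both strata, which is what a common $\beta$ means in practice.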

Breast-Feeding Example:

The education variable is summarized in Table 1.

Table 1. Cross-classification of event (rows) by years of education (columns).

Event / 3 / 6 / 7 / 8 / 9 / 10 / 11 / 12 / 13 / 14 / 15 / 16 / 17 / 18 / 19
0 / 0 / 0 / 0 / 0 / 0 / 0 / 1 / 13 / 5 / 5 / 1 / 8 / 0 / 1 / 1
1 / 1 / 4 / 6 / 18 / 37 / 66 / 87 / 425 / 83 / 71 / 25 / 58 / 5 / 4 / 2

The results of fitting a Cox regression model stratified by the 15 levels of education are

Covariate / Unstratified coefficient / Unstratified p / Stratified coefficient / Stratified p
White / -0.3057 / 0.0016 / -0.319 / 0.0014
Black / -0.1274 / 0.3200 / -0.153 / 0.2400
Poverty / -0.2095 / 0.0230 / -0.141 / 0.1400
Smoke / 0.2644 / 0.0007 / 0.249 / 0.0019
Education / -0.0373 / 0.0510 / - / -

Notes

  1. Stratification is attractive because the effect of the covariate need not be modeled as a function of time.
  2. Stratification is a way of controlling for the main effects of a covariate.
  3. A drawback is that one cannot estimate the effect of the stratification variable on survival.
  4. When stratification is employed, the tests of hypotheses for the regression coefficients will have good power only if the deviations from the null are the same in all strata.
  5. The tests of the regression coefficients are appropriate when either the number of failures within strata is large or the number of strata is large.
  6. If there exist strata in which no events are observed, then a loss of power will result. Consequently, continuous variables should be categorized if they are to be used as stratification variables in a Cox model.
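
Notes 5 and 6 both turn on how many failures fall in each stratum. As a rough check, the event counts from Table 1 can be tabulated in Python. This is a sketch only: the threshold of five events and the four education groups below are hypothetical choices made for illustration and are not part of the original analysis.

```python
# Event counts (event = 1) by years of education, copied from Table 1.
years = [3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
events = [1, 4, 6, 18, 37, 66, 87, 425, 83, 71, 25, 58, 5, 4, 2]
events_by_year = dict(zip(years, events))

# Strata with few events (hypothetical threshold of fewer than 5).
sparse = [y for y, n in events_by_year.items() if n < 5]

def education_group(y):
    """Map years of education to a coarser category (hypothetical cutpoints)."""
    if y < 12:
        return "<12"
    if y == 12:
        return "12"
    if y <= 15:
        return "13-15"
    return "16+"

# Event counts after collapsing the 15 levels into 4 groups.
events_by_group = {}
for y, n in events_by_year.items():
    g = education_group(y)
    events_by_group[g] = events_by_group.get(g, 0) + n

print("sparse strata:", sparse)
print("events by group:", events_by_group)
```

Every one of the 15 strata has at least one event, but several have very few; grouping trades finer control of education for more failures per stratum.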

Likelihood Estimation of the Regression Parameters

In Chapter 6 the likelihood function $L(\beta)$ was introduced. Recall that the estimates obtained from a Cox regression analysis are those values of $\beta$ that maximize the likelihood function. Earlier in the course we used the log-likelihood function to construct the likelihood ratio and AIC statistics. We now take a more detailed look at the likelihood function in order to better understand the estimation of our model parameters.

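Suppose that we have the traditional Cox model

\[ h(t \mid Z) = h_0(t)\exp(\beta' Z) \]

and our interest is in estimating the parameters $\beta$. The likelihood function can be written (here for the case of no tied failure times) as

\[ L(\beta) = \prod_{i=1}^{D} \frac{\exp(\beta' Z_{(i)})}{\sum_{j \in R(t_i)} \exp(\beta' Z_j)}, \]

where $t_1 < t_2 < \cdots < t_D$ are the distinct failure times, $Z_{(i)}$ is the covariate vector of the subject who fails at $t_i$, and $R(t_i)$ is the set of subjects at risk at time $t_i$. Thus, the data contribute to the estimation of the regression parameters only at the failure times. Computational algorithms are employed to find the values of $\beta$ that maximize this likelihood function.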
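
The structure of this likelihood can be sketched in a few lines of Python. The sketch assumes no tied failure times and uses hypothetical toy data. The function name `cox_log_partial_likelihood` is introduced here for illustration, and the crude grid search merely stands in for the numerical algorithms that statistical software actually uses.

```python
import math

def cox_log_partial_likelihood(beta, times, events, covariates):
    """Log partial likelihood for the Cox model, assuming no tied failure times.

    beta       : list of regression coefficients
    times      : observed follow-up times
    events     : 1 if the subject failed at that time, 0 if censored
    covariates : one covariate list per subject
    """
    # Linear predictor beta'Z_j for every subject.
    eta = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects appear only in the risk sets
        # Risk set R(t_i): everyone still under observation at t_i.
        denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(denom)
    return loglik

# Hypothetical toy data: five subjects, one binary covariate.
times = [2.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 0, 1, 1, 1]
covariates = [[1], [0], [0], [1], [0]]

# Crude grid search over a single coefficient (illustration only).
grid = [b / 100.0 for b in range(-300, 301)]
beta_hat = max(
    grid,
    key=lambda b: cox_log_partial_likelihood([b], times, events, covariates),
)
print("approximate maximizer:", beta_hat)
```

Note that the censored subject never adds a term of its own; it enters only through the risk sets of earlier failure times.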

Likelihood Function in the Stratified Model

In the stratified Cox model, the parameters are estimated by maximizing

\[ L(\beta) = \prod_{m=1}^{M} L_m(\beta), \]

where the $L_m(\beta)$ are the likelihoods within each stratum. The individual likelihoods are thus constructed from the distinct failure times and subjects at risk within the stratum.

What happens if there are no failures within a given stratum?
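
One way to explore this question is numerically. The Python sketch below assumes that the stratified likelihood is the product of the stratum-specific partial likelihoods (so the log-likelihoods add) and that there are no tied failure times. The data and function names are hypothetical. Stratum "C" contains only censored observations.

```python
import math

def log_partial_likelihood(beta, subjects):
    """Log partial likelihood for one stratum (no tied failure times assumed).

    subjects: list of (time, event, covariate_list) tuples.
    """
    loglik = 0.0
    for t_i, d_i, z_i in subjects:
        if d_i != 1:
            continue  # only failures contribute a term
        eta_i = sum(b * z for b, z in zip(beta, z_i))
        # Risk set is formed within the stratum only.
        denom = sum(
            math.exp(sum(b * z for b, z in zip(beta, z_j)))
            for t_j, _, z_j in subjects
            if t_j >= t_i
        )
        loglik += eta_i - math.log(denom)
    return loglik

def stratified_log_likelihood(beta, strata):
    """Sum of the stratum-specific log partial likelihoods."""
    return sum(log_partial_likelihood(beta, s) for s in strata.values())

# Hypothetical toy data: (time, event, [covariate]) per subject.
strata = {
    "A": [(2.0, 1, [1]), (4.0, 1, [0]), (6.0, 0, [1])],
    "B": [(1.0, 1, [0]), (3.0, 1, [1]), (5.0, 1, [0])],
    "C": [(2.5, 0, [1]), (7.0, 0, [0])],  # no failures in this stratum
}

beta = [0.5]
with_c = stratified_log_likelihood(beta, strata)
without_c = stratified_log_likelihood(
    beta, {k: v for k, v in strata.items() if k != "C"}
)
print(with_c, without_c)
```

Under these assumptions a stratum without failures has an empty product, so it adds zero to the log-likelihood for every value of $\beta$; its subjects therefore carry no information about the regression parameters.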
