Supplementary Appendix 1

Data included in the NMAs

Table A1 summarises the data included in the NMAs.

Table A1: Trials and data timepoints included in the NMAs

| Trial | Treatment groups included in NMA | Trial type | Data timepoints included in induction NMAs | Data timepoints included in maintenance NMAs |
|---|---|---|---|---|
| ULTRA1[10] | (1) Placebo; (2) Adalimumab 160/80mg | Induction | Week 8 | n/a |
| ULTRA2[11] | (1) Placebo; (2) Adalimumab 160mg at week 0, 80mg at week 2 and then 40mg EOW beginning at week 4 | Induction and maintenance | Week 8 | Weeks 32 and 52 |
| PURSUIT-SC[12] | (1) Placebo; (2) Golimumab 200/100mg | Induction | Week 6 | n/a |
| PURSUIT-Maintenance[13] | (1) Placebo; (2) Golimumab 50mg; (3) Golimumab 100mg | Maintenance (golimumab responders only) | n/a | Weeks 30 and 54 |
| ACT1[9] | (1) Placebo; (2) Infliximab 5mg/kg | Induction and maintenance | Week 8 | Weeks 30 and 54 |
| ACT2[9] | (1) Placebo; (2) Infliximab 5mg/kg | Induction and maintenance | Week 8 | Week 30 |
| Suzuki et al.[14] | (1) Placebo; (2) Adalimumab 160/80mg | Induction and maintenance | Week 8 | Weeks 32 and 52 |

Details of the statistical models

Clinical response and remission can be considered as ordered categorical data with three mutually exclusive categories: (i) no response, (ii) response and (iii) remission. The model assumed that the treatment effect was the same irrespective of the category. Data available at 6 and 8 weeks were combined, as were data available at 30 and 32 weeks, and at 52 and 54 weeks. The likelihood function for the data is described as follows.

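Let $r_{ikj}$ represent the number of patients in arm $k$ of trial $i$ in the mutually exclusive category $j = 1, 2, 3$, and let $n_{ik}$ be the total number of patients in that arm. The responses will follow a multinomial distribution such that:

$$(r_{ik1}, r_{ik2}, r_{ik3}) \sim \text{Multinomial}(p_{ik1}, p_{ik2}, p_{ik3};\, n_{ik}), \qquad \sum_{j=1}^{3} p_{ikj} = 1$$

The parameters in the model are the probabilities, $p_{ikj}$, that a patient in arm $k$ of trial $i$ has a response equivalent to category $j$. We used a probit link function to map the probabilities, $p_{ikj}$, onto the real line such that:

$$\theta_{ikj} = \Phi^{-1}(p_{ikj}) = \mu_{ij} + \delta_{i,1k} I_{\{k \neq 1\}}$$

so that:

$$p_{ikj} = \Phi(\mu_{ij} + \delta_{i,1k} I_{\{k \neq 1\}})$$

where $\Phi$ is the standard normal cumulative distribution function and $\mu_{ij}$ is the probit score for category $j$ in the control arm (arm 1) of trial $i$. In this model, the effect of treatment was to change the probit score of the control arm by $\delta_{i,1k}$ standard deviations.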
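As a rough illustration of what a shift on the probit scale means, the sketch below uses only the Python standard library to move a control-arm probability by a treatment effect expressed in standard deviations. The function name `shift_probit` and the numerical inputs are hypothetical and are not estimates from the NMAs.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def shift_probit(p_control: float, delta: float) -> float:
    """Shift the probit (Z) score of p_control by delta standard deviations
    and return the resulting probability."""
    z_control = _STD_NORMAL.inv_cdf(p_control)  # probit link: probability -> Z score
    return _STD_NORMAL.cdf(z_control + delta)   # inverse link: Z score -> probability


# Hypothetical inputs: control-arm probability 0.5, treatment effect of -0.5 SD.
p_treated = shift_probit(0.5, -0.5)
print(round(p_treated, 3))
```

A negative shift lowers the probability and a positive shift raises it; a shift of zero returns the control-arm probability unchanged.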

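The study-specific treatment effects, $\delta_{i,1k}$, were assumed to arise from a common population distribution with mean treatment effect $d_{1k}$ relative to the reference treatment, which in this analysis was placebo, such that:

$$\delta_{i,1k} \sim N(d_{1k}, \tau^2)$$

where $\tau$ is the between-study standard deviation of treatment effects.

We further assumed that there is an underlying continuous latent variable which has been categorised by specifying cut-offs, $z_{ij}$, which correspond to the points at which an individual moves from one category to the next in trial $i$. The model is re-written as:

$$p_{ikj} = \Phi(\mu_i + z_{ij} + \delta_{i,1k} I_{\{k \neq 1\}})$$

where $\mu_i$ is the trial-specific baseline.

The $z_{ij}$ can be treated as fixed, which would assume that these points are the same in each trial and each treatment. Alternatively, they can be treated as random, in which case they are assumed to vary between trials but to be the same for every treatment within a trial, such that:

$$z_{ij} \sim N(v_j, \sigma_z^2)$$

where $v_j$ are the population cut-offs and $\sigma_z$ is the between-study standard deviation of the cut-offs.

We used a model in which the $z_{ij}$ were treated as being random because this resulted in a better fit of the model to the data.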
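The role of a cut-off on the latent scale can be sketched as follows. This is a minimal illustration under an assumed parameterisation, not the code used for the analyses: we assume that the probability of "no response" is the standard normal CDF evaluated at the baseline plus treatment effect, and that adding a positive cut-off gives the probability of being below remission. The function name and all numerical inputs are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def category_probabilities(mu: float, delta: float, cutoff: float) -> dict:
    """Split the latent probit scale into three ordered categories.

    Assumed parameterisation (for illustration only):
      P(no response)          = PHI(mu + delta)
      P(below remission)      = PHI(mu + delta + cutoff), with cutoff > 0
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive to keep the categories ordered")
    p_no_response = PHI(mu + delta)
    p_below_remission = PHI(mu + delta + cutoff)
    return {
        "no response": p_no_response,
        "response": p_below_remission - p_no_response,
        "remission": 1.0 - p_below_remission,
    }


# Hypothetical values: baseline 0.3, treatment effect -0.4 SD, cut-off 0.8.
for name, prob in category_probabilities(0.3, -0.4, 0.8).items():
    print(f"{name}: {prob:.3f}")
```

Because the same treatment effect is added at every cut-off, the sketch reflects the assumption that the effect is the same irrespective of the category.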

The model was completed by giving the parameters prior distributions. When there are sufficient sample data, we can use conventional reference prior distributions and these will have little influence on the posterior results.

The reference prior distributions used in the analyses were:

  • Trial-specific baselines, $\mu_i \sim N(0, 1000)$
  • Treatment effects relative to reference treatment,
  • Between study standard deviation of treatment effects,
  • Population cut-offs, ,
  • Between study standard deviation of cut-offs,

In both the induction and maintenance phases, there were too few studies to allow meaningful Bayesian updating of the implausibly vague reference prior distribution for the between-study standard deviation. Without such updating, a reference prior distribution that does not represent genuine prior belief will have a substantial influence on the results and give posterior distributions that are unlikely to represent genuine posterior beliefs. To allow for this, we used a weakly informative prior distribution (a half-normal distribution) for the between-study standard deviation such that

To estimate the absolute probabilities of being in each category for each treatment, we combined the treatment effects with an estimate of the probability of being in the “No response” category on placebo (baseline model). For the baseline model, we used a binomial likelihood function for the number of patients, $r_i$, out of the $n_i$ patients treated with placebo in study $i$, who were classified as having “no response”, such that:

$$r_i \sim \text{Binomial}(p_i, n_i).$$

We used a probit link function such that:

$$\Phi^{-1}(p_i) = \mu_i.$$

We assumed that the study-specific baselines arose from a population of effects such that:

$$\mu_i \sim N(m, \sigma_m^2),$$

where $m$ is the population mean baseline and $\sigma_m$ is the between-study standard deviation.
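The way a baseline and a treatment effect are combined on the probit scale can be sketched as below. The draws are simulated stand-ins for MCMC output and are not results from the NMAs; the function name `absolute_no_response` and the means and standard deviations used to simulate the draws are hypothetical.

```python
import random
from statistics import NormalDist, median

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def absolute_no_response(baseline_draws, effect_draws):
    """Pair draws iteration by iteration and return the probability of
    "no response" on treatment for each iteration."""
    if len(baseline_draws) != len(effect_draws):
        raise ValueError("draws must come from the same number of iterations")
    return [PHI(mu + d) for mu, d in zip(baseline_draws, effect_draws)]


rng = random.Random(1)
# Hypothetical stand-ins for posterior draws (not estimates from the NMAs).
baseline = [rng.gauss(0.4, 0.10) for _ in range(1000)]  # placebo probit scores
effect = [rng.gauss(-0.5, 0.15) for _ in range(1000)]   # treatment effects in SD units
p_treated = absolute_no_response(baseline, effect)
print(f"median P(no response) on treatment: {median(p_treated):.3f}")
```

Working draw by draw carries the uncertainty in both the baseline and the treatment effect through to the absolute probability.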

The model was completed by giving the parameters prior distributions such that:

Again, in both the induction and maintenance phases there were relatively few studies providing data, so a weakly informative prior distribution was used for the between-study standard deviation such that:

.
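To give a feel for what a half-normal prior implies for a standard deviation, the sketch below draws from a half-normal distribution by taking absolute values of zero-mean normal draws. The scale of 0.5 is purely hypothetical and is not the value used in the analyses.

```python
import random


def sample_half_normal(scale: float, n: int, seed: int = 0) -> list:
    """Draw n values from a half-normal distribution, i.e. |X| with X ~ N(0, scale^2)."""
    rng = random.Random(seed)
    return [abs(rng.gauss(0.0, scale)) for _ in range(n)]


# HYPOTHETICAL scale, chosen only for illustration.
draws = sample_half_normal(scale=0.5, n=10_000)
share_below_one = sum(d < 1.0 for d in draws) / len(draws)
print(f"share of prior mass below 1.0: {share_below_one:.2f}")
```

Unlike a wide uniform prior, a half-normal prior concentrates its mass near zero while still allowing larger values, which is what makes it weakly informative.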

For the baseline and relative treatment effects models, we used a burn-in of 50,000 iterations of the Markov chain and retained a further 10,000 iterations to estimate parameters. In addition, the NMAs exhibited moderate correlation between successive iterations of the Markov chains so the chains were thinned by retaining every 10th sample.
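The burn-in and thinning steps can be sketched as a single slicing operation. The text does not state whether the 10,000 retained iterations are counted before or after thinning; the sketch assumes they are counted before thinning, and the function name `burn_and_thin` is hypothetical.

```python
def burn_and_thin(chain, burn_in, thin):
    """Drop the first burn_in iterations, then keep the first remaining
    iteration and every thin-th iteration after it."""
    if burn_in < 0 or thin < 1:
        raise ValueError("burn_in must be >= 0 and thin must be >= 1")
    return chain[burn_in::thin]


# Stand-in chain: iteration indices in place of sampled parameter values.
chain = list(range(60_000))
kept = burn_and_thin(chain, burn_in=50_000, thin=10)
print(len(kept), kept[0], kept[-1])
```

Under this assumption, 1,000 samples per chain remain for estimating the parameters; if the 10,000 iterations were instead counted after thinning, the chain would need 100,000 post-burn-in iterations.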