No common denominator: a review of outcome measures in IVF randomised controlled trials.

Running title: Review of outcome measures in IVF RCTs

Jack Wilkinson* (a, b), Stephen A Roberts (a), Marian Showell (c), Daniel R Brison (d), Andy Vail (a, b).

(a) Centre for Biostatistics, Institute of Population Health, Manchester Academic Health Science Centre (MAHSC), University of Manchester, Manchester M13 9PL, UK

(b) Research & Development, Salford Royal NHS Foundation Trust, Salford, M6 8HD, UK

(c) Cochrane Gynaecology and Fertility, The University of Auckland, Auckland City Hospital, Auckland 1142, NZ

(d) Department of Reproductive Medicine, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre (MAHSC), Manchester M13 9WL, UK

*email:

WORD COUNT (Main text)

4604. Two tables, two figures and 254 references.

ABSTRACT

Study question: Which outcome measures are reported in randomised controlled trials (RCTs) for in vitro fertilisation (IVF)?

Summary answer: Many combinations of numerator and denominator are in use, and are often employed in a manner that compromises the validity of the study.

What is known already: The choice of numerator and denominator governs the meaning, relevance and statistical integrity of a study’s results. RCTs only provide reliable evidence when outcomes are assessed in the cohort of randomised participants, rather than in the subgroup of patients who completed treatment.

Study design, size, duration: Review of outcome measures reported in 142 IVF RCTs published in 2013 or 2014.

Participants/materials, setting, methods: Trials were identified by searching the Cochrane Gynaecology and Fertility Specialised Register. Reported numerators and denominators were extracted. Where live birth rates were reported, we checked whether they were calculated using the entire randomised cohort or a later denominator.

Main results and the role of chance: Over 800 combinations of numerator and denominator were identified. No single outcome measure appeared in the majority of trials. Only 22 (43%) studies reporting live birth presented a calculation including all randomised participants or only excluding protocol violators. A variety of definitions were used for key clinical numerators.

Limitations, reasons for caution: Several of the included articles may have been secondary publications. Our categorisation scheme was essentially arbitrary, so the frequencies we present should be interpreted with this in mind. The analysis of live birth denominators was post-hoc.

Wider implications of the findings: There is massive diversity in numerator and denominator selection in IVF trials due to the multistage nature of the treatment, and this causes methodological frailty in the evidence base. The twin spectres of outcome reporting bias and analysis of non-randomised comparisons do not appear to be widely recognised. Initiatives to standardise outcome reporting are welcome, although there is a need to recognise that early outcomes of treatment may be appropriate choices of primary outcome for early phase studies.

Study funding/ competing interests: JW is funded by a Doctoral Research Fellowship from the National Institute for Health Research. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health. JW also declares that publishing research is beneficial to his career. JW and AV are statistical editors, and MS is Information Specialist, for the Cochrane Gynaecology and Fertility Group, although the views expressed here are not necessarily those of the group. DRB is funded by the NHS as Scientific Director of a clinical IVF service. The authors declare no other conflicts of interest.

KEY WORDS

In vitro fertilisation. Outcome measures. Assisted reproduction. Core outcomes. Live birth.

Introduction

Inconsistency and incompleteness of outcome reporting in infertility trials are barriers to understanding and improving treatments (Dapuzzo, et al., 2011, Legro, et al., 2014). In the absence of common standards of reporting, it may be difficult to compare the safety and effectiveness of competing interventions, or to synthesise the results of trials in meta-analysis (Blazeby, et al., 2012, Clarke and Williamson, 2016, Khan, 2014). The choice of outcome also has implications for both the relevance (Heijnen, et al., 2004, Legro, et al., 2014, Min, et al., 2004) and methodological validity (Griesinger, 2016, Vail and Gardener, 2003) of a trial’s results.

Choosing an outcome for trials of in vitro fertilisation (IVF) is particularly complex, owing to the multistage nature of the treatment. Treatment comprises stimulation of the ovaries, retrieval and fertilisation of oocytes, and the culture and transfer of some of the resulting embryos back to the uterus (Van Voorhis, 2007). Some of these embryos may implant, some of these may result in a clinical pregnancy, and some of these may result in a live birth. Those embryos not used for the initial transfer may be cryopreserved, so that they can later be thawed and transferred in a subsequent attempt. The response at each stage can be quantified: ovarian response by the number and maturity of oocytes; fertilisation by the number of zygotes, and subsequently the number and quality of embryos produced; the transfer procedure by the implantation of embryos; and the clinical outcome of treatment by clinical pregnancy and the birth of a child. Additionally, treatment may fail at each stage: stimulation may be cancelled due to under- or over-response, or all oocytes cryopreserved due to over-response; fertilisation failure may occur; embryos may fail to develop, or post transfer fail to implant; and pregnancies may be lost before or subsequent to identification of a clinical pregnancy. One consequence of this for clinical trials of interventions designed to improve IVF is that numerous clinical and procedural events that occur during treatment can be reported. A second consequence is that these events may be reported in subgroups containing only those patients who reach a certain milestone, such as oocyte retrieval or embryo transfer. Further complexity arises because IVF involves two or more individuals (for example a male and female partner), some of whom may undertake multiple treatment cycles, and one or more additional individuals (babies) arising from successful treatment (Legro, et al., 2014). When selecting which outcomes to report in an IVF trial, therefore, many numerators and denominators are available (Heijnen, et al., 2004).

The importance of the choice of numerator is well recognised and has been enshrined in the IMPRINT (Improving the Reporting of Clinical Trials of Infertility Treatments) statement with a call for live birth to be reported in all infertility trials (Legro, et al., 2014), although alternatives, such as ongoing pregnancy, have been proposed on pragmatic grounds (Braakhekke, et al., 2014a). The appropriate choice of denominator is a more subtle issue. The optimal denominator for IVF evaluation has been widely discussed (Abdalla, et al., 2010, Garrido, et al., 2011, Germond, et al., 2004, Heijnen, et al., 2004), and is known to have implications for the interpretation of trials, where the exclusion of randomised participants may introduce bias to the estimated treatment effect (Montori and Guyatt, 2001, Vail and Gardener, 2003, Mastenbroek, et al., 2005, Mastenbroek and Repping, 2014). This could occur if, for example, participants are randomised at the start of ovarian stimulation, but the outcome is calculated only in those who undergo transfer.
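
The example in the preceding paragraph can be made concrete with a minimal sketch. The counts below are invented purely for illustration and are not taken from any trial; the class and function names are likewise hypothetical. Both arms have the same number of live births among the same number of randomised women, but fewer women in the treatment arm reach transfer.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    randomised: int   # women randomised at the start of ovarian stimulation
    transferred: int  # women who went on to have an embryo transfer
    live_births: int  # women with a live birth


def rate(numerator: int, denominator: int) -> float:
    """Proportion of the chosen denominator experiencing the event."""
    return numerator / denominator


# Hypothetical counts, invented for illustration only.
control = Arm(randomised=100, transferred=80, live_births=24)
treatment = Arm(randomised=100, transferred=60, live_births=24)

for name, arm in (("control", control), ("treatment", treatment)):
    per_randomised = rate(arm.live_births, arm.randomised)
    per_transfer = rate(arm.live_births, arm.transferred)
    print(f"{name}: {per_randomised:.2f} per woman randomised, "
          f"{per_transfer:.2f} per transfer")
```

Under these invented counts, the live birth rate is 24% in both arms per woman randomised, but 30% versus 40% per transfer. The apparent benefit in the per-transfer comparison arises only because women who did not reach transfer were dropped from the denominator.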

We conducted a review of outcomes reported in IVF randomised controlled trials (RCTs) published in 2013 and 2014. Our aims were to establish the full range of outcomes in use in IVF RCTs and to identify the ramifications for the evidence base.

Materials and Methods

Search strategy

MS performed a search of the Cochrane Gynaecology and Fertility Group PROCITE database on 22/06/15 using the search strategy contained in Appendix 1. This is a specialised register of RCTs updated weekly by searching databases, conference abstracts and journals. Further details of the database are provided in Appendix 2. Our initial search covered the period 2010 to 2014, although we subsequently narrowed our focus to the period 2013 to 2014 due to feasibility constraints. We screened the titles and abstracts of the identified articles and excluded those not meeting the eligibility criteria. We reviewed the full text of all articles not excluded during this initial screening phase and made further exclusions as appropriate.

Eligibility criteria

English-language publications of randomised controlled trials in peer-reviewed journals in the period 1st January 2013 to 31st December 2014 were considered eligible. Conference papers were excluded. We did not consider methodological quality to be relevant, as our concerns related to the outcomes reported in this literature and not to the estimation of treatment effects. To be eligible, a study had to include participants undergoing IVF or intracytoplasmic sperm injection (ICSI) including a period of ovarian stimulation in at least one arm of the trial, or participants undergoing frozen embryo transfer in at least one arm of the trial, or partners of patients undergoing IVF or ICSI in at least one arm of the trial, or oocyte donors donating to an IVF programme. We included trials where surplus oocytes had been obtained as part of IVF or ICSI treatment and an intervention was applied to these oocytes, even if there was no intention to subsequently transfer any of the resulting embryos. Finally, the publication had to report clinical or preclinical outcomes to be eligible (which would exclude, for example, purely economic evaluations of interventions).

Data extraction

Initially, we performed a small pilot extraction of 5 reports to inform the extraction process used in the full sample, including the variables to be extracted and the formatting of this information. We extracted information at both study-level and at the level of each reported outcome in a study. We defined an outcome as any post-randomisation variable presented separately for each arm in the study or as a comparison between study arms and recorded both the numerator and denominator used in the calculation. We did not record a reported outcome multiple times if it was presented for each of several subgroups, unless these were defined by excluding patients who did not reach a certain stage in the process. We also did not record outcomes multiple times where these corresponded to repeated measurements at several timepoints. At the study-level, we extracted details of the intervention and the stage in the treatment process at which the intervention was applied (pre-stimulation phase, stimulation phase, post-stimulation including culture and selection of embryos, transfer, frozen transfer or intervention targeted at the male partner, such as manipulation or selection of sperm prior to ICSI). Similarly, we extracted the stage of treatment at which randomisation took place. For each reported outcome, we extracted the numerator and denominator (for numerical variables, the denominator would be the divisor used in the calculation of a mean). Where pregnancy or live birth were reported, we extracted the corresponding definition used by the study authors. Data were extracted into two databases, one containing study-level information and another containing reported-outcome-level information. JW performed data extraction for all studies. SR and AV performed double extraction for a random sample of 10%, to check data quality and consistency of recording. Furthermore, we conducted extensive data validation and cleaning, including manually checking every entered item.
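
As a rough illustration of the two-level structure described above, the sketch below represents study-level and outcome-level records and counts the distinct combinations of numerator and denominator. The field names, stage labels and example entries are assumptions made for illustration only; they are not necessarily the variables held in the extraction databases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    study_id: str
    intervention: str
    intervention_stage: str   # stage at which the intervention was applied
    randomisation_stage: str  # stage at which randomisation took place


@dataclass(frozen=True)
class OutcomeRecord:
    study_id: str         # links the outcome to its study-level record
    numerator: str
    denominator: str      # for numerical variables, the divisor used for the mean
    definition: str = ""  # authors' definition, kept for pregnancy and live birth


def distinct_measures(outcomes):
    """Return the set of (numerator, denominator) combinations in use."""
    return {(o.numerator, o.denominator) for o in outcomes}


# Hypothetical entries, invented for illustration only.
studies = [
    StudyRecord("S1", "example intervention A", "stimulation", "pre-stimulation"),
    StudyRecord("S2", "example intervention B", "transfer", "pre-transfer"),
]
outcomes = [
    OutcomeRecord("S1", "live birth", "woman randomised"),
    OutcomeRecord("S1", "clinical pregnancy", "embryo transfer",
                  "foetal heartbeat on ultrasound"),
    OutcomeRecord("S2", "live birth", "embryo transfer"),
    OutcomeRecord("S2", "clinical pregnancy", "embryo transfer"),
]

measures = distinct_measures(outcomes)
print(len(measures))
```

Keeping the outcome-level records separate from the study-level records means that an outcome measure shared by several studies is counted once as a combination, while its frequency across studies remains recoverable through the study identifier.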

Statistical analysis

We summarised the characteristics of the sample and tabulated the numerators and denominators in use in nine categories (live birth, pregnancy, stimulation response, transfer, fertilisation, multiple births or pregnancies, other preclinical outcomes, adverse events, postnatal). These categories are arbitrary and were selected to facilitate the presentation of our results. Because our analyses are descriptive and the categories are purely presentational, our results would not be affected were an outcome measure to be reported under one heading rather than another. Due to the large number of outcomes identified, we report only those appearing in more than one study. We simplified the results by combining similar numerators and denominators. For example, we combined live birth with take home baby rate, and combined the denominators ‘per patient with sufficient embryos’ and ‘per patient with sufficient blastocysts’, where ‘sufficiency’ could be defined on the basis of quantity or quality of embryos (or both). We did not distinguish between subtly different definitions of outcomes (for example, clinical pregnancy may have been defined as foetal heartbeat on ultrasound at different timepoints in different studies). However, at the suggestion of an anonymous peer reviewer, we also present the definitions used by trial authors for pregnancy and live birth outcomes. In order to investigate the methodological implications of denominator selection, we conducted a post-hoc analysis in the subgroup of studies reporting live birth. We recorded whether the denominator used coincided with the cohort of randomised participants (ignoring exclusions due to protocol violations) and, if not, the nature and extent of the exclusion. We did not perform statistical inference, because we attempted to summarise all trials within the time period and it is not clear that inference would be meaningful.

Sample size

The decision to include all studies in the period 01/01/13 to 31/12/14 was made primarily on pragmatic grounds, on the basis that this would be sufficient to assess current practices in outcome reporting while remaining feasible. A post-hoc calculation can be made, however: a sample of 142 studies yields a 76% probability of observing a relatively rare outcome (appearing in 1 out of every 100 studies) at least once.
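
The stated figure is consistent with a simple binomial calculation, sketched below. The sketch assumes that each study reports the outcome independently and with a common probability of 0.01; this assumption and the function name are introduced here for illustration.

```python
def prob_observed_at_least_once(n_studies: int, prevalence: float) -> float:
    """P(outcome appears in at least one of n independent studies)."""
    # Complement of the probability that no study reports the outcome.
    return 1.0 - (1.0 - prevalence) ** n_studies


p = prob_observed_at_least_once(n_studies=142, prevalence=0.01)
print(f"{p:.2f}")
```

With 142 studies and a prevalence of 1 in 100, the probability that no study reports the outcome is 0.99 raised to the power 142, or approximately 0.24, which gives the 76% quoted above.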

Ethical approval

Ethical approval was not required as the study involved only the review of published research.

Results

Results of the search

Figure 1 shows the results of the search and screening process. The search identified 640 references published between 2013 and 2014. Following title and abstract screening, 488 references were discarded without further assessment. The remaining 152 articles were assessed further by reviewing the full texts and a further 10 were excluded for the reasons shown in Figure 1. A total of 142 RCTs were included in the analysis. Agreement between raters was almost universal, with one reviewer erroneously extracting one additional outcome from one study due to misreading the text.

Stage of intervention and randomisation

Interventions were delivered prior to ovarian stimulation in 20 (14%) articles, during stimulation in 51 (36%), post stimulation or during culture of embryos in 31 (22%), post culture but preceding transfer of embryos in 19 (13%) and following the transfer procedure in 3 (2%). Five (4%) were trials of interventions targeted at male partners and 13 (9%) featured interventions designed to improve outcomes after the vitrification and warming of oocytes or embryos. Randomisation occurred prior to stimulation in 62 (44%) articles, during stimulation in 17 (12%), post stimulation or during culture in 27 (19%) and post culture but prior to transfer in 23 (16%). The point of randomisation was unclear in 13 (9%) articles.