ONLINE SUPPLEMENTAL MATERIAL, Part 1

for

Clinical and functional outcomes after 2 years in the multisite Early Detection and Intervention for the Prevention of Psychosis effectiveness trial
(McFarlane et al., 2014)

by

Bruce Levin, Ph.D.1, Lori Travis, M.S.2, F. Lee Lucas, Ph.D.2, and
William R. McFarlane, M.D.3,2.

May 16, 2013, revised April 19, 2014

This document records some supplemental results and technical statistical notes on the analysis of the Early Detection and Intervention for the Prevention of Psychosis (EDIPPP) effectiveness trial. It supplements the above-named article by William McFarlane et al., submitted for publication to the American Journal of Psychiatry, and should be read in conjunction with that paper.

The supplement contains three Technical Notes:

T1—Technical Note on the EDIPPP Regression Discontinuity Analysis;

T2—Technical Note on the Global Test Procedure Used In EDIPPP; and

T3—Technical Note on the EDIPPP Site-Specific Adjusted Time to First Negative Event Curves.

1Mailman School of Public Health, Columbia University, New York, NY; 2Maine Medical Center Research Institute, Portland, ME; 3Tufts University School of Medicine, Boston, MA

TECHNICAL NOTE ON THE EDIPPP REGRESSION DISCONTINUITY ANALYSIS

EDIPPP has an assured allocation design (Finkelstein et al., 1996a,b), more commonly known as a regression discontinuity (RD) design (Campbell and Stanley, 1966; Cook and Campbell, 1979; Kenny and Judd, 1981). Although we prefer the terms “assured allocation” or “risk-based allocation” because they more precisely describe the key design feature of the study and do not pre-suppose the method of statistical analysis, the statistical analysis plan for EDIPPP indeed specified an RD analysis. Specific details of the RD analysis are provided in this technical note.

In general, an assured allocation design requires an assignment variable that is measured for all eligible subjects at baseline, together with a pre-specified threshold value on the assignment variable. In studies where that variable is a measure of risk of future deleterious outcomes, the term “risk-based allocation design” is appropriate. All subjects with a baseline measurement at or above the threshold value are assigned to the new or experimental treatment or intervention (hence are “assured” of receiving that intervention), while all subjects with a baseline measurement below the threshold value are assigned the standard or comparison treatment or intervention. There are many reasons why studies might use this non-randomized design. Considering the ethics of randomizing young people at very high risk of developing psychosis with a one-half probability of receiving standard care, the designers of EDIPPP decided that the benefit of an assured, risk-based allocation scheme outweighed the interpretive risks of foregoing a randomized design.
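For illustration only, the allocation rule amounts to a deterministic assignment made at baseline. The sketch below expresses it as a SAS data step; the data set name, variable names, and threshold value are hypothetical and are not those of the EDIPPP database.

%let threshold = 12 ;                     /* hypothetical cutoff on the assignment variable */

data allocated ;
  set eligible ;                          /* hypothetical baseline data set */
  length arm $ 12 ;
  /* at or above the threshold: assured of the experimental intervention */
  if baseline_risk >= &threshold then arm = 'EXPERIMENTAL' ;
  else arm = 'COMPARISON' ;               /* below the threshold: standard care */
run ;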

A risk-based allocation scheme clearly creates between-group differences by the mechanism of selection bias, so the analysis to estimate and test the treatment or intervention effect is decidedly not a simple group-by-group comparison of mean outcomes. Instead, a regression model is used to adjust for the group differences created by the allocation scheme, the most typical being the RD model. This is an analysis of covariance model containing the baseline assignment variable in addition to the intervention indicator variable(s) and other covariates necessary to appropriately reflect the study design (such as site indicators for multi-site studies with site-stratified allocation). Because all high-risk subjects are assigned to receive the experimental intervention and all subjects below the threshold are assigned to receive the comparison treatment, a graph of the estimated regression line plotted for each group only within the respective allocation interval appears as if it were a single line with a jump discontinuity at the threshold value. Other analytic approaches are discussed in Finkelstein et al. (1996a,b).
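As a schematic illustration only (the notation is ours and is simpler than the pre-specified EDIPPP model), a minimal RD adjustment model for an outcome Y with baseline assignment variable X and threshold c is

  Y_i = beta_0 + delta*T_i + beta_1*X_i + e_i,   where T_i = 1 if X_i >= c and T_i = 0 otherwise.

Under correct specification (here, linearity in X and a common slope beta_1 in the two groups), delta is the intervention effect; graphically it is the size of the jump in the fitted line at X = c.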

In psychiatric intervention studies it is common to measure scales at baseline and at one or more follow-up points in time. It might be helpful to clarify the role of the “regression phenomenon” that we expect to be manifest in change-scores, and to distinguish it from the analysis used to address the issues it raises. Use of a risk-based allocation scheme typically results in the experimental group having mean scores at baseline that are greater than the general population average and certainly above the pooled-group mean, while those in the comparison group typically will be less than the population average and certainly below the pooled-group mean. It follows that the average follow-up measurement in the experimental group can be expected to regress toward the overall mean and therefore manifest an apparent improvement in risk over time by the regression phenomenon, even in the absence of any true intervention effect. However, the average will not regress all the way to the overall mean (except in the extreme case in which the follow-up measurement is uncorrelated with the baseline measurement). Similarly, the comparison group will regress toward the overall mean but not all the way, and so there will be a residual difference between the two group follow-up means. The same is expected for mean change-scores. This is precisely why a simple comparison of change-scores between groups gives a biased estimate of the intervention effect.
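The amount of regression to expect can be quantified under a simple illustrative assumption that is not part of the EDIPPP analysis, namely bivariate normality of the baseline measure X and the follow-up measure Y. With means mu_X and mu_Y, standard deviations sigma_X and sigma_Y, and correlation rho,

  E[Y | X = x] = mu_Y + rho * (sigma_Y / sigma_X) * (x - mu_X).

A group selected because its baseline values lie above mu_X is therefore expected to retain, in standardized units, only the fraction rho of its baseline elevation at follow-up; it regresses all the way to the overall mean only when rho = 0, exactly the extreme case noted above.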

Regression adjustment is required to remove the bias. If the RD model is correctly specified, then upon removing the baseline differences caused by the allocation scheme, the adjusted group differences provide a valid estimate of intervention effect. The caveat is critical: model misspecification can produce biased estimates in this context as in any analysis. In particular, the assumptions of linearity and parallelism of the regression surfaces are crucial. In an important sense, the average treatment effect estimated by the RD analysis among those at higher risk is the difference between the average change-score actually observed for those subjects and the predicted mean that those same subjects would have exhibited had they been (counterfactually) assigned the standard treatment. Because no high-risk subjects were actually assigned to the standard treatment, their counterfactual predicted mean represents an extrapolation of the results from the comparison group into the domain of higher risk. If there are unobserved departures from linearity or parallelism between the two regression surfaces, bias may result.
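In symbols (again schematic, with our notation), write Y_i for the observed outcome (or change-score) of higher-risk subject i and Y-hat_C(x) for the comparison-group regression function extrapolated to baseline values x at or above the threshold. The estimated effect among the higher-risk subjects is then

  Delta-hat = average over higher-risk subjects i of [ Y_i - Y-hat_C(X_i) ],

i.e., the observed outcomes minus the outcomes those same subjects are predicted to have had under the comparison condition.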

Three common misconceptions should be dispelled. First, it is often said that an assured-allocation design is by default an uncontrolled observational study. This is false; just as in a randomized study, the assignment mechanism is known. In particular, there is no need to model the assignment probabilities, as is necessary in a propensity-score approach to removing bias in ordinary observational studies. Nor is this a “historical-control” study, because there is a concurrent comparison group whose actual data are used in the model-fitting.

Second, because the groups are not balanced at baseline, it is often thought that unmeasured confounders can produce bias. Note, however, that the regression adjustment model is specifically charged with estimating mean outcomes averaged over all confounding factors. When the observed outcomes of the high-risk group are compared with the outcomes projected for those same subjects had they counterfactually been given the standard treatment, the two quantities involve exactly the same unmeasured confounding factors. What is required is that the extrapolation model properly reflect the influence of the variation in unmeasured confounders as the allocation variable ranges from low to high risk; this is part of the assumption of proper model specification.

Third, it is often thought that for the regression-discontinuity analysis to be efficient there must be a strong correlation between the baseline allocation variable and the outcome measure. This is also untrue; even if there were zero correlation, the mean adjusted change-scores would estimate the true intervention effect without bias.

The EDIPPP study presented several other features that were incorporated into the analysis. First, the variable used for the risk-based allocation (the P-scale defined below) was the measure pre-specified for the primary analysis. However, nine other outcome measures were also of interest. The ten outcomes comprised four measures of psychosis-related symptoms from the Scale of Prodromal Symptoms (SOPS) of the Structured Interview for Prodromal Syndromes (SIPS) (the sums of items on subscales P, N, D, and G for, respectively, positive, negative, disorganized, and general symptoms) and six measures of functioning: the Global Assessment of Functioning (GAF) Scale; the Global Functioning: Social and Role Scales (GF:S and GF:R); and the Heinrichs Quality of Life Scale (QLS) Instrumental, Interpersonal, and Intrapsychic subscales. An RD model was required for each of these outcomes. In these models, in addition to the baseline assignment variable, the baseline measurement of the dependent variable was also included (for every outcome other than the P-scale, for which the assignment variable is itself the baseline measurement). Thus the regression coefficient for an intervention group indicator variable can be interpreted both as an adjusted difference in means between groups at 24 months and as an adjusted difference in change-scores between 24 months and baseline.
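A one-line argument (in our notation) shows why the two interpretations coincide. If the fitted model has the form E[Y_24] = alpha + delta*T + theta*Y_0 + (other terms), where Y_0 is the baseline value of the same measure and T is an intervention-group indicator, then the adjusted group contrast in E[Y_24] and the adjusted group contrast in E[Y_24 - Y_0] are both equal to delta, because Y_0 is among the covariates held fixed when the contrast is formed.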

An important test of the validity of the regression discontinuity model was whether or not baseline differences between groups in the nine measures other than the P-scale could be removed by a mock regression-discontinuity analysis using the baseline measure as the dependent variable. The results were quite encouraging: this procedure successfully removed between-group differences in baseline scores on all but one of the key outcome variables. See McFarlane et al. (2012).
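A sketch of such a mock analysis is given below. It is illustrative only: the data set and variable names (a baseline-only data set and a baseline GAF score) are hypothetical, and the actual EDIPPP code is not reproduced here. The logic is that if the intervention-group coefficients are near zero after adjustment for the baseline P-score, the allocation variable accounts for the baseline between-group difference in that measure.

/* Mock RD check (illustrative sketch, hypothetical names): regress a     */
/* baseline measure on the intervention-group indicators and the baseline */
/* allocation variable; group coefficients near zero indicate that the    */
/* allocation variable explains the baseline between-group difference.    */
proc glm data=reg_disc_baseline ;
  class group site_id ;
  model baseline_gaf = group site_id baseline_p / solution ;
run ;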

Second, EDIPPP recruited subjects from six urban sites around the U.S., in the states of Maine, New York, Michigan, Oregon, California, and New Mexico. As an effectiveness trial, the EDIPPP sample was designed to reflect a geographically diverse population, with whatever risk factors and practice variations that entails. Sites were envisioned from the beginning to be an important factor. Therefore, all models included a classification variable for site; Maine (site 1) is the reference site. Site-by-intervention effects were not significant and were not included in the final analyses.

Third, in addition to the 24-month outcomes, intermediate assessments were taken at 6 and 12 months of follow-up. The pre-specified analysis plan called for a mixed-model regression analysis allowing all post-baseline longitudinal measurements to be included, in an effort to partially offset anticipated losses to follow-up. The regression models therefore included a classification variable for visit number (6 and 12 months, with the primary 24-month measure as the reference visit) together with visit-by-intervention-group interactions. This allowed an arbitrary and possibly non-linear time-course for each endpoint as well as an arbitrary and possibly non-linear change in treatment effect over time. In addition, the crucial coefficient of the allocation variable could be estimated using all available post-baseline data points per subject, because the models did not include interactions between visit number and the baseline P-score. That is, a single coefficient for the baseline allocation variable was estimated using all longitudinal measurements, reflecting an assumption that the regression of an outcome measure on the allocation variable would be the same irrespective of follow-up time. Finally, subjects were considered random effects in the mixed model analysis, i.e., each subject was assumed to have a random baseline intercept. Thus, while each subject could have an arbitrary starting point, the group mean differences at 6, 12, and 24 months were assumed to be constants (with arbitrary differences among them). The error covariance structure in the SAS Proc Mixed implementation was of type arh(1), i.e., heterogeneous first-order autoregressive.
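In schematic form (our notation; the exact SAS specification appears in Appendix T1-1), the longitudinal model for subject i at post-baseline visit t (6, 12, or 24 months) is

  Y_it = b_i + alpha_t + delta_{g(i),t} + gamma * P_i(baseline) + (site effect) + e_it,

where b_i is a random subject-specific intercept, alpha_t is a visit effect, delta_{g(i),t} is the visit-specific effect of subject i's intervention group g(i), gamma is the single coefficient of the baseline allocation variable common to all visits, and the within-subject errors e_it follow the arh(1) covariance structure.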

Fourth, we used two group (treatment) variables for subjects assigned to the experimental intervention group, the first indicating those at clinically higher risk (CHR, n=205 at baseline) and the second for the early first episode of psychosis subgroup (EFEP, n=45); the reference comparison group was at clinically lower risk (CLR, n=87). This was necessitated by the clearly distinct regression lines for the two experimental subgroups—see Figure T1-1 below. The improvement in goodness-of-fit was statistically significant (P<0.0001). Though not pre-specified in the analysis plan, the difference was considered a real possibility in the planning stage of the study. A single regression line for all subjects in the experimental group is clearly an incorrectly specified adjustment model.

Finally, the analyses assumed that subjects who missed visits were missing at random given the observed adjustment variables in the models. Sensitivity analyses to check on the impact of this assumption on the conclusions will be the subject of a separate report.

Appendix T1-1 provides complete SAS output for the P-score and N-score endpoint regression-discontinuity models. In addition, a reviewer asked for additional adjustment of the primary analysis model for age, gender, race, and family income. As explained above, such additional adjustments are not necessary for a valid assessment of the intervention effects under the key assumption that the mathematical form of the adjustment model applies throughout the range of the sum P-score allocation variable. Nevertheless, to reassure readers that the primary results are not sensitive to such additional adjustments, we include the output from two further-adjusted models in Appendix T1-1. The results, summarized in Table T1-1 below, show that when age and gender were entered, the intervention effects were, if anything, slightly strengthened, while the covariate coefficients themselves were not significant. Adding race and family income gave similar results, although because the sample size was reduced by missing race and income data, the p-values for the treatment effects were slightly larger. In any case, the results of the analysis were unchanged: the effect of treatment was highly significant with or without these variables included.

TABLE T1-1
SUMMARY OF MODELS ADJUSTING FOR ADDITIONAL DEMOGRAPHIC VARIABLES

Model*                                       Effect     t        p-value

Original (n=337)
   CHR vs CLR                                -2.5440    -2.95    0.0034
   EFEP vs CLR                               -8.7684    -6.26    <.0001

Plus age and gender (n=337)
   CHR vs CLR                                -2.6897    -3.09    0.0022
   EFEP vs CLR                               -8.9176    -6.34    <.0001
   Age                                       -0.0939    -1.37    0.1729
   Female                                     0.2966     0.66    0.5080

Plus African-American and Income (n=294)
   CHR vs CLR                                -2.6800    -2.88    0.0044
   EFEP vs CLR                               -8.8148    -5.84    <.0001
   Age                                       -0.1546    -2.12    0.0348
   Female                                     0.0843     0.17    0.8616
   African-American                           0.6727     0.83    0.4053
   Income $50K                               -0.4506    -0.94    0.3459

* Original primary analysis model included terms for visit indicators, site, visit by intervention group interactions, and baseline sum-P score. See Appendix T1-1 for complete output.

FIGURE T1-1—REGRESSION LINES FOR THE CLR, CHR, AND EFEP SUBGROUPS

Abbreviations:

CLR = Clinically Lower Risk
CHR = Clinically Higher Risk
EFEP = Early First Episode Psychosis

CLR vs. CHR (small blue arrow), p = 0.0034.

CLR vs. EFEP (large red arrow), p < 0.0001.

Green dashed line extending from solid CLR line represents expected regression outcome in the range of CHR and EFEP baseline values, based on CLR values.

Regression lines are plotted through group averages with parallel slopes estimated from the primary analysis model.

Vertical arrow lengths closely approximate effect sizes from the primary analysis, which adjusted additionally for site, visit number, and interactions between intervention group and visit number. See Appendix T1-1 below.

References

Campbell, DT and Stanley, JC (1966). Experimental and Quasi-experimental Designs for Research. Chicago: Rand McNally.

Cook, TD and Campbell, DT (1979). Quasi-experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally.

Finkelstein, MO, Levin, B, and Robbins, H (1996a). Clinical and Prophylactic Trials with Assured New Treatment for Those at Greater Risk. Part I: Introduction. American Journal of Public Health 86:691–695.

Finkelstein, MO, Levin, B, and Robbins, H (1996b). Clinical and Prophylactic Trials with Assured New Treatment for Those at Greater Risk. Part II: Examples. American Journal of Public Health 86:696–705.

Kenny, DA and Judd, C (1981). Estimating the Effects of Social Interventions. New York: Cambridge University Press.

McFarlane, WR, Cook, WL, Downing, D, Ruff, A, Lynch, S, Adelsheim, S, Calkins, R, Carter, CS, Cornblatt, B, and Milner, K (2012). Early Detection, Intervention, and Prevention of Psychosis Program: Rationale, Design, and Sample Description. Adolescent Psychiatry 2:112–124.

Trochim, WMK (1984). Research Design for Program Evaluation: The Regression-discontinuity Approach. Beverly Hills: Sage Publications.


APPENDIX T1-1
SAS OUTPUT FOR REGRESSION DISCONTINUITY MODELS

The following SAS code template was used for the P-score and N-score regression models. Similar code was used for the other eight outcome measures, with the addition of the baseline value of the measure being analyzed.

proc sort data=reg_disc ;
  by visit descending site_id ;
run ;

/* NOTE: the DESCENDING sort option was used, together with ORDER=DATA below,
   to obtain the preferred reference categories */

/* NOTE: the data contain rows at time 1 (baseline), which must be excluded
   from the analysis; hence the WHERE clause below */

proc mixed data=reg_disc(where=(time ne 1))
           cl=wald order=data covtest ;
  class id visit group site_id ;
  model score_p = visit group site_id group*visit baseline_p
        / solution ddfm=kr covb ;
  random intercept / subject=id gcorr ;
  repeated visit / subject=id type=arh(1) rcorr ;
run ;

proc mixed data=reg_disc(where=(time ne 1))
           cl=wald order=data covtest ;
  class id visit group site_id ;
  model score_n = visit group site_id group*visit baseline_n baseline_p
        / solution ddfm=kr covb ;
  random intercept / subject=id gcorr ;
  repeated visit / subject=id type=arh(1) rcorr ;
run ;

Your feedback and questions are welcome. Contact the SAS code author:

Lori Travis

Maine Health

Portland, Maine

207-662-5626

MIXED MODEL: Positive (P) score

Dimensions
  Covariance Parameters            5
  Columns in X                    23
  Columns in Z Per Subject         1
  Subjects                       337
  Max Obs Per Subject              3

Number of Observations
  Number of Observations Read   1011