APPENDIX A

Summary of articles on important aspects of propensity score analysis

First, Scopus was searched for the terms “propensity score” OR “propensity scores” in the title, abstract, or key terms. Methodological studies with at least 80 citations in Scopus were extracted irrespective of subject area. In addition, within these search results, the additional search terms “variable selection” or “covariate balance” or “diagnostics” or “goodness-of-fit tests” or “C-statistics” in the title, abstract, or key terms were used to filter articles that contributed to variable selection and balance assessment in PS analysis. For the latter group, only methodological articles with a minimum of five citations in Scopus were selected and included in the reference tree below. The search was performed on 15 December 2013.

Figure 1. Summary of articles on important aspects of propensity score analysis (List of references below)

List of References

[1] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70:41-55.

[2] Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 1984; 79:516-524.

[3] Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Statistician 1985; 39:33–38.

[4] Rosenbaum PR, Rubin DB. The bias due to incomplete matching. Biometrics 1985; 41:106-116.

[5] Drake C. Effects of misspecification of the propensity score on estimators of treatment effect. Biometrics 1993; 49:1231–1236.

[6] Rosenbaum PR. Observational Studies. New York: Springer; 1995.

[7] Rubin DB, Thomas N. Matching using estimated propensity scores: relating theory to practice. Biometrics 1996; 52:249–264.

[8] Rubin DB. Estimating causal effects from large data sets using the propensity score. Ann Intern Med 1997; 127:757–763.

[9] D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998; 17:2265–2281.

[10] Joffe MM, Rosenbaum PR. Invited commentary: Propensity scores. Am J Epidemiol 1999; 150: 327-333.

[11] Dehejia RH, Wahba S. Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs. J Am Stat Assoc 1999; 94: 1053-1062.

[12] Perkins SM, Tu W, Underhill MG, Zhou XH, Murray MD. The use of propensity scores in pharmacoepidemiologic research. Pharmacoepidemiol Drug Saf 2000; 9:93-101.

[13] Imbens GW. The role of the propensity score in estimating dose-response functions. Biometrika 2000; 87:706-710.

[14] Robins JM, Hernán MÁ, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11:550-560.

[15] Hernán MÁ, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000; 11:561-570.

[16] Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: Concepts and analytical approaches. Annu Rev Public Health 2000; 21:121-145.

[17] Rubin DB. Using propensity scores to help design observational studies: application to the tobacco litigation. Health Serv Outcomes Res Methodol 2001; 2:169–188.

[18] Hirano K, Imbens GW. Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Serv Outcomes Res Methodol 2001; 2:259-278.

[19] Dehejia RH, Wahba S. Propensity score-matching methods for non-experimental causal studies. Rev Econ Stat 2002; 84:151-161.

[20] Braitman LE, Rosenbaum PR. Rare outcomes, common treatments: Analytic strategies using propensity scores. Ann Intern Med 2002; 137:693-695.

[21] Cepeda MS, Boston R, Farrar JT, Strom BL. Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003; 158:280-287.

[22] Weitzen S, Lapane KL, Toledano AY, Hume AL, Mor V. Principles for modeling propensity scores in medical research: a systematic literature review. Pharmacoepidemiol Drug Saf 2004; 13:841-853.

[23] Klungel OH, Martens EP, Psaty BM, Grobbee DE, Sullivan SD, Stricker BHC, Leufkens HGM, de Boer A. Methods to assess intended effects of drug treatment in observational studies are reviewed (Review). J Clin Epidemiol 2004; 57:1223-1231.

[24] Imai K, Van Dyk DA. Causal inference with general treatment regimes: Generalizing the propensity score. J Am Stat Assoc 2004; 99:854-866.

[25] Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Stat Med 2004; 23: 2937-2960.

[26] McCaffrey DF, Ridgeway G, Morral AR. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods 2004; 9: 403-425.

[27] Rubin DB. On principles for modeling propensity scores in medical research. Pharmacoepidemiol Drug Saf 2004;13:855–857.

[28] Weitzen S, Lapane KL, Toledano AY, Hume AL, Mor V. Weaknesses of goodness-of-fit tests for evaluating propensity score models: the case of omitted confounders. Pharmacoepidemiol Drug Saf 2005; 14:227–238.

[29] Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol 2005; 58:550-559.

[30] Luellen JK, Shadish WR, Clark MH. Propensity scores: An introduction and experimental test (Review). Eval Rev 2005; 29:530-558.

[31] Stürmer T, Schneeweiss S, Avorn J, Glynn RJ. Adjusting effect estimates for unmeasured confounding with validation data using propensity score calibration. Am J Epidemiol 2005; 162:279-289.

[32] Stürmer T, Schneeweiss S, Brookhart MA, Rothman KJ, Avorn J, Glynn RJ. Analytic strategies to adjust confounding using exposure propensity scores and disease risk scores: Non-steroidal anti-inflammatory drugs and short-term mortality in the elderly. Am J Epidemiol 2005; 161:891-898.

[33] Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol 2006; 163:1149.

[34] Stürmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol 2006; 59: 437-447.

[35] Glynn RJ, Schneeweiss S, Stürmer T. Indications for propensity scores and review of their use in pharmacoepidemiology. Basic Clin Pharmacol Toxicol 2006; 98:253-259.

[36] Austin PC, Mamdani MM. A comparison of propensity score methods: A case-study estimating the effectiveness of post-AMI statin use. Stat Med 2006; 25: 2084-2106.

[37] Kurth T, Walker AM, Glynn RJ, Chan KA, Gaziano JM, Berger K, Robins JM. Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. Am J Epidemiol 2006; 163: 262-270.

[38] Ho DE, Imai K, King G, Stuart EA. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 2007; 15:199-236.

[39] Rubin DB. The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized trials. Stat Med 2007; 26:20–36.

[40] D'Agostino Jr RB. Propensity scores in cardiovascular research (Review). Circulation 2007; 115: 2340-2343.

[41] Austin PC, Grootendorst P, Anderson GM. A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: A Monte Carlo study. Stat Med 2007; 26: 734-753.

[42] D'Agostino Jr RB, D'Agostino Sr RB. Estimating treatment effects using observational data. JAMA 2007; 297: 314-316.

[43] Austin PC. Goodness-of-fit diagnostics for the propensity score model when estimating treatment effects using covariate adjustment with the propensity score. Pharmacoepidemiol Drug Saf 2008; 17:1202-1217.

[44] Imai K, King G, Stuart E. Misunderstandings between experimentalists and observationalists about causal inference. J R Stat Soc Ser A 2008; 171(Part 2, Forthcoming):1–22.

[45] Martens EP, Pestman WR, de Boer A, Belitser SV, Klungel OH. Systematic differences in treatment effect estimates between propensity score methods and logistic regression. Int J Epidemiol 2008; 37:1142-1147.

[46] Austin PC. A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003. Stat Med 2008; 27: 2037-2049.

[47] Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 2009; 20:512-522.

[48] Rubin DB. Should observational studies be designed to allow lack of balance in covariate distributions across treatment groups? Stat Med 2009; 28:1420-1423.

[49] Austin PC. The relative ability of different propensity score methods to balance measured covariates between treated and untreated subjects in observational studies. Medical Decision Making 2009; 29: 661-677.

[50] Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med 2009; 28: 3083-3107.

[51] Harder VS, Stuart EA, Anthony JC. Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychol Methods 2010;15: 234-249.

[52] Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med 2010; 29:337-346.

[53] Stuart EA. Matching Methods for Causal Inference: A review and a look forward. Stat Science 2010; 25:1-21.

[54] Westreich D, Cole SR, Funk MJ, Brookhart MA, Stürmer T. The role of the c-statistic in variable selection for propensity score models. Pharmacoepidemiol Drug Saf 2011;20:317-320.

[55] Pearl J. Invited commentary: understanding bias amplification. Am J Epidemiol 2011; 174:1223-1227.

[56] Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res 2011;46:399-424.

[57] Myers JA, Rassen JA, Gagne JJ, Huybrechts KF, Schneeweiss S, Rothman KJ, Joffe MM, Glynn RJ. Effects of adjusting for instrumental variables on bias and precision of effect estimates. Am J Epidemiol 2011; 174:1213-1222.

[58] Rassen JA, Shelat AA, Myers J, Glynn RJ, Rothman KJ, Schneeweiss S. One-to-many propensity score matching in cohort studies. Pharmacoepidemiol Drug Saf 2012; 21: 69-80.

[59] Belitser SV, Martens EP, Pestman WR, Groenwold RHH, de Boer A, Klungel OH. Measuring balance and model selection in propensity score methods. Pharmacoepidemiol Drug Saf 2011; 20:1115-1129.

[60] Sekhon JS. Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. J Stat Softw 2011;42:1-52.

[61] Patrick AR, Schneeweiss S, Brookhart MA, Glynn RJ, Rothman KJ, Avorn J, Stürmer T. The implications of propensity score variable selection strategies in pharmacoepidemiology: an empirical illustration. Pharmacoepidemiol Drug Saf 2011; 20:551-559.

[62] Sekhon JS, Grieve RD. A matching method for improving covariate balance in cost-effectiveness analyses. Health Econ 2012;21:695-714.

[63] Rassen JA, Shelat AA, Myers J, Glynn RJ, Rothman KJ, Schneeweiss S. One-to-many propensity score matching in cohort studies. Pharmacoepidemiol Drug Saf 2012; 21:69-80.

[64] Vansteelandt S, Bekaert M, Claeskens G. On model selection and model misspecification in causal inference. Stat Methods Med Res 2012; 21:7–30.

APPENDIX B

R Code for Balance Metrics

1. The standardized difference (SDif) is defined as the absolute difference in means (or proportions) between the two treatment groups divided by an estimate of the common standard deviation of that variable in the two treatment groups, i.e. the pooled standard deviation.

R Code to calculate absolute standardized difference

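SDiff <- function(a, b, ldich = FALSE) {
  # a, b:  covariate values in the two treatment groups
  # ldich: TRUE for a binary (0/1) covariate, FALSE for a continuous one
  if (ldich) {  # binary covariate
    if (length(table(c(a, b))) == 1) {
      d <- 0  # covariate takes a single value, so the groups cannot differ
    } else {
      pc <- mean(a)
      pt <- mean(b)
      d <- abs((pt - pc) / sqrt((pt * (1 - pt) + pc * (1 - pc)) / 2))
    }
  } else {  # continuous covariate
    xc <- mean(a)
    xt <- mean(b)
    sc <- sd(a)
    st <- sd(b)
    d <- abs((xt - xc) / sqrt((st^2 + sc^2) / 2))
  }
  d
}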
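The listing above takes the two groups as separate vectors. As a minimal sketch of how the same pooled-standard-deviation definition might be applied to a data frame with a treatment indicator, the following self-contained example may help. The helper name abs_std_diff, the column names, and the simulated data are illustrative assumptions and are not part of the original appendix.

```r
# Hypothetical helper (not from the original appendix): absolute standardized
# difference for one covariate, given a 0/1 treatment indicator.
abs_std_diff <- function(x, treat, binary = FALSE) {
  x1 <- x[treat == 1]
  x0 <- x[treat == 0]
  diff <- abs(mean(x1) - mean(x0))
  if (diff == 0) return(0)  # identical means or proportions
  if (binary) {
    p1 <- mean(x1)
    p0 <- mean(x0)
    pooled <- sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
  } else {
    pooled <- sqrt((var(x1) + var(x0)) / 2)
  }
  diff / pooled
}

# Simulated data, for illustration only
set.seed(1)
n <- 200
dat <- data.frame(
  treat = rep(c(0, 1), each = n / 2),
  age   = c(rnorm(n / 2, 60, 10), rnorm(n / 2, 65, 10)),
  male  = c(rbinom(n / 2, 1, 0.4), rbinom(n / 2, 1, 0.6))
)

sd_age  <- abs_std_diff(dat$age, dat$treat)
sd_male <- abs_std_diff(dat$male, dat$treat, binary = TRUE)
print(c(age = sd_age, male = sd_male))
```

Computing the metric per covariate in this way makes it straightforward to tabulate balance before and after a propensity score adjustment.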

2. Calculating the OVL involves estimating the two density functions at the same x values and then calculating their overlap. The function ovl needs two input vectors of observations on the covariate, one for each group (group0 and group1). We used the R built-in function density with bandwidths from the normal reference rule bandwidth.nrd (MASS package). To calculate the overlap, we used Simpson's rule on a grid of 101 points. A plot of the two densities and the overlap is optional (plot=TRUE).

R Code to calculate OVL

library(MASS)  # provides bandwidth.nrd

ovl <- function(group0, group1, plot = FALSE) {
  # Bandwidths from the normal reference rule, one per group
  wd1 <- bandwidth.nrd(group1)
  wd0 <- bandwidth.nrd(group0)
  # Common evaluation range, extended slightly beyond the observed data
  from <- min(group1, group0) - 0.75 * mean(c(wd1, wd0))
  to <- max(group1, group0) + 0.75 * mean(c(wd1, wd0))
  # Both densities evaluated on the same grid of 101 points
  d1 <- density(group1, n = 101, width = wd1, from = from, to = to)
  d0 <- density(group0, n = 101, width = wd0, from = from, to = to)
  dmin <- pmin(d1$y, d0$y)
  # Simpson's rule applied to the pointwise minimum of the two densities
  n <- length(d1$x)
  ovl <- ((d1$x[n] - d1$x[1]) / (3 * (n - 1))) *
    (4 * sum(dmin[seq(2, n, by = 2)]) + 2 * sum(dmin[seq(3, n - 1, by = 2)]) +
       dmin[1] + dmin[n])
  if (plot) {
    maxy <- max(d0$y, d1$y)
    minx <- min(d0$x)
    plot(d1, type = "l", lty = 1, ylim = c(0, maxy), ylab = "Density", xlab = "")
    lines(d0, lty = 3)
    lines(d1$x, dmin, type = "h")
    text(minx, maxy, " OVL =")
    text(minx + 0.085 * (max(d1$x) - minx), maxy, round(ovl, 3))
  }
  round(ovl, 3)
}

# Example
treated <- rnorm(100, 10, 3)
untreated <- rnorm(100, 15, 5)
ovl(untreated, treated, plot = TRUE)
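Because the example draws from two known normal distributions, the population overlap can be obtained by direct numerical integration and used as a rough reference for the kernel-based estimate. The sketch below is only a cross-check under that assumption of normality. The function name true_ovl and the integration limits of 8 standard deviations are illustrative choices, not part of the original appendix.

```r
# Hypothetical cross-check: overlap of two normal densities by numerical
# integration of their pointwise minimum.
true_ovl <- function(m0, s0, m1, s1) {
  # Finite limits wide enough to cover essentially all of both densities
  lo <- min(m0 - 8 * s0, m1 - 8 * s1)
  hi <- max(m0 + 8 * s0, m1 + 8 * s1)
  integrate(function(x) pmin(dnorm(x, m0, s0), dnorm(x, m1, s1)), lo, hi)$value
}

# Population overlap for the distributions used in the example above
ovl_pop <- true_ovl(15, 5, 10, 3)
print(round(ovl_pop, 3))
```

With samples of 100 per group, the estimate returned by ovl will vary around this population value from one simulation to the next.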

3. The Kolmogorov–Smirnov Distance

R Code to calculate the Kolmogorov-Smirnov distance with optional figure

# using function 'ks2' used within function 'ks.gof'
ksdist <- function(group0, group1, plot = FALSE) {
  n0 <- length(group0)
  n1 <- length(group1)
  # Evaluate both empirical distribution functions at every observed value
  total <- sort(unique(c(group0, group1)))
  ma0 <- match(group0, total)
  ma1 <- match(group1, total)
  F0 <- cumsum(tabulate(ma0, length(total))) / n0
  F1 <- cumsum(tabulate(ma1, length(total))) / n1
  diff <- abs(F0 - F1)
  ks <- max(diff)
  if (plot) {
    x.ks <- which.max(diff)  # position of the largest difference
    plot(F1, type = "l", lty = 1, ylab = "Cumulative probability", xlab = "")
    lines(F0, lty = 3)
    lines(c(x.ks, x.ks), c(F0[x.ks], F1[x.ks]), lty = 2)
    text(0.08 * (n0 + n1), 1, "K-S distance =")
    text(0.20 * (n0 + n1), 1, ks)
  }
  ks
}

# Example
treated <- rnorm(100, 10, 3)
untreated <- rnorm(100, 15, 5)
ksdist(untreated, treated, plot = TRUE)
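The distance computed above is the largest absolute difference between the two empirical distribution functions, which is also the statistic reported by the two-sample test in base R. As a small sanity check, assuming continuous data without ties, the following sketch recomputes the distance with ecdf and compares it with ks.test. The variable names are illustrative.

```r
set.seed(42)
treated <- rnorm(100, 10, 3)
untreated <- rnorm(100, 15, 5)

# Largest gap between the two empirical distribution functions,
# evaluated at every observed value
grid <- sort(unique(c(treated, untreated)))
ks_manual <- max(abs(ecdf(untreated)(grid) - ecdf(treated)(grid)))

# Same quantity as reported by the built-in two-sample test
ks_builtin <- unname(ks.test(untreated, treated)$statistic)

print(c(manual = ks_manual, builtin = ks_builtin))
```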

4. The Lévy distance can be calculated using the following two functions mecdf and Levy.

R Code to calculate the Levy distance

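mecdf <- function(group0, group1) {
  # Minimum and maximum of F1 - F0, the difference between the two
  # empirical distribution functions, over all observed values
  n0 <- length(group0)
  n1 <- length(group1)
  total <- sort(unique(c(group0, group1)))
  ma0 <- match(group0, total)
  ma1 <- match(group1, total)
  F0 <- cumsum(tabulate(ma0, length(total))) / n0
  F1 <- cumsum(tabulate(ma1, length(total))) / n1
  c(min(F1 - F0), max(F1 - F0))
}

Levy <- function(u, v) {
  # f and g each change sign at the smallest shift s that satisfies one
  # side of the Lévy condition
  f <- function(s, u, v) mecdf(u, v - s)[1] + s
  g <- function(s, u, v) mecdf(u, v + s)[2] - s
  rng <- max(c(u, v)) - min(c(u, v))
  # uniroot needs lower and upper named in full, because they follow ...
  # in its argument list. The distance is non-negative, and f and g have
  # opposite signs at 0 and at rng, so the root is searched on [0, rng].
  z1 <- uniroot(f, lower = 0, upper = rng, tol = 0.00000001, u = u, v = v)
  z2 <- uniroot(g, lower = 0, upper = rng, tol = 0.00000001, u = u, v = v)
  max(z1$root, z2$root)
}

# Example
treated <- rnorm(100, 10, 3)
untreated <- rnorm(100, 15, 5)
mecdf(untreated, treated)
Levy(untreated, treated)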
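For readers who want an independent check of the root-finding approach, the sketch below works directly from the standard definition of the Lévy distance between two distribution functions F and G: the smallest e such that F(x - e) - e <= G(x) <= F(x + e) + e for all x. For empirical distribution functions it is enough to test these inequalities at the observed values, and the smallest admissible e can then be located by bisection on [0, 1]. The function name levy_bisect and the tolerance are illustrative assumptions, and this is not the method used in the listing above.

```r
# Hypothetical cross-check of the Levy distance by bisection
levy_bisect <- function(x, y, tol = 1e-8) {
  Fx <- ecdf(x)
  Fy <- ecdf(y)
  # TRUE when eps satisfies both sides of the Levy condition; checking at
  # the jump points of the step functions covers all x
  ok <- function(eps) {
    all(Fy(y) <= Fx(y + eps) + eps) && all(Fx(x) - eps <= Fy(x + eps))
  }
  if (ok(0)) return(0)
  lo <- 0  # not admissible
  hi <- 1  # always admissible, since the distance cannot exceed 1
  while (hi - lo > tol) {
    mid <- (lo + hi) / 2
    if (ok(mid)) hi <- mid else lo <- mid
  }
  hi
}

# Two-point samples shifted by 0.2: the distance equals the shift
d_shift <- levy_bisect(c(0, 1), c(0.2, 1.2))
print(d_shift)

# Simulated example matching the distributions used above
set.seed(7)
treated <- rnorm(100, 10, 3)
untreated <- rnorm(100, 15, 5)
d_levy <- levy_bisect(untreated, treated)
print(d_levy)
```

Unlike the Kolmogorov-Smirnov distance, the Lévy distance depends on the measurement scale of the covariate, because the shift e is applied on both the x axis and the probability axis.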

APPENDIX C

List of Articles Included in the Review

[1] Abdollah F, Sun M, Schmitges J, Thuret R, Tian Z, Shariat SF, et al. Competing-risks mortality after radiotherapy vs. observation for localized prostate cancer: a population-based study. International Journal of Radiation Oncology* Biology* Physics 2012; 84:95-103.

[2] Ad N, Henry L, Hunt S, Holmes SD. Do we increase the operative risk by adding the Cox Maze III procedure to aortic valve replacement and coronary artery bypass surgery? J Thorac Cardiovasc Surg 2012; 143:936-944.

[3] Ades L, Le Bras F, Sebert M, Kelaidi C, Lamy T, Dreyfus F, et al. Treatment with lenalidomide does not appear to increase the risk of progression in lower risk myelodysplastic syndromes with 5q deletion. A comparative analysis by the Groupe Francophone des Myelodysplasies. Haematologica 2012; 97:213-218.