Notes on Sample Selection

Comparison to Hershberger et al

Differences in sample selection.

2003 sample selection. The primary focus of the 2003 study was an as-treated, or “complete sessions sample,” comparison. Results of intent-to-treat analyses were presented in the text. Secondary analyses in the 2003 paper examined the impact of the intervention on subcategories of participants. Our analyses were intent-to-treat, and we did not replicate the subcategory analyses. Therefore, our results should be compared to the full-sample intent-to-treat results presented in the text of Hershberger et al. Based on their Figure 1, the intent-to-treat analyses compared participants with follow-ups in the control or VCT arm of the study (N=516; identified as “Standard Intervention” in the figure) to those in the enhanced Safety Counts condition (N=506; identified as “Enhanced Intervention”). Participants were not required to complete both VCT sessions to be eligible. In all, 1022 participants were followed out of a potential 1362 who attended at least one VCT session.

We attempted to duplicate the complete sessions analyses of Hershberger et al, but were unable to replicate their sample. That sample included only those who completed their assigned protocols (defined as two VCT sessions for controls, and two VCT plus 7 additional sessions for those in the enhanced intervention) and also completed the follow-up assessment. Individuals who did not meet these criteria were dropped from the analysis; i.e., non-compliers were eliminated. From Figure 1 we were able to identify the number of participants who completed their allocated intervention (614/675 controls, 225/687 enhanced), and we know how many participants provided follow-ups (516/675 controls, and 506/687 enhanced). We do not know how many of those completing their assigned protocol had follow-up assessments. In trying to reconstruct this sample from the data, we arrived at 710 participants. Table II of the Hershberger et al paper indicates n=726; Table III indicates n=768. We could not identify the source of these discrepancies.
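Although the exact overlap between protocol completers and those with follow-ups is unknown, the Figure 1 counts quoted above do bound it. The sketch below is only an arithmetic check under the assumption that both counts in each arm are subsets of the same randomized group; the function name is ours and is not taken from either paper.

```python
def overlap_bounds(n_arm, n_completed, n_followed):
    """Smallest and largest possible overlap of two subsets of one arm."""
    lower = max(0, n_completed + n_followed - n_arm)  # inclusion-exclusion
    upper = min(n_completed, n_followed)
    return lower, upper

# Counts quoted above from Figure 1 of Hershberger et al.
control = overlap_bounds(n_arm=675, n_completed=614, n_followed=516)
enhanced = overlap_bounds(n_arm=687, n_completed=225, n_followed=506)

total_lower = control[0] + enhanced[0]
total_upper = control[1] + enhanced[1]

print("control overlap between", control[0], "and", control[1])
print("enhanced overlap between", enhanced[0], "and", enhanced[1])
print("complete sessions sample between", total_lower, "and", total_upper)
```

Under these counts the complete sessions sample can be no smaller than 499 and no larger than 741. This constrains the size of the sample but does not identify which participants it contained.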

2009 sample selection. Our intent-to-treat analyses are based upon 1237 individuals who completed both sessions of the standard VCT. In the 2009 analyses, completion of both VCT sessions was a criterion for eligibility. The 2003 sample included some participants who had received only one session of VCT; the 1362 participants shown as randomized in Figure 1 included those who did not attend the second VCT session.

We eliminated 121 individuals from the study who reported being in jail 5 or more days out of the past 30 at either the baseline or follow-up assessment. For these individuals, both the baseline and follow-up assessments were excluded. We were interested in the impact of the intervention on drug and sexual risk behaviors, and incarceration limits individuals’ autonomy in engaging in these behaviors. Substance use measures during periods of incarceration might be artificially low; sexual risk-taking might be artificially high if condoms are unavailable. In addition, those who were jailed may have been exposed to substance abuse treatment. Eligibility criteria included being out of treatment; we considered requiring that individuals be out of jail an extension of that criterion. The excluded individuals were evenly split between the control and enhanced arms of the study, and their background characteristics did not differ by study arm.

With these exclusions, the 2009 sample comprised 1116 participants, 875 of whom completed follow-up assessments. In the longitudinal models, baseline observations are included for the full sample of 1116 even though not all received follow-up assessments. Including these participants yields more accurate estimates of baseline risk and adjusts the estimated population mean at follow-up to refer to all 1116 rather than only the 875, decreasing the potential for bias.
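The data layout this implies can be sketched as a "long" file with one row per assessment, in which a dropout contributes a baseline row only. The field names and toy values below are hypothetical and serve only to illustrate the structure; they are not the study data.

```python
def to_long(participants):
    """Turn one record per participant into one row per completed assessment."""
    rows = []
    for p in participants:
        # Every participant contributes a baseline row (time = 0).
        rows.append({"id": p["id"], "arm": p["arm"], "time": 0, "y": p["baseline"]})
        # Only participants with a follow-up contribute a second row (time = 1).
        if p["followup"] is not None:
            rows.append({"id": p["id"], "arm": p["arm"], "time": 1, "y": p["followup"]})
    return rows

# Toy records (hypothetical values): participant 2 was lost to follow-up.
toy = [
    {"id": 1, "arm": "control", "baseline": 12, "followup": 9},
    {"id": 2, "arm": "enhanced", "baseline": 20, "followup": None},
    {"id": 3, "arm": "enhanced", "baseline": 7, "followup": 2},
]

long_rows = to_long(toy)
n_participants = len({r["id"] for r in long_rows})
n_followups = sum(1 for r in long_rows if r["time"] == 1)
print(n_participants, "participants,", n_followups, "follow-ups,", len(long_rows), "rows")
```

A mixed effects model fitted to rows like these uses every baseline observation, which is why the analysis sample is reported as 1116 rather than 875.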

Because there were significant differences between completers (who had both baseline and follow-up assessments) and dropouts (who had only the baseline assessment), it is important to include the dropouts in the analysis. Fortunately, dropouts did not differ significantly by study condition, which mitigates some bias both in our analysis and in two-way ANOVA or paired t-test type analyses. Still, dropout may introduce small amounts of bias that do not rise to the level of statistical significance. In such cases, it is desirable to include all participants in the analyses, so that we can control for any selection bias in who was lost to follow-up. Our approach adjusts for the entire set of observed data values, whereas the other approaches are satisfactory only if the bias in the two groups cancels out exactly. Since our longitudinal models are estimated using data from both completers and dropouts, we report a sample of 1116, for whom we provide descriptive statistics in Table II, and we indicate that 875 of them provided timely follow-ups.

Differences in the definition of outcome measures, selection and application of statistical methods.

Since we could not match the complete sessions sample, for which descriptive statistics are provided in the 2003 paper, it was not possible to match those statistics, and thus check the comparability of variable definitions. It is possible that some differences could arise from use of different criteria. For example, in the Risk Behavior Assessment (RBA) and Risk Behavior Follow-up Assessment (RBFA), information about using unclean needles/syringes comes from a series of questions ascertaining the number of injections, number of times dirty works had been used, number of times they were used without cleaning, cleaned with tap water, cleaned with bleach and water, cleaned with alcohol and water, cleaned with boiling water, or cleaned some other way. In calculating number of times used dirty works, we defined all injections with used works as dirty unless they had been cleaned with bleach; the 2003 paper may have defined this differently. However, we do not know if there are differences, as there is not sufficient detail in the Hershberger et al paper.
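The counting rule we applied can be sketched as follows. The dictionary keys are hypothetical stand-ins for the RBA/RBFA items listed above, not the instruments' actual variable names, and summing the non-bleach categories is one plausible way to operationalize the rule.

```python
# Hypothetical item names for the cleaning categories that count as "dirty".
# Bleach-and-water cleaning is deliberately left out of this tuple.
NON_BLEACH_FIELDS = (
    "not_cleaned",
    "tap_water",
    "alcohol_water",
    "boiling_water",
    "other_method",
)

def times_used_dirty_works(responses):
    """Count injections with used works that were not cleaned with bleach."""
    return sum(responses.get(field, 0) for field in NON_BLEACH_FIELDS)

# Toy responses (hypothetical values) for one participant.
example = {
    "not_cleaned": 4,
    "tap_water": 3,
    "bleach_water": 5,   # treated as clean under this definition
    "alcohol_water": 1,
    "boiling_water": 0,
    "other_method": 2,
}

dirty_count = times_used_dirty_works(example)
print(dirty_count)
```

A stricter or looser definition (for example, also treating boiling water as adequate cleaning) would change only the tuple of fields, which is the kind of unreported choice that could produce differences between the two papers.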

Information about unprotected sex in the RBA and RBFA comes from a series of questions about 28 specific sex act-by-gender combinations. We looked only at condom use associated with the 12 sex act/gender combinations involving penile-vaginal or penile-anal contact. We are unsure how the 2003 paper defined these measures.

Table 1 summarizes the definition of outcomes in the 2003 and the 2009 articles. The 2009 article does not look at drug treatment, use of barriers, multiple partners, exchanging sex, or sex with IDU. The 2009 article includes urinalysis results, times using alcohol or marijuana, and times using other drugs. Where there was overlap, we used different specifications of outcome measures and different statistical methods. Stopping drug use was an important outcome measure. We performed logistic regressions on the subsets of participants who were using each substance at baseline. This is in contrast to the repeated measures log-linear models used in the 2003 article. Both papers, using different analytic strategies, found significant intervention effects for stopping injection use and did not find significant intervention effects for crack use.

The 2009 paper uses counts of risky behaviors (e.g., times injected with dirty works, number of unprotected sex acts) rather than percents (percent of injections with dirty works, percent of times used a condom). The count measures more accurately capture the individual’s true risk. In the case of condom use, for example, percent data do not reflect the frequency of sexual activity. Person A may have sex twice and use a condom once (50% use, one unprotected act). Person B may have sex 60 times and use a condom 30 times (50% use, 30 unprotected acts). Person B is clearly at greater risk, which is captured by the count measure but not by the percent measure.
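The Person A and Person B comparison can be checked with a few lines of code. The function name is ours, introduced only for illustration.

```python
def condom_measures(sex_acts, condom_acts):
    """Return (percent of acts with a condom, number of unprotected acts)."""
    unprotected = sex_acts - condom_acts
    # The percent is undefined for someone with no sex acts in the period.
    percent = 100.0 * condom_acts / sex_acts if sex_acts else None
    return percent, unprotected

person_a = condom_measures(sex_acts=2, condom_acts=1)
person_b = condom_measures(sex_acts=60, condom_acts=30)

print("Person A:", person_a)
print("Person B:", person_b)
```

Both people have the same percent measure, while the count measure separates them (1 versus 30 unprotected acts). The sketch also shows that the percent measure is undefined for participants who were not sexually active, whereas the count is simply zero.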

The 2009 paper did not simply assume a Poisson distribution for count measures. Distributions of all variables were plotted, and the appropriate models were chosen on the basis of these plots. All count measures were found to follow a Poisson distribution.
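A simple numeric complement to plotting is the variance-to-mean ratio, which is close to 1 for Poisson data. This is not a step described for the 2009 paper; it is a minimal sketch of one common check, and the counts below are toy values.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; near 1 under a Poisson model."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m

# Toy counts (hypothetical), e.g. times injected in the past 30 days.
toy_counts = [0, 1, 2, 3, 4]
print(round(dispersion_index(toy_counts), 2))
```

Ratios well above 1 would point to overdispersion, in which case the plots would deserve a closer look before a Poisson model is adopted.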

Even if one were to use percent measures, we disagree with the choice of two-way ANOVA, which assumes normality. The percent measures have a U-shaped distribution, with peaks near zero and one, and are far from normally distributed. We also disagree with the choice of two-way ANOVA for the analysis of times injected, as was done in the 2003 paper. As mentioned, we plotted this variable and its distribution followed a classic Poisson shape.

Recognizing the appropriate underlying distribution can have a profound effect on the significance of estimates. We evaluated the impact of varying distributional assumptions in a sensitivity analysis, using the intent-to-treat sample that we believe was used in the 2003 paper (n=1362, of whom 1022 have follow-ups) and the times-injected outcome measure. When we performed a two-way ANOVA (SAS PROC GLM) using the baseline and follow-up data for the 1022 completers, the intervention effect was not significant (p=0.44). When we performed a longitudinal random effects model assuming normality (PROC MIXED) using baseline data for all 1362 and follow-up data for the 1022 completers, the intervention effect was still not significant (p=0.49). However, when we performed a random effects model assuming a Poisson distribution (PROC GLIMMIX), with 1362 baseline observations and 1022 follow-ups, the intervention effect was highly significant (p<0.001). Moving to the appropriate analysis method can clearly have a very strong effect on the significance of findings.
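For orientation, the two random effects models being contrasted can be written in a conventional random-intercept form. The exact specification is not given in this note, so the form below is an assumption: y_{it} is times injected for participant i at time t (0 = baseline, 1 = follow-up), a_i is an indicator for the enhanced arm, and b_i is a participant-level random intercept.

```latex
% Normal random-intercept model (assumed form of the PROC MIXED analysis)
\begin{align*}
y_{it} &= \beta_0 + \beta_1 t + \beta_2 a_i + \beta_3 (a_i \times t) + b_i + \varepsilon_{it},
  & b_i &\sim N(0, \sigma_b^2), \quad \varepsilon_{it} \sim N(0, \sigma^2)
\end{align*}

% Poisson random-intercept model with log link (assumed form of the PROC GLIMMIX analysis)
\begin{align*}
y_{it} \mid b_i &\sim \mathrm{Poisson}(\mu_{it}),
  & \log \mu_{it} &= \beta_0 + \beta_1 t + \beta_2 a_i + \beta_3 (a_i \times t) + b_i,
  \quad b_i \sim N(0, \sigma_b^2)
\end{align*}
```

In both forms the intervention effect is the arm-by-time coefficient \beta_3. In the normal model it is a difference in mean change on the count scale; in the Poisson model, \exp(\beta_3) is the ratio of the follow-up-to-baseline rate ratios between arms, so the two analyses test the effect on different scales.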

Table 1. A comparison of outcome measures from the 2003 and 2009 papers.

Category / Hershberger et al paper / Current paper
Injection drug use / Injected drugs: repeated measures log-linear model; Times injected: two-way ANOVA; Percentage of times did not use own works: two-way ANOVA; Percentage of times used unclean needles: two-way ANOVA / Stopped injecting (among baseline IDU): logistic regression; Times injected: mixed effects model, Poisson distributed outcome; Times injected with used, uncleaned works: mixed effects model, Poisson distributed outcome; Negative urine test for opiates: mixed effects model, binary outcome
Crack use / Used crack: repeated measures log-linear model / Stopped using crack (among baseline crack users): logistic regression; Times used crack: mixed effects model, Poisson distributed outcome; Negative urine test for cocaine: mixed effects model, binary outcome
Other substance use / (none) / Times used alcohol/marijuana: mixed effects model, Poisson distributed outcome; Times used other drugs: mixed effects model, Poisson distributed outcome
Sexual behavior / Had sex: repeated measures log-linear model; Percentage of times used condoms (among sexually active): two-way ANOVA; Always used condoms (among sexually active): repeated measures log-linear model; Two or more sex partners (among sexually active): repeated measures log-linear model / Number of sex acts: mixed effects model, Poisson distributed outcome; Number of unprotected sex acts: mixed effects model, Poisson distributed outcome; Abstinent or 100% condom use: mixed effects model, binary outcome