Supplementary methods for autism prediction using the 30-SNP set from Skafidas et al. (2012)

1) Examining the association between each of the 30 most predictive Skafidas SNPs and ASDs in the Psychiatric Genomics Consortium meta-analysis

Sample:

The data are from trio-based samples in the Psychiatric Genomics Consortium (PGC) Autism group. Cases and their respective pseudo-controls (constructed from the untransmitted alleles) were used in the replication analysis [1]. All PGC autism cases have an autism spectrum disorder diagnosis. The samples come from ten separate cohorts, pruned for overlapping subjects and ethnic outliers. The resultant sample consists of unrelated individuals of European descent, and the use of pseudo-controls eliminates the need to include ancestry covariates.

Total analytic sample size (across 10 cohorts): 5,417 Cases / 5,417 Pseudo-controls

Analyses:

Using data from each of the ten PGC cohorts, we conducted a meta-analysis of the 30 most significant SNPs identified in the Skafidas et al. report (presented in Table 2 of Skafidas et al.) [2]. Not all of the SNPs were directly genotyped within every PGC cohort, so missing SNPs were imputed with IMPUTE2 and genotype probabilities were used in the analysis. The number of imputed SNPs varied by cohort (between 1 and 25). With the exception of one SNP (rs7313997), which had an info score of 0.57 in one cohort, all imputed SNPs had info scores above 0.8 in every cohort. Further information regarding genotyping and imputation in PGC autism can be requested from the corresponding authors.

The single-SNP analyses involved an allelic association test of SNP and case/pseudo-control status in every cohort, followed by a meta-analytic estimate of the overall effect (see Ripke et al., 2011 [3] for further detail on meta-analytic approaches within the PGC, though note that those analyses do not employ trio data). The Supplementary Figures at the end of this document present the meta-analytic results, along with the direction and size of the effect in each cohort. None of the 30 SNPs reached even nominal significance in the meta-analysis after correction for multiple testing, nor did any show a consistent direction of association across cohorts.
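
For concreteness, the following is a minimal sketch in R of a fixed-effects, inverse-variance-weighted meta-analysis for a single SNP; the per-cohort effect sizes and standard errors are hypothetical placeholders, not PGC results.

    # Hypothetical per-cohort allelic log odds ratios and standard errors
    beta <- c(0.05, -0.02, 0.10, -0.07)
    se   <- c(0.08,  0.06, 0.09,  0.07)

    w         <- 1 / se^2               # inverse-variance weights
    beta_meta <- sum(w * beta) / sum(w) # pooled log odds ratio
    se_meta   <- sqrt(1 / sum(w))       # pooled standard error
    z         <- beta_meta / se_meta
    p_meta    <- 2 * pnorm(-abs(z))     # two-sided p-value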

2) Examining the Skafidas et al. classifier using their 30 most predictive SNPs

Building the classifier:

We were not provided with a comprehensive list of the 237 SNPs used to build the classifier in Skafidas et al. Instead, we examined only the performance of the 30 SNPs identified in Skafidas et al. as most predictive and listed in Table 1 of our letter. The authors claim that these 30 SNPs explain more than 58% of the predictive variance of their classifier. Using the same cases and controls described earlier, we examined the relationship between this published subset of the Skafidas classifier and ASDs in the PGC. Using the SNPs that were provided, we attempted to assess the validity of the classifier in two ways. First, we used the weights given in Table 2 of Skafidas et al. Second, we created training and testing samples of the same sizes used by Skafidas et al. and developed a classifier using empirically derived weights that maximized the classifier’s performance in a PGC training sample. To be consistent with the original report, both approaches used allele counts coded as 0, 1, or 3, with zero being two copies of the major allele and three being two copies of the minor allele. Minor allele designations were derived from the ten European cohorts used in the replication. Also consistent with the approach of Skafidas et al., cases were given a score of 10 and their pseudo-controls a score of -10 (in lieu of the 0/1 designations typically used in control/case assignments).
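
As an illustration of this coding and scoring scheme, the following is a minimal sketch in R; the genotype matrix and weights are hypothetical stand-ins, not values from Skafidas et al.

    # Minor-allele counts (0/1/2) for three hypothetical subjects at three SNPs
    geno <- matrix(c(0, 1, 2,
                     2, 0, 1,
                     1, 1, 0), nrow = 3, byrow = TRUE)
    wts  <- c(0.8, -1.2, 0.5)           # hypothetical per-SNP weights

    coded <- geno
    coded[geno == 2] <- 3               # recode two minor-allele copies as 3
    scores <- as.vector(coded %*% wts)  # one classifier score per subject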

Testing the classifier built with Skafidas et al. weights:

To examine the classifier built with the Skafidas et al. weights, we used a subset of the full PGC Autism cohort (4,623 cases / 4,623 pseudo-controls) as a replication set. The Skafidas weights produced negative scores on average (Supplementary Table 1 and Supplementary Figure 1), as the observed MAF was generally higher for SNPs with negative weights. In addition, the use of only the 30 most extreme weights among the complete 237 likely led to an over-dispersion of scores across our replication sample (Supplementary Figure 2). Because we did not use the majority of SNPs in the Skafidas classifier and did not know the intercept used in their model, we did not use their cutoff of 3.93, but rather used the maximum-accuracy point along the ROC curve in our replication sample (-1.9; implemented with the ROCR package in R [4]).
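
The cutoff selection can be sketched as follows; this is a minimal example with simulated scores and labels, assuming only the ROCR package named above.

    library(ROCR)

    set.seed(1)
    scores <- rnorm(200)                 # hypothetical classifier scores
    labels <- rep(c(1, 0), each = 100)   # 1 = case, 0 = pseudo-control

    pred <- prediction(scores, labels)
    acc  <- performance(pred, measure = "acc")   # accuracy at every cutoff
    cutoff <- acc@x.values[[1]][which.max(acc@y.values[[1]])]
    auc    <- performance(pred, measure = "auc")@y.values[[1]]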

Supplementary Table 1.

Replication using weights from Skafidas et al. (4,623 cases; 4,623 pseudo-controls)
Area Under the Curve (AUC) / 0.505
Maximized cutoff / -1.9
Mean prediction score of cases / -3.99
Mean prediction score of pseudo-controls / -4.08
T-test p-value of mean scores / 0.46
Positive predictive value / 0.38

The diagnostic classifier did not perform significantly better than the chance AUC value of 0.5 (AUC = 0.505, p = 0.22), and the mean difference between case and pseudo-control prediction scores was also non-significant (p = 0.46). The skew of the diagnostic scores towards negative values led to a positive predictive value well below chance (0.38).
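
One common way to obtain a p-value for an AUC against the chance value of 0.5 exploits the equivalence between the AUC and the Mann-Whitney statistic; the sketch below illustrates that approach with simulated scores and is an assumption for illustration, not a statement of the exact test used here.

    set.seed(2)
    case_scores <- rnorm(100)   # hypothetical case scores
    ctrl_scores <- rnorm(100)   # hypothetical pseudo-control scores

    # AUC equals P(case score > control score), the Mann-Whitney statistic,
    # so a rank-sum test of the two score distributions tests AUC = 0.5
    wilcox.test(case_scores, ctrl_scores)$p.value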

Supplementary Figure 1.

Testing the classifier built with empirically-derived weights:

To examine the maximum performance of the 30-SNP ASD classifier built with PGC (empirically derived) weights, we randomly sampled a training set of 732 cases and a non-overlapping testing set of 243 cases, consistent with the numbers of cases used by Skafidas et al. Differing from the original report, we employed an equal number of pseudo-controls (732 and 243) at each stage in order to balance ancestry in the context of our trio data. As such, we used a larger number of controls than Skafidas et al., which should improve the power of the test. To maximize the information available from the full PGC dataset, we ran 100 iterations of the prediction analysis (100 different training sets and 100 different testing sets), using a bootstrap sampling procedure to randomly select the training and testing subsets used in a given iteration.
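
A minimal sketch of one such training/testing draw in R follows, assuming a hypothetical vector of case indices; in the trio data, the matched pseudo-controls follow the same indices as their cases.

    case_ids <- seq_len(4623)              # hypothetical case indices

    train_cases <- sample(case_ids, 732)   # 732 training cases
    test_cases  <- sample(setdiff(case_ids, train_cases), 243)  # disjoint 243 test cases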

For each iteration, we fit a fixed-effects linear model with all 30 SNPs to the training set, using the -10/10 outcomes and 0, 1, 3 allele counts described above. The resulting beta coefficients were used as weights for each SNP, and the intercept was retained to be consistent with Skafidas et al. To create a diagnostic classifier based on these scores, we determined the cutoff score at which training-set case/control status was best predicted along a ROC curve evaluating all possible cutoff scores. If multiple scores were flagged, the median of the selected scores was used as the cutoff. We estimated the AUC of the classifier in the training set to see how well the classifier predicted case/control status (Supplementary Table 2). As noted above, an AUC of 0.5 is expected by chance.
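
A minimal sketch of the training step under these conventions, using simulated genotypes and outcomes as placeholders:

    set.seed(3)
    n     <- 1464                          # 732 cases + 732 pseudo-controls
    coded <- matrix(sample(c(0, 1, 3), n * 30, replace = TRUE), nrow = n)
    y     <- rep(c(10, -10), each = n / 2) # case / pseudo-control outcomes

    fit       <- lm(y ~ coded)             # fixed-effects linear model
    intercept <- coef(fit)[1]              # retained, as in Skafidas et al.
    wts       <- coef(fit)[-1]             # per-SNP weights
    train_scores <- intercept + as.vector(coded %*% wts)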

Supplementary Table 2.

Training Sample (732 cases / 732 pseudo-controls) / Mean (SD) across 100 iterations
AUC / 0.57 (0.01)
Maximized cutoff / -0.11 (0.04)
Test Sample (243 cases / 243 pseudo-controls) / Mean (SD) across 100 iterations
AUC / 0.51 (0.02)
Mean prediction score of cases / 0.02 (0.10)
Mean prediction score of pseudo-controls / -0.04 (0.09)
T-test p-value of mean scores / 0.50 (0.29)
Positive predictive value / 0.54 (0.04)

For each test set, we applied the empirically derived classifier from the training set to examine the maximum prediction accuracy of the 30 SNPs in the PGC. We estimated the AUC based on the diagnostic cutoff score, as well as measuring the mean difference in prediction scores between test cases and controls (Supplementary Table 2). Overall, the diagnostic classifier performed no better than chance (mean AUC = 0.51, mean p = 0.37), and the mean difference between test case prediction scores and pseudo-control prediction scores was also non-significant (mean p = 0.50). As a comparison to the corresponding figure presented by Skafidas et al., Supplementary Figure 2 shows the second iteration of the diagnostic classifier (black line) in the training and test sets.
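
The per-iteration test-set evaluation can be sketched as follows, assuming hypothetical test scores, labels, and a training-set cutoff from the previous step:

    library(ROCR)

    set.seed(4)
    test_scores <- rnorm(486)                # hypothetical test-set scores
    test_labels <- rep(c(1, 0), each = 243)  # 1 = case, 0 = pseudo-control
    cutoff      <- -0.11                     # hypothetical training-set cutoff

    auc  <- performance(prediction(test_scores, test_labels), "auc")@y.values[[1]]
    pval <- t.test(test_scores[test_labels == 1],
                   test_scores[test_labels == 0])$p.value
    pred_case <- test_scores > cutoff
    ppv <- mean(test_labels[pred_case] == 1)  # positive predictive value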

Supplementary Figure 2.

3) Population case/control prediction classifier for the 30 Skafidas SNPs in a subset of the Psychiatric Genomics Consortium sample

Sample:

In addition to the case/pseudo-control analyses presented above, we also conducted a distinct case/control analysis among a subset of the PGC Autism cohort (Simons Simplex Collection [n=775]; Autism Genome Project [n=926]; Children’s Hospital of Philadelphia [n=701]) and additional independent controls from the Children’s Hospital of Philadelphia (n=1,847) and the Michigan Bipolar dataset (n=768). All of the data sets were independently genotyped. To ensure high data quality, the datasets were subjected to stringent per-SNP and per-sample quality control via PLINK [5]. Each dataset was filtered independently to remove SNPs with a minor allele frequency <1%, SNP call rates <99%, or Hardy-Weinberg equilibrium (HWE) p-values below 1 × 10^-6, and individuals with a genotype missingness rate >1%. After merging the datasets, we removed duplicates and first-degree relatives based on genome-wide genetic similarity (π_hat > 0.20).

A total of 435,778 genotyped SNPs for 5,420 individuals (2,681 cases and 2,739 controls) passed all QC steps. Imputation of the markers missing from the list of 30 SNPs reported in Skafidas et al. (n=5) was performed using IMPUTE2 (v2.1.2) [6], by imputing a 2 Mb region around each missing marker using the recommended parameters (k = 80, buffer size 500 kb, Ne = 14,000). For the phased haploid reference panel, we used HapMap 3 (release 2, February 2009) samples (n = 1,011) [7] as provided on the IMPUTE2 website. After imputation, all five markers passed the quality control thresholds of an individual call posterior probability >0.9 and an info measure I(A) >0.6.
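
As an illustration of the individual-call threshold, the following is a minimal sketch applied to hypothetical IMPUTE2-style genotype probabilities (the info measure itself is computed by IMPUTE2 and is not re-derived here):

    # Each row holds one subject's posterior P(AA), P(AB), P(BB) at a marker
    probs <- matrix(c(0.97, 0.02, 0.01,
                      0.40, 0.35, 0.25), ncol = 3, byrow = TRUE)

    best <- apply(probs, 1, max)   # best-guess posterior per subject
    keep <- best > 0.9             # the >0.9 call threshold used above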

Original sample size: 2,402 Cases / 2,615 Controls

Sample size with no missing alleles: 1,994 Cases / 2,577 Controls

Analyses:

Genotypes for 25 of the 30 SNPs in Table 2 of Skafidas et al. (2012) were directly available in the current sample. The remaining 5 SNPs were imputed with IMPUTE2 and genotype probabilities were used in the analysis. Analyses were conducted in the same manner as the case/pseudo-control analyses, using both the classifier built with weights from Skafidas et al. (Supplementary Table 3) and classifiers built with weights empirically derived from the data (Supplementary Table 4).

Supplementary Table 3.

Case-control replication using weights from Skafidas et al. (1,994 cases; 2,577 controls)
Area Under the Curve (AUC) / 0.503
Maximized cutoff / 0.77
Mean prediction score of cases / 2.32
Mean prediction score of controls / 2.32
T-test p-value of mean scores / 0.90
Positive predictive value / 0.78

When using the classifier built with weights from Skafidas et al., the results generally corroborate the earlier case/pseudo-control analysis, with the classifier performing little better than chance (AUC = 0.503, p = 0.35). We do observe differences in the mean prediction scores relative to the case/pseudo-control sample, which are likely due to very common SNPs (allele frequency near 0.5) switching which allele is called minor between samples. This subsequently affects the downstream 0, 1, 3 coding of SNPs and can substantially affect the overall weighting scheme. Unfortunately, the minor allele designations used in the original paper were unavailable, so we could not be certain which allele was treated as minor for any given SNP, particularly when the MAF is near 0.5.
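
A minimal sketch of this flipping issue, using a hypothetical SNP and weight:

    g_minorA <- c(0, 1, 2)     # minor-allele counts if allele A is called minor
    g_minorB <- 2 - g_minorA   # counts if allele B is called minor instead

    code <- function(g) ifelse(g == 2, 3, g)   # the 0/1/3 recoding used above
    w <- 0.8                                   # hypothetical SNP weight

    code(g_minorA) * w   # 0.0 0.8 2.4
    code(g_minorB) * w   # 2.4 0.8 0.0  (score contributions reverse)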

A recent report by Belgard et al. (2013) suggested that population stratification may be driving the autism genetic classifier in Skafidas et al., so we calculated the first 10 principal components from a multidimensional scaling analysis in PLINK to include as covariates in the analysis. The empirically derived weights were tested with and without the first 10 principal components to gauge their effect on the prediction classifier, and the effects of the first two principal components are provided at the bottom of Supplementary Table 4.
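
A minimal sketch of the covariate-adjusted training model, with simulated genotypes and outcomes, and a hypothetical n x 10 matrix of components standing in for the PLINK MDS output:

    set.seed(5)
    n     <- 1464
    coded <- matrix(sample(c(0, 1, 3), n * 30, replace = TRUE), nrow = n)
    y     <- rep(c(10, -10), each = n / 2)
    pcs   <- matrix(rnorm(n * 10), nrow = n)  # stand-in for PLINK MDS components

    fit_adj <- lm(y ~ coded + pcs)            # SNP weights conditional on ancestry
    summary(fit_adj)$coefficients["pcs1", ]   # effect of the 1st component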

Supplementary Table 4.

Case-control replication testing the classifier built with empirically derived weights:

A) BEFORE controlling for the first 10 principal components

Training Sample (732 cases / 732 controls) / Mean (SD) across 100 iterations
AUC / 0.59 (0.01)
Maximized cutoff / -0.59 (0.40)
Test Sample (243 cases / 243 controls) / Mean (SD) across 100 iterations
AUC / 0.55 (0.03)
Mean prediction score of cases / 0.22 (0.15)
Mean prediction score of controls / -0.19 (0.10)
T-test p-value of mean scores / 0.09 (0.18)
Positive predictive value / 0.62 (0.11)

B) AFTER controlling for the first 10 principal components

Training Sample (732 cases / 732 controls) / Mean (SD) across 100 iterations
AUC / 0.57 (0.01)
Maximized cutoff / -0.23 (0.40)
Test Sample (243 cases / 243 controls) / Mean (SD) across 100 iterations
AUC / 0.51 (0.03)
Mean prediction score of cases / 0.02 (0.11)
Mean prediction score of controls / 0.00 (0.10)
T-test p-value of mean scores / 0.44 (0.30)
Positive predictive value / 0.57 (0.12)
Effect of the first 2 principal components / Mean p-value (SD) across 100 iterations
1st Principal Component / 8e-8 (3e-7)
2nd Principal Component / 0.19 (0.15)

For the empirically derived weights, we saw an appreciable increase in prediction accuracy (mean AUC = 0.55, mean p = 0.12), and the mean difference between test case and control prediction scores trended towards significance (mean p = 0.09). However, the inclusion of the first 10 principal components removed this signal, dropping the prediction accuracy to chance levels (mean AUC = 0.51, mean p = 0.44). Consistent with the report from Belgard et al., this appears to be due to differences in European ancestry tagged by the SNP set, as the 1st principal component is highly significant in discriminating cases from controls (mean p = 8e-8). The remaining 9 principal components did not show a significant association with case/control status (data not shown).

4) Examining the pathways identified by Skafidas et al.

The 237 SNPs that Skafidas et al. included in their classifier were selected from a pathway analysis. Eighteen of the 23 KEGG pathways identified by Skafidas et al. as most significant were examined in the PGC network and pathway analysis group (PGC-NPA) analyses (n = 4,949 cases / 5,314 controls). The PGC-NPA used five published pathway analysis methods (FORGE, INRICH, MAGENTA, Set Screen, and ALIGATOR) to examine each of the 18 pathways [8-10]. None of the pathways highlighted in the Skafidas et al. paper were statistically significant after taking multiple testing into account. The most significant result from these approaches was a p-value of 0.006, which does not survive correction for multiple testing.
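
To make that correction explicit, a simple Bonferroni calculation (shown to illustrate the logic, not necessarily the exact correction applied) across the 18 tested pathways already pushes the best observed p-value past conventional significance:

    p_best <- 0.006
    p_best * 18        # 0.108: Bonferroni across the 18 pathways
    p_best * 18 * 5    # 0.54:  across pathways and the 5 methods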