Supplementary Information
SNPs which failed any of the three criteria below were filtered out.
- The call rate was <90%
- QC criteria related to SNP performance
- More than 2 discordant calls over duplicate samples
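The three exclusion rules above can be sketched as a simple per-SNP filter. This is a minimal illustration rather than the pipeline actually used: the record layout, the field names and the example SNPs are hypothetical, and the SNP-performance QC is represented only by a pass/fail flag because its details are not given here.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Hypothetical per-SNP QC summary (field names are illustrative)."""
    snp_id: str
    n_called: int                  # samples with a non-missing genotype call
    n_samples: int                 # total samples genotyped
    passes_performance_qc: bool    # outcome of the SNP-performance QC (details not specified)
    n_discordant_duplicates: int   # discordant calls over duplicate samples


def passes_filters(snp, min_call_rate=0.90, max_discordant=2):
    """Keep a SNP only if it fails none of the three criteria."""
    call_rate = snp.n_called / snp.n_samples
    return (
        call_rate >= min_call_rate                        # criterion 1: call rate < 90% fails
        and snp.passes_performance_qc                     # criterion 2: SNP-performance QC
        and snp.n_discordant_duplicates <= max_discordant # criterion 3: more than 2 discordant calls fails
    )


# Hypothetical example records
snps = [
    SnpQc("snp_a", 980, 1000, True, 0),
    SnpQc("snp_b", 880, 1000, True, 0),    # call rate 88%
    SnpQc("snp_c", 990, 1000, True, 3),    # 3 discordant duplicate calls
    SnpQc("snp_d", 950, 1000, False, 1),   # fails performance QC
]
kept = [s.snp_id for s in snps if passes_filters(s)]
```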
Table S-1 Covariate coding and significance test results
Covariate / Coding / Estimate of the Effect / P value
Affection Status / 0: Control 1: Case / NA / NA
Gender / 1: Male 2: Female / -1.06E+0 / < 0.001
UCIrvine / 0: Not UC Irvine 1: UC Irvine / 7.45E-1 / 0.002
UT / 0: Not Texas 1: Texas / -4.79E+0 / < 0.001
DM_Duration / continuous / 6.83E-2 / < 0.001
Age / continuous / 2.55E-2 / 0.005
PC1 / continuous / 5.64E-5 / 0.067
PC2 / continuous / -3.87E-4 / 0.009
PC3 / continuous / -3.96E-5 / 0.789
PC4 / continuous / -3.04E-4 / 0.050
ACEi / 0: not on ACEi medication 1: on medication / -9.20E-2 / 0.596
ARB / 0: not on ARB medication 1: on medication / 1.19E+0 / < 0.001
Principal Component Analysis for Population Stratification
Principal components were computed from control samples and then applied to the case and HapMap samples. Only HapMap SNPs that have minor allele frequency > 0.01 in the HapMap populations and are not in linkage disequilibrium with any other SNP (r-squared < 0.2 with all SNPs in a 20-SNP window in each of the 3 HapMap populations) were used to calculate principal components. SNPs in known areas of long LD were also excluded.
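The idea of fitting components on controls only and then applying them to other samples can be sketched as follows. This is a toy illustration under stated assumptions, not the software used for the analysis: genotypes are assumed to be coded 0/1/2, only the first component is extracted (by power iteration), and the small genotype matrices are hypothetical. The key point is that case samples are centered with the control means and scored with the control-derived loadings.

```python
import math


def column_means(rows):
    """Per-SNP mean genotype over the given samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def first_component(rows, means, n_iter=200):
    """Leading eigenvector of X^T X (X = centered genotypes) by power iteration."""
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    p = len(means)
    # Deterministic, non-uniform start vector; it must not be orthogonal
    # to the leading eigenvector for the iteration to converge to it.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(w * w for w in v))
    v = [w / norm for w in v]
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]               # X v
        w_new = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(w * w for w in w_new))
        v = [w / norm for w in w_new]
    return v


def project(rows, means, loadings):
    """Score samples using means and loadings estimated elsewhere (here: controls)."""
    return [sum((x - m) * w for x, m, w in zip(row, means, loadings)) for row in rows]


# Hypothetical genotypes: rows are samples, columns are SNPs
controls = [[0, 0, 1], [2, 2, 1], [0, 1, 1], [2, 1, 1]]
cases = [[1, 1, 1], [2, 2, 2]]

means = column_means(controls)            # estimated from controls only
pc1 = first_component(controls, means)    # loadings estimated from controls only
control_scores = project(controls, means, pc1)
case_scores = project(cases, means, pc1)  # cases projected onto the control-derived axis
```

Further components would require deflation or a full eigendecomposition, which is omitted here for brevity.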
Fig. S-1 shows the fraction of genetic variance explained by the first 12 principal components.
Figure S-1
Fig. S-2 shows the Normal Q-Q plots of the control principal components. Components that capture only random noise are expected to be normally distributed, which would produce a straight line in the Q-Q plot. As can be seen, the first and possibly the second component are picking up non-random variation. These two components also separate the CEU, J+C and YRI HapMap populations well (see Fig. S-3).
Figure S-2
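For readers who wish to reproduce a Normal Q-Q plot for a component, the plotted coordinates can be obtained as sketched below. The function name and the example scores are hypothetical, and the plotting position (i - 0.5)/n is one common convention among several; the figure may have used a different one.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Return (theoretical quantile, standardized observed value) pairs.

    Points close to the line y = x indicate an approximately normal
    distribution; systematic curvature indicates non-random structure.
    """
    n = len(values)
    mu, sd = mean(values), stdev(values)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    points = []
    for i, v in enumerate(sorted(values), start=1):
        p = (i - 0.5) / n      # plotting position (one common convention)
        points.append((std_normal.inv_cdf(p), (v - mu) / sd))
    return points


# Hypothetical principal component scores for a handful of control samples
scores = [-1.2, 0.3, 0.8, -0.5, 1.9, 0.1, -0.7, 0.4]
qq = normal_qq_points(scores)
```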
Fig. S-3 shows a grid of scatter plots between the first 5 components. The principal components of the case samples and HapMap samples are included in this plot, and the 3 HapMap populations are depicted in different colors.
Figure S-3
Table S-1 shows the correlations between each of the first 10 principal components and the case-control status. The p-value gives the significance of adding the principal component to a model that already contains the covariates Gender, UT and DM_Duration (UCLA samples and UCIrvine samples are merged).
Principal component / r-squared with case-control / p-value
1 / 0.002515 / 0.392
2 / 0.011434 / 0.002
3 / 0.000386 / 0.881
4 / 0.008698 / 0.032
5 / 0.002737 / 0.235
6 / 0.000719 / 0.522
7 / 0.000007 / 0.702
8 / 0.000834 / 0.987
9 / 0.000128 / 0.841
10 / 0.000206 / 0.727
Table S-1
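The r-squared column can be read as the squared Pearson correlation between a component's scores and the 0/1 case-control indicator; that reading is an assumption here, and the sketch below uses hypothetical scores. The p-value column comes from a nested model comparison with covariates and is not reproduced by this snippet.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical component scores and case-control status (0: control, 1: case)
pc_scores = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
status = [1, 0, 1, 0, 0, 1]
r2 = r_squared(pc_scores, status)
```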
Table S-2 Single-Marker and Two-Marker Association Tests
Test / Model / Null hypothesis (H0) / Test Description
Single-marker association
Genotypic Test / / / Allelic-HWD contrast test (Genotypic test)
Allelic Test / / / Allele frequency contrast test (Allelic test)
Two-marker association
5 DF Test / / / Joint Allelic-HWD-LD contrast test
3 DF Test / / / Joint Allelic-LD contrast test
2 DF Test / / / Joint Allelic contrast test
μ is the probability of being affected.and are the genotype values of the first and the second SNP, respectively.
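As a rough illustration of what an allele frequency contrast looks like in practice, the sketch below collapses genotype counts into allele counts and applies the classical 1-degree-of-freedom Pearson Chi-square test to the resulting 2x2 table. This is a textbook stand-in, not the model-based formulation listed in the table, and the genotype counts are hypothetical.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Classical allele-count test.

    Each argument is a tuple of genotype counts (n_AA, n_Aa, n_aa).
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    def allele_counts(g):
        n_hom_a1, n_het, n_hom_a2 = g
        return [2 * n_hom_a1 + n_het, n_het + 2 * n_hom_a2]

    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the 1-df Chi-square distribution: P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, Aa, aa)
chi2_allelic, p_allelic = allelic_chi_square((30, 50, 20), (50, 40, 10))
```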
Figure S-4 Results for CNDP1 with LD plot computed from our samples. Additional LD plots obtained from our samples are given. Note that the number of SNPs is smaller than in the HapMap samples.
Figure S-5 Results for ELMO1 with LD plot computed from our samples
Figure S-6 Results for HMCN1 with LD plot computed from our samples
Power
We can use a simple relationship that exists between the original and replication case-control studies to calculate the power our study had to replicate the earlier findings. Let and denote the number of cases and controls of the original study and the replication study, respectively. Using the results from Kim et al. (2010), the non-centrality parameter (λ) of the test statistic () in a replication study can be derived as , where is the Chi-square statistic of the original study with d degrees of freedom. If and , then . Thus the power of the test with significance level α is given by
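\[ \text{Power} = 1 - P\left( \chi^2_d(\lambda) \le \chi^2_{d,\,1-\alpha} \right), \]
where \chi^2_d(\lambda) denotes a non-central Chi-square variable with d degrees of freedom and non-centrality parameter λ, and \chi^2_{d,\,1-\alpha} is the (1-α) quantile of the Chi-square distribution with d degrees of freedom.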
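The power expression can be evaluated numerically once λ, d and α are fixed. The following is a minimal sketch using only the Python standard library: the central Chi-square distribution function is computed from a series for the incomplete gamma function, and the non-central distribution function as a Poisson-weighted mixture of central ones. The function names are illustrative, and the value of λ used in the example is hypothetical, since it depends on the sample sizes of both studies.

```python
import math


def chi2_cdf(x, d):
    """CDF of the central Chi-square distribution with d degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = d / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z)
    term = 1.0 / a
    total = term
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= z / (a + k)
        total += term
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def chi2_quantile(p, d):
    """Quantile of the central Chi-square distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, d) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_cdf(x, d, lam, max_terms=1000):
    """CDF of the non-central Chi-square distribution (moderate lam only)."""
    if lam == 0:
        return chi2_cdf(x, d)
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson(lam / 2) probability of i = 0
    total = 0.0
    for i in range(max_terms):
        total += weight * chi2_cdf(x, d + 2 * i)
        weight *= half / (i + 1)
        if i > half and weight < 1e-16:
            break
    return total


def power(lam, d, alpha):
    """Power = 1 - P(non-central Chi-square <= (1 - alpha) central quantile)."""
    critical = chi2_quantile(1.0 - alpha, d)
    return 1.0 - ncx2_cdf(critical, d, lam)


alpha = 0.05 / 42                          # Bonferroni-corrected level for 42 tests
example_power = power(20.0, 2, alpha)      # lam = 20.0 is a hypothetical value
```

A statistical library providing the non-central Chi-square distribution would normally be preferable; the hand-written series is used here only to keep the sketch self-contained.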
We assume that one of our SNP sets in CNDP1 is in similar LD with a disease susceptibility locus as the microsatellite risk allele of Janssen et al. (2005), so that, when all the other microsatellite alleles are pooled to form a diallelic marker, the two markers have similar genotype distributions. From their sample distribution, we calculate the Chi-square statistic of the genotypic test to be 6.49 (Chi-square test statistic of the 2x3 table with d=2 degrees of freedom). The power of our genotypic test is then determined to be 91.5% by the above method, allowing for the fact that we performed 42 tests (18 single-marker tests and 24 two-marker tests for CNDP1) and used a conservative Bonferroni correction (α=0.05/42). Our SNP set for ELMO1 includes rs741301, for which Shimazaki et al. (2005) reported an association, with a Chi-square statistic of 19.9 for the genotypic test. Using this value in the same way, we determined the power of our genotypic test to be 57.3%.
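The Chi-square statistic of a 2x3 genotype table, and its comparison against a Bonferroni-corrected level, can be computed as sketched below. The genotype counts are hypothetical and are not those of Janssen et al. (2005) or Shimazaki et al. (2005); the closed-form p-value exp(-x/2) applies only to 2 degrees of freedom.

```python
import math


def pearson_chi_square(table):
    """Pearson Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows are cases and controls, columns are the 3 genotypes
genotype_table = [[30, 50, 20],
                  [50, 40, 10]]
chi2_genotypic, df = pearson_chi_square(genotype_table)

# For df = 2 the upper tail probability has the closed form exp(-x / 2)
p_genotypic = math.exp(-chi2_genotypic / 2.0)

bonferroni_alpha = 0.05 / 42   # 42 tests, as for CNDP1
significant = p_genotypic < bonferroni_alpha
```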
Note that in our procedure we would find an association if any test among the two single-marker tests or six two-marker tests (three tests with an upstream SNP and three tests with a downstream SNP) rejects the null hypothesis. Therefore, our analysis has greater power than a single-marker genotypic test alone, i.e., greater than 91.5% and 57.3% for CNDP1 and ELMO1, respectively.
We also calculated the power our analysis would have had if we had performed only the allelic test (which would have less strict significance levels: α=0.05/9 and 0.05/42 for CNDP1 and ELMO1, respectively). The results were 88.4% and 35.2% for CNDP1 and ELMO1, respectively, showing that our analysis, with power of 91.5% and 57.3%, is more powerful than the allelic test alone.