On Individual Genome-Wide Association Studies

Supplemental Data

On individual genome-wide association studies

and their meta-analysis

Yu-Fang Pei, Lei Zhang, Christopher J. Papasian, Yu-Ping Wang, Hong-Wen Deng

Supplementary Table 1. Comparisions between parameters used in our study and reported in Gogrle et al.(Gogele et al. 2012).

Parameter / Majority in Gogele et al. / Values in the present study
Number of studies / 2-21 / 5-50
Sample size / 761-74,053 / 10,000-100,000
Weighting scheme / Inverse variance / Inverse variance
Phenotype distribution / Continuous / Continuous
Mode of inheritance / Additive / Additive

Supplementary Figure 1. The distribution of the locus effects of the 1,000 simulated independent causal SNPs. The 1,000 causal SNPs together account for 50% of phenotypic variance. The locus effects were generated with an inverse gamma distribution (for details, see Methods).

Supplementary Figure 2. The effects of variations of LD levels and allele frequencies on between-study heterogeneity. Between-study heterogeneity was measured by I2 index (Higgins et al. 2003) (for details, see Methods). 10 studies were simulated from 3 populations with proportions 60%:30%:10%. For comparison, we add the “Ideal” line, which represents the scenario that the causal SNP was tested directly under a homogeneous setting in terms of effect size and allele frequency across all individual studies. Panel A shows the effects of variation of LD levels (r2) between causal SNP and the proxy marker. Distinct lines relate to the r2 in three populations of 0.8-0.8-0.8, 0.3-0.5-0.8, and 0.7-0.8-0.9, respectively (see Methods for details); h2 on the x-axis represents heritability (phenotypic variation explained by the SNP). Panel B shows the effects of variation of allele frequencies (see Methods for details). The causal SNP was tested directly. Allele frequencies were generated using the Balding-Nichols model(Balding and Nichols 1995). Different lines relate to the h2 values of 0, 0.1%, 0.3%, 0.5% and 1%, respectively.

Supplementary Figure 3. Average I2 estimates under different parameter settings. I2 is a measure of between-study heterogeneity (for details, see Methods). and represent within- and between-population variances, respectively. 10 individual studies were simulated from three populations with proportions 60%:30%:10%. Panels A, B and C, the mean effect for the causal SNP ( in Equation 1 in Methods section) was the same in all populations in terms of both magnitude and direction; k is the number of studies included in meta-analysis. Panels D and E, “+00” denotes that the SNP only had effects in studies from the 1st population and “+-0” denotes that the SNP presented effects with the same magnitude but at opposite directions in samples from the 1st and 2nd populations, while that from the 3rd population had no effect.

Supplementary Figure 4. Type-I error rate and power estimates for rare variants (MAF=0.005). 10 individual studies were simulated from three populations with proportions 60%:30%:10%. “FE” and “RE” denote fixed-effects and random-effects meta-analysis, respectively. The suffixes “pop1” and “total” denote meta-analyses including studies from the first population, and all three populations, respectively (see Methods for details). Panel A shows type-I error rate estimations. Within-population variance was set to be 0.6 and the significant level was set at 0.05. Panel B shows power estimations. Within-population variance was set at 0.6 and between-population variance () was set at 0.1. h2 on the x-axis represents heritability (phenotypic variation explained by the SNP). Significant level was set at 5x10-8.

Supplementary Figure 5. Power estimates of individual GWA studies and their meta-analyses when testing a single SNP. Panels A and B present the power estimates of the fixed-effects (FE) model and Panels C and D present the power estimates of the random-effects (RE) model. k is the number of studies included in meta-analysis and “k=1” denotes an individual GWA study. and represent within- and between-population variance, respectively. Panels A and C: studies included in a meta-analysis were ideally homogeneous; Panels B and D: only within-population variation was present. h2 represents phenotypic variation explained by the SNP.

Supplementary Figure 6. Power of meta-analysis for identifying at least a specified number of loci on a genome-wide scan. The distribution of the causal locus effects is shown in Supplementary Figure 1. “FE” and “RE” denote fixed- and random-effects meta-analyses, respectively. k is the number of studies included in meta-analysis. Panels A, B, C, and D present the results for k=5, 10, 20, and 50, respectively.

Supplementary Figure 7. Power of meta-analysis to identify a SNP for various significance levels on a genome-wide scan. The distribution of the causal locus effects is shown in Supplementary Figure 1. “FE” and “RE” denote fixed- and random-effects meta-analyses, respectively. k is the number of studies included in meta-analysis. Panels A, B, C, and D present the results for k=5, 10, 20, and 50, respectively.

Supplementary Figure 8. Power of meta-analysis to replicate findings from individual GWA studies. The distribution of the causal locus effects is shown in Supplementary Figure 1. “FE” and “RE” denote fixed- and random-effects meta-analyses, respectively. k is the number of studies included in meta-analysis. Panels A, B and C present the results for k=5, 20, and 50, respectively. Tables on the top right of the figures give the average number of significant SNPs identified at a genome wide significant level by each method. “Overlapped” denotes that samples from GWA studies were included in the meta-analysis and “independent” denotes that samples from GWA studies and meta-analysis were independent.

Supplementary Figure 9. Replicability between independent meta-analyses. The distribution of the causal locus effects is shown in Supplementary Figure 1. “FE” and “RE” denote fixed- and random-effects meta-analyses, respectively. k is the number of studies included in meta-analysis. Panels A, B and C present the results for k=5, 20, and 50, respectively. Tables on the top right of the figures give the average number of significant SNPs identified by each method.

Supplementary Figure 10. Type-I error rate and power estimate for meta-analysis with sample structure from real data. and denote between- and within-population variance, respectively. “FE” and “RE” denote fixed-effects and random-effects meta-analysis, respectively. The suffixes “total” denotes meta-analysis including all studies and “CEU” denotes meta-analysis including studies with subjects of European ancestry. Panel A and B display the type-I error rate estimates and panel C displays the power estimates. Panel A: = 0; panel B: =0.6; panel C: = 0.1 and =0.6.

Supplementary notes

Simulations with sample structure from real data

We simulated individual studies with sample sizes exactly equal to a recently published meta-analysis (Estrada et al. 2012). Specifically, 17 studies with a total sample size of 34,910 individuals were simulated, of which, 16 studies included subjects of European ancestry and 1 study included Chinese subjects. Allele frequencies in each population were sampled using the Balding-Nichols model (Balding and Nichols 1995). The ancestral allele frequency for the causal variant was set at 0.1 and the genetic distance measure Fst between Caucasian and Chinese was set at 0.11 (Nelis et al. 2009). We estimated the type-I error and power under different heterogeneity settings and found that false negative and positive results were introduced under the heterogeneity setting (Supplementary Fig. 10).

Relationship between heterogeneity and ancestral allele frequency

The magnitude of heterogeneity effects decreased with increasing ancestral MAF p0. To interpret this, we wrote the definition of heritability

where G and Y denoted genotypic effect and phenotypes, respectively. When h2 and var(Y) were fixed, var(G) was fixed as well. Under the additive mode of inheritance,

so that each effect size θi was calculated by . Therefore, the difference in θi, which determined the magnitude of heterogeneity effects, was in turn determined by the reciprocal of the square root of Each individual pi was drawn from the beta distribution Beta(p0(1-Fst)/Fst, (1-p0)(1-Fst)/Fst). The mean and variance of this distribution were p0 and Fstp0(1-p0), respectively. Because Fst was set at a small value (0.05), pi was concentrated on a small range towards p0. Together, the change in θi was negatively correlated with pi(1-pi).

Effects of heterogeneity on rare variants

Heterogeneity effects were less severe for rare variants than for common variants. To interpret this, we wrote the model for simulating phenotype as following:

Where and denote the phenotype and genotype, respectively. The heterogeneity statistic

and .

The variation of observed effect sizes was determined by between- and within-population variances, which were set to be the same for SNPs with different MAFs. In a linear regression model, , which will decrease with increasing MAF, p, of the test SNP. So for rare variants is higher than that for common variants and higher corresponds to lower heterogeneity.

References

Balding DJ, Nichols RA (1995) A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 96: 3-12

Estrada K, Styrkarsdottir U, Evangelou E, Hsu YH, Duncan EL, Ntzani EE, Oei L, Albagha OM, Amin N, Kemp JP, Koller DL, Li G, Liu CT, Minster RL, Moayyeri A, Vandenput L, Willner D, Xiao SM, Yerges-Armstrong LM, Zheng HF, Alonso N, Eriksson J, Kammerer CM, Kaptoge SK, Leo PJ, Thorleifsson G, Wilson SG, Wilson JF, Aalto V, Alen M, Aragaki AK, Aspelund T, Center JR, Dailiana Z, Duggan DJ, Garcia M, Garcia-Giralt N, Giroux S, Hallmans G, Hocking LJ, Husted LB, Jameson KA, Khusainova R, Kim GS, Kooperberg C, Koromila T, Kruk M, Laaksonen M, Lacroix AZ, Lee SH, Leung PC, Lewis JR, Masi L, Mencej-Bedrac S, Nguyen TV, Nogues X, Patel MS, Prezelj J, Rose LM, Scollen S, Siggeirsdottir K, Smith AV, Svensson O, Trompet S, Trummer O, van Schoor NM, Woo J, Zhu K, Balcells S, Brandi ML, Buckley BM, Cheng S, Christiansen C, Cooper C, Dedoussis G, Ford I, Frost M, Goltzman D, Gonzalez-Macias J, Kahonen M, Karlsson M, Khusnutdinova E, Koh JM, Kollia P, Langdahl BL, Leslie WD, Lips P, Ljunggren O, Lorenc RS, Marc J, Mellstrom D, Obermayer-Pietsch B, Olmos JM, Pettersson-Kymmer U, Reid DM, Riancho JA, Ridker PM, Rousseau F, Slagboom PE, Tang NL, et al. (2012) Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nat Genet 44: 491-501

Gogele M, Minelli C, Thakkinstian A, Yurkiewich A, Pattaro C, Pramstaller PP, Little J, Attia J, Thompson JR (2012) Methods for meta-analyses of genome-wide association studies: critical assessment of empirical evidence. Am J Epidemiol 175: 739-49

Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. Bmj 327: 557-60

Nelis M, Esko T, Magi R, Zimprich F, Zimprich A, Toncheva D, Karachanak S, Piskackova T, Balascak I, Peltonen L, Jakkula E, Rehnstrom K, Lathrop M, Heath S, Galan P, Schreiber S, Meitinger T, Pfeufer A, Wichmann HE, Melegh B, Polgar N, Toniolo D, Gasparini P, D'Adamo P, Klovins J, Nikitina-Zake L, Kucinskas V, Kasnauskiene J, Lubinski J, Debniak T, Limborska S, Khrunin A, Estivill X, Rabionet R, Marsal S, Julia A, Antonarakis SE, Deutsch S, Borel C, Attar H, Gagnebin M, Macek M, Krawczak M, Remm M, Metspalu A (2009) Genetic structure of Europeans: a view from the North-East. PLoS One 4: e5472