Recent polygenic selection on educational attainment: a replication

Davide Piffer

Email:

Abstract

The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment were used to test a polygenic selection model.

Average frequencies of alleles with a positive effect (Beta) on the phenotype (polygenic scores) were compared across populations (N = 26) using data from 1000 Genomes. The polygenic score of the 152 SNPs that reached genome-wide significance in the meta-analysis of the discovery and replication samples by Okbay et al. (2016) (N = 405,072) was highly correlated with population IQ (r = 0.863).

Moreover, the polygenic scores obtained from the three independent GWAS exhibited strong intercorrelations even after pruning for linkage disequilibrium.

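The method of correlated vectors revealed a Jensen effect. SNP p values were negatively correlated with two vectors: the correlations between SNP frequencies and population IQ, and the correlations between SNP frequencies and the factor extracted from the two previous GWAS (r = -0.26 and -0.25, respectively).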

Factor analysis produced similar estimates of polygenic selection strength for educational attainment across the three datasets. The SNPs from the largest GWAS were divided by p value into 7 sets, and each set was factor analyzed. The p value rank of a SNP set correlated substantially (r = -0.41) with a composite index of predictive validity and reliability. The index combined four measures: correlation with population IQ, average factor loading, correlation with the factor scores from the 2 previous GWAS, and the SAC (spatial autocorrelation)-free effect on population IQ. Sets with lower p values scored higher on this index. Moreover, the composite index of factor reliability and validity was strongly correlated (r = 0.97) with the loadings on a factor extracted from the 7 factors (the “meta-factor”). That is, the factors with stronger independent correlations to measures of accuracy had stronger loadings on the meta-factor.

Nine hits were found to be in LD (or identical) across publications. These produced replicated factor scores and polygenic scores with strong correlations to population IQ (0.89 and 0.82-0.90, respectively). These correlations survived control for spatial autocorrelation (B = 0.69 and 0.35-0.79, respectively).

The results together constitute a replication of preliminary findings and provide unequivocal evidence for recent diversifying polygenic selection on educational attainment and underlying cognitive ability.

Introduction

The aim of this study is to replicate the findings of Piffer (2013, 2015). Those studies found that GWAS hits for educational attainment and cognition differ in frequency across populations and were thus subject to different selection pressures. To this end, the hits from the two latest GWAS of educational attainment (Davies et al., 2016; Okbay et al., 2016) will be used in the analysis. The first GWAS was carried out on the UK Biobank sample (N = 100K+). Over a thousand SNPs reached genome-wide significance (P < 5 × 10⁻⁸). After controlling for linkage disequilibrium, however, only 15 independent signals were identified (Davies et al., 2016). Genotypes had been LD-pruned by clumping to obtain SNPs in linkage equilibrium, with r² < 0.25 within a 200 bp window. For simplicity, the three hits found by Rietveld et al. (2013) were pooled with these hits into a single polygenic score.

The second GWAS was carried out on a sample of 293K+ individuals (Okbay et al., 2016) and produced 74 independent (“LD-free”) hits.

Factor analysis will be used to extract a factor accounting for cross-population variation in allele frequency, hence representing a signal of polygenic selection. Factor loadings will be examined to ascertain the reliability of the factor (i.e. do most alleles with positive GWAS effect load positively on the factor?). Predictive validity will be measured by computing the correlation between factor scores and population IQ. If alleles with positive GWAS beta (within population effect) load positively on a factor that is positively correlated to population IQ, this is interpreted as evidence of directional selection on the phenotype (educational attainment or related cognitive abilities).

Methods

1000 Genomes

Frequencies were calculated from VCF files belonging to the phase 3 data: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/

Rietveld et al. (2013) produced 3 SNPs reaching GWAS significance for educational attainment.

Davies et al. (2016) reported 1115 SNPs reaching GWAS significance, of which 15 were independent signals for educational attainment. 942 SNPs were found on 1000 Genomes. Among the 15 independent signals, one (2:48696432_G_A) was missing.

Okbay et al. (2016) reported 74 SNPs associated with years of education. 70 were found in 1000 Genomes (the other 4 variants were flagged because they had more than 3 different alleles).

Population IQs for the 1000 Genomes populations were obtained from Piffer (2015).

The polygenic score (PS) of a population is the average frequency, across SNPs, of the alleles with a positive effect on the phenotype at the individual level (i.e. a positive GWAS beta).
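The definition above can be written as a short function. This is a minimal sketch, not the author's R code. It assumes that the ALT-allele frequency of each SNP is known for a population, and that the allele with positive GWAS beta has already been identified for each SNP. The function and argument names are hypothetical.

```python
from statistics import mean


def polygenic_score(alt_freqs, positive_is_alt):
    """Population-level polygenic score: mean frequency of the positive-beta alleles.

    alt_freqs: ALT-allele frequency of each SNP in one population (hypothetical layout).
    positive_is_alt: True where the GWAS beta is positive for the ALT allele.
    """
    if len(alt_freqs) != len(positive_is_alt):
        raise ValueError("need one orientation flag per SNP")
    # Flip frequencies so that every value refers to the positive-effect allele.
    oriented = [f if up else 1.0 - f for f, up in zip(alt_freqs, positive_is_alt)]
    return mean(oriented)


# Hypothetical toy input: three SNPs; the second has its positive allele on REF.
print(polygenic_score([0.2, 0.7, 0.4], [True, False, True]))
```

Computing this once per population yields one column of a table such as Table 1.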

Statistical analyses were carried out using R (v. 3.2.3).

Results

Polygenic scores

Rietveld et al., 2013

A polygenic score was created using the top 3 SNPs in Rietveld et al.

Davies et al., 2016

Of the 15 independent signals reported by Davies et al. (2016), 14 were found in 1000 Genomes (2:48696432_G_A was missing). Thus, a polygenic score (I.S. PS) was calculated using these 14 SNPs.

Okbay et al., 2016

Okbay et al. (2016) reported 74 loci independently associated with educational attainment (years of education).

Polygenic scores and population IQs are reported in table 1.

Table 1. Polygenic scores and population IQ.

Population / PS_Ed_Att_Rietveld / PS_Ed_Att_Davies / PS_Ed_Att_Okbay / IQ
Afr.Car.Barbados / 0.106 / 0.419 / 0.508 / 83
US Blacks / 0.129 / 0.447 / 0.517 / 85
Bengali Bangladesh / 0.227 / 0.516 / 0.507 / 81
Chinese Dai / 0.418 / 0.610 / 0.547 / NA
Utah Whites / 0.374 / 0.493 / 0.506 / 99
Chinese, Beijing / 0.434 / 0.671 / 0.563 / 105
Chinese, South / 0.414 / 0.648 / 0.555 / 105
Colombian / 0.252 / 0.500 / 0.509 / 83.5
Esan, Nigeria / 0.096 / 0.416 / 0.507 / 71
Finland / 0.387 / 0.560 / 0.523 / 101
British, GB / 0.397 / 0.526 / 0.512 / 100
Gujarati Indian, Tx / 0.311 / 0.498 / 0.508 / NA
Gambian / 0.085 / 0.438 / 0.507 / 62
Iberian, Spain / 0.366 / 0.512 / 0.519 / 97
Indian Telugu, UK / 0.229 / 0.510 / 0.502 / NA
Japan / 0.417 / 0.652 / 0.556 / 105
Vietnam / 0.461 / 0.618 / 0.552 / 99.4
Luhya, Kenya / 0.079 / 0.425 / 0.502 / 74
Mende, Sierra Leone / 0.127 / 0.416 / 0.509 / 64
Mexican in L.A. / 0.237 / 0.499 / 0.505 / 88
Peruvian, Lima / 0.196 / 0.477 / 0.488 / 85
Punjabi, Pakistan / 0.257 / 0.511 / 0.513 / 84
Puerto Rican / 0.279 / 0.489 / 0.503 / 83.5
Sri Lankan, UK / 0.222 / 0.506 / 0.501 / 79
Toscani, Italy / 0.354 / 0.501 / 0.518 / 99
Yoruba, Nigeria / 0.097 / 0.421 / 0.512 / 71

The polygenic scores have strong intercorrelations and are also strongly correlated to population IQ (table 2).

Table 2. Correlation matrix

Factor analysis

The 17 hits from the two GWAS (Rietveld et al., 2013 and Davies et al., 2016) were pooled. One hit (rs1906252) was removed because it is in LD with rs9320913 from Rietveld et al. (2013). This yielded a set of 16 LD-free SNPs.

A factor analysis (function “fa”, package “psych”) was carried out using Ordinary Least Squares to find the minimum residual solution. The proportion of variance explained was 0.54. This factor was correlated to population IQ (r= 0.89). 14/16 alleles loaded positively and the average loading was 0.494 (table 3).
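The paper fits the factor with psych's `fa` in R. The block below sketches what a one-factor extraction does, but it swaps in a different, closely related method: iterated principal axis factoring, which is the `fm = "pa"` option of `psych::fa`. It is implemented in plain Python with power iteration. Its loadings will be close to a minimum-residual fit but not necessarily identical. The sketch assumes one list per SNP holding that SNP's oriented allele frequencies across populations, with monomorphic SNPs removed. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def correlation_matrix(cols):
    """Pearson correlations between columns; each column holds one SNP's values across populations."""
    n = len(cols[0])
    z = []
    for c in cols:
        m, s = mean(c), pstdev(c)  # s == 0 (monomorphic SNP) must be filtered out beforehand
        z.append([(x - m) / s for x in c])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def dominant_eigenpair(m, iters=300):
    """Power iteration: Rayleigh quotient and eigenvector of the largest-magnitude eigenvalue."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(a * b for a, b in zip(row, v)) for row in m]
    return sum(a * b for a, b in zip(v, mv)), v


def one_factor_paf(cols, outer_iters=50):
    """One-factor iterated principal axis factoring (a stand-in for psych's minres fit)."""
    r = correlation_matrix(cols)
    p = len(r)
    h = [1.0] * p  # starting communalities (assumption): the first pass is a principal component
    for _ in range(outer_iters):
        reduced = [[h[i] if i == j else r[i][j] for j in range(p)] for i in range(p)]
        lam, v = dominant_eigenpair(reduced)
        loadings = [math.sqrt(max(lam, 0.0)) * x for x in v]
        h = [min(l * l, 1.0) for l in loadings]  # cap communalities at 1 (Heywood guard)
    if sum(loadings) < 0:  # a factor's sign is arbitrary; orient it positively
        loadings = [-l for l in loadings]
    return loadings


def summarize(loadings):
    """Number of positive loadings and the mean loading, the two summaries used in the text."""
    return sum(l > 0 for l in loadings), mean(loadings)


# Hypothetical toy data: 6 populations, two SNPs tracking a common axis and one opposing it.
axis = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
snp_a = [f + e for f, e in zip(axis, [0.3, -0.2, 0.1, -0.3, 0.2, -0.1])]
snp_b = [f + e for f, e in zip(axis, [-0.1, 0.3, -0.2, 0.2, -0.3, 0.1])]
snp_c = [-f + e for f, e in zip(axis, [0.2, 0.1, -0.3, -0.1, 0.3, -0.2])]
loadings = one_factor_paf([snp_a, snp_b, snp_c])
print(summarize(loadings))
```

A SNP whose positive-beta allele moves against the common axis receives a negative loading. This is the pattern behind the two negative entries in Table 3.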

Table 3. Factor loadings (structure matrix)

SNP / Factor Loading
rs13086611_T / -0.76
rs11130222_A / 0.03
rs12553324_G / 0.87
rs55686445_C / 0.88
rs9393692_G / 0.03
rs3847225_C / 0.43
rs4799950_G / 0.7
rs4318611_A / 0.52
rs112374913_A / 0.89
rs12042107_C / -0.67
rs11210887_A / 0.93
rs482507_T / 0.53
rs7701440_T / 0.98
rs9320903_A / 0.75
rs11584700_G / 0.85
rs4851266_T / 0.95
Mean / 0.494

A factor analysis was carried out on each of 7 sets of 10 SNPs drawn from the 74 independent hits of Okbay et al. (2016) (4 were missing from 1000 Genomes). The set size of 10 was chosen for two reasons: 1) to follow the recommendation that the subject-to-item ratio be greater than 2:1 (here, 26 populations to 10 SNPs); 2) because 70, the number of available SNPs, is a multiple of 10.

The SNPs were sorted by p value before being divided into sets, so the first set contains the SNPs with the lowest p values (i.e. the highest GWAS significance). Factor scores are reported in table 4.
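The partition can be expressed in a few lines. This sketch assumes the hits arrive as (rsid, p value) pairs. The function name is hypothetical. The ratio check encodes the 26-populations-to-10-SNPs reasoning given above.

```python
def split_by_p(snps, set_size=10, n_populations=26):
    """Sort (rsid, p_value) pairs by p value and cut them into consecutive sets.

    Set 1 holds the most significant SNPs; an incomplete trailing set is dropped.
    """
    if n_populations / set_size <= 2:
        raise ValueError("subject-to-item ratio must exceed 2:1")
    ranked = sorted(snps, key=lambda s: s[1])
    n_sets = len(ranked) // set_size
    return [ranked[k * set_size:(k + 1) * set_size] for k in range(n_sets)]


# Hypothetical input: 70 SNPs with made-up p values.
toy = [(f"rs{i}", 10.0 ** -(8 + (i % 7))) for i in range(70)]
sets = split_by_p(toy)
print(len(sets), [len(s) for s in sets])
```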

Table 4. Factor scores and population IQ

Population / FactorRietvDavies / Fac_Okbay_1 / Fac_Okbay_2 / Fac_Okbay_3 / Fac_Okbay_4 / Fac_Okbay_5 / Fac_Okbay_6 / Fac_Okbay_7
Afr.Car.Barbados / -1.386 / -0.896 / -1.360 / 1.351 / 0.125 / -1.352 / -0.174 / 1.080
US Blacks / -1.041 / -0.534 / -0.964 / 1.231 / 0.531 / -0.874 / -0.206 / 1.322
Bengali Bangladesh / -0.010 / 0.613 / 0.862 / -0.582 / -0.949 / 0.530 / 0.023 / 0.159
Chinese Dai / 1.229 / -0.801 / 1.090 / -0.742 / 1.371 / 0.034 / 2.009 / -0.247
Utah Whites / 0.385 / 1.278 / 0.006 / -0.764 / -0.949 / 0.898 / -0.818 / -1.093
Chinese, Beijing / 1.614 / 0.347 / 1.483 / -0.395 / 1.507 / -0.091 / 2.045 / -0.341
Chinese, South / 1.399 / 0.142 / 1.266 / -0.371 / 1.627 / 0.067 / 1.882 / -0.478
Colombian / 0.155 / 0.637 / -0.320 / -0.590 / -0.437 / 0.992 / -0.299 / -0.749
Esan, Nigeria / -1.517 / -1.148 / -1.578 / 1.703 / 0.188 / -1.634 / -0.400 / 1.428
Finland / 0.873 / 1.378 / -0.238 / -1.163 / -0.668 / 0.972 / -0.976 / -1.470
British, GB / 0.568 / 1.547 / -0.290 / -0.296 / -1.270 / 0.850 / -0.698 / -1.006
Gujarati Indian, Tx / 0.065 / -0.149 / 0.547 / -0.575 / -1.229 / 0.352 / -0.337 / -0.450
Gambian / -1.380 / -1.471 / -1.274 / 1.645 / 0.597 / -1.528 / -0.144 / 1.829
Iberian, Spain / 0.431 / 1.928 / 0.014 / -0.458 / -0.882 / 1.003 / -0.843 / -0.861
Indian Telugu, UK / 0.030 / 0.112 / 0.542 / -0.442 / -0.968 / 0.436 / -0.618 / -0.046
Japan / 1.422 / 0.058 / 1.456 / -0.508 / 1.542 / 0.011 / 1.731 / -0.224
Vietnam / 1.252 / -0.201 / 1.419 / -0.607 / 1.510 / -0.075 / 1.694 / -0.468
Luhya, Kenya / -1.439 / -1.496 / -1.372 / 1.624 / 0.135 / -1.370 / -0.570 / 1.524
Mende, Sierra Leone / -1.404 / -1.449 / -1.347 / 1.676 / 0.390 / -1.669 / -0.111 / 1.646
Mexican in L.A. / 0.018 / -0.284 / 0.036 / -0.624 / 0.370 / 1.409 / -0.352 / -0.614
Peruvian, Lima / -0.008 / -0.975 / -0.031 / -0.777 / 0.827 / 0.945 / -0.025 / -0.673
Punjabi, Pakistan / -0.050 / 0.481 / 0.836 / -0.545 / -1.197 / 0.453 / -0.390 / -0.314
Puerto Rican / 0.027 / 0.616 / -0.036 / -0.384 / -0.476 / 0.545 / -0.656 / -0.616
Sri Lankan, UK / 0.071 / 0.556 / 0.640 / -0.777 / -0.873 / 0.145 / -0.488 / -0.322
Toscani, Italy / 0.266 / 0.893 / -0.029 / -0.283 / -1.129 / 0.570 / -0.888 / -0.600
Yoruba, Nigeria / -1.572 / -1.183 / -1.358 / 1.654 / 0.307 / -1.621 / -0.391 / 1.586

Spatial Autocorrelation (SAC)

Spatial (phylogenetic) autocorrelation was controlled for using the procedure described in a previous paper (Piffer, 2015). Unbeknownst to the author at the time, that procedure is based on the Mantel test (Mantel, 1967). Applying regression analysis to the Mantel test allows polygenic selection pressures to be estimated (Piffer, 2015).

Pairwise Fst distances and pairwise score distances (absolute value of the difference in polygenic scores) were calculated.
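The text does not spell out the regression model, so the sketch below rests on an assumption. It assumes that the absolute pairwise difference in population IQ is regressed jointly on the pairwise Fst distance and the pairwise score distance, and that the standardized coefficients play the role of the Betas in tables 5 and 6. It omits the Mantel permutation step used to assess significance. All names and toy values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sac_betas(iq, score, fst):
    """Standardized betas of pairwise |IQ difference| on pairwise Fst and |score difference|.

    iq, score: dict population -> value.
    fst: dict frozenset({pop_a, pop_b}) -> pairwise Fst.
    """
    pairs = list(combinations(sorted(iq), 2))
    d_iq = [abs(iq[a] - iq[b]) for a, b in pairs]
    d_fst = [fst[frozenset((a, b))] for a, b in pairs]
    d_score = [abs(score[a] - score[b]) for a, b in pairs]
    r_y1, r_y2, r_12 = pearson(d_iq, d_fst), pearson(d_iq, d_score), pearson(d_fst, d_score)
    # Closed-form two-predictor standardized OLS (assumes the predictors are not collinear).
    denom = 1.0 - r_12 ** 2
    beta_fst = (r_y1 - r_y2 * r_12) / denom
    beta_score = (r_y2 - r_y1 * r_12) / denom
    return beta_fst, beta_score


# Hypothetical toy data for four populations.
iq = {"A": 100, "B": 90, "C": 85, "D": 70}
score = {"A": 0.55, "B": 0.52, "C": 0.50, "D": 0.47}
fst = {frozenset(p): v for p, v in [(("A", "B"), 0.02), (("A", "C"), 0.05), (("A", "D"), 0.15),
                                    (("B", "C"), 0.03), (("B", "D"), 0.12), (("C", "D"), 0.10)]}
print(sac_betas(iq, score, fst))
```

The closed form avoids a matrix library. It is equivalent to ordinary least squares on z-scored variables with two predictors.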

Table 5. SAC control for polygenic scores: Betas.

Source / Beta (Fst) / Beta (PS)
P.S. Davies et al. 2016 / 0.385 / 0.294
P.S. Davies et al. 2016 + Rietveld et al. 2013 / 0.329 / 0.361
P.S. Okbay et al. 2016 / 0.540 / 0.154

Table 6. SAC control for factor scores: Betas. Factor scores were extracted from the Okbay et al. (2016) GWAS (7 sets of 10 SNPs sorted by p value) and from the Rietveld et al. (2013) and Davies et al. (2016) GWAS.

Source / Beta (Fst) / Beta (Factor)
Fac_Rietveld_Davies / -0.162 / 0.861
Fac_1 / 0.516 / 0.122
Fac_2 / 0.650 / -0.076
Fac_3 / 0.598 / -0.011
Fac_4 / 0.622 / -0.090
Fac_5 / 0.699 / -0.138
Fac_6 / 0.557 / 0.095
Fac_7 / 0.428 / 0.204

MCV

The method of correlated vectors (MCV) was applied to the 70 SNPs from Okbay et al. (2016). The vector of SNP GWAS p values was correlated with two other vectors. The first holds the correlations between each SNP's frequency and population IQ (r x IQ). The second holds the correlations between each SNP's frequency and the factor extracted from the two previous GWAS (r x Rietveld_Davies factor). Negative correlations were found between p value and r x IQ (r = -0.26) and between p value and r x Rietveld_Davies factor (r = -0.25).
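The procedure can be sketched as follows, under three assumptions. Per-SNP frequency vectors are oriented to the positive-beta allele. Populations with missing IQ have already been removed. Raw p values are used, as the text describes. The same function applies to the Rietveld_Davies factor scores in place of IQ. Names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_vectors(p_values, freqs_by_snp, criterion):
    """Method of correlated vectors across SNPs.

    p_values: GWAS p value of each SNP.
    freqs_by_snp: one frequency vector per SNP (populations in the same order as criterion).
    criterion: population IQ or factor scores.
    Returns the correlation between the p values and the per-SNP r x criterion vector,
    together with that vector.
    """
    r_x_crit = [pearson(freqs, criterion) for freqs in freqs_by_snp]
    return pearson(p_values, r_x_crit), r_x_crit


# Hypothetical toy data: three populations, three SNPs.
iq = [70, 85, 100]
freqs = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1], [0.2, 0.1, 0.3]]
p = [1e-20, 1e-8, 1e-10]
r_mcv, r_x_iq = correlated_vectors(p, freqs, iq)
print(r_mcv, r_x_iq)
```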

Table 7. MCV

SNP / p value / r x IQ / r x fact Rietv_Davies
rs10061788 / 2.46E-09 / 0.217 / 0.155
rs1008078 / 6.01E-10 / -0.704 / -0.817
rs1043209 / 1.82E-11 / 0.640 / 0.836
rs10496091 / 5.62E-10 / 0.468 / 0.740
rs11191193 / 5.44E-11 / -0.756 / -0.695
rs11210860 / 2.36E-10 / 0.204 / -0.066
rs112634398 / 4.61E-08 / -0.389 / -0.190
rs113520408 / 1.97E-08 / 0.589 / 0.409
rs114598875 / 2.41E-08 / -0.505 / -0.683
rs11588857 / 5.27E-10 / 0.776 / 0.842
rs11689269 / 1.28E-08 / 0.056 / 0.233
rs11690172 / 1.99E-08 / -0.297 / -0.258
rs11712056 / 3.3E-19 / 0.382 / 0.605
rs11768238 / 9.9E-10 / 0.282 / 0.429
rs12531458 / 3.11E-08 / -0.571 / -0.660
rs12646808 / 4E-08 / -0.727 / -0.886
rs12671937 / 9.15E-10 / -0.422 / -0.334
rs12682297 / 3.93E-09 / -0.677 / -0.815
rs12772375 / 1.56E-08 / 0.564 / 0.564
rs12969294 / 7.24E-09 / 0.431 / 0.512
rs12987662 / 2.69E-24 / 0.897 / 0.960
rs13294439 / 2.2E-17 / 0.797 / 0.913
rs13402908 / 1.7E-11 / 0.200 / 0.389
rs1402025 / 3.42E-08 / -0.867 / -0.888
rs148734725 / 1.36E-18 / -0.180 / -0.493
rs165633 / 2.86E-09 / 0.452 / 0.496
rs16845580 / 2.65E-09 / -0.466 / -0.574
rs17119973 / 3.55E-10 / -0.205 / -0.008
rs17167170 / 1.14E-09 / 0.441 / 0.688
rs1777827 / 1.55E-08 / 0.821 / 0.898
rs17824247 / 2.77E-09 / 0.085 / -0.185
rs2245901 / 4.54E-09 / -0.413 / -0.551
rs2431108 / 5.27E-09 / -0.343 / -0.341
rs2456973 / 1.06E-12 / 0.696 / 0.597
rs2457660 / 7.11E-10 / -0.642 / -0.801
rs2568955 / 1.8E-08 / 0.749 / 0.868
rs2610986 / 2.01E-08 / -0.413 / -0.465
rs2615691 / 4.71E-08 / 0.873 / 0.829
rs2837992 / 3.8E-08 / -0.253 / -0.520
rs2964197 / 3.02E-08 / 0.689 / 0.697
rs2992632 / 8.23E-09 / 0.423 / 0.499
rs301800 / 1.79E-08 / 0.129 / 0.132
rs3101246 / 1.43E-08 / 0.632 / 0.796
rs324886 / 1.91E-08 / -0.500 / -0.493
rs34072092 / 3.91E-08 / -0.365 / -0.261
rs34305371 / 3.76E-14 / 0.354 / 0.239
rs35761247 / 3.82E-08 / 0.318 / 0.171
rs4493682 / 3.32E-08 / -0.814 / -0.914
rs4500960 / 3.75E-10 / 0.548 / 0.407
rs4851251 / 1.91E-08 / -0.395 / -0.424
rs4863692 / 1.56E-10 / 0.686 / 0.867
rs55830725 / 5.37E-10 / -0.254 / -0.419
rs56231335 / 2.07E-09 / 0.862 / 0.888
rs572016 / 3.46E-08 / 0.028 / 0.031
rs61160187 / 3.49E-10 / 0.876 / 0.924
rs62259535 / 2.63E-09 / -0.330 / -0.145
rs62263923 / 7.01E-09 / 0.793 / 0.893
rs62379838 / 3.3E-08 / -0.131 / -0.055
rs6739979 / 4.7E-08 / -0.742 / -0.779
rs6799130 / 2.82E-08 / 0.138 / 0.345
rs7131944 / 9.02E-09 / 0.029 / -0.235
rs7306755 / 1.26E-12 / 0.154 / 0.054
rs76076331 / 3.63E-08 / 0.276 / 0.218
rs7767938 / 2.44E-08 / -0.093 / -0.007
rs7854982 / 1.29E-08 / 0.716 / 0.801
rs7945718 / 1.54E-08 / -0.156 / -0.256
rs7955289 / 4.49E-10 / -0.224 / -0.361
rs895606 / 2.25E-08 / 0.390 / 0.655
rs9320913 / 2.46E-19 / 0.793 / 0.747
rs9537821 / 1.5E-16 / -0.395 / -0.363
Mean (95% CI) / NA / 0.089 (-0.035 to 0.213) / 0.091 (-0.047 to 0.229)

Four indicators of factor reliability and validity were devised:
1) average factor loading (the mean loading of the 10 SNPs on the factor);
2) correlation with the factor scores obtained from Rietveld et al. (2013) and Davies et al. (2016);
3) correlation with population IQ;
4) SAC-free Beta (“SAC Beta” for short).
The values of these indicators are reported in table 8 for each of the 7 SNP sets, along with their p value rank.

Table 8. Factor validity and reliability indicators.

SNP set / Average Fac. Loading / r x Fac Rietv_Dav / r x IQ / SAC Beta / P value rank
Set 1 / 0.39 / 0.608 / 0.698 / 0.122 / 1
Set 2 / 0.221 / 0.896 / 0.715 / -0.076 / 2
Set 3 / 0.051 / -0.847 / -0.720 / -0.011 / 3
Set 4 / 0.152 / 0.199 / 0.094 / -0.090 / 4
Set 5 / 0.199 / 0.684 / 0.643 / -0.138 / 5
Set 6 / 0.046 / 0.560 / 0.394 / 0.096 / 6
Set 7 / 0.269 / -0.813 / -0.782 / 0.204 / 7
Mean / 0.190 / 0.184 / 0.149 / 0.015

Table 9 reports the intercorrelations between the accuracy measures and p value rank.

Table 9. Intercorrelations between the accuracy measures and p value rank

A novel measure of factor accuracy (“meta-accuracy”) was calculated as the mean of the four indicators (table 10). The correlation between the meta-accuracy vector and p value rank was then computed. A negative correlation was found (r = -0.408): SNP sets with lower p values tended to have higher meta-accuracy.
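The meta-accuracy arithmetic can be checked against the tables. The sketch below takes the Table 8 indicators, averages them per set, and correlates the averages with the p value ranks 1 to 7 using a Pearson coefficient. On these values it reproduces the meta-accuracy column of table 10 (up to rounding) and gives r ≈ -0.408.

```python
import math

# Indicator values copied from Table 8, in p value rank order (sets 1 to 7):
# (average loading, r x Fac Rietv_Dav, r x IQ, SAC Beta)
TABLE_8 = [
    (0.39, 0.608, 0.698, 0.122),
    (0.221, 0.896, 0.715, -0.076),
    (0.051, -0.847, -0.720, -0.011),
    (0.152, 0.199, 0.094, -0.090),
    (0.199, 0.684, 0.643, -0.138),
    (0.046, 0.560, 0.394, 0.096),
    (0.269, -0.813, -0.782, 0.204),
]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Meta-accuracy: unweighted mean of the four indicators for each set.
meta_accuracy = [sum(row) / len(row) for row in TABLE_8]
p_rank = list(range(1, len(TABLE_8) + 1))
r = pearson(meta_accuracy, p_rank)
print([round(m, 3) for m in meta_accuracy], round(r, 3))
```

Averaging treats the four indicators as equally informative. This is a design choice; a weighted composite would change the index.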

With the aim of validating the meta-accuracy measure, a meta-factor was created by factor analyzing the scores of the 7 factors. The factor loadings (“meta-loadings”) were in turn correlated to the meta-accuracy vector, thus producing a “meta-Jensen coefficient” (table 10). The correlation between the two meta-vectors was r= 0.969.

Table 10. Meta-indicator of factor accuracy (“meta-accuracy”).

SNP set / P value rank / Meta-accuracy / Meta-loadings
Set 1 / 1 / 0.455 / 0.76
Set 2 / 2 / 0.439 / 0.8
Set 3 / 3 / -0.382 / -0.99
Set 4 / 4 / 0.089 / -0.2
Set 5 / 5 / 0.347 / 0.93
Set 6 / 6 / 0.274 / 0.16
Set 7 / 7 / -0.281 / -0.96

Factor scores for the meta-factor are reported in table 11.

Table 11. Meta-factor scores

Population / Metafactor_Okbay2016
Afr.Car.Barbados / -1.356
US Blacks / -1.230
Bengali Bangladesh / 0.476
Chinese Dai / 0.572
Utah Whites / 0.825
Chinese, Beijing / 0.534
Chinese, South / 0.532
Colombian / 0.593
Esan, Nigeria / -1.703
Finland / 1.077
British, GB / 0.560
Gujarati Indian, Tx / 0.524
Gambian / -1.739
Iberian, Spain / 0.668
Indian Telugu, UK / 0.364
Japan / 0.540
Vietnam / 0.621
Luhya, Kenya / -1.645
Mende, Sierra Leone / -1.729
Mexican in L.A. / 0.625
Peruvian, Lima / 0.603
Punjabi, Pakistan / 0.559
Puerto Rican / 0.431
Sri Lankan, UK / 0.587
Toscani, Italy / 0.405
Yoruba, Nigeria / -1.692

A linear regression of population IQ on the three factors was carried out. Scatterplots are reported in figures 1a and 1b.

Figure 1a. Regression of population IQ on factor extracted from the Okbay et al. (2016) dataset.

Figure 1b. Regression of population IQ on factor extracted from the Rietveld et al. (2013) & Davies et al. (2016) datasets.

There were substantial intercorrelations between the meta-factor, the Rietveld+Davies factor scores and IQ (table 12). SAC control was applied to the meta-factor, yielding a very weak SAC-free effect (B = 0.097; Fst B = 0.508).

Table 12. Intercorrelations between the meta-factor, the Rietveld+Davies factor scores and IQ

To obtain a more reliable estimate of polygenic selection, the average of the two factors (the meta-factor and the Rietveld+Davies factor) was computed; this is the “average factor”. The correlation between the average factor and population IQ was r = 0.858.

LD pruning

Cross-GWAS linkage was checked by submitting the list of 86 SNPs to SNPsnap, using a 500 kb window and an LD threshold of r = 0.5.

In total, 8 SNP pairs were found to be in LD. One SNP was present in two GWAS datasets (rs9320913).

Two lists of replicated or pseudo-replicated (i.e. in LD across studies) SNPs were created. The first contains the member of each linked pair from Rietveld et al. (2013) or Davies et al. (2016). The second contains the member from Okbay et al. (2016). The replicated SNP rs9320913 is included in both lists (table 13). The polygenic scores computed from the two lists are reported in table 14. The correlation between the two scores is r = 0.919.

Table 13. Pseudo-replicated and replicated SNPs. Sites in LD (r>0.5).

Publication / Index SNP / Linked SNP / Publication
Davies et al., 2016 / rs12042107 / rs1008078 / Okbay et al., 2016
Rietveld et al., 2013 / rs11584700 / rs11588857 / Okbay et al., 2016
Rietveld et al., 2013 / rs4851266 / rs12987662 / Okbay et al., 2016
Davies et al., 2016 / rs13086611 / rs148734725 / Okbay et al., 2016
Davies et al., 2016 / rs11130222 / rs11712056 / Okbay et al., 2016
Davies et al., 2016 / rs55686445 / rs62263923 / Okbay et al., 2016
Davies et al., 2016 / rs12553324 / rs13294439 / Okbay et al., 2016
Davies et al., 2016 / rs4799950 / rs12969294 / Okbay et al., 2016
Rietveld et al., 2013 / rs9320913* / rs9320913 / Okbay et al., 2016

*Replicated

Table 14. Replicated/pseudo-replicated PS score.

Population / PS (Rietveld_Davies) / PS (Okbay et al., 2016)
Afr.Car.Barbados / 0.355 / 0.224
US Blacks / 0.379 / 0.259
Bengali Bangladesh / 0.412 / 0.353
Chinese Dai / 0.481 / 0.425
Utah Whites / 0.425 / 0.384
Chinese, Beijing / 0.559 / 0.481
Chinese, South / 0.532 / 0.462
Colombian / 0.395 / 0.343
Esan, Nigeria / 0.357 / 0.227
Finland / 0.469 / 0.442
British, GB / 0.465 / 0.416
Gujarati Indian, Tx / 0.441 / 0.389
Gambian / 0.368 / 0.218
Iberian, Spain / 0.433 / 0.396
Indian Telugu, UK / 0.423 / 0.365
Japan / 0.536 / 0.458
Vietnam / 0.520 / 0.450
Luhya, Kenya / 0.343 / 0.231
Mende, Sierra Leone / 0.371 / 0.231
Mexican in L.A. / 0.387 / 0.335
Peruvian, Lima / 0.351 / 0.299
Punjabi, Pakistan / 0.429 / 0.378
Puerto Rican / 0.399 / 0.345
Sri Lankan, UK / 0.410 / 0.361
Toscani, Italy / 0.431 / 0.396
Yoruba, Nigeria / 0.363 / 0.231

Frequencies of the replicated hits were also calculated for the 5 super-populations (i.e. races) of 1000 Genomes for both SNP sets. A boxplot is shown in figures 2a and 2b.

Figure 2a. PS of linked/replicated SNPs by race. SNPs from Rietveld et al. (2013) and Davies et al. (2016).

Figure 2b. PS of linked/replicated SNPs by race. SNPs from Okbay et al. (2016).

The replicated SNPs were factor analyzed. Factor scores and loadings are reported in tables 15 and 16, respectively.

Table 15. Factor scores (replicated SNPs)

Population / Factor Repl. (Rietveld et al., 2013 and Davies et al., 2016) / Factor Repl. (Okbay et al., 2016)
Afr.Car.Barbados / -1.305 / -1.309
US Blacks / -1.139 / -1.144
Bengali Bangladesh / -0.233 / -0.355
Chinese Dai / 1.068 / 1.051
Utah Whites / 0.434 / 0.437
Chinese, Beijing / 1.627 / 1.608
Chinese, South / 1.421 / 1.488
Colombian / -0.047 / -0.092
Esan, Nigeria / -1.430 / -1.411
Finland / 0.600 / 0.555
British, GB / 0.709 / 0.742
Gujarati Indian, Tx / 0.246 / 0.230
Gambian / -1.280 / -1.328
Iberian, Spain / 0.345 / 0.318
Indian Telugu, UK / 0.028 / -0.014
Japan / 1.493 / 1.443
Vietnam / 1.441 / 1.468
Luhya, Kenya / -1.434 / -1.426
Mende, Sierra Leone / -1.268 / -1.319
Mexican in L.A. / -0.052 / 0.024
Peruvian, Lima / -0.108 / 0.034
Punjabi, Pakistan / 0.151 / 0.190
Puerto Rican / -0.035 / -0.049
Sri Lankan, UK / 0.034 / -0.029
Toscani, Italy / 0.153 / 0.268
Yoruba, Nigeria / -1.419 / -1.380

Table 16. Factor loadings (replicated SNPs)

SNP / Loading (Rietveld et al., 2013 and Davies et al., 2016) / SNP / Loading (Okbay et al., 2016)
rs12042107_C / -0.59 / rs1008078 / 0.14
rs11584700_G / 0.85 / rs11588857 / 0.82
rs4851266_T / 0.94 / rs12987662 / 0.97
rs9320913_A / 0.71 / rs148734725 / -0.52
rs13086611_T / -0.75 / rs11712056 / 0.61
rs11130222_A / 0.06 / rs62263923 / 0.88
rs55686445_C / 0.87 / rs13294439 / 0.88
rs12553324_G / 0.86 / rs12969294 / 0.49
rs4799950_G / 0.69 / rs9320913 / 0.71
Average / 0.404 / 0.553

The two factors were almost identical (r= 0.998).

Finally, a list of cross-GWAS clumped SNPs was created by keeping only one SNP from each LD pair. For example, of the pair rs12042107 (Davies) and rs1008078 (Okbay), only rs1008078 was kept. The replicated SNP (rs9320913) was counted only once.

This resulted in a list of “LD-clumped” (86-8-1)= 77 SNPs.

An LD-clumped polygenic score was calculated. It is reported in table 17.

LD-clumped polygenic score (LD clumping across the independent hits from three GWAS; pre-clumping N = 86; post-clumping and overlap removal N = 77).

Table 17. LD-clumped Polygenic Score.

Population / PS Clumped
Afr.Car.Barbados / 0.498
US Blacks / 0.509
Bengali Bangladesh / 0.511
Chinese Dai / 0.563
Utah Whites / 0.508
Chinese, Beijing / 0.579
Chinese, South / 0.571
Colombian / 0.512
Esan, Nigeria / 0.496
Finland / 0.530
British, GB / 0.516
Gujarati Indian, Tx / 0.508
Gambian / 0.498
Iberian, Spain / 0.523
Indian Telugu, UK / 0.506
Japan / 0.573
Vietnam / 0.566
Luhya, Kenya / 0.495
Mende, Sierra Leone / 0.497
Mexican in L.A. / 0.510
Peruvian, Lima / 0.494
Punjabi, Pakistan / 0.516
Puerto Rican / 0.505
Sri Lankan, UK / 0.506
Toscani, Italy / 0.519
Yoruba, Nigeria / 0.500

The LD-clumped PS had the following correlations with the other variables: r x IQ: 0.766; r x FactorRietvDavies: 0.835; r x Metafactor: 0.475.

The population IQ variable had some missing cases, so the correlations are reported both with and without IQ (tables 18a and 18b, respectively).

Table 18a. Correlation plot (all polygenic and factor scores). With IQ.

Table 18b. Correlation plot (all polygenic and factor scores). Without IQ.

Figure 3 reports the boxplot of the LD clumped PS by race.

Figure 3. LD-clumped polygenic score by race.

Okbay et al., 2016 (pooled meta-analysis)

Okbay et al. (2016) reported 162 independent SNPs that reached genome-wide significance (P < 5 × 10⁻⁸) in the pooled-sex EduYears meta-analysis of the discovery and replication samples (N = 405,072).

Of these, 154 SNPs were found in 1000 Genomes. The resulting polygenic score is reported in table 19. Its correlation with population IQ was r = 0.863 (scatterplot in figure 4).

Table 19. Polygenic score computed from the hits of the Okbay et al. (2016) pooled meta-analysis.

Population / PS
Afr.Car.Barbados / 0.4853493506
US Blacks / 0.4849350649
Bengali Bangladesh / 0.5049357143
Chinese Dai / 0.5171298701
Utah Whites / 0.5056584416
Chinese, Beijing / 0.5298993506
Chinese, South / 0.5240006494
Colombian / 0.5015116883
Esan, Nigeria / 0.4792058442
Finland / 0.5170064935
British, GB / 0.5086090909
Gujarati Indian, Tx / 0.5079487013
Gambian / 0.4844811688
Iberian, Spain / 0.5171688312
Indian Telugu, UK / 0.5080181818
Japan / 0.530288961
Vietnam / 0.5233383117
Luhya, Kenya / 0.4777168831
Mende, Sierra Leone / 0.475287013
Mexican in L.A. / 0.4983077922
Peruvian, Lima / 0.4769019481
Punjabi, Pakistan / 0.5071402597
Puerto Rican / 0.501524026
Sri Lankan, UK / 0.5033376623
Toscani, Italy / 0.5162746753
Yoruba, Nigeria / 0.4824142857

Figure 4. Relationship between P.S. computed from hits by Okbay et al. (2016)’s pooled meta-analysis and population IQ.

SAC: clumped and replicated SNPs

Spatial autocorrelation analysis was run on the clumped, replicated and pooled meta-analysis scores. The effect sizes are reported in table 20.

Table 20. SAC control for polygenic and factor scores.

Source / Beta (Fst) / Beta (Factor/PS)
PS clumped / 0.476 / 0.250
PS replicated_Rietv_Davies / 0.395 / 0.352
PS replicated_Okbay / -0.062 / 0.791
PS meta-analysis_Okbay / 0.229 / 0.500
Factor replicated / 0.002 / 0.695

Simulation

Factor loadings

One hundred sets of 10 SNPs matched to the top significant SNPs in Okbay et al. (2016) were obtained from SNPsnap. Problematic SNPs were then removed. When a SNP's frequency was 0 for a population, that population was not counted, which created a mismatch between rows, so such SNPs had to be dropped. Of the remaining sets, the first 200 sets of 10 random SNPs were chosen for the simulation (to speed up computation). Factor analysis was iterated over each set. The average factor loading was 0.268 (SD = 0.176).

This was used as a baseline null model against which to test for polygenic selection. Z scores for the factors extracted from the GWAS hits were calculated as Z = (average loading - 0.268) / 0.176.
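The Z-score formula and the baseline it relies on can be written out directly. In this sketch, the baseline is the mean and sample SD (n - 1 denominator, as R's `sd()` computes) of the average loadings over the random SNP sets. Whether the paper used the sample or population SD is an assumption. Applied to 0.494 and 0.051, the formula gives 1.284 and -1.233, matching the corresponding rows of table 21. Function names are hypothetical.

```python
from statistics import mean, stdev

# Baseline values quoted in the text (mean and SD of the simulated average loadings).
NULL_MEAN = 0.268
NULL_SD = 0.176


def null_baseline(simulated_avg_loadings):
    """Mean and sample SD (n - 1 denominator) of average loadings over random SNP sets."""
    return mean(simulated_avg_loadings), stdev(simulated_avg_loadings)


def loading_z(avg_loading, null_mean=NULL_MEAN, null_sd=NULL_SD):
    """Z score of an observed average loading against the random-SNP baseline."""
    return (avg_loading - null_mean) / null_sd


for label, avg in [("LD-clumped", 0.494), ("Set 3", 0.051)]:
    print(label, round(loading_z(avg), 3))
```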

Table 21. Z-scores of factor loadings

Factor / Average Loading / Z-score
LD-clumped (Davies et al. 2016 + Rietveld et al., 2013) / 0.494 / 1.284
Pseudo-replicated (Davies et al., 2016 and Rietveld et al., 2013) / 0.404 / 0.773
Pseudo-replicated (Okbay et al., 2016) / 0.533 / 1.506
Okbay et al., 2016: Set 1 / 0.39 / 0.693
Set 2 / 0.221 / -0.267
Set 3 / 0.051 / -1.233
Set 4 / 0.152 / -0.659
Set 5 / 0.199 / -0.392
Set 6 / 0.046 / -1.261
Set 7 / 0.269 / 0.006

Factor scores

The correlations between the factor scores for the 200 sets of 10 SNPs and population IQ were computed. The average Pearson's r was 0.22 (95% CI = -0.757 to 0.823; 99% CI = -0.826 to 0.886). Thus, the correlation between the factor extracted from the pseudo-replicated hits and IQ (r = 0.89) is significant at the conventional 0.05 level.
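The interval comparison can be sketched as follows, assuming the simulated (null) correlations are held in a list. Percentile limits come from Python's `statistics.quantiles` with `method="inclusive"`, which interpolates like R's default (type 7) `quantile`. An observed correlation is flagged when it exceeds the upper limit, mirroring the comparison in the text. Names and the toy null list are hypothetical.

```python
from statistics import quantiles


def null_interval(null_rs, level=0.95):
    """Empirical central interval of simulated null correlations."""
    tail = (1.0 - level) / 2.0
    n_cuts = round(1.0 / tail)  # 40 groups for 95%, 200 for 99%
    cuts = quantiles(null_rs, n=n_cuts, method="inclusive")
    return cuts[0], cuts[-1]  # lower and upper percentile limits


def exceeds_null(observed_r, null_rs, level=0.95):
    """True when the observed correlation lies above the upper limit of the null interval."""
    return observed_r > null_interval(null_rs, level)[1]


# Hypothetical null distribution: evenly spaced values from -1 to 1.
null_rs = [-1 + k / 50 for k in range(101)]
print(null_interval(null_rs), exceeds_null(0.89, null_rs))
```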

Polygenic scores

The set of 7914 SNPs was divided into 52 sets of 152 SNPs. N=152 was chosen because it corresponds to the number of SNPs in the pooled Okbay et al. (2016) sample.

The average correlation between the polygenic scores for the 52 sets of 152 SNPs and IQ was 0.467 (95% CI = -0.100 to 0.817). The upper limit of the 95% CI was almost identical to that obtained in the factor-score simulation (0.823). Hence, the correlation between the polygenic score of the 152 GWAS hits and population IQ (r = 0.863) is significant at the conventional 0.05 level.
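The null distribution for polygenic scores can be sketched under two assumptions that the text does not state. First, each random matched SNP is scored with a fixed allele, since no GWAS beta is available to orient it. Second, consecutive blocks of 152 SNPs form the sets, and leftovers are dropped (7914 // 152 = 52 sets). Names and toy values are hypothetical. The resulting correlations can then be summarized with percentile limits like those quoted above.

```python
import math
from statistics import mean


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_correlations(freq_table, iq, set_size=152):
    """Correlations between population IQ and polygenic scores of consecutive random SNP sets.

    freq_table: one frequency vector per random SNP (populations in the same order as iq).
    SNPs that do not fill a whole set are dropped.
    """
    n_sets = len(freq_table) // set_size
    rs = []
    for k in range(n_sets):
        block = freq_table[k * set_size:(k + 1) * set_size]
        # Polygenic score per population: mean allele frequency across the set.
        ps = [mean(snp[j] for snp in block) for j in range(len(iq))]
        rs.append(pearson(ps, iq))
    return rs, mean(rs)


# Hypothetical toy data: 7 SNPs, sets of 3 (the seventh SNP is left over and dropped).
iq = [80, 90, 100]
table = [[0.1, 0.2, 0.3]] * 3 + [[0.3, 0.2, 0.1]] * 3 + [[0.5, 0.5, 0.6]]
print(null_correlations(table, iq, set_size=3))
```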

Discussion

The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment were used to test a polygenic selection model.

Strong inter-correlations among population-level polygenic scores of alleles found by three independent GWAS to be associated with educational attainment were observed. Moreover, these polygenic scores were substantially correlated to estimates of average population IQ (table 2).