Title: Genetic Association of Candidate Regions and Risk Scores in a COPD Meta-Analysis

Brief Title: Candidates And Risk Scores COPD Meta-Analysis

Authors:
Robert Busch, MD1, Brian D Hobbs, MD1, Jin Zhou PhD2, Peter J Castaldi MD1, Michael J McGeachie PhD1, Megan E Hardin MD1, Iwona Hawrylkiewicz MD3, Pawel Sliwinski MD3, Jae-Joon Yim MD4, Woo J Kim MD5, Deog K Kim MD6, Alvar Agusti MD7, Barry J Make MD8, James D Crapo MD8, Peter M Calverley DSc9, Claudio F Donner MD10, David A Lomas ScD11, Emiel F Wouters MD12, Jorgen Vestbo MD13, Ruth Tal-Singer MD14, Per Bakke MD15, Amund Gulsvik MD15, Augusto A Litonjua MD1, David Sparrow DSc16, Peter D Paré MD17, Robert D Levy MD17, Stephen I Rennard MD18, Terri H Beaty PhD19, John Hokanson PhD20, Edwin K Silverman MD1, and Michael H Cho MD1; for the NETT Genetics, ECLIPSE, ICGN, and COPDGene Investigators
Affiliations:
1 Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA/USA, 2 University of Arizona, Tucson, AZ/USA, 3 National Tuberculosis and Lung Disease Research Institute, Warsaw/PL, 4 Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul National University College of Medicine, Seoul/KR, 5 Kangwon National University, Chuncheon/KR, 6 Seoul National University College of Medicine Boramae Medical Center, Seoul/KR, 7 Thorax Institute, Hospital Clinic, IDIBAPS, University of Barcelona, CIBERES, Barcelona/ES, 8 National Jewish Health, Denver, CO/US, 9 University of Liverpool, Liverpool/UK, 10 Mondo Medico di I.F.I.M. srl, Multidisciplinary and Rehabilitation Outpatient Clinic, Borgomanero, Novara/IT, 11 University College London, London, UK, 12 University Hospital Maastricht, Maastricht/NL, 13 University of Manchester, Manchester/UK, 14 GSK Research and Development, King Of Prussia, PA/USA, 15 University of Bergen, Bergen/NO, 16 Brigham and Women's Hospital and the VA Medical Center - Jamaica Plain, MA/USA, 17 Respiratory Division, Department of Medicine, University of British Columbia, Vancouver, BC/CA, 18 University of Nebraska Medical Center, Omaha, NE/US, 19 Department of Epidemiology, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD/USA, 20 University of Colorado, Colorado School of Public Health, Aurora, CO/USA

Corresponding Author:

Robert Busch, MD

Channing Division of Network Medicine

Brigham and Women's Hospital

181 Longwood Ave, Room 456

Boston, MA, 02115

Email:

Telephone: +1 617 525 0959

Supplement:

This manuscript is accompanied by a Methods and Data Supplement

ABSTRACT:

The heritability of COPD cannot be fully explained by existing genome-wide significant risk loci. Analysis of candidate regions from previous studies of COPD or lung function in a larger sample may identify additional associated variants, particularly for severe disease. In addition, the combined contribution of these variants to COPD risk has not been adequately explored.

We genotyped a candidate panel of single nucleotide polymorphisms (SNPs) in 2588 cases (1803 severe) and 1782 controls from four cohorts, performed association testing with COPD, and combined these results with existing data from 6633 cases (3497 severe) and 5704 controls. Additionally, we developed genetic risk scores from lung function- and COPD-associated SNPs and tested the scores' ability to discriminate cases and controls and explain FEV1.

We identified genome-wide significant associations near PPIC and PPP4R4/SERPINA1 with severe COPD. No additional candidate regions were significant. Genetic risk scores based on SNPs previously associated with COPD and lung function had a modest ability to discriminate COPD (AUC ~0.6) and accounted for a mean 0.9-1.9% decrease in FEV1 percent predicted for each additional risk allele, adjusted for age and pack-years of smoking.

Candidate regions (individually or combined as risk scores) may yield significant associations with COPD.

Key Words: chronic obstructive pulmonary disease, genetic epidemiology, genetic risk factors, alpha-1 antitrypsin

Take Home Message: The PPIC and PPP4R4/SERPINA1 loci are associated with severe COPD in a meta-analysis of over 16,000 subjects.

Introduction

Chronic obstructive pulmonary disease (COPD), a progressive lung disease characterized by irreversible airflow obstruction, is a leading cause of morbidity and mortality worldwide.1, 2 While cigarette smoking is the major determinant of COPD susceptibility in the developed world,3-5 the pulmonary response to cigarette smoking is highly variable.6 Genetic factors contribute to the variability in smoking response, and multiple studies have identified genetic variants associated with increased COPD susceptibility.7-12 However, the majority of COPD heritability remains unexplained.13 In addition, the effect of several previously described risk alleles on lung function or risk of disease, particularly in cohorts of severely affected subjects, has not been well studied. Meta-analysis of genetic association cohorts improves power to detect additional COPD susceptibility variants by combining information across studies, which may add to our understanding of disease mechanisms14 as well as provide potential new targets for COPD therapy development.15, 16

This study had two primary goals. First, we wished to investigate a panel of variants previously related to COPD in a larger meta-analysis of cross-sectional data in order to increase our power, particularly for severe COPD. The candidate panel included variants in previously reported candidate genes17 hypothesized to affect COPD, variants that approached genome-wide significance in previous GWAS,18 and genetic variants in genes previously associated with lung function ("lung function variants").19-21 We hypothesized that some of these loci would reach pre-defined levels of statistical significance with additional sample size.

Since genetic variation is present from birth, genetic risk scores in cross-sectional data may offer a way to consolidate genetic information22 into a clinically meaningful tool that may help clinicians to predict disease susceptibility, progression, and outcomes23, 24. Our second goal was to determine the effect of genetic risk scores that modeled the effect of COPD- and lung function-associated risk alleles on the clinical outcomes of COPD-affection status, severe COPD-affection status, and forced expiratory volume in one second (FEV1) percent predicted. We hypothesized that a combined risk score composed of both COPD and lung function SNPs would explain the genetic contribution to COPD-related outcomes in a clinically useful manner.

Material and Methods

We performed genetic meta-analysis using eight cohorts, including a total of 16,707 subjects. Baseline characteristics of each of the cohorts are shown in Table 1. Detailed descriptions of these cohorts, including quality control and associations, have been previously published: the Genetic Epidemiology of COPD (COPDGene) Study, including non-Hispanic White (NHW) and African-American (AA) subsets,25 Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-Points (ECLIPSE),26 National Emphysema Treatment Trial (NETT)27 / Normative Aging Study (NAS),28 and Genetics of COPD in Norway (GenKOLS).29 Additional genotyping was performed in the Transcontinental COPD Genetics Study (TCGS) Korea and TCGS Poland case-control cohorts, as well as the International COPD Genetics Network (ICGN) and Boston Early-Onset COPD Study (EOCOPD) pedigree-based studies.30 ICGN recruited subjects with COPD and available siblings and parents, while EOCOPD recruited extended pedigrees of COPD probands.30 IRB approval and written informed consent were obtained for all of these cohorts.

All subjects in the COPDGene, ECLIPSE, GenKOLS, NETT/NAS, TCGS, and ICGN studies were current or former cigarette smokers; EOCOPD included a small number of non-smokers, both with and without COPD (Table 1; additional details are available in the Supplement). In this meta-analysis, we defined "moderate to severe" COPD as GOLD2 spirometric Grade 2-4 COPD (post-bronchodilator FEV1/FVC <0.7, FEV1 <80% predicted), while "severe" COPD was defined as Grade 3-4 COPD (FEV1/FVC <0.7, FEV1 <50% predicted). Controls had normal spirometry (FEV1/FVC ≥0.7, FEV1 ≥80% predicted). We classified subjects in each dataset using these consistent definitions of case status. Previously diagnosed alpha-1 antitrypsin deficiency was an exclusion criterion for all cohorts.
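The case and control definitions above can be written as a small classifier. This is a minimal sketch rather than the study's code: the function name and the returned labels are hypothetical, and the control thresholds (FEV1/FVC at or above 0.7, FEV1 at or above 80% predicted) are assumed to be the complements of the case thresholds.

```python
def classify_subject(fev1_fvc, fev1_pct_pred):
    """Assign a spirometric analysis group (hypothetical helper).

    fev1_fvc:      post-bronchodilator FEV1/FVC ratio
    fev1_pct_pred: FEV1 as a percentage of the predicted value
    """
    if fev1_fvc < 0.7 and fev1_pct_pred < 50:
        # Grade 3-4: a case in both the severe and moderate-to-severe analyses
        return "severe"
    if fev1_fvc < 0.7 and fev1_pct_pred < 80:
        # Grade 2: a case in the moderate-to-severe analysis only
        return "moderate"
    if fev1_fvc >= 0.7 and fev1_pct_pred >= 80:
        # Normal spirometry (assumed thresholds)
        return "control"
    # Neither a case nor a control under these definitions
    return "unclassified"
```

Subjects with obstruction but FEV1 at or above 80% predicted, or with a preserved ratio but reduced FEV1, fall into neither group in this sketch.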

Genotyping

A total of 4900 non-Hispanic White subjects (ICGN = 3043, EOCOPD = 1198, TCGS-Poland = 659) and 458 Korean subjects from TCGS-Korea were genotyped using the HumanExome v1.2 microarray (Illumina, San Diego, CA) and a set of 5,640 custom markers (see Supplement). This custom content included top results from previously published COPD GWAS,18 variants identified in association with lung function,19-21, 31 and an additional set of variants from a previous candidate gene analysis (see Supplement).17 These data were combined with pre-existing genotyping (previously investigated by Cho et al.18) from the COPDGene, ECLIPSE, NETT/NAS, and GenKOLS studies for meta-analysis.

Genetic Analysis

PLINK32 v1.9 and GWAF33 were used to perform multiple logistic regression within each case-control and pedigree dataset, respectively, adjusting for age, pack-years of smoking, and principal components of genetic ancestry as previously described. Pedigree-study data were also adjusted for within-family variability using generalized estimating equations with an exchangeable correlation structure. METAL34 was used to perform fixed-effects meta-analysis. Only markers passing genotyping or imputation quality control in at least six of the eight cohorts were included in the analysis, which limited our analysis to approximately 45,000 SNPs. We considered a p-value threshold of 5x10-8 as genome-wide significant.
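As an illustration of how per-cohort estimates can be pooled, the following is a minimal sketch of an inverse-variance weighted fixed-effects meta-analysis for a single SNP. It is not METAL itself, and the text does not state which METAL weighting scheme was used; the function name and inputs are hypothetical.

```python
import math

def fixed_effects_meta(betas, ses):
    """Pool per-cohort log odds ratios with inverse-variance weights.

    betas: per-cohort effect estimates (log odds ratios) for one SNP
    ses:   matching standard errors
    Returns (pooled beta, pooled standard error, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

A pooled p-value below 5x10-8 would meet the genome-wide threshold described above.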

For the analysis of COPD-related loci selected from lung function and candidate genes (not including top results from prior COPD GWAS; see Supplement), we calculated 200kb flanks around each candidate SNP using dbSNP mappings (b37). Within each region, we identified the COPD-associated variant with the lowest p-value ("lead SNP"). We calculated values of D' and r2 between the candidate SNP and lead SNP using PLINK v1.9 with a 1000 Genomes phase I v3 EUR reference panel. We designated a candidate-specific significance threshold of p = 7.5x10-6, equal to the traditional genome-wide significance threshold (5x10-8) divided by the ratio of the total length of our collapsed windows to the length of the genome, to correct for multiple testing of the SNP associations within these limited testing regions.35
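The threshold calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the flank is assumed to extend 200kb on each side of a candidate SNP, the genome length is assumed to be 3x10^9 bp, and the function names and example positions are hypothetical.

```python
def merge_windows(positions, flank=200_000):
    """Collapse overlapping flanking windows.

    positions: list of (chromosome, base-pair position) for candidate SNPs
    Returns a list of merged (chromosome, start, end) windows.
    """
    intervals = sorted((c, max(0, p - flank), p + flank) for c, p in positions)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous window on the same chromosome: extend it
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def candidate_threshold(positions, genome_length=3.0e9, gw_alpha=5e-8, flank=200_000):
    """Genome-wide alpha divided by the fraction of the genome covered."""
    covered = sum(end - start for _, start, end in merge_windows(positions, flank))
    return gw_alpha / (covered / genome_length)

# Hypothetical candidate positions, for illustration only
example = [("1", 1_000_000), ("1", 1_100_000), ("2", 5_000_000)]
example_threshold = candidate_threshold(example)
```

Read in reverse, the reported threshold implies that the collapsed windows cover about 0.67% of the genome (5x10-8 / 7.5x10-6), or roughly 20 Mb under the assumed genome length.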

Genetic Analysis: Genetic Risk Scores

We used PLINK v1.9 to create three separate genetic risk scoring systems (see Table 2). The first was composed of 7 genome-wide significant COPD risk variants from the NHGRI database (COPD7). The second score consisted of 25 lung function-associated SNPs (FX25) from previous GWAS.19, 20, 31 The final risk score incorporated both the COPD7 SNPs and the FX25 SNPs. Since two lung function loci (HHIP and FAM13A) were already represented in the COPD7 score, this score had a total of 30 variants (LUNG30). We oriented risk alleles to be consistent with prior reports and gave each allele equal weight. All three scoring systems were then applied to the ICGN cohort, the largest individual cohort not used in the genome-wide discovery of these variant associations.
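An equal-weight score of this kind is a count of risk alleles across the SNPs in the score. The sketch below illustrates the idea in plain Python rather than PLINK; the function name and the locus labels other than HHIP and FAM13A are placeholders, and returning None for incomplete genotypes is an assumption, since the handling of missing genotypes is not described here.

```python
def risk_score(genotypes, score_snps):
    """Sum risk-allele counts with equal weight.

    genotypes:  dict mapping SNP id -> number of risk alleles (0, 1, or 2)
    score_snps: collection of SNP ids that make up the score
    Returns None if any SNP in the score is missing (assumed behavior).
    """
    if any(snp not in genotypes for snp in score_snps):
        return None
    return sum(genotypes[snp] for snp in score_snps)

# Placeholder locus labels; only the HHIP and FAM13A overlap comes from the text
copd7 = {"HHIP", "FAM13A"} | {f"copd_locus_{i}" for i in range(5)}
fx25 = {"HHIP", "FAM13A"} | {f"fx_locus_{i}" for i in range(23)}
lung30 = copd7 | fx25  # shared loci are counted once: 7 + 25 - 2 = 30

# A subject carrying one risk allele at every locus
subject = {snp: 1 for snp in lung30}
```

With two alleles per SNP, the possible ranges are 0 to 14 for COPD7, 0 to 50 for FX25, and 0 to 60 for LUNG30.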

The resultant scores were used as predictors in a linear mixed model of FEV1 percent predicted as well as logistic regression models of both moderate-to-severe and severe COPD incorporating generalized estimating equations. Models were adjusted for age, pack-years of smoking, principal components of genetic ancestry, and familial correlation. In addition, we used the pROC36 and GenABEL37 packages in R to compare the accuracy of two models (genetic risk factors and clinical predictors versus clinical predictors alone) at explaining moderate-to-severe and severe COPD affection status. In addition to examining ROC curves, we used the net reclassification index38 (NRI) to characterize our risk scores' efficacy. The NRI evaluates risk in the decision-making context and offers an alternative interpretation of classification results. We used the NRI to evaluate the added discriminatory benefit of adding genetic risk score SNPs to a clinical model by dividing subjects into three tiers of COPD risk (low, intermediate, and high) using a clinical risk model based on age and pack-years of smoking. NRI was calculated using the PredictABEL package,39 and data are presented as total NRI as well as event NRI and nonevent NRI components. The risk scores were also applied to the COPDGene and TCGS Poland cohorts using analogous methods.
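The event and nonevent components of a categorical NRI can be illustrated with a short sketch. This is not the PredictABEL implementation; the function name and tier coding are hypothetical, and the risk cutoffs that define the three tiers are not given in the text, so tiers are taken as already assigned.

```python
def categorical_nri(old_tier, new_tier, is_case):
    """Categorical net reclassification index.

    old_tier, new_tier: risk tier per subject under the clinical and the
                        combined model (0 = low, 1 = intermediate, 2 = high)
    is_case:            True for cases (events), False for controls (nonevents)
    Returns (total NRI, event NRI, nonevent NRI).
    """
    def net_upward(flag):
        idx = [i for i, c in enumerate(is_case) if c == flag]
        up = sum(new_tier[i] > old_tier[i] for i in idx)
        down = sum(new_tier[i] < old_tier[i] for i in idx)
        return (up - down) / len(idx)

    event_nri = net_upward(True)        # cases should move to higher tiers
    nonevent_nri = -net_upward(False)   # controls should move to lower tiers
    return event_nri + nonevent_nri, event_nri, nonevent_nri
```

A positive total indicates that, on balance, adding the genetic score moved cases upward and controls downward relative to the clinical model alone.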

Additional details regarding the cohorts used in this study; genotype-, marker-, and subject-level quality control; and risk score modeling and NRI analysis are available in the Online Methods and Data Supplement.

Results

The baseline characteristics of the cohorts are shown in Table 1. Notably, the TCGS-Korea, TCGS-Poland, and NETT/NAS studies were designed to contain only severe COPD cases, which is reflected in the low average FEV1 percent predicted among cases.

Genetic Analysis: COPD GWAS Follow-up Variants

The moderate-to-severe analysis included 9221 cases and 7486 controls. Previously described COPD risk loci at the TGFB2, FAM13A, HHIP, CHRNA3/CHRNA5/IREB2, and RIN3 regions were genome-wide significant (Supplemental Table 1). In addition, a locus at 16p11.2 (rs40834, p-value 1.90x10-8, odds ratio 1.17) was associated with moderate-to-severe COPD at genome-wide significance. This locus was recently described in an exome chip analysis of these cohorts.40

The analysis of severe COPD (Table 3) included 5300 cases and 7486 controls. We confirmed genome-wide significance at the TGFB2, FAM13A, HHIP, MMP3/MMP12, and CHRNA3/CHRNA5/IREB2 loci. We also identified two additional genome-wide significant loci: one at 5q23.2 between PRDM6 and PPIC (rs6860095, p-value 1.01x10-8, odds ratio 1.24), and one at 14q32.13, an intronic variant within PPP4R4 (rs112458284, p-value 1.28x10-8, odds ratio 1.69).

We examined these loci using the GTEx eQTL database41 and HaploReg42 v4.1. The rs6860095 SNP was associated with gene expression levels of PPIC, snoU13, SNX2, and RN7SL689P in multiple tissues, though not in lung. No significant eQTLs were found for rs112458284; however, it lies approximately 200kb away from SERPINA1, which encodes alpha-1 antitrypsin, the protein deficient in alpha-1 antitrypsin deficiency.43, 44 We investigated whether this SNP could be tagging alleles of SERPINA1 known to contribute to COPD (e.g. the Z-allele rs28929474 or S-allele rs17580). The rs112458284 variant showed LD with the Z-allele in directly genotyped samples from COPDGene NHW (r2 = 0.41, D' = 0.78) and, to a much lesser extent, the S-allele (r2 = 8.63x10-5, D' = 0.25). To further investigate whether there was any association signal at this locus independent of the Z-allele, we also conditioned on the Z-allele in a meta-analysis model and found that the signal was attenuated (p-value 0.0087).
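The two LD measures reported above answer different questions, which is why D' can be high while r2 stays modest. The sketch below computes both from haplotype frequencies using the standard definitions; it assumes phased haplotype frequencies are known, whereas PLINK estimates them from genotype data, and the function name and example frequencies are hypothetical.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D', r2) for two biallelic SNPs.

    p_a:  frequency of allele A at the first SNP
    p_b:  frequency of allele B at the second SNP
    p_ab: frequency of the haplotype carrying both A and B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical frequencies: a rare allele that always occurs on a
# haplotype carrying a much more common allele
d_prime, r2 = ld_stats(p_a=0.1, p_b=0.5, p_ab=0.1)
```

In this hypothetical case the rare allele is never separated from the common one, so D' is at its maximum, but the mismatch in allele frequencies keeps r2 low; r2 is the quantity that reflects how well one SNP tags the other.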

Known alpha-1 antitrypsin deficiency was an exclusion criterion in our study; however, our genotyping (and imputed data) identified three previously unknown Z-allele homozygotes in the Poland cohort30 and six additional Z-allele homozygotes in the ECLIPSE cohort.45 After removing these subjects, the rs112458284 association was mildly attenuated (p-value 7.22x10-8). Thus, heterozygous carriers of the Z-allele are driving a large proportion of this association, consistent with prior studies showing an increased risk for MZ heterozygotes.46 In addition, these results suggest that if we had not specifically excluded known alpha-1 antitrypsin deficiency in our other populations, the association p-value for rs112458284 would likely be even lower.47

Genetic Analysis: Additional Candidate Loci

Next, we focused on a set of regions and variants hypothesized to affect COPD. We defined the "lead SNP" as the variant with the lowest association p-value in a given region, and the "candidate SNP" as the previously described variant. For 26 of these lead SNPs, LD with the candidate SNP measured by D' was >0.8, while only nine also had an r2 >0.3 (Table 4 and Supplemental Table 2). While no candidate loci were genome-wide significant (except the previously discovered HHIP and FAM13A), several lead SNPs within the 200kb windows of prior candidates achieved p-values that met our candidate-specific threshold of 7.5x10-6, including SNPs in the TGFB2-LYPLAL1, THSD4, MMP1/MMP12, AGER/PPT2, and ADAM19 regions.

Notably, the lung function risk allele was associated with increased risk of COPD for 23 of the 25 previously reported SNPs directly genotyped in our meta-analysis (Table 4). Twelve of these 25 lung function risk alleles showed a nominally statistically significant (unadjusted p-value <0.05) effect on COPD risk; only lung function risk alleles annotated to the ZKSCAN3 and NCR3-AIF1 genes showed a directionally discordant effect on COPD susceptibility (lowered risk of COPD), though those discordant association results were not statistically significant.

Genetic Analysis: Genetic Risk Scores

We examined the ability of genetic risk scores to explain both FEV1 percent predicted and COPD affection status in the ICGN cohort. We found a trend among quantiles of risk scores in an unadjusted model (Figure 1). In a linear mixed model adjusting for age, pack-years of smoking, principal components of ancestry, and a within-family component, we found that the COPD7 risk score (0 to 14 possible alleles) was associated with a 1.86% reduction in FEV1 percent predicted for each additional risk allele (Table 5a). Using generalized estimating equations for models of moderate-to-severe and severe COPD (Table 5b), each additional risk allele of the COPD7 score was associated with an odds ratio (OR) of 1.18 for moderate-to-severe COPD and 1.19 for severe COPD (p-values 4.1x10-8 and 4.4x10-8, respectively). We found nearly identical results for a standard logistic regression (OR 1.17 and 1.19) without family adjustment, and therefore used these models for receiver operating characteristic (ROC) curves for affection status using genetic variants alone, age and pack-years, and the combination of age, pack-years, and genetic information. The area under the curve (AUC) for the genetic model was 0.58 for moderate-to-severe COPD and 0.59 for severe COPD; however, only modest increases in AUC were observed with the addition of genetic risk scores to clinical predictors (Figure 2). Three-tiered categorical analysis of reclassification38 after addition of the COPD7 risk score and adjustment for genetic components of ancestry to the clinical model (containing only age and pack-years of smoking) resulted in a net reclassification index (NRI) of 0.053 (p-value 2.32x10-3) for the combined model risk stratification of moderate-to-severe COPD and an NRI of 0.047 for risk stratification of severe COPD (p-value 0.01). For the expanded FX25 and LUNG30 scores, we found a lower per-allele but larger overall effect (Tables 5a and 5b).
We also tested risk scores in the TCGS Poland and COPDGene cohorts and found comparable results (see Online Methods and Data Supplement).
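To make the AUC values reported above concrete, the AUC can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The following is a minimal sketch of that rank-based definition, not the pROC implementation used in the analysis; the function name and the example scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve from pairwise case-control comparisons."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk-allele counts, for illustration only
example_auc = auc([3, 4, 5], [1, 2, 3])
```

On this scale 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls.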

Discussion

In a meta-analysis of multiple cohorts of moderate-to-severe and severe COPD, we identified two new genome-wide significant loci, including one in LD with the SERPINA1 Z-allele, and identified a consistent direction of effect on COPD risk for 23 previously identified markers associated with lung function, consistent with recent reports.7 We also constructed genetic risk scores that were associated with quantitative measures of lung function and showed modest discrimination for COPD affection status. Our results further inform the discussion of how genetic variants influence COPD susceptibility.

The discovery that variants in LD with SERPINA1 are associated with severe COPD demonstrates that genome-wide association studies can identify known disease mechanisms. The rs112458284 variant is also in strong LD with rs45505795 near SERPINA10 (r2 = 0.96 and D' = 1.0 in 1000 Genomes EUR Phase I v3 data), which we recently described in a GWAS of quantitative measures of emphysema.45 The 5q23.2 locus containing rs6860095 is a novel locus for severe COPD risk that lies between PRDM6 and PPIC. Peptidylprolyl Isomerase C (PPIC, also known as Cyclophilin C) has functions related to mitochondrial metabolism, inflammation, and immune response through its interactions with cyclosporine A. While Cyclophilin A has been associated with both COPD48 and lung cancer,49 to our knowledge no prior study has linked PPIC with risk of COPD. The PRDI-BF1 and RIZ Homology Domain Containing 6 (PRDM6) protein is involved in chromatin remodeling and transcriptional control of smooth muscle gene expression.50 Expression of PRDM6 has been implicated in the pseudoglandular and canalicular stages of lung morphogenesis in murine models, and expression has been documented in smooth muscle of the developing murine trachea, bronchi, and pulmonary trunk.50 Additional studies are needed to confirm this 5q23.2 association in severe COPD.