The genetics revolution in rheumatology: large scale genomic arrays and genetic mapping

Stephen Eyre1, Gisela Orozco1, Jane Worthington1,2

1 Arthritis Research UK Centre for Genetics and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, UK.

2 NIHR Manchester Musculoskeletal BRU, Manchester Academic Health Sciences Centre, Central Manchester Foundation Trust, Grafton Street, Manchester M13 9NT, UK.

Correspondence to J.W.

ABSTRACT

Susceptibility to rheumatic diseases, such as osteoarthritis, rheumatoid arthritis, ankylosing spondylitis, systemic lupus erythematosus, juvenile idiopathic arthritis and psoriatic arthritis, has a large genetic component. Understanding how an individual's genetic background influences disease onset and outcome can lead to a better understanding of disease biology, improved diagnosis and treatment and, ultimately, disease prevention or cure. The past decade has seen great progress in the identification of genetic variants that influence the risk of rheumatic diseases, although the challenging task of unravelling the function of these variants is ongoing. In this Review, the major insights from genetic studies of rheumatic disease, gained from advances in technology, bioinformatics and study design, are discussed. In addition, the pivotal genetic studies of the main rheumatic diseases are highlighted, with insights into how these studies have changed the way in which we view these conditions in terms of disease overlap, pathways to disease and potential new therapeutic targets. Finally, the limitations of genetic studies, gaps in our knowledge and ways in which current genetic knowledge can be fully translated into clinical benefit are examined.

Introduction

The major rheumatic diseases, including osteoarthritis (OA), rheumatoid arthritis (RA), ankylosing spondylitis (AS), systemic lupus erythematosus (SLE), juvenile idiopathic arthritis (JIA) and psoriatic arthritis (PsA), have multiple genetic and environmental risk factors that contribute to disease onset. Susceptibility to all of these diseases is attributable largely to the genomic background of patients1-5, with heritability estimates ranging from 50% in OA5 to 90% in PsA3. This genomic susceptibility outweighs the combined risk arising from environmental factors, such as smoking, diet, infection and occupation1-5.

Genetic susceptibility is therefore a fundamental aspect of rheumatic diseases, and a full understanding of how genetic variants influence disease processes has the potential to improve disease management. No single genetic risk factor is essential for disease development; rather, combinations of multiple variants influence not only susceptibility to disease, but also disease course or outcome and response to therapy. Indeed, heterogeneity in both disease presentation and outcome presents a key challenge to rheumatologists. Genetics provides an opportunity to address this heterogeneity by classifying patients into more homogeneous subtypes based on the genetic pathways that drive their disease. Such stratification could enable clinicians to target therapy to those patients most likely to respond. Effectively treating disease early is key to improving long-term outcomes for patients, and the eventual development of predictive algorithms might enable the identification of disease-susceptible individuals and perhaps ultimately lead to disease prevention. Finally, genetics can inform therapeutic management. Determining which genes and pathways are implicated in a disease provides the opportunity to reposition therapies that successfully target the same pathways in other diseases, or to revisit therapies that previously failed in clinical trials of unselected patients but might benefit a subset of patients. Studies from the past 2 years6 indicate that novel therapies targeting pathways whose role in disease is supported by genetic evidence are twice as likely to progress through a pharmaceutical pipeline and deliver a new drug as those without any genetic evidence, with obvious financial implications.

Arguably, the modern era of genetic research in rheumatic diseases began in 2007 with the Wellcome Trust Case Control Consortium (WTCCC) study7 (Table 1 and Figure 1). The WTCCC recognized that common variants with small effect sizes that increase the risk of common diseases would only be detected by using large sample sizes, collected through collaborative efforts. The WTCCC study now seems relatively low-powered in comparison with some of the huge genome-wide association study (GWAS) meta-analyses performed subsequently, which utilize tens to hundreds of thousands of samples but, at the time, it introduced a paradigm shift in the genetic research of complex diseases. The study compared 14,000 patients, each of whom had one of seven diseases (rheumatic diseases being represented by RA), with 3,000 shared controls. The use of high-density single nucleotide polymorphism (SNP) arrays enabled genome-wide association testing for each disease. The WTCCC study was a step forward in genomic research because previous attempts to link genetic changes to disease development had relied either on low-powered family linkage studies or on 'candidate gene' studies that assessed disease associations one gene at a time. The success of this study depended not only on its scale, but also on concurrent advances in genetic mapping and technology.

A SNP is generally accepted to be robustly associated with disease in a GWAS if the difference in variant (allele) frequency between cases and controls surpasses a 'genome-wide significance' threshold (typically a P-value of <5×10⁻⁸). This stringent threshold accounts for the very large number of comparisons made across the genome; consequently, many genuine associations, particularly in the early GWAS in which sample sizes were still relatively small, might not reach genome-wide significance. Therefore, any putative association requires further validation in independent cohorts of cases and controls. Since the pioneering WTCCC study, many more GWAS have been performed, such that the GWAS Catalog documented over 24,000 unique SNP-trait associations (as of August 2016) (http://www.ebi.ac.uk/gwas/home). The story for rheumatic diseases has been one of increasing GWAS sample sizes, leading to the discovery of an increasing number of disease-associated variants and providing further insights into disease mechanisms.
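To make the significance threshold concrete, the Python sketch below tests a single SNP by comparing allele counts in cases and controls with a standard allelic chi-square test, then checks the P-value against a threshold derived as a Bonferroni correction of α = 0.05 for roughly one million independent tests (a commonly cited justification for 5×10⁻⁸). The function name and all allele counts are hypothetical illustrations, not data from any study cited here; real GWAS pipelines usually fit regression models with covariates instead.

```python
import math

# Conventional genome-wide threshold, assumed here to be a Bonferroni
# correction of alpha = 0.05 for ~1 million independent common variants.
ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS  # ~5e-8


def allelic_chi2_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Hypothetical helper: allelic 2x2 chi-square test (1 df, no continuity correction).

    Arguments are allele counts (two per genotyped individual).
    Returns (chi-square statistic, P-value).
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical study: 2,000 cases (4,000 alleles) and 3,000 controls (6,000 alleles).
# Risk allele frequency 32% in cases vs 25% in controls:
_, p_strong = allelic_chi2_test(1280, 2720, 1500, 4500)
# Risk allele frequency 26% in cases vs 25% in controls:
_, p_weak = allelic_chi2_test(1040, 2960, 1500, 4500)

print(p_strong < GENOME_WIDE_THRESHOLD)  # True
print(p_weak < GENOME_WIDE_THRESHOLD)    # False
```

The small frequency difference in the second example is not necessarily absent from the population; it may simply need far larger samples to reach the threshold, which is the motivation for the meta-analyses discussed below.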

This Review will discuss the major insights gained from genetic studies of rheumatic diseases, including genome-wide association studies, meta-analyses, fine mapping and studies performed across different diseases and/or ethnicities. The most relevant genetic studies of the main rheumatic diseases will be highlighted, together with how they have enhanced our knowledge of these diseases in terms of disease overlap, pathways to disease and potential new therapeutic targets. Finally, the first efforts to understand how genetic variants influence disease biology will be discussed.

[H1] GWAS in rheumatic diseases

The year 2007 saw the first real advances in identifying novel genetic associations with RA susceptibility (Figure 1). The WTCCC study found two genetic regions associated with RA that reached genome-wide significance8: HLA-DRB1 and PTPN22. GWAS conducted in the USA added TNFAIP3 (ref. 9) and the TRAF1-C5 locus10 to this list. Finally, a candidate-gene study confirmed an association of STAT4 with RA susceptibility11, bringing the number of loci implicated in RA risk up to five. Further studies in RA, using large international cohorts, validated markers that either had evidence suggestive of an association with RA (that is, reaching a suggestive significance threshold of P <1×10⁻⁶) or had been prioritized using a statistical text-mining method (GRAIL), bringing the number of confirmed variants associated with RA up to 31 by 2010 (refs 12-15). These variants included those mapped to regions in CTLA4, IL2RA, CCL21 and AFF3. These early GWAS findings in RA confirmed the role of T cell immunity in disease development, something long suspected but now firmly established.

Similar GWAS have investigated the other major rheumatic diseases. For example, one of the first large-scale genetic studies in AS, which again involved the WTCCC, focused on 14,500 non-synonymous SNPs (that is, variants that result in a change of protein sequence) in 1,000 patients with AS and 1,500 healthy individuals16. This study confirmed the well-established link between HLA-B27 and risk of AS, while also implicating both ERAP1 and IL23R in disease pathogenesis. A 2011 GWAS in AS17, which used a discovery cohort of over 1,700 patients and a validation cohort of ~2,000 patients with AS, confirmed the disease association with these three regions and also validated two others, on chromosomes 2 and 21, that had been identified by a previous GWAS18. This study also found a further seven regions to be associated with AS, including variants within the RUNX3, IL12B, TNFRSF1A and CARD9 loci.

A well-powered GWAS that included ~7,500 patients with OA in the initial screen and similar numbers in a validation cohort provided further insights into the genetics driving OA19. This pivotal study increased the number of loci associated with OA from 3 to 11. Epidemiological studies had previously demonstrated heterogeneity in the OA phenotype20, and the findings from this GWAS emphasized this heterogeneity, in that several associations were revealed only when the data were stratified into more homogeneous subtypes, for example by site of OA (hip or knee), sex, or both. The study also found an association of OA with FTO, a gene previously shown to be important in obesity21, reinforcing the link between body weight and OA.

[H1] GWAS meta-analyses

The widespread use of GWAS around the globe has enabled large-scale meta-analyses of many thousands of samples, collected through international collaborations. This increased sample size vastly improves the statistical power to detect disease-associated variants and reduces the need for individual groups to perform their own validation studies. However, meta-analyses of global populations might still require careful interpretation. For example, allele frequencies can differ between populations, so combining or comparing samples of different ancestry can produce spurious or inexact associations (an effect known as population stratification). This issue can be minimized through robust study design and sophisticated analytical techniques, such as principal components analysis.
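One standard way to pool per-cohort results is an inverse-variance weighted fixed-effect meta-analysis. The Python sketch below illustrates why combining cohorts increases power: each cohort's estimate is weighted by the inverse of its variance, so the pooled standard error shrinks as cohorts are added. This is an illustrative assumption rather than the method of any specific study cited here; the cohorts and effect sizes are hypothetical, and real meta-analyses must also align alleles across studies, test for heterogeneity and correct for population structure.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyResult:
    """Summary statistics for one SNP in one cohort (hypothetical)."""
    beta: float  # per-allele log odds ratio
    se: float    # standard error of beta


def two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_effect_meta(studies):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Assumes every cohort estimates the same underlying effect for the same
    risk allele. Returns (pooled beta, pooled SE, two-sided P).
    """
    weights = [1.0 / s.se ** 2 for s in studies]
    total_weight = sum(weights)
    pooled_beta = sum(w * s.beta for w, s in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, two_sided_p(pooled_beta / pooled_se)


# Three hypothetical cohorts, none genome-wide significant on its own
cohorts = [StudyResult(0.10, 0.03), StudyResult(0.10, 0.03), StudyResult(0.10, 0.03)]
for c in cohorts:
    print(f"single cohort P = {two_sided_p(c.beta / c.se):.1e}")
beta, se, p = fixed_effect_meta(cohorts)
print(f"pooled beta = {beta:.2f}, SE = {se:.4f}, P = {p:.1e}")
```

In this example each cohort alone gives a P-value of roughly 10⁻³, whereas the pooled estimate falls below 5×10⁻⁸, because the pooled standard error is the single-cohort value divided by the square root of the number of equally sized cohorts.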

Meta-analyses have proved highly useful in rheumatology. For example, a meta-analysis of hip OA GWAS involving over 78,000 European participants22 found a novel association with a variant that was later shown to correlate with the expression of NCOA3 in cartilage23. NCOA3 is linked to the expression of COL2A1, RUNX2 and MMP13 in chondrocytes, all of which are important mediators of OA. A study that combined a GWAS, a meta-analysis and a validation study identified ten new SLE-associated loci, bringing the number of regions robustly associated with SLE in Europeans up to 40 (ref. 24). This study, which involved over 7,000 participants of European descent, was the largest genetic study of SLE in a European population at that time. The researchers also identified ten missense coding variants that probably influence the risk of SLE, largely in genes encoding kinases and other enzymes. In addition, the location of several disease-associated loci implicated a number of transcription factors in SLE, indicating that regulation of gene expression is likely to have an important role in disease development. A meta-analysis of three GWAS data sets from Chinese and European populations, totalling over 15,600 samples, revealed a further ten loci associated with SLE and found that these risk loci generally overlapped between populations. In this study, the prevalence of SLE in different populations (that is, European, Amerindian, South Asian, East Asian and African populations) mirrored the genetic risk score conferred by the risk alleles25.
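A genetic risk score is commonly computed as a weighted sum of the risk alleles an individual carries, each weighted by the logarithm of its per-allele odds ratio; averaging over a population with known allele frequencies gives an expected score that can be compared between populations. The Python sketch below is a minimal illustration under those assumptions. The SNP identifiers, odds ratios and frequencies are hypothetical, and the cited SLE study may have constructed its score differently.

```python
import math


def individual_grs(risk_allele_counts, odds_ratios):
    """Weighted genetic risk score for one person.

    risk_allele_counts: SNP id -> number of risk alleles carried (0, 1 or 2)
    odds_ratios: SNP id -> per-allele odds ratio for the risk allele
    Each risk allele adds log(OR), so the score is additive on the log-odds scale.
    """
    return sum(n * math.log(odds_ratios[snp]) for snp, n in risk_allele_counts.items())


def mean_population_grs(risk_allele_freqs, odds_ratios):
    """Expected score in a population: each SNP contributes 2 * freq * log(OR),
    because a diploid individual carries on average 2 * freq risk alleles."""
    return sum(2 * f * math.log(odds_ratios[snp]) for snp, f in risk_allele_freqs.items())


# Hypothetical SNPs, odds ratios and risk allele frequencies (not real SLE data)
ors = {"snp_a": 1.5, "snp_b": 1.2, "snp_c": 1.3}
person = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
pop_1 = {"snp_a": 0.30, "snp_b": 0.50, "snp_c": 0.20}
pop_2 = {"snp_a": 0.10, "snp_b": 0.40, "snp_c": 0.15}

print(round(individual_grs(person, ors), 3))  # 1.073
print(mean_population_grs(pop_1, ors) > mean_population_grs(pop_2, ors))  # True
```

Under this additive model, a population in which risk alleles are more frequent has a higher expected score, which is the kind of pattern that could be compared against disease prevalence.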

A large international GWAS meta-analysis of RA, involving a consortium from across Europe, the USA and Japan, brought the number of identified RA risk regions up to 101 (ref. 26). This multi-ethnic study revealed extensive sharing of genetic risk loci between Asian and European populations, with only five loci showing a population-specific association. The findings also confirmed a strong link between RA and T cell immunity. Further analysis across 34 tissues demonstrated an enrichment of RA risk loci in open and active regions of the genome (as defined by epigenetic histone marks) in primary CD4+ T cells27. By assigning putative causal genes to the risk regions, the meta-analysis also demonstrated an enrichment of disease-associated loci in signalling pathway genes and in genes encoding approved RA drug targets. By contrast, a study of OA revealed differences in the disease-associated variants found in European and Asian cohorts, with only one region, within the GDF5 gene, having an unambiguous shared genetic association between these two populations28. The protein encoded by GDF5, growth/differentiation factor 5 (GDF5), is a member of the transforming growth factor-β superfamily and is important in chondrogenesis and bone growth. Thus, large-scale international meta-analysis of GWAS has the capacity to determine ethnic differences in disease-associated variants and to illuminate these differences in terms of the variants, genes and pathways involved and the gene–environment interactions that might differ between individuals of different geographical regions and ethnicities.

The high degree of overlap in genetic susceptibility observed across different autoimmune and inflammatory diseases has led researchers to combine GWAS cohorts for meta-analysis across diseases as a means to increase the number of susceptibility loci identified. A combined analysis of AS and four other clinically related diseases, including psoriasis, Crohn's disease and ulcerative colitis, included over 50,000 patients and 30,000 controls; this increased the power to detect genetic associations shared across these diseases and led to the identification of 17 new loci for AS, bringing the total number to 48 (ref. 29). Pathway analysis, followed by a search for genes that encode known drug targets, highlighted potential novel therapeutic targets in AS, such as CCR2 and CCR5. Hence, the CCR2 and CCR5 antagonists MLN-1202 and AMD-070, respectively, are candidate novel therapeutics for AS.

[H1] Fine-mapping GWAS data

Associations identified following GWAS, validation and meta-analysis typically implicate multiple genetic variants in disease susceptibility, because neighbouring variants are often inherited together (that is, they are in linkage disequilibrium); further genetic studies are therefore required to 'fine map' the genetic associations and better define the causal variant. Fine mapping involves genotyping large cohorts of cases and controls for all known variants in a disease-associated region of DNA to determine the most likely causal variant in that region. As a preceding step, the DNA region can be re-sequenced in a cohort of disease-specific cases or controls to ensure that all variants are included when fine mapping. This step is becoming less likely to identify novel variants as large-scale sequencing projects, such as the UK10K Project and the 1000 Genomes Project, continue to add to the catalogue of known variants in multiple populations, although it could still prove fruitful if the causal variant is as yet undiscovered. For example, in a study of SLE, re-sequencing identified a single base pair insertion in a region of DNA close to TNFAIP3, which is probably the causal variant for this region's association with SLE30.
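Statistical fine mapping is often summarized as a credible set: the smallest group of variants that together account for, for example, 95% of the posterior probability of being causal. The Python sketch below uses Wakefield's approximate Bayes factor under the simplifying assumptions of a single causal variant per region and an equal prior probability for every variant. The prior effect variance (0.04), the variant identifiers and the summary statistics are hypothetical, and this is one illustrative method rather than the approach used in the studies cited here.

```python
import math


def log_abf(beta, se, prior_var=0.04):
    """Log of Wakefield's approximate Bayes factor in favour of association.

    prior_var (W) is the assumed prior variance of the true log odds ratio;
    0.04 (prior SD of 0.2) is an illustrative choice.
    """
    v = se ** 2  # sampling variance of beta
    shrink = prior_var / (v + prior_var)
    z_sq = (beta / se) ** 2
    return 0.5 * math.log(1 - shrink) + 0.5 * z_sq * shrink


def credible_set(variants, coverage=0.95, prior_var=0.04):
    """Posterior probabilities and a credible set for one associated region.

    variants: variant id -> (beta, se). Assumes exactly one causal variant
    in the region and equal prior probability for each variant.
    """
    log_bfs = {vid: log_abf(b, s, prior_var) for vid, (b, s) in variants.items()}
    top = max(log_bfs.values())
    # Subtract the maximum before exponentiating to avoid overflow.
    unnorm = {vid: math.exp(lbf - top) for vid, lbf in log_bfs.items()}
    total = sum(unnorm.values())
    posteriors = {vid: u / total for vid, u in unnorm.items()}

    # Add variants in order of decreasing posterior until coverage is reached.
    chosen, cumulative = [], 0.0
    for vid, pp in sorted(posteriors.items(), key=lambda item: item[1], reverse=True):
        chosen.append(vid)
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen, posteriors


# Hypothetical summary statistics for four variants in one region
region = {
    "var1": (0.200, 0.03),
    "var2": (0.195, 0.03),
    "var3": (0.050, 0.03),
    "var4": (0.010, 0.03),
}
chosen, posteriors = credible_set(region)
print(chosen)
print({vid: round(pp, 3) for vid, pp in posteriors.items()})
```

In this example two strongly associated variants with similar evidence (for instance, two variants in high linkage disequilibrium) both enter the 95% credible set, illustrating why statistical fine mapping alone often cannot pinpoint a single causal variant without further functional work.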