Keanu R. Sida

Dr. Bert Ely III

BIOL 303 H01

1 November 2014

The Multifaceted and Evolutionally Dynamic Process of LINE-1 Regulation

Retrotransposons are an abundant component of the collective eukaryotic genome. These mobile genetic elements, including Long Interspersed Nuclear Element-1 (LINE-1 or simply L1), comprise at least two-thirds of the human genome (de Koning et al. 2011; Castro-Diaz et al. 2014). LINE-1 elements are the most common group of retrotransposons, amounting in humans to 500,000 copies or roughly 20% of the entire genome (Castro-Diaz et al. 2014). A full-length mammalian L1 is roughly 6 kb in length. It can be broken into three distinct parts: a 5’ UTR; two open reading frames (ORFs), ORF1 and ORF2, which code for ORF1p (a protein with RNA-binding capacity) and ORF2p (a protein with the endonuclease and reverse transcriptase functions necessary for retrotransposition), respectively; and a 3’ UTR with a poly(A) tail (Dombroski et al. 1991). These elements are the only autonomous transposons still active in humans (Castro-Diaz et al. 2014). However, because the host places a high priority on suppressing these elements, and because many are 5’ truncated, 99.9% of them are no longer capable of mobilization (Doucet et al. 2010). Multiple studies have sought to determine the mechanisms by which this strict regulation takes place.

Teneng et al. (2007) grouped the mechanisms that regulate LINE-1 gene expression into five categories: DNA hypermethylation of LINE-1 promoters (Woodcock et al. 1997), bidirectional processing of LINE-1 transcripts into siRNAs that in turn suppress the element (Yang & Kazazian 2006), transcriptional elongation deficiencies caused by premature polyadenylation sites (Perepelitsa-Belancio & Deininger 2003; Han & Boeke 2004), truncation of the 5’ UTR (Myers et al. 2002), and the recruitment of repressors that interact with promoter or ribosomal entry sites in LINE-1s (Li et al. 2006). Faulkner (2013) refined this categorization in his perspective on a work by Ciaudo et al. (2013). He broadly categorized retrotransposition defense mechanisms as promoter methylation and heterochromatinization, degradation of retrotransposon transcripts through RNAi, and host factor prevention or destabilization of reverse transcription. The studies reviewed in this paper all sought, in part, to refine these mechanisms and characterize them more specifically.

In the first study, briefly mentioned above, Teneng et al. (2007) sought to evaluate the contextual specificity of LINE-1 activation by cellular stress, as well as the role of the aryl hydrocarbon receptor (AHR) transcription factor in the gene activation response. They first evaluated AHR expression levels in human cervical carcinoma cells (HeLa), human microvascular endothelial cells (HMEC), mouse vascular smooth muscle cells (mVSMCs), and mouse embryonic kidney cells (mK4). These cells were chosen to represent normal (HMEC, mVSMC, and mK4) and transformed (HeLa) phenotypes. Western blotting (Figure 1a) showed that all four cell types express relatively high levels of AHR. Through semi-quantitative reverse transcription PCR (RT-PCR), they discovered that AHR levels correlated with LINE-1 mRNA levels, which were highest in mVSMCs and lowest in HMECs (Figure 1b).

Figure 1

Constitutive expression of AHR and L1 mRNA levels in mammalian cells. Total protein was extracted from HeLa, HMEC, VSMC and mK4 cells under constitutive conditions or following treatment with vehicle (DMSO) or 3 µM BaP for 24 h. (a) About 20 µg of total protein from each sample was resolved on a denaturing SDS-PAGE and immunoblotted for AHR or GAPDH as loading controls. (b) Total RNA was extracted from replicating mammalian cells. RNA was quantified and DNase treated followed by cDNA synthesis (from 0.2 µg starting RNA). After 30 cycles of amplification, PCR products were resolved on a 1% agarose gel.

Teneng et al. (2007) then hypothesized, given the differences in AHR and LINE-1 expression among the cell types studied, that the regulation of LINE-1 by cellular stressors is cell type and species specific. They tested this hypothesis by examining L1 mRNA levels in all four cell types treated with either benzo(a)pyrene (BaP) or dimethyl sulfoxide (DMSO) for 24 hours. DMSO, referred to as the “vehicle”, served as the control. BaP was used because it binds to and activates AHR and, as a result, induces oxidative stress (damage that arises when free radicals are not completely neutralized by antioxidants) in mammalian cells. LINE-1 expression was then measured in the treated cells (Figure 2a). To test whether AHR-mediated induction of LINE-1 was specific to BaP, another molecule that binds to AHR, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), was studied. Only the HeLa cells were substantially responsive to TCDD (Figure 2b). A third AHR ligand, indole-3-pyruvate, did not induce L1 in any cell line tested, which pointed to differences in the response to different AHR ligands. Thus, further experimentation was carried out in cells treated only with BaP.

Figure 2

L1 mRNA levels are up-regulated by BaP in human and murine cells and by TCDD in HeLa cells. Total RNA was isolated and 200 ng subjected to cDNA synthesis. (a) Cells were treated with 3 µM BaP for 24 h, RNA extracted and cDNA synthesized from 200 ng RNA. Samples were analyzed via quantitative real time PCR using human or mouse L1-specific primers. (b) Cells were treated with 10 nM TCDD or toluene (Tol; vehicle) for 24 h and total RNA extracted, cDNA synthesized using 200 ng total RNA, and real time PCR analyses done using L1-specific primers. Normalizations for real time PCR experiments were carried out using β-actin and GAPDH.
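The normalization mentioned in this caption can be illustrated with the comparative Ct (2^-ΔΔCt) calculation, a common approach to relative quantification in real time PCR. The source does not state which calculation Teneng et al. (2007) applied, so the method, the function names, and every Ct value below are assumptions made purely for illustration.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of the reference genes (e.g. beta-actin, GAPDH)."""
    return target_ct - mean(reference_cts)


def fold_change(treated_target, treated_refs, vehicle_target, vehicle_refs):
    """Expression in treated cells relative to vehicle, by the 2^-ddCt method."""
    ddct = delta_ct(treated_target, treated_refs) - delta_ct(vehicle_target, vehicle_refs)
    return 2 ** (-ddct)


# Hypothetical Ct values, not data from the study.
# Treated: L1 Ct 22.0; vehicle: L1 Ct 24.0; reference genes unchanged.
bap_vs_dmso = fold_change(22.0, [18.0, 19.0], 24.0, [18.0, 19.0])
print(bap_vs_dmso)  # 4.0, i.e. a two-cycle shift corresponds to a four-fold increase
```

Averaging the two reference genes before subtraction is one of several reasonable choices; a geometric mean of reference quantities is another.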

Next, Teneng et al. (2007) asked whether UV irradiation, which also induces oxidative stress, would induce L1 as well. They treated all four cell types with 20 J/m^2 of UV, then extracted RNA from each, synthesized cDNA, and ran real time PCR analyses with L1-specific primers. As with TCDD, this dose induced L1 expression only in HeLa cells (Figure 3a). To further examine the contextual specificity of the LINE-1 induction response, the team cloned 5’ tandem repeats of the mouse LINE-1 promoter L1Md-A5 into a luciferase reporter vector, luciferase being the class of enzyme that produces bioluminescence (most notably in fireflies). About 10,000 cells per square centimeter of each cell type were transfected with the vector for 24 hours and then treated with BaP for an additional 24 hours. Luciferase activity was measured (Figure 4a), and the team found that BaP treatment induced promoter activity in each of the cell types. These results indicated that similar regulatory factors act on the LINE-1 promoter in each cell type, since a uniform treatment produced a fairly uniform up-regulation of activity. When the team used TCDD instead of BaP as the treatment, no cell type showed a difference in promoter activation. They also used UV irradiation, and its effects on luciferase activity, and thus on LINE-1 promoter induction, were comparable to those of BaP (Figure 4b). This suggested that L1 activation does involve DNA damage and oxidative stress, but remains contextually specific.

Figure 3

Induction of endogenous L1 by UV irradiation in HeLa cells. (a) Cells were harvested 24 h after UV exposure, RNA extracted, cDNA synthesized and real time PCR analyses performed. The DNA repair gene gadd45 was used as control for UV-induced DNA damage (results not shown). (b) The effects of NAC pre-treatment on UV challenge in HeLa cells is shown. NAC pre-treatment inhibits UV-induced L1 activation (n=3).

Figure 4

Inducibility of L1Md-A5 promoter in mammalian cells after BaP treatment and UV irradiation. Transient transfection assays showing the promoter activity of L1Md-A5 tandem repeats in HeLa, HMEC, VSMC and mK4 cells, respectively. (a) Triplicate cultures of 1×10^4 cells/cm^2 were transfected at 70% confluence in 24-well plates with the L1Md-A5 promoter linked to a luciferase reporter as described (Lu & Ramos 2003). A day after transfection, the cells were treated with 3 µM BaP or DMSO for an additional 24 h, lysed and analyzed for luciferase activity. Luciferase activity was normalized against Renilla luciferase co-transfected with the L1Md-A5-luciferase construct and relative units plotted. Notice the differences in basal activity between the different cell types. (n=3). (b) Triplicates of 1×10^4 HeLa cells/cm^2 were transfected at 70% confluence in 24-well plates with the L1Md-A5-luciferase reporter construct and challenged 24 h later with 10 and 20 J/m^2 of ultraviolet irradiation. Media was rinsed off after 24 h and a luciferase assay performed. About 3 µM BaP for 24 h was used as a positive control (n=3).
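The Renilla normalization described in this caption amounts to dividing each reporter reading by the reading from the co-transfected control and then comparing treated with vehicle cultures. The following is a minimal sketch of that arithmetic; the function names and the triplicate readings are hypothetical and are not taken from Teneng et al. (2007).

```python
from statistics import mean


def relative_luciferase(reporter, renilla):
    """Divide each reporter reading by its matched Renilla control reading."""
    if len(reporter) != len(renilla):
        raise ValueError("each reporter reading needs a matched Renilla reading")
    return [r / c for r, c in zip(reporter, renilla)]


def fold_induction(treated, vehicle):
    """Mean normalized activity in treated wells over the vehicle mean."""
    return mean(treated) / mean(vehicle)


# Hypothetical triplicate readings (arbitrary light units).
dmso = relative_luciferase([1000.0, 1200.0, 800.0], [500.0, 600.0, 400.0])
bap = relative_luciferase([3000.0, 3600.0, 2400.0], [500.0, 600.0, 400.0])
print(dmso, bap, fold_induction(bap, dmso))
```

Normalizing well by well before averaging corrects for differences in transfection efficiency between wells, which is the usual reason a co-transfected control is included.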

In the second study, Castro-Diaz et al. (2014) demonstrated that, in human embryonic stem (hES) cells, KAP1 [KRAB (Krüppel-associated box domain)–associated protein 1] represses a certain subset of LINE-1 lineages. They found a similar pattern in mice. They went on to identify a LINE-1-binding KRAB-zinc finger protein (KRAB-ZFP), suggesting that this family of proteins is responsible for recognizing LINE-1 elements. They also found that KAP1 knockdown in hES cells induced the expression of KAP1-bound LINE-1s. Furthermore, they found that the younger, human-specific LINE-1 elements (L1Hs) were unaffected by KAP1 knockdown but were instead stimulated by the depletion of DNA methyltransferases (DNMTs).

The group began by performing chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) on H1 hES cells. They found that only about 8% of the known LINE-1-derived sequences were associated with KAP1 peaks in these cells. They rationalized this result with the fact that most LINE-1 sequences are 5’ truncated, and thus would not require transcriptional control. They therefore focused on sequences at least 5 kb in length, close to the 6 kb length of a non-truncated LINE-1. As expected, 52% of these LINE-1 sequences were associated with a KAP1 peak, most commonly over the first 1000 bp, as opposed to only 2% of the elements under 5 kb (Figure 1A, B). After more detailed mapping of the tags obtained through ChIP-seq, they found that most of the KAP1 peaks were in the middle of the 5’ UTR, between base pairs +300 and +600, as seen in Figure 1C. They then ran ChIP-seq once more with antibodies to H3K9me3, a histone modification characteristic of constitutively repressed chromatin (constitutive meaning that the repression is continuous, as opposed to facultative repression, in which something is repressed only “as needed”). Analyses of these data confirmed that KAP1 peaks coincided strongly with the presence of H3K9me3, as seen in Figure 1D. Furthermore, the two marks of repression were usually found together (Figure 1E). The same analysis was then done on mouse embryonic stem (mES) cells, with one notable difference: KAP1 appeared able to function without the presence of H3K9me3.
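The comparison between elements of at least 5 kb and shorter elements reduces to counting, within each length class, how many annotated elements overlap at least one ChIP-seq peak. The sketch below shows that counting on toy intervals; the coordinates, the function names, and the simple pairwise overlap test are assumptions for illustration and do not reproduce the pipeline used by Castro-Diaz et al. (2014).

```python
def overlaps(element, peak):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return element[0] == peak[0] and element[1] < peak[2] and peak[1] < element[2]


def bound_fraction(elements, peaks):
    """Fraction of elements overlapping at least one peak."""
    if not elements:
        return 0.0
    bound = sum(any(overlaps(e, p) for p in peaks) for e in elements)
    return bound / len(elements)


def split_by_length(elements, cutoff=5000):
    """Separate elements of at least `cutoff` bp from shorter ones."""
    long_elements = [e for e in elements if e[2] - e[1] >= cutoff]
    short_elements = [e for e in elements if e[2] - e[1] < cutoff]
    return long_elements, short_elements


# Toy annotation and peak calls (hypothetical coordinates).
elements = [("chr1", 0, 6000), ("chr1", 10000, 16000),
            ("chr1", 20000, 21000), ("chr2", 0, 800)]
peaks = [("chr1", 300, 600), ("chr2", 5000, 5300)]

long_elements, short_elements = split_by_length(elements)
print(bound_fraction(long_elements, peaks))   # 0.5
print(bound_fraction(short_elements, peaks))  # 0.0
```

The pairwise scan is quadratic in the number of intervals, which is acceptable for a toy example; genome-scale data would normally be handled with sorted intervals or a dedicated interval tool.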

Figure 1.

KAP1 coincides with H3K9me3 at the 5′ end of full-length L1 in hES cells. Distribution of ChIP-seq KAP1 peaks relative to the 5′ end of full-length elements (A) or the center of truncated L1 elements (B) in hES and HEK293 cells. The profiles were normalized to the total number of ChIP-seq peaks for each cell line. (C) KAP1 ChIP-seq peak distribution over the first kilobase of L1. The L1 5′ UTR is schematized below, with sense and antisense promoters as red and green boxes, respectively. Sense promoter is diversely depicted as mainly located in the first 100 bp or extending up to 700 bp. (D) Overlap of KAP1 and H3K9me3 ChIP-seq tags relative to the 5′ end of full-length L1 elements. (E) Relative frequency of KAP1+H3K9me3, KAP1-only, and H3K9me3-only peaks at this location.

Next, Castro-Diaz et al. (2014) studied the effects of KAP1 binding on LINE-1 sequences. To do this, they cloned the KAP1-bound (KB) regions (defined by the ChIP-seq described above) from two distinct LINE-1 elements, L1PA4 and L1PA5, upstream of a GFP gene cassette (Figure 2A, B). GFP fluorescence was then measured in hES and 293T cells by Fluorescence-Activated Cell Sorting (FACS) analysis (Figure 2C). Expression from vectors carrying KB LINE-1 sequences was repressed in hES cells but not in 293T cells. In contrast, vectors carrying non-KAP1-bound (NKB) LINE-1 sequences and empty vectors showed no change in GFP fluorescence. They also found that KAP1 binding caused methylation of CpG sites in the nearby PGK promoter in hES cells (Figure 2E). Next, they aimed to better define the KAP1-recruiting elements by dividing the L1PA4 and L1PA5 sequences (roughly 1000 bp) into subfragments of roughly 200 bp (Figure 2B, F, G). This revealed the locations of the cis-repressors (stretches of DNA where transcription factors can bind to control nearby gene expression). The D subfragment of L1PA4 showed faster and stronger repression than the full-length L1PA4 fragment, suggesting conflicting influences within the fragment as a whole. L1PA5, in contrast, had two subfragments that showed distinct peaks in repression, but neither one equaled the repression achieved by the full sequence. These data suggest that the KAP1 corepressor triggers epigenetic silencing through 5’ tethering to distinct subfamilies of LINE-1s in hES cells.

Figure 2.

KAP1-binding L1 fragments can induce repression and DNA methylation of a heterologous promoter in hES. (A) KB (KB L1PA4 and KB L1PA5) and NKB (NKB L1PA4) L1 sequences were cloned in depicted lentiviral vector upstream of a PGK-EGFP expression cassette. The resulting vectors were transduced in hES, and EGFP expression was monitored over time by FACS. (B) Schematic representation of the KAP1 ChIP peaks mapped on the L1PA4 and L1PA5 5′ end, with indication of derived fragments and subfragments cloned in the vector depicted in A. (C) Monitoring of GFP expression in hES cells transduced with the indicated vectors. (No seq) Lentiviral vector with no ERE-derived fragment upstream of the expression cassette. The figure shows the mean and SD of two biological replicates. (D) KAP1 and H3K9me3 recruitment to indicated lentiviral vectors in hES, assessed 35 d after transduction by ChIP-qPCR using PGK-specific primers. The figure illustrates the mean and SD of technical replicates. This experiment was performed twice with similar results (see Supplemental Fig. S3). Relative enrichment was determined by normalizing to a known positive (ZNF180 3′ UTR) control. (E) Influence of the L1 cis-acting sequences on the methylation of the nearby PGK promoter. Methylation of eight CpG positions was evaluated by pyrosequencing at days 4 and 35 after transduction of hES cells with the PGK-GFP lentiviral vectors. Mean and standard error of the mean (SEM) of two biological replicates is shown. Statistical differences were determined by one-way ANOVA test using the Bonferroni multiple test adjustment. (***) P ≤ 0.001. (F, G) Fold repression of the indicated vectors containing L1 subfragments described in B, assessed 37 d after transduction (respect to day 5). Overtime fold repression is presented in Supplemental Figure S3. Colored triangles indicate the presence of L1 sequences overlapping with the summits of the respective KAP1 ChIP-seq peaks as depicted in B.

Finally, Castro-Diaz et al. (2014) focused on LINE-1’s unusual patterns of evolution, as shown by Smit et al. (2005), Khan (2005), and Sookdeo et al. (2013). Because KAP1 associated with only a small fraction of the L1MA4 to L1PA7 subfamilies, they deduced that this mechanism was not in use more than 26.8 million years ago. KAP1 binding was also nearly absent from L1Hs, the elements that entered the genome after the human-chimpanzee divergence roughly 7.6 million years ago. In contrast, L1PA6 through L1PA3 elements showed a high fraction of KAP1 recruitment, peaking at over 80% for the L1PA5 through L1PA3 subfamilies (Figure 3A). Strikingly, a very similar pattern was observed in mES cells, but for L1 elements between 7.3 million and 3.8 million years of age, with L1MdF1 and L1MdF2 showing the highest enrichment. These findings allowed Castro-Diaz et al. (2014) to conclude that different subfamilies of L1 are regulated differently depending on their age, providing insight into the evolution of LINE-1 repression.

Figure 3.

Evolutionally dynamic and KRAB-ZFP-mediated KAP1–L1 interaction. Percentage of KB full-length (FL) L1 elements per subfamily in hES (A) and mES (B) cells, arranged from the oldest to the youngest subfamily using ages obtained from previously published divergence analysis studies (Khan 2005; Sookdeo et al. 2013). (Myr) Million years. (C) Screenshot of a representative L1MdF2 element, illustrating RNA-seq coverage plots from control (shEmpty) and Gm6871 knockdown mES cells as well as gm6871 and Kap1 ChIP-seq tracks. (D) Putative gm6871 DNA-binding motif identified by computing gm6871 ChIP-seq peaks with the RSAT software (Thomas-Chollier et al. 2012). (E) Relative change in the expression (RPKN [normalized reads per kilobase]) of murine full-length L1s bound or not bound by KAP1 and/or Gm6871 between Gm6871 knockdown and wild-type mES cells. The raw data were bootstrapped 1000 times with a resampling size of 100 for the plot design. The statistical analyses were calculated on the entire raw data by Wilcoxon nonparametric test. (NS) P > 0.05; (**) P ≤ 0.01.
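The bootstrapping mentioned in panel E (1000 resamples, each of size 100) can be sketched as repeated sampling with replacement from the raw values. The caption does not say which statistic was computed on each resample, so the use of the mean, the fixed random seed, and the input values below are all assumptions for illustration.

```python
import random
from statistics import mean


def bootstrap_means(values, n_resamples=1000, sample_size=100, seed=0):
    """Draw `n_resamples` samples with replacement and return the mean of each."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [mean(rng.choices(values, k=sample_size)) for _ in range(n_resamples)]


# Hypothetical relative expression changes, not data from the study.
changes = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
boot = bootstrap_means(changes)
print(len(boot))  # 1000
```

Each bootstrap mean necessarily lies between the smallest and largest raw value, so the resulting distribution describes the variability of the summary statistic rather than the spread of individual elements.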

In the third and final study of focus, Pezic et al. (2014) found that small noncoding piwi-interacting RNAs (piRNAs) identify LINE-1 elements in germ cells and repress their expression through DNA methylation of CpG sites in their sequences. Furthermore, they showed that the piRNA pathway is required to maintain H3K9me3 histone modification levels on LINE-1s in germ cells. They then showed that this piRNA-dependent repression is targeted exclusively toward full-length L1s of actively transposing LINE-1 subfamilies. These results demonstrate the piRNAs’ extraordinary ability to recognize these active elements among the massive number of LINE-1s and other genomic transposon fragments.

First, Pezic et al. (2014) attempted to understand the distribution of the H3K9me3 mark on the transposable elements (TEs) of somatic and germ cells. To do this, they performed ChIP-seq on liver cells, somatic cells of the testis, and premeiotic male germ cells (spermatogonia) from 10-day-old mice. They analyzed the density of H3K9me3 over different features of the genome. All three cell types showed a strongly reduced signal at transcription start sites (TSSs) of protein-coding genes relative to the average density of the genome (Figure 1A). Exons and introns also showed a slight reduction of the H3K9me3 mark. In contrast, intergenic regions (IGRs) of the genome were slightly enriched for the H3K9me3 mark. Pezic et al. (2014) then used ChIP-seq to measure histone mark enrichment over discrete TE families in the mouse genome. They found that H3K9me3 levels differed between types of repeats. Satellite repeats, Long Terminal Repeats (LTRs), and LINEs had increased levels of the H3K9me3 mark, while Short Interspersed Nuclear Elements (SINEs) and DNA transposons had lower levels of the mark (Figure 1B). They then analyzed the different TE classes and found large differences between distinct subfamilies, consistent among all three cell types studied (Figure 1C). Several LTR and LINE families showed an enrichment of the H3K9me3 mark, while other families showed a decrease or no change. In both LTRs and LINEs, transcriptionally and transpositionally active families had high levels of H3K9me3. Examples include the LTR families IAPEz, IAPEy, and MaLR, and the LINE families L1-Gf, L1-T, and L1-A. Even highly active yet non-autonomous elements such as the LTR family ETn were enriched for the H3K9me3 mark, as were elements in L1-F, which is transcribed at high levels but is not transpositionally active due to ORF mutations, as shown by Adey et al. (1991).

Three types of repeats (satellites, LTRs, and LINEs) were enriched for the H3K9me3 mark in each of the three cell types, to an extent that depended on cell type. The H3K9me3 signal was increased more in germ cell LINEs than in the LINEs of either somatic cell type (Figure 1D). In somatic cells, LTR elements had higher levels of the H3K9me3 mark than LINEs, while germ cells had nearly twice as much of the mark in LINE sequences as in LTR sequences (Figure 1D). The increase in the H3K9me3 mark on LINE families in germ cells relative to other element families allowed Pezic et al. (2014) to propose that these cells use additional mechanisms, which are absent or less active in somatic cells, to regulate H3K9me3 production.