Chapter 4:

RNA ELEMENTS INVOLVED IN SPLICING

William F. Mueller and Klemens J. Hertel*

Department of Microbiology & Molecular Genetics,

University of California, Irvine,

Irvine, CA 92697-4025, USA

*To whom correspondence should be addressed. E-mail:


The human genome encodes approximately 25,000 genes [1], and more than 90% of these are believed to produce transcripts that are alternatively spliced [2-4]. Alternative splicing of pre-mRNAs results in the production of multiple mRNA isoforms from a single pre-mRNA, thus significantly enriching the proteomic diversity of higher eukaryotic organisms [5, 6]. The regulation of this process can determine when and where particular protein isoforms are produced, and these isoforms have the potential to modulate various cellular activities.

Sequence elements within the pre-mRNA define the ends of the introns that will be excised. Exon/intron boundaries are recognized through direct interactions between the spliceosome and these pre-mRNA sequence elements. The formation of the spliceosome requires the activity of more than 300 distinct protein factors and the U1, U2, U4, U5, and U6 small nuclear RNAs (snRNAs) [7, 8]. In the classical splicing model, these spliceosomal components assemble onto the pre-mRNA in a stepwise manner [9]. After initial splice site selection and pairing, the catalytic components of the spliceosome are activated and extensively rearranged, ultimately resulting in intron removal via two transesterification reactions [10-13]. Several RNA elements mediate efficient definition of exons and introns. This chapter focuses on the basic principles that control initial splice site recognition and on how the interplay between various RNA sequence elements results in the generation of differentially spliced mRNA isoforms.

Splice Site Sequence

The identification of splice sites is the first step in pre-mRNA splicing. The 5’ splice site (also referred to as the donor site) is defined as a single sequence element, 9 nucleotides (nts) in length (Figure 1). In mammals, this site follows the degenerate consensus sequence YAG/GURAGU (where Y is a pyrimidine, R is a purine (A or G), and / denotes the actual splice site) [14]. It base pairs with U1 snRNA in the early spliceosomal E complex and with U5 and U6 snRNAs in subsequent complexes. The 3’ splice site (also referred to as the acceptor site) is defined by three sequence elements that are usually found within 40 nts upstream of the intron/exon junction: the branchpoint sequence (BPS), the polypyrimidine tract (PPT), and the actual 3’ splice site (the intron/exon junction). The BPS follows the highly degenerate sequence YNYURAY (where N is any nucleotide), in which the A at the sixth position is the conserved branch point adenosine (reviewed in [10]). The PPT varies in length and is characterized by a high percentage of pyrimidines. The 3’ splice site is composed of a variable-length PPT followed by the sequence NYAG/G [14, 15]. U2 snRNP interacts with the BPS, and the PPT functions as a binding platform for U2 snRNP auxiliary factor (U2AF) [10]. These RNA elements and their interactions with snRNAs and/or proteins are essential for initial splice site recognition.
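To make the degenerate notation concrete, the sketch below translates the consensus sequences quoted above (YAG/GURAGU, YNYURAY, NYAG/G) into regular expressions and scans a short fragment for exact matches. The helper names and the example fragment are hypothetical, made up for illustration; because real splice sites frequently deviate from the consensus, exact matching is a teaching device and not a splice site predictor.

```python
import re

# IUPAC nucleotide codes needed for the consensus sequences in the text (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "Y": "CU",    # pyrimidine
    "R": "AG",    # purine
    "N": "ACGU",  # any nucleotide
}


def consensus_to_regex(consensus: str) -> str:
    """Translate a degenerate consensus such as 'YAG/GURAGU' into a regex string.

    The '/' marks the splice site itself and is not a nucleotide, so it is dropped.
    """
    return "".join(f"[{IUPAC_RNA[base]}]" for base in consensus.replace("/", ""))


def find_consensus(seq: str, consensus: str) -> list[int]:
    """Return 0-based start positions of exact matches, including overlapping ones."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    # A zero-width lookahead lets re.finditer report overlapping matches.
    pattern = re.compile(f"(?={consensus_to_regex(consensus)})")
    return [m.start() for m in pattern.finditer(rna)]


if __name__ == "__main__":
    # Hypothetical fragment: a 5' splice site, a branchpoint-like motif,
    # a pyrimidine run and a 3' splice site.
    fragment = "GCCAGGUAAGUCCUUCUGACUUUUUUUUCCCAGGC"
    print(find_consensus(fragment, "YAG/GURAGU"))  # 5' splice site
    print(find_consensus(fragment, "YNYURAY"))     # branchpoint sequence
    print(find_consensus(fragment, "NYAG/G"))      # 3' splice site
```

In this fragment the short NYAG/G pattern matches twice: once at the intended CCAG/G near the 3’ end and once inside the 5’ splice site region. A short degenerate motif carries little information on its own.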

The role of these RNA sequence elements is so important that the sequence complementarity of the 5’ splice site to U1 snRNA and the extent of the PPT at the 3’ splice site are used to determine the strength of splice sites. Greater complementarity to U1 snRNA and longer uninterrupted PPTs translate into higher-affinity binding sites for spliceosomal components and, thus, more efficient splice site recognition [16]. Experimental support for this generalization is abundant. For example, U1 snRNA complementarity defines the competitive strength of a 5’ splice site, and 5’ splice sites with high complementarity to U1 snRNA (i.e., strong 5’ splice sites) splice more efficiently than those with low complementarity (weak 5’ splice sites) [17]. This concept of complementarity has been used extensively in methods for deriving splicing scores [18-20].
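The idea that U1 complementarity measures 5’ splice site strength can be illustrated with a toy count of base pairs. This sketch assumes that the 5’ end of U1 snRNA contains 5’-ACUUACCUG-3’, the segment fully complementary to the 9-nt site CAG/GUAAGU; the function name is hypothetical, and the score is a simple pair count, not one of the published scoring methods [18-20], which weight positions differently.

```python
# Assumption: the 5' end of U1 snRNA contains 5'-ACUUACCUG-3', the segment that
# is fully complementary to the 9-nt 5' splice site 5'-CAG|GUAAGU-3'.
U1_SEGMENT = "ACUUACCUG"

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def u1_complementarity(site: str, allow_wobble: bool = False) -> int:
    """Count positions of a 9-nt 5' splice site (exon -3 to intron +6) that can pair with U1.

    The duplex is antiparallel: position 1 of the splice site (read 5'->3')
    faces the 3'-most nucleotide of the U1 segment, so the segment is reversed.
    """
    rna = site.upper().replace("T", "U").replace("/", "")
    if len(rna) != len(U1_SEGMENT):
        raise ValueError("expected a 9-nt 5' splice site")
    score = 0
    for site_base, u1_base in zip(rna, reversed(U1_SEGMENT)):
        pair = (site_base, u1_base)
        if pair in WATSON_CRICK or (allow_wobble and pair in WOBBLE):
            score += 1
    return score


if __name__ == "__main__":
    for candidate in ["CAG/GUAAGU", "AAG/GUGAGG"]:
        print(candidate,
              u1_complementarity(candidate),
              u1_complementarity(candidate, allow_wobble=True))
```

Under this toy score, CAG/GUAAGU forms all 9 Watson-Crick pairs. The hypothetical site AAG/GUGAGG forms 6, or 7 if a G-U wobble is counted, so it would be ranked as the weaker competitor.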

The Intron/Exon Architecture

In addition to splice site sequences, the exon/intron architecture of the pre-mRNA is important for efficient splice site recognition [21]. While the majority of spliceosomal components are conserved between species, the length and positioning of introns and exons are not. In mammals, small exons and large introns predominate [9]: the average mammalian exon size is between 50 and 300 nts, while the average intron size is 3,400 nts [22]. This is not the case in Drosophila or yeast, where introns are generally much smaller and exons are larger [23-25]. The variable arrangement and size of exons and introns suggest that multiple ways of recognizing introns and exons exist, referred to as intron and exon definition (Figure 2) [21]. In the exon definition mode, splice site recognition occurs across small exons; the assembled spliceosomal components then pair across the intron to interact with spliceosomal components associated with a flanking splice site. Intron definition occurs across small introns, permitting splice site recognition and pairing within the same intronic splicing unit.

Experimental support for the intron and exon definition models comes from testing two expectations: mutations of exon-defined splice sites should result in exon skipping, whereas mutations of intron-defined splice sites should result in intron retention. Increasing the size of mammalian exons resulted in exon skipping [21, 26]. However, when similarly enlarged exons were flanked by small introns, the exons were included [27]. In addition, when splice sites were mutated from strong to weak, the resulting splicing phenotype was exon skipping [28, 29]. By contrast, when the lengths of introns in Drosophila or yeast were increased, intron retention, loss of splicing, and cryptic splicing were observed [30, 31]. More recent kinetic analyses further showed that weak splice sites are more efficiently spliced when introns are small [4]. These results support the concept that splice sites can be recognized either across the intron or across the exon.
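The expectations tested above can be written as a small decision rule. This is a minimal sketch under a loudly stated assumption: the 250-nt intron length cutoff is a hypothetical parameter chosen for illustration, not a value given in this chapter, and the rule ignores exon length, splice site strength, and regulatory elements.

```python
from enum import Enum


class Mode(Enum):
    INTRON_DEFINITION = "intron definition"
    EXON_DEFINITION = "exon definition"


# Hypothetical cutoff for illustration only: introns at or below this length
# are treated as small enough for splice sites to pair across the intron.
INTRON_DEFINITION_MAX_NT = 250


def recognition_mode(intron_length: int,
                     max_intron_nt: int = INTRON_DEFINITION_MAX_NT) -> Mode:
    """Classify how the splice sites flanking an intron would be paired."""
    if intron_length <= max_intron_nt:
        return Mode.INTRON_DEFINITION
    return Mode.EXON_DEFINITION


def predicted_mutant_phenotype(intron_length: int) -> str:
    """Expected outcome when a splice site bordering this intron is weakened.

    Encodes the expectation described in the text: weakening an exon-defined
    splice site tends to cause exon skipping, whereas weakening an
    intron-defined splice site tends to cause intron retention.
    """
    if recognition_mode(intron_length) is Mode.INTRON_DEFINITION:
        return "intron retention"
    return "exon skipping"


if __name__ == "__main__":
    for length in (80, 3400):  # a short intron vs. the mammalian average
        print(length, recognition_mode(length).value, predicted_mutant_phenotype(length))
```

With the average mammalian intron length of 3,400 nts, the rule predicts exon definition and therefore exon skipping when a splice site is weakened, which matches the mammalian observations summarized above.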

The above information makes a strong case that splice site sequences and the intron/exon architecture are important for activating pre-mRNA splicing. Still, they are not the only players. Many potential splice sites in the human genome are not used and form what are called pseudoexons. Pseudoexons are unused exon-like sequences with usable splice sites found in introns or non-coding regions of pre-mRNAs [14]. Interestingly, they occur an order of magnitude more frequently than true exons [32]. Clearly, for pseudoexons to be ignored and true exons to be recognized, a pre-mRNA molecule must contain more information than splice site strength and the position of splice sites relative to adjacent introns and exons. Indeed, bioinformatic approaches demonstrated that regions averaging 540 nts upstream and 80 nts downstream of constitutive exons contain information regarding splice site recognition [33]. Along with the remarkable prevalence of pseudoexons, these observations imply that other regulatory cis-elements exist to help direct the spliceosome to bona fide splice sites.
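Why exon-like segments are so common can be seen with a deliberately naive scan. The sketch below reports every segment that begins after an AG dinucleotide and ends at a GU dinucleotide. It assumes, for illustration only, a length window of 50 to 300 nts borrowed from the average exon sizes quoted earlier. It ignores splice site strength, the BPS, the PPT, and SREs, so most of its hits would be pseudoexons rather than exons. The function name and the random test sequence are hypothetical.

```python
import random


def candidate_exons(seq: str, min_len: int = 50, max_len: int = 300) -> list[tuple[int, int]]:
    """List exon-like segments bounded by an upstream AG and a downstream GU.

    Each hit is (start, end): start is the first nucleotide after the AG
    (3' splice site), end is the index of the G in the GU (5' splice site),
    so seq[start:end] is the putative exon.
    """
    rna = seq.upper().replace("T", "U")
    starts = [i + 2 for i in range(len(rna) - 1) if rna[i:i + 2] == "AG"]
    ends = [j for j in range(len(rna) - 1) if rna[j:j + 2] == "GU"]
    return [(s, e) for s in starts for e in ends if min_len <= e - s <= max_len]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    random_rna = "".join(rng.choice("ACGU") for _ in range(2000))
    print(len(candidate_exons(random_rna)), "exon-like segments in 2,000 random nts")
```

Even a short random sequence yields a large number of AG...GU segments in this window. Splice site dinucleotides and plausible lengths alone cannot distinguish real exons from pseudoexons.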

Splicing Regulatory Elements (SREs)

Sequencing of the genome revealed that the majority of splice sites do not match the consensus sequence well: fewer than 5% of 5’ splice sites match the consensus, and more than 25% have 3 or more mismatches from the 9-nt consensus [33]. Classical experiments also demonstrated that exonic sequences other than the splice sites were necessary to correctly process certain transcripts [34]. Some cis-acting RNA sequence elements were shown to increase exon inclusion by serving as binding sites for the assembly of multi-component splicing enhancer complexes. These sequence elements, located within regulated exons, were termed exonic splicing enhancers (ESEs) [9]. Since the discovery of ESEs, more classes of SREs have been identified. SREs recruit proteins and complexes that can enhance as well as silence splicing and have been named descriptively: intronic splicing enhancers (ISEs), and exonic and intronic splicing silencers (ESSs and ISSs). These elements are important for selecting between pseudoexons and real exons and between competing splice sites, and even for the splicing of constitutive exons.

ESEs were identified by their protein-binding ability, by analysis of mutations that decrease splicing efficiency, and by computational comparison of exons. In general, ESEs are recognized by at least one member of the essential serine/arginine-rich (SR) protein family. These proteins are involved in recruiting the splicing machinery to splice sites [9, 35]. It has been proposed that the RS domain of an ESE-bound SR protein interacts directly with the RS domains of other splicing factors, thus facilitating the recruitment of spliceosomal components such as U1 snRNP to the 5’ splice site or U2AF65 to the 3’ splice site [35]. An alternative mode of spliceosomal recruitment was suggested by experiments demonstrating that RS domains of SR proteins contact the pre-mRNA within the functional spliceosome [36, 37]. Irrespective of how the RS domain acts, SR proteins facilitate the recruitment of spliceosomal components to the regulated splice site [9, 38]. Thus, SR proteins bound to ESEs function as general activators of exon definition [39] (Figure 3A). Two of the first SR proteins to be studied extensively were ASF/SF2 and SC35, which bind the ESE sequences (GAR)n and GRYYC(G/C)YR, respectively [40, 41]. These proteins were found to be necessary for splicing in add-back experiments using SR protein-depleted splicing extracts, as well as for splice site switching. They have also been shown to be necessary for splice site choice in multiple instances [42-45]. Interestingly, ESE-dependent SR protein binding sites are present not only within alternatively spliced exons but also within constitutively spliced exons [46]. SR proteins are therefore expected to bind sequences found in most exons, indicating that ESEs play an extensive role in splicing.
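The two ESE consensus sequences above can be scanned for directly. In this minimal sketch, the requirement of at least three GAR repeats for an ASF/SF2 site and the use of exact consensus matching are assumptions made for illustration. A motif match by itself does not show that an SR protein binds the site or that the site is functional. The function name and the example exon are hypothetical.

```python
import re

# Motifs quoted in the text, written for RNA (R = A/G, Y = C/U, S = G/C).
# The minimum of three GAR repeats is a hypothetical choice.
ESE_MOTIFS = {
    "ASF/SF2 (GAR)n": re.compile(r"(?:GA[AG]){3,}"),
    # Lookahead with a capture group so overlapping SC35 sites are all reported.
    "SC35 GRYYC(G/C)YR": re.compile(r"(?=(G[AG][CU][CU]C[GC][CU][AG]))"),
}


def scan_ese(exon: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched sequence), sorted by position."""
    rna = exon.upper().replace("T", "U")
    hits = []
    for name, pattern in ESE_MOTIFS.items():
        for m in pattern.finditer(rna):
            # The lookahead pattern stores its match in group 1; the other uses group 0.
            text = m.group(1) if pattern.groups else m.group(0)
            hits.append((name, m.start(), text))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for hit in scan_ese("CUGAAGAGGAAUGAUUCGCAC"):  # hypothetical exon fragment
        print(hit)
```

For the example fragment, the scan reports one (GAR)n run and one SC35-type site. Published ESE predictors typically use position-weight scores rather than exact consensus matches.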

Not all enhancement of exon recognition comes from exonic sequences. While less explored than other SREs, ISEs are vital in many splicing scenarios. For example, in the alternative splicing of the terminal calcitonin exon, a conserved intronic sequence is essential for efficient recognition of the terminal 3’ splice site [47]. Surprisingly, part of this intronic element is a cryptic 5’ splice site. Cryptic intronic 5’ splice sites have been shown to act as enhancers in other contexts [47]; however, this is not always the case. Furthermore, recognition of mutually exclusive exons in the β-tropomyosin gene requires an ISE that specifically interacts with ASF/SF2 [48]. These examples, along with those presented for ESEs, support an enhancement-controlled model of splice site recognition. Splicing enhancers activate constitutive, alternative, strong, or weak splice sites by recruiting SR proteins or spliceosomal components to enhance exon recognition. Yet that is only half of the picture.

Regulation of pre-mRNA splicing is much more complex than a simple enhancer recruitment model. Splicing silencers, either ESSs or ISSs, occur frequently and have been found to influence constitutive and alternative splicing events throughout the genome [49] (Figure 3B). The best-characterized silencers are recognized by heterogeneous nuclear ribonucleoproteins (hnRNPs). ISSs are usually recognized by the polypyrimidine tract binding protein (PTB, also known as hnRNP I) [50, 51]. Several mechanisms have been proposed for ESS- or ISS-mediated splicing repression. hnRNP-bound splicing silencers have been shown to repress spliceosomal assembly through multimerization along exons [52], by blocking the recruitment of snRNPs [53, 54], or by looping out exons [55]. An interesting finding regarding silencers is that altering the location of an enhancer can convert its enhancing effect into a silencing effect (Kanopka et al. 1996, McNally and McNally 1996). Recent advances in identifying splicing silencers have come from systematic evolution of ligands by exponential enrichment (SELEX) experiments [56] paired with simple kinetic analysis [57]. It was shown that splicing silencers can alter U1 snRNP binding at 5’ splice sites and that alterations in silencing kinetics can affect splice site choice [58].
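How a change in kinetics can shift splice site choice can be illustrated with simple kinetic partitioning. This sketch assumes two competing 5’ splice sites whose use follows irreversible, first-order, mutually exclusive reactions. Under that assumption, the fraction of pre-mRNAs taking each path equals that site's rate constant divided by the sum of both. The rate constants, the fold repression, and the idea that a silencer simply divides a rate constant are hypothetical simplifications, not results from [57] or [58].

```python
def site_usage(k_proximal: float, k_distal: float) -> float:
    """Fraction of pre-mRNAs that use the proximal site under kinetic partitioning.

    Assumes two irreversible, first-order, mutually exclusive reactions competing
    for the same pre-mRNA; each branch is taken in proportion to its rate constant.
    """
    if k_proximal < 0 or k_distal < 0 or k_proximal + k_distal == 0:
        raise ValueError("rate constants must be non-negative and not both zero")
    return k_proximal / (k_proximal + k_distal)


def with_silencer(k: float, fold_repression: float) -> float:
    """Hypothetical silencer effect: divide the site's rate constant by a fold factor."""
    if fold_repression <= 0:
        raise ValueError("fold_repression must be positive")
    return k / fold_repression


if __name__ == "__main__":
    k_prox, k_dist = 3.0, 1.0  # arbitrary units, chosen for illustration
    print(site_usage(k_prox, k_dist))                      # proximal site favoured
    print(site_usage(with_silencer(k_prox, 6.0), k_dist))  # silencer shifts the choice
```

With these made-up values, the proximal site is used by 75% of transcripts. A 6-fold slower proximal reaction reduces its use to about one third, so the choice flips without any change to the distal site.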

Typically, silencers and enhancers are present in the vicinity of exon/intron junctions, suggesting that the interplay between activating and repressing cis-acting elements modulates the probability of exon inclusion. Beyond the examples mentioned previously, studies of Survival of Motor Neuron (SMN) pre-mRNA splicing have uncovered a number of enhancing and silencing elements within exon 7 and its flanking introns [41, 59-61]. This suggests an interplay between SREs, most likely influenced by the concentrations of the various splicing factors involved and the timing of their interactions with the pre-mRNA. In vitro studies showed that the location and frequency of SREs along the pre-mRNA alter their effectiveness. For example, as the distance between enhancer complexes and the splice site increased, the probability of exon inclusion decreased [62]. Increasing the number of ESEs seemed to lessen this effect. However, by creating artificial exons with variable numbers and orders of ESEs and ESSs, it was shown that the number of enhancers or silencers had only a weak linear relationship with splicing efficiency [63]. This conclusion was mainly based on the observation that constructs with the same ESE-to-ESS ratio, but different orders of enhancers and silencers, displayed drastically different splicing efficiencies. These results support the notion that the recognition of most splice sites is influenced by multiple distinct cis-acting RNA elements and that their activity depends on their context within the pre-mRNA molecule [16, 63, 64].
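The point that element position and order matter, and not only element counts, can be illustrated with a toy log-odds model. In this sketch each ESE adds and each ESS subtracts a contribution that decays exponentially with distance from the regulated splice site. The weights, the 50-nt decay length, the baseline, and the function name are all hypothetical and not fitted to any data. The model reproduces the qualitative observation only because position dependence is built into it, and it is not the analysis used in [63].

```python
import math

# Hypothetical parameters for illustration; not fitted to any data set.
ELEMENT_WEIGHT = {"ESE": 1.0, "ESS": -1.0}
DECAY_NT = 50.0   # length scale over which an element's effect fades
BASELINE = -0.5   # log-odds of inclusion with no elements present


def inclusion_probability(elements: list[tuple[str, int]]) -> float:
    """Toy model: each element adds weight * exp(-distance / DECAY_NT) to the log-odds.

    `elements` holds (element_type, distance_in_nt_from_the_regulated_splice_site).
    """
    log_odds = BASELINE
    for kind, distance in elements:
        log_odds += ELEMENT_WEIGHT[kind] * math.exp(-distance / DECAY_NT)
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic transform to a probability


if __name__ == "__main__":
    # Same 1:1 ESE-to-ESS ratio, opposite order relative to the splice site.
    enhancer_first = [("ESE", 10), ("ESS", 100)]
    silencer_first = [("ESS", 10), ("ESE", 100)]
    print(round(inclusion_probability(enhancer_first), 3))
    print(round(inclusion_probability(silencer_first), 3))
```

The two hypothetical constructs contain one enhancer and one silencer each. Because the element nearer the splice site dominates in this model, the enhancer-first construct is predicted to be included more often than the silencer-first construct.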