Functional Analysis of Large Exonic Sequences Through Iterative in Vivo Selection

Ravindra N. Singh* and Natalia N. Singh

Department of Biomedical Sciences, Iowa State University, Ames, Iowa, USA.

Running Title: In vivo selection

*Corresponding author. Mailing address: Department of Biomedical Sciences, College of Veterinary Medicine (2034 Vet Med Bld), Iowa State University, Ames, IA 01605. Phone: (508) 856-1333. Fax: (508) 856-6797. Email:

1. Abstract

Alternative splicing is regulated by a combination of cis-elements and trans-acting factors. While methods to decipher splicing cis-elements continue to evolve, the functional significance and impact of most cis-elements remain unpredictable. The uncertainty in establishing the exact nature and relative impact of cis-elements is compounded by the variable expression of splicing factors across tissue types. Here we describe a powerful in vivo selection method that uses a combinatorial library of partially random sequences. Advantages of this method include in vivo analysis of large sequences, identification of unique sequence motifs, determination of the relative strength of splice sites, and identification of long-distance interactions, including the role of RNA structures. This method could be applied to identify tissue-specific cis-elements as well as cis-elements responsible for the pathogenesis of splicing-associated diseases.

2. Theoretical background

In this chapter, we describe an in vivo selection method to identify critical splicing motifs in a large exonic sequence, up to and including an entire exon. The concept is based on an approach called SELEX (Systematic Evolution of Ligands by Exponential enrichment) [1, 2]. The SELEX protocol uses iterative rounds of selection to isolate and amplify the best sequences (for a particular function) from a large pool of random sequences. Over the last two decades, various adaptations of the SELEX protocol have been developed. The first splicing-related application of in vivo (cell-based) selection, which analyzed a short exonic sequence, was reported by Cooper and coworkers [3]. That approach identified an AC-rich enhancer motif that promoted exon inclusion during pre-mRNA splicing. The method was subsequently reproduced in another system [4]. However, it had a major limitation: it selected artificial motifs that are likely to promote inclusion of most (if not all) exons during pre-mRNA splicing. Another major limitation was that only short sequences could be completely randomized without creating cryptic splice sites. These limitations were addressed by an advanced in vivo selection approach that used partially random sequences to analyze the entire exon 7 of Survival Motor Neuron 1 (SMN1) [5]. This approach identified three regulatory elements and had the additional advantage of pinpointing key positions of high significance for exon inclusion [6]. The nature of the three cis-elements identified by in vivo selection was independently confirmed by a complementary approach, antisense microwalk [7]. In addition, in vivo selection of the entire exon 7 provided indirect evidence for a role of RNA structure, which was subsequently confirmed by RNA structure probing [8].
Most importantly, in vivo selection of the entire exon revealed that the 5′ splice site (5′ss) of exon 7 is weak, leading to the discovery of a series of cis-elements that modulate the strength of the exon 7 5′ss [9].

Here we focus on salient features of the protocol, recapitulating the approach employed for in vivo selection of the entire SMN1 exon 7 [5, 6]. The protocol has two major aspects: experimental design and sequence analysis. This chapter focuses mostly on experimental design. Owing to the flexible nature of SELEX procedures, various modifications can be introduced at different stages of the protocol. The outcome of a SELEX experiment is affected by the degree of randomization, the nature of the flanking sequences, and the relative concentrations of splicing factors. Since the relative concentrations of splicing factors vary among cell types, we anticipate that the method described here will be suitable for identifying tissue-specific cis-elements.

3. Protocol

3.1. Minigene, cell culture, transfection and in vivo splicing assay

To perform in vivo selection of a large exonic sequence or an entire exon, a minigene incorporating this exon must first be created. The minigene can be constructed in any of the many commercially available mammalian expression vectors. An ideal minigene has three exons and two introns (Fig. 1A), with the exon under investigation in the middle. An additional shuttle minigene should be created to exchange the middle exon for the selected sequences; pBxT2 (described in Section 4) is a good example. In vivo selection can be performed in any easily transfectable cell line. Since the outcome of selection depends on the cell type used, the method is suitable for examining tissue-specific splicing. Spliced products are quantified by RT-PCR, and the RT-PCR primers should be designed so that endogenous mRNAs are not amplified (Fig. 1A).

3.2. Generation of a partially random exon

For in vivo selection, a large part of the exonic sequence or the entire exon can be partially randomized. Partially random sequences are synthesized commercially, and the percentage of randomization is controlled during synthesis. Generally, sequences are randomized at a 30% level, in which each position under examination has 70% wild-type nucleotide and 10% of each of the three non-wild-type nucleotides [5]. For cloning purposes, the randomized portion is flanked by constant sequences that contain restriction endonuclease sites of choice. Partially randomized sequences are amplified by Taq polymerase and inserted into the splicing cassette using either the shuttle minigene or PCR. The actual percentage of randomization at each position should be confirmed by sequencing about 50 clones from the initial pool.
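The doping scheme and the 50-clone check can be illustrated with a small simulation. The sketch below assumes independent doping at each position (70% wild type, 10% for each other base) and clones that are already aligned end to end; the function names `doped_oligo` and `wt_fraction_per_position` are hypothetical, and the wild-type sequence is the 54-nt SMN1 exon 7 taken from the E7Rand oligonucleotide described in Section 4.

```python
import random
from typing import Optional

BASES = "ACGT"

# Wild-type SMN1 exon 7 (54 nt), taken from the lowercase part of E7Rand.
SMN1_EXON7 = "GGTTTCAGACAAAATCAAAAAGAAGGAAGGTGCTCACATTCCTTAAATTAAGGA"


def doped_oligo(wild_type: str, wt_fraction: float = 0.70,
                rng: Optional[random.Random] = None) -> str:
    """Simulate one molecule of a doped synthesis (hypothetical helper).

    Each position keeps the wild-type base with probability wt_fraction; otherwise
    one of the three other bases is chosen uniformly (10% each when wt_fraction = 0.70).
    """
    rng = rng or random.Random()
    out = []
    for wt in wild_type.upper():
        if rng.random() < wt_fraction:
            out.append(wt)
        else:
            out.append(rng.choice([b for b in BASES if b != wt]))
    return "".join(out)


def wt_fraction_per_position(wild_type: str, clones: list[str]) -> list[float]:
    """Fraction of clones carrying the wild-type base at each position.

    Assumes every clone is aligned end to end with the wild-type exon
    (same length, no insertions or deletions).
    """
    wt = wild_type.upper()
    return [sum(c[i].upper() == wt[i] for c in clones) / len(clones)
            for i in range(len(wt))]


rng = random.Random(1)  # fixed seed so the simulation is reproducible
clones = [doped_oligo(SMN1_EXON7, rng=rng) for _ in range(50)]
fractions = wt_fraction_per_position(SMN1_EXON7, clones)
print(f"mean wild-type fraction over {len(clones)} clones: {sum(fractions) / len(fractions):.2f}")
print(f"lowest / highest per-position value: {min(fractions):.2f} / {max(fractions):.2f}")
```

With 50 clones, each per-position estimate carries a binomial standard error of about 6.5 percentage points (sqrt(0.7 × 0.3 / 50)), so individual positions anywhere between roughly 0.57 and 0.83 are still compatible with 70% doping.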

3.3. In vivo selection

The first step of in vivo selection is to transfect cells with a large pool of unique sequences. Because of limitations of the transfection process, it is generally difficult to analyze more than 10^12 molecules per 10^6 cells. However, this level of complexity is sufficient to determine the significance of every position within a large sequence or even an entire exon [6]. Approximately 24 h after transfection, cells are harvested and total RNA is collected. The spliced products are amplified by RT-PCR using minigene-specific primers. The exon-included product is gel purified, followed by a secondary PCR amplification of the middle-exon sequences. The primers for the secondary PCR carry specific mutations that introduce restriction endonuclease sites into the regions flanking the exonic sequence. The amplified sequences are digested with the appropriate restriction endonucleases and ligated into the shuttle vector. The ligation mixture (the next pool of minigenes) is used directly for transfection, and the whole process is repeated until no exon-excluded product is detected. These experiments are performed on a large scale to maintain the diversity of the selected pool.
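Why ~10^12 molecules can probe every position is easy to check with a back-of-the-envelope estimate. The sketch below is idealized: it assumes independent 30% doping of a 54-nt exon, no synthesis or cloning bias, and that every molecule is transfected and expressed; `copies_per_variant` and `fraction_present` are hypothetical names.

```python
import math


def copies_per_variant(n_molecules: float, length: int, k: int,
                       wt_fraction: float = 0.70) -> float:
    """Expected copies of one specific k-fold mutant in the pool.

    Assumes independent doping at every position, with the non-wild-type share
    split evenly over the three other bases.
    """
    per_base_mut = (1.0 - wt_fraction) / 3.0
    return n_molecules * per_base_mut ** k * wt_fraction ** (length - k)


def fraction_present(n_molecules: float, length: int, k: int,
                     wt_fraction: float = 0.70) -> float:
    """Poisson estimate of the share of all k-fold mutants seen at least once."""
    return 1.0 - math.exp(-copies_per_variant(n_molecules, length, k, wt_fraction))


# 1e12 molecules (the practical limit mentioned above) and a 54-nt exon.
N, L = 1e12, 54
for k in range(1, 6):
    n_variants = math.comb(L, k) * 3 ** k  # distinct sequences with exactly k changes
    print(f"k={k}: {n_variants:.3g} distinct variants, "
          f"~{copies_per_variant(N, L, k):.3g} copies each, "
          f"{fraction_present(N, L, k):.1%} present")
```

Under these assumptions each specific single, double and triple mutant is expected roughly 600, 90 and 13 times, respectively, whereas most specific five-fold mutants are absent; the full sequence space (4^54 sequences) is therefore far from saturated even though every position is well sampled.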

3.4. Analysis of sequences

A statistically significant number of clones (between 50 and 100) from the final pool of in vivo selection should be analyzed. To determine the position-specific significance of nucleotides, sequences should be force-aligned from one end to the other [5]. The mutability values shown in Fig. 2B are calculated by comparing, at each exonic position, the ratio (R) of mutant (mut) to wild-type (wt) nucleotides in the selected pool (pool-s) with that in the initial pool (pool-0), using the equation $\text{mutability} = \dfrac{R_{\mathrm{mut/wt}}(\text{pool-s})}{R_{\mathrm{mut/wt}}(\text{pool-0})} - 1$ [10]. Sequences obtained from in vivo selection can also be analyzed for unique motifs using a number of algorithms available online and/or methods covered in other chapters of this book.
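The mutability equation translates directly into code. The sketch below assumes equal-length, force-aligned clones and counts any non-wild-type base as a mutation; the helper names are hypothetical, positions with no mutants in pool-0 are returned as NaN because the ratio is undefined there, and positions with no wild-type clones left in pool-s come out as +infinity.

```python
import math


def mutation_ratio(wild_type: str, clones: list[str], i: int) -> float:
    """R(mut/wt) at position i: clones with a non-wild-type base / clones with the wild-type base."""
    wt_base = wild_type[i].upper()
    n_wt = sum(c[i].upper() == wt_base for c in clones)
    n_mut = len(clones) - n_wt
    return math.inf if n_wt == 0 else n_mut / n_wt


def mutability(wild_type: str, pool_0: list[str], pool_s: list[str]) -> list[float]:
    """Per-position mutability = R(mut/wt, pool-s) / R(mut/wt, pool-0) - 1 (hypothetical helper)."""
    values = []
    for i in range(len(wild_type)):
        r_0 = mutation_ratio(wild_type, pool_0, i)
        r_s = mutation_ratio(wild_type, pool_s, i)
        if r_0 == 0 or math.isinf(r_0):
            values.append(math.nan)  # no usable baseline in the initial pool
        else:
            values.append(r_s / r_0 - 1.0)  # inf if no wild-type clones survived
    return values


# Toy example (hypothetical clones, not data from the study).
pool_0 = ["AC"] * 7 + ["GT"] * 3      # 3 mutant : 7 wild-type at both positions
pool_s = ["AC", "AC", "AT", "AT"]     # position 1 never mutated, position 2 half mutated
print(mutability("AC", pool_0, pool_s))  # position 1 -> -1.0, position 2 -> about 1.33
```

A value of -1 means that no mutants survived selection at that position (strongly conserved), whereas large positive values mean that mutants were enriched relative to the initial pool.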

4. Example of an experiment

Here we describe the protocol we reported for analyzing the cis-elements within exon 7 of Survival Motor Neuron 1 (SMN1) [5]. The minigene splicing cassette pSMN1I6 was constructed in the mammalian expression vector pCI (Promega). This splicing cassette contains SMN1 genomic sequences from exon 6 through exon 8, with a ~6 kb internal deletion within intron 6. The shortened minigene increases transfection efficiency without changing the splicing pattern of SMN exon 7 [4]. pSMN1I6 was used to perform in vivo selection of the entire exon 7. As a prerequisite for the selection strategy, we created the shuttle minigene pBxT2, in which the BsaXI restriction site in the vector backbone was destroyed and the entire exon 7 was replaced with a 27-nucleotide sequence containing a BsaXI restriction site (5′-GGCGCCAGAACTAGTCCTCCATCCGGA-3′). Since digestion with BsaXI removes the entire 27-nucleotide sequence from pBxT2, this plasmid was used to restore the three-exon cassette with a middle exon derived from the selected pool (Fig. 1).
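Confirming that the backbone BsaXI site is gone and that the 27-nt stuffer carries exactly one site is a simple pattern search. The sketch below assumes the BsaXI recognition sequence ACNNNNNCTCC (as listed by NEB; verify against your supplier's documentation), searches both strands of a plain sequence string, and reports recognition sites only, not cut positions. `find_sites` and its helpers are hypothetical names.

```python
import re

# IUPAC codes needed for this site; extend the table for other enzymes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_regex(site: str) -> str:
    # Zero-width lookahead so that overlapping matches are also reported.
    return "(?=" + "".join(IUPAC[b] for b in site.upper()) + ")"


def find_sites(seq: str, site: str = "ACNNNNNCTCC") -> dict[str, list[int]]:
    """0-based start positions of a non-palindromic recognition site on each strand.

    'top' hits match the site as written; 'bottom' hits match its reverse
    complement in the given sequence, i.e. the site lies on the opposite strand.
    """
    seq = seq.upper()
    return {
        "top": [m.start() for m in re.finditer(to_regex(site), seq)],
        "bottom": [m.start() for m in re.finditer(to_regex(reverse_complement(site)), seq)],
    }


stuffer = "GGCGCCAGAACTAGTCCTCCATCCGGA"  # 27-nt pBxT2 insert from the text
print(find_sites(stuffer))  # {'top': [9], 'bottom': []}
```

Because the site is not palindromic, a site on the bottom strand appears as GGAGNNNNNGT in the top-strand sequence. The same search can be run on the full vector sequence before and after the backbone site is destroyed.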

In vivo selection of SMN1 exon 7 was performed by partially randomizing the entire exon. The initial pool was generated using a 90-mer oligonucleotide, E7Rand (5′-CCTTTATTTTCCTTACAGggtttcagacaaaatcaaaaagaaggaaggtgctcacattccttaaattaaggaGTAAGTCTGCCAGCATTA-3′). Lowercase letters represent the partially random 54-nucleotide exon 7; capital letters represent the two flanking intronic stretches (18 nucleotides each). Randomization (doping) was performed with 70% wild-type and 10% of each of the three non-wild-type nucleotides. A 1.1 kb DNA fragment was generated by high-fidelity PCR with Pfx (Invitrogen), using E7Rand as the upstream primer, PCI-DN as the downstream primer and pSMN1I6 as the template. This fragment contained a partially random exon 7, the entire intron 7 and most of exon 8. Approximately 4 µg (~4 × 10^12 unique molecules) of the gel-purified fragment was used as a megaprimer in a second PCR with PCI-UP as the forward primer and pSMN1I6 as the template. The resulting 1.5 kb amplification product (containing the entire exon 6, the shortened intron 6, a partially random exon 7, the entire intron 7 and most of exon 8) was gel purified and digested with NotI and XhoI. The digested product was ligated into NotI-XhoI-digested pSMN1I6. The ligation mixture served as the initial pool (pool 0), which was used for in vivo selection experiments. Partial randomization of the initial pool was confirmed by sequencing.
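Two quick sanity checks on the numbers in this paragraph are the composition of E7Rand and the conversion from DNA mass to molecule count. The sketch below uses an approximate average of 650 g/mol per base pair of double-stranded DNA (an assumption; values of 650 to 660 are commonly used) and the E7Rand sequence given above; `dsdna_molecules` is a hypothetical helper name.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 650.0       # approximate g/mol per base pair of double-stranded DNA


def dsdna_molecules(mass_ug: float, length_bp: int) -> float:
    """Approximate number of double-stranded DNA molecules in mass_ug micrograms."""
    moles = (mass_ug * 1e-6) / (length_bp * BP_MASS)
    return moles * AVOGADRO


# E7Rand: uppercase = constant intronic flanks, lowercase = doped exon 7 positions.
E7RAND = ("CCTTTATTTTCCTTACAG"
          "ggtttcagacaaaatcaaaaagaaggaaggtgctcacattccttaaattaagga"
          "GTAAGTCTGCCAGCATTA")
doped = "".join(b for b in E7RAND if b.islower())

n = dsdna_molecules(4.0, 1100)   # ~4 ug of the 1.1 kb megaprimer
p_wt = 0.70 ** len(doped)        # chance that a molecule carries a fully wild-type exon
print(f"oligo length {len(E7RAND)}, doped positions {len(doped)}")
print(f"~{n:.1e} molecules; mean mutations per exon {0.30 * len(doped):.1f}; "
      f"fully wild-type share {p_wt:.1e} (~{n * p_wt:,.0f} molecules)")
```

With these assumptions, the quoted ~4 × 10^12 molecules of the 1.1 kb megaprimer corresponds to roughly 4 µg of DNA (about 3.4 × 10^12 molecules), each exon carries on average about 16 substitutions, and only about 4 in 10^9 molecules are expected to be fully wild type.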

The in vivo selection experiment began with transfection of cells with the initial pool of splicing cassettes (~2 × 10^11 unique molecules per 4 × 10^6 cells) and continued with the selection and amplification steps outlined in Fig. 1: (i) amplification of the in vivo spliced products by RT-PCR, (ii) polyacrylamide gel purification of the exon-included product, (iii) amplification of the exon-included product with a second PCR, (iv) ligation of the amplified exon back into a splicing cassette, and (v) transfection of cells with the ethanol-precipitated ligation products. Steps (i) through (v) constituted one round of selection. Repeated rounds of selection enriched sequences that promoted exon inclusion. In each round, transfection was performed directly with the ligation mixture to maintain pool diversity. Ligation was performed in a 100-µl reaction containing ~1 µg of BsaXI-digested pBxT2 and ~50 ng of enriched exon fragment. Approximately 1 µg of ligated plasmid (~2 × 10^11 molecules) was used to transfect two 60-mm plates of C33a cells (~2 × 10^6 cells per plate) by calcium phosphate co-precipitation. About 20 h post-transfection, cells were harvested, total RNA was isolated and the in vivo spliced products were amplified by RT-PCR. Approximately 3 µg of total RNA was used per 20-µl reverse transcription reaction, and the number of PCR cycles did not exceed 20. Four rounds of selection were performed. The final pool (pool 4) included exon 7 about 200-fold more efficiently than the initial pool (Fig. 2A). The sequences of 59 randomly chosen clones from pool 4 were analyzed. Consistent with the high inclusion efficiency of pool 4, no exon 7-excluded product was detected for any of the individual clones.
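The logic of repeated rounds can be captured in a deliberately simple model in which each variant's share of the next pool is proportional to its current share times its probability of exon inclusion. This is a hypothetical sketch with made-up inclusion values, not the authors' analysis; it ignores PCR and ligation bias, new mutations and sampling bottlenecks, and the names `selection_round` and `pool_inclusion` are illustrative.

```python
def selection_round(freqs: list[float], inclusion: list[float]) -> list[float]:
    """One idealized round: a variant's share of the next pool is proportional to its
    current share times its probability of producing the exon-included product."""
    weights = [f * p for f, p in zip(freqs, inclusion)]
    total = sum(weights)
    return [w / total for w in weights]


def pool_inclusion(freqs: list[float], inclusion: list[float]) -> float:
    """Expected fraction of exon-included product from the whole pool."""
    return sum(f * p for f, p in zip(freqs, inclusion))


# Hypothetical pool: three variant classes with made-up inclusion probabilities.
inclusion = [0.05, 0.30, 0.90]   # inhibitory, intermediate, strongly included
freqs = [0.50, 0.49, 0.01]       # initial pool composition
history = [pool_inclusion(freqs, inclusion)]
for round_no in range(1, 5):     # four rounds, as in the SMN1 experiment
    freqs = selection_round(freqs, inclusion)
    history.append(pool_inclusion(freqs, inclusion))
    print(f"round {round_no}: inclusion {history[-1]:.2f}, "
          f"shares {[round(f, 3) for f in freqs]}")
```

Under this model the pool's mean inclusion can only rise from round to round, because the new mean equals E[p²]/E[p], which is at least E[p] (p being a variant's inclusion probability); in practice, the problems listed in the troubleshooting section can stall or distort this enrichment.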

The results of our in vivo selection are best explained by the mutability of exonic positions. A highly mutable position is considered inhibitory for exon 7 inclusion, whereas a poorly mutable (conserved) position is considered stimulatory. We calculated the mutability of wild-type residues using an equation that corrects the selection results for the bias of the initial pool (Fig. 2B). Negative and positive bars indicate the stimulatory and inhibitory nature of the wild-type nucleotides, respectively. Positions with a mutability value of zero are considered neutral, although the number of neutral positions was low. A small number of neutral positions could also result from incomplete saturation of sequence space (defined as all theoretically possible variants upon complete randomization). A neutral position was considered conserved when it lay within a stretch of conserved residues; likewise, a neutral position within a stretch of mutable positions was considered mutable. The highly conserved nature of position 1 is apparent from its mutability value close to -1, whereas the least conserved (most mutable) position, 54, has a mutability value of +17.6. The cutoff values for conserved and mutable positions were set at -0.2 and +0.2, respectively. Both cutoff values are supported by mutation and/or deletion experiments [5]. Because of the exceptionally high mutability of position 54, other positive values may have been skewed. However, this did not affect the overall significance of the positive and negative cis-elements. In fact, the strength of in vivo selection with partial randomization is that it tends to reveal inhibitory and stimulatory stretches simultaneously, despite skewed selection at certain positions.
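The cutoff and neutral-position rules can be expressed as a short classification pass. In the sketch below, treating ±0.2 as inclusive cutoffs and reading "within a stretch" as "the nearest non-neutral positions on both sides agree" are our assumptions; the function names and the input values are hypothetical.

```python
def classify(values: list[float], cutoff: float = 0.2) -> list[str]:
    """'C' = conserved (<= -cutoff), 'M' = mutable (>= +cutoff), 'N' = neutral.
    Treating the cutoff value itself as inclusive is an assumption."""
    return ["C" if v <= -cutoff else "M" if v >= cutoff else "N" for v in values]


def resolve_neutral(labels: list[str]) -> list[str]:
    """Reassign a neutral position when the nearest non-neutral positions on both
    sides agree (one reading of 'within a stretch' of conserved or mutable residues)."""
    out = list(labels)
    for i, lab in enumerate(labels):
        if lab != "N":
            continue
        left = next((labels[j] for j in range(i - 1, -1, -1) if labels[j] != "N"), None)
        right = next((labels[j] for j in range(i + 1, len(labels)) if labels[j] != "N"), None)
        if left is not None and left == right:
            out[i] = left
    return out


def tracts(labels: list[str]) -> list[tuple[str, int, int]]:
    """Collapse runs of identical labels into (label, first, last), 1-based positions."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start + 1, i))
            start = i
    return runs


# Hypothetical mutability values for an 8-nt stretch (not data from the study).
values = [-0.9, -0.5, 0.0, -0.3, 1.5, 0.1, 2.0, 0.0]
print(tracts(resolve_neutral(classify(values))))
# [('C', 1, 4), ('M', 5, 7), ('N', 8, 8)]
```

Runs of 'C' and 'M' labels then correspond to candidate stimulatory and inhibitory stretches, of the kind identified as the conserved tract and Exinct in SMN1 exon 7.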

Based on the mutability plot, more than 70% of the conserved residues are located in the middle of the exon, forming what we call a "conserved tract" (Fig. 2B). Mutable residues (with values from +1 to +17.6) are located throughout the exon, although most of them are concentrated toward its ends, forming an extended inhibitory context (Exinct) near the 3′ss and a cluster near the 5′ss. Consistent with the stimulatory and inhibitory nature of these cis-elements, mutations within the conserved and mutable tracts promoted exon 7 exclusion and inclusion, respectively [5]. The results of in vivo selection were subsequently validated by antisense microwalk [7]. In addition to predicting cis-elements, the results of in vivo selection helped reveal the inhibitory nature of two RNA structures within SMN exon 7 [6].

5. Troubleshooting

Problem / Reason and solution
Smeared PCR product during generation of the initial pool / Because of randomization, a smeared PCR product is expected. However, different annealing conditions should be tried to obtain the best PCR product.
Repeated rounds of selection did not enrich sequences that promote exon inclusion / Poor ligation and transfection efficiencies could cause insufficient enrichment. Ligation efficiency can be tested by transforming E. coli with the ligation mixture. Make sure that ~10^12 ligated molecules are used for transfection.
Final selected pool has many clones with identical sequences / PCR may selectively amplify some sequences. Using different primers in different rounds of selection could rectify this problem.
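For the last problem, it helps to quantify how often identical sequences recur among the sequenced clones before deciding whether to change primers. The hypothetical helper below simply counts identical (case-insensitive) sequences; it cannot distinguish PCR bias from genuine enrichment of a winning variant, so its output should be interpreted together with the inclusion data.

```python
from collections import Counter


def duplicate_report(clones: list[str]) -> dict:
    """Summarize how often identical sequences recur among sequenced clones."""
    counts = Counter(c.strip().upper() for c in clones)
    repeated = [n for n in counts.values() if n > 1]
    return {
        "clones": len(clones),
        "unique": len(counts),
        "clones_in_repeats": sum(repeated),  # clones whose sequence occurs more than once
        "largest_group": max(counts.values(), default=0),
    }


# Hypothetical clone sequences (not data from the study).
print(duplicate_report(["GGTTTC", "ggtttc", "GATTTC", "GGTTTC", "GGTATC"]))
# {'clones': 5, 'unique': 3, 'clones_in_repeats': 3, 'largest_group': 3}
```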

Table 1. Sequences of primers used

P1 / 5CGACTCACTATAGGCTAGCC3'
P2 / 5GCATGCAAGCTTCCTTTTTTCTTTCCCAACAC3'
B1 / 5TTCATGGTACATGAGTGGCACTCATACTCC-CTATTATCAG3'
B2 / 5TTTAGTGGTGTCATTTAGGAGTGCTC-GTTGCCAGCATTAC3'
PCI-UP / 5TGACATCCACTTTGCCTTTCTCTC3
PCI-DN / 5AGCATCACAAATTTCACAAATAAA3
HPRT-FW / 5AAGGAGATGGGAGGCCAT3
HPRT-REV / 5GTTGA-GAGATCATCTCCACCAAT3

Figure legends

Figure 1. Strategy for the iterative in vivo selection of the entire SMN1 exon 7. The figure is adapted from [5]. (A) Selection procedure (diagram not to scale). The pCI vector backbone is shown as a half circle. Exon 6 (E6), intron 6 (I6), partially random exon 7 (E7R), intron 7 (I7) and exon 8 (E8) sequences are marked. (B) Details of the second PCR amplification step and ligation. Primers B1 and B2 anneal to constant sequences in E6 and E8, respectively. B1 and B2 contain mutations, marked in capitalized and underlined letters, that create BsaXI sites and restore intronic sequences, respectively. Upon digestion with BsaXI, the intact E7R is released with 3′ overhangs that correspond to the complementary intronic sequences of the 5′ overhangs of BsaXI-digested pBxT2. Insertion of E7R into pBxT2 restores the minigene splicing cassette. For primer sequences, refer to Table 1.

Figure 2. Comparison of splicing efficiency of different pools and selected sequences. The figure is adapted from [5]. (A) Comparative splicing patterns of different selected pools. The 333 and 279 bp products correspond to the fully spliced and exon 7-skipped products, respectively. The percentage of exon 7 skipping was calculated relative to the total of exon 7-included and exon 7-excluded products. Abbreviations E6, E7 and E8 stand for exon 6, exon 7 and exon 8, respectively. For primer sequences, refer to Table 1. (B) Mutability of residues based on the results of in vivo selection. The values of –1 and +17.6 represent the absolutely conserved and the least conserved residues, respectively. The dotted horizontal lines show the cutoffs at mutability values of +0.2 and –0.2, corresponding to mutable and conserved residues, respectively. Based on the stretches of mutable and conserved residues, the extended inhibitory context (Exinct), the 3′-Cluster and the long conserved tract are highlighted. The inhibitory nature of residues covering Exinct has been described in [4]. Consistent with the inhibitory nature of the 3′-Cluster, deletions and mutations in this region promoted exon 7 inclusion in SMN2 [5]. Multiple mutations in the conserved tract have been shown to cause exon 7 skipping in SMN1 [6, 9]. The exceptionally high mutability of position 54 is consistent with the dominant effect of the A54G substitution on exon 7 inclusion [6].

Acknowledgments

RNS was supported by a grant from the United States National Institutes of Health (R01NS055925). The authors acknowledge the support of the Salsbury Endowment at Iowa State University.

References

[1] Tuerk, C., and Gold, L. (1990). Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505-510.

[2] Ellington, A.D., and Szostak, J.W. (1990). In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818-822.

[3] Coulter, L.R., Landree, M.A., and Cooper, T.A. (1997). Identification of a new class of exonic splicing enhancers by in vivo selection. Mol. Cell. Biol. 17, 2143-2150.

[4] Singh, N.N., Androphy, E.J., and Singh, R.N. (2004). An extended inhibitory context causes skipping of exon 7 of SMN2 in spinal muscular atrophy. Biochem. Biophys. Res. Commun. 315, 381-388.