Supplementary material
Data and methods
All 899 alternatively spliced genes in Homo sapiens were collected from the Alternative Splice Database of Mammals (AsMamDB; http://166.111.30.65/ASMAMDB.html) [8]. We then selected the genes for which the complete coding sequence (CDS) had been sequenced and deposited in GenBank; this produced 371 genes with a total of 914 alternative-splice events. The lengths of these alternative segments were analyzed, and their distribution is shown in Figure S1. Because the average length of a human internal exon is ~120 bp, alternative exons of ≤100 bp are considered short alternatively spliced exons, and internal exons of ≤50 bp are considered very short alternative splicing (VSAS) in our article. Among the 914 splicing segments, 258 are ≤50 nucleotides long and come from 134 genes (36% of the 371 genes); for convenience, we call these VSAS genes. If two or more alternative-splice events are present in one transcript variant, it is difficult to evaluate their influence on protein structure individually. Therefore, to investigate the influence of VSAS on protein structure, only those pairs of alternative-splicing variants that differ from each other by a single VSAS segment were considered in this study. These criteria provided 43 pairs of alternative transcripts (protein variants) from 43 genes. Table S1 gives a description of the dataset. Using the same criteria, we found another 14 VSAS genes in the Alternative Exon Database (AEDB; http://www.ebi.ac.uk/asd/aedb) [9].
Table S1. The gene screening criteria and the results from AsMamDB^a
899 alternatively spliced human genes deposited in AsMamDB
371 genes (914 AS events) for which the complete CDS of the AS transcript variants has been amplified and deposited in GenBank / 528 genes for which only partial CDS has been cloned and deposited in GenBank
134 genes (254 AS events) that contain at least one short AS segment of ≤50 nucleotides / 237 genes in which no AS segment is ≤50 nucleotides
43 VSAS genes: each gene has a pair of variants that differ only in the presence or absence of one AS segment of ≤50 nucleotides / 91 genes with VSAS events that were not selected for the current study: the VSAS event involves the substitution of a VSAS segment with a longer segment, other AS events occur on the same transcripts as the VSAS segment, or the VSAS event lies at the beginning or end of the CDS region
^a Abbreviations: AS, alternatively spliced; AsMamDB, Alternative Splice Database of Mammals (http://166.111.30.65/ASMAMDB.html); CDS, complete coding sequence; VSAS, very short alternative splicing.
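The screening steps summarized in Table S1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used: the function names, the data layout (a dictionary of variants, each listing its alternative segments and their lengths) and the example variants are hypothetical, and the thresholds are assumed to be inclusive (≤100 bp for short exons, ≤50 bp for VSAS).

```python
from itertools import combinations

SHORT_MAX_BP = 100  # assumed inclusive upper bound for "short" alternative exons
VSAS_MAX_BP = 50    # assumed inclusive upper bound for VSAS segments


def length_class(length_bp):
    """Label an alternative segment by its length in bp."""
    if length_bp <= VSAS_MAX_BP:
        return "VSAS"
    if length_bp <= SHORT_MAX_BP:
        return "short"
    return "other"


def single_vsas_pairs(variants):
    """Find variant pairs that differ by exactly one VSAS segment.

    variants maps a variant name to a dict {segment_id: length_bp} listing
    the alternative segments present in that variant (hypothetical layout).
    """
    pairs = []
    for name1, name2 in combinations(sorted(variants), 2):
        seg1, seg2 = variants[name1], variants[name2]
        differing = set(seg1) ^ set(seg2)  # segments present in only one variant
        if len(differing) != 1:
            continue  # substitutions or multiple AS events are excluded
        segment = next(iter(differing))
        length_bp = seg1[segment] if segment in seg1 else seg2[segment]
        if length_class(length_bp) == "VSAS":
            pairs.append((name1, name2, segment))
    return pairs


# Made-up variants of one hypothetical gene, for illustration only.
example = {
    "v1": {"e2": 36, "e5": 150},
    "v2": {"e5": 150},
    "v3": {},
}
print(single_vsas_pairs(example))  # [('v1', 'v2', 'e2')]
```

The sketch does not check whether a segment lies at the beginning or end of the CDS region; such events were also excluded from the study and would need an additional positional filter.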
The protein secondary structures were predicted using the PHD method (http://cubic.bioc.columbia.edu/predictprotein/) with the default parameters. Predictions were performed on the full sequences of the protein variants, and the predicted secondary structure of each VSAS segment was compared with those of its flanking regions. If the VSAS fragment has a secondary structure different from the flanking structures on both sides, the VSAS fragment is considered to insert a distinct secondary structure or domain (group A); otherwise, the VSAS fragment generates a secondary structure identical to that on either of the two sides (group B). Within each of the two groups, the proportions of loop, α-helix and β-sheet were further analyzed. Fisher's exact test was used to assess the significance of the differences in these proportions between the two groups.
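The grouping rule and the statistical comparison described above can be illustrated with a short Python sketch. It assumes the flank(segment)flank notation used in Table S2 (for example, H(L)H) and makes one assumption not stated in the text: a segment with a mixed prediction such as LE is treated as different from any single-state flank. The function names are hypothetical, and the 2×2 counts in the usage example are placeholders, not results from this study.

```python
import re
from math import comb

# Table S2 notation: left flank, (VSAS segment), right flank, e.g. "H(L)H".
NOTATION = re.compile(r"^([HEL])\(([HEL]+)\)([HEL])$")


def classify(notation):
    """Return 'A' if the VSAS segment differs from both flanks, else 'B'."""
    match = NOTATION.match(notation)
    if match is None:
        return None  # entries in another form (e.g. "H/EL") are left unclassified
    left, segment, right = match.groups()
    # Assumption: a mixed segment such as "LE" never equals a one-letter flank.
    return "A" if segment != left and segment != right else "B"


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


for entry in ["H(L)H", "L(L)L", "E(LE)L", "H/EL"]:
    print(entry, classify(entry))

# Placeholder counts (NOT from the study): rows are groups A and B,
# columns are "loop" and "not loop" VSAS segments.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

A standard-library implementation is used here only to keep the sketch self-contained; an established statistics package would be the usual choice for a real analysis.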
Homology modeling for 3D-structure prediction was performed with SWISS-MODEL (http://www.expasy.org/swissmod/SWISS-MODEL.html), and the structures were viewed with RasMol. Based on amino acid identity with sequences of known 3D structure, the structures of human interleukin 4 (IL-4) [Protein Data Bank (PDB) accession numbers 1BCN and 1BBN] were used by the internet modeling server as the main templates for the shorter alternative variant, IL-4δ2.
In our comparative study with the mouse and rat genomes, we used the protein sequences of the studied human VSAS genes to search for homologous genes in GenBank using BlastP with the default parameters. Genes with high sequence similarity were regarded as homologous, and all mouse and rat genes among them were retained. Next, using the annotation in GenBank, we examined whether identical alternative-splice events were also reported for these homologous genes in the rat and mouse. If an identical VSAS event is found in rodents, it is regarded as a conserved VSAS event; otherwise, it is considered not conserved.
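The final decision rule can be written down as a small Python sketch. In the study, the comparison relied on examining GenBank annotations; the tuple layout and the matching criterion used here (same alternative type and same segment length) are assumptions made purely for illustration, and the function name and example values are hypothetical.

```python
RODENT_SPECIES = {"Mus musculus", "Rattus norvegicus"}


def is_conserved(human_event, homolog_events):
    """Return True if a rodent homolog reports an event matching the human one.

    human_event is (alt_type, segment_length_bp); homolog_events is a list of
    (species, alt_type, segment_length_bp) taken from homolog annotations.
    Matching on type and length is an assumed stand-in for "identical event".
    """
    return any(
        species in RODENT_SPECIES and (alt_type, length_bp) == human_event
        for species, alt_type, length_bp in homolog_events
    )


# Made-up annotations, for illustration only.
print(is_conserved(("ES", 36), [("Mus musculus", "ES", 36)]))   # True
print(is_conserved(("ES", 36), [("Gallus gallus", "ES", 36)]))  # False
```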
Figure S1. Histogram of the length distribution of the 914 alternative exons.
Table S2. The predicted protein secondary structures of the 57 genes used in this study^a
No.^b / Genes / Description / Alternative type / Predicted secondary structure^c /
1 / FOXM1 / Human hepatocyte nuclear factor-3 / ES / H(L)H /
2 / SPP1 / Secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation 1) / ES / L(L)L /
3 / BPAG1 / Bullous pemphigoid antigen 1 / Alt3 / H(E)H /
4 / CALCR / Calcitonin receptor / ES / E(H)E /
5 / SFTPC, SP-C / Surfactant, pulmonary-associated protein C / Alt3 / H(E)L /
6 / TAC1 / Tachykinin, precursor 1 / ES / L(H)L /
7 / PS1, PSEN1 / Presenilin 1 (Alzheimer disease 3) / Alt5 / L(E)L /
8 / CASP8 / Caspase 8, apoptosis-related cysteine protease / ES / H(L)E /
9 / MEN1 / Multiple endocrine neoplasia I / Alt5 / H(L)E /
10 / NTRK3, TRKC / Neurotrophic tyrosine kinase, receptor / ES / E(L)E /
11 / DCX / Doublecortex; lissencephaly, X-linked (doublecortin) / Alt5 / L(L)L /
12 / PPAP2A / Phosphatidic acid phosphatase type 2a / Alt3 / E(E)E /
13 / MLH1, hMLH1 / mutL (E. coli) homolog 1 (colon cancer, nonpolyposis type 2) / Alt3 / L(L)H /
14 / MBP / Myelin basic protein / ES / L(E)L /
15 / FDXR, ADXR / Adrenodoxin reductase / Alt3 / L(H)L /
16 / VEGF / Vascular endothelial growth factor / Alt5 / L(E)L /
17 / IL4 / Interleukin 4 / ES / H(E)H /
18 / NACP, SNCA / Synuclein, α (non A4 component of amyloid precursor) / ES / E(LE)H /
19 / SPTAN1, SPTA2 / Spectrin, α, non-erythrocytic 1 (α-fodrin) / Alt5 / H(H)H /
20 / MYL6, MLC / Myosin, light polypeptide 6, alkali, smooth muscle and non-muscle / ES / H(L)H /
21 / CREB1 / cAMP responsive element binding protein 1 / ES / E(H)E /
22 / KIT / v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog / Alt5 / H/EL /
23 / MCP / Membrane cofactor protein (CD46, trophoblast-lymphocyte cross-reactive antigen) / ES / L(E)L /
24 / CBS / Cystathionine-β-synthase / ES / E(H)L /
25 / HOMER-2B / Homer, neuronal immediate early gene, 2 / Alt3 / L(L)L /
26 / LILRA2, ILT1c, LIR-7 / Leukocyte immunoglobulin-like receptor, subfamily A (with TM domain), member 2 / ES / E(LE)L /
27 / MINK / Misshapen/NIK-related kinase / ES / L(H)E /
28 / ARRB1 / Arrestin, β 1 / ES / L(L)L /
29 / PDE5 / Phosphodiesterase 5A, cGMP-specific / Alt5 / H(H)H /
30 / SPRR3 / Small proline-rich protein 3 / IS / L(L)L /
31 / CDC25B / Cell division cycle 25B / ES / L(E)L /
32 / KCNAB2 / Potassium voltage-gated channel, shaker-related subfamily, β member 2 / ES / E(LH)L /
33 / CTNND1 / Catenin (cadherin-associated protein), δ 1 / Alt5 / L(L)L /
34 / GLRA3 / Glycine receptor / Unknown / L(HE)L /
35 / PTA / Human pre-T-cell receptor α / Alt3 / L(H)L /
36 / TPR / Translocated promoter region (to activated MET oncogene) / Alt3 / H/LL /
37 / PACE4 / Paired basic amino acid cleaving system 4 / ES / L(LE)L /
38 / PTB / Polypyrimidine tract binding protein / Alt3 / H(LE)L /
39 / SNRP70 / Small nuclear ribonucleoprotein 70kD polypeptide (RNP antigen) / Alt3 / L(L)L /
40 / RANBP3 / RAN binding protein 3 / Alt5 / H(H)H /
41 / CASR / Calcium-sensing receptor / Alt3 / E(E)E /
42 / OCRL / Oculocerebrorenal syndrome of Lowe / ES / H/LH /
43 / NUMA1 / Nuclear mitotic apparatus protein 1 / ES / H(H)H /
44 / GRP / Gastrin-releasing peptide precursor (GRP) / Alt5 / L(E)L /
45 / Troponin T / Troponin T, slow skeletal / Unknown / H(L)H /
46 / TLX / Membrane cofactor protein / IS / L(L)L /
47 / MUC-1 / Mucin 1 precursor / Alt3 / H(E)L /
48 / CREB / cAMP response element bin / Alt3 / E(L)E /
49 / Hu-antigen D / ELAV-like protein 4 (Paraneoplastic encephalomyelitis antigen HuD) / Alt3 / L(L)L /
50 / Cdc25B / M-phase inducer phosphatase 2 / ES / H(L)L /
51 / CT-R / Calcitonin receptor precursor / ES / H(HL)H /
52 / N-CAM L1 / Neural cell adhesion molecule L1 precursor / ES / L(L)L /
53 / Na(+)/Ca(2+)-exchange protein 3 / Na+/Ca2+ exchanger isoform 3 / ES / L(E)L /
54 / PMCA2 / Plasma membrane calcium-t / Alt5 / H(L)H /
55 / MSCF2 / Myocyte-specific enhancer / ES / L(L)L /
56 / Hu-antigen B / ELAV-like protein 2 / ES / L(E)L /
57 / HCC1 / RNA-binding region / ES / L(E)L /
^a Abbreviations: Alt3, alternative 3′ site; Alt5, alternative 5′ site; E, β-sheet; ES, exon skipping; H, α-helix; IS, intron skipping; L, loop.
^b Numbers 1–43 were collected from the Alternative Splice Database of Mammals (AsMamDB; http://166.111.30.65/ASMAMDB.html); numbers 44–57 were from the Alternative Exon Database (AEDB; http://www.ebi.ac.uk/asd/aedb).
^c The secondary structures in parentheses are those of the VSAS segments; the structures on either side of the parentheses are those of the flanking residues on the two sides of the VSAS segment.