SUPPLEMENTAL DATA (for online publication only)
Appendix A. Wheat SPS cDNA and protein sequences. The longest contiguous sequence from each SPS gene family is shown with the component sequences indicated in the heading. Protein coding regions are shown in upper case with the start and stop codons highlighted in bold, and the 5´- and 3´-UTRs are shown in lowercase. Non-wheat segments are underlined.
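The case convention described above lends itself to a simple programmatic check. The following is a minimal Python sketch, not part of the original analysis, that splits a mixed-case cDNA string into 5´-UTR, coding region and 3´-UTR and translates the coding region with the standard genetic code. The function names `split_cdna` and `translate` are hypothetical, and the example string is a shortened toy construct (the last bases of the TaSPSI 5´-UTR, its first nine codons joined directly to the TGA stop codon, and the first bases of its 3´-UTR). The sketch assumes a single uninterrupted upper-case block, so it would not handle a contig with a lower-case gap inside the coding region.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_cdna(seq):
    """Split a mixed-case cDNA into (5' UTR, coding region, 3' UTR).

    Assumes lower-case UTRs flanking one contiguous upper-case coding region.
    """
    match = re.fullmatch(r"([a-z]*)([A-Z]+)([a-z]*)", seq)
    if match is None:
        raise ValueError("expected lower-case UTRs around one upper-case block")
    return match.groups()


def translate(cds):
    """Translate an upper-case coding sequence, stopping at the first stop codon.

    Codons containing ambiguity codes (e.g. N) are reported as 'X'.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy construct (illustrative only, not a complete sequence from this appendix).
toy = "gagccgag" + "ATGGCGGTGGGGAACGAGTGGATCAAC" + "TGA" + "gagagcacaac"
utr5, cds, utr3 = split_cdna(toy)
print(len(utr5), len(cds), len(utr3))
print(translate(cds))
```

Applied to a full-length entry, the translated string can be compared directly with the corresponding protein sequence listed at the end of this appendix.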
The TaSPS1 clone (GenBank accession No. BE418274) was derived from a wheat leaf cDNA library and, from comparison with known SPS sequences, has a full length coding region encoding the 1055-aa wheat SPS1 protein (Table 1). Searches with the TaSPS1 sequence identified three groups of very closely related ESTs (see Supplemental Data, Appendix D). One group (TaSPSIa) is essentially identical (99% identity) to TaSPS1, extending throughout the 3´-UTR. A second group (TaSPSIb) shows 93-95% overall identity with TaSPS1, but has significant differences in the 3´-UTR, for example, a 41-bp insertion directly after the TGA translation stop codon. A third group contains sequences that have <99% identity with TaSPS1 but do not overlap with the dimorphic regions, so they might belong to the TaSPSIb group or to a third group. These sequences are collectively referred to as family I.
TaSPSI[1-39: BQ240343 (EST); 40-3460: TaSPS1 AF310160 (cDNA)]
ccacgcgtccgcggacgcgtgggggcagcagagtagctagctagctcgagggagagccga
gATGGCGGTGGGGAACGAGTGGATCAACGGGTACCTGGAGGCGATCCTCGACGCCGGGTC
GAAGCTGCGGGTGCAAGGGGTGTCGCTGCCGCCGCTGGAGCCAGCGCCGGCGCTCGCGTC
GGAGGAGTCCAGCGCCGCCTACAACCCCACCAGGTACTTCGTGGAGGAGGTCGTCCGGAG
CTTCGACGACCAGGCCCTCCACAAGACATGGACAAAGGTGGTGGCGATGCGGAACAGCCA
GGAGCGGAACAACCGGCTGGAGAACCTGTGCTGGAGGATCTGGAACGTCGCGAGGCAGAA
GAAGCAGGTGGAGAGGGATTACTCGCAGGAGGTCGCTCGGCGGAAGCAAGAGCAAGAGCT
GGGCAGCTTGGAGGCCGCCGAGGACCTCTCCGAGCTCTCGGAGGGCGAGAAGGAGACCGT
CCCCAAGCCGGACGGCGCCGCTGCACACCTGTCCGCCGACGAGCAGCAGCCGCAGCAGCG
CACCCGGCTGGCGAGGATCAACTCCGAGGTGCGGCTCGTCTCTGACGACGAGGACGAGCA
GAGCAAGGACAGAAACCTCTACATCGTCCTCGTCAGCATCCATGGGCTCGTGCGTGGAGA
GAACATGGAGCTCGGGCGAGACTCCGACACCGGAGGCCAGGTGAAGTACGTGGTGGAGCT
GGCCCGGGCGCTGGCGGCGACGGCGGGGGTGCACCGCGTGGACCTCCTGACGCGCCAGAT
CTCCTGCCCCGACGTCGACTGGACCTACGGCGAGCCGGTGGAGATGCTCGAGCGCCTGTC
CTCGGGCGACGACGACGGCGACGAGTCCGGGGGAGGCGGGGCGTACATCGTGCGGCTGCC
CTGCGGGCCACGCGACCAGTACATCCCCAAGGAGGAGCTCTGGCCGCACATCCCCGAGTT
CGTGGACCGCGCGCTCTCGCACGTCACCAACGTGGCGCGCGCGCTGGGCGAGCAGCTCCA
GCCGCCGCCCAGCGACGCCCCGGCGACGGCGCTGGCGGCGCCGGTGTGGCCGTACGTGAT
CCACGGGCACTACGCGGACGCGGCGGAGGTGGCGGCGAACCTCGCGAGCGCGCTCAACGT
GCCGATGGTGATGACGGGCCACTCGCTGGGGCGGAACAAGCTGGAGCAGCTGCTGAAGCT
GGGCCGCATGCACGGGCCCGAGATCCAGGGCACCTACAAGATCGCGCGGCGGATCGAGGC
GGAGGAGACCGGGCTGGACACGGCGGAGATGGTGGTCACCAGCACCAAGCAGGAGATCGA
GGAGCAGTGGGGCCTCTACGACGGCTTCGACCTCATGGTGGAGCGGAAGCTCCGCGTGCG
CCAGCGCCGCGGCGTCAGCAGCCTCGGCCGCTACATGCCGCGCATGGCGGTCATCCCGCC
CGGCATGGACTTCAGCTTCGTCGACACCCAGGACACCGCCGACGGGGACGGCGCCGACCT
CCAGATGCTCATCGACCCCGTCAAAGCCAAGAAGGCTCTGCCTCCCATTTGGTCAGAGAT
TCTGAGGTTCTTCACGAACCCGCACAAGCCGATGATCCTGGCGCTGTCGCGGCCGGACCC
GAAGAAGAATATCACCACGCTACTCAAGGCGTACGGCGAGAGCCGAAAGCTCCGGGAGCT
CGCCAACCTGACGCTGATACTGGGGAACAGAGATGACATCGACGACATGGCCGGCGGCGG
CGGCACGGTGCTCACGGCGGTGCTGAAGCTCATCGACCGCTACGACCTCTACGGCCAGGT
GGCTTATCCCAAGCACCACAAGCAGACGGACGTGCCTCACATCTACCGCCTCGCCGCCAA
GACCAAGGGAGTGTTCATCAACCCGGCTCTTGTAGAGCCGTTCGGCCTCACAATCATCGA
GGCCGCCGCTTATGGTCTGCCCGTGGTGGCGACCAAGAACGGCGGGCCGGTGGACATCCT
CAAGGCGCTTCACAACGGCCTGCTGGTGGACCCGCACTCCGCCGAGGCGATCACCGGCGC
GCTGCTCAGCCTGCTGGCCGACAAGGGGCAGTGGCTGGAGAGCCGCCGCAACGGCCTGCG
CAACATCCACCGCTTCTCCTGGCCGCACCACTGCCGCCTCTACCTCTCCCACGTCGCCGC
CTACTGCGACCACCCGTCGCCGCACCAGCGGCTCCGCGTCCCTGGCGTCCCGGCCGCCTC
GGCGAGCATGGGCGGCGACGACTCCCTCTCGGACTCGCTCCGTGGCCTCTCGCTCCAGAT
CTCCGTGGACGCCTCCAGTGACCTCAATGCCGGGGACTCCGCCGCGCTGATCATGGACGC
CCTACGCCGCCGCCCGGCGGCCGACAGGCGCGAGGGCTCCGGCAGGGCGTTGGGCTTCGC
GCCCGGCAGGAGGCAGAGCCTCCTTGTCGTCGCCGTCGACTGCTACTGCGACGACGGCAA
GCCCGACGTCGAGCAACTGAAGAAAGCCATCGACGCGGCGATGTCCGCCGGTGACGGCGC
GGGAGGGCGGCAGGGGTACGTGCTCTCGACCGGCATGACCATCCCCGAGGCCGCGGAGAC
GCTCAAGGCCTGCGGCGCCGACCCGGCCGGCTTCGACGCGCTGATTTGCAGCAGCGGCGC
GGAGATATGCTACCCGTGGAAGGAGCTCACGGCCGACGAGGAGTACTCGGGCCACGTGGC
GTTCCGGTGGCCCGGCGACCACGTGAAAACCGTCGTGCCGAGGCTCGGGAAGGCTGAGGA
CGCGCAGGCGTCCGACCTCGCCGTCGATGTGTCCGCCGGCTCCGTGCACTGCCACGCCTA
CGCCGCCACCGACGCGTCCAAGGTGAAGAAGGTGGATTCGATCAGGCAGGCGCTGCGGAT
GCGCGGGTTCCGGTGCAACCTCGTCTACACGCGCGCCTGCACGCGCCTCAACGTCATCCC
TCTCTCCGCTTCCCGCCCACGCGCCTTGAGGTACCTGTCGATACAGTGGGGCATCGATCT
CGCCAAGGTGGCGGTGCTCGTCGGCGAGACCGGGGACACCGACCGGGAGAAGCTCCTCCC
GGGGCTGCACAGGACGCTGATCCTGCCGGGGATGGTCTCACGCGGCAGCGAGCAGCTCGT
CCGCGGCGAGGACGGGTACGCCACGCAGGACGTCGTGGCCATGGACTCCCCCAACATCGT
CACGCTCGCTCAAGGCCAGGCTGTCTCCGACCTTCTCAAGGCCATGTGAgagagcacaac
tcgtacgtaatgtaattttggcaggaagatgactgcagaatttgcatacaaggtagtata
aatttatggatgtgcaagcatgagcaaacatgtggcaaataattttttatgtcttagcat
gcctccctgaggtctgttgtacatatatatacactttataaatgaatagtatataagact
tggaggataaaaaaaaaaaaaaaaaaaaaaaaaaaaa
The second family (family II) of SPS genes from wheat includes TaSPS2 (AF347064) and TaSPS8 (AF354298), which share 98% nucleotide identity in the protein coding region, decreasing to 92% in the 3´-UTR. Two shorter clones, TaSPS3 (AF347065) and TaSPS4 (AF347066), are essentially 100% identical to the 3´-ends of TaSPS2 and TaSPS8, respectively. TaSPS4 differs from TaSPS8 in the presence of a 792-bp insert; however, this seems likely to be an unspliced intron, as it shows no similarity to other SPS-encoding sequences and is flanked at both ends by canonical AG^GT intron splice sites. The wheat ESTs in family II were assigned to three groups (Supplemental Data, Appendix D) based on the presence of insertions-deletions (indels) in the 3´-UTR and single nucleotide polymorphisms (SNPs). Sequences in groups TaSPSIIa and TaSPSIIb are essentially identical to TaSPS2 and TaSPS8, respectively, whilst the third group, TaSPSIIc, differs from both. A few sequences with <99% identity to TaSPS2 could not be assigned to any group as they do not overlap with the polymorphic regions.
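Identity values such as those quoted above for the coding region and the 3´-UTR are computed separately for each region of a pairwise alignment. The text does not state which alignment program or identity definition was used, so the following Python sketch is only one common convention, given here as an assumption: matches divided by aligned columns, with a gap opposite a base counted as a mismatch. The function name `percent_identity` is hypothetical and the two short aligned fragments are invented toy data, not sequences from this appendix.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity between two pre-aligned sequences of equal length.

    '-' marks an alignment gap. Columns where both sequences have a gap are
    ignored; a gap opposite a base counts as a mismatch. Case is ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Invented toy fragments: one mismatch in 21 coding columns,
# one mismatch and one indel in 10 UTR columns.
coding_identity = percent_identity("ATGGCGGGGAACGACTGGATC",
                                   "ATGGCCGGGAACGACTGGATC")
utr_identity = percent_identity("aatcacc-ag",
                                "aatcgccaag")
print(f"coding region: {coding_identity:.1f}%")
print(f"3'-UTR: {utr_identity:.1f}%")
```

Other conventions (for example, excluding gapped columns from the denominator) give slightly different values, which matters most in indel-rich regions such as the 3´-UTRs.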
TaSPSIIa[1-21: Lolium perenne SPS2 (cDNA)/rice OsSPS8 (cDNA AK101676)/maize SPS2/Bambusa oldhamii SPS (cDNA AY445835); 22-233: BG604750 (EST); 234-3653: TaSPS2 AF347064 (cDNA)]. The TaSPSIIa contig is predicted to be missing 21 bp from the 5´-end of the coding region. Orthologous sequences from rice (OsSPS8, see Appendix B), maize (ZmSPS2, see Appendix C), Lolium perenne (Demmer et al., 2003) and Bambusa oldhamii (AY445835) all contain the sequence ATGGC(G/C)GGGAACGACTGGATC in this region, which encodes the invariant 7-aa peptide MAGNDWI. Therefore, for the phylogenetic and other analyses it was assumed that the wheat SPS2 protein would also begin with this sequence.
ATGGCNGGGAACGACTGGATCAACAGCTACCTGGAGGCCATCCTCGACGCGGGCGGCGCC
GCGGGGGACATCACGGCCGCCTCCGTCGCCTCCGCGGCGCCCGGGGGCGGCGCGGGCTCC
GCGGCGGCGGAGAAGAGGCGGGACAAGGCGTCGCTGATGCTGCTGGAGCGCGGCCGCTTC
AACCCGGCGCGCTACTTCGTGGAGGAGGTCATCTCCGGCTTCGACGAGACCGACCTCTAC
AAGACCTGGGTCCGCACGTCGGCGATGAGGAGCCCGCAGGAGCGGAACACGCGGCTGGAG
AACATGTCCTGGCGGATCTGGAACCTCGCCAGGAAGAAGAAGCAGATTGAAGGCGAGGAA
GCCTCCCGCTCATCGAAGAAACGTCTTGAGCGTGAGAAGGCCCGTCGAGATGCTGCTGCT
GATTTGTCTGAGGACCTATCTGACGGAGAAAAAGGAGAACATATTAATGAATCATCTATT
CACGCTGAGAGTACAAGGGGACACATGCCAAGGATAGGTTCAACTGATGCTATTGATGTT
TGGGCAAATCAGCACAAAGATAAAAAACTGTACATAGTGCTAGTAAGCATTCATGGTCTT
ATACGTGGTGAGAATATGGAGCTTGGGCGTGATTCAGATACAGGTGGCCAGGTCAAATAT
GTTGTAGAGCTTGCTAGGGCGTTAGGTGAAACACCTGGAGTATATAGAGTGGATCTGCTG
ACAAGGCAGATTTCTGCACCTGATGTTGATTGGAGTTATGGGGAACCTACAGAGATGCTG
AGTCCAAGAAATTCAGAGAATCTTGGGGATGACATGGGTGAAAGCAGTGGTGCTTATATT
GTCAGGATACCATTTGGGCCAAGAGAAAAGTATATCCCTAAAGAGCAGCTCTGGCCCCAC
ATCCAGGAATTTGTTGACGGTGCACTTGTCCATATCATGCAAATGTCCAAGGTTCTTGGA
GAACAAGTTGGCAATGGCCAACCAGTATGGCCTGTTGTTATCCATGGACACTATGCTGAT
GCAGGCGATTCTGCTGCTTTATTATCTGGGGCACTCAATGTGCCAATGGTCTTCACAGGT
CATTCTCTTGGCAGAGACAAGTTAGAGCAACTTCTAAAGCAAGGGCGTCAAACCAGGGAT
GAAGTAAATGCAACATACAAGATAATGCGACGGATTGAGGCTGAGGAACTTTGTCTTGAT
GCATCTGAAATTGTAATCACTAGCACTAGGCAAGAGATAGATAAACAATGGGGATTATAT
AATGGATTTGATGTGATTATGGAGAGGAAACTTAGAGCAAGAATAAAGCGTGGTGTTAGC
TGCTATGGTCGTGAAATGCCTCGTATGGTTCCAATTCCTCCTGGTATGGAGTTCAGCCAT
ATAGTGCCTCATGATGTTGATCTTGATAGTGAAGAAGCGAATGAAGTTGGCTCAGATTCA
CCAGATCCACCTGTTTGGGCCGATATAATGCGCTTCTTCTCGAACCCTCGCAAGCCCATG
ATTCTCGCTCTTGCTCGTCCAGATCCCAAGAAGAACATCACTACGCTGGTTAAGGCATTT
GGTGAACACCATGAACTGAGAAATTTAGCAAACCTTACGTTGATCATGGGTAACCGTGAT
GTTATTGATGAAATGTCAAGCACAAATGGAGCTGTTTTGACATCAGTACTCAAGTTAATT
GACAAGTATGATCTATATGGGCAAGTGGCATACCCCAAGCACCATAAGCAATCTGAAGTT
CCAGATATTTATCGTCTAGCGGCAAGAACAAAGGGGGTGTTTATTAACTGTGCTTATATT
GAACCATTTGGGCTCACCTTGATCGAGGCTGCTGCTTATGGTCTACCTATGGTTGCTACC
CAAAATGGTGGGCCTGTCGATATACACCGGGTTCTTGACAATGGCATTCTTGTTGATCCC
CACAATCAAAATGATATAGCTGAGGCACTTTATAGACTTGTTTCTGATAAGCAATTGTGG
GCAAAATGCCGTCAGAATGGTCTGGATAATATCCATCGATTTTCTTGGCCTGAACATTGC
AAAAACTATTTGTCACGGGTTGGTACGCTCAAGTCTAGACATCCACGATGGCAAAAGAGC
GATGATGCTACTGAAGTTTCTGAAACAGATTCACGTGGTGACTCTTTGAGGGATATTCAT
GATATATCACTTAACTTGAAGATCTCCTTGGACAGTGAAAAATCAGGCAGCATGTCAAAA
TATGGAAGGAGTTCAACCAGTGACAGGAGAAACCTTGAGGATGCTGTACAAAAATTTTCA
GAAGCTGTTAGTGCTGGCACAAAGGATGAGTCTGGTGAGAAAGCTGGGGCCACCACAGGC
TCCAATAAATGGCCATCTCTGCGAAGGAGAAAACACATCGTTGTTATTGCTGTAGATTCT
GTGCAAGATGCGGACTTGGTTCAGATTATCAAAAACATTTTTCAGGCTTCAAACAAAGAA
AAATCATCTGGTGCTCTTGGTTTTGTATTGTCAACATCTCGAGCAGCATCAGAGATACAT
CCTTTGTTAACATCTGGGGGCATAGAAATTACTGATTTTGATGCCTTCATATGCAGCAGT
GGCAGTGATCTTTGCTATCCATCTTCAAATTCAGAAGACATGCTTAGCCCTGCCGAGCTT
CCATTTATGATCGATCTTGATTATCACTCTCAGATTCAATATCGTTGGGGAGGAGAAGGT
TTAAGGAAGACACTAATTCGTTGGGCAGCAGAAAAGAATAGCGAGAGTGGAAAAGAAGCA
GTTGTTGAAGATGACGAATGTTCATCCACTTACTGCATTTCATTTAAAGTGAAGAATACT
GAGGCTGTCCCTCCTGTGAAGGATCTTAGGAAGACAATGAGAATTCAAGCATTGCGCTGT
CATGTATTGTACAGCCATGATGGCAGCAAGTTGAACTTCATACCTGTTCTAGCATCACGA
TCCCAAGCACTAAGGTACTTGTATATAAGGTGGGGCGTAGAGCTGTCGAACATGACGGTG
GTTGTTGGTGAAAGCGGCGATACAGATTATGAAGGGCTACTCGGAGGCGTGCAGAAGACC
ATCATACTCAAAGGCTCATTTAATTCCGCGCCAAACCAGCTTCATGCCGCCAGAAACTAT
TCGCTAGAGGATGTCGTATCGTTTGACAAGCCAGGAATTGCTTCCGTCGACGGTTATGCC
CCAGATATCCTAAAATCAGCTCTACAACAATTTGGTGCCCTGCAGGGCTAGaatcaccag
ctttgtcagagtcagcagaggaataaagtaggtgtgtggaggcacacttttgctgtgggg
tcgcacacgagacacggcggcctcgtcccgtgcctggtatctggatagggcgcgaccgtg
gcggcaatcgagcacaagggtgtggcttgcatgctcggtttatcccttcggattttgtga
cctacattccattttcaacgcatggtgaaattgttcataggttccaaataaatacagtta
gtatcgaactatggtcataaagcgtgtgctatagggaatgaagtaccctgttgtttggaa
accatatatatacccaacaatgtgcatgcatctctacgaataattagcatctccaagggg
cttaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
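The assumed 5´-end of the TaSPSIIa coding region, ATGGC(G/C)GGGAACGACTGGATC, can be checked for the claim that both nucleotide variants encode the same 7-aa peptide. The Python sketch below is illustrative only and is not part of the original analysis; the helper names `expand_alternatives` and `translate` are hypothetical. It expands the (G/C) alternative into the two concrete 21-bp sequences and translates each with the standard genetic code.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def expand_alternatives(pattern):
    """Expand '(G/C)'-style alternatives into every concrete sequence."""
    # re.split with a capturing group puts the alternatives at odd indices.
    parts = re.split(r"\(([A-Z/]+)\)", pattern)
    choices = [p.split("/") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


def translate(cds):
    """Translate complete codons of an upper-case sequence ('X' if unknown)."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


variants = expand_alternatives("ATGGC(G/C)GGGAACGACTGGATC")
peptides = {v: translate(v) for v in variants}
for seq, pep in peptides.items():
    print(seq, pep)
```

Both variants differ only in the third position of the second codon (GCG or GCC), which is why the encoded peptide is unchanged.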
The third family (family III) of wheat SPS genes includes the cDNA clone TaSPS9 (AF534907), which encodes a full length 964-aa protein (wheat SPS9, Table 1). Three groups (TaSPSIIIa [=TaSPS9], TaSPSIIIb and TaSPSIIIc) of TaSPS9-like ESTs were found, based on indels in the 3´-UTRs and SNPs, with a small number of other unassigned sequences.
TaSPSIIIa[TaSPS9 AF534907 (cDNA)]
ctcacttcctcctcctccccacctccctcttttggcttagcctccactcttcccatcccc
cgatctctccggggctgcggcggcggctgcggcggcggcggcgaagATGGTGGGCAACGA
CAACTGGATCAACAGCTACCTGGACGCCATCCTCGACGCCGGCAAGTCGGCCGTCGGCGG
CGACCGCCCCTCCTTGCTCCTCCGCGAGCGTGGCCACTTCTCCCCGGCCCGCTACTTCGT
CGAGGAGGTCATCACCGGCTACGACGAGACCGACCTCTACAAGACATGGCTCCGGGCGAA
CGCGATGCGGAGCCCGCAGGAGAGGAACACGCGGCTGGAGAACATGACATGGAGGATCTG
GAACCTCGCGAGGAAGAAGAAGGAGTTAGAGAAAGAAGAAGCCTGTCGTTTGTTGAAAAG
GCATCCAGAAACCGAGAAAACACGAATTGATGCTACAGCGGATATGTCTGAAGATCTCTT
TGAAGGTGAAAAGGGAGAAGATGCTGGTGACCCATCTGTTGCCTATGGTGATAGCACCAC
AGGGGTCTCACCCAAGACAAGTTCAGTTGACAAGCTGTACATTGTATTGATCAGTCTCCA
TGGTCTTGTCCGTGGTGAGAATATGGAGCTAGGCCGAGATTCAGATACTGGTGGCCAGGT
CAAATATGTGGTTGAATTTGCTAAAGCATTGAGTTCGTCTCCTGGAGTTTACCGGGTTGA
TTTGCTGACAAGACAAATTTTGGCACCAAATTTTGATCGTAGTTATGGTGAACCTGCAGA
AATGTTGGTTTCAACAACCTTTAAAAATTCCAAACAGGAAAAGGGAGAGAACAGTGGTGG
ATACATCATTCGGATACCATTTGGACCAAGAGATATGTACCTGACTAAAGAACGTCTATG
GCCTTTCATTCAAGAATTTGTTGATGGTGCACTCAGCCATATTGTGCGGATGTCAAAAAC
AATTGGTGAAGAAATCGGCTGTGGGCATCCAGTCTGGCCTGCTGTGATTCATGGGCATTA
CGCCAGCGCGGGAATAGCTGCTACCCTGTTATCAGGAGCACTTAACCTGCCTATGGCATT
TACAGGACATTTCCTTGGGAAAGATAAACTGGAAGGGCTTCTCAAGCAAGGGCGACAATC
AAGGGAAGAAATAAATATGACATACAAAATAATGCGCCGAATCGAGGCGGAGGAATTGTC
TCTTGATGCATCTGAAATTGTTATTGCTAGTACTAGGCAAGAGATAGAAGAGCAGTGGAA
CTTGTATGATGGTTTTGAGGTCATACTTGCAAGGAAGCTTCGAGCAAGAGTCAAGCGTGG
TGCTAACTGCTATGGGCGTTATATGCCTCGTATGGTTATAATTCCTCCTGGTGTTGAATT
TGGTCATATAATTCATGATTTCGATATAGATGGTGAAGAAGAAAATCATGGCCCAGCCTC
TGAGGATCCGCCTATCTGGTCTCAGATAATGCGCTTCTTTACAAATCCTCGGAAGCCTAT
GATTTTGGCTGTTGCCCGTCCATATCCTGAAAAGAATATAACAACACTTGTAAAGGCATT
TGGTGAGTGCCGCCCACTGAGGGAGCTCGCAAATCTCACACTAATAATGGGTAACCGTGA
GGCTATTTCAAAGATGCACAACACGAGTGCTTCTGTCTTGACATCTGTGCTTACACTAAT
AGACGAATACGATTTGTATGGTCAAGTGGCATACCCCAAGCATCACAAGCACTCTGAAGT
TCCTGACATTTACTGTTTAGCCACAAGAACTAAGGGGGCTTTTGTTAACGTGGCTTATTT
TGAACAATTTGGTGTTACCTTGATAGAGGCCGCTATGAATGGTTTGCCTGTTATTGCTAC
AAAAAATGGAGCTCCTGTTGAAATTCATCAGGTGCTCAACAATGGTCTCCTTGTCGATCC
ACATGATCAGAATGCCATTGCAGATGCACTGTATAAACTTCTTTCCGAGAAGCAACTTTG
GTCAAGGTGCAGAGAAAATGGACTAAAAAATATTCACCAATTTTCCTGGCCTGAACATTG
CAAGAATCACCTGTCAAGGATATTGACTCTTGGCATGAGATCTCCTGCTGTCGGTAGCGA
AGAGGAAAGGAGTAAGGCACCTATATCAGGAAGGAAGCATATCATTGTTATTTCTGTAGA
CTCTGTTAACAAGGAGAATCTAGTGCGGATCATCAGAAATGCGATTGAGGCCGCACATAC
AGAAAACACACCGGCTTCAACTGGTTTCGTGCTGTCAACTTCGCTAACAATATCAGAGAT
ATGTTCACTGCTAGTATCTGTAGGCATGCATCCTGCTGGTTTTGATGCTTTCATCTGCAA
CAGTGGGAGTAGCATTTACTATCCTTCATATTCTGGTAATACGCCAAGCAATTCCAAGGT
TACCCATGTAATAGATCGAAATCATCAATCACATATTGAGTATCGTTGGGGAGGAGAAGG
TCTAAGAAAGTATCTTGTGAAATGGGCTACTTCAGTGGTTGAAAGAAAGGGAAGAATTGA
AAGGCAAATGATTTTTGAAGATTCAGAACACTCTTCTACATATTGTCTTGCATTTAAAGT
GGTCAATCCAAATCATCTGCCTCCCCTAAAGGAGTTGAGGAAGTTGATGAGAATCCAGTC
GCTCCGTTGTAATGCGCTGTATAACCACAGCGCTACCAGACTGTCTGTAACTCCTATTCA
TGCGTCACGTTCTCAGGCAATAAGGTACTTGTTTGTACGGTGGGGGATAGAGTTGCCAAA
TATCGTGGTCATGGTTGGTGAAAGTGGTGATTCCGATTACGAAGAGCTGCTAGGGGGTCT
CCACAGGACCATAATCCTGAAGGGCGACTTCAACATTGCTGCAAACAGAATCCACACAGT
CCGGAGATACCCCTTGCAGGATGTCGTTGCACTGGACAGCTCCAACATCATCGAAGTCCA
GGGTTGCACTACAGAGGACATCAAGTCTGCCCTGCGTCAGATTGGTGTGCCGACACAATA
Acatctttgcgcgcaccacacgaaaaggaagaagaaaaggagaggaagaacgagccaaac
cgagcgccactatttccatacctgatgggaatgtcgattttgtttgtagattgtagagtg
tgggtgtggtatattctcgagctgtgaataacttccaccttttgtttgtactattcacaa
attttgaagtggacaatatcgataaatgtagtgggaaaacaaatgtgagcagaaaagtca
tttgggaactgagatgccccgaaaatacagacaaggcgggagcctaaatggattaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
The fourth family of wheat SPS genes (family IV) was initially discovered by RT-PCR from wheat leaf RNA, which yielded a 674-bp fragment that was highly similar (87% identity) to the protein coding region of cDNA clone TaSPS9 (=TaSPSIIIa), but was more divergent than the TaSPSIIIb and TaSPSIIIc sequences within family III, which have >94% nucleotide identity with TaSPS9. The 674-bp RT-PCR fragment was linked to a partial cDNA clone, TaSPS7 (AF347069), by a second RT-PCR. This produced three groups of very similar 683-bp fragments, of which one had 5´- and 3´-ends that showed 100% identity with the 3´-end of the 674-bp fragment and 5´-end of TaSPS7, respectively. The resulting contiguous sequence was extended at the 5´-end by 5´-RACE. The longest product amplified by this procedure contained 428 bp, extending the overall contiguous sequence (TaSPSIVa) to a total length of 2517 bp. The 5´-end (520 bp) of an orthologous rice gene (OsSPS6) was used to search the ESTs from wheat and other species in the Triticeae. The only matching ESTs found were from rye (Secale cereale; 95% nucleotide identity) and barley (86-95% nucleotide identity), with the rye EST (BE494645) containing the likely translation initiation codon. From comparison with these sequences, it was estimated that the contiguous wheat SPSIVa sequence covers 2387 bp (82%) out of a 2919-bp protein coding region. Two groups of wheat ESTs in family IV were identified based on SNPs and a 3-bp indel in the 3´-UTR, with seven other sequences unassigned (Supplemental Data, Appendix D).
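The coverage estimate quoted above for TaSPSIVa is straightforward arithmetic, sketched here in Python as a worked example. The function name `coverage_percent` is hypothetical; the two input values are those given in the text, and the codon count simply assumes that the estimated 2919-bp coding region includes the stop codon.

```python
def coverage_percent(covered_bp, total_bp):
    """Percentage of a coding region covered by a partial contig."""
    if total_bp <= 0 or not 0 <= covered_bp <= total_bp:
        raise ValueError("need 0 <= covered_bp <= total_bp and total_bp > 0")
    return 100.0 * covered_bp / total_bp


covered_bp = 2387   # coding sequence present in the TaSPSIVa contig
total_bp = 2919     # estimated full coding region (from rye/barley ESTs)

pct = coverage_percent(covered_bp, total_bp)
missing_bp = total_bp - covered_bp   # coding sequence still missing at the 5' end
codons = total_bp // 3               # assumes the estimate includes the stop codon

print(f"coverage: {pct:.1f}%")
print(f"missing from 5' end: {missing_bp} bp")
print(f"estimated coding region: {codons} codons")
```

The unrounded value is just under 82%, consistent with the figure given in the text.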
TaSPSIVa[1-303: 5´RACE; 304-958: RT-PCR; 959-1234: RT-PCR; 1235-2517: TaSPS7 AF347069 (cDNA)]
GTGAGAACTTGGAGCTTGGCCGGGATTCAGACACTAGTGGGCAGGTCAAATATGTTGTGG
AACTTGCTAAAGCATTGAGTTCATGCCCTGGAGTATACCGAGTTGATCTGTTGACGAGGC
AAATATTAGCACCTAATTATGATCGTGGATATGGTGAACCGTCAGAGACACTGGTACCAA
CAAGCTTCAAGAATCTTAAACAGGAAAGAGGAGAGAACAGTGGTGCATATATCACCCGAA
TACCATTTGGACCGAAAGACAAGTATCTAGCTAAAGAACATCTCTGGCCTTACGTGCAAG
AATTTGTTGATGGTGCACTCAGTCATATAGTGCACATGTCAAAGATCATAGGTGAAGAAA
TCGGCTGTGGACATCCAATGTGGCCTGCTGTGATTCATGGTCATTATGCCAGTGCAGGAG
TTGCTGCTGCTCTGATATCTGGAGCACTTAACGTTCACATGGTATTTACTGGGCATTTTC
TTGGGAAAGACAAGTTGGAAGGGCTTCTCAAGCAAGGGAGACAGACAAGGGAAGAAATAA
ATATGACATACAAAATAATGCGCCGAATTGAAGCAGAAGAATTATCTCTTGATGCATCTG
AAATAGTAATTGCAAGTACTAGACAAGAGATAGAAGAGCAATGGAATTTGTATGATGGTT
TTGAGGTCATGCTTGCAAGGAAGCTTCGTGCGAGAGTCAAGCGTGGTGCTAATTGCTATG
GACGTTACATGCCTCGTATGGTTATAATTCCTCCAGGTGTTGAATTTGGCCATATGATTC
ATGAATTTGATATGGAAGGCGAGGAAGATAGCCATTCCCCAGCCTCTGAAGATCCGCCTA
TTTGGTCTGAGATAATGCGGTTCTTCACAAATCCTAGGAAACCTTTGATTCTGGCTGTTG
CTCGTCCATACCCTGAAAAGAATATTACAACACTTGTAAAAGCCTTTGGTGAGTGCCGAC
CATTGAGGGAGCTTGCTAACCTAACACTGATTATGGGTAACCGTGAAGCTATTTCCAAAA
TGAGTAATATGAGTGCAGCTGTTTTGACATCAGTACTTACATTGATTGATGAATATGATT
TGTATGGTCAAGTGGCATACCCAAAGCATCACAAACACTCAGAAGTTCTTGATATTTATC
GTTTAGCAGCGAGAACGAAGGGTGCTTTTGTAAATGTAGCTTACTTTGAACAATTCGGTG
TTACCTTGATAGAGGCTGCCATGCATGGTTTACCTGTAATTGCAACAAAAAATGGAGCTC
CTGTTGAAATTCACCAGGTGTTGGACAACGGCCTCCTTGTTGATCCCCACGATCAGCATG
CAATTGCAGATGCACTCTATAAGCTTCTTTCTGACAAGCAACTCTGGTCAAGATGTAGAG
AAAATGGGCTGAAAAATATACACCGGTTTTCTTGGCCTGAACATTGCAAGAATTACTTGT
CGAGGATATTAACTCTTAGCCCAAGATACCCTTCTTTTCCGAGCAATGAAGACCAGTTTA
AGGCACCTATCAAGGGAAGGAAGTGTATCATCGTTATTGCCGTAGACTCTGCCAGTAAGA
AAGATCTGGTCTGTATCATAAAAAATTCTATTGAGGCTACACGGAAAGAAACGTTGTCAG
GTTCAACAGGTTTTGTGTTGTCGACTTCCCTGACAATGTCAGAGATACATTCCCTATTAA
TATCTGCAGGCATGGCTCCTACAGATTTTGATGCTTTCATATGCAATAGTGGGAGTGATT
TATTTTACCCTTCGCGGGCTGGTGATTCACCAAGCACTTCCCGTGTGACATTTTCATTAG
ACCGTAATTACCAGTCTCATATCGAGTATCGTTGGGGAGGAGAAGGTTTAAGGAAGTACC
TAGTGAAGTGGGCTTCCTCGATAGTAGAAAGAAGGGGAAGAACTGAAAAACAAGTCATTT
TTGAAGATGCAGAGCACTCCTCAACAAGTTGCCTTGCATTTAGAGTGGTCAATCCAAATT
ATTTACCTCCTTTGAAGGAGCTGCAAAAGTTGATGAGAATCCAGTCACTGCGTTGCCATG
CTCTTTATAACCACAGTGCTACCAGGCTATCTGTAATTCCAATTCATGCATCACGGTCCC
AGGCTCTAAGGTACTTGTCTGTTCGTTGGGGCATAGAGTTGCGAAACGTCGTGATTCTTG
TCGGTGAAAGCGGCGACTCAGATTACGAAGAGCTGTTTGGAGGCCTTCACAAGACGATCG
TCCTGAAGGGCGAGTTCAACACGCCCGCAAACAGGATCCACACGGTCAGGCGGTACCCGC
TACAGGACGTCATCGCGCTCGATTGCTCGAACATCATCGGGGTCGAGGGCTGCAGCGCCG
ATGACCTGACGCCTACCCTGAAGGCGCTCGGCATACCGACAAAGTGAcagatagccatat
aatttttgcccttttttctttatacgatgagaggaccggacaatacacgaatatagcaaa
tatatactaccatcgtttccatgctcgatggaaatacaaaaaaaaaaaaaaaaaaaa
Wheat SPS family V is represented by two partial cDNA clones, TaSPS5 (AF347067) and TaSPS6 (AF347068), which show 96% identity to each other and are more similar (79-80% identity) to the original maize SPS sequence (Worrell et al., 1991) than the other wheat clones (Table 2). Both TaSPS5 and TaSPS6 are truncated at the 5´-end. Five ESTs (CA498087, CA498012, BJ244687, BJ259967 and BJ248938) form a contiguous sequence that was judged to form part of the missing 5´-end of the TaSPS6 coding region, based on the 3´-EST partners of BJ259967 and BJ248938 (BJ265682 and BJ255184, respectively, see Supplemental Data, Appendix D), which share >99% nucleotide identity with the 3´-UTR of TaSPS6. The resulting 5´- and 3´-end contigs together cover 3029 bp (>94%) out of an estimated 3213-bp protein coding region. Plasmids that should have contained the putative full length cDNA clones, CA498087 and CA498012, were obtained, but all attempts to reintroduce one of them into E. coli for sequencing were unsuccessful, and the other plasmid contained a different, unrelated clone from the one expected. A third clone (BJ244962) containing the 5´-end of a different but closely related gene (97% identity) was sequenced; however, the plasmid was found to contain a truncated chimeric cDNA insert with only 641 bp of SPS-like sequence (TaSPS10; AY425710). The ESTs in family V fall into three groups (TaSPSVa [=TaSPS5], TaSPSVb [=TaSPS6] and TaSPSVc), with three further unassigned sequences that do not overlap with the diagnostic polymorphic regions (Supplemental Data, Appendix D).
TaSPSVb[1-1405: CA498087, CA498012, BJ259967, BJ244687 & BJ248938 (EST); 1406-1588: 184 bp gap (predicted); 1589-2588: CA741331, BF484659, BJ256105 (EST); 2599-3614: TaSPS6 AF347068 (cDNA)]
gtcgacccacgcgtccggacagttccatcttccctctcgcccaacggagcatctctccct
ctctctcgcccaacggaacagacgcacgcgaatcgggccgccgcggcgggtgcatcacgA
TGGCCGGCAACGAGTGGATCAATGGCTATCTGGAGGCGATTCTTGACAGCGGCGCGTCGG
GTGGCGGAGGCGGAGGCGGTGGCGGCTCCGGTGCCGGGGCCGGCGGCGGCGGCGGCGGCG
GCGGCGGGGGTGACCCGAAGTCGTCGTCGAGCCCCCGCGGGCCGCACACGATATTCAACC
CCACAACGTACTTTGTGGAGGAGGTGGTGAAAGGCGTCGACGAGAGCGACCTCCACAGGA
CATGGATCAAGGTCGTGGCGACCCGCAACGCCCGCGAGCGCAGCAGCCGCCTCGAGAACA
TGTGCTGGCGCATCTGGCACCTCGCCCGCAAGAAGAAGCAGCTGGAGATTGAGGGCATCC
AGAGGATGTCGGCTCGGCAGAATGAACAGGAGAAGGTGCGCCGCGAGGCCACGGAGGACC
TGTCGGAAGATCTCGACGAGGGCGAGAAGGGGGACATCGTCGGCGAGCTGATGCCGTCAG
GGACCCCCAAGAAGAAGTTCCAGAGGAATTTCTCCGACCTTAGTGTGTGGTCGGACGAGA
ATAAGGAGAAGAAGTTGTACATTGTGCTCATCAGTGTGCACGGTCTTGTCCGTGGAGAAA
ACATGGAACTGGGTAGTGATTCAGATACGGGAGGGCAGGTGAAATATGTTGTGGAACTTG
CGAGAGCGCTTGCAATGATGCCCGGAGTGTACAGAGTAGACCTGTTTACTCGCCAAGTGT
CATCACCCGACGTGGACTGGAGCTACGGGGAGCCAACAGAGATGTTAACCTCCGGTTCCC
ACGACGCAGAGGGGAGCGGTGAGAGCGGCGGGGCATACATTGTGCGCATCCCTTGCGGCC
AGAGTAACAAGTACATCAAGAAGGAGTCCCTGTGGCCTTACCTCCAAGAGTTTGTTGACG
GAGCCCTTGCGCACATTCTAAACATGTCAAAGGTTTTGGGCGAACAGGTAGGCCATGGGA
AGCCAGTGCTGCCTTATGTGATCCATGGCCACTATGCCGACGCTGGCGATGTTGCTGCCC
TTCTTTCTGGCGCGTTGAATGTGCCGATGGTGCTCACTGGTCACTCGCTTGGGAGGAACA
AGCTGGAGCAGATTATGATGCAAGGGCGTATGTCCAAGGAGGAGATCGACGCAACCTACA
AGATCATGAGGCGTATTGAGGGGGAGGAGCTGGCCTTGGACGCAGCAGAGCTTGTGATTA
CTAGCACCAGGCAGGAGATCGATGAGCAGTGGGGATTGTATGATGGCTTTGATGTCAAGC
TTGAGAAAGTGTTGCGGGCACGGACnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnGCGCTTCCTGACCAATCCTCACAAGCCAATG
ATCTTGGCACTGTCAAGGCCTGATCCNAAGAAGAACATCACTACTCTTGTCAAAGCATTT
GGAGAATGCCGCCCACTGAGGGAACTTGCAAACTTAGTTCTGATCATGGGGAACAGAGAT
GACATCGAGGAGATGCCTCCCGGCAATGCAAATGTCCTCACCACTGTCTTGAAGCTGGTT
GACAAGTATGATCTGTATGGAAGTGTGGCCTTCCCCAAGCATCACAAGCAGGCCGACGTC
CCTGAGATTTATCGCCTCACAGCCAAGACGAAGGGTGTATTCATCAATCCTGCTCTCGTG
GAGCCTTTCGGCCTTACACTAATCGAGGCTGCAGCGCACGGTCTCCCAATCGTCGCCACC
AAGAACGGCGGTCCGGTCGACATTACAAATACACTGAACAGTGGGCTGCTAGTGGATCCG
CACGACCAGAACGCCATCGCCGACGCGCTGCTGAAGCTGGTGGCCGACAAGAACCTGTGG
CATGAGTGCCGGAAGAACGGGCTGCGCAACATCCACCTCTACTCGTGGCCGGAGCACTGC
CGGACGTACCTCGCCAGGGTGGCCGGGTGCCGGATCAGGAACCCACGCTGGCTCAAGGAC
ACGCCTGCGGACGCTGGCGCTGACGATGAGGCCGAGGACTCGCTCATGGAATTCCAGGAC
CTATCGCTCCGCCTGTCCATCGACGGCGAGCGAGGCTCCACTAACGAGCCCGCCTCGTCG
GACCCGCAGGACCAGGTGCAGAAGATCATGAACAAGCTCCACCAGTCGTCTTCTGCAGCT
CCAGATGCTGCCACGGACAAGAATCCGGCCAACGTTCACGCCGCTGGCACCGTCAACAAG
TACCCACTCCTGCGCCGCCGCCGCCGGCTGTTCATCGTGGCCGTGGACTGCTATGGTGAC
GACGGACGTTCCAGCAAGAAGATGTTGCAGGTGATTCAGGAGGTGTTCAGGGCGGTCCGG
TCTGACACCCAATTGTCCAAGATCTCCGGGTTCGCACTGTCGACGGCGATGCCGCTGTCC
GAGACGCTCCAGCTCCTACAGACGGGGAAGGTTCCCCCAACCGACTTCGATGCGCTCATC
TGCGGCAGTGGCAGCGAGGTGTACTACCCTGGCTCCGCGCAGTGCTTGGATGCCCAGGGG
AAGCTCCGGCCCGACCAGGACTACCTGCAGCACATCAACCATCGGTGGTCTCACGACGGC
GCCAGGCAGACCATAGGGAAGCTCATGGCCTCACAGGACGGCTCCGGCAGTGTCGTCGAG
CCTGACGTGGAGTCCTGCAATGCGCACTGCGTCTCCTTCTTCGTCAGGGACCCCAAGAAG
GTGAGAACTATCGATGAGATGCGGGAGAGGCTGAGGATGCGTGGCCTCCGGTGCCACCTC
ATGTACTGTCGGAAGTCAACGAGGATGCAGGTTGTCCCTCTCATGGCATCAAGGTCACAA
GCACTCAGGTACCTCTTTGTGCGTTGGGGCCTGCCCGTGGGCAACATGTACATTGTCCTC
GGGGAACATGGCGACACCGACCGTGAGGAGATGCTTTCAGGACTACACAAGACGGTGATT
GTTAAAGGCGTCACCGAGAAAGGCTCAGAGGACCTGCTGAGGAGCTCAGGGAGTTACCAC
AAGGAAGACGTTGTCCCGTCCGATAGCCCATTGGCTACCACCACGCGCGGTGATCTGAAG
TCGGATGAGATCCTGCGGGCTCTCAAGGAGGTCTCCAAGGCTTCCAGCGGCTGAtcggtt
gcccggaagctcgatgatgccgaacatcttcactcatgctatttaatatcttctgcttgt
tcataaccagagattcgaaccatttctcttttcttcactacatatataatcttgtgagca
ctagcacgactacatcttgcagtgagaaataattaagacactgctgatgatgccactgtc
tggcttgaatgtttgtgaactaattcagcatatacgtttgcaatatatattgtgcttatt
caaaaaaaaaaaaaa
Wheat SPS1a
MAVGNEWINGYLEAILDAGSKLRVQGVSLPPLEPAPALASEESSAAYNPTRYFVEEVVRS
FDDQALHKTWTKVVAMRNSQERNNRLENLCWRIWNVARQKKQVERDYSQEVARRKQEQEL
GSLEAAEDLSELSEGEKETVPKPDGAAAHLSADEQQPQQRTRLARINSEVRLVSDDEDEQ
SKDRNLYIVLVSIHGLVRGENMELGRDSDTGGQVKYVVELARALAATAGVHRVDLLTRQI
SCPDVDWTYGEPVEMLERLSSGDDDGDESGGGGAYIVRLPCGPRDQYIPKEELWPHIPEF
VDRALSHVTNVARALGEQLQPPPSDAPATALAAPVWPYVIHGHYADAAEVAANLASALNV
PMVMTGHSLGRNKLEQLLKLGRMHGPEIQGTYKIARRIEAEETGLDTAEMVVTSTKQEIE
EQWGLYDGFDLMVERKLRVRQRRGVSSLGRYMPRMAVIPPGMDFSFVDTQDTADGDGADL
QMLIDPVKAKKALPPIWSEILRFFTNPHKPMILALSRPDPKKNITTLLKAYGESRKLREL
ANLTLILGNRDDIDDMAGGGGTVLTAVLKLIDRYDLYGQVAYPKHHKQTDVPHIYRLAAK
TKGVFINPALVEPFGLTIIEAAAYGLPVVATKNGGPVDILKALHNGLLVDPHSAEAITGA
LLSLLADKGQWLESRRNGLRNIHRFSWPHHCRLYLSHVAAYCDHPSPHQRLRVPGVPAAS
ASMGGDDSLSDSLRGLSLQISVDASSDLNAGDSAALIMDALRRRPAADRREGSGRALGFA
PGRRQSLLVVAVDCYCDDGKPDVEQLKKAIDAAMSAGDGAGGRQGYVLSTGMTIPEAAET
LKACGADPAGFDALICSSGAEICYPWKELTADEEYSGHVAFRWPGDHVKTVVPRLGKAED
AQASDLAVDVSAGSVHCHAYAATDASKVKKVDSIRQALRMRGFRCNLVYTRACTRLNVIP
LSASRPRALRYLSIQWGIDLAKVAVLVGETGDTDREKLLPGLHRTLILPGMVSRGSEQLV
RGEDGYATQDVVAMDSPNIVTLAQGQAVSDLLKAM
Wheat SPS2a
MAGNDWINSYLEAILDAGGAAGDITAASVASAAPGGGAGSAAAEKRRDKASLMLLERGRF
NPARYFVEEVISGFDETDLYKTWVRTSAMRSPQERNTRLENMSWRIWNLARKKKQIEGEE
ASRSSKKRLEREKARRDAAADLSEDLSDGEKGEHINESSIHAESTRGHMPRIGSTDAIDV
WANQHKDKKLYIVLVSIHGLIRGENMELGRDSDTGGQVKYVVELARALGETPGVYRVDLL
TRQISAPDVDWSYGEPTEMLSPRNSENLGDDMGESSGAYIVRIPFGPREKYIPKEQLWPH
IQEFVDGALVHIMQMSKVLGEQVGNGQPVWPVVIHGHYADAGDSAALLSGALNVPMVFTG
HSLGRDKLEQLLKQGRQTRDEVNATYKIMRRIEAEELCLDASEIVITSTRQEIDKQWGLY
NGFDVIMERKLRARIKRGVSCYGREMPRMVPIPPGMEFSHIVPHDVDLDSEEANEVGSDS
PDPPVWADIMRFFSNPRKPMILALARPDPKKNITTLVKAFGEHHELRNLANLTLIMGNRD
VIDEMSSTNGAVLTSVLKLIDKYDLYGQVAYPKHHKQSEVPDIYRLAARTKGVFINCAYI
EPFGLTLIEAAAYGLPMVATQNGGPVDIHRVLDNGILVDPHNQNDIAEALYRLVSDKQLW
AKCRQNGLDNIHRFSWPEHCKNYLSRVGTLKSRHPRWQKSDDATEVSETDSRGDSLRDIH
DISLNLKISLDSEKSGSMSKYGRSSTSDRRNLEDAVQKFSEAVSAGTKDESGEKAGATTG
SNKWPSLRRRKHIVVIAVDSVQDADLVQIIKNIFQASNKEKSSGALGFVLSTSRAASEIH
PLLTSGGIEITDFDAFICSSGSDLCYPSSNSEDMLSPAELPFMIDLDYHSQIQYRWGGEG
LRKTLIRWAAEKNSESGKEAVVEDDECSSTYCISFKVKNTEAVPPVKDLRKTMRIQALRC
HVLYSHDGSKLNFIPVLASRSQALRYLYIRWGVELSNMTVVVGESGDTDYEGLLGGVQKT
IILKGSFNSAPNQLHAARNYSLEDVVSFDKPGIASVDGYAPDILKSALQQFGALQG
Wheat SPS3a
MVGNDNWINSYLDAILDAGKSAVGGDRPSLLLRERGHFSPARYFVEEVITGYDETDLYKT
WLRANAMRSPQERNTRLENMTWRIWNLARKKKELEKEEACRLLKRHPETEKTRIDATADM
SEDLFEGEKGEDAGDPSVAYGDSTTGVSPKTSSVDKLYIVLISLHGLVRGENMELGRDSD
TGGQVKYVVEFAKALSSSPGVYRVDLLTRQILAPNFDRSYGEPAEMLVSTTFKNSKQEKG
ENSGGYIIRIPFGPRDMYLTKERLWPFIQEFVDGALSHIVRMSKTIGEEIGCGHPVWPAV
IHGHYASAGIAATLLSGALNLPMAFTGHFLGKDKLEGLLKQGRQSREEINMTYKIMRRIE
AEELSLDASEIVIASTRQEIEEQWNLYDGFEVILARKLRARVKRGANCYGRYMPRMVIIP
PGVEFGHIIHDFDIDGEEENHGPASEDPPIWSQIMRFFTNPRKPMILAVARPYPEKNITT
LVKAFGECRPLRELANLTLIMGNREAISKMHNTSASVLTSVLTLIDEYDLYGQVAYPKHH
KHSEVPDIYCLATRTKGAFVNVAYFEQFGVTLIEAAMNGLPVIATKNGAPVEIHQVLNNG
LLVDPHDQNAIADALYKLLSEKQLWSRCRENGLKNIHQFSWPEHCKNHLSRILTLGMRSP
AVGSEEERSKAPISGRKHIIVISVDSVNKENLVRIIRNAIEAAHTENTPASTGFVLSTSL
TISEICSLLVSVGMHPAGFDAFICNSGSSIYYPSYSGNTPSNSKVTHVIDRNHQSHIEYR
WGGEGLRKYLVKWATSVVERKGRIERQMIFEDSEHSSTYCLAFKVVNPNHLPPLKELRKL
MRIQSLRCNALYNHSATRLSVTPIHASRSQAIRYLFVRWGIELPNIVVMVGESGDSDYEE
LLGGLHRTIILKGDFNIAANRIHTVRRYPLQDVVALDSSNIIEVQGCTTEDIKSALRQIG
VPTQ
Wheat SPS4a (truncated at N-terminus)
ENLELGRDSDTSGQVKYVVELAKALSSCPGVYRVDLLTRQILAPNYDRGYGEPSETLVPT
SFKNLKQERGENSGAYITRIPFGPKDKYLAKEHLWPYVQEFVDGALSHIVHMSKIIGEEI
GCGHPMWPAVIHGHYASAGVAAALISGALNVHMVFTGHFLGKDKLEGLLKQGRQTREEIN
MTYKIMRRIEAEELSLDASEIVIASTRQEIEEQWNLYDGFEVMLARKLRARVKRGANCYG
RYMPRMVIIPPGVEFGHMIHEFDMEGEEDSHSPASEDPPIWSEIMRFFTNPRKPLILAVA
RPYPEKNITTLVKAFGECRPLRELANLTLIMGNREAISKMSNMSAAVLTSVLTLIDEYDL
YGQVAYPKHHKHSEVLDIYRLAARTKGAFVNVAYFEQFGVTLIEAAMHGLPVIATKNGAP
VEIHQVLDNGLLVDPHDQHAIADALYKLLSDKQLWSRCRENGLKNIHRFSWPEHCKNYLS
RILTLSPRYPSFPSNEDQFKAPIKGRKCIIVIAVDSASKKDLVCIIKNSIEATRKETLSG
STGFVLSTSLTMSEIHSLLISAGMAPTDFDAFICNSGSDLFYPSRAGDSPSTSRVTFSLD
RNYQSHIEYRWGGEGLRKYLVKWASSIVERRGRTEKQVIFEDAEHSSTSCLAFRVVNPNY
LPPLKELQKLMRIQSLRCHALYNHSATRLSVIPIHASRSQALRYLSVRWGIELRNVVILV
GESGDSDYEELFGGLHKTIVLKGEFNTPANRIHTVRRYPLQDVIALDCSNIIGVEGCSAD
DLTPTLKALGIPTK
Wheat SPS5b (=wheat SPS6)
MAGNEWINGYLEAILDSGASGGGGGGGGGSGAGAGGGGGGGGGGDPKSSSSPRGPHTIF
NPTTYFVEEVVKGVDESDLHRTWIKVVATRNARERSSRLENMCWRIWHLARKKKQLEIE
GIQRMSARQNEQEKVRREATEDLSEDLDEGEKGDIVGELMPSGTPKKKFQRNFSDLSVW
SDENKEKKLYIVLISVHGLVRGENMELGSDSDTGGQVKYVVELARALAMMPGVYRVDLF
TRQVSSPDVDWSYGEPTEMLTSGSHDAEGSGESGGAYIVRIPCGQSNKYIKKESLWPYL
QEFVDGALAHILNMSKVLGEQVGHGKPVLPYVIHGHYADAGDVAALLSGALNVPMVLTG
HSLGRNKLEQIMMQGRMSKEEIDATYKIMRRIEGEELALDAAELVITSTRQEIDEQWGL
YDGFDVKLEKVLRARARRGVSCHGRFMPRMVVIPPGMDFSNVVVPEDTSDGDDKDDINL
DGASPRSLPPIWAEVMRFLTNPHKPMILALSRPDXKKNITTLVKAFGECRPLRELANLV
LIMGNRDDIEEMPPGNANVLTTVLKLVDKYDLYGSVAFPKHHKQADVPEIYRLTAKTKG
VFINPALVEPFGLTLIEAAAHGLPIVATKNGGPVDITNTLNSGLLVDPHDQNAIADALL
KLVADKNLWHECRKNGLRNIHLYSWPEHCRTYLARVAGCRIRNPRWLKDTPADAGADDE
AEDSLMEFQDLSLRLSIDGERGSTNEPASSDPQDQVQKIMNKLHQSSSAAPDAATDKNP
ANVHAAGTVNKYPLLRRRRRLFIVAVDCYGDDGRSSKKMLQVIQEVFRAVRSDTQLSKI
SGFALSTAMPLSETLQLLQTGKVPPTDFDALICGSGSEVYYPGSAQCLDAQGKLRPDQD
YLQHINHRWSHDGARQTIGKLMASQDGSGSVVEPDVESCNAHCVSFFVRDPKKVRTIDE
MRERLRMRGLRCHLMYCRKSTRMQVVPLMASRSQALRYLFVRWGLPVGNMYIVLGEHGD
TDREEMLSGLHKTVIVKGVTEKGSEDLLRSSGSYHKEDVVPSDSPLATTTRGDLKSDEI
LRALKEVSKASSG
Appendix B. Rice SPS cDNA and protein sequences. Genomic and cDNA sequences for the OsSPS1 gene have previously been described from both the japonica (cv. Nipponbare) and indica (cv. IR36) cultivar groups of rice (Sakamoto et al., 1995; Valdez-Alarcon et al., 1996). The two sequences are 98% identical and the gene from japonica rice was mapped to chromosome 1 (Sakamoto et al., 1995). The OsSPS8 (AP004041) gene has been independently annotated in the database on the basis of a full length cDNA sequence (AK101676). The OsSPS2 gene is also represented by full length, but unannotated, cDNA sequences (AK065273 and AK098898). The OsSPS6 and OsSPS11 sequences were derived from unannotated genomic sequence as described below.
The cDNA sequences were deduced from genomic DNA sequences by comparison with related rice or wheat cDNA sequences and ESTs where available, and otherwise by comparison of conceptual translations of the genomic sequence with known SPS protein sequences. The component genomic and cDNA sequences are indicated in the headings. Protein coding regions are shown in upper case, with the start and stop codons highlighted in bold, and the 5´- and 3´-UTRs are shown in lowercase. Deduced protein sequences from unannotated cDNA sequences are also shown.
In addition to these five SPS genes, there is a 339-bp open reading frame (ORF) on chromosome 10 that is annotated as a putative SPS (NM_197662). Although this ORF shows 93-95% identity with parts of exons 9 and 10 of OsSPS11, it lacks the region encoding the catalytic glucosyltransferase domain and so is too short to encode a functional SPS protein. No other SPS-like sequences were found in the vicinity of this ORF on chromosome 10. Furthermore, the ORF is not represented among the rice ESTs, so we conclude that it is a pseudogene and is perhaps a remnant from an ancient duplication event.
OsSPS6 (D family, sub-family IV) [1-2378: AP003523 (japonica cv. Nipponbare HTGS), AP003568 (japonica cv. Nipponbare HTGS) and AAAA010009 (indica cv. 93-11 WGS)]. The OsSPS6 gene sequence was assembled out of fragments of genomic sequence from japonica rice cv. Nipponbare (AP003523 and AP003568) in the high throughput genome sequence (HTGS) database and from indica rice cv. 93-11 (AAAA010009) in the whole genome shotgun sequence (WGS) database. Exons and introns were assigned, where possible, by comparison with a partial OsSPS6 cDNA sequence (AK071732), OsSPS6-derived ESTs and the related TaSPSIVa sequence from wheat, and otherwise by comparison of the protein translation of the rice genomic sequence with known SPS sequences.
ATGTACGGCAACGACAACTGGATCAACAGCTACCTGGATGCCATCCTCGACGCCGGCAAG
GGCGCCGCCGCCTCGGCCTCGGCCTCGGCGGTGGGTGGGGGAGGCGGAGCGGGGGACCGC
CCCTCGCTCCTCCTCCGCGAGCGCGGCCACTTCTCCCCCGCGCGGTACTTCGTCGAGGAG
GTCATCACCGGCTACGACGAGACCGACCTCTACAAGACATGGCTCCGCGCGAACGCGATG
CGGAGCCCGCAGGAGAAGAACACGAGGCTGGAGAACATGACGTGGAGGATCTGGAACCTC
GCCAGGAAGAAGAAGGAGTTGGAGAAGGAGGAAGCCAATCGCTTGTTGAAACGCCGTCTA
GAGACAGAGAGGCCACGGGTTGAGACTACTTCAGATATGTCGGAAGATCTCTTTGAAGGT
GAGAAGGGAGAGGATGCTGGTGATCCTTCTGTTGCCTATGGTGACAGCACAACTGGGAAC
ACACCTAGGATCAGTTCAGTTGACAAGCTGTACATAGTGTTGATCAGCCTTCATGGCCTG
GTCCGTGGTGAGAACATGGAGCTTGGCCGTGATTCAGATACTGGCGGACAGGTCAAGTAT
GTTGTGGAACTTGCTAAAGCATTGAGCTCATGTCCTGGAGTTTACCGGGTTGATCTTTTT
ACAAGACAGATCTTAGCGCCAAATTTTGATCGTAGCTATGGTGAACCAGTGGAACCTTTG
GCATCAACAAGCTTCAAGAATTTTAAGCAGGAAAGAGGAGAGAATAGTGGTGCATATATC
ATCCGAATACCATTTGGACCAAAAGACAAATATCTAGCTAAGGAACATCTCTGGCCTTTC
ATTCAAGAATTTGTTGATGGCGCCCTCAGTCATATAGTGAAGATGTCAAGGGCCATAGGT
GAAGAAATCAGCTGTGGGCATCCGGCGTGGCCTGCTGTGATTCATGGTCATTATGCTAGT
GCAGGAGTTGCTGCTGCTCTACTGTCAGGAGCACTTAATGTTCCCATGGTCTTTACAGGG
CATTTTCTTGGGAAAGATAAGTTGGAAGAGCTTCTCAAGCAAGGGAGACAGACAAGGGAG
CAAATAAACATGACATACAAAATAATGTGTAGAATTGAGGCAGAGGAGTTGGCTCTTGAT
GCATCTGAAATAGTTATAGCAAGCACTAGGCAAGAGATAGAAGAGCAATGGAATTTGTAT
GACGGTTTTGAGGTCATACTTGCAAGGAAACTCCGTGCAAGAGTCAAGCGTGGTGCTAAC
TGCTATGGTCGCTATATGCCTCGTATGGTTATCATTCCCCCAGGTGTTGAATTTGGCCAT
ATGATTCATGACTTCGATATGGATGGTGAGGAAGATGGTCCATCCCCAGCCTCTGAAGAT
CCATCTATTTGGTCCGAGATAATGCGGTTCTTTACAAACCCTAGGAAACCTATGATTCTG
GCAGTTGCTCGCCCTTATCCTGAAAAGAATATTACTACTCTTGTGAAGGCGTTTGGTGAG
TGCCGACCACTGAGGGAGCTTGCTAATCTAACATTGATAATGGGAAACCGTGAGGCTATT
TCCAAGATGCATAATATGAGTGCAGCTGTTTTGACATCAGTACTTACATTGATTGATGAA
TATGATTTGTATGGTCAAGTGGCATACCCAAAGCGTCACAAACACTCGGAAGTTCCTGAT
ATTTACCGTTTAGCAGTGAGAACAAAGGGTGCTTTTGTAAATGTGCCTTACTTTGAACAG
TTCGGTGTCACCTTGATAGAGGCTGCCATGCATGGTTTGCCTGTAATTGCAACAAAAAAT
GGAGCTCCTGTTGAAATTCACCAGGTGCTGGACAATGGTCTCCTTGTTGATCCCCATGAT
CAGCATGCAATTGCAGATGCACTCTATAAACTCCTTTCTGAAAAACAACTTTGGTCAAAA
TGCCGAGAGAATGGGCTGAAAAATATACATCAGTTTTCTTGGCCTGAACATTGCAAGAAT
TACTTGTCAAGGATATCAACTCTTGGCCCAAGGCATCCTGCTTTTGCAAGCAATGAAGAC
CGGATTAAGGCACCTATTAAGGGAAGGAAGCATGTCACTGTTATTGCTGTAGATTCTGTC
AGTAAGGAAGATCTGATTCGCATTGTCAGAAATTCTATCGAGGCTGCACGTAAAGAAAAT
TTGTCAGGATCGACAGGTTTTGTGTTGTCAACTTCCCTGACAATAGGGGAGATACATTCT
CTATTAATGTCTGCTGGCATGCTTCCTACTGATTTCGATGCTTTCATATGCAATAGTGGA
AGTGATTTGTATTATCCTTCATGTACTGGTGATACACCAAGCAACTCCCGTGTTACATTT
GCATTAGATCGTAGTTACCAATCACATATAGAGTATCATTGGGGAGGAGAAGGTTTAAGG
AAATATCTAGTGAAGTGGGCTTCTTCCGTGGTAGAAAGAAGAGGGAGGATTGAAAAACAA
GTTATCTTCGAAGATCCAGAGCACTCTTCAACATACTGTCTTGCATTTAAAGTGGTCAAT
CCAAATCATTTACCTCCTTTAAAGGAGCTGCAAAAGTTGATGAGAATTCAGTCACTCCGT
TGTCACGCTCTGTATAACCATGGTGCTACCAGACTATCTGTAATTCCAATCCACGCATCA
CGGTCTAAGGCTCTAAGGTACTTATCTGTTCGCTGGGGCATAGAGTTGCAAAATGTGGTG
GTTCTTGTTGGTGAAACTGGTGATTCAGATTACGAAGAATTGTTTGGAGGTCTTCATAAG
ACGGTCATCCTTAAGGGTGAATTCAACACATCTGCAAATAGAATCCATTCTGTTAGGCGG
TATCCTTTACAAGATGTTGTTGCACTTGATAGCCCAAACATCATTGGAATTGAGGGTTAT
GGCACTGATGACATGAGGTCTGCTCTGAAACAACTGGATATACGGGCACAGTGAcaccaa
gcccccatctgtttatcattaatatatgaagaaaaccagtggacgatacaaagacagcaa
acaaacactagcatttccatacttgatggagatgccgattttgccatgtaagtcatgtag
tttatgtgtgtggtccttgagctgtgaatagcattccgaaatctcatcccattgagattt
tggtatgtggcaattttggagtaaaaatcgattccatccaggaatacggacaaaagaaat
tggttacaatgttgataatgaaaaacatgttaaggaagcattaattcagcaagaaaagct
tccaaaatcactacaattcttggccaagcttgcaatttccctttttttgaagtggaagct
tatgttgtgtgtttactgctgggtggaccatatggccctggcagcccttctttactatgt
ttactccaggagggctgcctagctttcgtgtaagtattgtttgacacgatggttcattct
atatatccaaagttttgttgagatc
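The conceptual translation used to deduce the protein sequences can be illustrated with the first and last coding lines of the OsSPS6 sequence above. This is a minimal sketch using the standard genetic code; the function name `translate` and the compact codon table are illustrative choices, not the software used in the original analysis.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame sequence, stopping at the first stop codon."""
    cds = cds.upper()
    residues = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First and last coding lines of OsSPS6 as printed above; both start in frame.
first_line = "ATGTACGGCAACGACAACTGGATCAACAGCTACCTGGATGCCATCCTCGACGCCGGCAAG"
last_line = "GGCACTGATGACATGAGGTCTGCTCTGAAACAACTGGATATACGGGCACAGTGAcaccaa"

print(translate(first_line))  # MYGNDNWINSYLDAILDAGK
print(translate(last_line))   # GTDDMRSALKQLDIRAQ
```

The two outputs correspond to the start and end of the deduced rice SPS6 protein sequence given in this appendix.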
OsSPS11 (C family = family 1) [1-3321: AAAA010009 (indica cv. 93-11 WGS)]. The OsSPS11 gene sequence was assembled from japonica (AC135258) and indica (AAAA010009) genomic sequence fragments.
ATGGCGGTGGGGAACGAGTGGATCAACGGGTACCTGGAGGCGATCCTGGACGCCGGCGTG
AAGCTGCGGGAGCAGCGAGGGGCGGCGGCGGTGCAGCTGCCGCCGCTGCTGCCGGCGCCG
GAGGACGCCGCGTCGGCGGTGGCGACGGCGGCGACGTACAGCCCGACGAGGTACTTCGTG
GAGGAGGTCGTCAGCCGCTTCGACGACCGCGACCTCCACAAGACGTGGACCAAGGTGGTG
GCGATGAGGAACAGCCAGGAGAGGAACAACCGGCTGGAGAACCTGTGCTGGAGGATCTGG
AACGTCGCGAGGAGGAAGAAGCAGGTGGAGTGGGAATTCTCGCGACAGCTGTCTCGCCGG
CGGCTGGAGCAGGAGCTTGGCAGCCGGGAGGCCGCCGCCGACCTCTCGGAGCTCTCCGAG
GGTGAGAAGGACGGCAAGCCGGACACTCACCCTCCACCGCCGGCGGCGGCGGCGGCGGAA
GCGGCGGCCGACGACGGCGGCGGCGGCGATCACCAGCAGCAGCAGCCGCCGCCGCCGCCG
CATCAGCTCAGCCGGTTCGCCCGGATCAACTCCGACCCCCGGATCGTCTCCGACGAGGAG
GAGGAGGTCACCACCGACCGGAACCTCTACATCGTGCTCATCAGCATTCATGGGCTCGTG
CGGGGCGAGAACATGGAGCTCGGCCGAGACTCTGACACCGGGGGCCAGGTGAAGTACGCG
GTGGAGTTGGCGCGGGCGCTGGCGGCGACGCCGGGGGTGCACCGCGTCGACCTCCTGACG
CGGCAGATCTCGTGCCCGGACGTCGACTGGACCTACGGCGAGCCCGTCGAGATGCTCACC
GTCCCGGCTGCCGACGCCGACGACGAAGACGGCGGCGGCGGCTCGTCCGGCGGCGCGTAC
ATCGTGCGGCTGCCGTGCGGGCCGCGCGACAAGTACCTCCCCAAGGAGTCGCTCTGGCCG
CACATCCCGGAGTTCGTCGACCGCGCTCTGGCGCACGTCACCAACGTGGCGCGCGCGCTC
GGCGAGCAGCTCTCGCCGCCGCCGCCGTCCGATGGCGCGGGCGCGGCGGCGCAGGCGGTG
TGGCCGTACGTGATCCACGGGCACTACGCGGACGCGGCGGAGGTGGCGGCGCTGCTGGCG
AGCGCGCTGAACGTGCCGATGGTGATGACGGGGCACTCGCTGGGGCGGAACAAGCTGGAG
CAGCTGCTGAAGCTGGGGCGCATGCCGCGCGCCGAGATCCAGGGCACGTACAAGATCGCG
CGGCGGATCGAGGCGGAGGAGACCGGCCTCGACGCCGCCGACATGGTGGTGACGAGCACC
AAGCAGGAGATCGAGGAGCAGTGGGGGCTCTACGACGGGTTCGACCTCAAGGTGGAGCGC
AAGCTCCGCGTCCGCCGCCGCCGCGGCGTCAGCTGCCTCGGCCGCTACATGCCGCGCATG
GTCGTCATCCCCCCCGGCATGGACTTCAGCTACGTCGACACCCAGGACCTCGCCGGCGAC
GGCGCCGGCGGCGCCGGCGACGCCGCCGACCTGCAGCTGCTCATCAACCCCAACAAGGCC
AAGAAGCCCCTCCCTCCCATCTGGTCGGAGGTGCTCCGGTTCTTCACGAACCCTCACAAG
CCGATGATCCTCGCGCTGTCGCGGCCGGACCCGAAGAAGAACGTCACCACGCTGCTCAAG
GCGTACGGCGAGAGCCGCCATCTCCGGGAGCTCGCCAACCTGACGCTGATACTGGGGAAC
AGGGACGACATCGAGGAGATGTCCGGCGGCGCGGCGACGGTGCTGACGGCGGTGCTGAAG
CTGATCGACCGGTACGACCTCTACGGCCAGGTCGCCTACCCCAAGCACCACAAGCAGACC
GACGTGCCGCACATCTACCGCCTCGCCGCCAAGACCAAGGGTGTGTTCATCAATCCTGCT
CTTGTCGAGCCATTCGGCCTCACCATCATAGAGGCTGCTGCATATGGGTTGCCGGTAGTG
GCGACGAAGAACGGCGGGCCGGTGGACATCCTCAAGGTGCTGAGCAACGGGCTGCTGGTG
GACCCGCACGACGCGGCGGCGATCACCGCCGCGCTGCTCAGCCTGCTCGCCGACAAGTCC
CGGTGGTCGGAGTGCCGCCGCAGCGGCCTCCGCAACATCCACCGCTTCTCCTGGCCGCAC
CACTGCCGCCTCTACCTCTCCCACGTCGCCGCCAGCTGCGACCACCCGGCACCGCACCAG
CTGCTCCGCGTCCCGCCCTCCCCGTCCTCCTCCTCCGCCGCCTCCGCCGCCGCCGGTGGC
GGCGGCGCGGCCGCCTCCTCCGAGCCGCTCTCCGACTCGCTGCGCGATCTCTCGCTCCGC
ATCTCCGTGGACGCCGCGTCGCCTGACCTCAGCGCTGGGGACTCCGCCGCGGCAATCTTG
GACGCGCTCCGCAGGCGCCGGTCCACCGACAGGCCGGCGGCGAGCTCTGCCGCCAGGGCG
ATCGGGTTCGCGCCGGGACGGCGTCAGAGCCTCCTCGTCGTCGCCGTCGACTGCTACGGC
GACGACGGGAAGCCGAACGTGGAGCAGCTCAAGAAGGTGGTCGAGCTGGCGATGTCGGCC
GGCGACGGCGACGACGCCGGCGGGAGGGGATACGTTCTTTCCACCGGCATGACCATCCCC
GAGGCCGTCGACGCGCTCAGGGCATGCGGCGCCGACCCGGCCGGCTTCGACGCGCTGATC
TGCAGCAGCGGCGCGGAGATCTGCTACCCGTGGAAGGGGGAGCAGCTCGCCGCCGACGAG
GAGTACGCCGGGCACGTGGCGTTCCGGTGGCCCGGCGACCACGTGAGGTCCGCGGTGCCG
AGGCTCGGGAAGGCCGACGGCGCGCAGGAGGCGGACCTCGCCGTCGACGCCGCCGCCTGC
TCCGTGCACTGCCACGCCTACGCCGCCAAGGACGCGTCCAAGGTGAAGAAGGTCGACTGG
ATCAGGCAGGCGCTGCGGATGCGCGGCTTCCGGTGCAACCTCGTCTACACGCGCGCGTGC
ACGCGCCTCAACGTCGTCCCGCTCTCCGCCTCCCGGCCGCGCGCGCTCAGGTACCTGTCG
ATACAGTGGGGGATCGACCTGTCCAAGGTGGCGGTGCTCGTCGGCGAGAAGGGCGACACC
GACAGGGAGCGCCTCCTCCCGGGGCTGCACAGGACGGTGATCCTGCCGGGGATGGTCGCC
GCCGGCAGCGAGGAGCTCCTCCGCGACGAGGACGGGTTCACCACGGAGGACGTCGTGGCC
ATGGACTCCCCCAACATCGTCACCCTCGCCGACGGCCAGGACATCGCCGCCGCCGCCGCC
GACCTCCTCAAGGCCATCTGA
Rice SPS2 (D family, sub-family III) [CDS: 126-3017 bp of AK065273 (cDNA)]
MAGNDNWINSYLDAILDAGKAAIGGDRPSLLLRERGHFSPARYFVEEVITGYDETDLYKT
WLRANAMRSPQERNTRLENMTWRIWNLARKKKEFEKEEACRLLKRQPEAEKLRTDTNADM
SEDLFEGEKGEDAGDPSVAYGDSTTGSSPKTSSIDKLYIVLISLHGLVRGENMELGRDSD
TGGQVKYVVELAKALSSSPGVYRVDLLTRQILAPNFDRSYGEPTEMLVSTSFKNSKQEKG
ENSGAYIIRIPFGPKDKYLAKEHLWPFIQEFVDGALGHIVRMSKTIGEEIGCGHPVWPAV
IHGHYASAGIAAALLSGSLNIPMAFTGHFLGKDKLEGLLKQGRHSREQINMTYKIMCRIE
AEELSLDASEIVIASTRQEIEEQWNLYDGFEVILARKLRARVKRGANCYGRYMPRMVIIP
PGVEFGHIIHDFEMDGEEENPCPASEDPPIWSQIMRFFTNPRKPMILAVARPYPEKNITS
LVKAFGECRPLRELANLTLIMGNREAISKMNNMSAAVLTSVLTLIDEYDLYGQVAYPKHH
KHSEVPDIYRLAARTKGAFVNVAYFEQFGVTLIEAAMNGLPIIATKNGAPVEINQVLNNG
LLVDPHDQNAIADALYKLLSDKQLWSRCRENGLKNIHQFSWPEHCKNYLSRILTLGPRSP
AIGGKQEQKAPISGRKHIIVISVDSVNKEDLVRIIRNTIEVTHTEKLSGSTGFVLSTSLT
ISEIRSLLVSAGMLPTVFDAFICNSGSNIYYPLYSGDTPSSSQVTPAIDQNHQAHIEYRW
GGEGLRKYLVKWATSVVERKGRIERQIIFEDPEHSSTYCLAFRVVNPNHLPPLKELRKLM
RIQSLRCNALYNHSATRLSVVPIHASRSQALRYLCIRWGIELPNVAVLVGESGDSDYEEL
LGGLHRTVILKGEFNIPANRIHTVRRYPLQDVVALDSSNIIGIEGYSTDDMKSALQQIGV
LTQ
Rice SPS6 (D family, sub-family IV)
MYGNDNWINSYLDAILDAGKGAAASASASAVGGGGGAGDRPSLLLRERGHFSPARYFVEE
VITGYDETDLYKTWLRANAMRSPQEKNTRLENMTWRIWNLARKKKELEKEEANRLLKRRL
ETERPRVETTSDMSEDLFEGEKGEDAGDPSVAYGDSTTGNTPRISSVDKLYIVLISLHGL
VRGENMELGRDSDTGGQVKYVVELAKALSSCPGVYRVDLFTRQILAPNFDRSYGEPVEPL
ASTSFKNFKQERGENSGAYIIRIPFGPKDKYLAKEHLWPFIQEFVDGALSHIVKMSRAIG
EEISCGHPAWPAVIHGHYASAGVAAALLSGALNVPMVFTGHFLGKDKLEELLKQGRQTRE
QINMTYKIMCRIEAEELALDASEIVIASTRQEIEEQWNLYDGFEVILARKLRARVKRGAN
CYGRYMPRMVIIPPGVEFGHMIHDFDMDGEEDGPSPASEDPSIWSEIMRFFTNPRKPMIL
AVARPYPEKNITTLVKAFGECRPLRELANLTLIMGNREAISKMHNMSAAVLTSVLTLIDE
YDLYGQVAYPKRHKHSEVPDIYRLAVRTKGAFVNVPYFEQFGVTLIEAAMHGLPVIATKN
GAPVEIHQVLDNGLLVDPHDQHAIADALYKLLSEKQLWSKCRENGLKNIHQFSWPEHCKN
YLSRISTLGPRHPAFASNEDRIKAPIKGRKHVTVIAVDSVSKEDLIRIVRNSIEAARKEN
LSGSTGFVLSTSLTIGEIHSLLMSAGMLPTDFDAFICNSGSDLYYPSCTGDTPSNSRVTF
ALDRSYQSHIEYHWGGEGLRKYLVKWASSVVERRGRIEKQVIFEDPEHSSTYCLAFKVVN
PNHLPPLKELQKLMRIQSLRCHALYNHGATRLSVIPIHASRSKALRYLSVRWGIELQNVV
VLVGETGDSDYEELFGGLHKTVILKGEFNTSANRIHSVRRYPLQDVVALDSPNIIGIEGY
GTDDMRSALKQLDIRAQ
Rice SPS8 (A family = family 2) [CDS: 196-3396 bp of AK101676 (cDNA)]
MAGNDWINSYLEAILDAGGAAGEISAAAGGGGDGAAATGEKRDKSSLMLRERGRFSPARY
FVEEVISGFDETDLYKTWVRTAAMRSPQERNTRLENMSWRIWNLARKKKQIEGEEASRLA
KQRLEREKARRYAAADMSEDLSEGEKGENINESSSTHDESTRGRMPRIGSTDAIEAWASQ
HKDKKLYIVLISIHGLIRGENMELGRDSDTGGQVKYVVELARALGSTPGVYRVDLVTRQI
SAPDVDWSYGEPTEMLSPRNSENFGHDMGESSGAYIVRIPFGPRDKYIPKEHLWPHIQEF
VDGALVHIMQMSKVLGEQVGSGQLVWPVVIHGHYADAGDSAALLSGALNVPMIFTGHSLG
RDKLEQLLKQGRQTRDEINTIYKIMRRIEAEELCLDASEIIITSTRQEIEQQWGLYDGFD
LTMARKLRARIKRGVSCYGRYMPRMIAVPPGMEFSHIVPHDVDQDGEEANEDGSGSTDPP
IWADIMRFFSNPRKPMILALARPDPKKNITTLVKAFGEHRELRNLANLTLIMGNRDVIDE
MSSTNSAVLTSILKLIDKYDLYGQVAYPKHHKQSEVPDIYRLAARTKGVFINCAFIEPFG
LTLIEAAAYGLPMVATRNGGPVDIHRVLDNGILVDPHNQNEIAEALYKLVSDKQLWAQCR
QNGLKNIHQFSWPEHCKNYLSRVGTLKPRHPRWQKSDDATEVSEADSPGDSLRDVHDISL
NLKLSLDSEKSSTKENSVRRNLEDAVQKLSRGVSANRKTESVENMEATTGNKWPSLRRRK
HIVVIAIDSVQDANLVEIIKNIFVASSNERLSGSVGFVLSTSRAISEVHSLLTSGGIEAT
DFDAFICNSGSDLCYPSSNSEDMLSPAELPFMIDLDYHTQIEYRWGGEGLRKTLIRWAAE
KSEGGQVVLVEDEECSSTYCISFRVKNAEAVPPVKELRKTMRIQALRCHVLYSHDGSKLN
VIPVLASRSQALRYLYIRWGVELSNMTVVVGESGDTDYEGLLGGVHKTIILKGSFNAVPN
QVHAARSYSLQDVISFDKPGITSIEGYSPDNLKSALQQFGILKDNV
Rice SPS11 (C family = family 1)
MAVGNEWINGYLEAILDAGVKLREQRGAAAVQLPPLLPAPEDAASAVATAATYSPTRYFV
EEVVSRFDDRDLHKTWTKVVAMRNSQERNNRLENLCWRIWNVARRKKQVEWEFSRQLSRR
RLEQELGSREAAADLSELSEGEKDGKPDTHPPPPAAAAAEAAADDGGGGDHQQQQPPPPP
HQLSRFARINSDPRIVSDEEEEVTTDRNLYIVLISIHGLVRGENMELGRDSDTGGQVKYA
VELARALAATPGVHRVDLLTRQISCPDVDWTYGEPVEMLTVPAADADDEDGGGGSSGGAY
IVRLPCGPRDKYLPKESLWPHIPEFVDRALAHVTNVARALGEQLSPPPPSDGAGAAAQAV
WPYVIHGHYADAAEVAALLASALNVPMVMTGHSLGRNKLEQLLKLGRMPRAEIQGTYKIA
RRIEAEETGLDAADMVVTSTKQEIEEQWGLYDGFDLKVERKLRVRRRRGVSCLGRYMPRM
VVIPPGMDFSYVDTQDLAGDGAGGAGDAADLQLLINPNKAKKPLPPIWSEVLRFFTNPHK
PMILALSRPDPKKNVTTLLKAYGESRHLRELANLTLILGNRDDIEEMSGGAATVLTAVLK
LIDRYDLYGQVAYPKHHKQTDVPHIYRLAAKTKGVFINPALVEPFGLTIIEAAAYGLPVV
ATKNGGPVDILKVLSNGLLVDPHDAAAITAALLSLLADKSRWSECRRSGLRNIHRFSWPH
HCRLYLSHVAASCDHPAPHQLLRVPPSPSSSSAASAAAGGGGAAASSEPLSDSLRDLSLR
ISVDAASPDLSAGDSAAAILDALRRRRSTDRPAASSAARAIGFAPGRRQSLLVVAVDCYG
DDGKPNVEQLKKVVELAMSAGDGDDAGGRGYVLSTGMTIPEAVDALRACGADPAGFDALI
CSSGAEICYPWKGEQLAADEEYAGHVAFRWPGDHVRSAVPRLGKADGAQEADLAVDAAAC
SVHCHAYAAKDASKVKKVDWIRQALRMRGFRCNLVYTRACTRLNVVPLSASRPRALRYLS
IQWGIDLSKVAVLVGEKGDTDRERLLPGLHRTVILPGMVAAGSEELLRDEDGFTTEDVVA
MDSPNIVTLADGQDIAAAAADLLKAI
Appendix C. Maize SPS cDNA and protein sequences derived from unannotated genomic and cDNA sequences. Non-contiguous genome survey sequence (GSS) assemblies were linked by overlapping maize cDNA (EST) sequences where possible, and otherwise by overlapping sorghum or sugarcane cDNA sequences. These two species are closely related to maize, with all three species belonging to the Andropogoneae tribe within the Poaceae (Gaut, 2002), and orthologous sequences from the three species, especially sorghum and maize, have very high similarity (see Supplemental Data, Appendix D). The cDNA sequences were deduced by comparison with related maize, sorghum, sugarcane or rice cDNAs or ESTs where available, and otherwise by comparison of conceptual translations of the genomic sequence with known SPS protein sequences. The component genomic and cDNA sequences are indicated in the heading. Protein coding regions are shown in upper case, with the start and stop codons highlighted in bold, and the 5´- and 3´-UTRs are shown in lowercase. Non-maize segments are underlined.
The original full length maize SPS cDNA sequence (Worrell et al., 1991) belongs to family V. The corresponding genomic sequence, designated ZmSPS5a, was assembled into two contigs, leaving the 3´-end of the second intron and 420 bp of coding sequence not covered by genomic sequence (Fig. 3). Comparison with the other maize and rice genes (Figs 2 and 3) suggests that the missing genomic sequence would include a further intron between bases 609 and 610 of the coding region. This is supported by the presence of the canonical AG^ and ^GT flanking sequences at this position in the cDNA sequence around the putative splice site (^). From this we infer that the ZmSPS5a gene probably has 12 exons. The family V-like genomic fragment (ZmSPS5d) only contains sequence corresponding to the deduced exon 3 and part of exon 4 of the ZmSPS5a gene and is bordered downstream by an Rp1-D rust resistance gene-like sequence (Fig. 3). We found no ESTs corresponding to the ZmSPS5d sequence, which is most likely to be a pseudogene. Interestingly, it appears to have retained the canonical AG^GT splice site sequence at exactly the position where the putative missing third intron in ZmSPS5a would be expected to start (Fig. 3).
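The splice-site argument amounts to checking whether the cDNA reads AG immediately before, and GT immediately after, a putative exon-exon junction. The sketch below shows such a check on a made-up toy sequence; the function name `has_ag_gt_junction` and the test string are hypothetical, and the ZmSPS5a sequence itself is not reproduced here.

```python
def has_ag_gt_junction(cds, last_exon_base):
    """Return True if the cDNA reads AG^GT around a putative junction.

    last_exon_base is the 1-based position of the final base of the
    upstream exon, so the junction (^) lies between last_exon_base
    and last_exon_base + 1.
    """
    i = last_exon_base  # 0-based index of the first base after the junction
    if i < 2 or i + 2 > len(cds):
        return False
    cds = cds.upper()
    return cds[i - 2:i] == "AG" and cds[i:i + 2] == "GT"


# Hypothetical toy sequence with AG at bases 7-8 and GT at bases 9-10.
toy = "ATGCCAAGGTTC"
print(has_ag_gt_junction(toy, 8))  # True
print(has_ag_gt_junction(toy, 5))  # False
```

For the case described above, the equivalent call would pass the ZmSPS5a coding sequence with `last_exon_base=609`.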
ZmSPS1 [1-2897: CG320417, BZ709543, BZ709537, CC609839, CG320429, CC722586, CC609827, CG257173, CC461280, CC655096, CC640591, CC639439, BZ417567, CG257183, CC722576, CC639443, BZ658429, CC655107, BZ658444, BZ828094, BZ822376, BZ828100, BZ736355, CC627270, CG342628, BZ736346, CG342619, CC627275, CG069314, CG069313, CG363346, CG441265, CG379748, CG450606, CG445658 and CG363337 (GSS); 2896-2936: Sorghum bicolor AW672411 and AW672444 (EST); 2937-3156: CG457869 and CG457830 (GSS)]. The genomic sequence covers 99% of the predicted protein coding region of ZmSPS1, with a single gap in intron 10 extending for an estimated 39 bp into exon 11 (Fig. 3). This gap was bridged by two sorghum ESTs (AW672411 and AW672444) that have 96% identity in the 359-bp overlap with the maize sequence. The 39 bp of sequence missing from exon 11 corresponds to the highly conserved 13-aa peptide YLSIQWGIDLXKV in the predicted protein sequences from orthologous sorghum (AW672444; X=D), wheat (TaSPS1; X=A), rice (OsSPS11; X=S), and Festuca arundinacea (Demmer et al., 2003; X=S) sequences. Therefore, for further analysis it was assumed that the putative maize protein sequence would also contain this peptide, with a D residue in the variable 11th position as in sorghum, the closest relative of maize.
ATGGCGGCGGGGAACGAGTGGATCAACGGGTACCTGGAGGCGATCCTGGACGCGGGCACG
AGGCTGCGCGGGCCGTGGCAGCAGCAGGGCGGCGCGGCTTCGCTGACGGCCGCGCTGCCG
AGGCTGCTGGCGGAGGCCGGCGGCCAGCAGGGCGCCGCGGCGTACAGCCCGACGCGCTAC
TTCGTGGAGGAGGTGGTGAGCCGCTTCGACGACCGCGACCTCCACAAGACGTGGACCAAG
GTGGTGGCGATGCGCAACAGCCAGGAGCGGAGCAACCGGCTGGTGAACCTGTGCTGGAGG
ATCTGGCACGTCGCGAGGAAGAAGAAGCAGGTGCAACGGGAGTACGCGCGGCAGCTGGCG
CAGCGGCGCCTGGAGCAGGAGCTGGGCAGCCGGGAGGCCGCCGAGGAGCTCTCCGATGGC
GAGAAGGACGGCGCCCCCGACGCCGCCCAGCAGCCTGTGTCGGTGGCGGCGCCCGACGGC
CGGATCGCCAGGATCGGGTCCGAGGCCCGGATCGTGTCGGACGACGAGGGCGGGGACGGC
GGCAAGGACGACAGGAACCTCTACATCGTGCTCATCAGCATACACGGGCTCGTCCGTGGC
GAGAACATGGAGCTCGGCCGTGACGCTGACACCGGGGGGCAGGTGAAGTATGTGGTGGAG
CTGGCCCGGGCGCTGGCGGCGACCGCCGGCGTGCACCGCGTGGACCTACTCACGCGCCAG
ATCTCCTGCCCGGACGTCGATTGGACCTACGGGGAGCCCGTCGAGATGATCACCCATCAG
GCCGACGACGGCGACGGCAGCGGCGGCGGGGCCTACATCGTGCGCCTCCCGTGCGGGCCC
CGCGACAAGTACCTCCCCAAGGAGTCCCTGTGGCCGCACATCCCCGAGTTCGTGGACCGC
GCGCTGGCGCACGTCACCAACGTCGCGCGCGCGCTGGGGGACCAGCAGCAGCAGCAGCCC
GACGCCGGCGCGGGCGCGGGCGCGGCCGCCCCGGTGTGGCCGTACGTGGTCCACGGCCAC
TACGCGGACGCGGCGGAGGCGGCGGCGCACCTGGCCAGCGCGCTCAACGTGCCCATGGTC
ATGACGGGCCACTCCCTGGGGCGGAACAAGCTGGAGCAGCTGCTCAAGCTGGGCCGCATG
CCGCGCGCCGAGATCCAGGGCACCTACCGGATCGCGCGCCGAATCGAGGCCGAGGAGACC
GGCCTCGACGCCGCCGATATGGTGGTCACCAGCACCAAGCAGGAGATCGAGGAGCAGTGG
GGCCTCTACGACGGCTTCGACCTCATGGTGGAGCGCAAGCTCCGGGTGCGCCGCCGCAGG
GGCCTCAGCTGCCTCGGACGCTACATGCCGCGGATGGTCGTCATCCCGCCGGGGATGGAC
TTCAGCTACGTCGACACGCAGGACCTCGCTGAGGGCGACGCCGACCTGCAGATGCTCATG
AGTCCCGGCAAGGCCAAGAAGCCATTGCCTCCCATTTGGTCAGAGGTCCTGAGGTTCTTC
GTCAACCCGCACAAGCCCATGATCCTGGCGCTGTCGCGGCCGGACCCCAAGAAGAACGTC
ACCACGCTGCTCAAGGCCTATGGCGAGAGCCGCCACCTCCGCGAGCTTGCCAACCTTACA
CTGATACTGGGGAACCGGCATGATATCGAGGAGATGTCCGGCGGCGCCGCCACAGTTCTG
ACGGCAGTGCTCAAGCTCATCGACCGGTACGACCTGTACGGCTGCGTCGCCTACCCTAAG
CACCACAAGCAGACCGACGTGCCGCACATATACCGCCTTGCCGCCAAGACGAAGGGAGTG
TTCATTAATCCTGCACTTGTGGAGCCATTCGGCCTCACCCTCATAGAGGCCGCTGCGTAT
GGTCTGCCCGTGGTGGCCACCAAGAACGGCGGTCCGGTGGACATCATCAAGGCGCTGCAC
AACGGGCTGCTGGTGGACCCGCACGACGAGGCGGCGATAACGGAGGCGCTGCTCAGCCTG
CTAGCCGACAAGGCGCGGTGGGCCGAGTGCCGGCGCAACGGCCTCCGCAACATCCACCGC
TTCTCCTGGCCGCACCACTGCCGCCTCTACCTCTCCCACGTGGCCGCCAACTGCGACCAC
CCGGCGCCGCACCAGCTGCTCCGCGTCCCGGCCAGCCCGCGCGCCGCGCTCGCCGAGCAT
GGCACCGACGACTCCCTCTCGGAGTCGCTCCGCGGCCTCTCCATCTCCATCGACGCCTCG
CACGACCTCAAGGCCGGGGACTCCGCCGCGGCCATCATGGACGCGCTCCGCCGGCGCCGC
TCCGCCGACCGGCCGCCGAGCTCGGCCGCGAGGGCGATCGGGCACGCGCCGGGCCGGCGG
CAGGGTCTCCTCGTCCTCGCCGTCGACTGCTACAACGGCGACGGCACACCGGACGCCGAG
CGGATGAAGAAAGCTGTCGACCTGGCGCTGTCGGCGGCGGCAGCGGCCGGCGGCCGGCTC
GGGTGCGTCCTGTCGACCGGGATGACCATAGCCGAGGCCGCGGACGCGCTCAGCGCCTGC
GGCGTCGACCCGGCCGGCTTCGACGCGCTCGTCTGCAGCAGCGGCGCGGATCTCTGCTAC
CCGTGGAGGGAGGTCGCGGCGGACGACGAGTACGCGGGGCACGTGGCGTTCCGGTGGCCC
GGGAACCACGTGCGCGCGGCCGTGCCGAGGCTCGGAAAGGCCGAGGGCGCGCAGGAGGCG
GACCTCGCCTTCGACGAGGCCGCCTGCTCCGGGCCTTGCCACGCCTACGCCGCGGCGGGC
GCGTCTAAGGTGAAGAAGGTGGACTCGATCCGGCAGTCGCTGCGCATGCGCGGGTTCCGG
TGCAACCTCGTGTACACGCGCGCGTGCACGCGCCTCAACGTCATCCCGCTCTCGGCGTCG
CGGCCGCGCGCGCTCAGGTACCTGTCGATACAGTGGGGCATCGACCTGGACAAGGTGGCC
GTGCTCGTGGGCGACAAGGGCGACACCGACCGCGAACGGCTGCTCCCTGGCCTGCACAGA
ACGCTGGTCCTGCCGGAGCTAGTTTGCCACGGCAGCGAGGAGCTGCGCCGCGACCAAGAC
GGGTTTCTGGCAGAGGACGTCGTCTCCATGGACTCCCCGAATATCCTCACCCTCGCCGAG
TACCAGGCGGCGGTCGACATCCTCAAGGCTATTTGA
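The conserved 13-aa peptide discussed for ZmSPS1 can be located with a simple pattern in which the 11th residue is left free. The sketch below applies it to a fragment of the deduced rice SPS11 protein (joined across a line break in the listing) and checks that the corresponding stretch of the ZmSPS1 sequence above spans 39 bases. The variable names are illustrative only.

```python
import re

# Conserved 13-aa peptide; the 11th residue (X in the text) is captured.
PEPTIDE = re.compile(r"YLSIQWGIDL(.)KV")

# Fragment of the deduced rice SPS11 protein, joined across a line break.
rice_sps11_fragment = "SASRPRALRYLSIQWGIDLSKVAVLVGEKGDTD"

# Stretch of the ZmSPS1 sequence above that encodes the peptide
# (with D assumed in the variable position, as in sorghum).
maize_stretch = "TACCTGTCGATACAGTGGGGCATCGACCTGGACAAGGTG"

match = PEPTIDE.search(rice_sps11_fragment)
variable_residue = match.group(1)
peptide_length = len(match.group(0))

print(variable_residue, peptide_length)  # S 13
print(len(maize_stretch))                # 39
```

The captured residue for rice (S) agrees with the value of X quoted for OsSPS11, and 13 codons account for the 39 bp bridged by the sorghum ESTs.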
ZmSPS2 [1-2428: CG368022, CC402284, CC402281, CG368015, CC638981, CG456169, CC044164, CG456129, CG266316, CG128856, CC429603, CC429606, CG128859, BZ708827, CC004963, CC989506, BZ708812, CC742796, CC199476 and CC789337 (GSS); 2429-2693: AW257903 (EST); 2694-3670: CC777626, BZ684004 and CG053750 (GSS)]. The full length protein coding region was derived from genomic sequence except for 266 bp from the middle of exon 11 that were derived from two maize ESTs (AW257903, CD997541) (Fig. 3). A gap in intron 3 was bridged by two sorghum ESTs (BG053850 and BF655583) that are >96% identical to the overlapping maize sequence, and a gap in intron 12 was bridged by seven maize ESTs.