BIT150 – Fall 2008 – Homework 3 KEY

Due on Thursday October 15th by email to TA:

as Hwk3_Lastname BEFORE the Lab

1. 35 points The following is a multiple sequence alignment of a 41-bp fragment from a putative plant cytochrome P450 gene from rice, maize, sorghum, and rye:

1.1.  Using the Jukes-Cantor one-parameter model shown below,

A / C / G / T
A / - / 1 / 1 / 1
C / 1 / - / 1 / 1
G / 1 / 1 / - / 1
T / 1 / 1 / 1 / -

- Calculate pair-wise distances between the sequences and construct BY HAND a distance matrix.

- Show your calculations.

- Present the distance matrix.

ANSWER 1.1.

Pair-wise distances

Rice-Maize: 8

Rice-Wheat: 9

Rice-Rye: 8

Maize-Wheat: 9

Maize-Rye: 8

Wheat-Rye: 3

Distance matrix:

Maize / Wheat / Rye / Rice
Maize / - / - / - / -
Wheat / 9 / - / - / -
Rye / 8 / 3 / - / -
Rice / 8 / 9 / 8 / -
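Under the equal-cost matrix above every substitution costs 1, so each pairwise distance is simply the number of aligned sites at which two sequences differ. The sketch below counts mismatches for every pair. The 41-bp alignment itself is not reproduced here, so the example uses a hypothetical toy alignment, and skipping sites with a gap in either sequence is an assumption of this sketch.

```python
from itertools import combinations


def count_differences(seq1: str, seq2: str) -> int:
    """Count aligned sites at which two sequences differ.

    Every substitution costs 1 (equal-cost matrix), so the distance is the
    number of mismatched sites. Sites with a gap ('-') in either sequence
    are skipped; that gap rule is an assumption of this sketch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq1.upper(), seq2.upper())
        if a != "-" and b != "-" and a != b
    )


def distance_matrix(alignment: dict) -> dict:
    """Map every (name1, name2) pair to its mismatch count."""
    return {
        (x, y): count_differences(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# Hypothetical toy alignment (NOT the 41-bp P450 fragment from the exercise).
toy = {"S1": "ACGTACGTAC", "S2": "ACGTTCGTAC", "S3": "TCGAACGTAG"}
for pair, dist in distance_matrix(toy).items():
    print(pair, dist)
# ('S1', 'S2') 1
# ('S1', 'S3') 3
# ('S2', 'S3') 4
```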

1.2.  Using the distance-based method UPGMA,

- Construct BY HAND a phylogenetic tree based on the distance matrix created in 1.1.

- Provide distances for all the branches.

- Include all your intermediate matrices.

- Show your calculations.

- Manually draw the phylogenetic tree.

ANSWER 1.2.

Wheat and rye are the most closely related (smallest distance, 3). Branch length: 3/2 = 1.5

Merge wheat and rye.

Calculate the average distance from Maize and from Rice to the Wheat-Rye cluster:

Maize: (9+8)/2 = 8.5; Rice: (9+8)/2 = 8.5

Maize / Wheat-Rye
Wheat-Rye / 8.5
Rice / 8 / 8.5

Rice and Maize are the next closest pair (distance 8). Branch length: 8/2 = 4

Average distance between Wheat-Rye and Maize-Rice = (8.5+8.5)/2= 8.5

Branch to the Wheat-Rye group: 8.5/2 = 4.25 → 4.25 - 1.5 = 2.75

Branch to the Maize-Rice group: 8.5/2 = 4.25 → 4.25 - 4 = 0.25
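The UPGMA steps above can be checked with a short sketch: repeatedly join the closest pair of clusters, place the new node at half their distance, and replace the pair by its size-weighted average distance to every other cluster. Keying the matrix by `frozenset` pairs and labelling merged clusters with Newick-style strings are choices of this sketch, not part of the exercise.

```python
def upgma(dist):
    """UPGMA on a distance dict keyed by frozenset({taxon1, taxon2}).

    Returns (tree, root_height, branch): `tree` is a Newick-style topology
    string and `branch` maps every node label to the length of the branch
    leading to it from its parent.
    """
    d = dict(dist)
    clusters = {name for pair in d for name in pair}
    size = {c: 1 for c in clusters}
    height = {c: 0.0 for c in clusters}
    branch = {}
    while len(clusters) > 1:
        # 1. Join the closest pair of clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        new = f"({a},{b})"
        h = d[pair] / 2  # height of the new node
        branch[a] = h - height[a]
        branch[b] = h - height[b]
        # 2. Size-weighted average distance from the new cluster to the rest.
        for c in clusters - {a, b}:
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        # 3. Remove every matrix entry involving the joined clusters.
        for key in [k for k in d if a in k or b in k]:
            del d[key]
        clusters = (clusters - {a, b}) | {new}
        size[new] = size[a] + size[b]
        height[new] = h
    (root,) = clusters
    return root, height[root], branch


# Distance matrix from answer 1.1.
matrix = {
    frozenset(("Maize", "Wheat")): 9,
    frozenset(("Maize", "Rye")): 8,
    frozenset(("Maize", "Rice")): 8,
    frozenset(("Wheat", "Rye")): 3,
    frozenset(("Wheat", "Rice")): 9,
    frozenset(("Rye", "Rice")): 8,
}
tree, root_height, branch = upgma(matrix)
print(tree, root_height)  # ((Maize,Rice),(Rye,Wheat)) 4.25
for node, length in branch.items():
    print(node, length)
```

The branch lengths it reports (1.5 for wheat and rye, 4 for maize and rice, 2.75 and 0.25 for the two internal branches) match the hand calculation.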


2. 10 points Sequences from the flavonoid 3’ hydroxylase gene, Fop1, are provided below.

>Triticum monococcum

MDHSVLLLLASLAAVAVAAVWHLRSHGRRTKLPLPPGPRGWPVLGNLPQLGAMPHHTMAALARQHGPLFRLRFGSVEVVVAASAKVARSFLRAHDANFSDRPPTSGAEHLAYNYQDLVFAPYGARWRALRKLCALHLFSARALDALRTIRQDEARLMVTHLLSSSSPAGVAVNLCAINVCATNALARAAIGRRMFGDGVGEGAREFKDMVVELMQLAGVLNIGDFVPALRWLDPQGVVAKMKRLHRRYDRMMDGFISERGQHAGEMEGNDLLSVMLATMRWQSPADAGEEDGIKFTEIDIKALLLNLFTAGTDTTSSTVEWALAELIRDPCILKQLQHELDGVVGNDRLVTEADLPRLTFLAAVIKETFRLHPATPLSLPRVAAEDCEVDGYHVSKGTTLIMNVWAIARDPASWGPDPLEFRPVRFLPGGLHESADVKGGDYELIPFGAGRRICAGLGWGLRMVTLMTAMLVHAFDWSLVDGTTPEKLNMEEAYGQTLQRAVPLVVQPVPRLLSSAYTV

>Zea mays

MCAMAREYGPLFRLRFGSAEVVVAASARVAAQFLRAHDANFSNRPPNSGAEHVAYNYQDLVFAPYGSRWRALRKLCALHLFSAKALDDLRGVREGEVALMVRELARQGERGRAAVALGQVANVCATNTLARATVGRRVFAVDGGEGAREFKEMVVELMQLAGVFNVGDFVPALAWLDPQGVVGRMKRLHRRYDDMMNGIIRERKAAEEGKDLLSVLLARMREQQPLAEGDDTRFNETDIKALLLNLFTAGTDTTSSTVEWALAELIRHPDVLRKAQQELDAVVGRDRLVSESDLPRLTYLTAVIKETFRLHPSTPLSLPRVAAEECEVDGFRIPAGTTLLVNVWAIARDPEAWPEPLEFRPARFLPGGSHAGVDVKGSDFELIPFGAGRRICAGLSWGLRMVTLMTATLVHALDWDLADGMTADKLDMEEAYGLTLQRAVPLMVRPAPRLLPSAYAE

>Oryza sativa

MDVVPLPLLLGSLAVSAAVWYLVYFLRGGSGGDAARKRRPLPPGPRGWPVLGNLPQLGDKPHHTMCALARQYGPLFRLRFGCAEVVVAASAPVAAQFLRGHDANFSNRPPNSGAEHVAYNYQDLVFAPYGARWRALRKLCALHLFSAKALDDLRAVREGEVALMVRNLARQQAASVALGQEANVCATNTLARATIGHRVFAVDGGEGAREFKEMVVELMQLAGVFNVGDFVPALRWLDPQGVVAKMKRLHRRYDNMMNGFINERKAGAQPDGVAAGEHGNDLLSVLLARMQEEQKLDGDGEKITETDIKALLLNLFTAGTDTTSSTVEWALAELIRHPDVLKEAQHELDTVVGRGRLVSESDLPRLPYLTAVIKETFRLHPSTPLSLPREAAEECEVDGYRIPKGATLLVNVWAIARDPTQWPDPLQYQPSRFLPGRMHADVDVKGADFGLIPFGAGRRICAGLSWGLRMVTLMTATLVHGFDWTLANGATPDKLNMEEAYGLTLQRAVPLMVQPVPRLLPSAYGV

>Sorghum bicolor

MDVPLPLLLGSLAVSVVVWCLLLRRGGNGKGKGKRPLPPGPRGWPVLGNLPQVGSHPHHTMCALAKEYGPLFRLRFGSAEVVVAASARVAAQFLRAHDANFSNRPPNSGAEHVAYNYQDLVFAPYGSRWRALRKLCALHLFSAKALDDLRGVREGEVALMVRELARHQHQHAGVPLGQVANVCATNTLARATVGRRVFAVDGGEEAREFKDMVVELMQLAGVFNVGDFVPALAWLDLQGVVGKMKRLHRRYDDMMNGIIRERKAVEEGKDLLSVLLARMREQQSLADGEDSMINETDIKALLLNLFTAGTDTTSSTVEWALAELIRHPDVLKKAQEELDAVVGRDRLVSESDLPRLTYLTAVIKETFRLHPSTPLSLPRVAAEECEVDGFRIPAGTTLLVNVWAIARDPEAWPEPLQFRPDRFLPGGSHAGVDVKGSDFELIPFGAGRRICAGLSWGLRMVTLMTATLVHALDWDLADGMTAYKLDMEEAYGLTLQRAVPLMVRPAPRLLPSAYAAE

>Phyllostachys edulis

MDLPLPLVLSTLAVSAIVCYVLFFRAGKARRRAPLPPGPRGWPVLGNLPQLGGKTHQTLHVMTKVYGPLLRLRFGSSDVVVAGSAAVAEQFLRIHDAKFSNRPPNSGGEHMAYNYQDVVFGPYGPRWRAMRKVCAVNLFSARALDDLRAVRERETALMVRSLVEASAPRGAPAVPLGKAVNVCTTNALSRAAVGRRVFAAGSEVAKEFKEIVLEVMQVGGVLNVGDFVPALRWLDPQGVVAKMKKLHRRYDDMMNAIIGERRAGVKPAGEEGKDLLGLLLAMMQEEQPLAGGEEDKITDTDIKALTLVS

2.1. Construct phylogenetic trees using the NJ and UPGMA methods:

- Use Number of differences as the substitution model.

- Use bootstrap as the test of inferred phylogeny, with 1,000 replications.

- Present the trees in your homework.

ANSWER 2.1.

NJ

UPGMA

2.2. What are the bootstrap values indicating in these trees?

ANSWER 2.2.

The bootstrap value at a node is the percentage of the 1,000 bootstrap replicates (trees rebuilt from alignments resampled column by column, with replacement) in which the taxa below that node were grouped together. For example, maize and sorghum were grouped together in 100% of the replicates in both the NJ and the UPGMA trees, whereas nodes with lower values mark relationships that are less strongly supported by these alignments.
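A minimal sketch of how such percentages arise, assuming the sequences are already aligned to equal length and using UPGMA with the number-of-differences distance (sites with a gap in either sequence are skipped, an assumption here). Each replicate resamples alignment columns with replacement, rebuilds the tree, and records its clades. The toy sequences are hypothetical, not the Fop1 proteins, and a real program's gap handling and tie-breaking may differ.

```python
import random
from collections import Counter
from itertools import combinations


def differences(s1, s2):
    """Number-of-differences distance; sites with a gap in either sequence are skipped."""
    return sum(1 for a, b in zip(s1, s2) if a != "-" and b != "-" and a != b)


def upgma_clades(seqs):
    """Clades (frozensets of names) of the UPGMA tree, excluding the root."""
    names = list(seqs)
    leaf_d = {
        frozenset((x, y)): differences(seqs[x], seqs[y])
        for x, y in combinations(names, 2)
    }

    def avg(c1, c2):
        # UPGMA cluster distance = mean of all leaf-to-leaf distances.
        total = sum(leaf_d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    clades = set()
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        if len(merged) < len(names):
            clades.add(merged)
    return clades


def bootstrap_support(seqs, replicates=1000, seed=1):
    """Percent of bootstrap replicates containing each clade of the original tree."""
    rng = random.Random(seed)
    names = list(seqs)
    length = len(seqs[names[0]])
    counts = Counter()
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        pseudo = {n: "".join(seqs[n][i] for i in cols) for n in names}
        counts.update(upgma_clades(pseudo))
    return {c: 100.0 * counts[c] / replicates for c in upgma_clades(seqs)}


# Hypothetical aligned toy sequences (NOT the Fop1 sequences above).
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAC", "C": "TTGTACCTAA", "D": "TTGAACCTTA"}
for clade, pct in bootstrap_support(toy, replicates=200).items():
    print(sorted(clade), pct)
```

Because A and B are identical, they are joined in every replicate and receive 100% support; the C-D grouping rests on fewer informative columns and can score lower.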


3. 15 points From the following trees (A, B, C, D):

3.1. Construct BY HAND:

- a strict consensus tree (groups present in ALL trees);

- a 50% majority-rule consensus tree (groups in >50% of the trees).

3.2. What are consensus trees used for?

ANSWER 3.1.

Strict consensus tree

50% majority-rule consensus tree

ANSWER 3.2.

Consensus trees are composite trees that summarize the groupings shared by several trees. They are used to present the results of a tree-construction method that produces several equally parsimonious trees, and to combine the results of different tree-construction methods.
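Both consensus rules can be expressed by treating each rooted tree as its set of clades: the strict consensus keeps clades found in every tree, and the majority-rule consensus keeps clades found in more than half of them (such clades are always mutually compatible, so they form a tree). The four trees below are hypothetical and are not trees A-D from the exercise; the single-letter-name shorthand in `clades` is an assumption of this sketch.

```python
from collections import Counter


def strict_consensus(trees):
    """Clades present in every tree."""
    return set.intersection(*(set(t) for t in trees))


def majority_rule_consensus(trees, fraction=0.5):
    """Clades present in more than `fraction` of the trees."""
    counts = Counter(clade for t in trees for clade in set(t))
    return {c for c, k in counts.items() if k > fraction * len(trees)}


def clades(*groups):
    # Hypothetical shorthand: single-letter taxon names, so "PQR" -> {P, Q, R}.
    return {frozenset(g) for g in groups}


# Four hypothetical rooted trees on taxa P-T (NOT trees A-D from the exercise).
trees = [
    clades("PQ", "PQR", "ST"),    # (((P,Q),R),(S,T))
    clades("PQ", "PQR", "PQRT"),  # ((((P,Q),R),T),S)
    clades("PQ", "PQS", "RT"),    # (((P,Q),S),(R,T))
    clades("PQ", "PQR", "PQRS"),  # ((((P,Q),R),S),T)
]
print(sorted("".join(sorted(c)) for c in strict_consensus(trees)))         # ['PQ']
print(sorted("".join(sorted(c)) for c in majority_rule_consensus(trees)))  # ['PQ', 'PQR']
```

Here the strict consensus resolves only (P,Q), while the majority-rule tree is ((P,Q),R) with S and T left unresolved at the root.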

4. 10 points From the following induced multiple sequence alignment:

Induced multiple sequence alignment of a segment of the ‘4-coumarate:CoA ligase’ gene (‘-’ indicates a gap).

H1 / T / C / T / A / C / T / G / A / C
H2 / A / C / - / A / C / G / G / A / C
H3 / A / C / T / A / C / G / A / A / T
H4 / A / C / T / G / T / G / - / - / C

4.1. Calculate BY HAND the ‘sum-of-pairs’ distance score, scoring transitions (A<->G and C<->T) as 1 unit of distance and transversions as 2 units of distance (Kimura 2-parameter model), with affine gap penalties: gap opening 3; gap extension 1.

A / C / G / T
A / - / 2 / 1 / 2
C / 2 / - / 2 / 1
G / 1 / 2 / - / 2
T / 2 / 1 / 2 / -

- Indicate all your calculations within the table provided:

ANSWER 4.1.

Kimura 2-parameter
H1 vs. H2 / 2+3+2 = 7
H1 vs. H3 / 2+2+1+1=6
H1 vs. H4 / 2+1+1+2+3+1=10
H2 vs. H3 / 3+1+1=5
H2 vs. H4 / 3+1+1+3+1=9
H3 vs. H4 / 1+1+3+1+1=7
Sum of Pairs / 44
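The hand calculation can be reproduced by scoring each pair of rows column by column: transitions cost 1, transversions 2, the first gap position opposite a residue costs 3, and each further consecutive gap position in the same sequence costs 1. Ignoring gap-gap columns (none occur in this alignment) is an assumption of this sketch.

```python
from itertools import combinations

TRANSITIONS = {frozenset("AG"), frozenset("CT")}
GAP_OPEN, GAP_EXTEND = 3, 1


def substitution_cost(a, b):
    """Kimura 2-parameter style costs: 0 match, 1 transition, 2 transversion."""
    if a == b:
        return 0
    return 1 if frozenset((a, b)) in TRANSITIONS else 2


def pair_score(s1, s2):
    """Distance between two aligned rows with affine gap penalties."""
    score = 0
    prev_gap = None  # which row (1 or 2) had a gap in the previous scored column
    for a, b in zip(s1, s2):
        if a == "-" and b == "-":
            continue  # gap-gap columns are ignored (assumption)
        if a == "-" or b == "-":
            side = 1 if a == "-" else 2
            score += GAP_EXTEND if prev_gap == side else GAP_OPEN
            prev_gap = side
        else:
            score += substitution_cost(a, b)
            prev_gap = None
    return score


def sum_of_pairs(alignment):
    scores = {
        (x, y): pair_score(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }
    return scores, sum(scores.values())


alignment = {
    "H1": "TCTACTGAC",
    "H2": "AC-ACGGAC",
    "H3": "ACTACGAAT",
    "H4": "ACTGTG--C",
}
scores, total = sum_of_pairs(alignment)
for pair, s in scores.items():
    print(pair, s)
print("Sum of pairs:", total)  # Sum of pairs: 44
```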

5. 30 points Given the following 6 CCT domain protein sequences:

>T._urartu_ZCCT1

MSMSCGLCGANNCPRLMVSPIHHRHHHHQEHQLREHQFFAQGNHHHHHPVPLPPANFDHSRTWTTPFHETAAAGNSSRLTLEVGAGGRPMAHLVQPPARAHIVPFYGGAFTNTISNEAIMTIDTEMMVGPAHYPTMQERAAKVMRYREKRKRRRYDKQIRYESRKAYAELRPRVNGRFVKVPEAMASPSSPASPYDPSKLHLRWFR

>Ae._tauschii_ZCCT-D1

MSMSCGLCGPNNCPRLMVSPIHHHHHQEHQLREHQFFAQGNHHHQHHGAAADHPVPLPPANFDHRRTWTTPFHETAAAGSSISRLTLEVGAGGRHMAHLSSARAHIVPFYGGAFTNTISNEAIMTIDTEMMVGPAHYPTMQERAAKVMRYREKRKRRRYDKQIRYESRKAYAELRPRVNGRFVKVPEAMASPSSPASPYDPSKLHLGWLR

>ZCCT-S2_Ae._speltoides

MSMSCGLCGASNCPHHMISPVLQHHQEHGLREYQFFAQGHHHHHHDGTAADYPPPPPANCHHCKSWTTPFHETAAAGNSSRLTLEVDAGGQHLAHLLQPPAPPRATIVPFREGAFTSTISNATIMTIDTEMMVGAAHNPTMQERHAKVMRYREKRKRRRYDKQIRYESRKAYAKLRPRVNGRFVKVPEAAVSPSPPASPYDPSKLNLGLFR

>ZCCT2_T._tauschii

MSMSCGLCGASNCPHHMNSPVLHHHHHHQEHRLCEYQFFAQGQHHHHHGAAADYPPPPPANCHHRRSWTTPFHETAAAGNSSRLTLEVDAGGQHTAHLLQPPAPPRATIVPFCGGAFTSTISNATIRTIDTEMMVGAAHNPTMQEREAKVMRYREKRKRRRYDKQIRYESRKAYAELRPRVNGRFVKVPEATASPSPPTSPYDPSKLHLGWFR

>Os_AAL7978

MSAASGAACGVCGGGVGECGCLLHQRRGGGGGGGGGGVRCGIAADLNRGFPAIFQGVGVEETAVEGDGGAQPAAGLQEFQFFGHDDHDSVAWLFNDPAPPGGTDHQLHRQTAPMAVGNGAAAAQQRQAFDAYAQYQPGHGLTFDVPLTRGEAAAAVLEASLGLGGAGAGGRNPATSSSTIMSFCGSTFTDAVSSIPKDHAAAAAVVANGGLSGGGGDPAMDREAKVMRYKEKRKRRRYEKQIRYASRKAYAEMRPRVKGRFAKVPDGELDGATPPPPSSAAGGGYEPGRLDLGWFRS

>OSI Os_AP005307

MGMANEESPNYQVKKGGRIPPRSSLIYPFMSMGPAAGEGCGLCGADGGGCCSRHRHDDDGFPFVFPPSACQGIGAPAPPVHEFQFFGNDGGGDDGESVAWLFDDYPPPSPVAAAAGMHHRQPPYDGVVAPPSLFRRNTGAGGLTFDVSLGERPDLDAGLGLGGGGGRHAEAAASATIMSYCGSTFTDAASSMPKEMVAAMADDGESLNPNTVVGAMVEREAKLMRYKEKRKKRCYEKQIRYASRKAYAEMRPRVRGRFAKEPDQEAVAPPSTYVDPSRLELGQWFR

5.1. Use tCOFFEE to produce an alignment of conserved protein regions.

5.2. Produce a multiple sequence alignment using ClustalW. Using BOXSHADE, prepare a publishable alignment for these sequences. Paste the alignment into your homework document.

- Between tCOFFEE and ClustalW, which program seems to have better identified the conserved regions among these genes?

5.3. Construct a phylogenetic tree with the NJ method (using Number of differences as the substitution model) with bootstrap values. Include the tree here.
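For comparison with UPGMA, here is a minimal sketch of the Saitou-Nei neighbor-joining algorithm on a distance matrix. It repeatedly joins the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j and does not assume a molecular clock. The matrix is a hypothetical additive example, not distances from the CCT sequences, and the internal-node names `U0`, `U1`, … are labels invented by this sketch.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining.

    names: list of taxon labels; dist: dict frozenset({a, b}) -> distance.
    Returns the unrooted tree as a list of (internal_node, child, length).
    """
    d = dict(dist)

    def dist_of(a, b):
        return d[frozenset((a, b))]

    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r_i = sum of distances from i to every other node.
        r = {i: sum(dist_of(i, k) for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * dist_of(*p) - r[p[0]] - r[p[1]],
        )
        u = f"U{next_id}"
        next_id += 1
        li = dist_of(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, dist_of(i, j) - li)]
        # Distances from the new internal node to the remaining nodes.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (dist_of(i, k) + dist_of(j, k) - dist_of(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, dist_of(a, b)))
    return edges


# Hypothetical additive matrix for tree ((A:2,B:3):4,(C:1,D:2)).
additive = {
    frozenset("AB"): 5, frozenset("AC"): 7, frozenset("AD"): 8,
    frozenset("BC"): 8, frozenset("BD"): 9, frozenset("CD"): 3,
}
for parent, child, length in neighbor_joining(["A", "B", "C", "D"], additive):
    print(parent, child, length)
```

On an additive matrix like this one, NJ recovers the original branch lengths (A 2, B 3, C 1, D 2, internal branch 4).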

ANSWERS 5.1.-5.3.

tCOFFEE:

BOXSHADE:

Wheat_ZCCT1-A 1 --MSMS------CGLCGANN---CPRLMV------SPIHHRHH
Wheat_ZCCT1-D 1 --MSMS------CGLCGPNN---CPRLMV------SPIHH--H
Wheat_ZCCT2-B 1 --MSMS------CGLCGASN---CPHHMI------SPVLQHH-
Wheat_ZCCT2-D 1 --MSMS------CGLCGASN---CPHHMN------SPVLHHHH
Rice-1 1 --MSAASGAACGVCGGG-VGECGCLLH------QRRGGGGGGGGGGVRCGIAADLNRGFPAIFQGVGVEETAVEGDGG
Rice-2 1 MGMANEESPNYQVKKGGRIPPRSSLIYPFMSMGPAAGEGCGLCGADGGGCCSRHRHDDDGFPFVFP-----PSACQGIG-
Wheat_ZCCT1-A 27 HHQEHQLREHQFFAQG---NHHHHH------PVPLPPANFDHSRTWTTPF------HETAAAGNS-
Wheat_ZCCT1-D 25 HHQEHQLREHQFFAQG---NHHHQHHGAAADHPVPLPPANFDHRRTWTTPF------HETAAAGSSI
Wheat_ZCCT2-B 26 --QEHGLREYQFFAQG---HHHHHHDGTAADYPPPPP-ANCHHCKSWTTPF------HETAAAGNS-
Wheat_ZCCT2-D 27 HHQEHRLCEYQFFAQG---QHHHHH-GAAADYPPPPP-ANCHHRRSWTTPF------HETAAAGNS-
Rice-1 70 AQPAAGLQEFQFFGHD----DHDSVAWLFNDPAPPG--GTDHQLHRQTAPMAVGNGAAAAQQRQAFDAYAQYQPGHGLTF
Rice-2 75 -APAPPVHEFQFFGNDGGGDDGESVAWLFDDYPPPSPVAAAAGMHHRQPPY---DGVVAPP-----SLFRRNTGAGGLTF
Wheat_ZCCT1-A 77 ------SRLTLEVGAGGRPMAHLVQP--PARAHIVPFYGGAFTNTISNEAIMTIDTEMMVGPAHYP-----TMQERA
Wheat_ZCCT1-D 83 ------SRLTLEVGAGGRHMAHLSS----ARAHIVPFYGGAFTNTISNEAIMTIDTEMMVGPAHYP-----TMQERA
Wheat_ZCCT2-B 80 ------SRLTLEVDAGGQHLAHLLQPPAPPRATIVPFREGAFTSTISNATIMTIDTEMMVGAAHNP-----TMQERH
Wheat_ZCCT2-D 82 ------SRLTLEVDAGGQHTAHLLQPPAPPRATIVPFCGGAFTSTISNATIRTIDTEMMVGAAHNP-----TMQERE
Rice-1 144 DVPLTRGEAAAAVLEASLGLGGAGAGGRNPATSSSTIMSFCGSTFTDAVSSIPKDHAAAAAVVANGGLSGGGGDPAMDRE
Rice-2 146 DVSLGE----RPDLDAGLGLGG-GGGRHAEAAASATIMSYCGSTFTDAASSMPKEMVAAMADDGESLNPNTVVGAMVERE
Wheat_ZCCT1-A 141 AKVMRYREKRKRRRYDKQIRYESRKAYAELRPRVNGRFVKVPEAM--ASPSSPASPYD-----PSKLHLR-WFR-
Wheat_ZCCT1-D 145 AKVMRYREKRKRRRYDKQIRYESRKAYAELRPRVNGRFVKVPEAM--ASPSSPASPYD-----PSKLHLG-WLR-
Wheat_ZCCT2-B 146 AKVMRYREKRKRRRYDKQIRYESRKAYAKLRPRVNGRFVKVPEAA--VSPSPPASPYD-----PSKLNLG-LFR-
Wheat_ZCCT2-D 148 AKVMRYREKRKRRRYDKQIRYESRKAYAELRPRVNGRFVKVPEAT--ASPSPPTSPYD-----PSKLHLG-WFR-
Rice-1 224 AKVMRYKEKRKRRRYEKQIRYASRKAYAEMRPRVKGRFAKVPDGELDGATPPPPSSAAGGGYEPGRLDLG-WFRS
Rice-2 221 AKLMRYKEKRKKRCYEKQIRYASRKAYAEMRPRVRGRFAKEPDQE---AVAPPSTYVD-----PSRLELGQWFR-
