My Contact info:
Paul Chafe
204A Lumbers
Downloading and Using MEGA:
MEGA is a new, easy-to-use phylogenetic analysis program. It is available as a free download; however, you will need to provide an email address to receive a download link. I’ve chosen MEGA because it is both fast and very easy to use.
Head to the MEGA website:
This website has information on the program, how to complete various analyses, etc.
Click on DOWNLOAD for whichever version you’re going to use (Windows, Linux, Mac). I’ve only used the Windows version, so the instructions below apply to Windows; I cannot confirm that they will work on a Mac.
Fill in the information requested (name and email). MEGA will send you an email with a link to download the program. Click the link in the email and the program will download. Once it has downloaded, start the MEGA5 setup program. Click NEXT to install, choose the desired program folder (e.g. MEGA5), then select the Start menu folder name (e.g. MEGA5). Next, choose whether to add a desktop item, then click Install. Once the installation is complete you can choose to start the program.
The program website has a tutorial, which may help familiarize you with the software:
Now we will complete a sample analysis of the AUSTROBAILEYALES, using Nymphaea caerulea (NYMPHAEALES) as an outgroup. Note that the NYMPHAEALES sequence is first among those listed below.
Copy the sequences below into a .txt file (either open Notepad and save a new file, or open a blank Word document and save it in text-only format). You will also want to change the first 10 characters of each sequence name (the line beginning with >) to something useful; for instance, I called Nymphaea caerulea >Nymphaea in my analysis. Once you’ve done that, save your file under an informative name (like AUSTROByourname).
>Nymphaeaceae gi|298379483|gb|GQ468660.1| Nymphaea caerulea isolate NycW1 ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, partial cds; chloroplast
AAGTGTTGGATTCAAAGCTGGTGTTAAAGATTACAGATTGACTTATTACACTCCTGATTATGAAACCCTT
GCTACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCTGGAGTTCCGCCTGAGGAAGCAGGAGCTG
CGGTGGCTGCCGAATCTTCCACTGGTACATGGACAACTGTGTGGACCGATGGACTTACCAGCCTTGATCG
TTACAAAGGACGATGCTACCACATCGAGCCTGTTGCTGGGGAGGAAAATCAATATATTGCTTATGTAGCT
TATCCTTTGGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTG
GGTTCAAAGCCCTACGAGCTCTACGTCTGGAGGATCTGAGAATTCCTCCTGCTTATTCTAAAACTTTCCA
GGGCCCACCTCATGGAATCCAAGTTGAGAGAGATAAATTGAACAAGTATGGTCGTCCCCTATTGGGATGT
ACTATTAAACCAAAATTGGGGTTATCCGCAAAGAACTATGGGAGAGCGGTTTATGAGTGTCTCCGTGGTG
GACTTGATTTTACCAAGGATGATGAAAACGTGAACTCCCAACCGTTTATGCGTTGGAGAGACCGTTTCTT
ATTTTGCGCCGAAGCTATTTATAAAGCGCAGGCCGAAACAGGTGAAATTAAAGGACATTACTTGAATGCT
ACTGCAGGTACATCCGAAGAAATGATCAAAAGGGCGGTATGTGCCCGAGAGTTGGGAGTTCCTATCGTAA
TGCATGACTACTTAACAGGGGGATTCACCGCAAATACTAGCTTGGCTCATTATTGCCGAGACAATGGCCT
ACTTCTTCACATCCACCGCGCAATGCATGCAGTTATTGATAGACAGAGGAATCATGGTATTCACTTCCGT
GTACTAGCTAAAGCGTTGCGTATGTCTGGGGGGGATCATATTCACTCTGGTACCGTAGTAGGTAAACTGG
AAGGGGAACGAGATGTCACTTTGGGCTTTGTTGATTTACTACGTGATGATTTTATTGAAAAAGACCGGAG
TCGCGGTATTTATTTCACTCAAGATTGGGTATCTATGCCAGGTGTTCTGCCCGTGGCTTCAGGGGGTATT
CACGTTTGGCATATGCCTGCCCTGACCGAGATATTTGGGGATGATTCCGTGCTACAGTTCGGTGGAGGAA
CTTTGGGACACCCTTGGGGGAATGCACCTGGTGCAGTAGCTAATAGGGTAGCTTTAGAAGCGTGTGTACA
AGCTCGTAATGAGGGACGTGATCTTGCTCGTGAAGGTAATGAAATTATTCGTGAAGCTAGCAAATGGAGT
CCTGAACTGGCTGCTGCTTGTGAGGTATGGAAAGAGATCAAATTTGAATTCGAAGCAATGGATGTCTTGT
AA
>gi|37194768|gb|L12632.2|AUBCPRBCLA Austrobaileya scandens ribulose 1,5-bisphosphate carboxylase large subunit (rbcL) gene, partial cds; chloroplast gene for chloroplast product
GTGTTGGATTCAAGGCTGGTGTTAAAGATTACAGATTGACTTATTATACTCCTGACTATGAAACTAAAAT
GACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCCGGAGTTCCACCTGAGGAAGCGGGGGCTGCG
GTAGCTGCAGAATCTTCTACTGGTACATGGACAACTGTGTGGACCGATGGACTTACCAGCCTCGATCGTT
ACAAAGGTCGATGCTACCACATCGAGCCTGTTGCTGGGGAGGAAAATCAATATATTGCTTATGTAGCTTA
CCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTGGG
TTCAAAGCCCTACGAGCTCTGCGTCTGGAAGATCTGCGAATTCCTCCTGCTTATTCCAAAACTTTCCAAG
GCCCGCCTCATGGCATCCAAGTTGAGAGAGATAAATTGAACAAGTATGGGCGTCCCCTATTGGGATGTAC
TATTAAACCAAAATTAGGTTTATCTGCCAAGAACTACGGTAGAGCGGTTTATGAATGTCTCCGCGGTGGA
CTTGATTTTACCAAGGATGATGAGAACGTGAACTCCCAACCGTTTATGCGTTGGAGGGACCGTTTCGTAT
TTTGTGCCGAAGAAGTTTATAAAGCGCAGGCAGAAACAGGTGAAATCAAAGGACATTACTTGAATGCTAC
CGCAGGTACATGCGAAGAAATGATCAAAAGGGCCGTATTTGCCAGAGAATTGGGAGTTCCTATCGTAACG
CATGACTACTTAACAGGGGGATTCACTGCAAATACTAGCTTGGCTCATTATTGCCGAGACAACGGCCTAC
TTCTTCACATCCATCGCGCAATGCATGCAGTTATTGATAGACAGAGGAATCATGGTATACACTTTCGTGT
ACTAGCTAAAGCGTTGCGTATGTCTGGTGGAGATCATGTTCACTCTGGTACCGTAGTAGGCAAACTGGAA
GGGGAACGGGACGTCACTTTGGGTTTTGTTGATTTACTACGTGATGATTTTATTGAAAAAGACCGAAGTC
GCGGTATTTATTTTACTCAAGATTGGGTATCTATGCCAGGTGTTTTACCCGTGGCTTCAGGAGGTATTCA
CGTTTGGCATATGCCTGCCCTGACCGAGATCTTTGGGGATGATTCCGTACTACAGTTCGGTGGAGGAACT
TTAGGGCACCCTTGGGGAAATGCACCTGATGCAGTAGCCAATCGGGTGGCTTTAGAAGCGTGTGTACAAG
CTCGGAATGAGGGACGTGATCTTGCTCGTGAAGGTAATGAGGTTATCCGTGAAGCGAGCAAATGGAGCCC
TGAACTAGCTGCTGCTTGTGAGGTATGGAAGGAGATCAAATTCGAATTCGAAGCAATGGATGTCTTGTAA
>gi|37194806|gb|L12652.2|ILLCPRBCLA Illicium parviflorum ribulose 1,5-bisphosphate carboxylase large subunit (rbcL) gene, partial cds; chloroplast gene for chloroplast product
GTGTTGGATTCAAGGCTGGTGTTAAAGATTACAGATTGACTTATTATACTCCTGAATATGAAACGAAAGA
GACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCCGGAGTTCCACCTGAGGAAGCGGGAGCTGCG
GTAGCTGCGGAATCCTCTACTGGTACCTGGACCACTGTGTGGACTGATGGACTTACCAGCCTCGATCGTT
ACAAAGGGCGATGCTACCACATTGAGCCCGTTGCTGGGGAGGAAAATCAATATATTGCTTATGTAGCTTA
TCCTTTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTGGGTAATGTATTTGGG
TTCAAAGCCCTACGAGCTCTGCGTCTGGAAGATTTGCGAATTCCTCCTGCTTATTCCAAAACTTTCCAAG
GCCCACCTCATGGCATCCAAGTTGAGAGAGATAAATTGAACAAGTATGGTCGTCCTCTATTGGGATGTAC
TATTAAACCAAAATTAGGATTATCTGCCAAGAACTACGGTAGAGCGGTTTATGAATGCCTCCGCGGTGGA
CTTGATTTTACCAAGGATGATGAGAACGTGAACTCCCAACCATTTATGCGTTGGAGGGACCGTTTCGTAT
TTTGTGCCGAAGCAGTTTATAAAGCGCAGGCCGAAACAGGTGAAATTAAAGGACATTACTTGAATGCTAC
TGCAGGTACATGCGAAGAAATGATCAAAAGGGCTGTATTTGCCAGAGAATTGGGAGTTCCTATCGTAATG
CATGACTACTTAACAGGGGGATTCACTGCAAATACTAGCTTGGCTCATTATTGCCGAGACAACGGCTTAC
TTCTTCACATCCATCGCGCAATGCATGCAGTTATTGATAGACAGAGGAATCATGGTATGCACTTTCGTGT
ACTAGCTAAAGCGTTGCGTATGTCTGGTGGAGATCATATTCACGCTGGTACTGTAGTAGGTAAACTGGAA
GGGGAACGGGATGTCACTTTGGGTTTTGTTGATTTACTACGTGATGATTTTATTGAAAAAGACCGAAGTC
GCGGCATTTATTTCACTCAAGATTGGGTATCTATGCCAGGTGTTCTGCCCGTGGCTTCAGGGGGTATTCA
CGTTTGGCATATGCCTGCCTTGACCGAGATCTTTGGGGATGATTCCGTACTACAGTTCGGTGGAGGAACT
TTAGGACACCCTTGGGGAAATGCGCCTGGTGCAGTAGCTAATCGAGAGGCTTTAGAGGCGTGTGTACAAG
CTCGTAATGAGGGACGTGATCTTGCTCGTGAAGGTAATGAAGTTATCCGTGAAGCTAGCAAATGGAGCCC
TGAACTAGCTGCTGCTTGTGAGGTATGGAGGGAGATCAAATTCGAATTCGAAGCAATGGATGTCTTATAA
>gi|37194836|gb|L12665.2|SDRCPRBCLA Schisandra sphenanthera ribulose 1,5-bisphosphate carboxylase large subunit (rbcL) gene, partial cds; chloroplast gene for chloroplast product
GTGTTGGATTCAAGGCTGGTGTTAAAGATTACAGATTGACTTATTATACTCCTGAATATGAAACGAAAGA
TACTGATATCTTGGCAGCATTCCGAGTAACTCCTCAACCCGGAGTTCCGCCCGAGGAAGCGGGAGCTGCG
GTAGCTGCGGAATCTTCTACTGGTACCTGGACTACTGTGTGGACTGATGGACTTACCAGCCTCGATCGTT
ATAAAGGGCGATGCTACCACATTGAGCCCGTTGCTGGGGAGGAAAATCAATATATTGCTTATGTAGCTTA
CCCTTTAGACCTTTTTGAAGAAGGCTCTGTTACTAACATGTTTACTTCTATTGTGGGTAATGTATTTGGG
TTCAAAGCCCTACGAGCTCTGCGTCTGGAAGATTTGCGAATTCCTCCTGCTTATTCCAAAACTTTCCAAG
GCCCACCTCATGGCATCCAAGTTGAGAGAGATAAATTGAACAAGTATGGTCGTCCCCTATTGGGATGTAC
TATTAAACCAAAATTAGGGTTATCTGCCAAGAACTACGGTAGAGCGGTTTATGAATGTCTCCGCGGTGGA
CTTGATTTTACCAAGGATGATGAGAACGTGAACTCCCAACCGTTTATGCGTTGGAGGGACCGTTTCTTAT
TTTGTGCCGAAGCTCTTTATAAAGCGCAGGCCGAAACAGGTGAAATTAAAGGACATTACTTGAATGCTAC
TGCAGGTACATGCGAAGAAATGATGAAAAGGGCTGTATTTGCCAGAGAATTGGGAGTTCCTATCGTAATG
CATGACTACTTAACAGGGGGATTCACTGCAAATACTAGCTTGGCTCATTATTGCCGAGACAACGGCCTAC
TTCTTCACATCCATCGCGCAATGCATGCAGTTATTGATAGACAGAGGAATCATGGTATCCACTTTCGTGT
ACTAGCTAAAGCGTTGCGTATGTCTGGTGGAGATCATATTCACTCTGGTACCGTAGTAGGTAAACTGGAA
GGGGAACGGGACGTCACTTTGGGTTTTGTTGATTTACTACGTGATGATTTTATTGAAAAAGACCGAAGTC
GCGGCATTTATTTCACTCAAGATTGGGTATCTATGCCAGGTGTTCTGCCCGTGGCTTCAGGGGGTATTCA
CGTTTGGCATATGCCTGCCCTGACCGAGATCTTTGGGGATGATTCCGTACTACAGTTCGGTGGAGGAACT
TTAGGACACCCTTGGGGAAATGCGCCTGGTGCAGTAGCTAATCGTGTGGCTTTAGAGGCGTGTGTACAAG
CTCGTAATGAGGGGCGTGATCTTGCTCGTGAAGGTAATGAAGTTATCCGTGAAGCTAGCAAATGGAGCCC
TGAACTAGCTGCTGCTTGTGAGGTCTGGAAGGAGATCAAATTCGAATTCGAAGCAATGGATGTCTTGTAA
>gi|37544966|gb|AY116658.1| Trimenia moorei 1,5-bisphosphate carboxylase large subunit (rbcL) gene, partial cds; chloroplast gene for chloroplast product
TGGATTCAAGGCTGGTGTAAAAGATTACCGTTTGACTTATTATACTCCTGAATATGATACGAAAGAGACT
GATATCTTGGCAGCATTCCGAGTAACTCCTCAACCCGGAGTTCCACCGGAGGAAGCAGGGGCTGCGGTAG
CTGCGGAATCTTCTACTGGTACATGGACCACTGTGTGGACGGATGGGCTTACCAGCCTCGATCGTTACAA
AGGGCGATGCTACCACATTGAACCAGTTCCTGGGGAGGATAATCAATTTATTGCTTATGTAGCTTATCCT
TTAGACCTTTTTGAAGAAGGTTCTGTTACTAACATGTTTACTTCCATTGTTGGGAATGTATTTGGGTTTA
AAGCCCTACGAGCTCTGCGTCTGGAAGATCTGCGAATTCCTACTGCTTATATCAAAACTTTCCAAGGTCC
GCCTCATGGCATCCAAGTTGAGAGAGATAAATTGAACAAGTATGGTCGTCCCCTATTGGGATGTACTATT
AAACCAAAATTAGGGTTATCCGCCAAGAACTACGGTAGAGCGGTTTATGAATGTCTCCGTGGTGGACTTG
ATTTTACTAAGGATGATGAGAATGTGAACTCCCAACCATTTATGCGCTGGAGGGACCGTTTCTTATTTTG
TGCCGAGGCCCTTTATAAAGCGCAGGCCGAAACCGGTGAAATCAAAGGACATTACTTGAATGCTACTGCA
GGTACATGCGAAGAAATGATCAAAAGGGCTGTATTTGCCAGAGAATTGGGAGTTCCTATCGTAATGCATG
ACTACTTAACAGGGGGATTCACTGCAAATACTAGCTTGGCTCATTATTGCCGAGACAACGGCCTACTTCT
TCACATCCATCGCGCAATGCATGCAGTTATTGATAGACAGAAGAATCATGGTATGCACTTTCGTGTACTA
GCTAAAGCCTTGCGTATGTCTGGTGGAGATCATATTCACTCTGGTACCGTAGTGGGGAAACTGGAAGGGG
AACGGGATATCACTTTGGGTTTTGTTGATTTATTACGCGATGATTTTATTGAAAAAGACCGAAGTCGCGG
CATTTATTTTACTCAAGATTGGGTATCTCTGCCAGGTGTTCTGCCCGTGGCTTCCGGGGGTATTCACGTT
TGGCATATGCCTGCCCTGACTGAGATCTTTGGGGATGATTCCGTACTACAGTTCGGCGGAGGAACTTTAG
GGCACCCTTGGGGAAATGCACCAGGTGCAGTAGCTAATCGGGTGGCTTTAGAGGCGTGTGTACGAGCTCG
TAATGAGGGACGTGATCTTGCTCGCGAAGGGAATGAAATTATCCGCGAAGCTTCCAAATGGAGTAAGGAA
CTATATGCTGCT
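The renaming step described before the sequences (shortening each name to something useful of at most 10 characters) can also be done with a short script instead of by hand. The following is only a minimal sketch: the function name rename_fasta_headers and the example labels are illustrative assumptions, and it simply replaces each line beginning with > in the order the sequences appear.

```python
def rename_fasta_headers(fasta_text, new_names, max_len=10):
    """Replace each FASTA header line with a short label.

    new_names is a list of labels, one per sequence, in file order.
    Labels are truncated to max_len characters (10 here, matching
    the handout's advice; this limit is an assumption of the sketch).
    """
    lines = fasta_text.splitlines()
    n_headers = sum(1 for line in lines if line.startswith(">"))
    if n_headers != len(new_names):
        raise ValueError(
            f"found {n_headers} headers but got {len(new_names)} names"
        )

    names = iter(new_names)
    out = []
    for line in lines:
        if line.startswith(">"):
            # Header line: swap in the next short label.
            out.append(">" + next(names)[:max_len])
        else:
            # Sequence line: keep unchanged.
            out.append(line)
    return "\n".join(out) + "\n"


# Tiny made-up example (not the real rbcL data).
example = ">gi|1|gb|X1| Genus one\nACGT\nAC\n>gi|2|gb|X2| Genus two\nTTGA\n"
print(rename_fasta_headers(example, ["Nymphaea", "Austrobaileya"]))
```

To use it on your own file, read the text file into a string, pass in your labels in the same order as the sequences, and write the result back out as a .txt file.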
Once you’ve saved the file, open ClustalX (or access it online: ) and import the text file you’ve created. To do this, click File, then Load Sequences, and browse to your text file. You can now align the sequences: click Alignment, then Do Complete Alignment. Depending on the number of sequences, the alignment may take a few seconds to complete.
When the alignment is complete, you’ll need to save it in a format that MEGA can work with. In ClustalX click File, then ‘Save Sequences As’, and select the format ‘Nexus’. Make sure you give the file an informative name! You can now close ClustalX and open MEGA.
In MEGA you need to load, convert, and analyze your sequence alignment.
The first step is to convert your sequence alignment file. To do this, click File, then ‘Convert File Format to MEGA’. A pop-up window will appear in which you can select your sequence alignment. First select the format: it is important to choose ‘Nexus’ (PAUP, MacClade) rather than .aln (Clustal), since MEGA has a difficult time dealing with files in Clustal format. Then locate your Nexus file (it will be called, for instance, AUSTROB.nxs; the .nxs extension denotes a Nexus file). Click OPEN (you may need to change the file format option back to Nexus at this point), then click OK. You now have the option to save your alignment as a MEGA file (.meg). Again, give this file an informative name. MEGA will then open the converted file in an editor for you to review; you can just close the editor.
Now, back in the main MEGA program, you can open the file that you’ve just converted. To do this, go to File, click ‘Open a file/session’, and select your converted MEGA file (e.g. Austrob.meg). A screen asking for the type of data will appear; select Nucleotide data. Next you will be asked whether your data is protein coding. It is, so click ‘Yes’ (this just means that you’ll have options for base substitution models later on).
It is a good idea to check now that your alignment has converted properly. If it has, it will look like the sequence data below (you can click the button that says TA with dots below it to show/hide sites that are identical):
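To see what the dot display is doing, here is a minimal sketch of the idea, assuming that each sequence is compared with the first sequence in the alignment and that matching sites are drawn as a dot. The function name show_differences and the toy sequences are illustrative, not part of MEGA.

```python
def show_differences(aligned, identical_char="."):
    """Mask sites that match the first (reference) sequence.

    aligned is a list of (name, sequence) pairs of equal length.
    The reference row is returned unchanged; in every other row a
    site identical to the reference becomes identical_char.
    """
    ref_name, reference = aligned[0]
    rows = [(ref_name, reference)]
    for name, seq in aligned[1:]:
        masked = "".join(
            identical_char if s == r else s
            for s, r in zip(seq, reference)
        )
        rows.append((name, masked))
    return rows


# Toy alignment (made-up sequences, not the rbcL data).
toy = [("Nymphaea", "ACGTAC"), ("Illicium", "ACGAAC"), ("Trimenia", "TCGTAC")]
for name, row in show_differences(toy):
    print(f"{name:<10} {row}")
```

Rows made up almost entirely of dots indicate that the sequences lined up as expected; long runs of mismatches usually mean something went wrong in the alignment or conversion.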
Now that we know that the data has imported properly, we can move on to performing some phylogenetic analyses!
Start by clicking on the ‘Phylogeny’ tab. In this tab there are several options for phylogenetic analysis. First we’re going to construct a maximum parsimony tree. To do this, click on ‘Construct/test maximum parsimony tree(s)’, which brings up a pop-up window in which we can enter the criteria for the test. Set the following options in the menu (it should look like the one below):
Test of Phylogeny: None
Substitutions model: Nucleotide
Gaps/Missing Data Treatment: Complete Deletion
MP Search Method: Max-mini Branch and Bound
Now click ‘Compute’. Since this analysis has a relatively small number of taxa, the search is fairly fast. However, if you’re analyzing more than about 15 taxa, a branch-and-bound search may take far too long to complete. If this is the case, change the MP Search Method to Close-Neighbor-Interchange and click ‘Compute’ again. The program will produce a tree that appears in a new window. If your outgroup appears inside the ingroup, you can tell MEGA to root the tree on the branch leading to the outgroup. To do this, click on the branch leading to the outgroup, then click the ‘Place root on branch’ button. Now that the tree is ready, you can save it for use in your report.
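The ‘Complete Deletion’ setting chosen above means that any alignment column containing a gap or missing character in at least one sequence is dropped before the analysis. A minimal sketch of that idea follows; the function name complete_deletion is illustrative, and treating "-" and "?" as the gap/missing symbols is an assumption of this sketch rather than a statement about MEGA’s internals.

```python
def complete_deletion(sequences, missing="-?"):
    """Drop every column that has a gap/missing symbol in any sequence.

    sequences is a list of aligned strings of equal length.
    Returns the same sequences restricted to the columns that are
    fully observed in all taxa.
    """
    n_sites = len(sequences[0])
    keep = [
        i for i in range(n_sites)
        if all(seq[i] not in missing for seq in sequences)
    ]
    return ["".join(seq[i] for i in keep) for seq in sequences]


# Toy alignment: columns 2 and 3 (counting from 1) contain "?" or "-".
toy = ["AC-GT", "ACGGT", "A?GGT"]
print(complete_deletion(toy))
```

Because whole columns are removed, every sequence stays the same length as the others, but sequences with many gaps can shrink the data set for all taxa.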
My example MP tree is below:
Now that we have an MP tree, we can test it by bootstrapping.
To do this we again click on phylogeny, then on construct/test maximum parsimony tree. Keeping the settings as before, we now select ‘Test of Phylogeny’ and change the test method to ‘Bootstrap’. Now change the number of bootstrap replications to 1000 (if this takes more than 5 minutes to compute, you can lower the number of bootstrap replications to 500).
Now, click ‘compute’ and wait for the program to give you a new tree file. Note there will be both an original tree and a bootstrap consensus tree. In the tree-viewer make sure you view the bootstrap consensus tree and copy it into your write-up. My example bootstrap tree is below:
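For readers curious about what a bootstrap replicate is: each replicate is a new alignment of the same length, built by drawing columns from the original alignment at random with replacement; a tree is then built from each replicate, and the support value on a branch is the percentage of replicates in which that grouping appears. The sketch below shows only the resampling step. The function name bootstrap_replicate and the toy data are illustrative, and this is not MEGA’s own code.

```python
import random


def bootstrap_replicate(sequences, rng):
    """Build one bootstrap pseudo-alignment.

    Columns are sampled with replacement, and the same sampled
    columns are used for every sequence so that sites stay aligned.
    """
    n_sites = len(sequences[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in columns) for seq in sequences]


# Toy alignment (made-up data); a fixed seed makes the run repeatable.
toy = ["ACGTAC", "ACGAAC", "TCGTAC"]
rng = random.Random(1)
for _ in range(3):
    print(bootstrap_replicate(toy, rng))
```

With 1000 replications, this resampling and the tree search are repeated 1000 times, which is why the bootstrap test takes much longer than building a single tree.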
Next, we are going to construct a maximum likelihood tree. Here you will run a test to determine which model is most appropriate for your data.
To run a model test, first click on ‘Analysis’, then ‘Find Best DNA/Protein Models (ML)’. You will get an options screen; click ‘Compute’. The program will then analyze the different models available for maximum likelihood analyses. When the analysis is finished you will get a table listing the different substitution models, ordered by their BIC (Bayesian Information Criterion) scores. The model with the lowest BIC is considered the best descriptor of the observed substitution pattern. The abbreviations used in the table are explained below the table (e.g. TN93 is the ‘Tamura-Nei’ model). Now make sure that you copy the top 5 models listed in your printout and include this information in your write-up.
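The BIC balances how well a model fits against how many parameters it uses. In its standard form, BIC = -2 lnL + k ln(n), where lnL is the maximized log-likelihood, k the number of estimated parameters and n the sample size. The sketch below ranks models by this formula; the log-likelihoods, parameter counts and site count are made-up illustrative numbers, not results from the AUSTROBAILEYALES data, and exactly how MEGA counts k and n is not covered here.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Standard Bayesian Information Criterion: -2 lnL + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def rank_models(fits, n_sites):
    """Return (BIC, model name) pairs sorted from lowest (best) BIC.

    fits is a list of (name, log_likelihood, n_params) tuples.
    """
    return sorted((bic(lnl, k, n_sites), name) for name, lnl, k in fits)


# Hypothetical values for illustration only.
fits = [
    ("JC", -3100.0, 7),
    ("T92+G", -3010.0, 10),
    ("GTR+G+I", -3005.0, 17),
]
for score, name in rank_models(fits, n_sites=1300):
    print(f"{name:<8} BIC = {score:.1f}")
```

Note how a model with a slightly better likelihood can still rank lower if it needs many more parameters to achieve it.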
Once your ‘best’ model has been determined, write down its parameters, then proceed with the analysis. For my analysis of the AUSTROBAILEYALES the ‘best’ model was TN93+G (Tamura-Nei with gamma-distributed rates). The details of this model are given below the output table produced by the model test.
With this information, we can now run a maximum likelihood analysis using this model. Go to ‘Analysis’, then ‘Phylogeny’, then select ‘Construct/test Maximum likelihood tree’. Enter the information obtained above (note that your settings will vary depending on the results of your own model test; however, keep the No. of discrete gamma categories, Gaps/Missing data treatment, ML Heuristic method, and Initial tree for ML as described below):
Test of phylogeny: none
Substitutions model: nucleotide
Model/Method: Tamura-Nei
Rates among sites: Gamma Distributed (G)
No of discrete gamma categories: 5
Gaps/Missing data treatment: Complete deletion
ML Heuristic method: Close-neighbor-interchange
Initial tree for ML: Make initial tree automatically
Now, click compute. My example tree is below:
Keeping the other options the same, now perform a bootstrap test of your Maximum likelihood phylogeny. To do this, in the Maximum likelihood test, change the ‘test of phylogeny’ to bootstrap and the ‘No of replications’ to 1000 (maximum likelihood takes longer to compute than parsimony. If the length of the analysis is longer than about 1 hour you can reduce the number of replications to 500). Once the computation is complete you should view your consensus tree and copy it into your write up. My example is below:
This is the phylogenetic tree I copied from the Angiosperm phylogeny website (
REMEMBER to give your figures appropriate titles, indicating the family, the method used to construct the phylogeny, and any tests that were performed on the data (e.g. bootstrapping).