Phylogeny exercise, Bioinformatics for cell biologists, 2012
This exercise will simulate the construction of a phylogenetic tree for organisms where only individual genes (rather than whole genomes) are sequenced, akin to the construction of 16S rRNA-based trees for bacteria. The same tools can also be used to find how paralogs are related.
1. Select a gene (e.g. CYTB) and find its DNA sequence in 5-10 different mammalian species. NCBI Nucleotide (http://www.ncbi.nlm.nih.gov/nuccore) is a database with such sequence information: search for the gene, then click the link labeled FASTA. Put all the sequences in a single FASTA file. Name each sequence with an abbreviation of the species name (names longer than 10 characters will be truncated).
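Renaming can be done by hand in a text editor. As an optional illustration, the Python sketch below shortens each FASTA header to the first letter of the genus plus the species epithet, cut to 10 characters. It assumes NCBI-style headers of the form '>ACCESSION Genus species ...'. The accessions and sequences in the example are made-up placeholders, not real CYTB data.

```python
def short_name(header):
    """Build a short name such as 'H_sapiens' from an NCBI-style FASTA header.

    Assumes the header looks like '>ACCESSION Genus species ...' (check your own file).
    """
    words = header.lstrip(">").split()
    genus, species = words[1], words[2]
    return (genus[0] + "_" + species)[:10]   # PHYLIP-style names hold at most 10 characters


def rename_fasta(text):
    """Return the FASTA text with every header replaced by a short, unique name."""
    out, seen = [], set()
    for line in text.splitlines():
        if line.startswith(">"):
            name = short_name(line)
            if name in seen:
                raise ValueError(f"two sequences would both be called {name!r}; rename them by hand")
            seen.add(name)
            out.append(">" + name)
        else:
            out.append(line)   # sequence lines are copied unchanged
    return "\n".join(out) + "\n"


# Placeholder input: made-up accessions and toy sequences.
example = """>ACC0001.1 Homo sapiens cytochrome b gene
ACGTACGTAC
>ACC0002.1 Mus musculus cytochrome b gene
ACGTTCGTAC
"""
print(rename_fasta(example), end="")
```

The duplicate check matters because two species from the same genus with similar epithets can collapse to the same 10-character name.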
2. Log in to the server 130.237.142.51. You will need both an SSH client (PuTTY) for running programs and an SCP client (WinSCP) for transferring files; both should already be installed.
SSH will give you access to a Unix/Linux command line. Some useful commands:
cd folder (to change folder; cd .. to go up one level)
ls (shows the contents of the current folder)
mv source destination (renames or moves a file)
cp source destination (copies a file)
rm filename (deletes a file; rm -r deletes a folder)
less filename (for reading a text file; q to exit, f and b to scroll)
mkdir folder (makes a new folder)
Keys: Ctrl+C (stops the running program), Tab (auto-completes file names), Up arrow (recalls the previous command)
3. Run a multiple sequence alignment using MUSCLE.
Run: muscle -in in.fa -phyiout alignment
(in.fa is your FASTA file and alignment is the name of the output file; -phyiout writes the alignment in interleaved PHYLIP format)
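In the interleaved PHYLIP layout, the first line gives the number of sequences and the alignment length, the first block starts each row with a 10-character name, and later blocks continue the same sequences in the same order. The Python sketch below reads that layout into a dictionary. It assumes the names occupy exactly the first 10 columns of the first block, and the toy alignment is invented.

```python
def read_interleaved_phylip(text):
    """Read an interleaved PHYLIP alignment into {name: sequence}."""
    lines = [line for line in text.splitlines() if line.strip()]   # blank lines separate blocks
    ntax, nchar = map(int, lines[0].split()[:2])
    names, seqs = [], [""] * ntax
    for i, line in enumerate(lines[1:]):
        row = i % ntax                      # rows repeat in the same order in every block
        if i < ntax:                        # first block: 10-character name, then sequence
            names.append(line[:10].strip())
            chunk = line[10:]
        else:                               # later blocks: sequence only
            chunk = line
        seqs[row] += "".join(chunk.split())  # drop spaces inside the sequence
    for name, seq in zip(names, seqs):
        if len(seq) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, got {len(seq)}")
    return dict(zip(names, seqs))


example = """ 3 12
H_sapiens ACGTAC
M_musculusACGTTC
B_taurus  ACGAAC

GTACGT
GTACGA
GTTCGT
"""
for name, seq in read_interleaved_phylip(example).items():
    print(f"{name:<10} {seq}")
```

Note that a 10-character name such as M_musculus runs straight into its sequence, which is why the parser slices by column rather than splitting on spaces.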
4. Build a tree using maximum likelihood
Copy the MUSCLE output file to a file named 'infile' (cp alignment infile), then run phylip dnaml and accept the settings by typing Y.
Dnaml will create the files 'outfile' and 'outtree'.
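Dnaml searches for the tree, with branch lengths, that makes the observed alignment most probable under a model of nucleotide substitution. The smallest possible case is two sequences joined by a single branch. The sketch below swaps in the simple Jukes-Cantor (JC69) model, which is not dnaml's default model. It finds the branch length that maximizes the log-likelihood by golden-section search. For JC69 the answer can be checked against the closed form d = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites. The site counts used are arbitrary.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by distance d (substitutions per site) under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a site keeps its base
    p_other = 0.25 - 0.25 * e  # probability of one particular different base
    # Each site: 1/4 for the first base, times the substitution probability for the second.
    return (n_sites * math.log(0.25)
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_other))


def ml_distance(n_sites, n_diff, lo=1e-6, hi=5.0, iterations=100):
    """Maximize the JC69 log-likelihood over d with golden-section search on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iterations):
        x1 = b - g * (b - a)
        x2 = a + g * (b - a)
        if jc69_log_likelihood(x1, n_sites, n_diff) > jc69_log_likelihood(x2, n_sites, n_diff):
            b = x2   # the maximum lies in [a, x2]
        else:
            a = x1   # the maximum lies in [x1, b]
    return (a + b) / 2.0


n_sites, n_diff = 100, 20
numeric = ml_distance(n_sites, n_diff)
closed_form = -0.75 * math.log(1.0 - 4.0 / 3.0 * n_diff / n_sites)
print(f"ML distance {numeric:.4f}, closed form {closed_form:.4f}")
```

Dnaml does the same kind of optimization, but jointly over all branch lengths and over many candidate tree topologies.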
5. View your tree
Take a look at the files outfile and outtree with less or another program for reading text files.
outtree is in Newick format, which many other programs can read.
Copy outtree to your computer (with WinSCP) and visualise the tree using 'Newick Viewer' (http://www.trex.uqam.ca/index.php?action=newick&project=trex).
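Newick writes a tree as nested parentheses: each pair of parentheses is an internal node, names are leaves, and a number after a colon is a branch length. PHYLIP may spread one tree over several lines. The small Python parser below is a sketch that assumes names contain no spaces, parentheses, commas or colons. The example tree string is made up.

```python
def parse_newick(text):
    """Parse one Newick tree into nested dicts with 'name', 'length' and 'children'."""
    s = "".join(text.split())        # drop line breaks and spaces (long trees may be wrapped)
    if s.endswith(";"):
        s = s[:-1]
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if s[pos] == "(":            # internal node: comma-separated children in parentheses
            pos += 1
            children.append(parse_node())
            while s[pos] == ",":
                pos += 1
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos                  # optional name
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        length = None
        if pos < len(s) and s[pos] == ":":   # optional branch length
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaf_names(node):
    """List leaf names from left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


example = """(H_sapiens:0.1,M_musculus:0.2,
(B_taurus:0.05,S_scrofa:0.07):0.03);"""
tree = parse_newick(example)
print(leaf_names(tree))
print("branches at the outermost node:", len(tree["children"]))
```

In this example the outermost parentheses hold three branches, a common way of writing an unrooted tree.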
Note that your tree is unrooted: at each internal node you cannot tell which of the three branches it connects leads to the ancestor and which two lead to its descendants.
6. Add a sequence of the same gene from a bird or reptile to use as an outgroup. Realign the sequences with MUSCLE, build a new maximum likelihood tree and view it. Open the tree in Newick Viewer and root it on the outgroup. To find a leaf's number, untick 'Use leaf names'; to find an internal node's number, tick 'Label internal vertices'. Then enter that number in the 'Root with leaf/node' box.
By adding one or more species that you know are more distantly related to each of the species you are investigating than those species are to one another, you establish the direction of time in the tree. The result is a rooted tree.
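Rooting does not change the branching pattern; it only decides where the common ancestor sits. The Python sketch below illustrates this on an unrooted tree stored as a list of branches (node, node, length). It places a new root halfway along the outgroup's branch and orients every other branch away from it. The midpoint placement, the internal node labels n1 and n2, and the branch lengths are assumptions for illustration; Newick Viewer may position the root differently.

```python
def root_with_outgroup(edges, outgroup):
    """Root an unrooted tree on a leaf and return it as a rooted Newick string.

    edges: list of (node, node, branch_length). Leaves carry species names; internal
    nodes carry hypothetical labels that are not written to the output.
    The root is placed at the midpoint of the outgroup's branch (an assumption).
    """
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    if len(adj[outgroup]) != 1:
        raise ValueError(f"{outgroup!r} is not a leaf")
    neighbor, length = adj[outgroup][0]

    def subtree(node, parent):
        # Every branch now points away from the root, i.e. from ancestor to descendant.
        children = [(child, w) for child, w in adj[node] if child != parent]
        if not children:
            return node
        return "(" + ",".join(f"{subtree(child, node)}:{w:g}" for child, w in children) + ")"

    half = length / 2
    return f"({outgroup}:{half:g},{subtree(neighbor, outgroup)}:{half:g});"


unrooted = [
    ("n1", "H_sapiens", 0.1),
    ("n1", "M_musculus", 0.2),
    ("n1", "n2", 0.05),
    ("n2", "B_taurus", 0.15),
    ("n2", "G_gallus", 0.4),
]
print(root_with_outgroup(unrooted, "G_gallus"))
```

After rooting, the three mammals form one group on the other side of the root from the bird, which is what the outgroup is meant to show.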
7. Build a tree using neighbor joining
Copy the MUSCLE output file to 'infile' (if needed), then run phylip dnadist.
This creates a distance matrix, which you can look at in the 'outfile' it produces.
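Dnadist turns each pair of aligned sequences into a distance that is corrected for multiple substitutions at the same site. Its default model is F84. The sketch below swaps in the simpler Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites. It also skips positions that have a gap or an ambiguous base in either sequence. The toy alignment is invented.

```python
import math


def jc69_distance(a, b):
    """Jukes-Cantor distance between two aligned sequences, ignoring sites with gaps or ambiguity codes."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf          # too divergent: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aln):
    """Square matrix of pairwise distances, in the order of the alignment's names."""
    names = list(aln)
    return names, [[jc69_distance(aln[r], aln[c]) for c in names] for r in names]


toy = {"H_sapiens": "ACGTACGTAC", "M_musculus": "ACGTTCGTAA", "B_taurus": "ACGTAC-TAA"}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(f"{name:<10}", " ".join(f"{x:.4f}" for x in row))
```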
Rename 'outfile' to 'infile' (mv outfile infile) and run phylip neighbor.
The resulting file 'outtree' can be viewed in Newick Viewer.
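Neighbor joining builds a tree from the distance matrix alone. At each step it joins the pair of clusters i, j that minimizes Q(i, j) = (n - 2) D(i, j) - r(i) - r(j), where n is the current number of clusters and r(i) is the sum of row i of the matrix. It then gives the two new branches their lengths, computes distances from the new node to the remaining clusters, and repeats until three clusters are left. The sketch below follows the textbook Saitou-Nei procedure; it is not PHYLIP's code. The 5-taxon matrix is a toy example that fits a tree exactly and needs at least three taxa.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining tree (unrooted Newick string) from a symmetric distance matrix."""
    labels = list(names)                  # Newick text for each current cluster
    D = [list(row) for row in dist]       # working copy of the matrix
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in D]       # row sums
        # Choose the pair (i, j) with the smallest Q(i, j) = (n - 2) * D[i][j] - r[i] - r[j].
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Lengths of the branches from the new node to clusters i and j.
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        joined = f"({labels[i]}:{li:g},{labels[j]}:{lj:g})"
        # Distances from the new node to every remaining cluster k.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in keep]
        D = [[D[a][b] for b in keep] + [new_row[pos]] for pos, a in enumerate(keep)]
        D.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [joined]
    # Three clusters remain: join them at one central node.
    la = 0.5 * (D[0][1] + D[0][2] - D[1][2])
    lb = D[0][1] - la
    lc = D[0][2] - la
    return f"({labels[0]}:{la:g},{labels[1]}:{lb:g},{labels[2]}:{lc:g});"


# Toy matrix that fits a tree exactly (a common textbook example).
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))
```

With real sequence data the distances rarely fit a tree exactly, and neighbor joining can then produce short negative branch lengths.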
Maximum likelihood and neighbor joining are two different algorithms for constructing phylogenetic trees. The phylip package includes several others, such as DNA parsimony (dnapars).
8. Bootstrap a tree to see how well your data support it
Copy the MUSCLE output file to 'infile'.
Run phylip seqboot and make 100 bootstrap resamplings of your alignment.
(It will ask you for a seed for the random number generator; any odd number works.)
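Seqboot builds each bootstrap data set by drawing alignment columns at random with replacement until the new alignment is as long as the original. As a result, some columns appear several times and others not at all. The Python sketch below shows that resampling. It uses Python's random module, so it will not reproduce seqboot's output for the same seed. The toy alignment is invented.

```python
import random


def bootstrap_replicates(aln, n_reps, seed):
    """Yield n_reps pseudo-alignments made by sampling columns with replacement."""
    rng = random.Random(seed)
    names = list(aln)
    length = len(aln[names[0]])
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]   # column indices, repeats allowed
        yield {name: "".join(aln[name][c] for c in cols) for name in names}


toy = {"H_sapiens": "ACGTACGTAC", "M_musculus": "ACGTTCGTAA", "B_taurus": "ACGAACGTAA"}
replicates = list(bootstrap_replicates(toy, 100, seed=7))
print(len(replicates), "replicates; first one:")
for name, seq in replicates[0].items():
    print(f"{name:<10} {seq}")
```

Whole columns are resampled, not single bases, so each replicate is still a valid alignment of the same species.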
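Rename 'outfile' to 'infile' and run phylip dnadist with the M option, choosing D (multiple data sets) and 100 data sets. Rename 'outfile' to 'infile' again and run phylip neighbor, also with the M option and 100 data sets.
Rename 'outtree' to 'intree' (mv outtree intree) and run phylip consense, which counts how often each group of species occurs among the 100 trees.
View the outtree file that phylip consense created. Its 'outfile' lists how many of the 100 trees contain each group (the bootstrap values).
A high bootstrap value means that there is enough data, and that the data are consistent enough, to support that node.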
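Consense works by counting how often each group of species (each split of the species into two sets by one branch) occurs among the input trees. The sketch below counts splits over trees written as nested Python tuples, a hypothetical stand-in for the 100 trees in 'intree', and keeps the splits found in more than half of the trees. This is plain majority rule. Consense's default is an extended majority-rule method that can also add compatible groups with lower support, so its tree may contain more groups.

```python
from collections import Counter


def leaves_of(tree):
    """Set of leaf names under a node; a leaf is a string, an internal node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves_of(child) for child in tree))


def splits(tree):
    """Non-trivial splits of an unrooted tree, each stored as the side without a reference leaf."""
    everyone = leaves_of(tree)
    reference = min(everyone)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = leaves_of(node)
        if reference in side:
            side = everyone - side       # use the other side so each split has one spelling
        if 1 < len(side) < len(everyone) - 1:
            found.add(side)              # skip splits that cut off a single leaf (or nothing)
        for child in node:
            walk(child)

    walk(tree)
    return found


def split_counts(trees):
    """How many trees contain each split (the bootstrap count)."""
    counts = Counter()
    for tree in trees:
        counts.update(splits(tree))
    return counts


def majority_rule(trees):
    """Keep the splits found in more than half of the trees."""
    counts = split_counts(trees)
    return {split: n for split, n in counts.items() if n > len(trees) / 2}


# Toy bootstrap trees as nested tuples (hypothetical format, not PHYLIP's).
trees = [(("A", "B"), "C", ("D", "E"))] * 3 + [(("A", "C"), "B", ("D", "E"))]
for split, n in sorted(majority_rule(trees).items(), key=lambda item: -item[1]):
    print(sorted(split), f"{n} of {len(trees)}")
```

Splits are compared as unordered sets because the input trees are unrooted, so (A,B) versus (C,D,E) is the same group however the tree happens to be drawn.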
If it does not work, try copying from the following (answers to type at the PHYLIP menus are shown in square brackets):
muscle -in CYTB.fa -phyiout CYTB.muscle.phyi
cp CYTB.muscle.phyi infile
phylip dnaml
[Y]
phylip dnadist
[Y]
mv outfile infile
phylip neighbor
cp CYTB.muscle.phyi infile
phylip seqboot
[Y 7]
mv outfile infile
phylip dnadist
[M D 100 Y]
mv outfile infile
phylip neighbor
[M 100 1 Y]
mv outtree intree
phylip consense
[Y]