Phylogeny exercise, Bioinformatics for cell biologists, 2012

This exercise will simulate the construction of a phylogenetic tree for organisms where only individual genes (rather than whole genomes) are sequenced, akin to the construction of 16S rRNA-based trees for bacteria. The same tools can also be used to find how paralogs are related.

1. Select a gene (e.g. CYTB) and find its DNA sequence for several mammalian species (roughly 5-10). NCBI Nucleotide (http://www.ncbi.nlm.nih.gov/nuccore) is a database with such sequence information: search for the gene, then click the link labeled FASTA. Put the sequences in a single FASTA file. Name the sequences with abbreviations of the species names (anything over 10 characters will be truncated).
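Combining and renaming the sequences can be done by hand in a text editor, but a short sketch of one way to do it on the command line may help. It assumes (hypothetically) that each download was saved as one file per species, named after the abbreviation you want, e.g. human.fasta. The two records below are made up and only stand in for real NCBI downloads.

```shell
# Scratch folder so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny made-up records standing in for real NCBI downloads.
printf '>NC_000001.1 made-up human record\nATGACCCCAATACGC\nAAAACTAACCCC\n' > human.fasta
printf '>NC_000002.1 made-up mouse record\nATGACAAACATACGA\nAAAACACACCCA\n' > mouse.fasta

# Replace each header line with the file's base name, cut to 10 characters,
# and append the record to one combined FASTA file.
: > CYTB.fa
for f in *.fasta; do
    name=$(basename "$f" .fasta | cut -c1-10)
    sed "1s/.*/>$name/" "$f" >> CYTB.fa
done

# List the new sequence names.
grep '>' CYTB.fa
```

The sketch assumes one sequence per downloaded file, since only the first line of each file is rewritten.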

2. Log into the server 130.237.142.51. You will need both an SSH client (PuTTY) for running programs and an SCP client (WinSCP) for transferring files. Both should already be installed.

SSH will give you access to a Unix/Linux command line. Some useful commands:

cd folder (to change folder; cd .. to go up one level)

ls (shows the contents of the current folder)

mv source destination (moves or renames a file)

cp source destination (copies a file)

rm filename (deletes a file; rm -r deletes a folder)

less filename (for reading a text file; q to exit, f and b to scroll)

mkdir folder (makes a new folder)

keys: ctrl+C (shuts down the running program), tab (auto-completes file name), arrow up (gives last command)
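If you have not used a command line before, the following short practice session shows the commands above in sequence. The folder and file names (phylogeny, in.fa, backup.fa, old.fa) are only examples, and everything happens in a temporary scratch folder.

```shell
practice=$(mktemp -d)            # a scratch folder, so nothing real is touched
cd "$practice" || exit 1

mkdir phylogeny                  # make a new folder
cd phylogeny || exit 1           # move into it
printf '>demo\nACGT\n' > in.fa   # create a small text file
cp in.fa backup.fa               # copy it
mv backup.fa old.fa              # rename the copy
ls                               # shows in.fa and old.fa
rm old.fa                        # delete the renamed copy
cd ..                            # go back up one level
ls                               # shows the folder phylogeny
```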

3. Run multiple sequence alignment using Muscle.

Run: muscle -in in.fa -phyiout alignment

(-phyiout produces interleaved phylip format)
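To check that the alignment looks sensible before building a tree, it can help to read the header of the PHYLIP file. The sketch below uses a made-up toy alignment rather than real MUSCLE output, and assumes the usual PHYLIP layout: a first line with the number of sequences and the alignment length, followed by names in the first 10 columns.

```shell
# Toy interleaved PHYLIP alignment (made-up data): 3 sequences, 20 sites.
aln=$(mktemp)
cat > "$aln" <<'EOF'
3 20
human     ACGTACGTAC
mouse     ACGTACGAAC
chicken   ACGAACGTAC

GGTTAACCGG
GGTTAACCGA
GGTTTACCGG
EOF

# Header line: number of sequences, then alignment length.
read -r nseq nsites < "$aln"
echo "sequences: $nseq, sites: $nsites"

# The names are the first 10 columns of the lines in the first block.
names=$(sed -n "2,$((nseq + 1))p" "$aln" | cut -c1-10 | tr -d ' ')
echo "$names"
```

If the number of sequences does not match the number you put into the FASTA file, or the names are not unique after truncation, fix the input file before continuing.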

3b. Build a tree using maximum likelihood

Copy the output file from muscle to a file named 'infile' (cp alignment infile), then run phylip dnaml

Dnaml will create the files 'outfile' and 'outtree'

4. View your tree

Have a glance at the files outfile and outtree, using less or another program for reading text files

outtree is in Newick format, which other programs can read

Copy outtree to your computer and visualise the tree using 'Newick Viewer' (http://www.trex.uqam.ca/index.php?action=newick&project=trex)

Note that your tree is unrooted, so at each internal node you cannot tell which of the three connected branches leads towards the ancestor and which two lead to descendants.
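To make the Newick format more concrete, here is a small sketch with a made-up tree (the species names and branch lengths are invented, not real results). Nested parentheses group relatives, and the number after each colon is a branch length. The outermost parentheses contain three branches, which is how an unrooted tree is commonly written.

```shell
# Made-up tree in Newick format.
tree='(human:0.10,(mouse:0.20,rat:0.18):0.05,cow:0.15);'

# Split at parentheses and commas, then keep the text before each colon
# on lines that start with a name (internal branches start with ':').
leaves=$(printf '%s\n' "$tree" | tr '(),' '\n\n\n' | sed -n 's/^\([A-Za-z_][^:]*\):.*/\1/p')
printf '%s\n' "$leaves"
```

This only lists the leaf names. It is a quick check that all your species ended up in outtree, not a replacement for a tree viewer.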

5. Add a sequence for the same gene from a bird or reptile to use as an outgroup. Build a new tree using maximum likelihood and view it. Open the tree in Newick Viewer and root it by the outgroup: find the leaf's number by unticking 'Use leaf names', or a node's number by ticking 'Label internal vertices', then put that number in the box for 'Root with leaf/node'.

By adding one or several species that you know are less related to any of the species you are investigating than those species are to one another, you get the direction of time in the tree. The result is a rooted tree.

6. Build a tree using neighbor joining

Name the output file from muscle 'infile' (if needed), then run phylip dnadist

This will create a distance matrix (you can look at 'outfile' that it produces)

Rename 'outfile' to 'infile' and run phylip neighbor (mv outfile infile)

The file 'outtree' can be viewed in Newick Viewer

Maximum likelihood and neighbor joining are two different algorithms for constructing phylogenetic trees. There are a few others available in the phylip package.
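Neighbor joining works from pairwise distances between sequences. To give a feeling for what one entry in a distance matrix means, the sketch below computes the simplest possible distance, the proportion of aligned sites that differ, for two made-up sequences. This is a simplification for illustration only: dnadist corrects for multiple substitutions using a substitution model, so its numbers will not match this one.

```shell
# Two made-up aligned sequences of equal length.
seq1='ACGTACGTAC'
seq2='ACGAACGTTC'

# Count positions where the two sequences differ and divide by the length.
pdist=$(awk -v a="$seq1" -v b="$seq2" 'BEGIN {
    n = length(a); diff = 0
    for (i = 1; i <= n; i++)
        if (substr(a, i, 1) != substr(b, i, 1)) diff++
    printf "%.2f", diff / n
}')
echo "p-distance: $pdist"
```

Here the sequences differ at 2 of 10 sites, so the distance is 0.20.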

7. Bootstrap a tree to see how well your data supports it

Name the output file from muscle 'infile'

Run phylip seqboot and make 100 bootstrap randomizations of your alignment

(it will ask you for a seed for random number generation; PHYLIP expects an odd number, e.g. 7)

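Rename the 'outfile' from seqboot to 'infile' and run phylip dnadist, then rename its 'outfile' to 'infile' and run phylip neighbor. Use the M option (multiple data sets) in both programs so that all 100 data sets are analysed.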

Rename outtree to intree and run phylip consense to count how often the trees agree

View the outtree file that phylip consense created

A high bootstrap value for a node means that the same grouping appears in most of the bootstrap trees, i.e. there is enough data, and the data is consistent enough, to support that node.
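The idea behind a bootstrap randomization can be sketched in a few lines: draw alignment columns at random, with replacement, until the new alignment is as long as the original. The example below is an illustration on a made-up alignment, not a reimplementation of seqboot, and the seed value 7 is arbitrary.

```shell
# Made-up alignment, one sequence per line, all of the same length.
aln='ACGTACGTAC
ACGAACGTTC
ACGTTCGTAC'

# Draw as many column indices as there are columns, with replacement,
# then rebuild every sequence from those columns (same columns for all rows).
boot=$(printf '%s\n' "$aln" | awk -v seed=7 '
    { row[NR] = $0 }
    END {
        srand(seed)
        n = length(row[1])
        for (j = 1; j <= n; j++) pick[j] = int(rand() * n) + 1
        for (r = 1; r <= NR; r++) {
            out = ""
            for (j = 1; j <= n; j++) out = out substr(row[r], pick[j], 1)
            print out
        }
    }')
printf '%s\n' "$boot"
```

Some columns will appear more than once and others not at all. A grouping that survives in most of the resampled alignments does not depend on just a few sites.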

If it does not work, try copying the following commands. Text in square brackets lists the answers to type at the program's menu.

muscle -in CYTB.fa -phyiout CYTB.muscle.phyi

cp CYTB.muscle.phyi infile

phylip dnaml

[Y]

phylip dnadist

[Y]

mv outfile infile

phylip neighbor

cp CYTB.muscle.phyi infile

phylip seqboot

[Y 7]

mv outfile infile

phylip dnadist

[M D 100 Y]

mv outfile infile

phylip neighbor

[M 100 1 Y]

mv outtree intree

phylip consense

[Y]