Using Computational Analysis to Determine Evolutionary Relationships

Name ______

Instructions:

A.  Enter the NCBI home page at http://www.ncbi.nlm.nih.gov/.

B.  In the search window, enter “emperor penguin” and hit return.

C.  The next window gives the search results. The upper box shows written information about emperor penguins that is available from this site. The second box has information about genetic databases that contain data on emperor penguins. You will notice that there is not a lot there – not much work has been done with emperor penguins, it seems.

D.  We will look at the core nucleotide records. “Core” here refers to the main collection of nucleotide sequences in the database (the records outside the separate EST and GSS sets), not to a special class of genes. This database contains only 10 core nucleotide records for the emperor penguin. Click on the “Core Nucleotides” button.

E.  The third gene on this screen (its ID is DQ137225) is the gene sequence for a protein called cytochrome b (abbreviated “cytb”). As we have already learned, cytochrome proteins are involved in the release of energy from food in the mitochondria of the cell. One would expect this protein to be shared by most, if not all, animals. Click on the ID number.

F.  The next screen has information about this gene including who sequenced it and when it was sequenced. At the bottom of the screen, you actually get to see the 1008 base pairs (bp) that make up this gene. Scroll back up and click on the scientific name, Aptenodytes forsteri.

G.  This takes you to a page that is specifically about emperor penguins. The table on the right tells you what type of data are available through NCBI. At the bottom is a table with links out to other sources of information about the emperor penguin. In between, the full taxonomic lineage of the emperor penguin is listed. The family name of the penguins is Spheniscidae. Click on the family name.

H.  The next page lists the six penguin genera and the 18-20 species and subspecies. We will want to refer back to this listing, so you have received a printed copy of the scientific and common names. The listing is in alphabetical order – it does not show the evolutionary relationships between the different genera and species. The NCBI site has tools that allow us to reconstruct a cladogram that shows the relatedness of these species based upon their genetic similarities.

I.  Go back to the NCBI home page and click the BLAST button right above the search box. BLAST stands for Basic Local Alignment Search Tool. BLAST is a computer program that can look at DNA from hundreds of species in the database at once and find which ones have the best match. At the top of the page are a dozen organisms for which we have complete genomes. (You may not recognize all of their scientific names, but nearly all are familiar organisms.)

J.  The program we will use is “nucleotide blast”. Click on the link and you will get the search page. In the “Enter Query Sequence” window where it says “Enter accession number...”, enter the ID number “DQ137225”. In the “Choose Search Set” section, click the “Others” button; otherwise the search will compare the penguin only to humans or to mice, and we want to look at all available species. The pull-down menu below should default to “nucleotide collection (nr/nt)”. At the bottom, ask it to display the results in a new window and click the BLAST button. The search will take a little while. Be patient: it’s checking over 22.5 billion letters!

K.  On the new window, scroll past the gibberish at the top until you get to the table with the red bars. This table graphically represents the matches you got to the emperor penguin sequence. Below that is a table describing the matching sequences, listed in order of their similarity. Not surprisingly, the closest matches were with gene sequences from the same species, Aptenodytes forsteri, and from its close relative, the king penguin, Aptenodytes patagonicus. As you scroll down the list, you may see the names of other penguin species that belong to different genera. However, there is a much better tool available to visualize the data.

L.  Below the graph with the red bars, you will see “Distance Tree of Results”. Click it and maximize this new window to full size. Scroll down to the bottom and you should find the highlighted target sequence. The program has built a tree from the search results. The distance tree is arranged like a cladogram in that it groups nucleotide sequences on the same branches if they share a great deal of similarity.
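The grouping idea behind a distance tree can be sketched in a few lines of Python. This is a minimal illustration, not the method the BLAST tree view itself uses: it swaps in simple UPGMA (average-linkage) clustering, and the four short sequences and their names are made up for the example rather than taken from any NCBI record. The helper names p_distance and upgma are likewise hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(seqs):
    """Repeatedly join the two closest clusters; return a Newick-style string."""
    # Each cluster maps its tree label to the names of the sequences inside it.
    clusters = {name: [name] for name in seqs}

    def average_distance(pair):
        left, right = pair
        dists = [p_distance(seqs[m], seqs[n])
                 for m in clusters[left] for n in clusters[right]]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=average_distance)
        members = clusters.pop(left) + clusters.pop(right)
        clusters[f"({left},{right})"] = members
    return next(iter(clusters)) + ";"

# Made-up toy sequences (assumption): NOT real cytochrome b data.
toy = {
    "emperor":   "ATGGCCCTAAACCTTCGA",
    "king":      "ATGGCCCTAAATCTTCGA",
    "adelie":    "ATGGCACTCAATCTACGA",
    "albatross": "ATGACACTCAACTTACGT",
}

tree = upgma(toy)
print(tree)
```

In the toy data the emperor and king sequences differ at only one position, so they are joined first and end up as neighboring twigs, which is the same pattern to look for in the real tree.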

If the emperor and king penguins evolved from the same ancestral population of penguins, we would expect their DNA to be more like each other’s than like that of any other species. Looking at this one cytochrome gene, that appears to be the case. Not only are these two species similar in plumage, physical structure, behavior, calls, diet, and habitat, their cytochrome b gene sequence is similar as well. All of these lines of evidence support the hypothesis that these species evolved from a recent common ancestor.
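For anyone who would rather look up a sequence with a script than by clicking through the site, the sketch below shows one possible approach in Python. It only builds a request URL for NCBI's E-utilities "efetch" service and parses FASTA-formatted text; it does not contact the network. The function names fasta_url and parse_fasta are hypothetical, and the sample record is made up for illustration, so it is not the real DQ137225 sequence.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def fasta_url(accession):
    """Build an efetch URL asking for one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

def parse_fasta(text):
    """Return (header, sequence) for FASTA text holding a single record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA text")
    # The header is the first line without '>'; the rest is the sequence.
    return lines[0][1:], "".join(lines[1:]).upper()

url = fasta_url("DQ137225")

# Made-up record (assumption): stands in for text downloaded from the URL.
sample = ">toy_record example header\nATGGCC\nctaaac\n"
header, sequence = parse_fasta(sample)

print(url)
print(header, len(sequence))
```

Pasting the printed URL into a browser should return the record as plain text, which can then be saved and handed to parse_fasta.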

Analysis:

1.  Use the BLAST tree to construct a cladogram of the penguin species that are listed. (If you find two or more “twigs” on a branch that share the same species name, just lump them together as one twig.)

2.  What are some of the traits shared by penguins that they all must have inherited from their common ancestor? (Think beyond just physical traits, too.)

3.  If you were to expand your cladogram, it would have to start including birds that are NOT penguins. What modern bird(s) do you suppose are most closely related to penguins? Go back to your BLAST tree and find the branch that connects closest to the penguin branch. You may need to use Google or some other search engine to find out what these scientific names mean. What birds seem to be the most closely related to penguins?

4.  Pick a different animal and run a BLAST search on its cytochrome b or cytochrome c gene sequence to construct a cladogram of its most closely related genera and species. Try to stick to complete sequences instead of partial sequences, if possible. Include at least six separate species and/or subspecies in your cladogram.
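The "lump them together" rule from question 1 can be sketched in Python for anyone who copies the tree labels into a list. This is only an illustration: it assumes each label begins with a two-word scientific name (genus, then species), and the example labels are hypothetical, written to resemble the tree's labels rather than copied from it. The function names species_name and lump_twigs are made up for the sketch.

```python
def species_name(label):
    """Return the first two words of a label, taken as genus and species."""
    words = label.split()
    if len(words) < 2:
        raise ValueError(f"cannot find a species name in {label!r}")
    return " ".join(words[:2])

def lump_twigs(labels):
    """Group labels that share a species name, keeping first-seen order."""
    grouped = {}
    for label in labels:
        grouped.setdefault(species_name(label), []).append(label)
    return grouped

# Hypothetical labels (assumption): not copied from real BLAST output.
labels = [
    "Aptenodytes forsteri cytochrome b gene",
    "Aptenodytes forsteri mitochondrion",
    "Aptenodytes patagonicus cytochrome b gene",
    "Pygoscelis adeliae cytochrome b gene",
]

grouped = lump_twigs(labels)
for name, twigs in grouped.items():
    print(name, len(twigs))
```

Because only the first two words are kept, subspecies with three-word names would be lumped with their parent species; keep the third word if you want subspecies as separate twigs.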