Dangerous Ideas, Spring 2008
Name: ______
Lab 5: Alignment and Phylogenetic analysis of DNA Sequences
OBJECTIVES:
- To understand how DNA can be used to study evolutionary history
- To become familiar with the process of aligning sequences and constructing phylogenetic trees
- To explore online resources including GenBank and BLAST
MATERIALS: Access to a high-speed internet connection
INTRODUCTION:
Our collection of known DNA sequences has increased dramatically in the last few years due to recent advances in the field of molecular biology. The DNA sequence of an individual contains information that can be used in a wide variety of applications, from forensics to the study of evolution.
Evolutionary biologists view DNA as a “document” of evolutionary history. Comparing the DNA sequences of genes from different organisms can reveal evolutionary relationships that might not otherwise be inferred from their morphology. Since genomes acquire mutations gradually, the amount of sequence difference found in two organisms should tell us something about how recently these two organisms shared a common ancestor. In other words, two organisms that share a relatively recent common ancestor should have more similar DNA sequences than two organisms that diverged earlier.
Molecular phylogenetics is the field of study that attempts to determine the rates and/or patterns of change occurring in DNA (and other macromolecules) and to reconstruct the evolutionary history of genes and organisms. The evolutionary history revealed by the sequence data is frequently presented in a phylogenetic tree. Phylogenetic trees are branching diagrams depicting the evolutionary relationships of organisms.
It is important to note that our current understanding of most evolutionary relationships comes from a variety of data, including both traditional morphological evidence and molecular data.
Researchers attempting to construct phylogenetic trees must go through a series of steps:
Step 1: Acquire the DNA sequences- DNA sequences may either be determined directly by sequencing a region of DNA, or indirectly, by acquiring the sequence from a public database or published source. (DNA sequencing will be discussed in lecture; we will use public databases in our exploration today.)
Step 2: Align the DNA sequences- Once accurate DNA sequences have been obtained, they must be properly aligned to reveal their evolutionary relationships. Consider the following example:
Organism 1- A T G G G C T G T C A A
Organism 2- A T G G G T G T C A A T
At first glance, Organisms 1 and 2 appear to have dramatically different DNA sequences. In fact, they seem to share only 6 of the 12 bases being examined (50% sequence identity). Now examine these sequences properly aligned:
Organism 1- A T G G G C T G T C A A
Organism 2- A T G G G - T G T C A A
With a gap correctly inserted, it is now apparent that the two organisms share 11 of the 12 bases being examined (92% sequence identity). Correct alignment is difficult and is usually done with software such as CLUSTAL.
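The percentages in this example can be checked with a few lines of Python. This is only a sketch: the function name `percent_identity` is made up for illustration, and it assumes that a gap never counts as a match.

```python
def percent_identity(seq_a, seq_b):
    """Return (matches, positions compared, rounded percent) for two
    sequences of equal length. A gap ("-") never counts as a match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    compared = len(seq_a)
    return matches, compared, round(100 * matches / compared)


# Organism 1 vs. Organism 2, compared position by position without a gap
print(percent_identity("ATGGGCTGTCAA", "ATGGGTGTCAAT"))  # (6, 12, 50)

# The same comparison after a gap is inserted at position 6
print(percent_identity("ATGGGCTGTCAA", "ATGGG-TGTCAA"))  # (11, 12, 92)
```

A single misplaced base shifts every position after it, which is why the unaligned comparison looks so much worse than the aligned one.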
Step 3: Construct a Phylogenetic Tree- With the sequences correctly aligned, a phylogenetic tree can now be constructed. Consider the following, aligned, sequences:
Organism 1: A T G G G C T G T C A A
Organism 2: A T G G G - T G T C A A
Organism 3: A T G G G - T G T C A A
Organism 4: A T G G G C T G T C A A
These organisms seem to share some evolutionary history as they all have similar DNA sequences. Organisms 2 and 3, however, are both “missing” the C at position 6. Their evolutionary relationships, as predicted by this data set, could be presented as:
As the DNA sequence under consideration gets longer and more complicated, so, too, does the process of constructing an appropriate tree. Again, most of this work is done by using one of several software packages.
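To give a feel for what tree-building software does, here is a minimal sketch that groups the four aligned sequences from Step 3. It uses UPGMA, one simple clustering method; this handout does not say which method any particular package uses, so treat the choice as an assumption. The names `p_distance` and `upgma` are illustrative, a gap is counted as a difference, and the result is written in Newick format (nested parentheses), a common text notation for trees.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ.
    Assumption: a gap ("-") counts as a difference."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Repeatedly merge the two closest clusters (average distance)
    and return the resulting grouping as a Newick string."""
    # Map each cluster label to the list of sequence names it contains.
    clusters = {name: [name] for name in seqs}

    def average_distance(c1, c2):
        pairs = [(m, n) for m in clusters[c1] for n in clusters[c2]]
        return sum(p_distance(seqs[m], seqs[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[f"({c1},{c2})"] = members
    return next(iter(clusters)) + ";"


aligned = {
    "Organism_1": "ATGGGCTGTCAA",
    "Organism_2": "ATGGG-TGTCAA",
    "Organism_3": "ATGGG-TGTCAA",
    "Organism_4": "ATGGGCTGTCAA",
}
print(upgma(aligned))
# ((Organism_1,Organism_4),(Organism_2,Organism_3));
```

The output groups Organisms 2 and 3 (both missing the C) apart from Organisms 1 and 4, which matches the reasoning in Step 3. With only one variable position, the data cannot say anything more detailed than that.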
DNA SEQUENCE RESOURCES:
The National Center for Biotechnology Information (NCBI)-
Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information - all for the better understanding of molecular processes affecting human health and disease. You can explore NCBI at
Two especially useful services provided at the NCBI website are PubMed and BLAST. (Click the links in the upper header.) PubMed is a searchable database of published scientific papers in the fields of medicine and biotechnology. BLAST is a software program (a suite of algorithms actually) that allows one to search GenBank for similar sequences. This allows for the identification of unknown sequences as well as comparison between similar sequences.
GenBank-
GenBank® is the National Institutes of Health’s (NIH) genetic sequence database, an annotated collection of all publicly available DNA sequences. As of August 2002, it held approximately 22,617,000,000 bases in 18,197,000 sequence records. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. In other words, this is a global, cooperative effort to share DNA sequence information as it’s acquired. This is the database that BLAST searches to identify a sample sequence and/or find similar sequences.
YOUR EXPLORATION:
Today we will access several DNA sequences from a public database, align them, and construct a phylogenetic tree. The sequences we will analyze today are from human mitochondrial DNA. (Remember that mitochondria contain their own DNA, and that this DNA is always maternal in origin.)
Mitochondrial DNA has been extensively studied in an attempt to understand human evolution and prehistoric migratory patterns. Some anthropologists have argued that people evolved at least partly from the Neanderthals. The opposing theory is that modern humans evolved in Africa, then spread outward, overwhelming earlier hominids including Neanderthals. The short, squat Neanderthals inhabited much of Europe from about 100,000 years ago until dying out about 28,000 years ago. Analyzing mitochondrial DNA has provided data with which to evaluate these two different hypotheses.
Acquiring Sequences:
To access the sequence information for this exercise, you will need to follow these steps:
- Open an Internet browser and go to the Bioserver website.
- Go to the section labeled “Sequence Server” and click the “Enter” button below it.
- Click the “Manage Groups” button in the top center of your screen.
- From the pull-down menu under “Sequence Sources”, select “Prehistoric Human mtDNA”.
- Eight different entries will appear in your window. Note that you can view these sequences by clicking on the red “View” button next to each.
- Select all eight sequences by clicking in the box on their left. Click on “OK” after all are selected.
Aligning the Sequences:
We will now ask the server to align all eight of our sequences using a program called Clustal.
- Select all eight of your sequences by clicking in the box to their left. With all the sequences selected, click on the “Compare” box directly above.
- You will now be shown an alignment. Yellow highlighting marks positions where the sequences do not all match. Scroll through the alignment and note the high levels of variation!
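Clustal aligns many sequences at once, and its method is more involved than anything shown here. As a simpler stand-in, the sketch below aligns just two sequences with the classic Needleman-Wunsch method, which scores every possible pairing and then traces back the best one. The function name `global_align` and the scoring values (+1 for a match, -1 for a mismatch or gap) are assumptions chosen for illustration, not the server's settings.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two sequences.
    Returns the two sequences with gaps ("-") inserted."""

    def pair_score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell holds the best score for the prefixes so far.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,   # gap in b
                score[i][j - 1] + gap,   # gap in a
            )

    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Organism 1 and Organism 2 from the introduction (Organism 2 without its gap)
top, bottom = global_align("ATGGGCTGTCAA", "ATGGGTGTCAA")
print(top)     # ATGGGCTGTCAA
print(bottom)  # ATGGG-TGTCAA
```

The program places the gap at position 6, the same place it was inserted by hand in the introduction.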
Constructing a Phylogenetic Tree:
- Return to the previous screen by clicking on “Done”.
- Be sure all eight of your sequences are highlighted. (Boxes to their left should be checked.)
- Click on the toggle menu bar that currently says “CLUSTAL W”. Select “Phylogenetic Tree” and click on the “Compare” button.
- A window will open containing a phylogenetic tree based on the mtDNA sequence provided.
TO TURN IN:
Using the tree you just created, and the Bioserver database, answer the questions on the following page.
Lab 5: EXPLORING PHYLOGENETICS
Names of Group Members:
1. What is the hypothesis being tested in this analysis? (Hint: There are two, conflicting hypotheses; you’ll have to pick one!)
2. What do you predict you’ll see in the phylogenetic tree if your hypothesis is correct?
3. In the space below (or on a separate sheet), draw the tree generated from the mitochondrial sequences analyzed:
4. Does this tree support your hypothesis? Explain.
5. To further clarify your data, return to Bioserver. Close the window containing your tree. Click on “Manage Groups” again to import another set of sequences. This time select “modern human mtDNA”. Both sets of sequences will now appear in your window. Select one or two of the modern sequences and generate another phylogenetic tree. Draw this tree in the space below (or on a separate sheet).
6. Does this tree support your hypothesis? Explain.
Lab 2, Part 2: Analysis of mtDNA Sequences
Objectives:
- Review the process of DNA replication, electrophoresis, and PCR
- Understand the process of DNA sequencing
- Explore the Bioserver database and GenBank
- Compare and analyze our own mtDNA sequences
General Background:
Recall that earlier in the quarter we collected our cheek cells using a saline rinse, ruptured those cells to extract their DNA, and then used the polymerase chain reaction (PCR) to make multiple copies of a small portion of our mitochondrial DNA (mtDNA). (See Lab 2, Part 1 for details.) When you last saw your sample, it was in the thermocycler, ready to begin that PCR.
In the time since, I have used DNA electrophoresis to visualize your samples. I took a small portion of your PCR product (5 µL) and ran it on a gel to see if your reaction had worked. If the PCR did not work, there was too little DNA to see on the gel. If it did work, a strong band was visible on the gel, and I then sent your PCR product to Cold Spring Harbor Laboratory for sequencing on their DNA sequencers.
Cold Spring Harbor Laboratory technicians then used your PCR product as a template for DNA sequencing, and visualized the results on an automated DNA sequencer. (See notes on sequencing below.) The sequence they obtained has been posted on the Bioserver website. We will access these sequences together in lab today and compare our mtDNA sequences to each other, and to modern humans from around the world!
Notes on DNA Sequencing (also see figure at the end of this handout):
DNA sequencing takes advantage of what is known about DNA replication in cells. In many ways, it is also quite similar to the reaction you performed, PCR, to copy your original cheek cell mtDNA. As with PCR, heat is used to temporarily separate the two strands of a DNA molecule. A DNA polymerase (the enzyme that copies DNA) can then use one strand as a template to make a copy of the original molecule. When this reaction is done for the purposes of PCR, it is done with a nearly unlimited supply of nucleotides (A, T, C, and G), the building blocks of DNA.
In standard DNA sequencing, however, this reaction is split into four separate tubes. Each of these reaction tubes receives plenty of DNA polymerase and nucleotides, but also receives a small amount of a modified nucleotide. (Thus one tube will receive a modified “A”, one tube a modified “T”, one a modified “G”, and one a modified “C”.) This modified nucleotide (a dideoxynucleotide) is unique in that it is unable to form a bond with the next nucleotide in the growing chain. Thus these modified nucleotides are often called chain terminators. As the polymerase moves along the template molecule, catalyzing the production of a new strand, it will usually incorporate a “normal” nucleotide, but will occasionally incorporate a chain terminator. When it does so, DNA replication stops. Because many hundreds of thousands of these reactions are occurring simultaneously in your tube, all possible lengths of DNA molecules will be produced. And in the tube with the modified “A”, all of these chains will end in “A”. This is true for the tubes containing the T, G, and C chain terminators as well. Thus each tube contains a mixture of molecules, all of which end in a particular nucleotide.
This collection of molecules is then sorted using electrophoresis. As with the electrophoresis we did earlier this quarter, this process will sort the DNA molecules based on their size. By running all four tubes next to each other, we can then “read” up the gel to reconstruct the sequence of our original DNA template.
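The logic of the four-tube method can be imitated in a short Python sketch. It is an idealized model, not a simulation of real chemistry: it assumes every possible stopping point shows up as a band, it ignores the primer, and it assumes the template is written in the direction the polymerase reads it, so the new strand comes out left to right. The names `sanger_fragments` and `read_gel` are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template):
    """Return {terminator base: [fragment lengths]} for the four tubes."""
    new_strand = "".join(COMPLEMENT[base] for base in template)
    tubes = {base: [] for base in "ATGC"}
    for position, base in enumerate(new_strand, start=1):
        # A chain that stops here is `position` bases long and ends in `base`,
        # so it belongs in the tube containing that modified nucleotide.
        tubes[base].append(position)
    return tubes


def read_gel(tubes):
    """Read bands from shortest to longest across all four lanes."""
    bands = sorted((length, base)
                   for base, lengths in tubes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


tubes = sanger_fragments("TACCCGACA")
print(tubes["T"])       # [2, 7, 9]
print(read_gel(tubes))  # ATGGGCTGT
```

Each tube alone gives only a list of lengths. The sequence appears when the four lanes are read together in order of size, which is exactly what reading up the gel accomplishes.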
Common Questions:
1. Why didn’t my PCR work??
PCR is a notoriously finicky reaction. Common sources of failure include pipetting errors and poor template quality. For example, if you had too few cheek cells in your preparation, your PCR might not have worked. The presence of too many cheek cells, or other contaminants, could also keep your reaction from working.
2. What does “N” mean in a DNA sequence?
Often we cannot determine which nucleotide (A, T, C, or G) is at a particular location in a DNA molecule. When it cannot be determined, we insert an “N” into the sequence to indicate an unknown nucleotide.
3. What makes some DNA sequences “excellent” and others “poor”?
On your data table, I have scored the results of each sequence on a qualitative scale ranging from excellent to poor. This primarily reflects the number of N’s in your sequence. Sources of sequence ambiguity can include poor template quality (PCR product) as well as several factors out of your control, including the quality of the sequencing reaction and the skill of the technician performing the sequencing!
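Since the rating mostly reflects the number of N’s, one rough way to compare sequences is simply to count them. The sketch below does only that; the function name `n_content` is illustrative, and it does not attempt to reproduce the cutoffs behind the ratings on the data table.

```python
def n_content(sequence):
    """Return (number of N's, fraction of the sequence that is N)."""
    seq = sequence.upper()
    count = seq.count("N")
    fraction = count / len(seq) if seq else 0.0
    return count, fraction


print(n_content("ATGNNCTGTN"))  # (3, 0.3)
```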
Procedures:
Your goal today is primarily exploratory. You will work with one other student to access our DNA sequences, practice some alignments, and generate phylogenetic trees from our sequences. Students with good sequencing results may wish to identify their number, but note that this is optional!
Step 1: To begin, open a browser to the Bioserver website. Log in to the Sequence Server as a guest.
Step 2: Click the “Manage Groups” button at the top of the screen. This will open the Manage Groups window. In this window, choose “classes” from the popup menu on the upper right. A new screen will appear and you will see our class listed under Suzanne Schlador, Dangerous Ideas. To select our class, click the checkbox next to the listing and click “OK”. This will move our class onto your worksheet.
Step 3: To compare sequences, you will need to have more than one sequence on the worksheet. To add students from our class, select the desired sequences from the popup menu. Then click the check box for each sequence you want to include in your comparison, and press the “Compare” button. Sequence Server will open a new window to display the results of your comparison. Note that sequences with many N’s (those rated “poor” on my table) are difficult for the server to align!
Step 4: Practice an alignment! Select two or more sequences for alignment as described in Step 3. In the space below, or on an additional sheet, make note of which sequences you choose to align, and how many differences you observed between them. (Differences will be highlighted in yellow; ambiguous positions are noted in grey.)
Step 5: From your main worksheet, you can also compare any of our sequences to those contained in the international database, GenBank. To do this, select a sequence by clicking the round button to the right of the sequence and then clicking “Analyze”. Sequence Server will open a new window showing the results of your analysis. Note that you can follow links (the GenBank accession numbers) in the results to learn more about the sequences you match with. Try this with at least one of our sequences.
Step 6: Now for the fun part! As we just did in the previous exercise (Lab #5), try using the Bioserver software to generate at least one phylogenetic tree. From your main worksheet, return to “Manage Groups”. Notice that you can add a variety of groups to your worksheet including modern humans, ancient humans (those Neanderthals!), other students, and other animals. Select two or more groups of interest to you, and at least two of our students, and draw the phylogenetic tree you generate in the space below.
Does the tree you’ve generated look like you would expect it to? Why or why not?
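For Step 4, the tally of differences can be double-checked by hand or with a short sketch like the one below. It assumes the two sequences are already aligned, that any position containing an N is ambiguous rather than different, and that a gap counts as a difference. These are assumptions about how to count, not a description of how the server does it, and the name `compare_aligned` is illustrative.

```python
def compare_aligned(seq_a, seq_b):
    """Return (differences, ambiguous positions) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be the same length")
    differences = 0
    ambiguous = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if "N" in (a, b):
            ambiguous += 1      # unknown base: cannot call it a difference
        elif a != b:
            differences += 1    # includes a base paired with a gap
    return differences, ambiguous


print(compare_aligned("ATGGNCTGTCAA", "ATGGG-TGTCAT"))  # (2, 1)
```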
DNA Sequencing (the Sanger Method):