Step 6) Analyzing your Mitochondrial DNA Sequences

Trimming your mitochondrial DNA sequence using 4Peaks.

In order to determine how related you are to your classmates, to Rarely Reclusive, and to the human ancient and modern groups in the Dolan DNA database, you will need to first determine whether or not your sequence is of sufficient quality for the analysis, and then to trim your sequence so that only good quality data is used. The 4Peaks program that you used during your yeast experiments will work very well for this purpose.

Since 4Peaks is a Mac-only program, you will need to do the first two stages of this analysis on a Mac (your own, if you have one, or a departmental laptop if you do not). You can then do the MEGA 6 and Dolan DNA analysis on any computer platform.

  1. Examining the primary data of the mitochondrial control region
  • Log in to the class Blackboard site. Choose “Content” in the left menu bar.
  • Open the “Mitochondrial DNA Sequences” folder and download your sequence file (labeled “TT(#)”) to the desktop.
  • Open the file using the program “4Peaks”.
  • At this point you should see peaks of four different colors. There is also an X-axis that consists of a time unit of measurement, a nucleotide number, and a nucleotide (designated A, C, G, or T). There are a few things to note:
  • At many points there is only one peak, and the color of the peak determines the nucleotide indicated on the X-axis. The sequence at these points is of high quality and can be used for phylogenetic analysis. A sample is shown below:
  • At some points there is more than one peak, or the peak looks very broad or misshapen (see the bases near the beginning, for example). This is due to technical problems and makes it difficult or impossible to predict the base at that position on the DNA strand. This is why there is occasionally an “N” instead of an A, C, G, or T. “N” designates that the identity of the base at this position was not clearly determined. Sequences with too many “N”s cannot be used in phylogenetic analysis. An example is shown below, although some sequences will certainly look even worse than this:

  • 4Peaks may also try to call bases even when the peaks are obviously of poor quality, as shown below. Areas with poor-quality peaks should NOT be included in your sequence analysis.

  • Look at your sequence. Does it have a section with good sequence, or are there N’s or poor-quality peaks all the way through it? If there is good sequence available, highlight (select) the good sequence by clicking on the first base of the good stretch, and then holding and dragging to the right until you have selected the last base of the good stretch. If your sequence is not of sufficient quality, do not worry. Simply work with a sequence from your team that is.
  • Select "Crop" from the drop-down menu in the lower left-hand corner of the window (the ‘gear’ icon). The sequence that remains in the 4Peaks window is now just the good-quality DNA.
  • Do NOT close your cropped file when you move to the next step – you will need it again soon!
  • Use “Export” to rename this cropped file and save it to the desktop. Change the .txt extension to .fas; this step must be done or the MEGA 6 program will not recognize the sequences.
  • Repeat this process (cropping and exporting) for each member of your team. If a team member did not get good sequence of their own, simply choose one of the other class sequences.
  • AT THIS STAGE, IF YOU ARE NOT YET USING YOUR OWN COMPUTER, EMAIL THE CROPPED FILES TO THE TEAM MEMBER WHO BROUGHT THE COMPUTER, AND THEN CONTINUE ON WITH THE 4PEAKS ANALYSIS BELOW.
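The cropping decision above can be sketched in code, assuming the only information available is the base calls 4Peaks shows (the sketch ignores peak shape, which you still need to judge by eye). The hypothetical function `longest_clean_stretch` finds the longest run made only of A, C, G, or T, and the hypothetical `to_fasta` writes the cropped bases in the FASTA layout a .fas file uses: a header line beginning with `>`, followed by the sequence. The example base calls and the name `TT1_cropped` are made up.

```python
import re


def longest_clean_stretch(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest run made only of A, C, G and T.

    Positions are 0-based and `end` is exclusive, so seq[start:end] is the run.
    Returns (0, 0) if the sequence contains no clean bases at all.
    """
    best = (0, 0)
    for run in re.finditer(r"[ACGT]+", seq.upper()):
        if run.end() - run.start() > best[1] - best[0]:
            best = (run.start(), run.end())
    return best


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as FASTA text: a '>' header line, then wrapped bases."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# Made-up base calls with uncertain positions (N) near both ends.
raw = "NNAGCTNNACGTACGTACGTTTGANCCA"
start, end = longest_clean_stretch(raw)
cropped = raw[start:end]
print(start, end, cropped)  # 8 24 ACGTACGTACGTTTGA
print(to_fasta("TT1_cropped", cropped), end="")
```

A strict no-N rule is a simplification: a long stretch with an occasional N may still be usable, and a stretch of called bases can still sit on poor-quality peaks, which is why you crop by inspecting the trace yourself.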
  2. BLAST Analysis of your cropped sequence
  • Under the gear icon in the lower left corner, choose “BLAST Sequence” and then “Nucleotide (BLASTn)”. This is the same program that you used for your yeast DNA sequence comparisons, except that we are now searching the entire sequence database rather than just yeast.
  • You should see your sequence in the query window. Click on “BLAST”.
  • Does your sequence appear to be human mitochondrial DNA? How do you know? What is the percent coverage, the percent identity and the E value? What do these values mean? Write the answers to these questions in the final lab worksheet found at the end of this handout.
  • Repeat this process for all of the members of your team. Report the data for all of your team members in the final report sheet.
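The percent identity and coverage values that BLAST reports can be reproduced by hand. The sketch below assumes a single aligned region (one hit), with gaps written as `-`; the function names and numbers are illustrative, not BLAST's own code. Percent identity is the number of identical positions divided by the length of the alignment, and query coverage is the fraction of your query sequence that falls inside the alignment. The E value cannot be recomputed from the alignment alone: it depends on the alignment score and on the size of the database searched, and it estimates how many matches this good you would expect to find by chance, so values close to 0 indicate a meaningful match.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical positions / alignment length (gap columns included) x 100."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        q == s and q != "-" for q, s in zip(aligned_query, aligned_subject)
    )
    return 100 * identical / len(aligned_query)


def query_coverage(q_start: int, q_end: int, query_length: int) -> float:
    """Percent of the query inside the alignment (1-based, inclusive coordinates)."""
    return 100 * (q_end - q_start + 1) / query_length


# Made-up example: 8 of 10 aligned columns are identical (one gap, one mismatch).
print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
# Made-up example: bases 11-460 of a 500-base query are covered.
print(query_coverage(11, 460, 500))  # 90.0
```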

Step 7) Phylogenetic Analysis of the mitochondrial control region

Now it is time to find out how closely related you are to Rarely Reclusive, to other members of your team, and to students from last year. You will do this using the MEGA 6 program.

  • Switch computers and download all sequences to the desktop, if needed.
  • Open MEGA 6 by double-clicking on its icon. You will see a window that looks like this:

  • Begin your analysis by clicking on “Align” and then clicking on “Edit/Build Alignment” from the drop-down menu. Choose “Create a New Alignment” and click “OK”, and then choose “DNA”. You should see a window that looks like this:

  • Import your sequences. Under “Edit” choose “Insert Sequence From File”. Make sure that the “Select Files of Type” window at the bottom shows “FASTA”. Select your sequence and click “Open”. Your sequence should now appear in the Alignment Explorer window. Repeat this step for each sequence from your team, and for Rarely Reclusive (Rarely.fas). As you import each sequence, you will notice that it comes in ‘selected’ (all blue). You can unselect (and reselect) the sequence by clicking on the sequence name to the left. When the sequences are not selected, each base appears in a different color.

“Aligning” sequences ensures that you are comparing the same parts of a gene with each other. The sequences above are not aligned (yet), so even though they are all mitochondrial DNA, they do not look like they match. When you ask MEGA to align sequences, it searches for large areas of identity and similarity, and then slides the sequences along one another until those areas are matched. The match will never be perfect (unless you have two identical sequences, of course), so the program is really looking for the ‘best match’.
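To make the idea of a “best match” concrete, here is a minimal sketch of pairwise global alignment using the classic Needleman-Wunsch dynamic-programming method. This is a simpler, swapped-in technique: ClustalW, which MEGA runs in the next step, builds a multiple alignment progressively, starting from pairwise comparisons like this one, and uses more elaborate scoring. The function name `global_align` is hypothetical, and the scores (match +1, mismatch -1, gap -2) are arbitrary choices for illustration.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment of two DNA strings."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair bases
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


top, bottom = global_align("ACGTTGCA", "ACGTGCA")
print(top)     # ACGTTGCA
print(bottom)  # ACG-TGCA
```

A single inserted gap lets all seven bases of the shorter sequence line up. Placing the gap one position to the right (ACGT-GCA) scores the same; when there are ties like this, an aligner simply reports one of them.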

  • Align your sequences by selecting all of them (shift-click), and then clicking on “Alignment” and “Align by Clustal W” in the Alignment Explorer window. Click “OK” to accept the default parameters. Once the sequences are aligned, you should see that some of them match pretty well, while others may not. The computer will also insert short gaps into the sequences where needed to make them align better. In the aligned sequences below, the first three sequences are identical, while the next two are aligned but are obviously NOT perfect matches.
  • Export your alignment. Once the alignment is complete, save and export the current alignment session to the desktop by selecting Data | Export Alignment from the Alignment Explorer window main menu. Choose “MEGA” format and give the file an appropriate name, such as "MTalignment_team3.meg". This will allow the current alignment session to be used in the next step.
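Once sequences are aligned, how different two of them are can be measured by counting the positions where their bases disagree. The sketch below (hypothetical function names, made-up sequences) computes the proportion of differing sites, often called the p-distance, using only alignment columns where every sequence has an A, C, G, or T. Skipping gap and N columns means that sequences trimmed to different lengths are compared over the same region, similar in spirit to MEGA's “complete deletion” treatment of gaps and missing data. It assumes all sequences are uppercase and have the same aligned length.

```python
from itertools import combinations


def shared_columns(aligned: dict[str, str]) -> list[int]:
    """Indices of columns where every sequence has a real base (no gap or N)."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    return [
        i for i in range(length)
        if all(seq[i] in "ACGT" for seq in aligned.values())
    ]


def p_distances(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Proportion of differing bases for every pair, over the shared columns only."""
    cols = shared_columns(aligned)
    if not cols:
        raise ValueError("no columns are shared by all sequences")
    result = {}
    for x, y in combinations(aligned, 2):
        differences = sum(aligned[x][i] != aligned[y][i] for i in cols)
        result[(x, y)] = differences / len(cols)
    return result


# Made-up aligned sequences; the last two columns contain gaps and are skipped.
aligned = {
    "Rarely": "ACGTACGTAC--",
    "Student1": "ACGTACGTACGT",
    "Student2": "ACGAACGTTC-T",
}
for pair, d in p_distances(aligned).items():
    print(pair, d)
```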

So, who is most closely related to Rarely Reclusive? You will answer this question using UPGMA analysis. As we discussed in class, this analysis builds the tree from the number of nucleotide differences between each pair of individuals, repeatedly grouping together the most similar sequences (or groups of sequences). You will notice as you play around with MEGA that there is a separate “Maximum Parsimony” phylogeny program. However, in this analysis we do not want to penalize sequences for being of different lengths. Remember that each of you trimmed your sequence to a different length based on sequence quality, not on evolution, so we do NOT want the program treating length differences as significant! UPGMA analysis will only use the bases that all of the sequences have in common when it constructs the tree.
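UPGMA (Unweighted Pair Group Method with Arithmetic mean) can be sketched in a few lines, assuming you already have a table of pairwise distances such as the proportion of differing sites. It repeatedly joins the two closest clusters, places the new node at half their distance, and sets the distance from the merged cluster to every other cluster to the average over all the sequence pairs involved. The function name `upgma`, the sample labels, and the distances below are hypothetical, and MEGA's own implementation and display will differ in detail. The result is written in Newick notation, a common text format for trees, where parentheses group relatives and the number after each colon is a branch length.

```python
from itertools import combinations


def upgma(names: list[str], dist: dict[frozenset, float]) -> str:
    """Build a UPGMA tree and return it as a Newick string with branch lengths."""
    # Each cluster label maps to (newick text, number of sequences, node height).
    clusters = {name: (name, 1, 0.0) for name in names}
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    next_id = 0
    while len(clusters) > 1:
        closest = min(d, key=d.get)  # pair of clusters with the smallest distance
        a, b = sorted(closest)
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = d.pop(closest) / 2  # new node sits at half the distance
        new = f"_node{next_id}"
        next_id += 1
        # Distance to the merged cluster = size-weighted average of old distances.
        for other in clusters:
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            d[frozenset((new, other))] = (d_a * size_a + d_b * size_b) / (size_a + size_b)
        newick = f"({tree_a}:{height - h_a:.3f},{tree_b}:{height - h_b:.3f})"
        clusters[new] = (newick, size_a + size_b, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


names = ["Rarely", "S1", "S2", "S3"]
example = {
    frozenset(("Rarely", "S1")): 0.02,
    frozenset(("Rarely", "S2")): 0.10,
    frozenset(("Rarely", "S3")): 0.12,
    frozenset(("S1", "S2")): 0.10,
    frozenset(("S1", "S3")): 0.12,
    frozenset(("S2", "S3")): 0.04,
}
print(upgma(names, example))
# ((Rarely:0.010,S1:0.010):0.045,(S2:0.020,S3:0.020):0.035);
```

In this made-up example, S1 is joined to Rarely first because their distance is the smallest in the table, which is exactly the kind of pattern you are looking for in your own tree.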

  • Generate a UPGMA phylogenetic tree. Go back to the main MEGA 6 window and click on “Phylogeny” and then “Construct/Test UPGMA Tree.”
  • Choose your saved .meg file and open it.
  • Click on “Compute”. You should now see a phylogenetic tree in a window that looks something like this:

You can save the file as a .pdf image under the “Image” menu. The image can then be pasted into your Word document for your final lab report (question 2). If you prefer, you can also simply draw your tree in the space provided.

Which student in your analysis is most closely related to Rarely Reclusive?

Step 8) Phylogenetic Analysis – Human evolution and your place in the human family.

Well, you can't all be most closely related to Rarely, so who are you related to?

  1. Follow this link ( to the Dolan DNA Bioserver login page. Click ‘Enter’ under the Sequence Server Site.
  2. Click on 'Manage Groups' from the menu at the top center of the page. You will now see a window that looks like this:

  3. Click on the upper right pull-down menu (sequence sources). You should now see a variety of choices. Choose 'modern human mt DNA'. Click on the boxes to the left of each of the sequences to select them, and then click on 'ok'.
  4. Click on 'Manage Groups' again. This time select 'Ancient Human mt DNA', click on all of the boxes, and then click 'ok'. You have now uploaded a whole bunch of mtDNA sequences for analysis. If you wish, you can also select and upload other groups, such as 'Neanderthal mt DNA', in the same way.
  5. To upload your own sequence, open your cropped FASTA (.fas) file, select only the sequence itself (the bases, not the header line that begins with “>”), and copy it.
  6. In the sequence server window, choose 'Create Sequence'. Give your sequence a name, and then paste your sequence data into the window. Click 'ok'.
  7. You will now see your sequence added to the rest. Do the same thing for all of your group members.
  8. To compare your group members to the available sequences, click on the box to the left of your sequences, and then next to any other sequences that interest you (you'll see lots of drop-down possibilities within each group). You may select at most ten sequences in total (including your group's sequences). Have fun, but also use sequences that make sense, given what you know about your ancestry.
  9. After you have selected your sequences (no more than 10), find the word 'compare' in the gray bar menu, and choose 'phylogenetic tree' from the drop-down box. Click on "Compare". After your sequences are analyzed, a popup window will show you your tree.
  10. Choose 'phenogram' and 'yes' (to make the tree branch lengths proportional to the evolutionary distances). You will see something that looks like the picture below, but that contains the individuals and species that you chose to analyze.
  11. Phenograms such as this one can provide a ton of information. For example, one thing that this phenogram shows is that Lake Mungo Man and African American #1 share a common ancestor, to which Lake Mungo Man is more closely related.
  12. Use the Grab program (scissors on the dock), and then choose “Capture” and “Selection” (or take a screen capture of the selection on a Windows computer). Select the phylogenetic tree by drawing a box around it and releasing the mouse. You will now see a window with the tree image in it. Click ‘copy’ to copy the image, and then go to your lab report and paste the image into the report (question #3).
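Uploading to the Dolan server requires copying only the bases from your FASTA file. A FASTA file is plain text: a header line that starts with `>`, followed by one or more lines of sequence. If you prefer to do this with a script rather than by hand, a minimal sketch (hypothetical function name, made-up file contents, and assuming one sequence per file) is:

```python
def fasta_bases(text: str) -> str:
    """Return the bases from single-record FASTA text, dropping the '>' header line."""
    lines = [line.strip() for line in text.splitlines()]
    if sum(line.startswith(">") for line in lines) > 1:
        raise ValueError("expected a single FASTA record")
    return "".join(line for line in lines if line and not line.startswith(">"))


example = ">TT1_cropped\nACGTACGTAC\nGTTTGA\n"
print(fasta_bases(example))  # ACGTACGTACGTTTGA
```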

Genetics Final Lab Report, Fall 2012. Bioinformatics and Human Evolution

Names:

1) In the spaces below, type or write the BLAST values for your team’s mitochondrial DNA sequence(s), and briefly describe, in your own words, what those values mean about how well your sequences match human mitochondrial DNA.

% Coverage:

E value:

% Identity:

2) Paste or draw your MEGA 6 Phylogenetic Tree in the space below. Which classmate is most closely related to Rarely Reclusive? Justify your answer.

3) Paste or draw your Phylogenetic Tree from the Dolan DNA Server in the space below. Describe, in your own words, the evolutionary relationships shown in the tree.