Supplementary information: potential questions for students, with answers

Supplementary information Table 2 describes the published HVR1 sequence data that will be used in these questions. For ancient DNA samples, the approximate age of the fossil is listed; for extant species, an estimate of the time to the MRCA with humans is listed. The data will be analyzed in DnaSP [1] and are available in nexus format in “Supplementary information data 2”. Using any available data, prepare narrative responses to each of the following (one-page limit for each, including all figures and tables):

Question 1.

How diverse is our class in terms of mtDNA variation?

Hint: One of the easiest measures of DNA diversity to conceptualize is the average number of pairwise differences between sequences, often abbreviated k. To give you an idea, k within populations often ranges from over 20 to nearly zero. Most species of animals with large population sizes (not those near extinction!) are more diverse than humans (this includes all other great apes!). Another measure, which is independent of sequence length, is the average number of pairwise differences per site (π or Pi), which can be calculated by dividing k by the length of the sequence.

Answer: In our class data, k was 6.16 and Pi was 0.017.
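For readers who want to see what k and Pi measure, here is a minimal Python sketch of the calculation described in the hint. It assumes the sequences are already aligned to the same length and treats every mismatching character as one difference; DnaSP's handling of gaps and ambiguous bases may differ. The function names and the toy alignment are hypothetical and are not the class data.

```python
from itertools import combinations

def pairwise_differences(a, b):
    """Count the sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def diversity(seqs):
    """Return (k, pi): mean pairwise differences, and the same value per site."""
    pairs = list(combinations(seqs, 2))
    k = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    pi = k / len(seqs[0])  # Pi = k divided by sequence length
    return k, pi

# Hypothetical toy alignment (4 sequences, 10 sites), NOT the class data.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT", "ACGTACGTAC"]
k, pi = diversity(toy)
print(f"k = {k:.3f}, Pi = {pi:.4f}")
```

Because Pi is k divided by the number of sites, it can be compared across datasets of different lengths, whereas k cannot.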

Question 2.

Based on our data do you think it is likely that modern humans and Neanderthals are a single species? Do our data prove that humans and Neanderthals did/did not interbreed? As a group, how divergent are Neanderthals from modern humans? How does this compare to the maximum level of divergence within Neanderthals and modern humans?

Hint: Human and Neanderthal mtDNA are exclusively monophyletic (i.e. belong to their own clades).

Answer: There is no “cut-off” that defines a species. In our dataset, humans and Neanderthals differed by 0.074 nucleotide substitutions per site (compare with 0.017 within humans and 0.18 between humans and chimps). Neanderthal mtDNA falls outside the human mtDNA clade and is quite divergent; thus, according to mtDNA data alone, there is no evidence that they interbred. Across the genome, however, humans and Neanderthals are known to be very similar and would probably fall closer to the subspecies category than to full species: they were still capable of hybridizing and producing fertile offspring [2]. The lack of evidence for interbreeding in mtDNA could reflect either a gene sorting phenomenon (Neanderthal mtDNA was lost due to drift or selection) or unidirectional gene flow (perhaps the hybridizing females tended to be human).

Question 3.

It has been estimated that human mtDNA shares a common ancestor with Neanderthal mtDNA approximately 825,000 years ago [2]. Assuming this to be correct, what is the substitution rate (μ) of HVR1? (What is the difference between substitution rate and mutation rate, and how are they related to each other?) Consider that μ = π/(2t), where t is the time of divergence; for divergence between populations, use K(JC), which is the equivalent of π. PS: Use only extant humans for this (ancient humans will have a shorter branch length)! What is the problem with this estimate? What are the assumptions about t and branch lengths? Is your estimate likely to be an underestimate or an overestimate?

Answer: The problem with this estimate is that Neanderthals are extinct, so t for the Neanderthal branch is not the same as t for humans (it is shorter, since they stopped “evolving”). The estimate is thus an underestimate (µ would be larger had Neanderthals had the full “t” to accumulate mutations!). In our sample, µ = 4.48 × 10⁻⁸ substitutions per site per year. In addition, because the DNA from Neanderthals is fossil DNA, its quality will be lower and sequence accuracy could be compromised. Students should understand that while mutation rate refers to the rate at which mutations arise, substitution rate refers to the rate at which they become fixed in the population. Under the neutral theory, the rate of substitution is equal to the rate of mutation and the molecular clock is observed.
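The arithmetic behind this rate can be checked with a short Python sketch. It uses the divergence reported in the answer to Question 2 (0.074 substitutions per site) and the 825,000-year calibration from the question; the function name is hypothetical, and the factor of 2 reflects the assumption that both lineages have accumulated substitutions for the full time t.

```python
def substitution_rate(divergence, t_years):
    """mu = K / (2t): substitutions per site per year, assuming two equal branches."""
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return divergence / (2 * t_years)

# Values taken from the text: K = 0.074, t = 825,000 years.
mu = substitution_rate(0.074, 825_000)
print(f"mu = {mu:.2e} substitutions per site per year")
```

The equal-branch assumption is exactly what the answer questions: if the Neanderthal branch is shorter than t, the total time separating the sequences is less than 2t and the true rate is higher than this figure.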

Question 4.

The Lake Mungo 3 (LM3) fossil is the remains of an anatomically modern human (i.e. belonging to the species H. sapiens) that was discovered in a 60,000-year-old stratum in the New South Wales region of southern Australia [3]. This fossil represents the oldest anatomically modern human from which DNA has been isolated. Despite being a member of our species, LM3 has an mtDNA haplotype that is no longer present in the modern human gene pool (so far as we know). How similar is LM3’s mtDNA to that of extant humans (in terms of π or K(JC))? Using HVR1 data as a molecular clock (and the substitution rate we determined in class), estimate the time of mtDNA divergence between LM3 and extant modern humans. Is there a discrepancy between the age of the LM3 fossil (60,000 years) and the HVR1 divergence time? If so, why would that be? Given that LM3 was a modern human who likely exchanged genes with other modern humans, why is his mtDNA so different from that found in extant humans? Develop a hypothetical evolutionary scenario (involving genetic drift, migration, natural selection, or some combination of these factors) that can account for the failure of LM3’s mtDNA haplotype to be found in extant humans. Draw trees! Think in terms of the last common ancestor and the assumptions behind the calculations!

Answer: The lineage of LM3 is extinct (for whatever reason, nothing similar exists in current human populations); thus the estimate is actually the time from extant humans back to their MRCA with LM3, not to the fossil itself (i.e. fossil age is not an accurate description of the MRCA). In our samples t = 393,770 years, which pre-dates the extant human coalescence time (the time to mitochondrial Eve). Furthermore, there is also a problem with “t”: since LM3 died long ago, it does not have the same branch length as extant humans.
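The clock calculation used here is the rate formula from Question 3 rearranged to t = K(JC)/(2µ). The Python sketch below is only an illustration of that rearrangement: the function name is hypothetical, and because the K(JC) value for LM3 versus extant humans is not restated in this document, the example simply confirms that the formula recovers the 825,000-year calibration from the Question 3 values.

```python
def divergence_time(divergence, mu):
    """t = K / (2 * mu): years since two lineages shared an ancestor (equal branches assumed)."""
    if mu <= 0:
        raise ValueError("substitution rate must be positive")
    return divergence / (2 * mu)

# Round trip with the Question 3 values (K = 0.074, t = 825,000 years).
mu = 0.074 / (2 * 825_000)
t_check = divergence_time(0.074, mu)
print(f"t = {t_check:,.0f} years")
```

To reproduce the LM3 estimate, one would pass the K(JC) value that DnaSP reports for LM3 versus extant humans together with the class estimate of µ.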

Question 5.

Is the accuracy of a molecular clock scale-dependent? To answer this question, use available sequence data, together with the information in Table 2, to make several independent estimates of the substitution rate of HVR1 (to avoid the problem above, use only species for this purpose, not ancient humans). Is your estimate uniform, regardless of the calibration point? What is the relationship between divergence time and your estimate of the HVR1 substitution rate (support your answer with an appropriate figure)? Keeping in mind that HVR1 is one of the most quickly mutating portions of the genome, provide a hypothesis that may explain the patterns you observe. What are the implications of your observations for the use of molecular clocks in general?

Hint: If you use the “Analysis Polymorphism and Divergence…” function in DnaSP, you will get two estimates of divergence, K and K(JC); they are equivalent to Pi within species (that is, substitutions per site). The first is uncorrected and the second is corrected for multiple substitutions.

Answer: The Neanderthal-human comparison gives a very high substitution rate that cannot be explained by the shorter branch length in Neanderthals; it is possible that this is a consequence of the lower quality of fossil DNA. However, even discounting the Neanderthal comparison, estimated substitution rates get lower as divergence time increases. This is because more than one substitution may accumulate per site (breaking the assumption of the infinite-sites model). Reversals can occur and older mutations can be “erased” by more recent ones, so the substitution rate is underestimated (two changes are counted as one). Using the corrected K(JC) reduces this problem but does not solve it. Finally, there is the complicating factor of generation time (which is not really apparent in this dataset, but could be raised as a potential problem). The clock is being calculated on a per-year basis, so if one organism reproduces faster than another it could be evolving faster. Although the molecular clock is only weakly dependent on generation time, Ohta [4] found that synonymous sites follow a per-generation time scale while non-synonymous sites tend to follow a per-year time scale. One reason for this is that organisms with short generation times tend to be more abundant (higher effective population sizes) and thus experience more effective negative selection (Ns<1). As HVR1 is non-coding, it would be more affected by generation time.
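To make the multiple-substitution effect concrete, the following Python sketch applies the standard Jukes-Cantor correction, K(JC) = -(3/4) ln(1 - 4p/3), to an uncorrected proportion of differing sites p. It is an illustration, not a reproduction of DnaSP's internals; the function name and the p values are hypothetical. The Jukes-Cantor model assumes equal base frequencies and the same rate at every site, which is one reason the correction helps but does not fully solve the problem for HVR1.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the uncorrected proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical uncorrected divergences, from shallow to deep comparisons.
for p in (0.01, 0.05, 0.10, 0.20, 0.40):
    k_jc = jukes_cantor(p)
    print(f"p = {p:.2f}  K(JC) = {k_jc:.4f}  hidden substitutions = {k_jc - p:.4f}")
```

The gap between p and K(JC) widens as p grows, so an uncorrected clock loses proportionally more substitutions at older calibration points.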

Question 6.

An under-appreciated aspect of chimpanzees is that there are actually two distinct species of them. The common chimpanzee (Pan troglodytes) was historically widespread across east, central and west Africa, although its range has been severely limited by habitat destruction and poaching. The bonobo, or pygmy chimpanzee (Pan paniscus), is morphologically and behaviorally distinct from the common chimpanzee. Its historical geographic range is limited to a small area of the Congo basin in central Africa. Like the common chimpanzee, the bonobo is critically endangered. Based on the HVR1 data, how long has it been since common chimpanzees and bonobos shared a common ancestor? Based on the results of the question above, how much confidence do you have in this estimate? How could you change your experimental approach to improve the confidence in your estimate of divergence time between these species? In applying a molecular clock that was calibrated from human data to chimpanzees, what additional assumptions are you making about the nature of the clock?

Answer: Using the Neanderthal-human substitution rate gives a divergence time of ~1.6 million years, from t = K(JC)/(2µ). Using the human-chimp substitution rate gives a divergence time of ~4.6 million years. The first is an underestimate because the substitution rate is likely to be inflated (due to low-quality fossil DNA), and the second is an overestimate because the substitution rate is likely underestimated (due to multiple substitutions accumulating at the same site). The actual divergence time is about 2 million years [5].

Question 7.

Using any available sequence data (class data, extant human data, ancient DNA data, inter-specific data, or other published data that you track down yourself) test an evolutionary hypothesis of your own choosing.

Answer: Answers will vary. Population demography hypotheses can be evaluated with Tajima’s D or Fu and Li’s tests. In our data these tests were not significant, but Tajima’s D was negative (-2.15), indicating an excess of low-frequency polymorphisms, which is consistent with population expansion. Students can also look at genetic diversity in different continents. Although this test would be highly biased (and technically not acceptable), since we do not have a random sample for each continent, Africa is still the continent with the highest diversity.
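For students who want to see why an excess of low-frequency variants makes Tajima’s D negative, here is a minimal Python sketch of the statistic as defined by Tajima (1989). It takes the number of sequences n, the number of segregating sites S, and the mean number of pairwise differences k. The function name and the example inputs are hypothetical and are not the class data; the class value of -2.15 was obtained in DnaSP.

```python
import math

def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences k."""
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if S < 1:
        raise ValueError("no segregating sites: D is undefined")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: k minus the diversity expected from S under neutrality (S / a1).
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Hypothetical inputs: 10 sequences, 16 segregating sites, k = 3.888889.
d_example = tajimas_d(n=10, S=16, k=3.888889)
print(f"Tajima's D = {d_example:.3f}")
```

With these inputs D comes out negative (about -1.45): k is smaller than S/a1, which is what happens when many segregating sites are carried by only one or a few sequences, as expected after a population expansion.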

Question 8.

You have received information on the haplotype group to which your analyzed DNA (and that of others in class) belongs. Are human haplotypes fixed within populations? How about continents? How can you use mtDNA information to trace migratory paths of ancient humans? How certain can you be about the geographic origin of your haplotype? (PS: you can choose to talk about any haplotype analyzed in class, not necessarily the one you analyzed).

If you run a more complex, model-based tree searching method (e.g. Bayesian analysis), it is likely that many of the sequences will lose resolution (that is, will become polytomic with many groups). Why is this happening?

Answer: Here the goal is to evaluate the students’ understanding of phylogenetic results and to make them aware that haplotypes are not fixed within a population, although some (but not all) could be restricted to particular continents. They should understand how migration patterns can be inferred from phylogenetic/phylogeographic data.

The loss of resolution observed in model-based tree searches relative to distance-matrix methods can be explained by the high rate of change in HVR. Distance-matrix methods are unable to account for homoplasies, as they just measure the distance between each pair of species, leaving out all information from higher-order combinations of character states.

References:

1. Librado, P. and J. Rozas, DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 2009. 25(11): p. 1451-2.

2. Green, R.E., et al., A Draft Sequence of the Neandertal Genome. Science, 2010. 328(5979): p. 710-722.

3. Adcock, G.J., et al., Mitochondrial DNA sequences in ancient Australians: Implications for modern human origins. Proceedings of the National Academy of Sciences of the United States of America, 2001. 98(2): p. 537-542.

4. Ohta, T., Synonymous and Nonsynonymous Substitutions in Mammalian Genes and the Nearly Neutral Theory. Journal of Molecular Evolution, 1995. 40(1): p. 56-63.

5. Yang, Z.H. and B. Rannala, Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Molecular Biology and Evolution, 2006. 23(1): p. 212-226.
