Supplemental Materials
Supplemental Methods
Illumina mutational analysis
After each Illumina run, we applied the standard Illumina data analysis pipeline: Firecrest for tile image analysis, Bustard for base calling, and ELAND for alignment to the reference genome sequence. Illumina reads were aligned to the C. elegans genome (build WS170) with Illumina’s alignment program ELAND version 0.2.2.6. In order to calculate genome-wide coverage and identify mutation sites, the first 32 bases of reads from ELAND read categories U0, U1, and U2 were placed on the reference genome using the coordinates provided by ELAND. U0 reads match unique genomic regions with zero mismatches, U1 align to unique regions with one mismatch, and U2 align to unique regions with two mismatches. Reads containing missing bases, which appear as Ns in the sequence, were excluded. An entire flow cell of data (7 lanes) was included in each analysis. The total number of U category reads aligned to the C. elegans ~102 Mb genome ranged from 15 to 32 million across the 5 data sets analyzed here. This resulted in 93% of the genome being covered by at least one read (1-fold coverage), and 88% of the genome at 3-fold coverage. Coverage per-base approximated a normal distribution that centered at 7-fold. Custom scripts were created that tracked both the coverage data and the quality (Q) scores of each base in the coverage so that for any position on the genome the reads covering that position and the associated quality scores could be easily retrieved.
Candidate mutations were identified at positions that met the following criteria: 1) At least 3-fold coverage, 2) majority of reads indicated a single non-reference base, 3) there was at least one read from each strand of DNA, 4) the Q scores for all bases contributing to the candidate SNP was greater than or equal to 25, and 5) the coverage was not greater than 25-fold. This heuristic rule set was determined after several rounds of confirmation using conventional PCR and ABI fluorescent DNA sequencing methods. Positions where the coverage exceeded 25-fold showed an unusually high false-positive rate. We suspect that these positions are in genomic regions that are at least partially repetitive and are erroneously aligned to unique regions by the ELAND alignment program. The sum of sites in the genome that fit these Illumina data parameters were used as the “number of sites considered” in mutation rate calculations.
Mutations observed in MA12 and all three recovery lines (n=68) were considered to be mutations fixed along the bottlenecked MA12 lineage, and maintained in each recovery line. Mutations not observed in MA12 but identified in one or more recovery lines (n=28) were considered recovery line-specific mutations. Because the low coverage (about 7-fold) across nucleotide sites was insufficient to distinguish fixed mutational changes from heterozygous changes, we evaluated all 28 recovery-line mutations using conventional capillary sequencing technology and in all cases each mutation was confirmed to be in a fixed or nearly-fixed state in the recovery line.
The coding context of each Illumina mutation was determined with custom scripts that parsed the General Feature Format (gff) files for C.elegans build WS170. If a position was not found within the boundaries of a curated gene it was deemed intergenic. For positions within genes, the site was categorized by its position relative to untranslated regions (either 5’ or 3’), intron, or exon. For exon positions, the relative coding regions were translated using both the reference base and the mutant base and the type of resulting amino acid change was determined. These were categorized into synonymous and nonsynonymous groups.
We used the equation μa = ma/(n x T x w)] to estimate the per-bp adaptive mutation rate, then multiplied by the C. elegans genome size (108) for the per-genome adaptive mutation rate (Ua). μa indicates the per-bp adaptive rate, ma is the number of inferred adaptive mutations (6 mutations), n is the number of nucleotides surveyed in the three recovery lines (262,906,047 sites), T is time (in generations), and w is the estimated number of worms transferred each generation (1,000 nematodes).
Probability of fixation of a neutral mutation
Our experiment assumes that each experimental replicate is initiated from a single homozygous genotype. Thus any novel nucleotides that are observed in subsequent generations must have arisen via novel mutation. If that mutation is selectively neutral and not linked to any other segregating mutation under selection, then that mutation can only become fixed via genetic drift. The probability of eventual fixation of any neutral mutation is therefore a function of the rate of mutational input and the rate of genetic drift determined by the effective population size. This probability can be calculated in three ways: via a diffusion equation approach, via an exact solution using a Markov transition matrix, or via simulation. Because in this experiment the mutation rate is small, the effective population relatively large, and the total length of the experiment short, it is actually exceedingly hard to calculate the probability of fixation for a neutral mutation using any of these approaches. Here, we show how these approaches can be used to generate an upper bound on the fixation probability that shows it to be virtually impossible that the observed fixation events are driven solely by mutation and genetic drift.
Crow and Kimura (1970, pp. 394-295) provide a solution to the diffusion equation for the case of a recurrent neutral mutation. For the circumstances of our study, this solution takes the form of
,
where t is the number of generations, n is the mutation rate, Ne is the effective population size, c = 4 Nen, and G is the gamma function. The sum in this equation shows very poor convergence properties when nt is small, as it is in our study. The smallest value for n in which we can obtain adequate convergence over 60 generations with Ne = 1000 is n = 10-2, which yields f(60) = 9.4 x 10-37. This value is very sensitive to variation in the mutation rate, however, and is likely to be quite unreliable.
To achieve greater precision, we also used an exact Fisher-Wright Markov transition matrix approach for selfing populations as outlined by Schoen et al. (1998) and Estes et al. (2004), except in our case we are looking at the fixation probability of a single mutant rather than the flux of mutations across the entire genome. The needed transition matrix is of size (N+1)(N+2)/2 x (N+1)(N+2)/2, which grows very quickly with population size N. It is currently computationally impossible for solve this problem for N = 1,000. Instead, we calculated the fixation probability for a recurrent neutral mutation for a variety of population sizes ranging from 20 to 80 and then extrapolated to larger population sizes. Within this rage of population sizes, the probability of fixation declines at a nearly perfectly linear rate (Supplemental Fig. S3). Using the equation for this line to extrapolate to larger population sizes, we estimate that the probability of fixation for a population of size 1,000 is approximately 10-38. This is actually likely to be an upper bound (potentially substantially so), as the predicted values slightly overestimate the fixation probability even for the relatively small population sizes considered here.
It is infeasible to simulate a complex process that occurs at a rate on the order of 10-38 using existing (or currently conceivable) computing equipment. Nevertheless, we used the simulation approach outlined in the Methods to examine the neutral case and demonstrated that it must occur at a rate below 10-10.
Even if the errors in these three approaches are off by many orders of magnitude, the probability that a new neutral mutation would be fixed within the time course of this experiment is small enough to be considered an experimental impossibility.
Ne for experimental populations
The effective size of an evolving population (Ne) clearly has a major influence on the probability that a new mutation will achieve fixation (Supplementary Fig. S4). We estimated that the transfer (bottleneck) population size experienced by recovery lines was on the order of 1,000 individuals, and that the average census population size reached by recovery lines prior to each successive transfer was ~10,000 individuals. This variance in population size means that the true recovery line Ne probably fell closer to the smaller of these numbers. Because our recovery populations evolved in the absence or near absence of sexual recombination, Ne may have been even further reduced. Variance in reproductive success may have also affected Ne in our experimental lines. One characteristic of an ideal population is that individual contributions to the next generation follow a Poisson distribution where variance in offspring number equals the mean. In real populations, variance in reproductive success frequently exceeds the mean such that Ne N (Crow 1954). We explored this possibility in our lines by comparing the mean and variance of individual reproductive success during the fitness assays. Contrary to the above situation, we found that the distributions of offspring production were all less variable than expected under a Poisson process, indicating that Ne > N. This pattern held true for all population treatments (i.e., ancestral and evolved controls, MA line, and recovery lines at all generational time points assayed); however, the difference between the mean and the variance for the MA line was less than that for the other treatments. Specifically, mean offspring production exceeded the variance by 0.249 for the MA line and an average of 0.872 for the control and recovery treatments combined. Owing to the competing effects of variance in reproductive success, and variance in population size and a mating system predominated by selfing, we settled upon Ne = 1,000 as a conservative estimate for use in the diffusion approximations and most simulations.
Supplemental Figure Legends
Supplemental Fig. S1. Locations of two mutations shared by R12A and R12C in the C. elegans genome. GBrowse images were obtained from WormBase. (A) shows the parallel mutation on Chr. II; (B) shows the parallel mutation on Chr. IV.
Supplemental Fig. S2. Chromatogram images showing evidence for mutational fixation at the 28 recovery-line mutation sites. Each panel shows evidence for two specific sites, one on the left and the other on the right. Ancestral (MA12, N2) sequence data is shown on top, recovery line sequence data is shown on the bottom. The chromosome position mutation for each site is indicated at the top of each panel. Asterisks point out the mutation sites in the trace files.
Supplemental Fig. S3. Decrease in probability of fixation a recurring neutral mutation with increasing population size. Probability of fixation was calculated exactly using a Markov approach as described in the text (points). The best fit line was then used to extrapolate to larger population sizes (R2 = 99.93%, F1,11 = 15,982, P < 1018).
Supplemental Fig. S4. Influence of population size on the probability of fixation. The probability of fixation of a recurrent mutation involves a complex interaction between the effective population size, the strength of selection, and the mutation rate. Here we use the empirical estimate of the mutation rate, m = 2.6 x 10-9. For strong selection, the probability of fixation increases monotonically with increasing population size (top). As selection becomes weaker (middle, bottom), populations that are going to reach fixation quickly will do so largely under the influence of genetic drift, which is more effective within smaller populations. However, population sizes need to be several orders of magnitude smaller than those used here for this effect to be important, and even then the probability of fixation in 60 generations under these scenarios is too low to play a role in these experiments. Over time, larger populations are more likely to fix new mutations because selection is more effective in these populations. Therefore, (1) the interpretation of the results of this study is fairly robust to several fold variation in effective population size and (2) selection needs to be quite strong regardless of population size for rapid fixation to be a realistic outcome.
Supplemental Figures
Fig. S1
Fig. S2
Fig. S3
Fig. S4
Supplemental Tables
Table S1. MA12 mutationsChr / Chr Pos / N2 base / MA12 base / Coding / Gene
I / 689217 / C / A / UTR / Y18H1A.13
I / 2468841 / A / G / UTR / Y74C10AL.2
I / 3286334 / G / A / IG
I / 5140486 / A / G / IG
I / 11074324 / T / A / IG
I / 12641768 / C / T / IN / H16D19.4
II / 1190693 / C / T / IG
II / 1567617 / C / A / EX_AS / K05F6.8
II / 2502744 / C / T / IG
II / 2952324 / A / G / IN / T11F1.8
II / 3575290 / T / A / IN / W10G11.7
II / 3579016 / G / T / EX_Syn / W10G11.12
II / 3862748 / G / A / IG
II / 4142481 / G / A / IG
II / 5580193 / C / T / EX_RQ / T25E4.2
II / 7119993 / C / A / EX_QK / F10E7.4
II / 7869785 / A / G / IN / T09A5.12
II / 10733064 / C / T / IG
II / 11424722 / A / C / IG
II / 13602580 / T / G / UTR / Y48E1B.13
II / 13895145 / A / T / IG
II / 13925739 / T / A / IN / W02B8.6
III / 985011 / C / T / IN / R06B10.5
III / 3949120 / G / A / IG
III / 4531951 / C / T / EX_PS / F25F2.2
III / 6245585 / G / A / EX_PS / T12A2.7
III / 6791985 / T / A / IN / K07E12.1
III / 7216251 / G / A / IG
IV / 1164566 / A / T / IG
IV / 1590657 / T / A / UTR / Y41D4B.19
IV / 2224934 / C / T / IG
IV / 4843961 / T / A / IG
IV / 5440743 / T / A / IN / Y4C6A.2
IV / 5576281 / T / G / IG
IV / 5581671 / A / T / IG
IV / 7692157 / C / T / EX_HY / F33D4.2
IV / 8257909 / G / C / IG
IV / 9905842 / C / A / IG
IV / 11577276 / C / T / EX_Syn / F12F6.5
IV / 13127969 / C / T / IG
IV / 13838596 / G / A / IN / H08M01.2
IV / 14129173 / T / C / IG
IV / 15488726 / G / A / IG
IV / 17481326 / T / G / EX_KQ / 4R79.2
V / 807918 / T / G / IG
V / 986328 / A / T / IN / Y50D4C.1
V / 1159944 / T / A / IN / T21H3.5
V / 1362169 / T / C / IG
V / 1501288 / C / T / EX_Syn / C38C3.7
V / 3916167 / A / G / IG
V / 6218330 / G / T / IN / W06H8.7
V / 7060194 / C / A / IG
V / 7576334 / A / T / IN / F19F10.12
V / 9000830 / C / T / IG
V / 9264949 / A / T / EX_WR / ZK682.2
V / 9916603 / G / T / IG
V / 20848332 / C / A / IG
X / 1124986 / G / A / IG
X / 4603450 / C / A / EX_GV / T13H2.5
X / 7824345 / C / T / EX_Syn / R03G5.1
X / 9167048 / G / A / IG
X / 12251568 / T / A / IG
X / 12642670 / A / T / IG
X / 12964465 / C / A / IG
X / 13146771 / A / T / IG
X / 16239816 / T / G / IG
X / 16629455 / C / T / IG
X / 17519171 / C / A / UTR / R09G11.2
Chr indicates chromosome and Chr Pos indicates chromosome position (relative to WS170 build). N2 base indicates the ancestral base present in N2 and MA12 base indicates the base present in MA12. Coding shows the coding sequence category of the mutation: EX indicates exon, IN indicates intron, UTR indicates untranslated region, and IG indicates intergenic. For exon mutations, Syn indicates synonymous mutations; for nonsynonymous mutations the specific amino acid changes are denoted.