Haplotype Inferences with Phase v.2

The three-marker haplotype frequency distributions for LG2 and LG12 showed a large number of unique haplotypes and very similar frequencies of shared haplotypes (Table S4-1). Two-marker haplotypes, used to estimate haplotypic LD, were obtained either by collapsing the three-marker haplotypes from either side (LG2 and LG12) or by direct inference (LG10). Shared haplotype frequencies increased in a similar way for Qp46-Qr87, Qr8-Qr112 and Qr112-Qr30, whereas Qp119-Qp46 departed from this pattern, with almost twice the number of unique haplotypes. The haplotype frequency distribution in LG10 (Qr11-Qr96) showed the lowest number of unique haplotypes, but rather closely resembled those from LG12 and from the second LG2 segment.
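The collapsing step can be illustrated with a minimal Python sketch. It is not the procedure run in Phase: the allele sizes are invented, the function names `collapse` and `summarize` are hypothetical, and "unique" is assumed here to mean a haplotype observed exactly once in the sample.

```python
from collections import Counter

def collapse(haplotypes, keep):
    """Reduce each multi-marker haplotype to the marker positions in `keep`."""
    return [tuple(h[i] for i in keep) for h in haplotypes]

def summarize(haplotypes):
    """Return (N, K, n_unique, counts) for a list of haplotypes.

    N is the total number of haplotypes, K the number of different
    haplotypes, and n_unique the number observed exactly once
    (an assumed reading of "unique").
    """
    counts = Counter(haplotypes)
    n_total = len(haplotypes)
    k_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_total, k_distinct, n_unique, counts

# Invented three-marker haplotypes (allele sizes in bp), for illustration only.
tri = [
    (120, 200, 310),
    (120, 200, 312),
    (122, 200, 310),
    (120, 202, 310),
    (120, 200, 310),
]

# Collapse from either side: drop the last marker, then drop the first marker.
left = collapse(tri, keep=(0, 1))
right = collapse(tri, keep=(1, 2))

for label, haps in (("three-marker", tri), ("left pair", left), ("right pair", right)):
    n_total, k_distinct, n_unique, _ = summarize(haps)
    print(f"{label}: N={n_total}, K={k_distinct}, unique={n_unique}")
```

In this toy sample, collapsing merges haplotypes that differed only at the dropped marker, so K and the number of unique haplotypes fall while N stays the same.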

All segments showed rather high haplotype probabilities (Figure S4-1), which are a function of the homozygosity of the data and of the coalescent-based approach implemented in Phase (Stephens et al., 2001). Shared haplotypes had slightly higher probabilities than unique haplotypes (Figure S4-1, left panel), as expected under the coalescent-based approach. On the other hand, the small probability differences between the two species (Figure S4-1, right panel) could be explained by the single-marker heterozygosities. In any case, median haplotype probabilities were very high in all three LGs (LG2: 0.78; LG10: 0.96; LG12: 0.76). Note that collapsing the three-marker haplotypes into two-marker haplotypes necessarily increases the corresponding probabilities.
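The effect of collapsing on the probabilities can be sketched as follows, under the assumption that the probability of a two-marker phase is the sum of the probabilities of all three-marker phases that reduce to it. The genotype, the probabilities and the function name `collapse_phase_probs` are hypothetical and do not reproduce Phase output.

```python
from collections import defaultdict

def collapse_phase_probs(phase_probs, keep):
    """Sum the probabilities of phase configurations that become
    identical once only the marker positions in `keep` are retained."""
    collapsed = defaultdict(float)
    for (hap_a, hap_b), prob in phase_probs.items():
        sub_a = tuple(hap_a[i] for i in keep)
        sub_b = tuple(hap_b[i] for i in keep)
        # A phase configuration is an unordered pair of haplotypes.
        key = tuple(sorted((sub_a, sub_b)))
        collapsed[key] += prob
    return dict(collapsed)

# Invented example: one individual heterozygous at all three markers
# (A/a, B/b, C/c) has four possible phase configurations.
phase_probs = {
    (("A", "B", "C"), ("a", "b", "c")): 0.60,
    (("A", "B", "c"), ("a", "b", "C")): 0.25,
    (("A", "b", "C"), ("a", "B", "c")): 0.10,
    (("A", "b", "c"), ("a", "B", "C")): 0.05,
}

# Keep the first two markers only.
two_marker = collapse_phase_probs(phase_probs, keep=(0, 1))

best_before = max(phase_probs.values())
best_after = max(two_marker.values())
print(f"best three-marker probability: {best_before:.2f}")
print(f"best two-marker probability:   {best_after:.2f}")
```

Because each two-marker probability is a sum that includes the corresponding three-marker probability, it cannot be lower than that value.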

Haplotype Inferences with the ELB Algorithm

(using Arlequin v.3.5.1)

The ELB algorithm (Excoffier et al., 2003) is a Bayesian method for reconstructing the (unknown) gametic phase of multilocus genotypic data. A detailed description of the method is available in the Arlequin manual.

The comparison with the haplotype reconstructions made with Phase (Table S4-2) showed that the ELB algorithm identified a slightly larger number of unique haplotypes than Phase, which might be a consequence of the coalescent-based approach used by Phase. However, both methods identified the same frequent haplotypes. The inferred haplotype probabilities were lower with the ELB algorithm than with Phase (Figures S4-1 and S4-2), again reflecting the coalescent-based approach of Phase. Finally, both methods identified few haplotypes shared between the two oak species (Figure S4-3 and Figure 3). In both cases, LG10 was an exception, as its most frequent haplotype was common to both species.
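As an illustration of how haplotypes shared between species can be tallied from any reconstruction, the following minimal Python sketch may help. The haplotypes, the species labels and the function name `shared_haplotypes` are hypothetical and are not taken from the data set.

```python
from collections import Counter

def shared_haplotypes(haps_species_1, haps_species_2):
    """Return the haplotypes present in both species, with their
    within-species frequencies as (freq_species_1, freq_species_2)."""
    counts_1 = Counter(haps_species_1)
    counts_2 = Counter(haps_species_2)
    shared = set(counts_1) & set(counts_2)
    return {
        h: (counts_1[h] / len(haps_species_1), counts_2[h] / len(haps_species_2))
        for h in shared
    }

# Invented two-marker haplotypes (allele sizes in bp) for two species.
species_1 = [(150, 98), (150, 98), (150, 98), (152, 98)]
species_2 = [(150, 98), (150, 98), (154, 100), (156, 100)]

shared = shared_haplotypes(species_1, species_2)
for hap, (f1, f2) in shared.items():
    print(f"{hap}: species 1 = {f1:.2f}, species 2 = {f2:.2f}")
```

Running the same tally on the output of each reconstruction method gives a direct comparison of how many haplotypes each method places in both species.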

In spite of the differences between the two haplotype reconstruction methods, the final outcomes regarding haplotypic LD were very much alike (Table 3 and Table S5-3).

Methods S4

Methods for haplotype reconstructions using Phase have been described in the main text and in Supplemental File 3.

Haplotype reconstructions from multilocus genotype data, using the ELB algorithm, were carried out with Arlequin v.3.5.1 (Excoffier and Lischer, 2010). The Dirichlet prior (alpha value), the weight given to haplotypes differing by a single mutation from haplotypes already present (epsilon value) and the parameter preventing adaptive windows from growing too large (gamma value) were those recommended for SSRs (0.01, 0.1 and 0, respectively). The heterozygote site influence zone was allowed to include all markers from each linkage group (3 markers for LG2, LG9 and LG12, and 2 markers for LG10). The number of burn-in steps in the Gibbs sampler was set to the maximum allowed value (9,999,999). We then obtained 20,000 gametic phases, sampled every 1,000 iterations. For each LG, the gametic phases with the highest posterior probabilities were selected to represent the true haplotypes.
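The sampling schedule and the final selection step can be sketched in Python as follows. This is not Arlequin's implementation: it assumes that sampling begins after the burn-in with one gametic phase recorded per sampling interval, and that the posterior probability of a phase is estimated by its frequency among the sampled phases. The function names and the toy samples are hypothetical.

```python
from collections import Counter

# Settings reported in the text.
BURN_IN = 9_999_999
N_SAMPLES = 20_000
SAMPLING_INTERVAL = 1_000

def total_iterations(burn_in, n_samples, interval):
    """Chain length implied by the settings, assuming one sample is
    recorded every `interval` iterations once the burn-in has ended."""
    return burn_in + n_samples * interval

def most_probable_phase(sampled_phases):
    """Return the most frequently sampled gametic phase and its
    estimated posterior probability (its sample frequency)."""
    counts = Counter(sampled_phases)
    phase, count = counts.most_common(1)[0]
    return phase, count / len(sampled_phases)

print(total_iterations(BURN_IN, N_SAMPLES, SAMPLING_INTERVAL))

# Invented samples for one individual: each gametic phase is a pair of
# two-marker haplotypes (allele sizes in bp).
phase_x = ((150, 98), (152, 100))
phase_y = ((150, 100), (152, 98))
samples = [phase_x] * 7 + [phase_y] * 3

best_phase, posterior = most_probable_phase(samples)
print(best_phase, posterior)
```

Under these assumptions, the reported settings imply a chain of 29,999,999 iterations per run.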

Figures S4-2 and S4-3 of this supplemental file were prepared with the package “Rcmdr” (Fox et al., 2010) in the R environment (R Development Core Team, 2010).

References S4

Excoffier L, Laval G, Balding D (2003) Gametic phase estimation over large genomic regions using an adaptive window approach. Human Genomics, 1 (1), 7-19.

Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: A new series of programs to perform population genetic analyses under Linux and Windows. Molecular Ecology Resources (in press).

Fox J, with contributions from Liviu Andronic, Michael Ash, Theophilius Boye, Stefano Calza, Andy Chang, Philippe Grosjean, Richard Heiberger, G. Jay Kerns, Renaud Lancelot, Matthieu Lesnoff, Uwe Ligges, Samir Messad, Martin Maechler, Robert Muenchen, Duncan Murdoch, Erich Neuwirth, Dan Putler, Brian Ripley, Miroslav Ristic and Peter Wolf (2010) Rcmdr: R Commander. R package version 1.5-5.

R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL

Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68, 978-989.

Table S4-1: Frequency distributions of the haplotypes inferred by Phase. Two-marker haplotypes (Bi-H1 and Bi-H2) were obtained by collapsing the respective three-marker haplotypes (Tri-H) from both sides. K is the number of different haplotypes and N is the total number of haplotypes.

Table S4-2: Comparison between the haplotype frequencies estimated by Phase and by the ELB algorithm. LG2, LG9 and LG12 show the three-marker haplotype frequencies, while LG10 shows the two-marker haplotype frequencies.

Figure S4-1: Probabilities assigned by Phase to the inferred haplotypes. The distributions compare the probabilities for All vs. Shared haplotypes (left panel) and for the haplotypes that belong to each of the two species (right panel).

Figure S4-2: Frequency distributions of the haplotype probabilities for the best reconstructions with the ELB algorithm.


Figure S4-3: Within-species frequency distributions for the best haplotype reconstructions with the ELB algorithm.
