SUPPLEMENTAL MATERIAL

Comparison of beta diversity distance metrics

In the analyses presented in the main text, we use unweighted UniFrac and Principal Coordinates Analysis (PCoA) to identify relationships between the overall microbiota compositions of different samples based on Operational Taxonomic Unit (OTU) counts per sample. UniFrac evaluates the distance between two samples based on the degree to which the sequences are from unique versus shared phylogenetic lineages (Lozupone and Knight 2005). We chose unweighted UniFrac and PCoA because of their successful application in previous meta-analyses (Lozupone and Knight 2007; Ley et al. 2008; Lozupone et al. 2012). A further conceptual reason that unweighted UniFrac is a good choice is that it accounts for phylogenetic relationships between OTUs when comparing diversity (Lozupone and Knight 2005). Because we are comparing, in some instances, samples that are very divergent compositionally, accounting for phylogenetic relatedness can add power, since phylogenetically related taxa tend to have similar properties. However, we also wanted to verify that unweighted UniFrac performed well compared to other beta diversity measures, since performance evaluations of a variety of beta diversity measures using 454 pyrosequencer-generated 16S rRNA datasets and simulated data have identified other measures that perform well for detecting gradients (Chi-square and Pearson correlations) and clusters (Gower, Canberra and Jaccard) in microbial community datasets (Kuczynski et al. 2010). The QIIME database allows for the easy application of many different established diversity measures to any given table of OTU counts per sample.
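
As an illustration of the unique-versus-shared-lineage idea behind unweighted UniFrac, the following minimal sketch computes the fraction of branch length that is observed in only one of two samples. The four-OTU tree, its branch lengths, and the function names are hypothetical and chosen for readability; this is not the implementation used for the analyses reported here.

```python
# Hypothetical toy tree: each node maps to (parent, length of the branch above it).
TREE = {
    "A": ("root", 1.0),
    "B": ("root", 1.0),
    "otu1": ("A", 0.5),
    "otu2": ("A", 0.5),
    "otu3": ("B", 0.5),
    "otu4": ("B", 0.5),
}


def branches_covered(otus, tree):
    """Return the nodes whose parent branch lies between an observed OTU and the root."""
    covered = set()
    for otu in otus:
        node = otu
        # Walk towards the root; stop early if this lineage was already recorded.
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(sample_a, sample_b, tree):
    """Branch length unique to one sample divided by branch length seen in either."""
    a = branches_covered(sample_a, tree)
    b = branches_covered(sample_b, tree)
    total = sum(tree[n][1] for n in a | b)
    unique = sum(tree[n][1] for n in a ^ b)
    return unique / total if total else 0.0


if __name__ == "__main__":
    # No shared OTUs, but both OTUs sit in clade A, so the branch to A is shared.
    print(unweighted_unifrac({"otu1"}, {"otu2"}, TREE))
    # No shared OTUs and no shared lineage below the root.
    print(unweighted_unifrac({"otu1"}, {"otu3"}, TREE))
```

In this toy example, two samples that share no OTUs are still scored as partly similar when their OTUs fall in the same clade, which is the property that distinguishes UniFrac from measures based on OTU overlap alone.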

Because distinct compositions have been described for different human body habitats in many publications (Costello et al. 2009; Qin et al. 2010; Caporaso et al. 2011; Ravel et al. 2011; Human Microbiome Project Consortium 2012; Koren et al. 2013), we reasoned that one way to evaluate the performance of different beta diversity measures is to assess how well they distinguish between samples from the adult gut, vagina, skin, and oral cavity across many different studies. This analysis was similar to the one shown in Fig. 1, except that infants were excluded, since their gut microbiota is highly divergent from the adult gut microbiota (Palmer et al. 2007; Koenig et al. 2011; Yatsunenko et al. 2012). We also excluded the studies of the gut microbiota during pregnancy (Koren et al. 2012) and in Inflammatory Bowel Disease (Willing et al. 2010), since these factors appeared to cause a particularly large deviation from the healthy adult gut microbiota.
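
One simple way to quantify how well a measure separates body habitats is to compare the mean pairwise distance within a habitat to the mean pairwise distance between habitats. The sketch below shows that comparison under stated assumptions: the sample names, habitat labels, and distance values are hypothetical toy values, not data from this study, and the function name is ours.

```python
from itertools import combinations
from statistics import mean


def within_between_means(distances, site_of):
    """Return (mean within-site distance, mean between-site distance).

    distances: dict mapping frozenset({sample_1, sample_2}) to a distance.
    site_of:   dict mapping each sample name to its body-site label.
    """
    within, between = [], []
    for s1, s2 in combinations(sorted(site_of), 2):
        d = distances[frozenset((s1, s2))]
        if site_of[s1] == site_of[s2]:
            within.append(d)
        else:
            between.append(d)
    return mean(within), mean(between)


if __name__ == "__main__":
    # Hypothetical labels and distances, for illustration only.
    site_of = {"gut1": "gut", "gut2": "gut", "oral1": "oral", "oral2": "oral"}
    distances = {
        frozenset(("gut1", "gut2")): 0.4,
        frozenset(("oral1", "oral2")): 0.6,
        frozenset(("gut1", "oral1")): 0.9,
        frozenset(("gut1", "oral2")): 0.8,
        frozenset(("gut2", "oral1")): 1.0,
        frozenset(("gut2", "oral2")): 0.9,
    }
    w, b = within_between_means(distances, site_of)
    print(f"within={w:.2f} between={b:.2f}")
```

A measure that separates habitats well should give a between-site mean that is clearly larger than the within-site mean.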

We clustered the samples using Pearson, Jaccard, Canberra, and unweighted UniFrac dissimilarities. These measures were determined to perform well compared to other dissimilarity measures for detecting clusters (Canberra and Jaccard) and gradients (Pearson) in simulated and real 16S rRNA survey data of microbial communities (Kuczynski et al. 2010), and all produce a distance between 0 and 1, where 0 indicates identical samples and 1 indicates maximally different samples. Distances calculated between samples from different body sites (between) were higher than those calculated between samples from the same body site (within) for all four measures (Fig. S1b). Canberra dissimilarities performed especially poorly in this regard; PCoA clustering with this measure produced a different clustering pattern from the other three measures, with PC2 variation driven by differences between sets of vaginal samples. UniFrac distances between samples from different body sites were generally smaller than those calculated for Pearson, Jaccard and Canberra dissimilarities, which all approached the maximum distance value of 1, indicating almost no overlap in observed OTUs (Fig. S1b). This may be because UniFrac counts different OTUs that are phylogenetically related as similar, so that if related organisms have adapted to different body habitats, those habitats will appear more similar than they do under measures that treat every OTU as independent.
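
To make concrete why OTU-based measures approach 1 when two samples share almost no OTUs, the following sketch computes Jaccard and Canberra dissimilarities on toy count vectors. It rests on two assumptions that should be checked against the software actually used: Jaccard is taken as the presence/absence form, and Canberra is averaged over the OTUs observed in at least one of the two samples so that it falls between 0 and 1. The count vectors are hypothetical, and Pearson and UniFrac are left out of this sketch.

```python
def jaccard(x, y):
    """Presence/absence Jaccard: 1 - (shared OTUs / OTUs observed in either sample)."""
    a = {i for i, count in enumerate(x) if count > 0}
    b = {i for i, count in enumerate(y) if count > 0}
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def canberra(x, y):
    """Canberra dissimilarity, averaged over OTUs present in at least one sample.

    The averaging step is an assumption made here to keep the value in [0, 1].
    """
    terms = [abs(p - q) / (p + q) for p, q in zip(x, y) if p + q > 0]
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    # Hypothetical OTU counts (same OTU order in every vector).
    gut_1 = [10, 5, 0, 0]
    gut_2 = [8, 6, 1, 0]
    oral_1 = [0, 0, 8, 2]

    for name, fn in (("Jaccard", jaccard), ("Canberra", canberra)):
        print(name, "within:", round(fn(gut_1, gut_2), 3),
              "between:", round(fn(gut_1, oral_1), 3))
```

When two samples have no OTUs in common, every term of both measures is at its maximum, so the dissimilarity is exactly 1 regardless of how closely related the OTUs are.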

References

Caporaso JG, Lauber CL, Costello EK, Berg-Lyons D, Gonzalez A, Stombaugh J, Knights D, Gajer P, Ravel J, Fierer N et al. 2011. Moving pictures of the human microbiome. Genome Biol 12(5): R50.

Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. 2009. Bacterial community variation in human body habitats across space and time. Science 326(5960): 1694-1697.

Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486(7402): 207-214.

Koenig JE, Spor A, Scalfone N, Fricker AD, Stombaugh J, Knight R, Angenent LT, Ley RE. 2011. Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A 108 Suppl 1: 4578-4585.

Koren O, Goodrich JK, Cullender TC, Spor A, Laitinen K, Kling Backhed H, Gonzalez A, Werner JJ, Angenent LT, Knight R et al. 2012. Host remodeling of the gut microbiome and metabolic changes during pregnancy. Cell 150(3): 470-480.

Koren O, Knights D, Gonzalez A, Waldron L, Segata N, Knight R, Huttenhower C, Ley RE. 2013. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS Comput Biol 9(1): e1002863.

Kuczynski J, Liu Z, Lozupone C, McDonald D, Fierer N, Knight R. 2010. Microbial community resemblance methods differ in their ability to detect biologically relevant patterns. Nat Methods 7(10): 813-819.

Ley RE, Lozupone CA, Hamady M, Knight R, Gordon JI. 2008. Worlds within worlds: evolution of the vertebrate gut microbiota. Nat Rev Microbiol 6(10): 776-788.

Lozupone C, Faust K, Raes J, Faith JJ, Frank DN, Zaneveld J, Gordon JI, Knight R. 2012. Identifying genomic and metabolic features that can underlie early successional and opportunistic lifestyles of human gut symbionts. Genome Res 22(10): 1974-1984.

Lozupone C, Knight R. 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71(12): 8228-8235.

Lozupone CA, Knight R. 2007. Global patterns in bacterial diversity. Proc Natl Acad Sci U S A 104(27): 11436-11440.

Palmer C, Bik EM, DiGiulio DB, Relman DA, Brown PO. 2007. Development of the human infant intestinal microbiota. PLoS Biol 5(7): e177.

Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T et al. 2010. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464(7285): 59-65.

Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SS, McCulle SL, Karlebach S, Gorle R, Russell J, Tacket CO et al. 2011. Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci U S A 108 Suppl 1: 4680-4687.

Willing BP, Dicksved J, Halfvarson J, Andersson AF, Lucio M, Zheng Z, Jarnerot G, Tysk C, Jansson JK, Engstrand L. 2010. A pyrosequencing study in twins shows that gastrointestinal microbial profiles vary with inflammatory bowel disease phenotypes. Gastroenterology 139(6): 1844-1854 e1841.

Yatsunenko T, Rey FE, Manary MJ, Trehan I, Dominguez-Bello MG, Contreras M, Magris M, Hidalgo G, Baldassano RN, Anokhin AP et al. 2012. Human gut microbiome viewed across age and geography. Nature 486(7402): 222-227.

Figure S1: Principal Coordinates Analysis (PCoA) of samples from 10 studies (Table 1) of the human gut microbiota. PCoA was applied to a distance matrix created by calculating Jaccard, Pearson, Canberra, or unweighted UniFrac values for all pairs of samples. The most abundant bacterial families are superimposed on the same PCoA plots in the right panels in purple. The size of the sphere representing a taxon is proportional to the mean relative abundance of the taxon across all samples. Panel B shows the average pairwise values for comparisons of samples from the same body site (within) or between different body sites (between). Samples were classified broadly as from the gut (mostly feces but also colon, ileum and rectum), vagina, oral cavity (e.g. saliva, tongue, cheek), and skin and other (diverse skin sites, hair, nostril, and urine).

Fig. S1