Supplemental Information:

Host Genetic and Environmental Effects on Mouse Intestinal Microbiota

James H. Campbell1, Carmen M. Foster1, Tatiana Vishnivetskaya1,2, Alisha G. Campbell1,3, Zamin K. Yang1, Ann Wymore1, Anthony V. Palumbo1, Elissa J. Chesler3,4 and Mircea Podar1,3*

1Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA 37831

2Center for Environmental Biotechnology, University of Tennessee, Knoxville, TN, USA 37996

3Genome Science and Technology Program, University of Tennessee, Knoxville, TN, USA 37996

4The Jackson Laboratory, Bar Harbor, ME, USA 04609

Subject Category: Microbe-microbe and microbe-host interactions

Running Title: Genetic effects on mouse gut microbiota

*Corresponding Author

Mircea Podar

Oak Ridge National Laboratory

Biosciences Division

Oak Ridge, TN 37831

Phone: (865) 576-6144

Fax: (865) 576-8646

Email:

Supplemental Methods

Mice. Mice were fed an irradiated rodent diet (Purina 5053 or 5058) and received 100% fresh air processed into the facility through 95% efficient (hospital-grade) filters. Air introduced into the primary enclosure came from the room, was passed first through a roughing filter and then through a High Efficiency Particulate Air (HEPA) filter, and was exhausted via the facility exhaust system. In the primary barrier, all mice were housed in ventilated racks manufactured by Thoren Caging or Animal Care Systems (ACS). Thoren cages were under positive pressure created by a supply air blower motor (50 air exchanges per hour). ACS cages were under negative pressure created by the facility exhaust system (20-30 changes per hour). ACS cages were exhausted through a HEPA-like filter in front of the cage and had a solid lid designed such that unfiltered room air could not enter the cage. All cages were opened under a NuAire or Baker laminar flow workstation.

Mice were euthanized by cervical dislocation at the same time each day, and a 12-cm-long segment of jejunum starting 5 cm distal to the ligament of Treitz was dissected. The cecum was included in the dissection. Jejunum sections were flushed with ice-cold PBS, placed in RNAlater and stored for other analyses. Cecum contents were extruded manually, snap frozen in liquid nitrogen and stored at -80°C. Cecum tissue was flushed with ice-cold saline and either snap frozen or stored in RNAlater (Ambion) for gene expression analyses (to be reported elsewhere). All procedures were approved by the ORNL and University of Tennessee Animal Care and Use Committees.

Extraction of Microbial Genomic DNA. Microbial genomic DNA (gDNA) was extracted from mouse cecum contents using a protocol based on that of Ley et al (2008). Approximately 100 mg of cecum contents was added to a 2-ml, screw-capped tube containing 1 g of silica/zirconia beads (0.1 mm; BioSpec Products; Bartlesville, OK), 500 µl of phenol:chloroform:isoamyl alcohol (25:24:1) and 210 µl of 20% SDS. Headspace was filled with cold DNA extraction buffer (200 mM Tris at pH 8, 200 mM NaCl, 20 mM EDTA). Bead tubes were attached to a MoBio vortex adapter and shaken horizontally at high speed for 10 min. The aqueous phase was washed three times with phenol:chloroform:isoamyl alcohol (25:24:1) in phase lock gel tubes (Qiagen; Valencia, CA). Nucleic acids were precipitated by adding 1 vol ammonium acetate (7.5 M) and 2 vol isopropanol and incubating at -20°C for at least 1 hr. Precipitated nucleic acids were concentrated by centrifugation at 15,000 × g for 15 min and then dissolved in TE buffer. RNase A digestion (100 U) was performed for 30 min at 37°C. Genomic DNA was precipitated by adding 0.1 vol sodium acetate (3 M, pH 5.5) and 3 vol ethanol and incubating at -20°C for at least 1 hr. Again, DNA was concentrated by centrifugation at 15,000 × g for 15 min, and pellets were washed twice with 70% ethanol, air dried and dissolved in PCR-grade water. Mock extractions without cecum contents were used as negative controls.

Preparation and Pyrosequencing of SSU rRNA Gene Amplicon Libraries. Amplicon libraries of both the V1-2 and V4 regions of the 16S SSU rRNA gene were obtained using similar methods. Amplification of the V1-2 region was performed in 50-µl reactions composed of 1× polymerase buffer (Invitrogen; Carlsbad, CA), 200 µM each dNTP, 3 mM MgSO4, 300 nM forward primer (MWG Operon; Huntsville, AL), 300 nM reverse primer mix (MWG Operon), 1 U of Platinum® Taq DNA Polymerase High Fidelity enzyme (Invitrogen) and 100 ng of gDNA. We used a modification of the 27F primer (Frank et al 2008) fused to 6-nucleotide multiplexing tags and to the 454 FLX sequencing primer A (5’-GCCTCCCTCGCGCCATCAGxxxxxxGTTTGATCMTGGCTCAG-3’), where the x region represents the multiplexing tag and the SSU rRNA primer is the sequence that follows the tag, and a single reverse primer (5’-GCCTTGCCAGCCCGCTCAGCTGCTGCCTYCCGTA-3’) modified from 342R (Weisburg et al 1991). Each amplification began with a denaturation step of 94°C for 2 min followed by 25 amplification cycles of 94°C for 20 s, 53°C for 30 s and 68°C for 45 s. A final extension at 68°C for 3 min followed the amplification cycles. V4 amplicons were generated in 50-µl reactions composed of 1× AccuPrime Pfx reaction mix (Invitrogen), 300 nM forward primer (Integrated DNA Technologies [IDT]; Coralville, IA), 300 nM reverse primer mix (IDT), 1.5 U AccuPrime Pfx polymerase (Invitrogen) and 100 ng gDNA. We used barcoded forward primers (5’-GCCTCCCTCGCGCCATCAGxxxxxxAYTGGGYDTAAAGNG-3’) and a mix of reverse primers (the FLX B adaptor sequence 5’-GCCTTGCCAGCCCGCTCAG fused to the rRNA gene sequences TACCRGGGTHTCTAATCC, TACCAGAGTATCTAATTC, CTACDSRGGTMTCTAATC or TACNVGGGTATCTAATCC-3’ in a 6:1:2:12 ratio, respectively), designed to cover most of the Bacteria domain (Cole et al 2009). Thermal profiles consisted of a denaturation at 95°C for 2 min followed by 27 amplification cycles of 95°C for 15 s, 55°C for 30 s and 68°C for 45 s. A final extension at 68°C for 3 min followed the amplification cycles.
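The layout of the fusion primers described above (adaptor, 6-nucleotide tag, gene-specific primer) and the degeneracy of the gene-specific portions can be illustrated with a minimal Python sketch. The function names and the example tag are hypothetical and were not part of the study; only the adaptor and primer sequences are taken from the text.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

ADAPTER_A = "GCCTCCCTCGCGCCATCAG"  # 454 FLX sequencing primer A (from the text)
V12_FORWARD = "GTTTGATCMTGGCTCAG"  # modified 27F primer (from the text)
V4_FORWARD = "AYTGGGYDTAAAGNG"     # V4 forward primer (from the text)


def fusion_primer(adapter: str, tag: str, primer: str) -> str:
    """Concatenate adaptor, 6-nt multiplexing tag and gene-specific primer."""
    if len(tag) != 6 or set(tag) - set("ACGT"):
        raise ValueError("tag must be six unambiguous nucleotides")
    return adapter + tag + primer


def expand(primer: str) -> list[str]:
    """List every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[code] for code in primer))]


example_tag = "ACGTAC"  # hypothetical tag, NOT one of the tags used in the study
print(fusion_primer(ADAPTER_A, example_tag, V12_FORWARD))
print(len(expand(V12_FORWARD)))  # one M position: 2 variants
print(len(expand(V4_FORWARD)))   # Y, Y, D, N positions: 2 * 2 * 3 * 4 = 48 variants
```

Counting variants in this way gives a rough sense of how much sequence diversity each degenerate primer is designed to match.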

All amplicons were visualized on agarose gels for quality and subsequently purified from amplification reactions using Agencourt AMPure reagents (Beckman Coulter; Danvers, MA). A final check of amplicon quality and quantity was performed on an Agilent Bioanalyzer (Santa Clara, CA) using DNA 1000 reagents. Sequencing was performed on a 454-FLX instrument (Roche; Indianapolis, IN) following the manufacturer’s recommendations.

Sequences were extracted from raw FASTA files using the RDP’s Pipeline Initial Process. V4 amplicons were quality controlled by requiring matches to both forward and reverse primers (up to two mismatches each), a minimum sequence length of 200 nt and no ambiguous base calls. V1-2 amplicons were generally too long to sequence completely using FLX chemistry; therefore, only the forward primer was used for data processing, with a minimum sequence length of 200 nt and no mismatches allowed. Sequence yield and cocaging specifications are summarized in Table S1.
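The filtering criteria above can be sketched in Python as follows. This is not the RDP pipeline's code; it is a simplified illustration that assumes the multiplexing tag and adaptor have already been removed, that reads are uppercase, and that the 200-nt minimum is measured on the read including primers. All function names are hypothetical.

```python
# IUPAC codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

V12_FORWARD = "GTTTGATCMTGGCTCAG"
V4_FORWARD = "AYTGGGYDTAAAGNG"
V4_REVERSE = ("TACCRGGGTHTCTAATCC", "TACCAGAGTATCTAATTC",
              "CTACDSRGGTMTCTAATC", "TACNVGGGTATCTAATCC")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a sequence, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_mismatches(primer: str, segment: str) -> int:
    """Count positions where the read base is not allowed by the primer code."""
    if len(segment) < len(primer):
        return len(primer)  # too short to contain the primer at all
    return sum(base not in IUPAC[code] for code, base in zip(primer, segment))


def passes_qc(read, forward, reverse=(), max_mismatch=2, min_len=200):
    """Apply length, ambiguity and primer-match criteria to one read."""
    if len(read) < min_len or set(read) - set("ACGT"):
        return False
    if primer_mismatches(forward, read[:len(forward)]) > max_mismatch:
        return False
    if reverse:
        # The read should end with the reverse complement of a reverse primer.
        ends = [reverse_complement(r) for r in reverse]
        if not any(primer_mismatches(e, read[-len(e):]) <= max_mismatch for e in ends):
            return False
    return True


# Synthetic read for illustration only (not real data).
read = "ATTGGGTATAAAGAG" + "A" * 190 + reverse_complement(V4_REVERSE[1])
print(passes_qc(read, V4_FORWARD, V4_REVERSE))          # V4 criteria
print(passes_qc(read, V12_FORWARD, max_mismatch=0))     # V1-2 criteria
```

Passing no reverse primers and `max_mismatch=0` mirrors the forward-primer-only criteria stated for the V1-2 reads.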

OTU-Based Sequence Analysis. Initially, mothur (v1.11.0) was used to further screen sequences for each SSU rRNA gene region. Sequences with homopolymers longer than 8 nt were removed. Remaining sequences were aligned to the SILVA database using a Needleman-Wunsch method, and those mapping to incorrect regions of the alignment were also removed from the dataset. Then, the mothur implementation of ChimeraSlayer (Haas et al 2011) was used to detect potentially chimeric sequences. These steps resulted in quality-controlled sequence sets containing unequal numbers of sequences for individual mice. To control for unequal sequence coverage among individuals, sequences were separated into individual samples and equally subsampled to the minimum sequence number using the Perl script daisychopper.pl (

Amplicon libraries from both hypervariable regions of the SSU rRNA gene were subjected to stringent quality control procedures that reduced the number of sequences analyzed (Table S1). V1-2 region sequence numbers were reduced by 15% during this screening (from 345,742 to 293,928), with the majority (87.0%) of the purged sequences identified as chimeric. V4 region sequence numbers were reduced by 26% during screening (from 819,554 to 605,397), with 99.7% of the eliminated sequences identified as chimeric. Mean sequences per mouse were 4982 for V1-2 and 6640 for V4 libraries. Equal subsampling for UniFrac analyses was based upon the minimum number of sequences observed for any mouse in each library, thus reducing V1-2 libraries to 1557 sequences per mouse and V4 libraries to 3128 sequences per mouse.
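The homopolymer screen and the equal subsampling described above can be approximated by the short Python sketch below. It is a stand-in written for illustration, not the mothur or daisychopper.pl code; the function names and the fixed random seed are assumptions.

```python
import random
import re

# A base followed by eight or more copies of itself, i.e. a run of 9+ nt.
HOMOPOLYMER = re.compile(r"(.)\1{8,}")


def has_long_homopolymer(seq: str) -> bool:
    """True when the read contains a homopolymer longer than 8 nt."""
    return HOMOPOLYMER.search(seq) is not None


def subsample_equal(samples: dict[str, list[str]], seed: int = 0) -> dict[str, list[str]]:
    """Draw, without replacement, the same number of reads from every sample.

    The depth is the read count of the smallest sample.
    """
    depth = min(len(reads) for reads in samples.values())
    rng = random.Random(seed)
    return {name: rng.sample(reads, depth) for name, reads in samples.items()}


def percent_removed(before: int, after: int) -> float:
    """Percentage of sequences lost between two processing stages."""
    return 100.0 * (before - after) / before


# Totals reported in the text; these come to about 15% and 26%.
print(round(percent_removed(345_742, 293_928)))
print(round(percent_removed(819_554, 605_397)))
```

Sampling without replacement to the smallest library size keeps every mouse in the analysis at the cost of discarding reads from deeper libraries.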

Identification of OTUs was performed in mothur (Schloss et al 2009) for each SSU rRNA gene region following the general approach of Huse et al (2010). Remaining sequences were pre-clustered in mothur using “diffs=1”, and a distance matrix was calculated for the pre-clusters. Data were then clustered using an average-neighbor method.
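To make the average-neighbor step concrete, the following Python sketch clusters a handful of aligned sequences at a distance cutoff. It is a toy illustration and not mothur's implementation: it omits the pre-clustering step, uses an uncorrected mismatch fraction as the distance (an assumption), and all names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched columns between two equal-length aligned reads."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_neighbor(seqs: dict[str, str], cutoff: float) -> list[set[str]]:
    """Merge clusters while the smallest mean pairwise distance is <= cutoff."""
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    clusters = [{name} for name in seqs]

    def mean_dist(c1: set[str], c2: set[str]) -> float:
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (i, j), d = min(
            (((i, j), mean_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


base = "ACGT" * 25  # synthetic 100-nt sequence, for illustration only
toy = {
    "s1": base,
    "s2": "T" + base[1:],          # 1 difference from s1
    "s3": "TT" + base[2:],         # 2 differences from s1
    "s4": "G" * 50 + base[50:],    # many differences from the others
}
print(average_neighbor(toy, cutoff=0.03))
```

With this toy input, s1, s2 and s3 fall into one OTU at the 0.03 cutoff and s4 remains on its own.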

Phylogeny-Based Sequence Analysis. Representatives of each OTU were collected for both V1-2 and V4 regions at genetic distances of 0.03 and 0.05 using mothur and aligned using the RDP aligner. Phylogenetic trees were constructed both by neighbor-joining (Jukes-Cantor distances) in Geneious v5.4 and by maximum likelihood in RAxML-7.04 as described by Flores et al (2011). These trees and their originating sequences, as well as a general SSU rRNA bacterial reference tree (greengenes.lbl.gov), were used for mapping the entire V4 and V1-2 sequence datasets, or equally subsampled versions of them, for unweighted Fast UniFrac analysis (Hamady et al 2009). Comparisons between the different types of trees and datasets at different genetic distances were made to evaluate the level of explained variation in the Principal Coordinates Analysis (PCoA) and the intrastrain and interstrain differences. Final plots for all analyses were produced with Matlab, and trees were visualized and annotated using FigTree (v1.3.1). Data were also analyzed with respect to taxonomic affiliation of the SSU rRNA gene fragments using the RDP Classifier set at an 80% confidence threshold.
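For readers unfamiliar with the unweighted UniFrac metric used above, a minimal Python sketch on a toy tree follows. It is not the Fast UniFrac implementation; the tree encoding (each branch stored with its length and the set of OTUs beneath it), the OTU names and the sample contents are all hypothetical.

```python
def unweighted_unifrac(branches, otus_a, otus_b):
    """Fraction of observed branch length that leads to only one sample.

    branches: iterable of (length, set of OTU names descending from the branch)
    otus_a, otus_b: sets of OTU names detected in each sample
    """
    unique = shared = 0.0
    for length, leaves in branches:
        in_a = bool(leaves & otus_a)
        in_b = bool(leaves & otus_b)
        if in_a and in_b:
            shared += length
        elif in_a or in_b:
            unique += length
    total = unique + shared
    return unique / total if total else 0.0


# Toy tree ((otu1:1, otu2:1):1, (otu3:1, otu4:1):1), one entry per branch.
TOY_TREE = [
    (1.0, {"otu1"}),
    (1.0, {"otu2"}),
    (1.0, {"otu1", "otu2"}),
    (1.0, {"otu3"}),
    (1.0, {"otu4"}),
    (1.0, {"otu3", "otu4"}),
]

mouse_a = {"otu1", "otu2"}  # hypothetical presence/absence data
mouse_b = {"otu2", "otu3"}
print(unweighted_unifrac(TOY_TREE, mouse_a, mouse_b))
```

Here three branch-length units are unique to one mouse and two are shared, so the distance is 3/5. Because only presence or absence is used, the metric is sensitive to sampling depth, which is one reason to compare equally subsampled datasets.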

SUPPLEMENTAL LITERATURE CITED

Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ et al (2009). The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141-145.

Flores GE, Campbell JH, Kirshtein JD, Meneghin J, Podar M, Steinberg JI et al (2011). Microbial community structure of hydrothermal deposits from geochemically different vent fields along the Mid-Atlantic Ridge. Environmental Microbiology 13: 2158-2171.

Frank JA, Reich CI, Sharma S, Weisbaum JS, Wilson BA, Olsen GJ (2008). Critical Evaluation of Two Primers Commonly Used for Amplification of Bacterial 16S rRNA Genes. Appl Environ Microbiol 74: 2461-2470.

Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G et al (2011). Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research 21: 494-504.

Hamady M, Lozupone C, Knight R (2009). Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data. ISME J.

Huse SM, Welch DM, Morrison HG, Sogin ML (2010). Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental Microbiology 12: 1889-1898.

Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS et al (2008). Evolution of mammals and their gut microbes. Science 320: 1647-1651.

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB et al (2009). Introducing mothur: Open Source, Platform-independent, Community-supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol.

Weisburg WG, Barns SM, Pelletier DA, Lane DJ (1991). 16S ribosomal DNA amplification for phylogenetic study. J Bacteriol 173: 697-703.

TITLES AND LEGENDS OF SUPPLEMENTARY FIGURES

Figure S1. A. Diagram of the experimental design for mouse housing, sibling information and identity of each mouse used in the study, with strain and sex information. B. Overall workflow for the cecum microbiota characterization.

Figure S2. Taxonomic diversity detected in amplicon libraries of both primer pairs. Each bar represents the mean percentage (± SEM) of each phylum across all mice surveyed. Phyla represented by fewer than 100 sequences were omitted from this graph. The y-axis is log-scaled to better depict low-abundance phyla.

Figure S3. PCoA representation of UniFrac distances (0.05 genetic distance) from the V1-2 hypervariable region of SSU rRNA gene. Samples were analyzed with all sequences and subsampled randomly for equal coverage across all mice (n = 59).

Figure S4. PCoA representation of UniFrac distances (0.05 genetic distance) from the V4 hypervariable region of SSU rRNA gene. Samples were analyzed with all sequences and subsampled randomly for equal coverage across all mice (n = 94).

Figure S5. Hierarchical clustering (UPGMA) representation of OTU-based clustering (0.03 genetic distance) of data from the V1-2 hypervariable region of SSU rRNA gene. Counts of each OTU within each mouse (n = 59) were standardized to percentage, square-root transformed and a Bray-Curtis similarity matrix was calculated.

Figure S6. Hierarchical clustering (UPGMA) representation of OTU-based clustering (0.03 genetic distance) of data from the V4 hypervariable region of SSU rRNA gene. Counts of each OTU within each mouse (n = 94) were standardized to percentage, square-root transformed and a Bray-Curtis similarity matrix was calculated.

Figure S7. Jackknifed hierarchical clustering representation of UniFrac distances (0.05 genetic distance) of data from the V1-2 hypervariable region of SSU rRNA gene. Sequences were subsampled randomly for equal coverage across all mice (n = 59).

Figure S8. Jackknifed hierarchical clustering representation of UniFrac distances (0.05 genetic distance) of data from the V4 hypervariable region of SSU rRNA gene. Sequences were subsampled randomly for equal coverage across all mice (n = 94).

Figure S9. Box-and-whisker plot showing the effects of maternal lineage on gut bacterial communities within mouse strains. Distributions were formed by parsing strainwise data from the larger Bray-Curtis dissimilarity matrix (V4 only) of mouse-by-mouse comparisons. To illustrate the effects of maternal lineage, intrastrain dissimilarities were separated into two groups: 1) pairwise distances of siblings and 2) pairwise distances of all non-siblings. Distributions of non-siblings were plotted and distances of siblings were superimposed onto these distributions (*). Each maternal lineage is represented by a different color within each strain. Outliers are denoted by red plus characters (+).

Figure S10. Box-and-whisker plot showing the effects of cohabitation on gut bacterial communities within mouse strains. Distributions were formed by parsing strainwise data from the larger Bray-Curtis dissimilarity matrix (V4 only) of mouse-by-mouse comparisons. To illustrate this effect, intrastrain dissimilarities (Bray-Curtis) were separated into two groups: 1) pairwise distances of co-caged mice and 2) pairwise distances from all mice not co-caged. Distributions were plotted only for mice that were not co-caged, and distances of co-caged mice were superimposed onto these distributions (*). Each group of co-caged mice is represented by a different color within each strain. Outliers are denoted by red plus characters (+).
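The data treatment named in the legends for Figures S5, S6, S9 and S10 (standardization to percentage, square-root transformation, Bray-Curtis comparison, and separation of pairwise values by a shared grouping such as litter or cage) can be sketched in Python as below. This is an illustrative reconstruction under stated assumptions, not the software used for the figures; all names, counts and group labels are hypothetical.

```python
from itertools import combinations
from math import sqrt


def standardize(counts):
    """Convert OTU counts to percentages, then square-root transform them."""
    total = sum(counts)
    return [sqrt(100.0 * c / total) for c in counts]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two transformed abundance profiles."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def split_pairs(profiles, group):
    """Separate pairwise dissimilarities by whether two mice share a group label."""
    same, different = [], []
    for m1, m2 in combinations(profiles, 2):
        d = bray_curtis(profiles[m1], profiles[m2])
        (same if group[m1] == group[m2] else different).append(d)
    return same, different


# Hypothetical OTU counts for three mice of one strain (not real data).
counts = {"m1": [50, 30, 20], "m2": [45, 35, 20], "m3": [10, 60, 30]}
profiles = {mouse: standardize(c) for mouse, c in counts.items()}
cage = {"m1": "cageA", "m2": "cageA", "m3": "cageB"}  # hypothetical cage labels

co_caged, not_co_caged = split_pairs(profiles, cage)
print(co_caged, not_co_caged)
```

A similarity value, as used for the clustering in Figures S5 and S6, can be obtained as one minus the dissimilarity.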
