Supporting Information

Genome analysis of ‘Candidatus Ancillula trichonymphae’, first representative of a deep-branching clade of Bifidobacteriales, strengthens evidence for convergent evolution in flagellate endosymbionts

Jürgen F. H. Strassert1†, Aram Mikaelyan1, Tanja Woyke2, and Andreas Brune1*

Table of contents

Detailed experimental procedures

Supplementary Tables

Supplementary Figures

References

Detailed experimental procedures

Termites and sample preparation

Incisitermes marginipennis was obtained from the Federal Institute for Materials Research and Testing (BAM) in Berlin. The hindgut of a false worker (pseudergate) was removed and suspended in solution U (Trager, 1934). A single cell of Trichonympha paraspiralis (Fig. 1A) was isolated and washed in the same buffer using a micromanipulator (MMO-202ND; Narishige) equipped with a microinjector (CellTram Oil; Eppendorf). The flagellate cell was physically fixed with a holding capillary tube (inner diameter: 20 µm; Fig. 2A) and perforated with a confocal laser beam (XYClone; Hamilton Thorne Biosciences) near the anterior cell pole (Fig. 2B), which contains the majority of the ‘Candidatus Ancillula trichonymphae’ endosymbionts (Strassert et al., 2012; Fig. 1B and C). Cytoplasm with bacterial cells leaking from the flagellate was collected with a glass capillary tube (inner diameter: 20 µm) connected to a second, identical micromanipulator (Fig. 2B). After sample collection, the flagellate was disrupted to locate the nucleus and ensure that it had not been unintentionally aspirated (Fig. 2C). The sample was mixed with Triton X-100 (0.1% final concentration), heated to 95 °C for 10 min to release bacterial DNA, cooled on ice for 5 min, and centrifuged at 20,000 × g (4 °C) for 10 min to remove cell debris.

Whole genome amplification and purity check

Aliquots of each preparation were used to amplify genomic DNA by multiple-displacement amplification (MDA) with the REPLI-g UltraFast Mini Kit (Qiagen) following the manufacturer’s instructions, except that the incubation time was extended to 4 h. To ensure the successful amplification of ‘Ca. A. trichonymphae’ and the absence of potential contaminants, the MDA products were subjected to terminal restriction fragment length polymorphism (T-RFLP) analysis of the bacterial SSU rRNA genes using the FAM-labeled forward primer U341F (Baker et al., 2003) and the reverse primer 1390R (Thongaram et al., 2005). The PCR started with a denaturing step at 95 °C for 3 min, followed by 32 cycles of 95 °C for 30 s, 56 °C for 45 s, and 72 °C for 45 s, and a final extension step at 72 °C for 5 min. Aliquots of the PCR product were separately digested with the restriction enzymes MspI and TaqI and analyzed as described by Egert et al. (2003). Lengths of terminal restriction fragments (T-RFs) were determined on an automatic sequence analyzer (ABI 3130; Applied Biosystems, Carlsbad, Calif., USA). For each preparation, the products of four replicate amplifications that originated from the same flagellate cell and yielded exclusively the predicted T-RFs of ‘Ca. A. trichonymphae’ were pooled for sequencing.
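The purity check relies on comparing measured T-RF lengths with the lengths predicted for the endosymbiont. A minimal sketch of such an in silico prediction is given below. It assumes that the amplicon sequence starts at the 5' end of the labeled forward primer and uses the known cleavage positions of MspI (C^CGG) and TaqI (T^CGA). The function name and the toy sequence are hypothetical and are not taken from the original analysis.

```python
# Illustrative sketch: predict terminal restriction fragment (T-RF) lengths.
# Each entry maps an enzyme to (recognition site, cut offset within the site).
# MspI cuts C^CGG and TaqI cuts T^CGA, so both cut after the first base.
ENZYMES = {
    "MspI": ("CCGG", 1),
    "TaqI": ("TCGA", 1),
}


def predict_trf_length(amplicon: str, enzyme: str) -> int:
    """Return the length of the labeled 5'-terminal fragment.

    The amplicon is assumed to begin at the 5' end of the labeled forward
    primer. If the enzyme has no site, the full amplicon length is returned.
    """
    site, cut_offset = ENZYMES[enzyme]
    seq = amplicon.upper()
    pos = seq.find(site)  # first site downstream of the labeled end
    if pos == -1:
        return len(seq)
    return pos + cut_offset


# Hypothetical toy amplicon (not a real SSU rRNA gene sequence).
toy_amplicon = "ACGT" * 5 + "CCGG" + "AAAA" + "TCGA" + "GGGG"
trfs = {name: predict_trf_length(toy_amplicon, name) for name in ENZYMES}
print(trfs)
```

In practice, measured fragment lengths can deviate from predicted lengths by a few bases, so a comparison would normally allow a small tolerance.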

Sequencing

DNA was sheared into smaller fragments by sonication (Covaris) and ligated to sequencing adapters. Samples ImTpAt0 and ImTpAt1 were sequenced at GATC Biotech (Konstanz, Germany) and at the Joint Genome Institute (Walnut Creek, CA, USA), respectively.

Report by GATC Biotech: The DNA was run on a 2% agarose gel with TAE buffer, and the band of approximately 700 bp (approximate size after Covaris fragmentation) was excised and column purified. Size selection was followed by 12 cycles of amplification and a final column purification. After concentration measurement, the resulting library was immobilized onto DNA capture beads, and the library beads obtained were amplified through emPCR according to the manufacturer’s recommendations. Following amplification, the emulsion was chemically broken, and the beads carrying the amplified DNA library were recovered and washed by filtration. The sample was sequenced on half of a Genome Sequencer FLX PicoTiterPlate device with a GS FLX Titanium XLR70 sequencing kit in a 200-cycle run on a GS FLX+ Instrument (single reads, 420 bp). The GS FLX produced the sequence data as a Standard Flowgram Format (SFF) file containing flowgrams for each read with basecalls and per-base quality scores. The data were analyzed with the GS FLX System Software GS De Novo Assembler (Newbler) Version 2.6, taking the “read flowgrams” (SFF file) as input and using default parameters for genomic libraries for the assembly. The assembly contained 5,824 contigs.

Report by the Joint Genome Institute (JGI): The draft genome was generated using Illumina technology. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform (paired-end reads, 150 bp), which generated 24,764,830 reads totaling 3,714.7 Mb. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts. Artifact-filtered sequence data were then screened and trimmed according to the k-mers present in the dataset. High-depth k-mers, presumably derived from MDA amplification bias, cause problems in the assembly, especially if the k-mer depth varies by orders of magnitude between different regions of the genome. Reads with high k-mer coverage (>30X average k-mer depth) were normalized to an average depth of 30X. Reads with an average k-mer depth of less than 2X were removed. The following steps were then performed for assembly: (1) normalized Illumina reads were assembled using IDBA-UD version 1.0.9 (Peng et al., 2012), (2) 1–3 kb simulated paired-end reads were created from IDBA-UD contigs using wgsim (https://github.com/lh3/wgsim), (3) normalized Illumina reads were assembled with simulated read pairs using Allpaths-LG (version r42328) (Gnerre et al., 2011), (4) parameters for assembly steps were: a) IDBA-UD (--no_local), b) wgsim (-e 0 -1 100 -2 100 -r 0 -R 0 -X 0), and c) Allpaths-LG (PrepareAllpathsInputs: PHRED_64=1 PLOIDY=1 FRAG_COVERAGE=125 JUMP_COVERAGE=25 LONG_JUMP_COV=50, RunAllpathsLG: THREADS=8 RUN=std_shredpairs TARGETS=standard VAPI_WARN_ONLY=True OVERWRITE=True MIN_CONTIG=2000). The final draft assembly contained 437 contigs in 436 scaffolds. The total size of the genome is 5.3 Mb and the final assembly is based on 182.6 Mb of Illumina data. Based on a presumed genome size of 5 Mb, the average coverage of the genome was 743X (3,714.7 Mb of raw sequence data divided by 5 Mb).
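The depth-based read filtering described in the JGI report can be illustrated with a small sketch. The report does not state how the normalization is implemented, so the streaming strategy below (a read is retained only while its region is still below the target depth among the reads already kept) is an assumption. The function names and the toy k-mer size of 5 are likewise hypothetical, and real data would call for a much larger k.

```python
from collections import Counter


def kmers(read, k):
    """Return all overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def mean_depth(read, counts, k):
    """Average count of the read's k-mers in a k-mer count table."""
    ks = kmers(read, k)
    if not ks:
        return 0.0
    return sum(counts[x] for x in ks) / len(ks)


def normalize_reads(reads, k=5, min_depth=2.0, target_depth=30.0):
    """Drop low-depth reads and cap high-depth regions at target_depth."""
    # Count k-mers over the whole dataset to estimate per-read depth.
    total = Counter(x for r in reads for x in kmers(r, k))
    kept, kept_counts = [], Counter()
    for r in reads:
        if mean_depth(r, total, k) < min_depth:
            continue  # below 2X: likely sequencing error or stray read
        # Keep the read only if its region is not yet covered to the target.
        if mean_depth(r, kept_counts, k) < target_depth:
            kept.append(r)
            kept_counts.update(kmers(r, k))
    return kept


# Toy data: one region sequenced 100 times and one singleton read.
reads = ["ACGTTGCAAT"] * 100 + ["TTTTTGGGGG"]
kept = normalize_reads(reads)
print(len(kept))
```

With these toy reads, the over-amplified region is reduced to the target depth and the singleton read is discarded, which mirrors the two filtering rules stated in the report.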

The contigs of both assemblies (sample ImTpAt0 and sample ImTpAt1) were combined with CAP3 (Huang and Madan, 1999) using a sequence overlap of 100 bases and a sequence similarity of 99.0%.
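The merging criterion (an overlap of at least 100 bases with at least 99.0% similarity) can be made concrete with a short sketch. This is not how CAP3 works internally: CAP3 computes gapped alignments and can use base quality values, whereas the hypothetical function below only checks ungapped suffix-prefix overlaps between two contigs.

```python
def best_suffix_prefix_overlap(left, right, min_overlap=100, min_identity=0.99):
    """Find the longest ungapped overlap between the end of `left` and the
    start of `right` that satisfies both thresholds.

    Returns (overlap_length, identity) or None if no overlap qualifies.
    """
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and shorten step by step.
    for length in range(max_len, min_overlap - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


# Toy example with a reduced overlap threshold so the sequences stay short.
toy_hit = best_suffix_prefix_overlap("GGGGACGT", "ACGTCCCC", min_overlap=4)
toy_miss = best_suffix_prefix_overlap("GGGGACGT", "ACGTCCCC", min_overlap=5)
print(toy_hit, toy_miss)
```

With the default thresholds, two contigs would be candidates for merging only if they share at least 100 terminal bases with at most one mismatch per 100 positions.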

Annotation

Coding DNA sequences of the combined assemblies (draft genome ImTpAt; 784 scaffolds) were identified with the Prokaryotic Dynamic Programming Gene-finding Algorithm (Prodigal; Hyatt et al., 2010) and manually curated using the Gene Prediction Improvement Pipeline (GenePRIMP) developed by the JGI (Pati et al., 2010). tRNA genes were predicted with tRNAscan-SE (Lowe and Eddy, 1997). Ribosomal RNA genes were found by searches against the SILVA database (Pruesse et al., 2007). Non-coding RNAs were identified by searching the genome for the corresponding Rfam profiles using INFERNAL (http://infernal.janelia.org). Annotation was further refined and metabolic pathways were reconstructed using the Integrated Microbial Genomes Expert Review software (IMG ER; Markowitz et al., 2009). All scaffolds that contained genes with a high sequence similarity (≥95%) to previously identified contaminants of the REPLI-g UltraFast Mini Kit (Woyke et al., 2011) were removed from the draft genomes. Scaffolds with suspicious G+C content and k-mer patterns (analyses implemented in IMG ER) were also scrutinized by BLASTp analysis of several randomly selected genes and removed if they were suspected contaminants.
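The G+C-based screening step can be sketched as follows. The analyses actually used are those implemented in IMG ER and are not detailed here, so the median-based rule, the deviation threshold of 0.10, and all names in the code are assumptions chosen purely for illustration. Flagged scaffolds would still require manual inspection, for example by BLASTp as described above.

```python
from statistics import median


def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(scaffolds, max_deviation=0.10):
    """Return sorted names of scaffolds whose G+C fraction differs from the
    median of all scaffolds by more than max_deviation (hypothetical cutoff).
    """
    gc = {name: gc_fraction(seq) for name, seq in scaffolds.items()}
    center = median(gc.values())
    return sorted(
        name for name, value in gc.items() if abs(value - center) > max_deviation
    )


# Toy scaffolds (hypothetical sequences, far shorter than real scaffolds).
toy_scaffolds = {
    "scaffold_1": "ATGCATGCAT",
    "scaffold_2": "ATGCATGCTA",
    "scaffold_3": "GGCCGGCCGC",
}
suspicious = flag_gc_outliers(toy_scaffolds)
print(suspicious)
```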

Supplementary Tables

(see file Supplementary_Tables.xlsx)

Table S1. Presence of 182 single-copy genes generally conserved in most bacterial genomes (Martin et al., 2006) in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt and its closest relative with a sequenced genome, Bifidobacterium asteroides strain PRL2011.

Table S2. Phylogenetic context of the 2,131 protein-coding genes in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt with best BLASTx scores (>30% amino acid sequence identity) against homologs in other Actinobacteria in the IMG reference database (Integrated Microbial Genomes, https://img.jgi.doe.gov/). Top hits are shown for cut-off values of 30%, 60%, and 90% amino acid sequence similarity.

Table S3. Gene annotations in the draft genome of ‘Candidatus Ancillula trichonymphae’ strain ImTpAt. The annotations are based on the Integrated Microbial Genomes Expert Review platform (IMG/ER; see Supporting Information). Unless otherwise noted, the genes were grouped according to KEGG pathways. Top hits of BLAST searches against NCBI’s protein database are shown right of the vertical lines.

Supplementary Figures

(see file Supplementary_Figures.pdf)

Fig. S1. Maximum-likelihood (ML) phylogenetic tree depicting the relationship between the 16S rRNA sequences affiliated with ‘Candidatus Ancillula trichonymphae’ and other major actinobacterial groups. Nodes marked with circles indicate monophyletic clades in the ML tree that were well supported (○, ≥70%; •, ≥90%) by the parametric aBAYES test.

Fig. S2. Metabolic pathways of ‘Candidatus Ancillula trichonymphae’ involved in sugar metabolism, based on the gene annotations in the draft genome. (A) Glycolysis, gluconeogenesis, non-oxidative pentose-phosphate pathway, phosphoketolase pathway, and the pentose and glucuronate interconversions. (B) The non-oxidative branch of the citrate cycle. If a gene was not found in the draft genome, the corresponding reaction is indicated by a gray arrow.

Fig. S3. Detailed schemes showing the phosphotransferase system (A), imports of phosphate and sugar-phosphate (B and C, respectively), and the creation of a transmembrane proton gradient via the F1FO-ATPase (D). Gray arrows indicate reactions catalyzed by enzymes that are encoded by genes not detected in the draft genome of ‘Candidatus Ancillula trichonymphae’.

Fig. S4. Metabolic pathways for the synthesis of amino acids. Gray arrows indicate reactions for which the corresponding genes were not found in the draft genome of ‘Candidatus Ancillula trichonymphae’.

Fig. S5. Biosynthesis of cofactors and vitamins. Genes missing in the draft genome are indicated by gray arrows.

Fig. S6. Phylogenetic tree inferred from the maximum-likelihood analysis of bacterial pyruvate flavodoxin/ferredoxin oxidoreductase amino acid sequences (PF01855). The sequences were aligned and trimmed with MAFFT (‘auto’ flag activated; Katoh and Standley, 2013) and trimAl (‘automated1’ mode; Capella-Gutierrez et al., 2009), respectively. The tree topology was estimated using FastTree.

Fig. S7. Maximum-likelihood tree based on the analysis of bacterial [FeFe] hydrogenase amino acid sequences (PF02906). The tree topology was estimated as described for Fig. S6.
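The alignment, trimming, and tree-building workflow named in the legends of Fig. S6 and Fig. S7 could be scripted roughly as sketched below. Only the MAFFT ‘auto’ flag and the trimAl ‘automated1’ mode are taken from the legends. The file names, the function names, the executable names on the search path, and the use of FastTree without further options (its default mode for protein alignments) are assumptions.

```python
import subprocess


def build_tree_commands(fasta_in, prefix):
    """Return (argv, stdout_path) pairs for alignment, trimming, and tree
    inference. stdout_path is None when the tool writes its own output file.
    """
    aligned = f"{prefix}.aln.fasta"
    trimmed = f"{prefix}.trimmed.fasta"
    tree = f"{prefix}.nwk"
    return [
        # MAFFT writes the alignment to standard output.
        (["mafft", "--auto", fasta_in], aligned),
        # trimAl reads and writes files named on the command line.
        (["trimal", "-in", aligned, "-out", trimmed, "-automated1"], None),
        # FastTree writes the Newick tree to standard output.
        (["FastTree", trimmed], tree),
    ]


def run_pipeline(fasta_in, prefix):
    """Execute the three steps in order (requires the tools to be installed)."""
    for argv, stdout_path in build_tree_commands(fasta_in, prefix):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Hypothetical input file; the commands are only built here, not executed.
commands = build_tree_commands("pfor.fasta", "pfor")
for argv, stdout_path in commands:
    print(" ".join(argv), ">" + stdout_path if stdout_path else "")
```

Calling run_pipeline with a real FASTA file would execute the steps, provided that the three programs are installed under the executable names assumed here.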

References

Baker, G.C., Smith, J.J., and Cowan, D.A. (2003) Review and re-analysis of domain-specific 16S primers. J Microbiol Methods 55: 541–555.

Capella-Gutierrez, S., Silla-Martinez, J.M., and Gabaldon, T. (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973.

Egert, M., Wagner, B., Lemke, T., Brune, A., and Friedrich, M.W. (2003) Microbial community structure in midgut and hindgut of the humus-feeding larva of Pachnoda ephippiata (Coleoptera: Scarabaeidae). Appl Environ Microbiol 69: 6659–6668.

Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J. et al. (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA 108: 1513–1518.

Huang, X., and Madan, A. (1999) CAP3: a DNA sequence assembly program. Genome Res 9: 868–877.

Hyatt, D., Chen, G-L., LoCascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.

Katoh, K., and Standley, D.M. (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30: 772–780.

Lowe, T.M., and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955–964.

Markowitz, V.M., Mavromatis, K., Ivanova, N.N., Chen, I-M.A., Chu, K., and Kyrpides, N.C. (2009) IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25: 2271–2278.

Martin, H.G., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C. et al. (2006) Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24: 1263–1269.

Pati, A., Ivanova, N.N., Mikhailova, N., Ovchinnikova, G., Hooper, S.D., Lykidis, A., and Kyrpides, N.C. (2010) GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 7: 455–457.

Peng, Y., Leung, H.C.M., Yiu, S.M., and Chin, F.Y.L. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28: 1420–1428.

Pruesse, E., Quast, C., Knittel, K., Fuchs, B.M., Ludwig, W., Peplies, J., and Glöckner, F.O. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188–7196.

Strassert, J.F.H., Köhler, T., Wienemann, T.H.G., Ikeda-Ohtsubo, W., Faivre, N., Franckenberg, S. et al. (2012) ‘Candidatus Ancillula trichonymphae’, a novel lineage of endosymbiotic Actinobacteria in termite gut flagellates of the genus Trichonympha. Environ Microbiol 14: 3259–3270.

Thongaram, T., Hongo, Y., Kosono, S., Ohkuma, M., Trakulnaleamsai, S., Noparatnaraporn, N., and Kudo, T. (2005) Comparison of bacterial communities in the alkaline gut segment among various species of higher termites. Extremophiles 9: 229–238.

Trager, W. (1934) The cultivation of a cellulose-digesting flagellate, Trichomonas termopsidis, and of certain other termite protozoa. Biol Bull 66: 182–190.

Woyke, T., Sczyrba, A., Lee, J., Rinke, C., Tighe, D., Clingenpeel, S. et al. (2011) Decontamination of MDA reagents for single cell whole genome amplification. PLoS ONE 6: e26161.