Supplementary material

A metagenome of a full-scale microbial community carrying out Enhanced Biological Phosphorus Removal

Mads Albertsen, Lea Benedicte Skov Hansen, Aaron Marc Saunders, Per Halkjær Nielsen, and Kåre Lehmann Nielsen

Department of Biotechnology, Chemistry and Environmental Engineering, Aalborg University, Sohngaardsholmsvej 49, DK-9000 Aalborg, Denmark

Supplementary Text 1 of 1.

The reads mapping to the ppk1 gene of Accumulibacter were investigated to evaluate the sensitivity and specificity of the reference assembly against the Accumulibacter genome. A total of 87 ppk1 sequences were obtained from NCBI, and five ppk1 genes from closely related species were included.

All ppk1 sequences were trimmed to the length of the shortest ppk1 fragment (1073 bp) and clustered using cd-hit-est v.4.2.1 (Li and Godzik, 2006) with the following parameters: -c 0.99 -r 1. A BLAST database was created from the resulting 68 non-redundant sequences. The ppk1 sequences were assigned to different nodes in the phylogenetic tree using MEGAN. As MEGAN assigns reads to nodes based on the species information in the BLAST hits, the headers of the individual ppk1 sequences were changed to reflect the topology of the phylogenetic tree.
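The trimming and clustering step can be sketched as follows. This is only an illustration: it assumes the sequences already start at the same position, so that a simple prefix trim is valid (the text does not say how the common region was selected), and the file names are hypothetical. The cd-hit-est options `-c 0.99 -r 1` are those given in the text; `-i` and `-o` are the program's input and output options.

```python
def trim_to_shortest(records):
    """Trim every sequence to the length of the shortest one.

    records: list of (name, sequence) tuples.
    Assumption: all sequences start at the same position, so keeping
    the first N bases of each gives a comparable region.
    """
    shortest = min(len(seq) for _, seq in records)
    return [(name, seq[:shortest]) for name, seq in records]


def cdhit_est_command(fasta_in, fasta_out):
    """Build the cd-hit-est call with the parameters from the text."""
    return ["cd-hit-est", "-i", fasta_in, "-o", fasta_out,
            "-c", "0.99",  # 99% identity threshold for clustering
            "-r", "1"]     # also compare the reverse-complement strand


# Toy records; real ppk1 fragments would be trimmed to 1073 bp.
records = [("seqA", "ACGTACGTAC"), ("seqB", "ACGTACG"), ("seqC", "ACGTACGTA")]
trimmed = trim_to_shortest(records)

# Hypothetical file names, for illustration only.
command = cdhit_est_command("ppk1_trimmed.fasta", "ppk1_nr99.fasta")
```

The command list could be passed to `subprocess.run(command, check=True)` on a system where cd-hit-est is installed.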

The metagenomic reads that matched the extracted region of the ppk1 gene in the Accumulibacter genome in the original reference assembly were extracted to investigate the specificity of the reference mapping (inclusion of other bacteria in the mapping). These reads were matched to the ppk1 database using BLASTn with default parameters except -word_size 7, -outfmt 5 and -evalue 1e-5. The output was analysed in MEGAN.
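For reference, the BLASTn search described above might be assembled as in the sketch below. The three non-default options are taken from the text; the file and database names are hypothetical, and the use of the BLAST+ `blastn` executable is an assumption based on the option names.

```python
def blastn_command(query_fasta, database, output_xml):
    """Build a blastn call with the non-default options from the text."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-word_size", "7",   # short word size for a more sensitive search
        "-outfmt", "5",      # XML output, which MEGAN can import
        "-evalue", "1e-5",   # e-value cutoff
        "-out", output_xml,
    ]


# Hypothetical file and database names, for illustration only.
cmd = blastn_command("ppk1_reads.fasta", "ppk1_db", "ppk1_reads.xml")
print(" ".join(cmd))
```

Building the call as a list rather than a single string avoids shell quoting problems when it is handed to `subprocess.run`.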

In order to investigate the sensitivity (inclusion of most Accumulibacter clades in the mapping), a reference assembly was conducted against the 68 Accumulibacter ppk1 genes using CLC's reference mapping function, requiring a minimum of 85% identity over 70% of the read length. Only reads with a minimum length of 60 bp were used. Otherwise, the analysis was conducted in the same way as the specificity analysis.
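The mapping thresholds can be expressed as a simple filter. The sketch below is one possible reading of the criteria, not CLC's implementation: it assumes that identity is calculated over the aligned part of the read and that the aligned part must cover at least 70% of the read. The function and parameter names are hypothetical.

```python
MIN_READ_LENGTH = 60        # bp, from the text
MIN_LENGTH_FRACTION = 0.70  # fraction of the read that must align
MIN_IDENTITY = 0.85         # identity within the aligned part (assumed)


def passes_mapping_filter(read_length, aligned_length, matching_bases):
    """Return True if a read alignment meets all three thresholds."""
    if read_length < MIN_READ_LENGTH:
        return False
    if aligned_length < MIN_LENGTH_FRACTION * read_length:
        return False
    return matching_bases >= MIN_IDENTITY * aligned_length


# A 100 bp read with 75 bp aligned and 70 matching bases is kept.
kept = passes_mapping_filter(100, 75, 70)
# The same read with only 60 bp aligned is rejected.
too_short_alignment = passes_mapping_filter(100, 60, 60)
```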

The high resolution of within-genus diversity provided by the ppk1 gene was used as a test case to validate the specificity (false positive matches) and the sensitivity (ability to recruit reads from other Accumulibacter species) of the reference mapping. A total of 138 ppk1 genes were used to construct a phylogenetic tree (Figure S5), and the phylogenetic position of each sequence was mimicked in MEGAN for the assignment of individual reads to different nodes on the tree. The specificity analysis showed that only 10 of the 268 ppk1 reads assigned to the Accumulibacter IIA str. UW-1 ppk1 gene had a better match to non-Accumulibacter ppk1 sequences (Figure S7A). However, the sensitivity analysis showed that although we were able to recruit most clade IIA ppk1 reads using the clade IIA str. UW-1 ppk1 gene, we were not able to recruit more than approximately 30-50% of the reads from other Accumulibacter species (Figure S7C).
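As a worked example of the specificity result, the fraction of mapped ppk1 reads with a better non-Accumulibacter match follows directly from the two counts reported above. The function name is hypothetical.

```python
def off_target_fraction(off_target_reads, total_reads):
    """Fraction of mapped reads whose best match lies outside the target."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return off_target_reads / total_reads


# Counts from the text: 10 of 268 reads matched non-Accumulibacter ppk1 better.
fraction = off_target_fraction(10, 268)
print(f"{fraction:.1%}")  # prints 3.7%
```

In other words, roughly 96% of the reads recruited by the clade IIA ppk1 gene had their best match within Accumulibacter.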

Supplementary Figure 1 of 7.

Supplementary Figure 1. Histogram of the length distribution of the de novo assembled contigs. Contigs ≥ 300 bp were used for further analysis (blue bars).

Supplementary Figure 2 of 7.

Supplementary Figure 2. Histogram of the length distribution of ORFs with a significant BLAST hit (e-value ≤ 1e-5) compared to ORFs where no significant hit could be found. The “double curved” plots are due to the minimum contig size of 300 bp (100 amino acids).

Supplementary Figure 3 of 7.

Supplementary Figure 3. A) Species abundance curve. “Best hit” represents species assignment based on the best BLASTP hit. “10% Bitscore filter” represents species assignment only if the best BLASTP hit had a bitscore >10% higher than the second best BLASTP hit. The graph only shows species with more than 100 ORFs assigned (100 ORFs ≈ 0.05% of all ORFs). B) Species abundance chart. The 20 most abundant species are shown in the legend in order of decreasing abundance. ORFs were assigned based on the best BLASTP hit. C) Rarefaction curves. The rarefaction function of MEGAN was used to create rarefaction curves at different phylogenetic levels. The assignment is based on a 10% bitscore filter and a minimum of 5 ORFs assigned.
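The 10% bitscore filter in panel A can be illustrated with a minimal sketch. It is a simplified reading of the caption, not MEGAN's algorithm: an ORF is assigned to the species of its best hit only if that bitscore exceeds the second best by more than 10%. Assigning an ORF that has a single hit is an assumption made here, and the function name and toy species labels are hypothetical.

```python
def assign_with_bitscore_filter(hits, min_margin=0.10):
    """Return the species of the best hit, or None if it is ambiguous.

    hits: list of (species, bitscore) tuples for one ORF.
    """
    if not hits:
        return None
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    if len(ranked) == 1:
        # Assumption: a lone hit has no competitor and is accepted.
        return ranked[0][0]
    (best_species, best_score), (_, second_score) = ranked[0], ranked[1]
    if best_score > second_score * (1 + min_margin):
        return best_species
    return None


clear_hit = assign_with_bitscore_filter([("species A", 200.0), ("species B", 150.0)])
ambiguous = assign_with_bitscore_filter([("species A", 200.0), ("species B", 190.0)])
```

Here `clear_hit` is assigned to "species A" because 200 exceeds 150 by more than 10%, whereas `ambiguous` is left unassigned.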

Supplementary Figure 4 of 7.

Supplementary Figure 4. Annotation of ORFs in the largest contig (32884 bp). A yellow ORF denotes a significant BLAST hit (e-value ≤ 1e-5), whereas brown denotes no significant hit.

Supplementary Figure 5 of 7.

Supplementary Figure 5. Phylogenetic tree of ppk1 sequences. Sequences from Aalborg East have been marked in red and the ppk1 sequence from “Candidatus Accumulibacter phosphatis” clade IIA str. UW-1 has been marked in blue. In addition, clade assignments have been added. A putative new clade has been marked as IIx. The tree was first created on the basis of 87 general ppk1 genes and only selected representative sequences are shown in the final tree. The outgroup sequences (not shown on the tree) were Ralstonia eutropha YP_300029, Ralstonia eutropha YP_729175 and Stenotrophomonas maltophilia K279a CAQ44540.

Supplementary Figure 6 of 7.

Supplementary Figure 6. Comparison of genes prevalent in the different read pools based on a reference mapping to the Accumulibacter genome. The percent read length covered was used to compare presence or absence of genes. High-identity reads (>95% identical at nt level, x-axis) were compared with the rest of the read pool (≤95% identical at nt level, y-axis). Each dot represents one gene. In order to compare which genes differed between the high (>95%) and low (≤95%) identity read pools, the read pool size of the low-identity group was normalized (by subsampling) to the same size (179 741 reads) as the high-identity read pool, thereby effectively comparing the prevalent genes in both read pools.
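The normalization by subsampling can be sketched as drawing reads at random, without replacement, from the larger pool until it matches the size of the smaller one. The caption does not state how the subsampling was performed, so the function below, its name and the fixed seed are assumptions, and the toy pool sizes stand in for the real ones (179 741 reads in the high-identity pool).

```python
import random


def subsample_reads(read_ids, target_size, seed=0):
    """Randomly draw target_size reads without replacement."""
    read_ids = list(read_ids)
    if target_size > len(read_ids):
        raise ValueError("target_size exceeds the number of reads")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(read_ids, target_size)


# Toy pools standing in for the high- and low-identity read pools.
high_identity_pool = [f"high_{i}" for i in range(50)]
low_identity_pool = [f"low_{i}" for i in range(200)]

normalized_low_pool = subsample_reads(low_identity_pool, len(high_identity_pool))
```

After this step both pools contain the same number of reads, so per-gene coverage can be compared directly.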

Supplementary Figure 7 of 7.

Supplementary Figure 7. Investigation of the specificity and sensitivity of the mapping of metagenome reads to the genome of Accumulibacter clade IIA (NC_013194). MEGAN was used to visualize the BLASTn results. A 10% bitscore difference was used to assign reads to nodes. A) Investigation of the specificity of the mapping of the metagenome reads to the Accumulibacter clade IIA ppk1 gene. The metagenome reads mapping to the clade IIA ppk1 gene were extracted and mapped to 68 non-redundant Accumulibacter ppk1 genes and 5 ppk1 genes from closely related species. Few reads had their best match to species other than Accumulibacter. B) Investigation of the ability to include other Accumulibacter clades by the use of the Accumulibacter clade IIA genome. The metagenome reads were mapped to 68 non-redundant Accumulibacter ppk1 genes and the extracted read pool was searched (BLASTn) against all 68+5 ppk1 genes and visualised using MEGAN. C) The combination of panels A and B reveals that most clade IIA reads are extractable using the clade IIA genome; however, only approximately 30% of the reads matching other clades are extracted.

Supplementary Table 1 of 2.

Supplementary references for Table S1.

Crocetti GR, Hugenholtz P, Bond PL, Schuler A, Keller J, Jenkins D, Blackall LL (2000). Identification of polyphosphate-accumulating organisms and design of 16S rRNA-directed probes for their detection and quantitation. Appl Environ Microbiol 66:1175-1182.

Daims H, Nielsen JL, Nielsen PH, Schleifer KH, Wagner M (2001). In situ characterization of Nitrospira-like nitrite-oxidizing bacteria active in wastewater treatment plants. Appl Environ Microbiol 67:5273-5284.

Daims H, Bruhl A, Amann R, Schleifer KH, Wagner M (1999). The domain-specific probe EUB338 is insufficient for the detection of all bacteria: development and evaluation of a more comprehensive probe set. Syst Appl Microbiol 22:434-444.

Erhart R, Bradford D, Seviour RJ, Amann R, Blackall LL (1997). Development and use of fluorescent in situ hybridization probes for the detection and identification of ‘Microthrix parvicella’ in activated sludge. Syst Appl Microbiol 20:310-318.

Flowers J, He S, Carvalho G, Peterson SB, Lopez C, Yilmaz S, Zilles JL, Morgenroth E, Lemos PC, Reis MAM, Crespo MTB, Noguera DR, McMahon KD (2008). Ecological differentiation of Accumulibacter in EBPR reactors. In: Proceedings of the Water Environment Federation, WEFTEC 2008 (12):31-42.

Gieseke A, Purkhold U, Wagner M, Amann R, Schramm A (2001). Community structure and activity dynamics of nitrifying bacteria in a phosphate-removing biofilm. Appl Environ Microbiol 67:1351-1362.

Giuliano L, De Domenico M, De Domenico E, Hofle MG, Yakimov MM (1999). Identification of culturable oligotrophic bacteria within naturally occurring bacterioplankton communities of the Ligurian Sea by 16S rRNA sequencing and probing. Microb Ecol 37:77-85.

Hess A, Zarda B, Hahn D, Haner A, Stax D, Hohener P, Zeyer J (1997). In situ analysis of denitrifying toluene- and m-xylene-degrading bacteria in a diesel fuel-contaminated laboratory aquifer column. Appl Environ Microbiol 63:2136-2141.

Hugenholtz P, Tyson GW, Webb RI, Wagner AM, Blackall LL (2001). Investigation of candidate division TM7, a recently recognized major lineage of the domain Bacteria with no known pure-culture representatives. Appl Environ Microbiol 67:411-419.

Kanagawa T, Kamagata Y, Aruga S, Kohno T, Horn M, Wagner M (2000). Phylogenetic analysis of and oligonucleotide probe development for Eikelboom type 021N filamentous bacteria isolated from bulking activated sludge. Appl Environ Microbiol 66:5043-5052.

Kong YH, Beer M, Rees GN, Seviour RJ (2002). Functional analysis of microbial communities in aerobic-anaerobic sequencing batch reactors fed with different phosphorus/carbon (P/C) ratios. Microbiology 148:2299-2307.

Kong Y, Nielsen JL, Nielsen PH (2005). Identity and ecophysiology of uncultured actinobacterial polyphosphate-accumulating organisms in full-scale enhanced biological phosphorus removal plants. Appl Environ Microbiol 71:4076-4085.

Kragelund C, Levantesi C, Borger A, Thelen K, Eikelboom D, Tandoi V, Kong Y, Krooneman J, Larsen P, Thomsen TR, Nielsen PH (2008). Identity, abundance and ecophysiology of filamentous bacteria belonging to the Bacteroidetes present in activated sludge plants. Microbiology 154:886-894.

Lajoie CA, Layton AC, Gregory IR, Sayler GS, Taylor DE, Meyers AJ (2000). Zoogloeal clusters and sludge dewatering potential in an industrial activated-sludge wastewater treatment plant. Water Environ Research 72:56-64.

Levantesi C, Rossetti S, Thelen K, Kragelund C, Krooneman J, Eikelboom D, Nielsen PH, Tandoi V (2006). Phylogeny, physiology and distribution of ‘Candidatus Microthrix calida’, a new Microthrix species isolated from industrial activated sludge wastewater treatment plants. Environ Microbiol 8:1552-1563.

Maixner F, Noguera DR, Anneser B, Stoecker K, Wegl G, Wagner M, Daims H (2006). Nitrite concentration influences the population structure of Nitrospira-like bacteria. Environ Microbiol 8:1487-1495.

Mobarry BK, Wagner M, Urbain V, Rittmann BE, Stahl DA (1996). Phylogenetic probes for analyzing abundance and spatial organization of nitrifying bacteria. Appl Environ Microbiol 62:2156-2162.

Nguyen HTT, Le VQ, Hansen AA, Nielsen JL, Nielsen PH (2011). High diversity and abundance of putative polyphosphate-accumulating Tetrasphaera-related bacteria in activated sludge systems. FEMS Microbiol Ecol 76:256-267.

Rossello-Mora RA, Wagner M, Amann R, Schleifer KH (1995). The abundance of Zoogloea ramigera in sewage treatment plants. Appl Environ Microbiol 61:702-707.

Schauer M, Hahn MW (2005). Diversity and phylogenetic affiliations of morphologically conspicuous large filamentous bacteria occurring in the pelagic zones of a broad spectrum of freshwater habitats. Appl Environ Microbiol 71:1931-1940.

Thomsen TR, Nielsen JL, Ramsing NB, Nielsen PH (2004). Micromanipulation and further identification of FISH-labelled microcolonies of a dominant denitrifying bacterium in activated sludge. Environ Microbiol 6:470-479.

Trebesius K, Leitritz L, Adler K, Schubert S, Autenrieth IB, Heesemann J (2000). Culture independent and rapid identification of bacterial pathogens in necrotising fasciitis and streptococcal toxic shock syndrome by fluorescence in situ hybridisation. Medical Microbiol Immun 188:169-175.

Supplementary Table 2 of 2.

Table S2. Selected reference metagenomes from Dinsdale et al. (2008b) used for comparison with the metagenome obtained in the current study. In addition, a metagenome from a non-EBPR wastewater treatment plant was included (Sanapareddy et al., 2009).

Metagenome Name / Environment / MG-RAST ID / Reference
Soudan Red Stuff / Subterranean / 4440281 / Edwards et al., 2006
Soudan Black Stuff / Subterranean / 4440282 / Edwards et al., 2006
Low Saltern microbes / Hyper-Saline / 4440437 / Rodriguez-Brito et al., 2010
Medium Saltern Microbes (MB1110) / Hyper-Saline / 4440435 / Rodriguez-Brito et al., 2010
Medium saltern microbes (MB1111) / Hyper-Saline / 4440434 / Rodriguez-Brito et al., 2010
Low saltern pond plasmids (TT) / Hyper-Saline / 4440090 / Rodriguez-Brito et al., 2010
High saltern microbial (HB1128) / Hyper-Saline / 4440419 / Rodriguez-Brito et al., 2010
Salton Sea Bacteria 1 / Hyper-Saline / 4440329 / Swan et al., 2010
Medium salinity microbial (MB1116) / Hyper-Saline / 4440425 / Rodriguez-Brito et al., 2010
Low salinity microbial (LB1128) / Hyper-Saline / 4440426 / Rodriguez-Brito et al., 2010
Line Islands Kingman Reef B2 bacteria / Marine / 4440037 / Dinsdale et al., 2008a
Line Islands Christmas Reef B3 bacteria / Marine / 4440041 / Dinsdale et al., 2008a
Line Islands Palmyra F8 Bacteria / Marine / 4440039 / Dinsdale et al., 2008a
DMSP 1 (MAM.1) / Marine / 4440364 / Mou et al., 2008
DMSP 2 (MAM.2) / Marine / 4440360 / Mou et al., 2008
VAN 2 (MAM 4) / Marine / 4440363 / Mou et al., 2008
Tilapia pond microbes / Freshwater / 4440440 / Rodriguez-Brito et al., 2010
Healthy Tilapia pond microbes / Freshwater / 4440413 / Rodriguez-Brito et al., 2010
Healthy Prebead tank microbes / Freshwater / 4440411 / Rodriguez-Brito et al., 2010
Tpond microbe 3 / Freshwater / 4440422 / Rodriguez-Brito et al., 2010
Rios Mesquites Stromatolites bacteria / Microbialites / 4440060 / Breitbart et al., 2009
Pozas Azule II stromatolite microbes / Microbialites / 4440067 / Desnues et al., 2008
Healthy slime bacteria / Fish / 4440059 / Angly et al., 2009
Morbid slime bacteria / Fish / 4440066 / Angly et al., 2009
Healthy gut bacteria / Fish / 4440055 / Angly et al., 2009
Morbid gut bacteria / Fish / 4440056 / Angly et al., 2009
Non-EBPR wastewater treatment plant / WWTP / N/A / Sanapareddy et al., 2009

Supplementary references for Table S2.

Angly FE, Willner D, Prieto-Davó A, Edwards RA, Schmieder R, Vega-Thurber R, et al. (2009). The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Computational Biology 5:e1000593.

Breitbart M, Hoare A, Nitti A, Siefert J, Haynes M, Dinsdale E, et al. (2009). Metagenomic and stable isotopic analyses of modern freshwater microbialites in Cuatro Ciénegas, Mexico. Environ Microbiol 11:16-34.

Desnues C, Rodriguez-Brito B, Rayhawk S, Kelley S, Tran T, Haynes M, et al. (2008). Biodiversity and biogeography of phages in modern stromatolites and thrombolites. Nature 452:340-343.

Dinsdale EA, Pantos O, Smriga S, Edwards RA, Angly F, Wegley L, et al. (2008a). Microbial ecology of four coral atolls in the Northern Line Islands. PLoS ONE 3:e1584.

Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, Brulc JM, et al. (2008b). Functional metagenomic profiling of nine biomes. Nature 452:629-632.

Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, Peterson DM, et al. (2006). Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7:57.

Mou X, Sun S, Edwards RA, Hodson RE, Moran MA (2008). Bacterial carbon processing by generalist species in the coastal ocean. Nature 451:708-711.

Rodriguez-Brito B, Li L, Wegley L, Furlan M, Angly F, Breitbart M, et al. (2010). Viral and microbial community dynamics in four aquatic environments. ISME J 4:739-751.

Sanapareddy N, Hamp TJ, Gonzalez LC, Hilger HA, Fodor AA, Clinton SM (2009). Molecular diversity of a North Carolina wastewater treatment plant as revealed by pyrosequencing. Appl Environ Microbiol 75:1688-1696.

Swan BK, Ehrhardt CJ, Reifel KM, Moreno LI, Valentine DL (2010). Archaeal and bacterial communities respond differently to environmental gradients in anoxic sediments of a California hypersaline lake, the Salton Sea. Appl Environ Microbiol 76:757-768.
