Supplemental Spreadsheets and Figures

Evolution of the Polyphosphate Accumulating Phenotype in Candidatus Accumulibacter phosphatis

Ben O. Oyserman1†, Francisco Moya1, Christopher E. Lawson1, Antonio L. Garcia1, Mark Vogt1, Mitchell Heffernen1, Daniel R Noguera1, Katherine D. McMahon1, 2

1Department of Civil and Environmental Engineering, University of Wisconsin at Madison, Madison, WI, 53706, USA; 2Department of Bacteriology, University of Wisconsin at Madison, Madison, WI, 53706, USA

Introduction

Spreadsheets

Supplemental Spreadsheet 1

Results from the identification of orthologous gene clusters.
Sheet 1 – All pan-Rhodocyclaceae gene clusters identified using MCL and the number of representative genes in each genome.
Sheet 2 – Summary statistics of pan-Rhodocyclaceae gene clusters.
Sheet 3 – All pan-Accumulibacter gene clusters identified using MCL and the number of representative genes in each genome.
Sheet 4 – Summary statistics of the pan-Accumulibacter genome.
Sheet 5 – Summary of pan-Rhodocyclaceae single-copy orthologs.

Supplemental Spreadsheet 2

Summary of the ancestral genome reconstruction: inferred gene presence at each node.
Sheet 1 – Summary of discrete evolutionary categories in the CAP2UW1 genome: ancestral (1), derived (2), lineage specific (3), and flexible genes (4).
Sheet 2 – Statistics about key nodes within the Accumulibacter phylogeny.
Sheet 3 – Raw output from Count inference on gene gain, loss, duplications and contractions.
Sheet 4 – All gene clusters and the locus tags from each genome that comprise each cluster.
Sheet 5 – CAP2UW1 locus tags for each gene cluster.
Sheet 6 – CAP2UW1 locus tags for ancestral gene clusters.
Sheet 7 – CAP2UW1 locus tags for ancestral gene clusters that were not lost at any internal node.
Sheet 8 – CAP2UW1 locus tags for ancestral gene clusters gained at the Dechloromonas/Accumulibacter node.
Sheet 9 – CAP2UW1 locus tags for gene clusters gained at the Dechloromonas/Accumulibacter node.
Sheet 10 – CAP2UW1 locus tags for gene clusters gained at the Dechloromonas/Accumulibacter node without being lost at any internal node.
Sheet 11 – CAP2UW1 locus tags for derived gene clusters gained at the Accumulibacter node that were not lost at any internal node.
Sheet 12 – Lineage specific CAP2UW1 gene clusters not found in any other Accumulibacter genome.

Supplemental Spreadsheet 3

Summary of kinetics and stoichiometry of Accumulibacter Clade IA.
Sheet 1 – Raw data and linear model of uptake and release for phosphorus, magnesium, potassium, acetate and polyhydroxybutyrate.
Sheet 2 – Calculated kinetic and stoichiometric values.
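As an illustration of how a rate can be read from the linear portion of a time series, the sketch below fits an ordinary least-squares line and takes the ratio of two slopes as a stoichiometric value. It is a minimal sketch, not the authors' workflow: the function name, the sampling times and the concentrations are all hypothetical, and the units are assumed only for the example.

```python
def linear_fit(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


# Hypothetical measurements from the linear portion of an anaerobic phase
# (time in minutes, concentrations in arbitrary molar units).
times = [0, 10, 20, 30]
acetate = [100.0, 80.0, 60.0, 40.0]    # acetate is taken up, so the slope is negative
phosphorus = [5.0, 15.0, 25.0, 35.0]   # phosphorus is released, so the slope is positive

acetate_rate, _ = linear_fit(times, acetate)
phosphorus_rate, _ = linear_fit(times, phosphorus)

# Stoichiometric ratio: phosphorus released per unit of acetate taken up.
p_per_acetate = phosphorus_rate / -acetate_rate
print(acetate_rate, phosphorus_rate, p_per_acetate)
```

The same fit could be repeated for magnesium, potassium and polyhydroxybutyrate; only the points judged to lie in the linear portion should be passed to the function.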

Supplemental Spreadsheet 4

Analysis of all KEGG maps and pathways and the COG category Inorganic Ion Transport and Metabolism.
Sheet 1 – All CAP2UW1 pathways with locus tags and ancestral states: ancestral (1), derived (2), lineage specific (3), and flexible genes (4).
Sheet 2 – Summary of all CAP2UW1 pathways.

Supplemental Spreadsheet 5

Results from BLAST analysis on derived genes.
Sheet 1 – The BLAST results for all derived genes, summarizing the most dominant class, order and family as well as the highest classification with over 50 percent of reads.
Sheet 2 – Shannon diversity of BLAST results and expression patterns from Oyserman et al. 2015 for all derived genes.
Sheet 3 – Summary of the dominant lineage in BLAST results for laterally acquired genes.
Sheet 4 – Sensitivity analysis of all HGT genes at the 10%, 5% and 0% thresholds.
Sheet 5 – Sensitivity analysis of HGT genes that were included in the evolutionary model at the 10%, 5% and 0% thresholds.
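For readers unfamiliar with the Shannon index, the sketch below computes it from the taxonomic labels assigned to the BLAST hits of a single gene. This is a minimal sketch under stated assumptions: the natural logarithm is assumed (the base used in the spreadsheet is not stated here), and the function name and the example labels are hypothetical.

```python
import math
from collections import Counter


def shannon_index(labels):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over the label frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical family-level labels for the BLAST hits of two genes.
hits_single_lineage = ["Rhodocyclaceae"] * 10
hits_mixed = ["Rhodocyclaceae"] * 5 + ["Comamonadaceae"] * 5

# A gene whose hits all fall in one lineage has zero diversity;
# hits split evenly between two lineages give ln(2).
print(shannon_index(hits_single_lineage))
print(shannon_index(hits_mixed))
```

A low index indicates that the hits for a gene are concentrated in one lineage, and a high index indicates that they are spread across many.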

Supplemental Spreadsheet 6

Determining a cut-off for including a gene family as core. This cut-off was then used to identify core genes from a list of potential core genes.
Sheet 1 – The probability of each pattern of presence and absence given the genome completeness estimates. Since there are 10 genomes, there are 2^10 (1,024) possible patterns. The probability that a core gene family was present in a genome is equal to that genome's completeness; the probability that a core gene family was absent is equal to 1 minus the completeness. The summed probability for each cut-off (e.g. 10, 9, 8, etc. genomes) was determined, and a cut-off of 7 genomes was selected, as this would capture 99% of core gene families.
Sheet 2 – Identifying core genes from a list of potential core genes. Potential core genes were defined as any gene inferred at the LCA of Accumulibacter and not lost at any internal node.
Sheet 3 – Accumulation curves of the observed and expected frequency of identifying core gene families using different cut-offs.
Sheet 4 – The cumulative observed and expected frequency of identifying derived core gene families using different cut-offs.
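The calculation described for Sheet 1 can be sketched as follows. The code enumerates every presence/absence pattern, weights each by the genome completeness values, and sums the probabilities of all patterns in which a core gene family is observed in at least k genomes. It is an illustrative sketch only: the function names are hypothetical, the completeness values are made up for the example, and genomes are assumed to be independent.

```python
from itertools import product


def cutoff_capture_probabilities(completeness):
    """Return {k: P(a core gene family is observed in at least k genomes)}."""
    n = len(completeness)
    # prob_exact[j] = probability of observing the family in exactly j genomes
    prob_exact = [0.0] * (n + 1)
    for pattern in product((0, 1), repeat=n):  # all 2**n presence/absence patterns
        p = 1.0
        for present, c in zip(pattern, completeness):
            p *= c if present else (1.0 - c)
        prob_exact[sum(pattern)] += p
    # Accumulate from the strictest cut-off (all genomes) downwards.
    capture = {}
    running = 0.0
    for k in range(n, -1, -1):
        running += prob_exact[k]
        capture[k] = running
    return capture


def strictest_cutoff(capture, target=0.99):
    """Largest k whose capture probability still meets the target."""
    return max(k for k, p in capture.items() if p >= target)


# Hypothetical completeness estimates for 10 genomes (not the values in the spreadsheet).
completeness = [0.99, 0.98, 0.97, 0.95, 0.95, 0.93, 0.90, 0.90, 0.85, 0.80]
capture = cutoff_capture_probabilities(completeness)
for k in range(10, 5, -1):
    print(k, round(capture[k], 4))
print(strictest_cutoff(capture, target=0.99))
```

Full enumeration is practical here because 2^10 patterns is small; for many more genomes the same distribution could be built by dynamic programming over genomes instead.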

Supplemental Spreadsheet 7

The relative abundance of Accumulibacter Clade IIA, IIC and IID was determined with quantitative PCR (qPCR) using clade specific primers. The qPCR results show that Clade IIA represented >99% of all Clade II sequences on each date tested. Sheets 1, 2 and 3 contain the raw outputs, and sheet 4 provides a summary of the results.

Supplemental Figures

Supplemental Figure 1

Phylogeny of seventy-four pan single-copy genes with bootstrap values.

Supplemental Figure 2

Collapsed phylogeny with specific values for gene gain, loss and presence. Numbers at nodes are referenced in the text and in supplemental spreadsheets. For example, node 12 is the LCA of Accumulibacter.

Supplemental Figure 3

Identifying cut-offs for very low, low, medium, high Shannon index diversity of BLAST results for laterally acquired genes.

Supplemental Figure 4

Locus tags included in an evolutionary model of CAP2UW1 depicting ancestral, laterally derived, flexible and lineage specific genes. Abbreviations: Ac, acetate; AcAc-CoA, acetoacetyl-CoA; Ac-CoA, acyl-CoA; Ac-AMP, acetyl AMP; Ac-P, acetyl-P; ADP-Glu, adenosine 5-diphosphoglucose; CDPD, cytidine diphosphate diacylglycerol; C.I, complex I oxidative phosphorylation; C.II, complex II oxidative phosphorylation; C.III, complex III oxidative phosphorylation; C.IV, complex IV oxidative phosphorylation; E4-P, erythrose 4-phosphate; FNR, NADPH-ferredoxin reductase; Fru-1-6P, fructose 1,6-bisphosphate; Fru-6-P, fructose 6-phosphate; G3P, glyceraldehyde 3-phosphate; Glu, glucose; Glu-1-P, glucose 1-phosphate; Glu-6-P, glucose 6-phosphate; Gly, glycogen; GlyA, glycogen amylose; Glyc-P, glycerone-P; Long Chain FA, long chain fatty acid; PE, phosphatidylethanolamine; PEP, phosphoenolpyruvate; PGP, 1,2-diacyl-sn-glycerol-3p; pntAB, proton-translocating transhydrogenase; PolyP, polyphosphate; PPP, pyrophosphate-energized proton pump; Ptd-L-Ser, phosphatidylserine; Pyr, pyruvate; 1,3-bPG, 1,3-bisphosphoglyceric acid; Ri15P2, ribulose 1,5P2; Ri5-P, ribose 5-phosphate; Ru5P, ribulose 5-phosphate; S7-P, sedoheptulose-7-phosphate; SBP, sedoheptulose 1,7-bisphosphate; X5P, xylulose 5-phosphate; 3HB-CoA, (R)-3-hydroxy-butanoyl-CoA; 2-PG, 2-phosphoglycerate; 3-PG, 3-phosphoglyceric acid.

Supplemental Figure 5

The linear portions of measured acetate, phosphorus, magnesium, potassium, and polyhydroxybutyrate used to calculate kinetic parameters.

Supplemental Tables

Table 1

Relative abundance of Accumulibacter Clade IA and Clade IIA measured by fluorescent in situ hybridization on dates of kinetic and stoichiometric characterizations.