Supplemental Methods s3

Supplemental Methods

Sampling

Surface water was collected from the Kranji Reservoir, Singapore, as grab samples off the Public Utilities Board dock (1°26.23’N, 103°44.17’E). The Kranji Reservoir is a shallow impoundment freshwater reservoir (mean depth 3.5 m) covering approximately 2.8 km2 and surrounded by a catchment with diverse land uses including residential and agricultural (Gin et al 2011). Samples were obtained from 10-20cm beneath a visible algal surface scum from January 14 to 15, 2010 at 5pm, 8pm, 6am, 8am, 1pm and 3pm. 205-350 ml for each sample was filtered onto a GP 0.22μm sterivex filter (Catalog # SVGP01050) and immediately preserved in 2.5ml of RNAlater, followed by immediate refrigeration. Samples were shipped from Singapore to Cambridge, MA on dry ice where they were stored at -80°C until RNA extraction was performed.

Metadata

The environmental parameters temperature, Total Kjehldahl Nitrogen, Total Phosphate, Dissolved Organic Carbon, microcystin concentration and light intensity were measured for each sample. Light intensity was measured with a hand-held unit (Ruby electronics DT2232). Samples for nutrient concentration measurements were taken prior to filtration for RNA preservation and were separately processed by Setsco Services Pte Ltd (Singapore) using the American Public Health Association Standard Method for the determination of water and wastewater (Eaton et al 2005). The microcystin toxin levels in the surface samples (without lysing cyanobacterial cells) were measured with QuantiTube™ Kit for microcystin (Catalog # ET039) which uses a competitive Enzyme-Linked ImmunoSorbent Assay (ELISA). The calibration range for the kit is 0.4 - 2.5 ppb (parts per billion, ng/mL) and the Limit of Detection (LOD) is 0.18 ppb.

RNA extraction and cDNA synthesis

For each sample, the tip of the sterivex filter cartridge was cut open and the RNAlater was removed by pushing it through the filter with a syringe. The filter cartridge was then broken in a sterile Whirl-Pak plastic bag using a hammer to access the filter inside. Next, the filter was cut into pieces and put in 15 mg/ml lysozyme, vortexed and incubated at 37oC for 10 min. Sterile autoclaved beads were added to the tube which was bead beaten for 10min. The tubes were submerged in ice at 2 min intervals to keep cool. Tubes were spun down at 1,000 rpm for 1 min and the supernatant was collected to a new tube, which was centrifuged at 14,000 rpm for 3 min. To each pellet, 1 ml TRIzol was added, vortexed for 1 min until the pellet disappeared. Then, the tube was incubated at room temperature for 5 min. Next, 200μl chloroform was added to partition RNA into aqueous supernatant for separation, and then vortexed and incubated for 8 min. After centrifuging at 12,000x g for 15 min, aqueous phase was removed. 1μl glycogen (15mg/ml) was added (to increase precipitation efficiency from dilute RNA solutions) and then followed by 0.5 ml isopropanol to precipitate total RNA. After incubating at -80oC overnight, the tubes were centrifuged again at 14,000 rpm for 30 min. The pellets were washed with 70% ethanol, centrifuged at 12,000x g for 5 min and air-dried. Extracted total RNA pellets were resuspended in 50μl DEPC treated RNase-free water and stored at -80oC. To deplete eukaryotic mRNAs and to enrich for microbial mRNA, 100 mg of total RNA was used. PolyA subtraction of the total RNAwas carried out using the Poly(A)PuristTM MAG Kit (Catalog # AM1922).

To enrich for bacterial mRNAs and deplete eukaryotic RNA (18S, 28S rRNAs and polyadenylated mRNAs) from the polyA removed total RNA, the MICROBEnrichTM Kit (Ambion Part # AM1901) was used. Bacterial mRNA enrichment and bacterial rRNA (16S and 23S rRNAs) removal was carried out using the MICROBExpressä Kit (Ambion Part # AM1905). The removal of tRNA and 5S rRNA was done using the MEGAclearä kit (Ambion Part # AM1908). RNA remaining from the three-step subtraction above was precipitated at -80oC overnight in 0.1 volume 3 M sodium acetate, 0.02 volume glycogen (15mg/ml) and 3 volume ice cold 100% ethanol, with brief vortexing to mix. RNA was recovered by centrifugation at 13,000 rpm for 30 min. Supernatant was discarded and the pellet was twice washed with 70% ethanol (centrifuge at 13,000 rpm for 5 min). After washing, the supernatant was discarded and RNA pellets were air-dried for 5 min and suspended in 10μl of DEPC treated RNase-free water and stored at -80oC. To further remove any remaining 16S and 23S rRNA, the mRNA-ONLYä Prokaryotic mRNA Isolation Kit (EPICENTRE Biotechnologies, Cat. Nos. MOP51024) was used to further enrich for prokaryotic mRNA that would become substantially free of rRNA. The subtracted RNA remaining from the above four kits was subjected to in vitro transcription (IVT)-mediated linear amplification using the MessageAmpäII-Bacteria Kit (Ambion Part # AM1790). Amplified RNA was ethanol precipitated and re-suspended in DEPC-treated water. SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen, Catalog # 11917-020) and hexadecamers (Integrated DNA Technologies, Catalog # S1230S) were used to generate double stranded cDNA.

Illumina sequencing

cDNA (5 μg) was sheared using Adaptive Focused Acoustic technology on a Bioruptor(Diagenode, Inc.) to generate fragments of 350 bp in length. End-repair was done with Quick BluntingTM Kit (New England Biolabs, Catalog # E1201L) and the blunting reaction was purified using MinElute Reaction Cleanup Kit (Qiagen, Catalog # 28204). The Barcoded adaptors were ordered synthesized as ss DNA oligos (IDT) and were ligated to each other, prior to ligating to the sheared DNA fragment. The adaptor duplex was then ligated to the end-repaired “A”-tailed overhang ds cDNAs. Each barcoded adaptor duplex was Six unique Adaptor mixes were prepared by mixing 50 μl Adaptor PEx.1 (100 μM), 50 μl Adaptor PEx.2 (100 μM), 20 μl Oligo Hybridization 10X Buffer [500 mM NaCl, 10 mM Tris-Cl pH 8.0, 1 mM EDTA pH 8.0] and 80 μl dH2O. As each sample was uniquely barcoded with a distinct 6-nucleotide sequence, six pairs of adaptors were added to each reaction individually. The mixture was then purified using QIAquick PCR Purification Kit (Catalog # 28106) and eluted in 30 μl Elution Buffer. The ligated products were purified on a gel to remove such that unligated adaptors and self-ligated adaptors were removed. A size-range for paired end sequencing was selected by accounting consideringfor an 80 base pair (bp) adaptor and barcode (both ends together) and 200-250 bp RNA sequence. Thus each sample was excised from a 2% agarose gel at 350 bp and purified using NucleoSpin® Extract II Kit (Clontech, Catalog # 740609.50). The resulting DNA was PCR enriched using adaptor based primers to selectively enrich cDNA fragments that have adapter molecules on both ends as only these were able to attach to flow cells and generate clusters. Personnel from the MIT BioMicro Center carried out Illumina sequencing of the six libraries using a single lane of the GAII platform.

Post-sequencing processing

Using the short read toolbox (brianknaus.com/software/srtoolbox) Illumina reads from a single lane on a GAII machine were de-multiplexed based on the barcodes. The sorted reads were trimmed by removing bases following any base with a phred quality score below 33, then all duplicate reads and reads 20 bases were removed. Sequences were then screened for possible rRNA contamination by a blastn search against the large subunit and small subunit RNA databases from SILVA. Although adaptors were initially removed from the reads by the sequencing software, it is possible that short reads might still contain adaptor sequence at their ends. Thus unpaired short sequences with matches to adaptor sequences at their ends were removed. However, if a read was part of a pair they were run through a Perl script, which uses assembly information and adaptor sequence to aid in adaptor removal. Sequences containing long homopolymers were removed. Although a transcript sequence may have initially contained two read pairs after screening some pairs became singletons. For BLAST searches against NCBI’s non-redundant protein database members of paired reads were treated as separate. For rRNA analysis read pairs (called read 1 and 2) were analyzed separately and if one part of a pair matched rRNA the whole pair was considered rRNA.

Bacterioplankton community taxonomy and functional assignments

After rRNA sequences were removed, each set of sequences was used in a blastx (blastall 2.2.20) against NR, which was downloaded on August 2, 2012. Each dataset was composed of either the 1 or 2 reads. The command line for BLAST was: -m 8 -W 3 -e 20 -Q 11 -F m S. Results were then loaded into MEGAN version 4.7 (Huson et al 2007) with the parameters for the Lowest Common Ancestor assignment set as: a min support of 5, a min score of 40, top 10 percent, a win score of 0, using min complexity filter, a min complexity of 0.44 and the paired reads were loaded together. MEGAN was used to assign KEGG (Kanehisa and Goto 2000) and taxonomic classification to all sequences.

All reads classified into the top four phyla were saved in MEGAN with their original BLAST results. Comparisons of KEGG classification among these selected reads were made as five KEGG level-1 categories (“metabolism”, “genetic information processing”, “environmental information processing”, “organismal systems” and “cellular processes”) were further examined. Subcategories for “metabolism” and “cellular processes” were also further compared. Under the KEGG level-one category of “environmental information and processing”, specific types of ATP Binding Cassette (ABC) transporters, two component systems and secretion systems were identified. For each specific type, the totals were summed for each of the top four phyla, but for ABC transporters, only operons with >20 reads from at least one phylum were examined.

Ribosomal RNA analysis

Individual files with SSU matches were loaded into MG-RAST on February 1, 2013 (Meyer et al 2008). The results from “best-hit” matches against the SSU database were used to provide a taxonomic composition of each sample. The parameters were set such that threshold was an e-value cutoff of 1e-20, at >60% sequence identity and >50 bp alignment.

Calculation of RPKM values for a M. aeruginosa pan-genome

The abundance of M. aeruginosa transcripts in different Kranji Reservoir samples were compared by constructing an RPKM (reads per kilobase of exon per million mapped reads) matrix (Mortazavi et al 2008) in CLC Genomics Workbench (CLC, Denmark). The publicly available M. aeruginosa genome sequences NIES-843 (accession NC_010296) and PCC7806 (accession AM 778843- 778958) were used as reference sequences to calculate RPKM values from ortholog and paralog groups (next section). To optimize recruitment of reads to the reference genomes from M. aeruginosa, but not from other Cyanobacteria, simulated target and non-target metatranscriptomes were generated using MetaSim (Richter et al 2008) and mapped onto the M. aeruginosa reference genomes using the RNA-seq tool in CLC Genomics Workbench. A percent identity threshold that maximized recruitment of simulated transcripts from target Microcystis genomes while minimizing recruitment from non-target Nostoc punctiformes PCC73102 and Synechocystis PCC6803 genomes was selected for recruitment of reads from Kranji Reservoir (i.e. >90% nucleotide identity over >90% of the read length).

A Microcystis pan-genome consisting of core and strain-specific (i.e. flexible) genes was created for construction of the RPKM matrix for normalized transcript abundance. M. aeruginosa genome sequences from strains NIES-843 (Kaneko et al 2007) and PCC7806 (Frangeul et al 2008) were compared to identify a set of core genes found in both genomes and a set of flexible genes that are unique to each genome. The orthologs from the two genomes were determined via the Reciprocal Smallest Distance (RSD) algorithm using an e-value cutoff at <1e-5 and a minimum alignment of 80% of total length (Wall et al 2003). For genes with multiple RSD matches (i.e. genes with identical smallest distances), groups were formed, which are important for calculations of RPKM values to be explained later. All orthologs were considered as the core genes of M. aeruginosa while genes without orthologs were considered part of the flexible genome. For orthologs, the RPKM values for the two genes were summed, for ortholog pairs with identical paralogs (i.e. genes with identical smallest distances) within a genome, the sum of the RPKM values for all homologs were determined. For the final RPKM matrix, the RPKM values for the 1 and 2 reads were considered technical replicates and averaged for each time point.

Statistical analysis for differential transcript and gene set abundance

The expression matrix of the Microcystis pan-genome was analyzed within the MultiExperiment Viewer (MEV) (Saeed et al 2006). Expression values were log2 transformed before normalization. Principal Component Analysis (PCA) and Hierarchical Clustering (HCl) of Pearson correlations then clustered samples. Both PCA and HCl grouped the samples according to whether they were collected during the daytime or during the evening/night, thus a T-test and Significance Analysis for Microarrays (SAM) were conducted to test the null hypothesis of no difference in gene expression with stage of the diel cycle. Genes corresponding to nitrogen or phosphorous metabolism in the M. aeruginosa genome were selected as defined in (Harke, et al 2013) and RPKM values for different samples were clustered based on Spearman Rank correlation.

Gene Set Enrichment Analysis (GSEA)

Enrichment of transcripts in level 3 KEGG pathways under groupings emergent from PCA and HCL (i.e. “day” or “night” collection times) were determined by Gene Set Enrichment Analysis (GSEA) implemented through the online Broad Server (Subramanian et al 2005). Input files were the M. aeruginosa NIES-843 and PCC7806 pan-genome RPKM expression dataset; the gene set database where gene sets were defined as sets of KEGG Ortholog ids (KO ids) falling within the same KEGG pathways (KEGG release 65.0), or sets of secondary metabolite gene clusters identified previously (Frangeul et al 2008) or if not previously identified then by the presence of polyketide synthase (PKS) or non-ribosomal peptide synthase (NRPS) genes; and a sample ID file defining the day and night partition of the six samples. The initial gene set contained 347 level 3 KEGG pathways and five secondary metabolite gene clusters, in which individual genes may be present in more than one pathway. Genes within the RPKM matrix were assigned KO ids by using the KEGG online tool KEGG Automatic Annotation Server and by implementing single direction best blastx hit to the default gene-set representatives from KEGG and the M. aeruginosa NIES-843 genome. The bit-score for a hit was set at 60, which is the default for KEGG annotation. For genes with the same KO id the RPKM values were summed. Expression values were normalized in GSEA. The parameters in GSEA were set such that the minimum gene set that was included in the analysis was five genes and the gene sets were permuted. Enrichment of transcripts in each of the top four phyla relative to the other three top phyla was examined the same way as M. aeruginosa except KO quantities were based on absolute read counts.