Cysteine and methionine metabolism and its regulation in dairy starter and related bacteria redefined at higher resolution

Mengjin Liu1,2†, Celine Prakash2‡, Arjen Nauta1, Roland J Siezen2,3,4,5,6, Christof Francke2,4,5,6,*

1. FrieslandCampina Research, Deventer, the Netherlands

2. Center for Molecular and Biomolecular Informatics (260), NCMLS, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands

3. NIZO food research, Ede, the Netherlands

4. Kluyver Center for Genomics of Industrial Fermentation, Delft, The Netherlands

5. Netherlands Bioinformatics Center, Nijmegen, The Netherlands

6. TI Food and Nutrition, Wageningen, The Netherlands

Corresponding author mailing address: RUNMC, CMBI, P.O.Box 9101, 6500HB, Nijmegen, the Netherlands. Email:

†present address: Hero-Huishan Nutrition Co., Ltd., CA10 Shangri-La, 8 Pubei Road, Shenbei, Shenyang, China. ‡present address: Laboratory of Gene Regulation and Inflammation, Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Biopolis, Immunos #04-00, 8A Biomedical Grove, Singapore 138648

Keywords: Lactic acid bacteria, transcription regulation, sulphur metabolism, T-box, LysR-family
ABSTRACT

Sulphuric volatile compounds provide many dairy products with a characteristic odour and taste. The volatile compounds mainly originate from the catabolism of the sulphur-containing amino acids cysteine and methionine by the lactic acid bacteria applied as starter cultures. To better understand and control the environmental dependencies of sulphuric volatile compound formation by starter bacteria, we have used the available genome sequence and experimental information to systematically evaluate the presence of the key enzymes and to reconstruct the general modes of transcription regulation for the corresponding genes in species of the order Lactobacillales. The genomic organization of the key genes is suggestive of a sub-division of the reaction network into five modules, where we observed distinct differences in the modular composition between the families Lactobacillaceae, Enterococcaceae and Leuconostocaceae and the family Streptococcaceae. These differences are mirrored by the way transcription regulation of the genes is structured in these families. In the Lactobacillaceae, Enterococcaceae and Leuconostocaceae, the main shared mode of transcription regulation is Met T-box (methionine) mediated regulation. In addition, the gene metK, encoding S-adenosylmethionine (SAM) synthetase, is controlled via the SMK-box (SAM). The SMK-box is also found upstream of metK in species of the family Streptococcaceae. In that family, however, the transcription control of the other modules is mediated via three different LysR-family regulators, MetR/MtaR (methionine), CmbR (O-acetyl-(homo)serine) and HomR (O-acetyl-homoserine). Redefinition of the associated DNA binding motifs helped to identify and disentangle the related regulons, which appeared to perfectly match the proposed sub-division of the reaction network.

INTRODUCTION

Many of the characteristic flavours in fermented dairy products such as cheese and yoghurt are the result of metabolic reactions involving sulphur-containing amino acids. The micro-organisms applied in these products degrade cysteine and methionine, resulting in the production of flavour components such as methanethiol, dimethyl sulphide (DMS), dimethyl disulphide (DMDS) and dimethyl trisulphide (DMTS). Insight into the regulatory signals and pathways that control the metabolic fluxes involved in the formation of these flavour compounds and their precursors is essential to rationally control and steer the flavour profiles of these dairy products.

The micro-organisms used to produce fermented dairy products belong to the taxonomic order Lactobacillales, which includes the families Enterococcaceae, Lactobacillaceae, Leuconostocaceae, and Streptococcaceae. Many of the respective species produce lactic acid and are therefore known as the lactic acid bacteria (LAB). The transcription of genes encoding the proteins that are involved in cysteine and methionine metabolism in lactic acid bacteria and other Lactobacillales is controlled by both regulator-binding and RNA riboswitch mechanisms. In various Streptococcaceae the LysR-family transcription regulators MtaR and CmbR have been shown to be involved in activation as well as repression of genes such as cysD, cysK, metA, metC, metE, and metF (e.g. for Lactococcus lactis (24, 68) and Streptococcus mutans (35, 66)). The transcription regulator HomR was reported to control the expression of metB in S. mutans and Streptococcus thermophilus (67).

In addition, three types of riboswitches for the regulation of cysteine and methionine metabolism have been reported for low-GC Gram-positive bacteria: the T-box, the S-box and the SMK-box (14, 21, 57, 76, 77). A riboswitch is a structural sequence element at the 5’ untranslated region of an mRNA molecule that can change conformation depending on the binding of an effector molecule. The conformational change can terminate transcription (when forming a terminator structure) or allow read through (when forming an anti-terminator structure) (73, 75). In the case of the T-box, a terminator structure is formed shortly after transcription initiation unless an uncharged tRNA related to a specific amino acid binds to the specifier codon present in the T-box element, whereupon the anti-terminator structure is formed (see (28)). In the case of the S-box and the SMK-box the terminator structure is formed in the presence of S-adenosylmethionine (SAM), whereas in the absence of this molecule transcription will continue (21, 26, 46).

Several studies describe the regulation of sulphur-containing amino acid metabolism for specific LAB and other closely related Gram-positive bacteria. For instance, Hullo et al. (29) reported on the regulatory mechanisms related to cysteine and methionine conversions in Bacillus subtilis. Sperandio et al. described these relations for Lactococcus lactis (68) and Streptococcus mutans (66, 67). Rodionov et al. (57) and Kovaleva and Gelfand (35) performed a comprehensive comparative in silico study for the transcriptional regulators CmbR and MtaR within Gram-positive bacteria. However, the availability of additional experimental and sequence data now allows an overview of the transcriptional control of the key enzymes involved in cysteine and methionine metabolism at a higher resolution. We therefore decided to extend the latter studies and to focus our efforts on the LAB and other Lactobacillales.

In a previous study, we improved the annotation of key enzymes involved in the metabolism of cysteine and methionine in LAB using genome-wide comparative analyses (40). Here, we extend the list of enzymes on the basis of the pathway information present in the KEGG database (31). Redefinition of the binding motifs for CmbR, MetR/MtaR and HomR in Lactococci and Streptococci allowed the identification of transcription factor specific binding sites for these regulators. Also, Met-specific T-boxes and SMK-boxes were identified in recently sequenced and published genomes of e.g. L. bulgaricus, L. reuteri and L. casei. The absence of S-boxes (SAM-I) in the Lactobacillales, as observed hitherto, was confirmed. Potential structure-forming elements associated with the cysK gene and the hom-thrBC operon in various LAB were revealed, as presented below.

MATERIALS AND METHODS

Genomic information, Tools and Data. Genomic information was retrieved from the ERGO resource (as of Dec 2009 (52)) and from the NCBI microbial genome database (as of September 2011 (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi)). BLAST searches were performed according to (1). Multiple sequence alignments and neighbour-joining trees (corrected for multiple substitutions) were generated using ClustalX (36). BioEdit was used to manipulate the alignments and to toggle between translated protein and nucleotide sequence (version 7.0.9, http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Hidden Markov Models (HMMs) were made and genome-wide HMM searches were performed using the HMMER package (http://hmmer.janelia.org/)(13). Genome context was visualized and upstream sequence data was collected using Microbial Genome Viewer 2 (version 1 in (33), version 2 [http://mgv2.cmbi.ru.nl/genome/index.html]; Overmars unpublished). Potential transcription factor binding sites and other regulatory elements of fixed composition and size were searched using the Similar Motif Search approach described by (18). The original data supporting the analyses presented in this paper can be found at www.cmbi.ru.nl/bamics/supplementary/Liuetal_2012_CysMetregulation.

Collection of genes related to cysteine and methionine metabolism. The species and strains that were analysed included all complete Lactobacillales genomes published before June 2011 and present within the NCBI database. The KEGG map ‘cysteine and methionine metabolism’ (map 00270) was used to define a core set of enzyme activities. The set extends the set of enzymes we previously defined (40). The protein sequences of experimentally verified members of the set (supplementary file 1) were used to search for orthologs/functional equivalents in other species using BLAST. An orthologous relationship and/or functional equivalency was defined on the basis of our earlier analyses (40), BLAST e-values, and in some cases multiple sequence alignments followed by clustering on the basis of neighbour-joining (as described in (71)). The complete list of enzymes, their function annotation and the relevant experimental literature is given below. The annotation data was taken from the NCBI (55), KEGG (31) and PFAM (17) reference databases. The enzymes have been grouped in five clusters on the basis of the composition of the related operons and shared EC numbers.

- Enzymes group 1: homoserine dehydrogenase (hom, EC 1.1.1.3, COG0460E, K00003, PF03447 and 00742 (7, 43, 53)); homoserine kinase (thrB, EC 2.7.1.39, COG0083E, K00872, PF08544 and 00288 (43)); aspartate kinase III (thrA (Bsubtilis_yclM), EC 2.7.2.4, COG0527E, K00928, PF01842 and 00696 (7, 34)); threonine synthase (thrC, EC 4.2.3.1, COG0498E, K01733, PF00291 (44, 61, 64, 70)).

- Enzymes group 2: serine acetyltransferase (cysE, EC 2.3.1.30, COG1045E, K00640, PF06426 (22, 29, 68)); homoserine O-acetyltransferase (metA, EC 2.3.1.31, COG1897E, K00651, PF04204 (80)); cysteine synthase A and cysteine synthase-like protein (Bsubtilis_cysK and Bsubtilis_ytkP, EC 2.5.1.47, COG0031E, K01738, PF00291 (24, 29, 74)); cystathionine gamma-synthase and O-acetylhomoserine (thiol)-lyase (Bsubtilis_yjcL, EC 2.5.1.48, COG0626E, K01739, PF01053 (3, 32)); O-acetyl-L-homoserine sulfhydrolase and O-acetyl-L-serine sulfhydrolase (cysD, EC 2.5.1.49, COG2873E, K01740, PF01053); cystathionine beta-synthase for the reverse transsulfurase pathway (Bsubtilis_yrhA, EC 4.2.1.22, COG0031E, K01738, PF00291 (29)); cystathionine beta/gamma-lyase and homocysteine gamma-lyase (Bsubtilis_yrhB Ecoli_metB, EC 2.5.1.48 and 4.4.1.8, COG0626E, K01760, PF01053 (12, 16, 29, 30)); cystathionine beta/gamma-lyase (Bsubtilis_yjcJ, EC 4.4.1.8 and 4.4.1.1, COG0626E, K01760, PF01053 (3)); PLP-dependent C-S lyase (Bsubtilis_patB Llactis_ytjE Ecoli_malY, EC 4.4.1.8 and 4.4.1.1, COG1168E, K14155, PF00155 (2, 30, 45)).

- Enzymes group 3: 5,10-methylenetetrahydrofolate reductase (metF, EC 1.5.1.20, ?, K00297, PF02219 (62)); bifunctional homocysteine S-methyltransferase 5,10-methylenetetrahydrofolate reductase protein (Bsubtilis_yitJ, EC 2.1.1.10 and 1.5.1.20, COG0646E (cobalamin dependent), K00547, PF02219 and 02574 (41)); homocysteine S-methyltransferase (mmuM, EC 2.1.1.10, COG2040E, K00547, PF02574 (72)); MmuM associated amino acid permease (mmuP, COG0833E, K03293, PF00324); methyltransferase (Bsubtilis_yxjG and Bsubtilis_yxjH Llactis_yhcE, EC 2.1.1.14?, COG0620E (cobalamin-independent), K00548, PF01717 (9, 37)); 5-methyltetrahydropteroyltriglutamate--homocysteine S-methyltransferase (metE, EC 2.1.1.14, COG0620E (cobalamin-independent), K00549, PF08267 and 01717 (20, 25)); S-ribosylhomocysteinase (luxS Llactis_ycgE, EC 4.4.1.21, COG1854T, K07173, PF02664 (37, 56)).

- Enzymes group 4: C-5 cytosine-specific DNA methylase and SP-beta prophage DNA (cytosine-5-)-methyltransferase (Bsubtilis_ydiO Bsubtilis_ydiP Bsubtilis_mtbP, EC 2.1.1.37, COG0270L, K00558, PF00145 (50, 78)); 5'-methylthioadenosine nucleosidase and S-adenosylhomocysteine nucleosidase (mtn Streptococci_pfs, EC 3.2.2.16 and 3.2.2.9, COG0775F, K01243, PF01048 (10)).

- Enzymes group 5: S-adenosylmethionine synthetase (metK, EC 2.5.1.6, COG0192H, K00789, PF02773 and 00438 (21, 47)).

Identification of putative regulatory elements and their regulons. Cis-regulatory elements were defined according to the specific footprinting method set out by Francke et al. (19). The method relies on the definition of Groups Of Orthologous Functional Equivalents (GOOFEs) on the basis of orthology and conserved genomic context. The comparative linear genome maps generated by the Microbial Genome Viewer were used to visualize and inspect the context. For every GOOFE, the upstream regions (normally ~200 nucleotides) were collected and conserved sequence elements were searched by eye in a multiple sequence alignment and by using MEME (4). The conserved elements were compared and potential regulatory regions identified. In case the conserved elements resembled transcription factor binding motifs reported in literature, experimental data on regulators of the same protein family was searched directly via PubMed (59) or in the reference databases RegulonDB (23) and DBTBS (63). Because members of the same regulator-protein family will in general adopt the same fold, the DNA-binding motif should be similar (i.e. similar composition, and the same size and spacing). Therefore, established binding motifs of regulator-protein family members were taken into account to define the actual binding motif. In addition, we defined the motifs such that they obey general constraints imposed by the molecular nature of the binding process and the helical nature of the DNA molecule. Since most regulator proteins bind to the DNA as a dimer, a binding site will in general be made up of two monomer binding sites and will have to be either palindromic or represent a direct repeat. Moreover, since the DNA is helical, the actual monomer binding site in general has to be shorter than 7 nucleotides and the two sites that make up the dimer binding site have to be interspaced by a fixed number of nucleotides.
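The symmetry constraint on dimer binding sites can be illustrated with a minimal sketch. The function names, the mismatch tolerance and the example sequences below are hypothetical and are not taken from the footprinting method of (19); the sketch only checks whether two half-sites of a given length, separated by a spacer, form an inverted repeat (palindrome) or a direct repeat.

```python
# Illustrative sketch only: classify a candidate dimer binding site.
# Names, tolerance and example sequences are assumptions, not the published method.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_dimer_site(site: str, half_len: int, max_mismatch: int = 0):
    """Return (kind, spacer_length) for a site made of two half-sites.

    kind is 'palindrome' when the left half-site matches the reverse
    complement of the right half-site, 'direct repeat' when both
    half-sites match as written, and 'neither' otherwise.
    """
    site = site.upper()
    if len(site) < 2 * half_len:
        raise ValueError("site is shorter than two half-sites")
    left, right = site[:half_len], site[-half_len:]
    spacer = len(site) - 2 * half_len
    if mismatches(left, reverse_complement(right)) <= max_mismatch:
        return "palindrome", spacer
    if mismatches(left, right) <= max_mismatch:
        return "direct repeat", spacer
    return "neither", spacer
```

For example, `classify_dimer_site("TGACAAAAAATGTCA", 5)` treats `TGACA` and `TGTCA` as half-sites separated by five nucleotides and reports an inverted repeat. In practice the spacer length would be held fixed across all members of a motif, as described above.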

The defined motifs (given in supplementary file 2) were converted to a position frequency matrix, which was used directly to score potential transcription factor binding sites and other regulatory elements of fixed composition and size. In this way the score of any DNA sequence relates directly to its similarity to the input motif. We have validated and used this approach with success to identify potential binding sites of CcpA and Spo0A in low-GC Gram-positive organisms and the sigma-54 promoter in all organisms (18). A cut-off score of >83% relative similarity and a position of at most around 200 nucleotides upstream of the translation start (with some exceptions) were used to select potential binding sites for the various regulators. The uniform cut-off score was chosen to limit the number of false positive assignments, i.e. such that experimentally validated sites were included and the number of correctly positioned sites was high (position in terms of distance and orientation with respect to the translation start of the downstream gene). The identified regulatory elements were related to all genes present in the downstream operon, where an operon was defined as those genes on the same strand that are separated by an intergenic region of less than 250 nucleotides that does not contain a termination signal. The analyzed results of the motif searches are given in supplementary file 3.
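The scoring and operon rules described above can be sketched as follows. This is an illustration under stated assumptions and not the Similar Motif Search implementation of (18): here "relative similarity" is assumed to be the summed column frequency of a window, rescaled between the lowest and highest score the matrix can give. All function names and toy inputs are hypothetical, only the forward strand is scanned, and the presence of a termination signal is supplied as a flag rather than predicted.

```python
# Illustrative sketch only; the definition of relative similarity is an assumption.
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pfm(sites: List[str]) -> List[Dict[str, float]]:
    """Column-wise base frequencies from aligned sites of equal length."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(width):
        column = [s[i].upper() for s in sites]
        pfm.append({b: column.count(b) / len(sites) for b in BASES})
    return pfm


def relative_similarity(pfm: List[Dict[str, float]], window: str) -> float:
    """Window score rescaled between the worst and best achievable score."""
    score = sum(col.get(base, 0.0) for col, base in zip(pfm, window.upper()))
    best = sum(max(col.values()) for col in pfm)
    worst = sum(min(col.values()) for col in pfm)
    if best == worst:
        return 1.0
    return (score - worst) / (best - worst)


def scan_sequence(pfm, sequence: str, cutoff: float = 0.83) -> List[Tuple[int, str, float]]:
    """Return (start, window, similarity) for windows scoring above the cut-off."""
    width = len(pfm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        sim = relative_similarity(pfm, window)
        if sim > cutoff:
            hits.append((start, window, sim))
    return hits


def group_operons(genes, max_gap: int = 250) -> List[List[str]]:
    """Group genes sorted by start coordinate into putative operons.

    Each gene is (name, strand, start, end, terminator_in_gap), where the
    flag states whether a termination signal lies between this gene and
    the preceding one.
    """
    operons: List[list] = []
    for gene in genes:
        name, strand, start, end, terminator_in_gap = gene
        if operons:
            prev = operons[-1][-1]
            if prev[1] == strand and start - prev[3] < max_gap and not terminator_in_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return [[g[0] for g in operon] for operon in operons]
```

A hit returned by `scan_sequence` would then be assigned to every gene of the operon that starts downstream of it, provided the hit lies within roughly 200 nucleotides of the translation start of the first gene.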

Identification of riboswitches and other structural elements. Hidden Markov Models were constructed for the T-box motif and for the S-box (SAM-I) motif on the basis of the available literature (57, 76, 77). Both HMMs were used to scan the selected genomes (cut-off e-value 1 (37)) and the locations of putative boxes were identified. The amino acid specificity of the detected T-boxes was established on the basis of the specifier codon as described by Wels et al. (77) and exemplified in Figure S1. Two characteristic SMK-box sequences were defined on the basis of (21), as given in Figure 2A, and these were used to scan the selected genomes using the Similar Motif Search procedure (results in supplementary file 3). A site was considered a putative SMK-box only when both motifs were found directly upstream of a gene and were complementary to each other.
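The final SMK-box criterion (both motifs upstream of the same gene and complementary to each other) might be checked along the lines of the sketch below. The thresholds (`max_upstream`, `min_paired`), the acceptance of G-U wobble pairs, the coordinate convention and all names are assumptions made for illustration; the procedure actually applied is the one summarised above and in supplementary file 3.

```python
# Illustrative sketch only; thresholds and pairing rules are assumptions.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def paired_fraction(seg5: str, seg3: str, allow_wobble: bool = True) -> float:
    """Fraction of positions where seg5 pairs with seg3 in antiparallel orientation.

    Both segments are given 5'->3'; DNA input (T) is converted to RNA (U).
    """
    a = seg5.upper().replace("T", "U")
    b = seg3.upper().replace("T", "U")[::-1]
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    return sum((x, y) in allowed for x, y in zip(a, b)) / len(a)


def is_putative_smk_box(hit_a, hit_b, gene_start: int,
                        max_upstream: int = 200, min_paired: float = 0.8) -> bool:
    """Decide whether two motif hits form a candidate SMK-box.

    Each hit is (start, end, sequence) in 0-based, half-open coordinates
    on the coding strand of the gene that begins at gene_start.
    """
    (s1, e1, seq1), (s2, e2, seq2) = sorted([hit_a, hit_b])
    upstream = e2 <= gene_start and gene_start - s1 <= max_upstream
    non_overlapping = e1 <= s2
    return upstream and non_overlapping and paired_fraction(seq1, seq2) >= min_paired
```

The toy segments `GGGAGG` and `CCUCCC`, for instance, pair at every position, so two such hits located within the assumed upstream window would be accepted.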