Supplementary Material

Evolution of cell–cell signaling in animals: did late horizontal gene transfer from bacteria have a role?

Lakshminarayan M. Iyer1, L. Aravind1, Steven L. Coon2, David C. Klein2 and Eugene V. Koonin1

1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

2Section on Neuroendocrinology, Laboratory of Developmental Neurobiology, National Institute of Child Health and Human Development, Bethesda, MD 20894, USA

Corresponding author: Eugene V. Koonin ().

Phyletic patterns of enzymes involved in messenger metabolism: identification of orthologous sets

The sequences of vertebrate enzymes catalyzing each step in the messenger-metabolism pathways were obtained from the GenBank database and used as seeds to initiate PSI-BLAST searches [1] of the non-redundant (nr) protein sequence database at the National Center for Biotechnology Information (NCBI) to detect their probable orthologs and other homologs across the three primary kingdoms (bacteria, archaea and eukaryota). In addition to the GenBank database, specialized databases of genomic and EST sequences from individual organisms were screened for possible additional orthologs of the messenger-metabolism enzymes using the appropriate version of the BLAST program [1]. The phyletic patterns were extracted from the BLAST search results using the Tax_Break program of the SEALS package [2]. When non-vertebrate (or non-metazoan) eukaryotic homologs were detected but showed less significant similarity to the vertebrate enzymes than bacterial homologs did, we hypothesized that these eukaryotic proteins were not orthologs but rather paralogs of the analyzed vertebrate enzymes. Note that the absence of orthologs can be inferred with confidence only for complete genomes.

To test this hypothesis, reciprocal searches were performed in which the non-vertebrate or non-metazoan homologs of the vertebrate messenger-metabolism enzymes were used as queries. If these searches showed that the most similar vertebrate (or metazoan) sequence was not the original query but a different protein, it was concluded that the respective eukaryotic lineage lacked an ortholog of the vertebrate messenger-metabolism enzyme. The results of this analysis are summarized in Table 1. For example, the search with the human catechol O-methyltransferase (COMT) sequence as the query did not yield a significantly similar homolog from nematodes or arthropods. Therefore, the most similar sequences from these lineages were identified by limiting the BLAST search space to sequences from the respective taxa; even in this case, the similarity was not statistically significant (Table 1), but examination of the conserved motifs in the alignments indicated that the detected nematode and arthropod proteins were, indeed, methyltransferases. On the basis of these observations alone, it is impossible to rule out that the detected nematode and arthropod proteins are highly diverged orthologs of the human protein. However, the reverse database searches started with each of these sequences showed highly significant similarity to human methyltransferases other than COMT (Table 1). These methyltransferases are the most likely human orthologs of the nematode and arthropod proteins with the highest similarity to vertebrate COMTs, whereas the vertebrate COMTs have no orthologs in nematodes and arthropods.
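The two-step logic described above (flag a non-vertebrate eukaryotic homolog as a possible paralog when it scores worse than the best bacterial homolog, then check it with a reverse search) can be sketched as follows. This is a minimal illustration, assuming each search has already been reduced to a list of hits carrying an accession, a lineage label and an E-value; the `Hit` record, the lineage labels and all function names are hypothetical and do not reproduce the output format of BLAST or of the SEALS Tax_Break program.

```python
from dataclasses import dataclass

# Hypothetical hit record; field names are illustrative, not a BLAST or SEALS format.
@dataclass
class Hit:
    accession: str
    lineage: str   # e.g. "vertebrates", "nematodes", "arthropods", "fungi", "plants", "bacteria"
    evalue: float

# Lineages treated as non-vertebrate eukaryotes in this sketch (an assumption).
NON_VERTEBRATE_EUKARYOTES = {"nematodes", "arthropods", "fungi", "plants"}


def best_hit(hits, lineage=None):
    """Lowest-E-value hit, optionally restricted to one lineage (a taxon-delimited search)."""
    pool = [h for h in hits if lineage is None or h.lineage == lineage]
    return min(pool, key=lambda h: h.evalue) if pool else None


def candidate_paralogs(forward_hits):
    """Non-vertebrate eukaryotic hits whose E-value is worse than that of the best bacterial hit."""
    best_bacterial = best_hit(forward_hits, "bacteria")
    if best_bacterial is None:
        return []
    return [
        h.accession
        for h in forward_hits
        if h.lineage in NON_VERTEBRATE_EUKARYOTES and h.evalue > best_bacterial.evalue
    ]


def has_ortholog(query, reverse_hits):
    """Reciprocal test: reverse_hits are assumed to be human proteins only;
    the homolog counts as an ortholog only if its best human hit is the original query."""
    top = best_hit(reverse_hits)
    return top is not None and top.accession == query
```

In this sketch, ties in E-value are resolved by list order, which a real analysis would need to handle explicitly; the taxon-delimited search of the text corresponds to calling `best_hit` with a `lineage` argument.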

The case of HIOMT was different in that the most similar proteins from plants and fungi retrieved HIOMT as their best hit among the vertebrate proteins. However, the plant homologs, which appear to be 3'-hydroxy-N-methylcoclaurine 4'-O-methyltransferases, are most closely related to a distinct set of bacterial methyltransferases, particularly those from cyanobacteria (ZP_00109917 from Nostoc punctiforme, E = 2e-35), which points to an acquisition from the chloroplast (Figure 2b in the main article and Figure 8). Thus, despite the existence of a symmetrical best hit, plants do not seem to have orthologs of vertebrate HIOMT. The origin of the fungal HIOMT homolog is less clear but, in the phylogenetic trees, it clusters with the plant homologs and the respective set of bacterial proteins, suggesting an origin distinct from that of the vertebrate HIOMTs.

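Similarly, in the case of monoamine oxidase (MAO), the fungal homologs were placed outside the cluster consisting of the vertebrate MAOs, the slime mold ortholog and a diverse set of bacterial orthologs in the phylogenetic trees (Figure 10), suggesting that, despite the symmetrical best hit (Table 1), the fungal proteins are not orthologs of vertebrate MAO.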

Phylogenetic tree analysis

Methods

Multiple protein sequence alignments were constructed using the T-Coffee program [3], and positions containing more than 70% gaps were excluded. Phylogenetic analyses were carried out using the maximum-likelihood (ML), neighbor-joining and least-squares methods. Simple neighbor-joining trees were constructed using the MEGA package [4]. In addition, weighted neighbor-joining trees with corrections for long-branch effects were constructed using the WEIGHBOR program [5].

Whenever the total number of taxa was manageable, full ML trees were constructed using the PROML program of the Phylip package (100 bootstrap replicates and global rearrangements) [6]. For this procedure, the program TreePuzzle 4.0.2 [7] was used to estimate, from each dataset, the parameters of a gamma correction for among-site rate variation plus a correction for invariant sites (eight plus one rate categories). The parameters thus estimated were used as input for the user-defined rate-variation models in PROML.

For additional distance-tree analyses, TreePuzzle 4.0.2 (gamma distribution with eight variable-rate categories and one invariant category) was used to calculate ML distance matrices, together with Puzzleboot for 1000 replicates (www.tree-puzzle.de) [7]. The resampled matrices were then analyzed using the FITCH program of the Phylip package (with global rearrangements and the species input order jumbled ten times) and the WEIGHBOR program, and the consensus of the resulting trees was obtained using the Consense program of Phylip. Alternatively, a least-squares tree was constructed using the FITCH program [6,8] with the ML distance matrices as input, followed by local rearrangement using the ProtML program of the Molphy package [9,10], to arrive at the ML tree. The statistical significance of the nodes of this ML tree was assessed using the resampling of estimated log-likelihoods bootstrap (PROTML RELL-BP) with 10 000 replicates. Bayesian posterior-probability trees were constructed using the MrBayes 3 program [11].
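The column-filtering step applied to the T-Coffee alignments can be illustrated with a short sketch. It assumes that the aligned sequences are available as equal-length strings and that columns with a gap fraction strictly greater than 70% are the ones removed; the function name, the `max_gap_fraction` parameter and the treatment of '.' as a gap character are our assumptions, not part of the published pipeline.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.7, gap_chars="-."):
    """Remove alignment columns whose fraction of gap characters exceeds max_gap_fraction.

    alignment: list of equal-length aligned sequences (strings).
    Returns the filtered sequences in the same order.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    n_seqs = len(alignment)
    # Keep a column only if its gap fraction is at or below the threshold (assumed cutoff).
    kept = [
        col
        for col in range(length)
        if sum(seq[col] in gap_chars for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in kept) for seq in alignment]
```

For example, `filter_gappy_columns(["AC-D", "A--D", "A--E", "AG-E"])` drops only the third column, which consists entirely of gaps.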
The Kishino–Hasegawa, Shimodaira–Hasegawa, weighted Kishino–Hasegawa, weighted Shimodaira–Hasegawa, approximately unbiased and Bayesian posterior-probability tests of alternative phylogenetic hypotheses were performed using the Consel program [12] (Tables 4–6).
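To show what the simplest of these tests measures, the sketch below implements a two-sided Kishino–Hasegawa test using the normal approximation: it compares two trees through the per-site differences in log-likelihood, estimating the variance of the total difference from those site differences. This is a simplified illustration and not the procedure implemented in Consel, which derives its p-values by resampling; the function name and input format (lists of per-site log-likelihoods for the same alignment under two trees) are assumptions. The KH test is intended for trees specified a priori and is known to be biased when one of the compared trees is the ML tree, which is one motivation for the Shimodaira–Hasegawa and approximately unbiased tests.

```python
import math


def kh_test(site_lnl_a, site_lnl_b):
    """Two-sided Kishino-Hasegawa test (normal approximation) for two trees.

    site_lnl_a, site_lnl_b: per-site log-likelihoods of one alignment under tree A and tree B.
    Returns (delta, p_value), where delta = lnL(A) - lnL(B).
    """
    if len(site_lnl_a) != len(site_lnl_b) or len(site_lnl_a) < 2:
        raise ValueError("need two equally long lists of per-site log-likelihoods")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference: n times the sample variance of the site differences.
    var_delta = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_delta == 0.0:
        # All sites differ by the same amount: no sampling variation to test against.
        return delta, (1.0 if delta == 0.0 else 0.0)
    z = delta / math.sqrt(var_delta)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, p_value
```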

Results

Table 3 lists the characteristics of the protein sequence alignments used for phylogenetic tree construction and of the resulting trees. The trees for all genes that are candidates for horizontal gene transfer (HGT) from bacteria to eukaryotes (Figure 3a) showed consistent and significant support for the bacterial–eukaryotic clade below the node corresponding to the putative HGT. The trees are shown in Figures 1–12. Alternative phylogenetic hypotheses were tested for HIOMT (Table 4), monoamine oxidase (MAO) (Table 5) and COMT (Table 6). In all cases, the tests strongly supported the critical internal node on which the HGT hypothesis depends. In particular, strong support was obtained for the grouping of the vertebrate HIOMT with the archaeo–bacterial clade that includes the TCMO protein from Streptomyces glaucescens (Table 4). All tests also supported the grouping of the vertebrate–slime mold MAOs with a bacterial clade typified by the Pflu5344 protein from Pseudomonas fluorescens (Table 5). In the case of COMT, there was strong support for the grouping of the animal–fungal COMT with a bacterial group typified by the bll4229 protein from Bradyrhizobium japonicum (Table 6).
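For the resampling-based support values, support for a critical node is the proportion of replicate trees that recover the bacterial–eukaryotic clade. A minimal sketch of that calculation follows, assuming that all replicate trees are rooted on the same outgroup and written as nested Python tuples of leaf names (a hypothetical stand-in for the Newick files produced by the Phylip programs); the taxon names are placeholders, not sequences from this study. For unrooted trees, the same comparison would have to be made on bipartitions rather than on rooted clades.

```python
def clades(tree):
    """Leaf sets of all internal nodes of a rooted tree written as nested tuples of leaf names."""
    found = []

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def clade_support(replicate_trees, clade):
    """Fraction of replicate trees containing a node whose leaf set equals `clade`."""
    target = frozenset(clade)
    return sum(target in clades(t) for t in replicate_trees) / len(replicate_trees)


# Placeholder replicate trees; three of the four recover the eukaryote-plus-bacteria clade.
replicates = [
    ((("vertebrate_enzyme", "bacterium_A"), "bacterium_B"), "outgroup"),
    ((("vertebrate_enzyme", "bacterium_B"), "bacterium_A"), "outgroup"),
    (("vertebrate_enzyme", ("bacterium_A", "bacterium_B")), "outgroup"),
    (("vertebrate_enzyme", ("bacterium_A", "outgroup")), "bacterium_B"),
]
```

Here `clade_support(replicates, {"vertebrate_enzyme", "bacterium_A", "bacterium_B"})` returns 0.75, that is, 75% support in the bootstrap convention used in Table 3.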

Data availability: protein sequence alignments used for phylogenetic analysis are available from the authors upon request.

Table 1. Missing orthologs of vertebrate enzymes of first-messenger metabolism in other eukaryotic lineages^a
Human enzyme / Other vertebrates (non-mammal), best hit / Nematodes, best hit / Nematodes, reciprocal human best hit / Arthropods, best hit / Arthropods, reciprocal human best hit / Fungi, best hit / Fungi, reciprocal human best hit / Plants, best hit / Plants, reciprocal human best hit / Bacteria, best hit
Catechol O-methyltransferase (COMT, AAA68929) / 2.3e-77, AAH49292, X. laevis / 2.2 (taxon-delimited search), AAK70661, C. elegans, O-methyltransferase / 1.2e-30, AAQ88840, uncharacterized, predicted methyltransferase / 7.2 (taxon-delimited search), AAF51216, D. melanogaster, uncharacterized, predicted methyltransferase / 3.5e-32, Q9Y5N5, predicted N6-methyltransferase / 4.7e-36, CAD31744, S. pombe / 3e-36, COMT / 1e-05, AAM65527, A. thaliana / 1e-30, BAB85077, uncharacterized, predicted methyltransferase / 2.6e-32, AAK46012, M. tuberculosis
Serotonin N-acetyltransferase (AANAT, NP_001079) / 4e-73, P79774, Chicken / 0.61, CAE64790, C. briggsae, uncharacterized, putative acetyltransferase / 1e-30, NP_932332, glucosamine-phosphate N-acetyltransferase / 0.3 (taxon-delimited search), EAA09480, A. gambiae, uncharacterized, putative acetyltransferase / 9e-60, AAH11267, predicted uncharacterized acetyltransferase / 2e-06, T39187, S. pombe / 1e-06, AANAT / Not detectable even in taxon-delimited searches / NA / 0.36 (1e-15 in second iteration), AAN68191, P. putida
Histamine N-methyltransferase (HNMT, P50135) / 1e-103, AAH54281, X. laevis / Not detectable even in taxon-delimited searches / NA / Not detectable even in taxon-delimited searches / NA / 0.064, AAR90259, Cochliobolus heterostrophus, polyketide synthase / 2.8e-88 (taxon-delimited search), AAH63242, fatty acid synthase / Not detectable even in taxon-delimited searches / NA / 0.072 (2.9e-05 in second iteration), ZP_00074498, T. erythraeum
Phenylethanolamine N-methyltransferase (PNMT, P11086) / 4e-51, AAH61684, X. laevis / 2e-12, CAA98282, C. elegans / 7e-17, PNMT / 0.42, AAF47531, D. melanogaster, predicted methyltransferase / 6e-61, NP_859076, predicted methyltransferase / 0.36, Q9P7L6, S. pombe, predicted methyltransferase / 1e-39, NP_859076, predicted methyltransferase / 1.59 (taxon-delimited search), BAC98693, predicted methyltransferase / Not detectable even in taxon-delimited searches / 0.001 (second iteration), CAB69746, S. coelicolor
Hydroxyindole O-methyltransferase (HIOMT, P46597) / 4e-108, Q92056, Chicken / Not detectable even in taxon-delimited searches / NA / Not detectable even in taxon-delimited searches / NA / 6e-11, EAA66994, A. nidulans / 5e-11, HIOMT / 9e-17, AAP45314, P. somniferum, S-adenosyl-L-methionine:3'-hydroxy-N-methylcoclaurine 4'-O-methyltransferase 2 / 3e-15, HIOMT / 4e-54, NP_864498, Pirellula sp.
Dopamine β-hydroxylase (DBH, P09172) / 3e-79, CAB75354, Chicken / 4e-94, CAB17071, C. elegans / 2e-94, DBH / 1e-108, EAA03589, Anopheles / 1e-108, DBH / Not detectable even in taxon-delimited searches / NA / Not detectable even in taxon-delimited searches / NA / 0.12, BAB41569, S. aureus
Monoamine oxidase (MAO, P21397) / 0.0, P49253, Trout / 6e-15, CAE72847, C. briggsae, predicted oxidase / 1e-103, BAC03663, predicted oxidase / 5e-09, EAA00081, Anopheles, predicted oxidase / 0.0, AAH48134, amine oxidoreductase / 8e-34, EAA63259, A. nidulans / 1e-37, MAO / 1e-16, AAO85405, A. thaliana, predicted oxidase / 4e-47, BAC03663, predicted oxidase / 4e-66, O53320, M. tuberculosis
^a For each of the messenger-metabolism enzymes, the random expectation (E) value of the best hit to a protein from the respective lineage is given (e-n = 10^-n). In all cases where non-significant or marginally significant E-values (>0.01) are indicated, the relevance of the sequence similarity was additionally supported by iterative PSI-BLAST searches, reverse searches and manual examination of conserved motifs. For each protein, the GenBank accession number and the activity, if distinct from that of the respective enzyme of messenger metabolism, are indicated; some of the predictions are our own (the respective proteins are denoted as uncharacterized in GenBank). Homologs whose reverse search retrieved, as the best hit among human proteins, a protein other than the original query were marked by gray shading in the typeset table; they correspond to the reciprocal human best hit entries that name a protein other than the query enzyme. For all bacterial homologs, the reverse search retrieved the original query as the best hit (data not shown).
Table 2. Phyletic patterns of amino acid metabolism enzymes^a
Enzyme / Gene / Pathway / Bacteria / Archaea / Plants / Fungi / Animals: nematodes / Animals: insects / Animals: urochordates / Animals: vertebrates / Other eukaryotes
Glycine and serine hydroxymethyltransferase / GlyA / Gly and Ser interconversion / + / + / + / + / + / + / + / + / + (Lma, Pfa)
Phosphoserine phosphatase / SerB / Ser metabolism / + / + / + / + / + / + / + / + / -
Phosphoglycerate dehydrogenase / SerA / Ser metabolism / + / + / + / + / + / + / + / + / + (Lma, Eca)
Glycine cleavage system H protein / GcvH / Gly degradation / + / + / + / + / + / + / + / + / + (Pfa)
Glycine cleavage system P protein / GcvP / Gly degradation and NO synthesis^b / + / + / + / + / + / + / + / + / -
Glutamate and leucine dehydrogenase / GdhA / Glu and Leu biosynthesis / + / + / + / + / + / + / + / + / + (Giin, Pfa, Eca)
Glutamine synthetase / GlnA / Glu and nucleotide metabolism / + / + / + / + / + / + / + / + / + (Pfa)
Ornithine carbamoyltransferase / ArgF / Arg biosynthesis / + / + / + / + / + / + / + / + / + (Giin, Pfa, Ddi, Tcr)
Chorismate synthase / AroC / Aromatic amino acid biosynthesis / + / + / + / + / - / - / - / - / + (Pfa, Tgo)
5-enolpyruvylshikimate-3-phosphate synthase / AroA / Phe, Tyr and Trp biosynthesis / + / + / + / + / - / - / - / - / -
Tryptophan synthase / TrpA, TrpB / Trp biosynthesis / + / + / + / + / - / - / - / - / -
Methionine synthase II / MetE / Met biosynthesis / + / + / + / + / - / - / - / (Mm?) / + (Len)
Homoserine dehydrogenase / ThrA / Thr and Met biosynthesis / + / + / + / + / - / - / - / - / -
Aspartokinase / LysC / Thr and Met biosynthesis / + / + / + / + / - / - / - / - / -
^a Abbreviations: Cisa, Ciona savignyi; Ddi, Dictyostelium discoideum; Giin, Giardia intestinalis; Len, Leishmania enriettii (fragment only); Lma, Leishmania major; Mm, Mus musculus; Pfa, Plasmodium falciparum; Phpo, Physarum polycephalum; Tgo, Toxoplasma gondii; Tcr, Trypanosoma cruzi; Eca, Entodinium caudatum; Tepy, Tetrahymena pyriformis (fragment only); Tbr, Trypanosoma brucei.
^b Plant NOS.

Table 3. Characteristics of the alignments used for phylogenetic analysis and the resulting trees^a

Enzyme / Number of alignment positions used for tree analysis / Number of species analyzed / Critical-node support^b, FullML / Critical-node support, Bayes / Critical-node support, Puzboot / Critical-node support, FitchML
Aromatic amino acid decarboxylases / 538 / 58 / 100 / 1.00 / 100 / 100
Glutamate decarboxylase / 538 / 58 / 100 / 1.00 / 100 / 100
Amino acid hydroxylases / 240 / 43 / 100 / <0.5 / 100 / 85
NOS oxygenase domain / 363 / 14 / 98 / 0.99 / 93 / 97
Catechol O-methyltransferase / 278 / 55 / 71 / 0.7 / 76 / 95
Serotonin N-acetyltransferase / 166 / 24 / 100 / 0.89 / 100 / 100
Histamine N-methyltransferase / 323 / 20 / 100 / 0.97 / 62 / 100
Phenylethanolamine N-methyltransferase / 283 / 15 / NA / NA / NA / NA
Hydroxyindole O-methyltransferase / 385 / 40 / 94 / 0.98 / 93 / 88
Dopamine β-hydroxylase / 321 / 29 / NA / NA / NA / NA
Monoamine oxidase / 535 / 38 / 80 / 0.95 / 45 / 83
Choline acetyltransferase / 779 / 41 / NA / NA / NA / NA
Acetylcholinesterase / 546 / 70 / 72 / 0.7 / 66 / 92
^a Abbreviations: Bayes, Bayesian posterior probability; FullML, full maximum likelihood; Puzboot, Puzzleboot; FitchML, maximum-likelihood local optimization of the minimum evolution tree; NA, not applicable.
^b The critical node, where applicable, is the node that defines the bacterial–eukaryotic clade on which the horizontal gene transfer (HGT) hypothesis for the respective enzyme is based (see Methods for details).

Table 4. Tests for alternative phylogenetic hypotheses for hydroxyindole O-methyltransferase (HIOMT)^a