Sequence similarity search and phylogenetic analysis

The Arthrospira platensis PCC 7345 nucleotide sequences were BLASTed against the entire cyanobacterial database (taxid: 1117) at NCBI. The resulting hits were aligned using ClustalW, and the percentage sequence similarity to the known genes was calculated. The ORFs were determined manually. Genes for which complete sequences were obtained were conceptually translated using the TRANSLATE tool. The translated amino acid sequences were used as queries to identify conserved domains with the CD-Search tool at NCBI, searching the CDD database (32,089 PSSMs) with an E-value threshold of 0.01 and the high complexity filter. The sequences thus obtained were submitted to GenBank.

A combination of sequence-based (BLASTp) and gene-annotation-based search strategies was used to obtain phylogenetically relevant sequences for further analysis. All hits above an E value of 0.0 were retained, sequences with incomplete ends were removed, and the remaining sequences were analyzed in MEGA 4.0 (Kumar et al. 2008). Trees were generated using the Neighbor-Joining method and rooted with Bacillus subtilis and Staphylococcus aureus sequences. This approach led to highly imbalanced datasets. Subsequently, phylogenetic analyses were carried out on all the proteins involved in N assimilation: trees were constructed from the protein sequences of all five N-assimilatory genes for all 39 cyanobacteria whose genomes have been sequenced. A phylogenetic tree based on the 16S rRNA gene served as a control (Gupta 2009). BLAST searches were conducted with each of these proteins to determine whether homologues were present in all 39 cyanobacteria and in the two outgroup species (Bacillus subtilis and Staphylococcus aureus) used in this work. The genomes of Arthrospira sp. 8005, Arthrospira maxima CS-328 and Arthrospira platensis strain Paraca were downloaded directly from the NCBI database, and the corresponding N-assimilatory gene entries were added to the table for similarity search and analysis. Multiple sequence alignments of these proteins were created with ClustalW and analyzed in MEGA 4.0 (Kumar et al. 2008). Because NR and NrtP are absent from most Prochlorococcus strains, Prochlorococcus was excluded from the analyses of both these proteins (Rocap et al. 2003; Martiny et al. 2009).
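In this work the ORFs were identified manually and translation was done with a web tool. Purely to illustrate what those two steps involve, the Python sketch below scans the three forward reading frames for ATG-initiated ORFs that end in a stop codon and translates them using the standard codon assignments, which bacterial translation table 11 shares. The function names `translate` and `longest_orf`, the forward-strand-only scan, the ATG-only start rule and the `min_length_aa` cut-off are hypothetical simplifications, not the procedure actually used here.

```python
# Standard codon assignments; bacterial translation table 11 uses the same
# amino-acid assignments (it differs only in its alternative start codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate codon by codon; '*' marks stops, 'X' any codon with non-ACGT bases."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def longest_orf(dna, min_length_aa=30):
    """Longest ATG...stop ORF in the three forward frames (hypothetical helper).

    Returns (start, end, protein) with a 0-based start and an end that includes
    the stop codon, or None if no ORF reaches min_length_aa residues.
    """
    dna = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":  # first ATG opens an ORF in this frame
                    start = i
            elif CODON_TABLE.get(codon) == "*":  # in-frame stop closes it
                protein = translate(dna[start:i])  # stop codon excluded
                if len(protein) >= min_length_aa and (best is None or len(protein) > len(best[2])):
                    best = (start, i + 3, protein)
                start = None
    return best
```

For example, `longest_orf("CCATGAAATTTTAAGG", min_length_aa=1)` finds the ORF in the third frame and returns `(2, 14, "MKF")`. A real clone would also need the reverse strand scanned and table 11 alternative start codons (such as GTG) considered.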

Results

The phylogenetic analysis of NrtP, NR and NiR in all 39 cyanobacterial species revealed a separate cluster containing all the Arthrospira species, which grouped with other members of the Oscillatoriales such as Trichodesmium erythraeum IMS101, Lyngbya sp. PCC 8106 and Oscillatoria sp. PCC 6506 (Fig 2 a-f). Separate clusters were also observed for freshwater and marine unicellular species. GS and GOGAT were the most conserved proteins among all the cyanobacterial species, with more than 90% query coverage and E values greater than 0.0. When this analysis was extended to vascular plants, the GOGAT protein also showed more than 57% similarity to chloroplastic Fd-GOGAT from vascular plants (data not shown). The phylogenetic analysis revealed that, although N uptake and assimilation mechanisms differ among cyanobacteria as a result of adaptation to specific habitats and environmental conditions, the genes involved in ammonia assimilation and incorporation, as shown for GS and GOGAT, have been evolutionarily conserved.
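To make the alignment statistics quoted above concrete, the sketch below computes percent identity and query coverage from a single pairwise alignment. It assumes simplified definitions (identical residues divided by alignment columns, and aligned query residues divided by the full query length). BLAST's own figures can differ, for example because query coverage is pooled over several HSPs, and percent similarity ("positives") additionally counts conservative substitutions with a positive substitution-matrix score, which this sketch does not model. The function name `alignment_stats` and the example alignment are hypothetical.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Return (percent identity, percent query coverage) for one pairwise alignment.

    Assumed, simplified definitions:
      identity = identical aligned residues / alignment columns (gap columns included)
      coverage = query residues inside the alignment / full query length
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_query)
    identities = sum(q == s and q != "-" for q, s in zip(aligned_query, aligned_subject))
    query_residues = sum(q != "-" for q in aligned_query)
    return 100.0 * identities / columns, 100.0 * query_residues / query_length


# Hypothetical 6-column alignment taken from a 10-residue query.
pid, cov = alignment_stats("MKV-LT", "MKVALS", query_length=10)
print(f"identity {pid:.1f}%, query coverage {cov:.1f}%")  # identity 66.7%, query coverage 50.0%
```

Counting gap columns in the denominator follows the usual BLAST convention for percent identity; dividing by the shorter sequence length instead would give a different, higher value for gapped alignments.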

Figure S1 - Clones obtained by PCR cloning

Agarose (1%) gel electrophoresis of EcoRI digests of clones obtained in the pGEM-T vector. The upper panel (a) shows the partial clones, with inserts of narB (2100 bp), nrtP (1550 bp), nirA (1400 bp), glnA (1500 bp) and gltS (2300 bp). The lower panel (b) shows full-length clones of the nrtA (1299 bp), nrtB (1233 bp), nrtD (831 bp) and ntcA (639 bp) genes obtained by PCR cloning.

Figure S2 (a-f) - Phylogenetic trees of genes involved in N assimilation in Arthrospira platensis PCC 7345

Phylogenetic trees of genes involved in N assimilation in Arthrospira platensis PCC 7345: (a) nrtP, (b) narB, (c) nirA, (d) glnA, (e) gltS and (f) ntcA. The trees were generated using the Neighbor-Joining method, and evolutionary analyses were conducted in MEGA 4.0. The optimal tree with the sum of branch length is shown in each panel (a-f). Each tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree; the evolutionary distances were computed using the Poisson correction method. The analysis involved 39 amino acid sequences. All positions containing gaps and missing data were eliminated.
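The distance pipeline named in this legend (complete deletion of gap and missing-data positions, Poisson-corrected amino acid distances, then Neighbor-Joining) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not MEGA's implementation: the function names are hypothetical, `-`, `?` and `X` are assumed to mark gaps or missing data, and the tree is returned unrooted in Newick format (rooting on the Bacillus subtilis and Staphylococcus aureus outgroups would be a separate step). The Poisson correction used is d = -ln(1 - p), where p is the proportion of differing sites.

```python
import math

GAP_OR_MISSING = set("-?X")  # assumed symbols for gaps / missing data


def complete_deletion(alignment):
    """Keep only columns with no gap or missing-data symbol in any sequence."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in GAP_OR_MISSING for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p), p = proportion of differing sites."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    if p >= 1.0:
        raise ValueError("p >= 1: Poisson correction is undefined")
    return -math.log(1.0 - p) if p > 0 else 0.0


def neighbor_joining(dist):
    """Neighbor-Joining (Saitou & Nei 1987) on a dict-of-dicts distance matrix.

    Returns an unrooted Newick string; requires at least three taxa.
    """
    d = {a: dict(row) for a, row in dist.items()}
    nodes = list(d)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.4f},{b}:{lb:.4f})"
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Resolve the last three nodes as an unrooted trifurcation.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


def nj_tree(alignment):
    """Complete deletion, then pairwise Poisson distances, then a Neighbor-Joining tree."""
    aln = complete_deletion(alignment)
    names = list(aln)
    dist = {a: {b: 0.0 if a == b else poisson_distance(aln[a], aln[b])
                for b in names} for a in names}
    return neighbor_joining(dist)
```

For instance, `nj_tree({"A": "ACDE", "B": "ACDF", "C": "AC-F"})` drops the gapped third column before computing distances and returns `(A:0.4055,B:0.0000,C:0.0000);`. Complete deletion keeps the distance estimates comparable across all pairs, at the cost of discarding every column in which any one of the sequences has a gap.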