Table S4. Selection of the most interesting and novel genes identified from the gvSAG annotation (the full annotation can be found in Dataset S2).

Gene Name / Putative Function / Notes (including gvSAG summary) / Reference
gvSAG-572-A11 contained many transcription initiation and termination factors. Many sequences are similar to those of the Algal Mimiviridae, with some notable differences listed below.
gvSAG-A11_0061 / Exostosin-like protein / Not an NCVOG, unlike the other exostosin-like genes in this study.
gvSAG-A11_0158 / Phosphoribulokinase / One of several in this gvSAG. Does not belong to any NCVOG. Important in the Calvin cycle.
gvSAG-A11_0124 / Uncharacterised / NCVOG0424. Only found in iridoviruses to date. This gvSAG is clearly not an iridovirus and falls in the PgV clade.
gvSAG-566-A18 characterised by repeat motifs that bear some resemblance to synucleins in their repetitive regions; these are not NCVOGs.
gvSAG-A18_0029 / Ornithine decarboxylase / NCVOG0007. First step in polyamine biosynthesis. At 270 amino acids, this would be the smallest ODC reported (previous smallest: 372 aa, from PBCV). / (1)
gvSAG-A18_0092 / Deoxyribodipyrimidine photolyase / NCVOG2204. DNA repair enzyme that repairs UV-induced pyrimidine dimers, converting them to monomers.
gvSAG-566-A22 contained several versions of NHL repeat proteins, found on contigs with no other known virus genes. They occur as multiple tandem copies, which is typical for this family.
gvSAG-A22_0080 / NTPase/helicase / NCVOG1100, unique among the gvSAGs in this study.
gvSAG-A22_0129 / Superoxide dismutase / NCVOG1326. Several viruses contain these genes, and some even package the enzyme into the virion; however, their role is still unclear. / (2)
gvSAG-566-C13 Phylogenetic analysis suggests this gvSAG is an iridovirus. However, it also has a range of genes that have not previously been associated with iridoviruses. There are several histone-like genes, and this gvSAG would be the first virus with all four eukaryotic histone-like orthologs.
gvSAG-C13_0011 / H4 histone protein / Many others in the gvSAG. This would be the first observation of a virus containing all four eukaryotic histone proteins, including H4. Lausannevirus and Marseillevirus have H2A, H2B and H3, but not H4. / (3)
gvSAG-566-F22 Characterised by proteins full of collagen repeats, which make up the majority of the hits in this gvSAG.
gvSAG-F22_0027 / Hyaluronan-binding / Also gene 67 (complete version). NCVOG2638. Not a common NCLDV gene. Thought to be involved in cell surface changes during infection that prevent further infection by other viruses. / (4)
gvSAG-F22_0115 / Collagen triple helix repeat motif-containing protein / NCVOG4533. Notable since it differs from the other collagen repeat proteins in this gvSAG.
gvSAG-566-F23 Phylogenetically falls in the Algal Mimiviridae.
gvSAG-F23_0090 / Transposase / NCVOG3063. This protein is restricted to the PgVs and belongs to a different family from the one found in gvSAG-566-O14.
gvSAG-572-I12 Phylogenetically unresolved.
gvSAG-I12_0040 / Mitochondria localization / Also gene 111. No NCVOG hit. Novel among NCLDVs. The repeated portion of the protein suggests it may be a Mitochondria Localization Sequence (pfam14962). Members of this family are putative virulence proteins of Mycoplasma and Ureaplasma species. Members share a region of sequence similarity (see TIGR04524) with protein M, a Mycoplasma genitalium protein that binds a conserved light chain region of IgG and blocks its protective function of antigen-specific binding.
gvSAG-I12_0054 / Patatin phospholipase / NCVOG0245. Required for NCLDV morphogenesis as well as for the escape of the virus from host phagosomes. / (5)
gvSAG-I12_0069 / Ribonuclease / Potential new ribonuclease for an NCLDV. Not an NCVOG.
gvSAG-566-K07 Belongs to the Algal Mimiviridae.
gvSAG-K07_0002 / Metalloproteinase / NCVOG0663. It has only been observed in mimiviruses and CroV to date. This gene may represent a link between these and PgV-like viruses.
gvSAG-K07_0035 / Chitinase-like protein / Despite not being an NCVOG, it is on a contig with several NCVOGs. It has a catalytic glycosyl hydrolase family 18 (GH18) domain of the kind found in ChiD chitinases. Of interest since it did not hit known NCVOG chitinases, so it may have a completely different but associated function. O-glycosyl hydrolases (EC 3.2.1.-), the group to which the GH18 domain belongs, are widespread enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety, so the enzyme might not use a chitin substrate.
gvSAG-566-L12 This putative Algal Mimivirus has a large mix of NCVOGs and novel virus genes. In addition, it contains several heat shock proteins, none of which have homology to those derived from NCVOGs, suggesting convergent evolution.
gvSAG-L12_0006 / Tubulin tyrosine ligase-like / Possible novel virus gene. This protein is thought to post-translationally tyrosinate tubulin and microtubules (Pfam PF03133).
gvSAG-L12_0023 / Erythromycin esterase / Novel virus gene. Why would an NCLDV encode erythromycin resistance? The gene derives from a contig with several NCVOGs and is a strong hit, hence it is unlikely to be contamination. Further analysis using an HHpred search revealed that the reference sequences with the best alignment were two Bacillus cereus genes that belong to the erythromycin esterase superfamily but had been annotated as "succinoglycan biosynthetic proteins". However, this designation has not been experimentally validated. There was also strong homology to validated erythromycin esterases (ereA and ereB, respectively). It was determined that the Bacillus proteins (Bcr) are definitely esterases, but are unable to hydrolyse erythromycin or any other macrolide antibiotic (the erythromycin family) in. The substrate that Bcr reacts with has not been experimentally verified. Our gvSAG-L12_0023 Ere-like protein has all the conserved residues in the active site pocket that are critical for esterase activity, so it is almost certainly capable of ester hydrolase activity. The general structure of the protein is also analogous to Ere, even though the sequences themselves are not that similar. However, it is unlikely that our enzyme can use macrolides as a substrate, since it is more similar to the Bacillus proteins (Bcr) of unknown function than to bona fide Ere. Based on a quick maximum-likelihood (ML) phylogenetic analysis, our gene appears to represent a third (or perhaps fourth) type of esterase in the Ere superfamily (data not shown); it is clearly related to, but probably is not, a bona fide erythromycin esterase. / (6, 7)
gvSAG-L12_0014 / GIY-YIG endonuclease / NCVOG2645, among NCLDVs only found in PgV and CroV to date. / (8)
gvSAG-566-M24 This putative Algal Mimivirus contains a combination of novel virus genes and NCVOGs.
gvSAG-M24_0032 / Sphingosine-1-phosphate lyase / A novel viral sphingolipid metabolism enzyme, responsible for the final step in sphingolipid degradation; found twice in the gvSAG (also gene 50). Low-scoring hit to NCVOG1264; however, more significant hits were to birds (waterfowl).
gvSAG-M24_0038 / Ornithine cyclodeaminase / Arginine and proline biosynthesis.
gvSAG-566-O14 This putative Mimivirus is noteworthy since it contains a wide range of novel virus genes as well as NCVOGs. It has a large number of genes encoding different ankyrin repeat proteins, plus some collagen repeat proteins. It contains several unique NCVOG hits (i.e. not observed in other gvSAGs). In addition, it contains at least 6 different tRNA synthetases, including two that have not previously been reported in NCLDVs: lysine-tRNA ligase and glutamine-tRNA ligase (gvSAG-O14_0037 and gvSAG-O14_0194).
gvSAG-O14_0072 / Reverse transcriptase / NCVOG1062, though the top hits were eukaryotic, not viral. This gene is present on a large contig with several NCVOGs, hence it is unlikely to be contamination.
gvSAG-O14_0227 / Transposase / NCVOG0321. Although transposases are found in a range of NCLDVs, this would be the first NCLDV to contain both a transposase and a reverse transcriptase, suggesting this virus may encode its own Class I retrotransposon system, which may be important to its propagation strategy.
gvSAG-O14_0025 / Sulfotransferase / Novel virus protein. Enzyme that transfers sulphate to carbohydrate groups in glycoproteins and glycolipids.
gvSAG-O14_0020 / Polysaccharide deacetylase / Novel virus protein. May be a chitinase based on the domain hit.
gvSAG-O14_0309 / Translation initiation factor 4E / NCVOG0651. Important role in translation. Together with all the tRNA synthetases, this makes the gvSAG more cellular (like the Megavirales) in that it encodes its own translational genes.

1. Morehead TA, et al. (2002) Ornithine decarboxylase encoded by chlorella virus PBCV-1. Virology 301(1):165-175.

2. Lartigue A, et al. (2015) The Megavirus chilensis Cu,Zn-superoxide dismutase: the first viral structure of a typical cellular copper chaperone-independent hyperstable dimeric enzyme. Journal of Virology 89(1):824-832.

3. Thomas V, et al. (2011) Lausannevirus, a giant amoebal virus encoding histone doublets. Environmental Microbiology 13(6):1454-1466.

4. Graves MV, et al. (1999) Hyaluronan synthesis in virus PBCV-1-infected chlorella-like green algae. Virology 257(1):15-23.

5. Yutin N, Wolf YI, Raoult D, & Koonin EV (2009) Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virology Journal 6:223.

6. Morar M, Pengelly K, Koteva K, & Wright GD (2012) Mechanism and diversity of the erythromycin esterase family of enzymes. Biochemistry 51(8):1740-1751.

7. Söding J, Biegert A, & Lupas AN (2005) The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research 33(suppl 2):W244-W248.

8. Fischer MG, Allen MJ, Wilson WH, & Suttle CA (2010) Giant virus with a remarkable complement of genes infects marine zooplankton. Proceedings of the National Academy of Sciences 107(45):19508-19513.