Supplemental Methods section for Tassy, Dauga, Daian, Sobral et al., 2010

Functional annotation of gene models.

The pipeline is run on all transcript models and the resulting information is then inherited by the gene models.

The pipeline first runs InterProScan on each transcript model and displays the organization of conserved domains along the coding part of the transcript. It then annotates each transcript model with all Gene Ontology (GO) terms detected by InterProScan. Next, it uses InParanoid V2 to identify, in the ENSEMBL proteomes, the mouse, human and Drosophila orthologs of each transcript model. Each Ciona transcript model then inherits the GO terms of its ENSEMBL orthologs. To maximize the likelihood of finding the Ciona ortholog of a gene of interest by its name, Ciona transcript models inherit all synonymous names of their ENSEMBL orthologs. Finally, each transcript model is compared to the UniProt knowledgebase (UniProt consortium 2010) using BlastP. Ciona transcripts without orthologs inherit the name of their most similar UniProt hit, preceded by an indication of the quality of the hit ("Highly similar to" for an E-value below 10^-10, "Similar to" for an E-value between 10^-10 and 10^-4, and "Weakly similar to" for an E-value between 10^-4 and 10^-3). In the absence of orthologs, the Cellular Component and Molecular Function GO terms describing the best BLAST hits are associated with the relevant Ciona transcript models.
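The naming fallback described above is a small decision rule. The following is a minimal Python sketch, assuming the BlastP E-value of the best UniProt hit is already available; the function names are hypothetical, not ANISEED code. The text does not say how values falling exactly on a threshold are handled (here 10^-10 counts as "Similar to" and 10^-4 as "Weakly similar to"), and hits weaker than 10^-3 are assumed to yield no inherited name.

```python
from typing import List, Optional, Sequence

# E-value thresholds from the text; boundary handling is an assumption.
HIGH = 1e-10
MEDIUM = 1e-4
WEAK = 1e-3

def similarity_prefix(evalue: float) -> Optional[str]:
    """Return the quality qualifier for a BlastP E-value, or None if too weak."""
    if evalue < HIGH:
        return "Highly similar to"
    if evalue < MEDIUM:
        return "Similar to"
    if evalue <= WEAK:
        return "Weakly similar to"
    return None  # assumption: no name is inherited above 10^-3

def name_transcript(ortholog_names: Sequence[str],
                    best_hit_name: Optional[str],
                    best_hit_evalue: Optional[float]) -> List[str]:
    """Hypothetical naming rule: ortholog names win; else use the best UniProt hit."""
    if ortholog_names:
        # Keep every synonymous ortholog name so any of them finds the Ciona model.
        return sorted(set(ortholog_names))
    if best_hit_name is None or best_hit_evalue is None:
        return []
    prefix = similarity_prefix(best_hit_evalue)
    return [f"{prefix} {best_hit_name}"] if prefix else []
```

For example, a transcript without orthologs whose best hit has an E-value of 10^-20 would be named "Highly similar to" followed by the UniProt name.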

Naming and description of cis-regulatory regions and their activity

Experimentally tested regulatory regions are classified as natural (genomic) or artificial (mutated or synthetic sequence) elements. Two difficulties complicate the comparison of experimental results obtained at a given genomic locus in independent studies. First, the coordinate systems used by different authors vary. We standardized the coordinates and names of wild-type genomic regions as follows: "gene name" "start of region"/"end of region" (e.g. Ci-ZicL -4037/313), where "+1" corresponds to the 5' end of the transcript model that best reflects the extensive EST information, generally the KH model. When the ANISEED coordinates differ from those of the original publication, the original coordinates are indicated in the comments field. Second, because of the high polymorphism observed in ascidian genomes, the precise sequence of experimentally tested regulatory regions published in the literature may differ from the consensus genome sequence (Dehal et al. 2002). We thus distinguished the concept of "regulatory region", which by definition corresponds to a segment of the consensus JGI version 1 genome assembly, from that of the "construct" used to test it, which consists of the actual tested regulatory sequence placed upstream of a basal promoter (when needed) and of a reporter gene. We also standardized the nomenclature of constructs (e.g. "pfog -214/-74 pbra::NLS LacZ"). Construct names start with "p" (for plasmid), followed by the gene name, the region coordinates and the basal promoter name. A standard symbol ("::") separates the cis-regulatory part of the construct from the reporter gene. As changes in basal promoters (Gehrig et al. 2009) or reporter genes may affect the outcome of functional CRM assays, precise spatio-temporal activities are associated with individual constructs. The activity of regulatory regions is characterized by a set of eight qualifiers, as indicated in the main text.
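Because both naming schemes are regular, they can be built and parsed mechanically. The sketch below is a hypothetical Python parser inferred only from the two examples given in the text ("Ci-ZicL -4037/313" and "pfog -214/-74 pbra::NLS LacZ"); the regular expressions, class and function names are assumptions, not the ANISEED implementation. The basal promoter is treated as optional, since the text says it is added only when needed.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical patterns inferred from the examples in the text.
REGION_RE = re.compile(r"^(?P<gene>\S+) (?P<start>[+-]?\d+)/(?P<end>[+-]?\d+)$")
CONSTRUCT_RE = re.compile(
    r"^p(?P<gene>\S+) (?P<start>[+-]?\d+)/(?P<end>[+-]?\d+)"  # "p" + gene + region
    r"(?: (?P<promoter>[^\s:]+))?"                            # optional basal promoter
    r"::(?P<reporter>.+)$"                                    # "::" then reporter gene
)

@dataclass(frozen=True)
class Construct:
    gene: str
    start: int
    end: int
    basal_promoter: Optional[str]
    reporter: str

def region_name(gene: str, start: int, end: int) -> str:
    """Build a standardized region name; coordinates are relative to +1."""
    return f"{gene} {start}/{end}"

def parse_region(name: str) -> Optional[Tuple[str, int, int]]:
    m = REGION_RE.match(name)
    if m is None:
        return None
    return m["gene"], int(m["start"]), int(m["end"])

def parse_construct(name: str) -> Optional[Construct]:
    m = CONSTRUCT_RE.match(name)
    if m is None:
        return None
    # m["promoter"] is None when the construct has no basal promoter.
    return Construct(m["gene"], int(m["start"]), int(m["end"]),
                     m["promoter"], m["reporter"])
```

Parsing "pfog -214/-74 pbra::NLS LacZ" with this sketch yields gene "fog", region -214/-74, basal promoter "pbra" and reporter "NLS LacZ".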

When the regulated gene is known, a cis-regulatory region is associated with the corresponding gene model. This association is optional, as pairing an enhancer with a specific gene can be difficult.

Annotation strategy for spatio-temporal expression data

Spatio-temporal expression data, which include in situ hybridization, immunohistochemistry and gene reporter assays, are described by selecting the individual anatomical ontology terms in which staining was detected. Broad expression patterns are described using high-level terms of the hierarchy, while restricted patterns use leaves of the anatomy tree. When a territory is marked, all of its sub-territories are considered to express the gene. Three levels of staining intensity can be indicated to compare staining intensities between territories for a given probe; comparing staining intensities between probes is difficult. The sub-cellular localization of the staining is described with a set of keywords (e.g. nucleus, membrane). In case of doubt about the expression status of a tissue, the annotator can flag the annotation with a question mark. When only part of a territory is stained, a "part of" term can be added.
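The rule that marking a territory implies expression in all of its sub-territories amounts to a traversal of the anatomy hierarchy. A minimal Python sketch, assuming the ontology is supplied as a hypothetical parent-to-children dictionary; the term names below are illustrative only and are not taken from the ANISEED anatomical ontology. The visited check also copes with terms reachable through more than one parent.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def expressing_territories(marked: Iterable[str],
                           children: Dict[str, List[str]]) -> Set[str]:
    """Return the marked territories plus all of their sub-territories."""
    expressing: Set[str] = set()
    queue = deque(marked)
    while queue:
        term = queue.popleft()
        if term in expressing:
            continue  # already visited via another parent
        expressing.add(term)
        queue.extend(children.get(term, []))
    return expressing

# Illustrative toy hierarchy, not the actual ANISEED anatomical ontology.
TOY_CHILDREN = {
    "epidermis": ["head epidermis", "tail epidermis"],
    "tail epidermis": ["dorsal tail epidermis", "ventral tail epidermis"],
}
```

Marking "tail epidermis" in this toy hierarchy implies expression in its dorsal and ventral sub-territories, but not in the head epidermis.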

Experimental perturbations often give partial phenotypes, in which only a fraction of experimental samples present the strongest phenotype (e.g. Fig. 6 of Yasuo and Hudson 2007). In such cases, we annotated the expression pattern with the strongest phenotype and included the quantitative graphs shown in the publication as an illustration of the frequency of this phenotype (e.g. http://aniseed-ibdm.univ-mrs.fr/insitu.php?id=2769283).

Anatomical perturbations, such as cell ablations or explants, are defined by the anatomical parts removed and the developmental stage at which the perturbation was performed. In cell ablation experiments, typically only one side of the embryo is perturbed, the other side serving as an internal control showing the wild-type expression pattern (e.g. Hudson and Yasuo 2006). To facilitate data mining, such experiments are described as if the ablation were bilateral, and the experiment is linked to a "virtual" wild-type experiment describing the pattern on the control side for the same gene.
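The way a one-sided ablation is split into two linked records can be sketched as a tiny data model. This is a hypothetical Python illustration, not the ANISEED schema: the `Experiment` fields and the `record_unilateral_ablation` helper are assumptions chosen to mirror the description (removed parts, stage of the perturbation, a bilateral description of the perturbed side, and a link to a virtual wild-type record built from the control side).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple

@dataclass
class Experiment:
    gene: str
    stage: str                          # stage at which expression was scored
    expressed_in: FrozenSet[str]        # anatomical terms with staining
    removed_parts: FrozenSet[str] = frozenset()
    perturbation_stage: Optional[str] = None
    virtual: bool = False               # True for the derived wild-type record
    control: Optional["Experiment"] = None

def record_unilateral_ablation(gene: str, stage: str, removed_parts: Iterable[str],
                               perturbation_stage: str, perturbed_side: Iterable[str],
                               control_side: Iterable[str]) -> Tuple[Experiment, Experiment]:
    """Store a one-sided ablation as if bilateral, linked to a virtual wild-type record."""
    wild_type = Experiment(gene, stage, frozenset(control_side), virtual=True)
    ablation = Experiment(gene, stage, frozenset(perturbed_side),
                          removed_parts=frozenset(removed_parts),
                          perturbation_stage=perturbation_stage,
                          control=wild_type)
    return ablation, wild_type
```

Keeping the control side as a separate, flagged record lets queries over wild-type patterns include it while still distinguishing it from independently performed wild-type experiments.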

Biological curation pipeline

Expression data in ANISEED have three main origins: large scale screens imported from other databases, manually selected literature data, and unpublished data communicated by members of the ascidian community. These three types of data use different pipelines.

Large scale data:

Ciona large-scale in situ hybridization data reported in 8 articles (Imai et al. 2004; Kusakabe et al. 2002; Miwata et al. 2006; Mochizuki et al. 2003; Nishikata et al. 2001; Ogasawara et al. 2002; Satou et al. 2002; Fujiwara et al. 2002) were imported from the GHOST Database (Satou et al. 2002) via NISEED Manager scripts. All Halocynthia in situ hybridization data originated from the study by Makabe and colleagues (2001) and were likewise downloaded from the MAGEST database (Kawashima et al. 2002) and imported via specific scripts. Information from 68 cis-regulatory regions and corresponding gene reporter activity was directly imported from DBTGR (Sierro et al. 2006).

The main difficulty in importing data from these databases was establishing correspondences between the anatomical terms originally used to describe the expression patterns and the ANISEED anatomical ontology. We first partially re-annotated the automatically imported data to take advantage of the more precise anatomical ontology of ANISEED, in particular for epidermal and neural tissues. This partial re-annotation focused primarily on the expression profiles of transcription factors and signaling molecules (Imai et al. 2004) and on gene reporter activity (DBTGR). Finally, most GHOST pictures of transcription factor and signaling molecule expression, and most DBTGR pictures of gene reporter activity, have been reprocessed to show, for each stage, a representative, oriented embryo picture (5,260 expression profiles re-annotated). This re-annotation and curation effort is ongoing for the other classes of high-throughput data.

Data extracted from the literature:

Individual experiments and cis-regulatory information described in 160 articles were entered via the NISEED Curator. ANISEED biological annotators selected papers of interest and, with the authors' agreement, manually entered molecular information via specific annotation pages of the NISEED Curator. Annotations are then checked and, when needed, amended by the biological curator before their public release on the Developmental Browser website. Annotations can be updated or refined even after release on the public website.

Unpublished expression data:

Users from the ascidian community can contact us to open an annotator account. They can then enter unpublished data via the NISEED Curator or, for larger-scale submissions, via a modified Excel spreadsheet that can be downloaded from the "About/Submit your data" section of the website. Data from completed Excel forms are automatically processed by a dedicated parser and enter the curation pipeline. Users are provided with a template that indicates the required information and facilitates data entry via keyword selection lists, and with a detailed user guide (see the "Submit your data" section). User-contributed data have a "Private" status by default. They enter the curation pipeline when the user clicks the "Submit to curation pipeline" button in the Curator. After verification by the ANISEED curators, the data become available on the public website. In recognition of the importance of their contribution, several members of the community were offered authorship of this article (SD, JSJ, LC, HA, CL, CH and UR).
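The life cycle of a user submission can be read as a small state machine: private by default, moved into curation by the submitter, and made public after curator verification. The Python sketch below is a hypothetical model of that description; the state names, action names and `advance` helper are assumptions, not the actual NISEED Curator implementation.

```python
from enum import Enum

class Status(Enum):
    PRIVATE = "private"          # default status of user-contributed data
    IN_CURATION = "in curation"  # after "Submit to curation pipeline"
    PUBLIC = "public"            # after verification by ANISEED curators

DEFAULT_STATUS = Status.PRIVATE

# Hypothetical allowed transitions: (current status, action) -> new status.
TRANSITIONS = {
    (Status.PRIVATE, "submit"): Status.IN_CURATION,
    (Status.IN_CURATION, "verify"): Status.PUBLIC,
}

def advance(status: Status, action: str) -> Status:
    """Apply an action to a submission, rejecting transitions not in the table."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} from {status.value!r}") from None
```

In this model a curator cannot publish data the user has not submitted, because (PRIVATE, "verify") is absent from the transition table.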

Automatic inference of GRNs

To infer individual regulatory relationships, we compared the expression patterns of putative target (T) genes in wild-type conditions and following loss of function of a putative regulatory (R) gene, obtained by injection of a specific antisense Morpholino or by treatment with a pharmacological antagonist. Only experiments in which wild-type and experimentally modified expression patterns were determined in the same experiment were considered.

A regulatory interaction was inferred as follows: if loss of function of gene R affected the expression of gene T at stage S in a part A of its expression domain, then gene R was considered to regulate gene T in A, directly or indirectly, by stage S. The type of regulation (positive or negative) was deduced from the down- or up-regulation of the target gene T, respectively. The regulatory interaction was considered to take place at a stage S1 ≤ S, corresponding to the onset of expression of T in A.

The regulation of gene T by gene R in territory A was inferred to be direct when a cis-regulatory region driving T expression in A at stage S1 was known and contained functional binding sites for R (i.e. mutation of these binding sites affected the activity of the element).
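The inference rules above can be condensed into one function. The following is a minimal Python sketch under stated assumptions: stages are represented as comparable integers, expression in each territory is an integer intensity score with 0 meaning "not detected", onsets of T expression per territory are known, and the known functional binding-site evidence is given as a set of (R, T, territory, S1) tuples. All names are hypothetical; this is an illustration of the rules, not the ANISEED code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class Interaction:
    regulator: str
    target: str
    territory: str
    stage: int     # S1: onset of target expression in the territory (S1 <= S)
    sign: str      # "positive" if T is down-regulated after loss of R, else "negative"
    direct: bool   # True only if a known CRM carries functional R binding sites

def infer_interactions(regulator: str, target: str, stage: int,
                       wild_type: Dict[str, int], loss_of_function: Dict[str, int],
                       onset: Dict[str, int],
                       functional_sites: Set[Tuple[str, str, str, int]]) -> List[Interaction]:
    """Compare T expression per territory in WT vs loss of R, scored in the same experiment."""
    interactions = []
    for territory in sorted(set(wild_type) | set(loss_of_function)):
        wt = wild_type.get(territory, 0)
        lof = loss_of_function.get(territory, 0)
        if wt == lof:
            continue  # losing R has no effect on T in this territory
        sign = "positive" if lof < wt else "negative"
        # Assumption: without a recorded onset, the interaction is placed at S itself.
        s1 = min(onset.get(territory, stage), stage)
        direct = (regulator, target, territory, s1) in functional_sites
        interactions.append(Interaction(regulator, target, territory, s1, sign, direct))
    return interactions
```

For instance, if T is lost from a territory where it starts at stage 10 and a CRM with functional R sites is known there, the sketch reports a direct positive interaction at stage 10; ectopic T expression elsewhere is reported as an indirect negative interaction.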

References:

Dehal, P. et al. 2002. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298: 2157-2167.

Fujiwara, S., Maeda, Y., Shin-i, T., Kohara, Y., Takatori, N., Satou, Y., and Satoh, N. 2002. Gene expression profiles in Ciona intestinalis cleavage-stage embryos. Mech. Dev. 112: 115-127.

Gehrig, J. et al. 2009. Automated high-throughput mapping of promoter-enhancer interactions in zebrafish embryos. Nat. Methods 6: 911-916.

Hirano, T., and Nishida, H. 1997. Developmental fates of larval tissues after metamorphosis in ascidian Halocynthia roretzi. I. Origin of mesodermal tissues of the juvenile. Dev. Biol. 192: 199-210.

Hirano, T., and Nishida, H. 2000. Developmental fates of larval tissues after metamorphosis in the ascidian, Halocynthia roretzi. II. Origin of endodermal tissues of the juvenile. Dev. Genes Evol. 210: 55-63.

Hudson, C., and Yasuo, H. 2006. A signalling relay involving Nodal and Delta ligands acts during secondary notochord induction in Ciona embryos. Development 133: 2855-2864.

Imai, K.S., Hino, K., Yagi, K., Satoh, N., and Satou, Y. 2004. Gene expression profiles of transcription factors and signaling molecules in the ascidian embryo: towards a comprehensive understanding of gene networks. Development 131: 4047-4058.

Kawashima, T., Kawashima, S., Kohara, Y., Kanehisa, M., and Makabe, K.W. 2002. Update of MAGEST: Maboya Gene Expression patterns and Sequence Tags. Nucleic Acids Res. 30: 119-120.

Kusakabe, T. et al. 2002. Gene expression profiles in tadpole larvae of Ciona intestinalis. Dev. Biol. 242: 188-203.

Makabe, K.W. et al. 2001. Large-scale cDNA analysis of the maternal genetic information in the egg of Halocynthia roretzi for a gene expression catalog of ascidian development. Development 128: 2555-2567.

Miwata, K., Chiba, T., Horii, R., Yamada, L., Kubo, A., Miyamura, D., Satoh, N., and Satou, Y. 2006. Systematic analysis of embryonic expression profiles of zinc finger genes in Ciona intestinalis. Dev. Biol. 292: 546-554.

Mochizuki, Y., Satou, Y., and Satoh, N. 2003. Large-scale characterization of genes specific to the larval nervous system in the ascidian Ciona intestinalis. Genesis 36: 62-71.

Nishikata, T., Yamada, L., Mochizuki, Y., Satou, Y., Shin-i, T., Kohara, Y., and Satoh, N. 2001. Profiles of maternally expressed genes in fertilized eggs of Ciona intestinalis. Dev. Biol. 238: 315-331.

Ogasawara, M., Sasaki, A., Metoki, H., Shin-i, T., Kohara, Y., Satoh, N., and Satou, Y. 2002. Gene expression profiles in young adult Ciona intestinalis. Dev. Genes Evol. 212: 173-185.

Satou, Y., Takatori, N., Fujiwara, S., Nishikata, T., Saiga, H., Kusakabe, T., Shin-i, T., Kohara, Y., and Satoh, N. 2002. Ciona intestinalis cDNA projects: expressed sequence tag analyses and gene expression profiles during embryogenesis. Gene 287: 83-96.

Satou, Y., Imai, K.S., and Satoh, N. 2004. The ascidian Mesp gene specifies heart precursor cells. Development 131: 2533-2541.

Sierro, N., Kusakabe, T., Park, K., Yamashita, R., Kinoshita, K., and Nakai, K. 2006. DBTGR: a database of tunicate promoters and their regulatory elements. Nucleic Acids Res. 34: D552-D555.

Shirae-Kurabayashi, M., Nishikata, T., Takamura, K., Tanaka, K.J., Nakamoto, C., and Nakamura, A. 2006. Dynamic redistribution of vasa homolog and exclusion of somatic cell determinants during germ cell specification in Ciona intestinalis. Development 133: 2683-2693.

Tokuoka, M., Satoh, N., and Satou, Y. 2005. A bHLH transcription factor gene, Twistlike 1, is essential for the formation of mesodermal tissues of Ciona juveniles. Dev. Biol. 288: 387-396.

UniProt consortium. 2010. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38: D142-D148.

Yasuo, H., and Hudson, C. 2007. FGF8/17/18 functions together with FGF9/16/20 during formation of the notochord in Ciona embryos. Dev. Biol. 302: 92-103.