COMPARATIVE GENOMICS WORKSHOP

‘RESSOURCEMENT’

BASIC RESOURCES

● General

NCBI http://www.ncbi.nlm.nih.gov/ Entrez nucleotide and protein databases; BLAST similarity search programs.

TAIR http://www.arabidopsis.org/ The Arabidopsis information resource.

TIGR http://plantta.jcvi.org/index.shtml Annotated Arabidopsis, rice, etc. genomes. TIGR Gene Indices: analysis of public EST data (contig assembly, analysis of expression patterns).

EcoliHub http://ecolihub.org/ & EcoliWiki http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki

SGD http://www.yeastgenome.org/ Saccharomyces genome database

ExPASy Translate Tool http://www.expasy.ch/tools/dna.html Translates a DNA sequence in all 6 frames

ExPASy Compute pI/Mol Wt Tool http://www.expasy.ch/tools/pi_tool.html

ExPASy AACompIdent Tool http://www.expasy.ch/tools/aacomp/ Identification of a protein from its amino acid composition.

Primer3 http://frodo.wi.mit.edu/primer3/ Primer design site

NEBcutter http://tools.neb.com/NEBcutter2/index.php Molecular Biology restriction digests site

Seq Massager http://www.attotron.com/cybertory/analysis/seqMassager.htm Cleaning up sequences for Bioinformatics platforms

ABIM http://sites.univ-provence.fr/~wabim/english/logligne.html ABIM online sequence analysis tools listing

GOLD http://www.genomesonline.org/cgi-bin/GOLD/index.cgi Resource for centralized monitoring of genome and metagenome projects

DiArk http://www.diark.org/diark/ Compilation of eukaryotic genome and EST sequencing projects
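
For readers who want to see what a six-frame translation (as offered by the ExPASy Translate Tool above) involves, here is a minimal Python sketch. It assumes the standard genetic code and an unambiguous DNA alphabet; the function names are illustrative only and are not part of any ExPASy interface.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table applies to the organism in question).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are shown as `*`; a trailing partial codon in any frame is ignored.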

● Multiple sequence alignment, phylogeny, Venn diagrams

Computational Approaches in Comparative Genomics http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.TOC&depth=1 On-line textbook by EV Koonin & MY Galperin.

Multalin Sequence Alignment http://multalin.toulouse.inra.fr/multalin/ Aligns sequences (output in color) and makes phylogenetic trees.

Clustal Omega and Phylogenetic trees http://www.ebi.ac.uk/Tools/msa/clustalo/ Aligns protein sequences and makes phylogenetic trees.

T-Coffee & M-Coffee http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi Tools for combining and comparing multiple sequence alignments.

WebLogos http://weblogo.berkeley.edu/ Creates Logos from multiple alignments

MEGA http://www.megasoftware.net/ The MEGA phylogeny program, downloads and manual.

Phylogeny.fr www.phylogeny.fr/ Very good phylogenetic tree platform for beginners; includes a great tool, BlastExplorer, for collecting sequences for alignments and trees.

iTOL Interactive tree of life

VENNY - http://bioinfogp.cnb.csic.es/tools/venny/index.html - Interactive tool for comparing lists with Venn Diagrams.
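
The list comparison behind a two-set Venn diagram can also be done offline with a few lines of Python. The sketch below is not VENNY's own code; the function name and the gene identifiers are hypothetical and serve only as an illustration.

```python
def compare_lists(list_a, list_b):
    """Split two lists of identifiers into the three regions of a Venn diagram."""
    set_a, set_b = set(list_a), set(list_b)
    return {
        "only_a": sorted(set_a - set_b),   # present only in the first list
        "shared": sorted(set_a & set_b),   # present in both lists
        "only_b": sorted(set_b - set_a),   # present only in the second list
    }


if __name__ == "__main__":
    # Hypothetical gene lists, for illustration only.
    genome_1 = ["folA", "folB", "folC", "ribA"]
    genome_2 = ["folA", "folC", "thiE"]
    for region, genes in compare_lists(genome_1, genome_2).items():
        print(region, genes)
```

Duplicates within a list are collapsed, which matches the usual meaning of a Venn comparison but hides how often an identifier was repeated.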

● Transmembrane and organellar targeting predictions

TMHMM http://www.cbs.dtu.dk/services/TMHMM/ Prediction of transmembrane helices.

TargetP http://www.cbs.dtu.dk/services/ Prediction of protein localization.

Predotar http://urgi.versailles.inra.fr/predotar/predotar.html Prediction of protein localization.

iPSORT http://hc.ims.u-tokyo.ac.jp/iPSORT/ Prediction of protein localization.

WoLF PSORT http://wolfpsort.org/ Prediction of protein localization.

Signal-3L http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/# Signal peptide prediction.

COSMOSS Ambiguous Targeting Predictor http://www.cosmoss.org/bm/ATP

● Long-range homology searches

PSI-BLAST (Position-Specific Iterated BLAST)

Phyre2 http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index Protein Homology/analogY Recognition Engine V 2.0

PSIPRED GenTHREADER http://bioinf.cs.ucl.ac.uk/psipred/ Protein structure prediction server.

MESSA http://prodata.swmed.edu/MESSA/MESSA.cgi MEta-Server for protein Sequence Analysis - provides predictions of local sequence features, spatial structure, domain architecture and function for a protein sequence

FFAS03 http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl Fold & Function Assignment System.

COMPASS http://prodata.swmed.edu/compass/compass.php COmparison of Multiple Protein sequence Alignments with assessment of Statistical Significance.

● Protein structures

PDB and PDBe http://www.pdb.org/ and http://www.ebi.ac.uk/pdbe/ PDB is the main structure database; PDBe has user-friendly features.

TargetTrack http://sbkb.org/tt/ Gives experimental progress and status of targets selected for structure determination.

MMDB http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure Molecular Modeling DataBase with >40,000 structures, linked to the rest of the NCBI databases.

● Conserved domains, motifs, and protein families

COGs http://www.ncbi.nlm.nih.gov/COG/ Clusters of Orthologous Groups (COGs), delineated by comparing protein sequences encoded in many complete genomes representing 30 major phylogenetic lineages. Each COG consists of proteins from at least 3 lineages and thus corresponds to an ancient conserved domain.

CDD http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml NCBI Conserved Domain Database

NCBI PSSM viewer http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi? Position-Specific Scoring Matrix data used for Conserved Domain Database

PFAM http://pfam.janelia.org/ Protein FAMily database

Superfamily database http://supfam.cs.bris.ac.uk/SUPERFAMILY/index.html

PRODOM http://prodom.prabi.fr/prodom/current/html/home.php PROtein DOMain families database

Seq2Ref http://prodata.swmed.edu/seq2ref/ Performs BLAST search for a query protein and retrieves the reference proteins (= experimentally studied or manually curated proteins) from NCBI, PDB and Swiss-Prot

ENZYME & METABOLIC PATHWAY RESOURCES

Swiss-Prot Enzyme http://ca.expasy.org/enzyme/ Enzyme nomenclature database (linked to SWISS-PROT protein database, BRENDA, KEGG, etc.)

IntEnz http://www.ebi.ac.uk/intenz/ Integrated relational Enzyme database

BRENDA http://www.brenda-enzymes.info/ Comprehensive enzyme database.

KEGG http://www.genome.ad.jp/kegg/ The Kyoto Encyclopedia of Genes and Genomes. Includes metabolic pathways, and compound structures that can be captured.

IUBMB http://www.chem.qmul.ac.uk/iubmb/ and the subsection on Reaction schemes http://www.chem.qmul.ac.uk/iubmb/enzyme/reaction/ The website of the International Union of Biochemistry and Molecular Biology – Searchable database on enzymes and enzyme nomenclature; some high-quality information on pathways, etc.

Thermodynamics of Enzyme-Catalyzed Reactions http://xpdb.nist.gov/enzyme_thermodynamics/

EcoSal http://www.ecosal.org/ EcoSal is a continually updated Web resource based on the ASM Press publication Escherichia coli and Salmonella: Cellular and Molecular Biology. It is a comprehensive archive of knowledge on the enteric bacterial cell and a good source of the latest knowledge of metabolic pathways.

BioCyc, EcoCyc & MetaCyc http://BioCyc.org/ EcoCyc - Encyclopedia of E. coli Genes and Metabolism; MetaCyc - Metabolic Encyclopedia. Also computationally-derived pathway/genome databases.

AraCyc http://www.arabidopsis.org/biocyc/index.jsp Similar to BioCyc, for Arabidopsis. Software allows querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway overview diagram.

MetaCrop http://pgrc-35.ipk-gatersleben.de/pls/htmldb_pgrc/f?p=269:111: Summarizes diverse information about around 40 metabolic pathways in crop plants

HMDB http://www.hmdb.ca/ Human metabolome database

COMPARATIVE GENOMICS (‘PHYLOGENOMICS’) RESOURCES

General integration platforms (genome browsers, genome comparisons (pathways or synteny), phylogenetic distribution queries, physical clustering etc.)

SEED http://www.theseed.org/wiki/Main_Page Database containing hundreds of genomes and many valuable tools.

PATRIC http://www.patricbrc.org/ Emphasis on pathogenic bacteria (but contains all sequenced bacterial genomes); multiple tools.

MicroScope https://www.genoscope.cns.fr/agc/microscope/home/index.php Microbial genome annotation platform (Strong on metabolism)

IMG http://img.jgi.doe.gov/ Integrated Microbial Genomes data analysis system (most up to date in terms of genomes)

MBGD http://mbgd.genome.ad.jp/ Microbial Genome DataBase for comparative genomics.

NMPDR http://www.nmpdr.org/cur/FIG/wiki/view.cgi/Main/WebHome National Microbial Pathogen Data Resource.

EFI http://enzymefunction.org/ The Enzyme Function Initiative (EFI) is developing a robust sequence / structure based strategy for facilitating discovery of in vitro enzymatic and in vivo metabolic / physiological functions of unknown enzymes discovered in genome projects.

To classify a sequence into a superfamily, subgroup, or family using Hidden Markov Models or a BLAST search: from the home page, click on EFI - Informatics (SFLD); click on the Search by Enzyme tab; then paste in the sequence and select HMM or BLAST.

Multiple Associations platforms

STRING http://string.embl.de/ Database of known and predicted protein-protein relationships, derived from genomic context (fusions, conserved gene clusters, co-occurrence), high throughput experiments (co-expression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.

STITCH http://stitch.embl.de/ STRING incorporating small molecules

eNet http://ecoli.med.utoronto.ca/ E. coli gene function prediction database; integrates microarray and protein interaction data.

AraNet http://www.functionalnet.org/aranet/search.html Gene associations in Arabidopsis.

DIP http://dip.doe-mbi.ucla.edu/dip/Main.cgi Database of Interacting Proteins from different organisms.

Genome Projector http://www.g-language.org/GenomeProjector/

The following two are a bit outdated but can still be useful:

PHYDBAC http://igs-server.cnrs-mrs.fr/phydbac/ PHYDBAC displays phylogenomic profiles (fusions, co-occurrence, co-localization in genome) of bacterial protein sequences. Analyzing the annotation of a protein’s phylogenomic neighbors helps generate hypothetical functions for the query protein(s).

FusionDB http://igs-server.cnrs-mrs.fr/FusionDB/main.html FusionDB is a database of bacterial and archaeal gene fusion events.

Specific for Algae

AFAT http://pathways.mcdb.ucla.edu/algal/index.html Algal Functional Annotation Tool

Phylogenetic distribution tools

JGI Phylogenetic Profiler http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm Phylogenetic Profiler for Single Genes.

MicroScope Phyloprofile Exploration https://www.genoscope.cns.fr/agc/microscope/compgenomics/phyloprofil.php?

MicrobesOnline Phyletic Pattern http://www.microbesonline.org/cgi-bin/matchphyloprofile.cgi

Regulatory sites prediction and analysis in bacteria

RegPrecise http://regprecise.lbl.gov/RegPrecise/ Regulon database (integrated in MicrobesOnline)

RegPredict http://regpredict.lbl.gov/regpredict/ Platform to discover regulator binding sites

RegTransbase http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=main Knowledge database on bacterial regulators

Promoter prediction in bacteria

http://nucleix.mbu.iisc.ernet.in/prombase/

In general, the MEME Suite is great for identifying motifs in both DNA and protein sequences:

MEME Suite http://meme.nbcr.net/meme/intro.html

GLAM2, for example, is very powerful.

Associations based on phenotypes

E. coli Phenotypic Landscape http://ecoliwiki.net/tools/chemgen/ Profiling of all viable E. coli mutants on hundreds of chemicals

Yeast Fitness database http://fitdb.stanford.edu/ Profiling of all S. cerevisiae mutants on hundreds of chemicals

MICROARRAY – RNASeq DATABASES AND ANALYSIS RESOURCES

● General

GEO http://www.ncbi.nlm.nih.gov/geo/ Gene Expression Omnibus

● Plants

Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html Good tools for getting an overview of gene expression in Arabidopsis, and for finding co-responses.

ATTED http://atted.jp/ A simple site to use to look for co-expression patterns in Arabidopsis; it shows gene networks, not just lists of correlated genes.

COXPRESdb http://coxpresdb.jp Co-expression in yeasts and animals.

GeneCAT http://genecat.mpg.de/ GeneCAT Gene Co-expression Analysis Toolbox for Arabidopsis, rice, poplar, and barley

Diurnal http://diurnal.mocklerlab.org/ Circadian/Diurnal gene expression data for an individual or set of Arabidopsis, rice, or poplar genes

Translatome eFP http://efp.ucr.edu/ Transcriptome profiling of 13 discrete Arabidopsis cell populations

PRIMe http://prime.psc.riken.jp/ Server for metabolomics and transcriptomics, tools for metabolomics, transcriptomics and integrated analysis of different omics data.

PLEXdb http://www.plexdb.org/ Plant Expression Database

Botany Array Resource http://bbc.botany.utoronto.ca/ Tools for finding co-responses, electronic Northerns.

MetaOmGraph http://metnetdb.org/MetNet_MetaOmGraph.htm Tool to plot and analyze large datasets

qteller http://qteller.com/ RNAseq data for maize, sorghum, rice. Simple tools for expression in various organs, correlation of expression of two genes.

● Bacteria

MicrobesOnline http://www.microbesonline.org/ A comprehensive database that includes correlated gene expression in E. coli and other bacteria

EcoGene http://ecogene.org/ A rich resource on E. coli that includes Microarray data on the major changes in gene expression observed in various experiments.

GenExpDB http://chase.ou.edu/oubcf/ E. coli Community Gene Expression DataBase

PortEco http://expression.porteco.org/ E. coli microarray analysis; also includes analysis of the phenotype data.

● Yeast

SPELL http://imperio.princeton.edu:3000/yeast Co-response search tool for yeast

● Mammals

BioGPS http://biogps.gnf.org/#goto=welcome

Comparing bacterial genomes

MicroScope https://www.genoscope.cns.fr/agc/microscope/home/index.php

Seedviewer http://pubseed.theseed.org/seedviewer.cgi

IMG http://img.jgi.doe.gov/

All have genome synteny viewers

ESSENTIAL GENES DATABASES (Pro- and Eukaryote)

OGEE http://ogeedb.embl.de/#summary Online GEne Essentiality database

DEG http://tubic.tju.edu.cn/deg/ Database of Essential Genes

PLANT PHENOME DATABASES

RAPID http://rarge.gsc.riken.jp/phenome/ RIKEN Arabidopsis Phenome Information Database, phenotypic data in transposon-insertional mutants.

SeedGenes http://www.seedgenes.org/ Genes that give a seed phenotype when disrupted by mutation.

Chloroplast2010 http://www.plastid.msu.edu/ Large set of phenotypic data for homozygous mutants of chloroplast genes.

BAPDB http://bioweb.ucr.edu/bapdb/ Bioassay And Phenotype DataBase

PLANT METABOLOME DATABASE

PlantMetabolomics http://tht.vrac.iastate.edu:81/ Consortium profiling the metabolome of specific T-DNA knockout alleles for targeted genes

PLANT PROTEOME DATABASES

PPDB http://ppdb.tc.cornell.edu/ The Plant Proteome DataBase

SUBA3 http://suba.plantenergy.uwa.edu.au/ SUB-cellular location database for Arabidopsis proteins (includes GFP and MS-MS data)

pep2pro http://fgcz-pep2pro.uzh.ch/ Organ-specific characterisation of the Arabidopsis proteome containing 14,522 identified proteins

AT_CHLORO http://www.grenoble.prabi.fr/at_chloro/ Stores information for proteins that have been identified in stroma, thylakoid, and envelope fractions of Arabidopsis chloroplasts

NBrowse http://www.arabidopsis.org/tools/nbrowse.jsp Arabidopsis protein-protein interaction database

UNKNOWN GENE/ENZYME DATABASES

POND http://bioweb.ucr.edu/scripts/unknownsDisplay.pl Plant Unknown-eome DB (POND) – Arabidopsis Unknown-eome

ORENZA http://www.orenza.u-psud.fr/ ORphan ENZyme Activities database (lists 1,200 orphan enzymes)

ADOMETA http://vitkuplab.cu-genome.org/html/adometa/adometa.html ADoption of Orphan METabolic Activities (Orphan enzyme activities in E. coli, B. subtilis, and S. cerevisiae).

GREP http://bisscat.org/GREP/ Generator of Reaction Equations & Pathways. Looks for reported and putative enzyme reaction equations; especially designed for finding metabolic pathways for orphan metabolites (compounds known to be present in at least one living organism, but whose synthetic/degradation pathways are unknown).

LITERATURE MINING RESOURCES

PubMed Central http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc

HighWire Press http://highwire.stanford.edu/

Google Scholar http://scholar.google.com/

eTBlast http://etest.vbi.vt.edu/etblast3/

iHOP http://www.ihop-net.org/UniPub/iHOP/ Information Hyperlinked Over Proteins

MAIZE GENOME RESOURCES

Maizesequence.org http://www.maizesequence.org/index.html Browser providing the latest sequence and annotation of the maize genome from the Maize Genome Sequencing Project

MaizeGDB http://www.maizegdb.org/ Maize genetics and genomics database

Gramene http://www.gramene.org/ Curated, open-source, data resource for comparative genome analysis of grasses

Updated 5/2/13
Links verified 5/2/13