COMPARATIVE GENOMICS WORKSHOP
‘RESSOURCEMENT’
BASIC RESOURCES
● General
NCBI http://www.ncbi.nlm.nih.gov/ Entrez nucleotide and protein data bases; Blast similarity search programs.
TAIR http://www.arabidopsis.org/ The Arabidopsis information resource.
TIGR http://plantta.jcvi.org/index.shtml Annotated Arabidopsis, rice, etc. genomes. TIGR Gene Indices: analysis of public EST data (contig assembly, analysis of expression patterns).
EcoliHub http://ecolihub.org/ & EcoliWiki http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki
SGD http://www.yeastgenome.org/ Saccharomyces genome database
ExPASy Translate Tool http://www.expasy.ch/tools/dna.html Translates a DNA sequence in all 6 frames
ExPASy Compute PI/Mol Wt Tool http://www.expasy.ch/tools/pi_tool.html
ExPASy AACompIdent Tool http://www.expasy.ch/tools/aacomp/ Identification of a protein from its amino acid composition.
Primer3 http://frodo.wi.mit.edu/primer3/ Primer design site
NEBcutter http://tools.neb.com/NEBcutter2/index.php Molecular Biology restriction digests site
Seq Massager http://www.attotron.com/cybertory/analysis/seqMassager.htm Cleaning up sequences for Bioinformatics platforms
ABIM http://sites.univ-provence.fr/~wabim/english/logligne.html ABIM online sequence analysis tools listing
GOLD http://www.genomesonline.org/cgi-bin/GOLD/index.cgi Resource for centralized monitoring of genome and metagenome projects
DiArk http://www.diark.org/diark/ Compilation of eukaryotic genome and EST sequencing projects
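As a rough illustration of what a six-frame translation (as offered by the ExPASy Translate Tool above) involves, here is a minimal Python sketch. It assumes the standard genetic code and an unambiguous DNA sequence; the function names are illustrative only and are not taken from any of the tools listed. Codons containing anything other than A, C, G, or T are reported as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the sequence uses the standard code, not a mitochondrial or plastid variant).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are shown as `*`, so open reading frames can be spotted by eye in each frame.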
● Multiple sequence alignment, phylogeny, Venn diagrams
Computational Approaches in Comparative Genomics http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.TOC&depth=1 On-line textbook by EV Koonin & MY Galperin.
Multalin Sequence Alignment http://multalin.toulouse.inra.fr/multalin/ Aligns sequences (output in color) and makes phylogenetic trees.
ClustalOmega and Phylogenetic trees http://www.ebi.ac.uk/Tools/msa/clustalo/ Aligns protein sequences and makes phylogenetic trees.
T-Coffee & M-Coffee http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi Tools for combining and comparing multiple sequence alignments.
WebLogos http://weblogo.berkeley.edu/ Creates Logos from multiple alignments
MEGA http://www.megasoftware.net/ The MEGA phylogeny program, downloads and manual.
Phylogeny.fr http://www.phylogeny.fr/ Very good phylogenetic tree platform for beginners; includes a great tool, BlastExplorer, for collecting sequences for alignments and trees.
iTOL Interactive tree of life
VENNY http://bioinfogp.cnb.csic.es/tools/venny/index.html Interactive tool for comparing lists with Venn diagrams.
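The list comparison behind a two-set Venn diagram can be reproduced with ordinary set operations. The sketch below is only an illustration, not how VENNY itself is implemented; the function name and the gene identifiers are made up for the example.

```python
def venn_regions(list_a, list_b):
    """Split two lists of identifiers into the three regions of a 2-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {
        "only_a": sorted(a - b),   # present in the first list only
        "both": sorted(a & b),     # shared by the two lists
        "only_b": sorted(b - a),   # present in the second list only
    }


if __name__ == "__main__":
    # Hypothetical gene lists, e.g. hits from two different searches.
    hits_1 = ["geneA", "geneB", "geneC"]
    hits_2 = ["geneB", "geneC", "geneD"]
    for region, members in venn_regions(hits_1, hits_2).items():
        print(region, members)
```

Duplicates within a list are collapsed, which is usually what is wanted when comparing gene lists.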
● Transmembrane and organellar targeting predictions
TMHMM http://www.cbs.dtu.dk/services/TMHMM/ Prediction of transmembrane helices.
TargetP http://www.cbs.dtu.dk/services/ Prediction of protein localization.
Predotar http://urgi.versailles.inra.fr/predotar/predotar.html Prediction of protein localization.
iPSORT http://hc.ims.u-tokyo.ac.jp/iPSORT/ Prediction of protein localization.
WoLF PSORT http://wolfpsort.org/ Prediction of protein localization.
Signal-3L http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/# Signal peptide prediction.
COSMOSS Ambiguous Targeting Predictor http://www.cosmoss.org/bm/ATP
● Long-range homology searches
PSI-BLAST (Position-Specific Iterated BLAST)
Phyre2 http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index Protein Homology/analogY Recognition Engine V 2.0
PSIPRED GenTHREADER http://bioinf.cs.ucl.ac.uk/psipred/ Protein structure prediction server.
MESSA http://prodata.swmed.edu/MESSA/MESSA.cgi MEta-Server for protein Sequence Analysis - provides predictions of local sequence features, spatial structure, domain architecture and function for a protein sequence
FFAS03 http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl Fold & Function Assignment System.
COMPASS http://prodata.swmed.edu/compass/compass.php COmparison of Multiple Protein sequence Alignments with assessment of Statistical Significance.
● Protein structures
PDB and PDBe http://www.pdb.org/ and http://www.ebi.ac.uk/pdbe/ PDB is the main structure database; PDBe has user-friendly features.
TargetTrack http://sbkb.org/tt/ Gives experimental progress and status of targets selected for structure determination.
MMDB http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure Molecular Modeling DataBase with >40,000 structures, linked to the rest of the NCBI databases.
● Conserved domains, motifs, and protein families
COGs http://www.ncbi.nlm.nih.gov/COG/ Clusters of Orthologous Groups (COGs), delineated by comparing protein sequences encoded in many complete genomes representing 30 major phylogenetic lineages. Each COG consists of proteins from at least 3 lineages and thus corresponds to an ancient conserved domain.
CDD http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml NCBI Conserved Domain Database
NCBI PSSM viewer http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi? Position-Specific Scoring Matrix data used for Conserved Domain Database
PFAM http://pfam.janelia.org/ Protein FAMily database
Superfamily database http://supfam.cs.bris.ac.uk/SUPERFAMILY/index.html
PRODOM http://prodom.prabi.fr/prodom/current/html/home.php PROtein DOMain families database
Seq2Ref http://prodata.swmed.edu/seq2ref/ Performs BLAST search for a query protein and retrieves the reference proteins (= experimentally studied or manually curated proteins) from NCBI, PDB and Swiss-Prot
ENZYME & METABOLIC PATHWAY RESOURCES
Swiss-Prot Enzyme http://ca.expasy.org/enzyme/ Enzyme nomenclature data base (linked to SWISS-PROT protein database, BRENDA, KEGG, etc)
IntEnz http://www.ebi.ac.uk/intenz/ Integrated relational Enzyme database
BRENDA http://www.brenda-enzymes.info/ Comprehensive enzyme database.
KEGG http://www.genome.ad.jp/kegg/ The Kyoto Encyclopedia of Genes and Genomes. Includes metabolic pathways, and compound structures that can be captured.
IUBMB http://www.chem.qmul.ac.uk/iubmb/ and the subsection on reaction schemes http://www.chem.qmul.ac.uk/iubmb/enzyme/reaction/ The website of the International Union of Biochemistry and Molecular Biology. Searchable database on enzymes and enzyme nomenclature; some high-quality information on pathways, etc.
Thermodynamics of Enzyme-Catalyzed Reactions http://xpdb.nist.gov/enzyme_thermodynamics/
EcoSal http://www.ecosal.org/ EcoSal, a new, continually updated Web resource based on the ASM Press publication Escherichia coli and Salmonella: Cellular and Molecular Biology. EcoSal is a comprehensive archive of knowledge on the enteric bacterial cell and a good source of the latest knowledge of metabolic pathways.
BioCyc, EcoCyc & MetaCyc http://BioCyc.org/ EcoCyc - Encyclopedia of E. coli Genes and Metabolism; MetaCyc - Metabolic Encyclopedia. Also computationally-derived pathway/genome databases.
AraCyc http://www.arabidopsis.org/biocyc/index.jsp Similar to BioCyc, for Arabidopsis. Software allows querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway overview diagram.
MetaCrop http://pgrc-35.ipk-gatersleben.de/pls/htmldb_pgrc/f?p=269:111: Summarizes diverse information about around 40 metabolic pathways in crop plants
HMDB http://www.hmdb.ca/ Human metabolome database
COMPARATIVE GENOMICS (‘PHYLOGENOMICS’) RESOURCES
General integration platforms (genome browsers, genome comparisons (pathways or synteny), phylogenetic distribution queries, physical clustering etc.)
SEED http://www.theseed.org/wiki/Main_Page Database containing hundreds of genomes and many valuable tools.
PATRIC http://www.patricbrc.org/ Emphasis on pathogenic bacteria (but contains all sequenced bacterial genomes); multiple tools.
MicroScope https://www.genoscope.cns.fr/agc/microscope/home/index.php Microbial genome annotation platform (Strong on metabolism)
IMG http://img.jgi.doe.gov/ Integrated Microbial genomes data analysis system (most up to date in terms of genomes)
MBGD http://mbgd.genome.ad.jp/ Microbial Genome DataBase for comparative genomics.
NMPDR http://www.nmpdr.org/cur/FIG/wiki/view.cgi/Main/WebHome National Microbial Pathogen Data Resource.
EFI http://enzymefunction.org/ The Enzyme Function Initiative (EFI) is developing a robust sequence / structure based strategy for facilitating discovery of in vitro enzymatic and in vivo metabolic / physiological functions of unknown enzymes discovered in genome projects.
To classify a sequence into a superfamily, subgroup, or family using hidden Markov models (HMMs) or a BLAST search: from the home page, click on EFI - Informatics (SFLD); click on the Search by Enzyme tab; paste in the sequence and select HMM or BLAST.
Multiple Associations platforms
STRING http://string.embl.de/ Database of known and predicted protein-protein relationships, derived from genomic context (fusions, conserved gene clusters, co-occurrence), high throughput experiments (co-expression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.
STITCH http://stitch.embl.de/ STRING incorporating small molecules
eNet http://ecoli.med.utoronto.ca/ E. coli gene function prediction database integrating microarray and protein interaction data.
AraNet http://www.functionalnet.org/aranet/search.html Gene associations in Arabidopsis.
DIP http://dip.doe-mbi.ucla.edu/dip/Main.cgi Database of Interacting Proteins from different organisms.
Genome Projector http://www.g-language.org/GenomeProjector/
The following two are a bit outdated but can still be useful:
PHYDBAC http://igs-server.cnrs-mrs.fr/phydbac/ PHYDBAC displays phylogenomic profiles (fusions, co-occurrence, co-localization in genome) of bacterial protein sequences. Analyzing the annotation of a protein’s phylogenomic neighbors helps generate hypothetical functions for the query protein(s).
FusionDB http://igs-server.cnrs-mrs.fr/FusionDB/main.html FusionDB is a database of bacterial and archaeal gene fusion events.
Specific for Algae
AFAT http://pathways.mcdb.ucla.edu/algal/index.html Algal Functional Annotation Tool
Phylogenetic distribution tools
JGI Phylogenetic Profiler http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm Phylogenetic Profiler for Single Genes.
MicroScope Phyloprofile Exploration https://www.genoscope.cns.fr/agc/microscope/compgenomics/phyloprofil.php?
MicrobesOnline Phyletic Pattern http://www.microbesonline.org/cgi-bin/matchphyloprofile.cgi
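The idea behind a phylogenetic (phyletic) profile search is that genes present and absent in the same set of genomes are candidates for a shared function. The following minimal sketch assumes each gene's profile is simply the set of genomes in which a homolog was detected, and ranks other genes by Jaccard similarity to a query profile. The gene and genome names are hypothetical, and the tools listed above may use different scoring schemes.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_profile(query, profiles):
    """Rank all other genes by profile similarity to the query gene."""
    query_profile = profiles[query]
    scores = [
        (gene, jaccard(query_profile, genomes))
        for gene, genomes in profiles.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical presence data: gene -> genomes containing a homolog.
    profiles = {
        "geneA": {"genome1", "genome2", "genome4"},
        "geneB": {"genome1", "genome2", "genome4"},
        "geneC": {"genome3"},
        "geneD": {"genome1", "genome2", "genome3", "genome4"},
    }
    for gene, score in rank_by_profile("geneA", profiles):
        print(f"{gene}\t{score:.2f}")
```

A gene with an identical profile scores 1.0; a gene found in nearly every genome (like the hypothetical geneD) scores lower than an exact match but still high, which is one reason ubiquitous genes are often filtered out before this kind of analysis.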
Regulatory sites prediction and analysis in bacteria
RegPrecise http://regprecise.lbl.gov/RegPrecise/ Regulon database (integrated in MicrobesOnline)
RegPredict http://regpredict.lbl.gov/regpredict/ Platform to discover regulator binding sites
RegTransbase http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=main Knowledge database on bacterial regulators
Promoter prediction in bacteria
http://nucleix.mbu.iisc.ernet.in/prombase/
→ In general, the MEME Suite is great for identifying both DNA and protein motifs:
MEME Suite http://meme.nbcr.net/meme/intro.html
GLAM2, for example, is very powerful.
Associations based on phenotypes
E. coli Phenotypic Landscape http://ecoliwiki.net/tools/chemgen/ Profiling of all viable E. coli mutants on hundreds of chemicals
Yeast Fitness database http://fitdb.stanford.edu/ Profiling of all S. cerevisiae mutants on hundreds of chemicals
MICROARRAY – RNASeq DATABASES AND ANALYSIS RESOURCES
● General
GEO http://www.ncbi.nlm.nih.gov/geo/ Gene Expression Omnibus
● Plants
Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html Good tools for getting an overview of gene expression in Arabidopsis, and for finding co-responses.
ATTED http://atted.jp/ A simple site to use to look for co-expression patterns in Arabidopsis; it shows gene networks, not just lists of correlated genes.
COXPRESdb http://coxpresdb.jp Co-expression in yeasts and animals.
GeneCAT http://genecat.mpg.de/ GeneCAT Gene Co-expression Analysis Toolbox for Arabidopsis, rice, poplar, and barley
Diurnal http://diurnal.mocklerlab.org/ Circadian/Diurnal gene expression data for an individual or set of Arabidopsis, rice, or poplar genes
Translatome eFP http://efp.ucr.edu/ Transcriptome profiling of 13 discrete Arabidopsis cell populations
PRIMe http://prime.psc.riken.jp/ Server providing tools for metabolomics, transcriptomics, and integrated analysis of different omics data.
PLEXdb http://www.plexdb.org/ Plant Expression Database
Botany Array Resource http://bbc.botany.utoronto.ca/ Tools for finding co-responses, electronic Northerns.
MetaOmGraph http://metnetdb.org/MetNet_MetaOmGraph.htm Tool to plot and analyze large datasets
qteller http://qteller.com/ RNAseq data for maize, sorghum, rice. Simple tools for expression in various organs, correlation of expression of two genes.
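Several of the plant resources above report co-expression or co-response between genes. One common measure, used here purely as an illustration, is the Pearson correlation between two genes' expression values across the same samples. The sketch below makes no claim about which statistic any particular site uses, and the expression values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical expression values for two genes in the same five samples.
    gene_1 = [2.1, 3.4, 5.0, 6.2, 8.1]
    gene_2 = [1.0, 1.9, 2.4, 3.3, 4.0]
    print(f"r = {pearson(gene_1, gene_2):.3f}")
```

Values near +1 suggest the two genes rise and fall together across samples, values near -1 suggest opposite behaviour, and values near 0 suggest no linear relationship.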
● Bacteria
MicrobesOnline http://www.microbesonline.org/ A comprehensive database that includes correlated gene expression in E. coli and other bacteria
EcoGene http://ecogene.org/ A rich resource on E. coli that includes Microarray data on the major changes in gene expression observed in various experiments.
GenExpDB http://chase.ou.edu/oubcf/ E. coli Community Gene Expression DataBase
Porteco http://expression.porteco.org/ E. coli microarray analysis; also offers analysis of the phenotype data.
● Yeast
SPELL http://imperio.princeton.edu:3000/yeast Co-response search tool for yeast
● Mammals
BioGPS http://biogps.gnf.org/#goto=welcome
Comparing bacterial genomes
MicroScope https://www.genoscope.cns.fr/agc/microscope/home/index.php
Seedviewer http://pubseed.theseed.org/seedviewer.cgi
IMG http://img.jgi.doe.gov/
All have genome synteny viewers
ESSENTIAL GENES DATABASES (Pro- and Eukaryote)
OGEE http://ogeedb.embl.de/#summary Online GEne Essentiality database
DEG http://tubic.tju.edu.cn/deg/ Database of Essential Genes
PLANT PHENOME DATABASES
RAPID http://rarge.gsc.riken.jp/phenome/ RIKEN Arabidopsis Phenome Information Database, phenotypic data in transposon-insertional mutants.
SeedGenes http://www.seedgenes.org/ Genes that give a seed phenotype when disrupted by mutation.
Chloroplast2010 http://www.plastid.msu.edu/ Large set of phenotypic data for homozygous mutants of chloroplast genes.
BAPDB http://bioweb.ucr.edu/bapdb/ Bioassay And Phenotype DataBase
PLANT METABOLOME DATABASE
PlantMetabolomics http://tht.vrac.iastate.edu:81/ Consortium profiling the metabolome of specific T-DNA knockout alleles for targeted genes
PLANT PROTEOME DATABASES
PPDB http://ppdb.tc.cornell.edu/ The Plant Proteome DataBase
SUBA3 http://suba.plantenergy.uwa.edu.au/ SUB-cellular location database for Arabidopsis proteins (includes GFP and MS-MS data)
pep2pro http://fgcz-pep2pro.uzh.ch/ Organ-specific characterisation of the Arabidopsis proteome containing 14,522 identified proteins
AT_CHLORO http://www.grenoble.prabi.fr/at_chloro/ Stores information for proteins that have been identified in stroma, thylakoid, and envelope fractions of Arabidopsis chloroplasts
NBrowse http://www.arabidopsis.org/tools/nbrowse.jsp Arabidopsis protein-protein interaction database
UNKNOWN GENE/ENZYME DATABASES
POND http://bioweb.ucr.edu/scripts/unknownsDisplay.pl Plant Unknown-eome DB (POND) – Arabidopsis Unknown-eome
ORENZA http://www.orenza.u-psud.fr/ ORphan ENZyme Activities database (lists 1,200 orphan enzymes)
ADOMETA http://vitkuplab.cu-genome.org/html/adometa/adometa.html ADoption of Orphan METabolic Activities (Orphan enzyme activities in E. coli, B. subtilis, and S. cerevisiae).
GREP http://bisscat.org/GREP/ Generator of Reaction Equations & Pathways. Looks for reported and putative enzyme reaction equations; especially designed for finding metabolic pathways for orphan metabolites (compounds known to be present in at least one living organism, but whose synthetic/degradation pathways are unknown).
LITERATURE MINING RESOURCES
PubMed Central http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc
HighWire Press http://highwire.stanford.edu/
Google Scholar http://scholar.google.com/
eTBlast http://etest.vbi.vt.edu/etblast3/
iHOP http://www.ihop-net.org/UniPub/iHOP/ Information Hyperlinked Over Proteins
MAIZE GENOME RESOURCES
Maizesequence.org http://www.maizesequence.org/index.html Browser providing the latest sequence and annotation of the maize genome from the Maize Genome Sequencing Project
MaizeGDB http://www.maizegdb.org/ Maize genetics and genomics database
Gramene http://www.gramene.org/ Curated, open-source, data resource for comparative genome analysis of grasses
Updated 5/2/13
Links verified 5/2/13