Computational skills for handling large amounts of molecular data are essential in modern biological studies. Most research universities now have university-wide bioinformatics programs. For example, Duke University, UNC, UNCC, and NCSU all have bioinformatics institutes or departments. However, no formal bioinformatics course exists at ECU. Because ECU is a research university, the Department of Biology graduate faculty determined that such a course is urgently needed to prepare our graduate and undergraduate students for the enormous challenges and opportunities of the genomics era. The proposed course is designed for graduate students, in particular those enrolled in the Ph.D. programs of Biology, Biomedical Physics, Biochemistry and Molecular Biology, Microbiology and Immunology, and other related departments, to gain the skills necessary for biological data analyses.

7880, 7881. Bioinformatics (4, 0) One 2-hour lecture and two 2-hour labs per week. P: Course in biochemistry or consent of instructor. Bioinformatic skills necessary for routine molecular sequence analyses using computational programs.

Textbook: Xiong, J. 2006. Essential Bioinformatics. Cambridge University Press.

Course objectives:

The student will be able to:

·  Apply NCBI and other biological databases to analyze data

·  Interpret the output information

·  Perform analyses of molecular sequence data

·  Utilize Unix and associated commands to predict protein structures, genes, motifs, and regulatory elements

·  Align multiple sequences in order to reconstruct biochemical networks

·  Annotate genome sequences

·  Compare pairwise sequence similarity

·  Design and develop simple bioinformatic programs using programming languages such as Python or Perl

·  Solve problems in their own research areas

Course content:

·  Primary and secondary databases

·  Conversion among GenBank, EMBL, and other data formats

·  Unix environment, file and directory processing, etc.

·  GenBank non-redundant databases and RefSeq

·  Sequence downloading and database creation using formatdb and xformat

·  Entrez, Pubmed, NCBI Taxonomy, and MapViewer

·  Genome assembly and chromosomal reconstruction

·  Pairwise sequence alignment (dot plot, local and global alignments, Needleman-Wunsch and Smith-Waterman algorithms)

·  Scoring matrices (PAM and BLOSUM), alignment score calculation, and alignment quality assessment

·  BLAST algorithm and programs (including PSI-BLAST) and output interpretation


·  Python and programming using Python (data types, operations, methods and functions, control flow, file processing)

·  Multiple sequence alignments (progressive, consistency-based, and iterative alignment algorithms)

·  ClustalW, T-Coffee, ProbCons, MUSCLE, MAFFT, DIALIGN, ProDA, POA, SATCHMO, Protal2DNA, RevTrans, etc.

·  Sequence editors

·  Principles of phylogenetic analyses

·  Phylogenetic application programs (PAUP, PHYLIP, Tree-Puzzle, PhyML, MrBayes, etc.)

·  Phylogenetic tree visualization and manipulation programs (NJPlot, Treeview, Phyloverde, TreeEdit, Phylo-win, etc.)

·  Motif and domain prediction (regular expression-based and profile-based)

·  PROSITE, Emotif, Pfam, ProDom, PRINTS, BLOCKS, SMART, etc.

·  Sequence logos

·  Hidden Markov models and applications

·  Gene prediction (prokaryotes and eukaryotes)

·  Gene prediction application programs (Glimmer, GlimmerM, GeneMark, FGENES, GENSCAN, etc.)

·  Prediction of promoters and regulatory elements (BPROM, Eponine, McPromoter, FirstEF, TSSW/TSSG, ConSite, PromH, Footprinter, rVISTA, CUBIC, MEME, AlignACE, Melina, etc.)

·  Protein structure prediction

·  Biochemical network reconstruction

·  Automation and pipeline program development
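As an illustration of how the Python and pairwise alignment topics listed above might come together in a lab exercise, the following is a minimal sketch of global alignment scoring in the style of the Needleman-Wunsch algorithm. It is not part of the course materials. The function name and the match, mismatch, and gap values are assumptions chosen for illustration; an actual exercise would more likely score residues with a PAM or BLOSUM matrix.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of two sequences.

    Uses a simple linear gap penalty; the scoring values are illustrative
    assumptions, not values prescribed by the course.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell takes the best of a diagonal step
    # (match or mismatch) or a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

The sketch returns only the score. Recovering the alignment itself would require a traceback through the table, and local (Smith-Waterman) alignment would additionally floor each cell at zero.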

Evaluation/Grading Scale:

Homework and lab exercises 40%

Midterm exam 25%

Final exam 35%

A= 100-90

B= 89-80

C= 79-70

F= fail

