Bioinformatics: Molecular Sequence Data

Plant IT Summer Workshop 2010 E. Stanley and T. Lafferty

http://bioquest.org/myplantit-2010/

Overview:

· Learn how to distinguish between protein and nucleic acid sequences.

· Search an online database to identify an unknown protein or an unknown DNA sequence.

· Find other organisms with similar sequences.

View your unknown sequence:

1. Download your unknown file from the Plant IT workshop schedule.

2. Open the text file to view the sequence information. The file is in FASTA format, which lets online bioinformatics programs compare it with other sequences.

3. Indicate whether your unknown is a protein or a nucleic acid sequence by checking the appropriate description below.

Protein sequence ____ DNA sequence ____

How could you tell?
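One way to check your answer is to count letters. The sketch below is a rough heuristic, not part of the workshop materials: it reads the first record from FASTA text and guesses "nucleotide" when nearly all letters are A, C, G, T, U, or N. The function names, the 0.9 threshold, and the toy sequences are assumptions chosen for illustration.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second ">" line starts the next record; stop here
            header = line[1:]
        else:
            parts.append(line)
    return header, "".join(parts).upper()


def guess_sequence_type(sequence, threshold=0.9):
    """Guess 'nucleotide' if at least `threshold` of the letters are A, C, G, T, U or N.

    The threshold is an illustrative assumption, not an official rule.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("no sequence letters found")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleotide"
    return "protein"


# Toy input made up for illustration; replace it with the text of your unknown file.
toy_fasta = ">toy record (not real data)\nATGGCCATTGTA\nATGGGCCGCTGA\n"
header, sequence = read_fasta(toy_fasta)
print(header, guess_sequence_type(sequence))
```

Proteins use about 20 different letters, so a protein sequence normally contains many letters outside the nucleotide set. A short or unusual sequence can still fool this check, so confirm by eye.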

Using NCBI to Identify a Sequence:

The National Center for Biotechnology Information (NCBI) advances science and health by providing access to biomedical and genomic information. Teachers and students, like researchers, can find, upload, download, and compare sequence information for proteins and nucleic acids.

You will be accessing a collection of publicly available sequences called GenBank, the NIH genetic sequence database. The sequences are annotated so you know what organism or lab the sequence came from, who submitted it, and usually its function. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.

Go to the NCBI BLAST site at http://blast.ncbi.nlm.nih.gov/Blast.cgi. BLAST uses the similarity between biological sequences to help you find out about the structure and function of your unknown sequence.

4. To start, choose the appropriate BLAST search:

A NUCLEOTIDE SEQUENCE

· Go to the BLAST home page and click "nucleotide blast" under Basic BLAST.

A PROTEIN SEQUENCE

· Go to the BLAST home page and click "protein blast" under Basic BLAST.

5. Open your unknown file, select all of the text, and copy it.

6. Under Enter Query Sequence, paste the file content into the query box.

7. Under Choose Search Set:

If your unknown is a protein, choose non-redundant protein sequences (nr)

If your unknown is a nucleic acid, choose Nucleotide collection (nr/nt)

8. Under Program Selection:

If your unknown is a protein, choose blastp (protein-protein BLAST)

If your unknown is a nucleic acid, choose highly similar sequences (megablast)

9. Now click the blue BLAST button at the bottom left of the screen.

Note: Results may take a while when everyone in the workshop is submitting searches at the same time.
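The choices in steps 4 through 8 can be summarized as a small lookup. The sketch below only builds the form data for a search and does not send anything. The parameter names (CMD, PROGRAM, DATABASE, MEGABLAST, QUERY) follow our understanding of NCBI's BLAST URL API and should be checked against the current NCBI documentation before use. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Address taken from this handout; requests would be sent here.
BLAST_URL = "http://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_settings(sequence_type):
    """Map the kind of unknown to the program and search set chosen in steps 7 and 8."""
    if sequence_type == "protein":
        # blastp against non-redundant protein sequences (nr)
        return {"PROGRAM": "blastp", "DATABASE": "nr"}
    if sequence_type == "nucleotide":
        # megablast (highly similar sequences) against the nucleotide collection
        return {"PROGRAM": "blastn", "MEGABLAST": "on", "DATABASE": "nt"}
    raise ValueError("sequence_type must be 'protein' or 'nucleotide'")


def build_submission(fasta_text, sequence_type):
    """Return URL-encoded form data for a search request (assumed parameter names)."""
    params = {"CMD": "Put", "QUERY": fasta_text}
    params.update(blast_settings(sequence_type))
    return urlencode(params)


# Toy query made up for illustration; nothing is sent over the network.
print(build_submission(">toy record\nATGGCCATTGTA", "nucleotide"))
```

For the workshop itself the web form is the simpler route. A scripted search mainly helps when there are many unknowns to submit.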

10. Consider your BLAST Results:

· Scroll down to the table that lists the sequences producing significant alignments (having the greatest sequence similarity).

11. Choose the top record. This sequence should have the highest alignment score with your unknown sequence. Enter the following information:

Accession Number: ______

Description:

12. Record information about the top record.

· Review the accession record by clicking on the accession number link to learn more.

· Fill in the information below.

Kind of organism: ______

Genus and species: ______

Name of researcher(s) who produced this record:

Where the research was done:

13. List three other organisms with closely related sequences.

Are you surprised by these results? Explain.
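Steps 10 through 13 amount to ranking the hits by score, taking the top record, and then listing the next few distinct organisms. The sketch below illustrates that logic, assuming you have copied the results table into a list of records by hand. The field names and the placeholder hits are invented for illustration and are not real BLAST output.

```python
def summarize_hits(hits, n_others=3):
    """Return (top_hit, other_organisms) from a list of hit records.

    Each hit is a dict with 'accession', 'organism' and 'score' keys
    (hypothetical field names chosen for this sketch).
    """
    if not hits:
        raise ValueError("no hits to summarize")
    # Highest score first, as in the BLAST results table.
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    top = ranked[0]
    others = []
    for hit in ranked[1:]:
        organism = hit["organism"]
        # Skip the top organism and any organism already listed.
        if organism != top["organism"] and organism not in others:
            others.append(organism)
        if len(others) == n_others:
            break
    return top, others


# Placeholder records, not real data; fill in values from your own results.
placeholder_hits = [
    {"accession": "PLACEHOLDER_2", "organism": "Organism B", "score": 90},
    {"accession": "PLACEHOLDER_1", "organism": "Organism A", "score": 100},
    {"accession": "PLACEHOLDER_3", "organism": "Organism A", "score": 80},
    {"accession": "PLACEHOLDER_4", "organism": "Organism C", "score": 70},
    {"accession": "PLACEHOLDER_5", "organism": "Organism D", "score": 60},
]
top, others = summarize_hits(placeholder_hits)
print(top["accession"], others)
```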