CAMP WILDNESS 2005
On-line sequence data analysis

The purpose of this exercise is to introduce you to a few web tools for analysis of rRNA and DNA sequence data. I will demonstrate these tools and you will then have an opportunity to try them out yourselves. This will prepare you for analysis of your own 16S rRNA sequence data if we decide to obtain it from your DGGE bands. Using the test sequence at the end of this document, you should obtain the results shown in the examples below. You will each be given your own unknown sequences in class.

  1. Ribosomal Database Project (RDP). This website contains many thousands of rRNA sequences and provides a number of subroutines for comparative analysis of them. It is also a handy reference for determining the phylogenetic relationships among microorganisms. It is out of date with respect to other major sequence repositories. We will use it to:
  1. Identify the closest relative of an unknown 16S rRNA sequence that will be provided to you.
  2. Open the website
  3. Select “Online Analysis”
  4. Locate “Sequence Match” for small subunit rRNA in the table and click on “run”.
  5. Copy and paste your sequence data into the box provided. For now, use the test sequence at the end of this document. At the bottom of the page, select “Submit Sequences”
  6. The result will be a list, possibly showing the hierarchical phylogeny, of the closest relatives in the database.

Example: The test sequence was found to be identical to the SSU rRNA sequence of Nostoc muscorum, a member of the Nostoc group of Cyanobacteria, within the kingdom containing cyanobacteria and chloroplasts (prokaryotic oxygenic phototrophs) of Domain Bacteria.

BACTERIA
CYANOBACTERIA_AND_CHLOROPLASTS
CYANOBACTERIA
NOSTOC_GROUP
Cyls.7417 / 0.878 / 1319 / Cylindrospermum sp. PCC 7417
Nost.muscr / 1.000 / 1412 / Nostoc muscorum PCC 7120
AB016520 / 0.993 / 1369 / "Anabaena variabilis" IAM M-3
Anbn.cyli2 / 0.861 / 1422 / "Anabaena cylindrica" str. NIES19 PCC 7122
Nost.punct / 0.837 / 1322 / Nostoc punctiforme PCC 73102
AF062637 / 0.837 / 1378 / Nostoc GSV224 str. GSV224
AF062638 / 0.844 / 1391 / Nostoc ATCC53789 ATCC 53789
AF027653 / 0.842 / 1322 / Nostoc TDI#AR94 str. TDI#AR94
  1. Note that by clicking on the organism name, you can obtain the primary sequence data and also information about the reference that reported the sequence.
  2. Select an unknown SSU rRNA sequence, which is available on the class website. Report in the space below the closest relative in the RDP database for the unknown sequence and also the fractional relatedness value to this sequence.

Unknown SSU rRNA sequence number:

Closest RDP relative to unknown sequence:

Fractional similarity score:

Hierarchical phylogeny of unknown sequence:
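
If you save the Sequence Match listing as text, the closest relative can also be picked out with a few lines of code. The sketch below assumes each hit line has the form "short ID / similarity score / length / organism name", as in the example listing above. The names Hit and parse_hit are hypothetical helpers written for this handout, not part of any RDP tool.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    short_id: str   # RDP short ID or accession number
    score: float    # fractional similarity score
    length: int     # third column of the listing
    name: str       # full organism name


def parse_hit(line: str) -> Hit:
    # Split on " / " at most three times so the organism name stays whole.
    short_id, score, length, name = (p.strip() for p in line.split(" / ", 3))
    return Hit(short_id, float(score), int(length), name)


# Hit lines copied from the example listing for the test sequence.
listing = """\
Cyls.7417 / 0.878 / 1319 / Cylindrospermum sp. PCC 7417
Nost.muscr / 1.000 / 1412 / Nostoc muscorum PCC 7120
AB016520 / 0.993 / 1369 / "Anabaena variabilis" IAM M-3
Anbn.cyli2 / 0.861 / 1422 / "Anabaena cylindrica" str. NIES19 PCC 7122
Nost.punct / 0.837 / 1322 / Nostoc punctiforme PCC 73102
AF062637 / 0.837 / 1378 / Nostoc GSV224 str. GSV224
AF062638 / 0.844 / 1391 / Nostoc ATCC53789 ATCC 53789
AF027653 / 0.842 / 1322 / Nostoc TDI#AR94 str. TDI#AR94
"""

hits = [parse_hit(line) for line in listing.splitlines() if line.strip()]

# The closest relative is the hit with the highest similarity score.
best = max(hits, key=lambda h: h.score)
print(f"{best.name}: {best.score:.3f}")
```

For the test sequence this selects Nostoc muscorum PCC 7120, in agreement with the example above.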

  1. BLAST search. This website allows you to compare an unknown sequence to very large and up-to-date gene sequence databases to find the closest relative. There are many options. We will use “blastn”, which rapidly compares a DNA sequence to other DNA sequences.
  2. Open the website
  3. Under “Nucleotide” select “Nucleotide-nucleotide BLAST (blastn)”
  4. Copy and paste the test SSU rRNA sequence into the window and click on “BLAST NOW”
  5. Click on “Format” in the resulting window after 20-30 seconds. If no result is obtained, try again at 20-second intervals.
  6. The results should be a listing of the most closely related sequences in the database together with a score and E value.

Sequences producing significant alignments: Score (bits) E Value

gi|39010|emb|X59559.1|AS16SRNA Anabaena sp. 16S rRNA gene 1392 0.0

gi|23978183|dbj|AB074502.1| Anabaena variabilis gene for 16... 1392 0.0

gi|8896059|gb|AF247593.1|AF247593 Anabaena variabilis NIES2... 1392 0.0

gi|17134031|dbj|AP003598.1| Nostoc sp. PCC 7120 DNA, comple... 1392 0.0

gi|17133115|dbj|AP003595.1| Nostoc sp. PCC 7120 DNA, comple... 1392 0.0

gi|17131110|dbj|AP003588.1| Nostoc sp. PCC 7120 DNA, comple... 1392 0.0

gi|17130808|dbj|AP003587.1| Nostoc sp. PCC 7120 DNA, comple... 1392 0.0

gi|4126696|dbj|AB016520.1| Anabaena variabilis gene for 16S... 1376 0.0

gi|29124941|gb|AY218829.1| Anabaena flos-aquae UTCC 64 16S ... 1366 0.0

gi|15011024|gb|AF317631.1|AF317631 Nostoc sp. PCC 7120 16S ... 1350 0.0

gi|6522669|emb|AJ133163.1|CSP133163 Cylindrospermum sp. (st... 1277 0.0

gi|5814235|gb|AF132789.1|AF132789 Cylindrospermum ATCC29204... 1277 0.0

gi|32400305|dbj|AB085687.1| Nostoc sp. HK-01 gene for 16S r... 1273 0.0

gi|8896061|gb|AF247595.1|AF247595 Anabaenopsis circularis N... 1273 0.0

gi|16944861|emb|AJ293110.1|AFL293110 Anabaena cf. cylindric... 1253 0.0

gi|29824074|dbj|AB093486.1| Tolypothrix sp. IAM M-259 gene ... 1243 0.0

gi|8896058|gb|AF247592.1|AF247592 Anabaena cylindrica NIES1... 1237 0.0

gi|6522637|emb|AJ133162.1|ASP133162 Anabaena sp. (strain PC... 1235 0.0

gi|15011022|gb|AF317629.1|AF317629 Anabaena sp. PCC 7108 16... 1211 0.0

gi|29824078|dbj|AB093490.1| Nostoc entophytum IAM M-267 gen... 1205 0.0

gi|29124940|gb|AY218828.1| Nostoc muscorum CENA61 16S ribos... 1201 0.0

gi|23978189|dbj|AB074504.1| Calothrix brevissima gene for 1... 1201 0.0
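
Each line in the hit list above packs several fields together: a database identifier, a truncated description, the bit score, and the E value. The sketch below is one way to pull these apart. It assumes the single-space layout shown in this handout, and parse_blast_hit is a hypothetical helper name, not an NCBI function.

```python
def parse_blast_hit(line: str) -> dict:
    # The first whitespace-delimited token is the identifier,
    # e.g. gi|39010|emb|X59559.1|AS16SRNA
    identifier, rest = line.split(None, 1)
    # The last two tokens are the bit score and the E value;
    # everything in between is the (possibly truncated) description.
    description, bits, e_value = rest.rsplit(None, 2)
    fields = identifier.split("|")
    return {
        "gi": fields[1],          # GenInfo identifier number
        "database": fields[2],    # source database: emb, dbj or gb
        "accession": fields[3],   # accession number with version
        "description": description,
        "bits": float(bits),
        "e_value": float(e_value),
    }


top = parse_blast_hit(
    "gi|39010|emb|X59559.1|AS16SRNA Anabaena sp. 16S rRNA gene 1392 0.0"
)
print(top["accession"], top["description"], top["bits"], top["e_value"])
```

An E value of 0.0 means that a match this good is essentially never expected by chance in a database of this size.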

  1. Click on the top hit and record information about the gene and the organism (and habitat, if possible) from which it came (e.g., 16S rRNA sequence from Anabaena sp.; N.B. Nostoc muscorum is on the list a bit further down)
  2. “Go back” to the previous page and scroll down to the first sequence beneath the list of hits

gi|39010|emb|X59559.1|AS16SRNA Anabaena sp. 16S rRNA gene

Length = 1489

Score = 1392 bits (702), Expect = 0.0

Identities = 716/723 (99%)

Strand = Plus / Plus

Query: 1 gctagttggtgtggtaagagcgcaccaaggcgacgatcagtagctggtctgagaggatga 60

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 214 gctagttggtgtggtaagagcgcaccaaggcgacgatcagtagctggtctgagaggatga 273

Query: 61 tcagccacactgggactgagacacggcccagactcctacgggaggcagcagtggggaatt 120

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct: 274 tcagccacactgggactgagacacggcccagactcctacgggaggcagcagtggggaatt 333
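
The “Identities” line of the alignment above (716/723) is the number of matching positions divided by the number of aligned positions. The sketch below shows that calculation on two aligned strings of equal length. The function name count_identities is a hypothetical helper, and the second toy sequence is made up for illustration.

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (matching positions, aligned positions) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    # A position counts as a match only if both rows carry the same
    # nucleotide; a gap character ("-") never counts as a match.
    matches = sum(q == s and q != "-" for q, s in zip(query, subject))
    return matches, len(query)


# First row of the alignment above: Query and Sbjct are identical here.
row = "gctagttggtgtggtaagagcgcaccaaggcgacgatcagtagctggtctgagaggatga"
matches, total = count_identities(row, row)
print(f"{matches}/{total} = {100 * matches / total:.0f}%")

# Toy example (made-up sequences) with one mismatch in ten positions.
toy_matches, toy_total = count_identities("gctagttggt", "gctagtcggt")
print(f"{toy_matches}/{toy_total} = {100 * toy_matches / toy_total:.0f}%")

# The full-alignment value reported by BLAST for the top hit.
print(f"716/723 = {100 * 716 / 723:.2f}%")
```

The last line gives 99.03%, which BLAST reports as 99%.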

  1. The information under “Identities” gives a ratio of the number of matching nucleotides divided by the total number of nucleotides compared, and the percent similarity (e.g., 716/723 = 99%).
  2. Examine the phylogeny of the closest relative using the RDP website.
  3. Follow the first two steps of Part 1 above (open the RDP website and select “Online Analysis”)
  4. Run “Hierarchy Browser” for small subunit rRNA
  5. Click on “Search”
  6. Enter the genus and species name of closest relative from the BLAST search
  7. Click on “Search”
  8. The output is a hierarchical display of the phylogenetic “neighborhood” of the organism. Record the Domain and first Subdomain level of the organism’s phylogeny.

Example: If you search for “Sulfolobus acidocaldarius” you should find that it belongs to a subgroup of Domain Archaea, Kingdom Crenarchaeota.

  1. Report the following information for your unknown sequence below:

Closest BLAST relative to unknown sequence:

Percent similarity to closest relative:

Hierarchical phylogeny:

Information you can glean from clicking on the closest relative (e.g., habitat):

  1. The Institute for Genomic Research (TIGR). This is one of the primary websites for genomic sequences and their analysis. We will do a couple of simple exercises to demonstrate the breadth and depth of this database.
  2. Open the website
  3. Under “Genome Databases” click on “Comprehensive Microbial Resources”
  4. Search through the drop-down list of sequenced genomes
  5. Has your unknown organism’s genome been sequenced?
  6. If not, go back to the TIGR home page and under “Genome Databases” click on “Unfinished Genomes”.
  7. Is your unknown organism’s genome being sequenced?
  8. Click on any genome in 3c above to link to the genome page. There you will find a circular display and access to the genome itself.
  9. At the top of the genome page click on the “Searches” tab
  10. Then click on “Name” and enter “DNA polymerase”
  11. This will take you to a list of matching results.
  12. Click on any one of these to get to an individual “gene page”
  13. Then browse to find out all of the various kinds of information available for every gene in the genome.

Test sequence:

GCUAGUUGGUGUGGUAAGAGCGCACCAAGGCGACGAUCAGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAGGCAGCAGUGGGGAAUUUUCCGCAAUGGGCGAAAGCCUGACGGAGCAAUACCGCGUGAGGGAGGAAGGCUCUUGGGUUGUAAACCUCUUUUCUCAGGGAAUAAAAAAAUGAAGGUACCUGAGGAAUAAGCAUCGGCUAACUCCGUGCCAGCAGCCGCGGUAAUACGGAGGAUGCAAGCGUUAUCCGGAAUGAUUGGGCGUAAAGCGUCCGCAGGUGGCACUGUAAGUCUGCUGUUAAAGAGCAAGGCUCAACCUUGUAAAGGCAGUGGAAACUACAGAGCUAGAGUACGUUCGGGGCAGAGGGAAUUCCUGGUGUAGCGGUGAAAUGCGUAGAGAUCAGGAAGAACACCGGUGGCGAAAGCGCUCUGCUAGGCCGUAACUGACACUGAGGGACGAAAGCUAGGGGAGCGAAUGGGAUUAGAUACCCCAGUAGUCCUAGCCGUAAACGAUGGAUACUAGGCGUGGCUUGUAUCGACCCGAGCCGUGCCGGAGCCAACGCGUUAAGUAUCCCGCCUGGGGAGUACGCACGCAAGUGUGAAACUCAAAGGAAUUGACGGGGGCCCGCACAAGCGGUGGAGUAUGUGGUUUAAUUCGAUGCAACGCGAAGAACCUUACCAAGAC
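
Note that the test sequence is written in the RNA alphabet (U), while the BLAST alignment shown earlier displays the query in the DNA alphabet (t). This handout does not say whether the web forms accept U directly, so converting U to T before pasting is a harmless precaution. The sketch below is one way to do it. The function name rna_to_dna is a hypothetical helper, and only the first 60 nucleotides of the test sequence are used here.

```python
def rna_to_dna(sequence: str) -> str:
    """Convert an rRNA sequence (A, C, G, U) to its DNA form (A, C, G, T)."""
    # Remove line breaks and spaces that creep in during copy and paste.
    cleaned = "".join(sequence.split()).upper()
    # Accept only nucleotide letters; N stands for an unknown base.
    unexpected = set(cleaned) - set("ACGUTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return cleaned.replace("U", "T")


# First 60 nucleotides of the test sequence above.
rna_prefix = "GCUAGUUGGUGUGGUAAGAGCGCACCAAGGCGACGAUCAGUAGCUGGUCUGAGAGGAUGA"
dna_prefix = rna_to_dna(rna_prefix)
print(dna_prefix.lower())
```

The printed string matches the first Query row (positions 1 to 60) of the BLAST alignment for the test sequence.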