Bioinformatics at the DNALC!

In April 2003 it was announced that the final draft sequence of the human genome was complete. This monumental achievement is fueling tremendous research efforts to understand the information our DNA sequence encodes. Scientists have begun to identify genes, define the proteins these genes may produce, and understand how these proteins function. To achieve these goals, biologists are integrating computer-based tools into their research routines. This new field, called bioinformatics, allows scientists to make sense of huge amounts of sequence data and to "mine" genomes for meaning.

Students visiting the DNALC will have the unprecedented opportunity to work with the same computer tools and data that genome scientists use. The six computer-based modules listed below integrate enticing content with hands-on computer exercises. Students will analyze human, plant, bacterial, and viral genomes; compare DNA sequences across species; study the evolution of modern humans; understand how variations in DNA sequence contribute to disease; view three-dimensional structures of proteins; and learn about new strategies for developing therapeutic drugs.

All classes are two and a half hours in length, and will be conducted in our state-of-the-art Biomedia computer lab.

Sickle Cell Anemia – A Disease of Diverse Populations

This computer-based lab will explore the molecular biology of sickle cell anemia from DNA sequence, to protein structure, and ultimately to disorder. Students will compare sickle cell gene sequences from patients around the world to elucidate the multiple origins of the disease. Computer simulations will address questions about why the sickle cell mutation continues to persist in several areas of the world. Students will also learn about current and emerging therapies to improve the lives of individuals with this disorder.

RESERVATION DETAILS

·  Bioinformatics labs are restricted to students in grades 10, 11, and 12.

·  Each Curriculum Study school is limited to four reservations – one free! – during academic year 2003-04. Non-Curriculum Study schools are limited to three reservations.

·  The group lab rate is $13 per student with a minimum fee of $260.

·  Unless other arrangements have been made in advance, all Bioinformatics labs begin promptly at 9:30 AM.

·  Classes cancelled less than one month prior to their scheduled date will not be permitted additional computer lab visits.

·  Reserve by phone; contact Amanda McBrien at (516) 367-5175.


Sickle-Cell Anemia – A Beneficial Genetic Disorder?

A Disease Persists – Does Looking at the Distribution Help Understand?

Map: distribution of sickle-cell anemia

Maps: distribution of malaria and anopheles spec.

How DNA Encodes Life

DNA à RNA à Protein

Schematic representations in Genome à Genome Mining à Gene Features

Animations on transcription and translation in Code

What Is a Genetic Disorder?

Example: Sickle Cell Anemia

Caused by faulty protein, due to mutated gene

What’s a mutation

SS is lethal, SA benign.

Why does it persist? Map: coincidence of malaria and HBS because HBS provides some resistance for people infected with malaria.

How can it be detected? Electrophoresis of proteins: HBS runs faster than HBB à AA: 1 large band; AS: 2 bands, 1 large and 1 small; SS: 1 small band

Proteins – How/Why They Work

Example: Hemoglobin

Goal: To find the hemoglobin beta in protein in databases, to examine its structure, and to learn how to handle a protein structure viewer.

Physiology: cells need glucose and oxygen to make energy – how do they get oxygen?

Red blood cells: circulating; loading oxygen in lungs; unloading oxygen in tissue

Why red? Iron.

Hemoglobin molecule: consists of four pairwise identical subunits: hemoglobin alpha (HBA) and hemoglobin beta (HBB). Each carries a ring-shaped molecule (porphyrin-ring) which harbors an iron atom to form heme. The iron is the atom that captures the oxygen, the porphyrine anchors iron in the protein, the protein directs the iron to bind and to release the oxygen.

What are proteins? Chains of amino acids.

How do they work? Because they have a very specific 3D-structure AND because they have very specific amino acids in very specific places.

What’s with the amino acids?

Let’s look at four:

Glu

Pro

Val

A big one!!!

The characteristics of the amino acids in proteins direct proteins’ folding AND reaction capacity.

Hemoglobin: Usually, iron would just be by itself. However, correctly folded hemoglobin-chains are able to harbor porphyrin-rings which, in turn, are capable of holding iron atoms (one per chain).

Usually, iron binds oxygen very strongly. However, correctly placed amino acids in the surrounding hemoglobin-chain regulate the ability of iron to hold on to oxygen.

Let’s look at cool images of proteins.

Go to NCBI (http://www.ncbi.nlm.nih.gov/)

Find the words Search Entrez for

Change Entrez to Nucleotide

Into the search window type HBB homo sapiens mRNA; hit Go.

Click on NM_000518

Click on Links (right side of the page)

Click on Protein

Click on NP_000509

How many amino acids does the protein have? 147 aa

Click on BLink

Click on 3D structures

Find the listing for accession 1KOYB; click on the blue circle for this entry.

Click Open

Maximize the graphic window (Cn3D 4.1); move the Sequence/Alignment Viewer window to the lower edge of the screen; close the Cn3D Message Log window.

HOW TO ZOOM IN AND OUT

Left- click on the black background and press z to zoom in.

Click on the black background and press x to zoom out one step.

Hit x again.

HOW TO ROTATE THE MOLECULE

To rotate the molecule grab a corner of the molecule with your left mouse button, move your mouse.

HOW TO MOVE THE MOLECULE

To move the molecule hold Shift down, grab the molecule with your left mouse button, move your mouse right or left, up or down.

What different things can you identify in this view? Protein, porphyrin rings, iron, some organic and anorganic molecules.

HOW TO CHANGE WHAT’s HIGHLIGHTED

Go to Style

Click Rendering Shortcuts.

Click Worms.

Go to Style.

Click Coloring Shortcuts.

Click ….

What different types of structures can you identify in this view? Beta sheets and alpha helices.

What’s the protein structure? Hemoglobin beta chain

What protein is HBB a part of? hemoglobin

How many different iron-bearing structures (porphyrin rings) can you identify? 4

What are they and what function do they serve? Heme-groups; oxygen-transport

Four subunits, two alpha and two beta chains.

HIDE/SHOW

Go to Show/Hide, click on everything that’s not highlighted, click Apply, click Done.

What happened? The entire hemoglobin molecule became visible.

CHANGE RENDERING

Go to Style, Coloring Shortcuts, click Molecule

What happened? The four different subunits are shown.

Go to Style, Rendering Shortcuts, click Worms.

What happened? The two different protein substructures alpha-helices and beta sheets became visible.

How did they get to these images? They are determined by x-ray crystallography.

Purify and concentrate the protein.

Crystallize the protein.

Project x-rays through protein and onto film.

Develop film.

Analyze the patterns of shades and white spaces on the film.

Put the puzzle together to see where the atoms are that produced the shades – Voila! You see a protein.

Not guess-work but thorough analysis!!!

A Mapquest For The Human Genome

Goal: To find out where in the human genome hemoglobin genes are located and what the characteristics of the hemoglobin beta gene (HBB) are. Is HBB an ORF gene or a spliced gene?

Go to NCBI (http://www.ncbi.nlm.nih.gov/)

Find the words Map Viewer; click on them.

The genomes of how many different organisms can be accessed through Map Viewer? 19 organisms

Find Homo sapiens (human); click on it.

How many chromosomes does the human genome consist of? 23, 24, or 25 chromosomes. Justify your response.

How many chromosomes are on display? 25 chromosomes

Why this number? The human genome consists of 25 separate DNA entities.

Find Search for; type hemoglobin into the search window; click Find

On what chromosome(s) can you locate entries containing the word hemoglobin? 6, 7, 8, 11, 16, and X

Check the list underneath the image and find out on which chromosome hemoglobin beta, HBB is located. Chromosome 11.

Click on the link HBB.

Use the ruler next to the gene and determine the length of the gene. (Hint: the numbers on the ruler mark nucleotide positions on the chromosome. Use your computers calculator to determine the length of the HBB gene.) 1,600 bp

What three different structures can you identify in the cartoon representing the HBB gene? Filled blue box, empty blue box, blue line

What do you think the blue boxes represent? Coding sequence that’s translated into protein

What the blue lines? introns

The Coding Sequence

Goal: To isolate and examine the DNA sequence encoding the hemoglobin beta protein.

Go to NCBI (http://www.ncbi.nlm.nih.gov/)

Find the words Search Entrez for

Following the word for type the words hemoglobin homo sapiens

Click the button Go

How many nucleotide entries did Entrez locate? 9,686

How many protein entries did Entrez locate? 535

How many protein structures did Entrez locate? 107

Find and click Nucleotide: sequence database (GenBank)

Does the listing only show entries for humans? If no, which else? No, rat, worm, mustard weed, a bacterium

On the page listing the hits find Homo sapiens hemoglobin, beta (HBB), mRNA

Click on NM_000518

Study the entry

How many basepairs (bp) long is the nucleotide sequence displayed? 626bp

At what nucleotide position is the start codon located? That is the position where the coding sequence of the mRNA (CDS) begins. 51

Where does the coding sequence end? 494

How many nucletoides long is the coding sequence? (The result must be a multiple of 3!) 444

Which of the three possible different stop codons TAA, TAG, TGA terminates the CDS? TAA

How many aminoacids (aa) is the protein long? 147

What kind of nucleotide polymer is the sequence represent? mRNA

How do you explain that the sequence does not contain U’s? To simplify the work with nucleotide sequences databases only use A, C, G, and T, even though RNA does not contain T’s but U’s. This is ok, however, because all you need to know whether a molecule is DNA or RNA. If it’s RNA you could just replace all T’s with U’s to get a realistic representation of the RNA molecule.

In order to work with the sequence, transfer it to a database called Sequence Server.

Highlight and copy the entire nucleotide sequence from nucleotide 1 through 626.

Go to the DNALC BioServers at http://www.bioservers.org/bioserver/

Under SequenceServer click Enter.

Close the pop-up Manual.

Click CREATE SEQUENCE.

Paste the sequence into the Sequence window.

Give the sequence some name (e.g. HBB mRNA) and write it into the Name window.

Click OK.

This is the RNA sequence, it’s 626 bp long.


The Gene

Goal: To find the HBB gene in human genomic DNA.

Identify the gene in the gene sequence (link) by aligning it with the mRNA sequence.

Highlight and copy the HBB mRNA sequence from SequenceServer.

Open http://pbil.univ-lyon1.fr/sim4.php

Paste the sequence into the first window cDNA Sequence.

Copy and highlight the genomic DNA sequence (link) and paste it into the lower window Genomic Sequence.

Click Submit.

Visualize the alignment with LalnView.

Which represents the genomic sequence, Seq1 or Seq2? Which the RNA (cDNA)? Seq1 is cDNA, Seq2 is gDNA

Which are the introns and exons? Exons are black boxes, introns are empty boxes

At which position in the RNA did the coding sequence (CDS) begin? Where was the start codon? Nucleotide 51 Would that be at the beginning of a black box or somewhere within? 51 nucleotides into it So, do exons beging with start codons or do the begin before an actual start codon? Exons begin before the start codon, a start codon is usually located within an exon.

The Mutation

Goal: To identify the DNA mutation and amino acid change leading to the HBS protein

Mutations in DNA:

Align the mRNA/coding sequence for HBB and HBS to find out where they differ.

Go to http://www.bioservers.org/bioserver/

Under SequenceServer click Enter.

Click Manage Groups.

In the upper right-hand corner find Sequence Sources, change Classes to Public.

Find HBB and Sickle Cell Anemia.

Check the box at the LEFT margin, click OK at the bottom of the page.

Go to the word None, click on the downward arrow at the right to open pull-down menu.

Click HBS CDS, homo sapiens.

Go to COMPARE and set the box to the right of it to Align: CLUSTAL W (it may be set already)

Click COMPARE to align the two sequences – WAIT!

Maximize the result window.

How many differences can you find between the two sequences? 4

What are the differences between the two sequences? T/C, A/T, T/A, T/A

Which nucleotides are different? (Hint: count the A in the start codon ATG as 1) 9, 20, 172, 387

Which amino acid positions in the protein would these differences effect? 3, 7, 58, 129

Mutations in proteins:

Get the amino acid sequences for betaglobin and sickle-cell globin by translating the DNAs.

Find HBB cDNA, homo sapiens, click Open.

Move the cursor just before the A of the ATG on the third line.

Hit Return/Enter on your keyboard, this moves the ATG to the fourth line..

Highlight and copy the sequence from the ATG to the end (don’t worry about the stop codon …).

Click Done,

In a new browser window open Gene Boy (http://www.dnai.org/geneboy/)

Click Your Sequence.

Paste the sequence into the workspace.

Click Save Sequence (your sequence should have 576 nucleotides).

On the Operations panel to the right click Transform Sequence, select Amino Acids.

Highlight the sequence under Reading Frame RF1 and copy it.

Open the Word program and paste the amino acid sequence into it.

Place a carriage return at the end of the sequence.

Place a “>” sign in front of the sequence, followed by the letters “HBB”.

Type a carriage return.

Repeat this process for the sickle cell mRNA (HBS CDS, homo sapiens) with the following modifications:

use HBS CDS, homo sapiens instead of HBB cDNA, homo sapiens;

copy the entire sequence;

pasting this sequence into GeneBoy should yield you 444 amino acids;

write HBS before the sequence instead of HBB.