Bioinformatics Exercises
Over the last two decades, information has become increasingly important in both teaching and learning biochemistry. The most obvious case is the sequencing of the human genome and many other complete genomes. In 1990, the determination of the sequence of a protein was often the topic of a full publication in a peer-reviewed journal such as Science, Nature, or The Journal of Biological Chemistry. Now entire genomes are the topic of individual research papers. The term "bioinformatics" is a catch-all phrase that generally refers to the use of computers and computer science approaches in the study of biological systems. The main chapters where this information is discussed in the text are Chapters 3 (Nucleotides, Nucleic Acids and Genetic Information), 5 (Proteins: Primary Structure), 6 (Proteins: Three-Dimensional Structure), 12 (Enzyme Kinetics, Inhibition and Regulation) and 13 (Introduction to Metabolism). Here we provide exercises appropriate to these chapters aimed at introducing the techniques of bioinformatics that involve the use of computers, Internet-accessible databases and the tools that have been developed to “mine” those databases.
General principles
1. Open-ended questions. Some questions in the exercises have definite answers, but many can be answered in a number of ways, depending on the approach you take or the topic you select.
2. Stable Internet Resources. As much as possible, the exercises will be based on well established, stable web sites. If it is necessary to use less reliable sites and/or resources, attempts have been made to provide multiple sites that perform similar functions.
3. Here are the stable online resources that will be used most frequently:
a. GenBank (http://www.ncbi.nlm.nih.gov/)
b. Protein Data Bank (http://www.rcsb.org)
c. Expasy Proteomics Server (http://us.expasy.org/)
d. European Bioinformatics Institute (http://www.ebi.ac.uk/)
e. Pfam (http://www.sanger.ac.uk/Software/Pfam/)
f. SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
g. CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
h. PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi)
i. PubMed Central (http://www.pubmedcentral.nih.gov/)
4. Answer key. Where a definite answer is known, it will be provided in an answer key. For more open-ended questions, a typical correct answer will be presented.
5. Historical perspective. If historical resources are available online (including PubMed), there may be questions designed to help students identify some of the historical roots of biochemistry and molecular biology.
Project 4 Structural Alignment and Protein Folding
Protein folding is described in Section 6-5 of Fundamentals of Biochemistry (4th ed). The famous “Levinthal paradox” is presented there along with the current thinking about different aspects of protein folding. You are encouraged to review the textbook before proceeding.
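To get a feel for the scale of the Levinthal paradox, the short Python sketch below counts the conformations available to a chain of a given length and estimates how long a purely random search would take. The numbers used here (3 conformations per residue and 10^13 conformations sampled per second) are illustrative assumptions, not values taken from this exercise; check Section 6-5 for the figures your textbook uses.

```python
# Rough Levinthal-style estimate. The defaults below are assumptions chosen
# for illustration only; they are not values from this exercise.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues, per_residue=3):
    """Number of chain conformations if each residue has `per_residue` states."""
    return per_residue ** n_residues


def random_search_years(n_residues, per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation once at the assumed sampling rate."""
    seconds = conformation_count(n_residues, per_residue) / samples_per_second
    return seconds / SECONDS_PER_YEAR


# Compare the 12-residue peptide used later in this project with a
# 100-residue protein.
for n in (12, 100):
    print(n, "residues:", conformation_count(n), "conformations,",
          random_search_years(n), "years")
```

Under these assumptions the 12-residue peptide has 3^12 = 531441 conformations, which could be searched almost instantly, while the count for a 100-residue chain is so large that a random search is not a plausible folding mechanism.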
Background
The textbook describes the role of energy and entropy in protein folding. The role of kinetics in protein folding is represented in Figure 6-41, while the thermodynamics of protein folding can be described by the famous folding funnel diagram in Figure 6-42. Remember that protein folding is guided by a series of constraints that involve kinetics, thermodynamics, and function.
1. Kinetics. Once proteins are synthesized, they tend to fold rapidly; other processes such as disulfide isomerization and chaperone-assisted folding are important here, but those will not be the focus of these exercises.
2. Thermodynamics. The folding funnel presents the idea that a protein goes through one intermediate or a series of intermediates until it reaches a more stable (lower energy) form. This is not usually the thermodynamically most stable form of the protein.
3. Function. To function properly, proteins must have the correct conformation or fold, which means that they need to fold rapidly (to avoid getting selected for destruction) to the functional state.
You are certainly free to skip around this exercise, but you will probably understand it best if you go through it in a linear fashion. The exercise begins with installation of some software which you will use to explore some simple molecules and see how much variation can happen in just a small structure. The second half of the exercise is about the biennial CASP competition for protein structure prediction, including a literature exploration exercise and an exploration of the 3D structural alignment options found on the Protein Data Bank web site (http://www.rcsb.org/pdb/home/home.do).
Software installation
To begin, you will need to visit the ChemAxon web site and install the Marvin Suite (http://www.chemaxon.com/products/marvin/marvinsketch/) on your computer. The Marvin Suite is free for everyone. We will be using a few more advanced features found in the Marvin Suite. These features are free for students and faculty members at universities and colleges, but you will need to apply for an academic license (http://www.chemaxon.com/free-software/academic-license/), a process that may take a day or so, to unlock these features.
I will include some screen shots of the installation process to assist you. This software was chosen for this exercise because it is free and available for Macintosh, Windows and Linux platforms. Your instructor may prefer other software, such as ChemSketch, a free chemical drawing program for the Windows operating system from ACD Labs (http://www.acdlabs.com/resources/freeware/chemsketch/) or one of several commercial software products. If that is the case, your instructor will be able to lead you through this exercise.
Figure 1. The opening screen for downloading the Marvin Suite (http://www.chemaxon.com/products/marvin/marvinsketch/)
Click on the Download Marvin Suite button. This will take you to a page where you need to decide what kind of user you are. Unless you have good reason to do otherwise, choose “End User”. You will then go to the download page. For Windows, your best bet is probably simply the “Windows Installer” since most Windows installations already include Java. There is only one option for the Mac (OS X Installer). Linux, like Windows, usually includes Java, so your best bet is “Linux Installer”. When you select your preferred installer, you will need to register with ChemAxon to complete the download. The remainder of this installation is shown in Mac OS X (Snow Leopard), but the installation should be very similar on other platforms and operating systems.
For Mac OS X, the download is “marvinbeans-5.5.1.0-macos.dmg” as of 23 July 2011. The filename may change slightly over time but the installation should proceed as described below. When you double-click on the file, a window should pop up that contains “ChemAxon Marvin Beans Installer”.
Double-click on the icon or filename and the installer wizard will appear.
Click the Next > button to get the License Agreement page.
Click on the “I accept the agreement” radio button, then click on Next > to select the directory for the application. Again you are encouraged to use the default choice unless you have a specific reason to do otherwise.
Click on Next > to go to the Select Additional Tasks window and choose “Create desktop icons” if you would like them.
The installation process will then proceed and may take a few minutes. The image below was captured early in the installation process.
The Finish screen will appear next and you just need to click the Finish button to complete the installation.
If you chose to have the icons appear on your desktop, they should look like this.
1. Alignment of Small Molecules
Ethane: 2D and 3D
Open MarvinSketch by double clicking on the icon. You will see a window that looks like this.
Let’s start with something very simple - ethane. Select the line tool (2nd icon on the left hand side of the window - the default selection) and click in the middle of the screen to draw ethane, which gives this image.
Use the select tool (upper left hand corner) to pick the ethane molecule. Move to the dropdown menu and select Structure...Add...Add Explicit Hydrogens to get this image.
From the dropdown menu, select File...Save As.. and call the file ethane_2D.mrv and save it as a ChemAxon Marvin Document / MRV (*.mrv), the first option under the File Format menu.
Select the ethane with the Selection tool again. From the dropdown menu select View...Transform...Rotate in 3D. What is the nature of the bonds in ethane in this image?
With ethane still selected, choose Structure...Clean 3D...Clean in 3D from the dropdown menu. Then view it in 3D again (View...Transform...Rotate in 3D). How did the structure change and why did it change?
From the dropdown menu, select File...Save As.. and call the file ethane_3D.mrv.
Now explore the molecule. Rotate it in 3D (View...Transform...Rotate in 3D) to see the relative positions of the hydrogen atoms. Think back to your organic chemistry course to explain why the hydrogens on the two carbons are not superimposed when you look down the carbon-carbon single bond.
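If you would like to put a number on what you see when looking down the carbon-carbon bond, you can compute the H-C-C-H dihedral (torsion) angle from four atomic positions. The following is a minimal sketch in Python; the function name and the coordinates are hypothetical, idealized values chosen for illustration and are not exported from MarvinSketch. A dihedral near 0 degrees corresponds to an eclipsed arrangement and one near 60 degrees to a staggered arrangement.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points p0-p1-p2-p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane p0-p1-p2
    n2 = _cross(b2, b3)          # normal to the plane p1-p2-p3
    length = math.sqrt(_dot(b2, b2))
    b2_unit = (b2[0] / length, b2[1] / length, b2[2] / length)
    x = _dot(n1, n2)
    y = _dot(_cross(b2_unit, n1), n2)
    return math.degrees(math.atan2(y, x))


# Idealized, made-up geometry: the C-C bond lies along z, with one hydrogen
# on each carbon. Only the rotation of the second hydrogen about z changes.
c1, c2 = (0.0, 0.0, 0.0), (0.0, 0.0, 1.54)
h_on_c1 = (1.0, 0.0, -0.36)
h_eclipsed = (1.0, 0.0, 1.90)
h_staggered = (math.cos(math.radians(60)), math.sin(math.radians(60)), 1.90)

print("eclipsed: ", dihedral_degrees(h_on_c1, c1, c2, h_eclipsed))
print("staggered:", dihedral_degrees(h_on_c1, c1, c2, h_staggered))
```

The same function could be applied to any four bonded atoms, provided you can read their x, y, z coordinates from a saved structure file.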
To learn more about how MarvinSketch optimizes the 3D conformation of a molecule, look at the Structure...Clean 3D...Cleaning Method menu. What is the significance of each of those options? You can find answers in the MarvinSketch help files by searching for Cleaning Methods.
Lysine: 2D and 3D
Now let’s repeat the ethane exercise on something a bit more complex - the amino acid, lysine. Here are the steps to take.
1. Draw the structure of lysine.
2. From the dropdown menu, select Structure...Add...Add Explicit Hydrogens to get this image.
3. From the dropdown menu select View...Transform...Rotate in 3D. Once again, it is just a flat structure.
4. From the dropdown menu select Structure...Clean 3D...Clean in 3D. Then use View...Transform...Rotate in 3D again. Notice the changes in the structure to take advantage of the 3D space.
5. For further interest, you are encouraged to create 2D drawings, then perform 3D optimizations on some common small molecules found in biochemistry: citric acid, acetyl CoA, ATP, NAD+.
2. Alignment of Peptides
The purpose of the simple exercises with ethane and lysine was to remind you of the difference between the 2D representations we often use for molecules and the 3D conformations that those same molecules assume in solution. You may wonder, “How does this relate to protein folding?” Protein folding is a multi-dimensional problem that must include thermodynamics (stability), kinetics (formation in a useful time frame) and function (preparing a protein that can do its job). As proteins fold, they are also constrained by the steric and electronic requirements of the amino acids that make up their chains. The bulky tryptophan side chain needs enough room to maneuver into position. The negatively charged glutamate side chain will be repelled by another negatively charged side chain and attracted by a positively charged lysine or arginine side chain.
In this exercise, you will construct a peptide consisting of 12 amino acids, optimize that peptide in 3D space, and then compare its conformation with that of the identical peptide sequence found in dihydrofolate reductase from Candida albicans (PDB entry 3QLW): Lysine - Glutamine - Proline - Lysine - Serine - Glutamate - Leucine - Glutamine - Lysine - Phenylalanine - Valine - Glycine, which corresponds to residues 158 - 169 in the primary sequence of the enzyme.
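The file names used in the steps that follow contain the one-letter code for this peptide. As a quick check, the Python sketch below converts the residue names listed above into the standard one-letter amino acid codes and confirms that residues 158 - 169 span 12 positions. The dictionary and function names are simply illustrative choices.

```python
# Standard one-letter codes for the 20 common amino acids.
ONE_LETTER = {
    "Alanine": "A", "Arginine": "R", "Asparagine": "N", "Aspartate": "D",
    "Cysteine": "C", "Glutamate": "E", "Glutamine": "Q", "Glycine": "G",
    "Histidine": "H", "Isoleucine": "I", "Leucine": "L", "Lysine": "K",
    "Methionine": "M", "Phenylalanine": "F", "Proline": "P", "Serine": "S",
    "Threonine": "T", "Tryptophan": "W", "Tyrosine": "Y", "Valine": "V",
}


def to_one_letter(residue_names):
    """Join a list of full residue names into a one-letter sequence string."""
    return "".join(ONE_LETTER[name] for name in residue_names)


peptide = ["Lysine", "Glutamine", "Proline", "Lysine", "Serine", "Glutamate",
           "Leucine", "Glutamine", "Lysine", "Phenylalanine", "Valine",
           "Glycine"]

sequence = to_one_letter(peptide)
print(sequence)                     # KQPKSELQKFVG
print(len(sequence), 169 - 158 + 1)  # both are 12
```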
1. You can draw this structure if you’d like the challenge. Otherwise you can download this file (link to KQPKSELQKFVG_peptide.mrv), which contains the 2D structure of the peptide that has not been optimized. It should look something like this:
2. From the dropdown menu, select Structure...Add...Add Explicit Hydrogens to get this image.
3. From the dropdown menu select View...Transform...Rotate in 3D so you can see it as a flat structure.
4. From the dropdown menu select Structure...Clean 3D...Clean in 3D. Save the file as KQPKSELQKFVG_3D_optimized_in_MarvinSketch.mrv. Then use View...Transform...Rotate in 3D again. Notice the changes in the structure to take advantage of the 3D space. How long did it take to clean the structure in 3D compared to ethane or lysine?
At this point, you have a 12 amino acid peptide that is optimized in 3D space to account for steric and electronic effects. Did this take more time than the lysine? Please note that this is a simple optimization; a commercial program such as Spartan (Wavefunction, Inc.; http://www.wavefun.com/products/spartan.html) would do a much more rigorous optimization and would probably take a great deal more time.
Explore the structure to see if the peptide bonds are planar as they are known to be in proteins. Do you see other features of the peptide where parts of the molecule rotated out of position to minimize steric or electronic conflicts?
5. Now we will move on to compare this optimized structure with the peptide that is found as a turn-helix portion of the structure of dihydrofolate reductase (DHFR) from Candida albicans using the tools in MarvinSketch and MarvinSpace.
a. Download the DHFR peptide in the file 3QLW_158_169.pdb, which contains residues 158-169 from PDB entry 3QLW.
b. Open three new windows in MarvinSketch. In the first window open the file 3QLW_158_169.pdb. In the second window open the file KQPKSELQKFVG_3D_optimized_in_MarvinSketch.mrv. Go to the third window and save it as KQPKSELQKFVG_3D_aligned_to_3QLW_158_169.mrv.
c. Copy the peptide from 3QLW_158_169.pdb and paste it into KQPKSELQKFVG_3D_aligned_to_3QLW_158_169.mrv.
d. In the window containing KQPKSELQKFVG_3D_optimized_in_MarvinSketch.mrv, use Structure...Remove...Remove Explicit Hydrogens to simplify the structure. Then use View...Transform...Rotate in 3D to rotate the peptide so that you can clearly see the phenylalanine side chain.
e. Copy the peptide from KQPKSELQKFVG_3D_optimized_in_MarvinSketch.mrv and paste it into KQPKSELQKFVG_3D_aligned_to_3QLW_158_169.mrv below the previous peptide. Just by looking at the image, you can see that the lower peptide (the one you made) is more extended than the upper peptide (the segment from PDB entry 3QLW).
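The visual impression that one peptide is more extended than the other can also be expressed as a number. The Python sketch below is one possible approach and is not part of the MarvinSketch workflow: it reads the alpha-carbon (CA) atoms from PDB-format text using the fixed column positions of ATOM records, then reports the distance between the first and last alpha carbon in a residue range. It assumes that each peptide is available as PDB-format text. The function names are hypothetical, and the coordinates in the demonstration are made up to show the calculation; they are not taken from 3QLW.

```python
import math


def ca_coordinates(pdb_text, first_res, last_res, chain=None):
    """Return [(residue number, (x, y, z)), ...] for CA atoms in the range.

    Uses the fixed-column layout of PDB ATOM records. Alternate locations
    are not handled; this is a sketch, not a full PDB parser.
    """
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":          # atom name, columns 13-16
            continue
        if chain is not None and line[21] != chain:  # chain ID, column 22
            continue
        res_seq = int(line[22:26])               # residue number, columns 23-26
        if first_res <= res_seq <= last_res:
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords.append((res_seq, xyz))
    return coords


def end_to_end_distance(coords):
    """Distance between the first and last CA atoms in the list."""
    return math.dist(coords[0][1], coords[-1][1])


def fake_ca_line(serial, res_name, res_seq, x, y, z):
    """Build an ATOM line with made-up coordinates for the demonstration."""
    return ("ATOM  {:5d}  CA  {:>3s} A{:4d}    {:8.3f}{:8.3f}{:8.3f}"
            .format(serial, res_name, res_seq, x, y, z))


# Made-up example: only the first and last residues of the range are shown,
# plus one residue outside the range that should be ignored.
demo_text = "\n".join([
    fake_ca_line(1, "GLY", 157, 9.0, 9.0, 9.0),
    fake_ca_line(2, "LYS", 158, 0.0, 0.0, 0.0),
    fake_ca_line(3, "GLY", 169, 3.0, 4.0, 12.0),
])

demo_coords = ca_coordinates(demo_text, 158, 169, chain="A")
print(end_to_end_distance(demo_coords))  # 13.0 for these made-up points
```

A larger end-to-end distance for the same 12 residues indicates a more extended chain, while a turn or helix brings the two ends closer together.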