EvoGen Honors – In Silico

First Semester Exam – December 16, 2010

Name: ______

Honor Statement (please write the ENTIRE honor code and sign it):

You may access your email now to download the applicable sequence to complete this portion of the assignment. You may also use this handout as a resource, MS Word and/or Google Docs to complete the written portion of this assignment, and the internet to access whatever genome browser you need; NCBI, DNA Subway, Primer 3 Plus, and TARGeT/MUSCLE should be sufficient. Other than accessing your email, using Word/Google Docs, and using the internet on the specified sites, you should not access other materials from your computer to complete this portion of the assignment.

Instructions to help you with the DNA Subway for the Final Exam: A Step-By-Step Guide

The first step will be to create a new project. You will upload a large sequence into DNA Subway, run several of the DNA Subway Lines, upload the applicable intron, annotate the applicable region, and then design your primers using Primer 3 Plus. Along the way, questions that you will need to answer will be bolded & underlined. Please type all questions into the Google document, answer the questions, and paste applicable screenshots, etc., into your document. When finished, please make sure to share the Google document with me at .

Please make sure I have a copy of your google document before you leave the final!

1. Open DNA Subway. After logging in with your user name and password, click on “Annotate a Genomic Sequence” to begin a new project.

2. You will want to download the two sequences that have been given to you in an email from me, which will provide you with the information you will need to start the project. The first sequence is a very long sequence. It is attached to the email in a file entitled ‘6351verylongsequence.’ The second is a shorter sequence entitled ‘intron6351.’ Download both files to your desktop for easy access later in the process.

3. It is time to create the project on the Red Line. Once you have clicked on “Annotate a Genomic Sequence” (see step 1), your screen should look as shown below. Follow the steps below to upload the ‘very long sequence’ file from your desktop.

A. Click the “Choose File” button and use the file box to select your genomic sequence file. This is the file that you should have saved to your desktop, labeled ‘6351verylongsequence.’

B. Enter a project description.

C. Enter a meaningful Project title.

D. Enter Zea mays for the scientific name and “corn” for the common name. (Maize is not allowed.)

E. Choose Monocotyledons. This is the large group of plants that maize belongs to.

F. Click “Continue” to create this project.

You have now uploaded your BIG sequence and your project is started. But, you have to run several programs along the way to ultimately annotate your sequence.

Step II. Genome annotation: Repeat Masking

4. This is the main annotation page. You will see that there are stops (circles) and sidelines along the main Red Line. One circle has a green “R” for Run, while all the others have a red “X,” which means blocked. You have to start by clicking the green “R” in the RepeatMasker station.

QUESTION #1: Click on B above to look at a visual of the local gene browser in GBrowse. Based on the predicted repeats, indicate a range of bases within the uploaded sequence where you would NOT expect the gene prediction programs to find a gene. (Your answer should include a base pair range, e.g., 1000-2000k.) (3 pts)

5. Now, move down the line and run all of the predict gene programs, which include Augustus, FGenesH, and Snap (there is no need to run tRNA Scan for this project). To do so, just click on each box. They will have an “R” when you can run them (as is now the case for the search database options below), and a “V” when they are finished running and are ready to be viewed (as is the case for the find repeats and predict gene options below).

Once all of the gene predictions have a “V” beside them, you will want to look at the data visually. To do so, click on the black button on the far right labeled “local browser.” A new window will appear that looks like the picture below.

The gene predictions or models are shown in shades of green. You can see that while the gene models are somewhat similar, they are not exactly the same when you compare Augustus, FGenesH, and Snap. QUESTION #2: Why is it important to look for EST evidence before selecting the gene model from which you will design primers? (3 pts)

6. Run BlastN and BlastX under the “Search Databases” section of the Red Line.

7. While BlastN and BlastX are running, click on “Upload Data,” and then click on “Add DNA data in FASTA format.” Find the file entitled ‘intron6351’ that you saved to your desktop and upload it now.

8. Open the Local Browser again and view the results. You will now see there is plenty of experimental evidence for the gene predictions, based on the results from BlastN (below is an example from a different blast).

9. Next, you must run User BlastN to blast the intron you just uploaded against the large sequence you have already been annotating. To do so, click on User BlastN under search databases.

10. When User BlastN finishes, view the result in the Local Browser.

You can now see where your intron is located within the gene of interest and begin to annotate the flanking exons surrounding the intron of interest (again, the above picture is from a different project). To do so, it makes sense to zoom into the region of interest. QUESTION #3: Given the results of BlastN and User BlastN, which Gene Prediction model is the best fit for gene annotation and primer design? (2 pts)

11. To zoom in, click and drag the mouse in the Chromosome Track under Details. A purple box appears, and you can select as narrow a region as you would like. When you release the mouse, GBrowse will refresh and zoom in on the region you selected. This refresh may take a minute, so be patient.

Once you are zoomed in, click on a green exon from the gene model prediction you would like to use, and a box asking you to show details will come up, as shown below. Click on show details.

16. In the new window, you will see a great deal of information about the gene prediction. Scroll down towards the end of the report and you will see the sequence in hot pink and pale pink. The hot pink is the exon sequence and the pale pink is the intron sequence. Copy the sequence of the targeted intron and flanking exons and paste it into your Google document. You should know how to select the proper exon and intron sequence based on the picture in GBrowse. The sequence below is an example sequence; it is NOT from Intron 6351.

QUESTION #4: Which exons did you select to use to design your primers? Why did you select these exons? In general, why is it a good idea to design primers within exons of a gene? (12 pts)

17. Change the font to “Courier New” and the font size to 8 pt. (NOTE: To return to the GBrowse window, use the “Back” button of the web browser window.)

QUESTION #5: Make a sketch of the intron of interest (including the size of the intron) and the flanking exons (including the size of each flanking exon). You can either do this sketch electronically on your google document or you can make a sketch by hand on a separate piece of paper. (10 pts)

QUESTION #6: Capture the selected sequence from the DNA Subway and save it in your google document. (5 pts)

Using Primer 3 Plus to Design Primers

Now that we have annotated the sequence and have the information and the sequence we need from DNA Subway, we will use Primer 3 Plus to design primers. Your drawing of the applicable intron and flanking exons should look similar to what is below (except it should have the specific locations and lengths of each exon and intron).

Now, you need to double-check the length of each exon and intron so that you have the correct information for designing primers in Primer 3 Plus.

1. Highlight an exon. Select “Tools->Word Count.”

2. Write in your notes the number of characters (no spaces) for each exon and the intron. You will need these numbers for primer design.

Actin 1 – Exon 1: 536 characters – Intron 1: 983 characters – Exon 2: 67 characters

3. Update your drawing of the targeted actin sequence to include the appropriate starting points and stopping points for the intron and flanking exons.

Once we have worked out where the exons and introns start and stop, we are ready to design primers.

4. Open Primer3Plus and enter the following information:

A. Copy and paste the sequence for the exons and intron from the GBrowse window into the large textbox. Copying from Google Docs to the web form can create problems, so use GBrowse. Enter a short Sequence Id in the small textbox.

B. Enter the appropriate Targets coordinates. You need to tell Primer3Plus the location of the intron so that the primers will flank the intron and not be inside of it. The first number is the first nucleotide of the intron. The second number is the length of the intron. QUESTION #7: What target numbers did you use and what does each number signify? Why is it important to provide the program with these particular numbers? (9 pts)

C. Click the “General Settings” tab.

D & E. Delete all of the numbers in the Product Size Ranges field and replace them with an appropriate product size range for your example. Click the “Advanced Settings” tab. QUESTION #8: What numbers did you place in the Product Size Ranges field? Why did you select these numbers? (6 pts)

F & G. Change the “GC Clamp” from ‘0’ to ‘2.’ This tells Primer3Plus that the last two nucleotides at the 3’ end of each primer must be a ‘G’ or ‘C.’ Click the “Pick Primers” button. With any luck, Primer3Plus will find primers that meet our criteria.
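The settings entered in steps A through G can be sketched in Python as shown below. This is an illustration only, using the Actin 1 example lengths (NOT Intron 6351). The function names and the two test primers are hypothetical. The lower bound on product size is one possible choice, assuming each primer is at least 18 nucleotides long and lies entirely within an exon.

```python
def primer3_target(exon1_len, intron_len):
    """Targets field: first nucleotide of the intron, then the intron length."""
    intron_start = exon1_len + 1
    return f"{intron_start},{intron_len}"


def product_size_range(exon1_len, intron_len, exon2_len, min_primer_len=18):
    """One possible range: the product must span the whole intron plus a
    primer on each side, and cannot be longer than the pasted sequence.
    min_primer_len=18 is an assumption, not a value from this handout."""
    smallest = intron_len + 2 * min_primer_len
    largest = exon1_len + intron_len + exon2_len
    return f"{smallest}-{largest}"


def has_gc_clamp(primer, clamp=2):
    """True if the last `clamp` bases at the 3' end are all G or C.
    The primer is assumed to be written 5' to 3'."""
    if clamp == 0:
        return True
    tail = primer.upper()[-clamp:]
    return len(tail) == clamp and all(base in "GC" for base in tail)


# Actin 1 example: Exon 1 = 536, Intron 1 = 983, Exon 2 = 67.
target = primer3_target(536, 983)
size_range = product_size_range(536, 983, 67)
print("Targets:", target)
print("Product Size Ranges:", size_range)

# Hypothetical primers, written only to exercise the clamp check.
print(has_gc_clamp("ATGCTAGCTAGG"))  # ends in GG
print(has_gc_clamp("ATGCTAGCTAGA"))  # ends in GA
```

Your own numbers will differ, because they depend on the exon and intron lengths you measured for Intron 6351.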

5. Results of Primer3Plus.

QUESTION #9: Make a copy of the Primer 3 Plus results. What is the predicted product size? Is this a good size for a PCR product to amplify at the bench? Why or why not? (12 pts)

For the next section, you have a choice.

  • Use NCBI to examine the applicable sequences (provided electronically)
  • Use Target/Muscle to analyze the applicable sequences (provided electronically).

If you choose to do both exercises, one will be awarded as EXTRA CREDIT.

Using NCBI & Blast to Analyze Sequence

1. Open Firefox and go to the following link to obtain the Blast program:

2. Click on the ‘nucleotide blast’ link.

3. Check the ‘Align two or more sequences’ checkbox.

4. A) Enter the ‘GENOMIC DNA sequence,’ as provided via email, in the Query (top) textbox.

B) Enter the ‘cDNA sequence,’ as provided via email, in the Subject (bottom) textbox.

C) Click ‘Blast.’

5. The results are presented in this page.

When blasting two sequences to compare them, the first sequence is termed the Query. The second sequence compared against the query is termed the Subject. In our case, the query is the Genomic DNA sequence and the subject is the cDNA sequence.

QUESTION #1: The Query contains more sequence than the Subject. Why is this expected given what you know about the content of the two sequences? (4 pts)

The bottom half of the results page shows more details about the hits broken down nucleotide by nucleotide.

QUESTION #2: How many alignments/hits does this BLAST have? What do these hits represent? (4 pts)

QUESTION #3: How many exons/introns are represented in the above genomic DNA? Make either an electronic sketch (on your google document) or a sketch on a separate piece of paper of both the genomic DNA and cDNA. (10 pts)

QUESTION #4: How much smaller is the cDNA compared to the genomic DNA and why is the cDNA smaller? (3 pts)

Using MUSCLE to do a Multiple Sequence Alignment

Access the TARGeT website by visiting the following url:

1. All the sequences will be provided to you in a single FASTA file, and each sequence will have already been trimmed, so there is no need to worry about this step.

2. Open the TARGeT MUSCLE web page target.iplantcollaborative.org/class_index.php and click on “Multiple Sequence Alignment.” Copy and paste the sequences into the text window. Click “Align.” There are not very many parameters for a multiple sequence alignment program, and you will almost always use the defaults.

3. The results are presented on the next page.

The text output of the alignment program will put in a dash to represent gaps in the sequence. If all of the nucleotides are the same at a given position a ‘*’ will be placed below the alignment at that position. As you can see a multiple sequence alignment makes it very easy to find sequence polymorphisms.
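The asterisk line described above can be reproduced with a short Python sketch. This only illustrates the rule stated in this handout; it is not the code used by MUSCLE or TARGeT. The three short aligned sequences are invented for the example, and treating an all-gap column as "not conserved" is an assumption.

```python
def conservation_line(aligned):
    """Return a string with '*' under every column where all sequences
    carry the same nucleotide, and a space everywhere else."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


# Invented toy alignment; '-' marks a gap inserted by the aligner.
toy_alignment = [
    "ACGT-A",
    "ACGTTA",
    "ACCTTA",
]
stars = conservation_line(toy_alignment)

for seq in toy_alignment:
    print(seq)
print(stars)
```

Columns without an asterisk (the third and fifth in this toy alignment) are the positions to inspect for polymorphisms.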

4. Another way to view the sequence is using a program called Jalview. This viewer provides many ways to view the alignment. In Jalview the sequences can be color coded in several different ways. Here they are colored by base. Jalview also creates a “Consensus” sequence where the most common nucleotide at each position is used. The bars above the consensus sequence indicate the degree of consensus. These bars also make it easy to scan for polymorphisms especially in alignments with many sequences.
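The idea of a consensus sequence can be sketched in Python as follows. This is a simplified illustration, not Jalview's actual implementation: it counts gaps like any other character, and ties go to whichever character appears first in the column. Both choices are assumptions. The toy alignment is invented.

```python
from collections import Counter


def consensus(aligned):
    """For each alignment column, return (most common character,
    fraction of sequences that carry it)."""
    result = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        result.append((base, count / len(column)))
    return result


# Invented toy alignment; '-' marks a gap.
toy_alignment = [
    "ACGT-A",
    "ACGTTA",
    "ACCTTA",
]
columns = consensus(toy_alignment)
consensus_sequence = "".join(base for base, _ in columns)

print(consensus_sequence)
for position, (base, fraction) in enumerate(columns, start=1):
    print(f"position {position}: {base} ({fraction:.0%} agreement)")
```

The agreement fraction plays the role of the bars above the consensus sequence: positions below 100% are candidates for polymorphisms.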

QUESTION 1: Why is B73 the positive control when doing research with maize? Why is it predicted that any experimental results from B73 will match any sequence derived from a genome browser? (6 pts)

QUESTION 2: Of the three sequences given, do any of them show large polymorphisms? If so, which sequence(s) differ? Would large polymorphisms be more likely to occur within an intron or an exon of a gene? Explain your answer. (8 pts)

QUESTION 3: For the sequence that appears to match among the three strains, how many single nucleotide polymorphisms (SNPs) can you find? When a SNP occurs within the coding region of a gene, when is it most likely to result in a synonymous change? (6 pts)

Congratulations!!!! You are finished (Phew)!

Enjoy your Holiday Break . . . I know I will!