Functional Genomics 470, Winter ’09

Bold questions are fair game. All others are great study materials.

1. You are studying the functions of three related genes, Dnmt1, Dnmt2, and Dnmt3, each of which encodes a DNA methyltransferase with a specific function. Previous studies indicate that Dnmt3 encodes a de novo DNA methyltransferase, which establishes initial methylation patterns during embryogenesis, while Dnmt1 encodes a “maintenance” methyltransferase, which maintains the methylation patterns established by Dnmt3. Your lab is especially interested in the function of Dnmt2, which has not yet been determined. Given what you know about the related genes, what molecular technique would you use to study the expression of this gene? How could you use these results to infer the gene's function?

2. What was the Otto program used for in the HGP, and what did the program require in order to perform with “good confidence”? Below is a table of estimates from the WGA paper. How did the gene-prediction programs come up with these estimates?

3. What are the problems with WGA and CSA genomic sequencing and how does Otto address these issues?

4. Table 1 in The Sequence of the Human Genome shows the data that Celera gathered in their sequencing. Does anything seem weird about the numbers of sequencing reads for the different individuals and if so, why? Why are most of Individual B’s reads 2 kbp?

5. As humans further explored the reaches of the earth, they eventually found creatures very similar in appearance to themselves, dwelling near the center of the Earth. Excited to see what these new beings were like, teams quickly sequenced their genomes (using the whole genome approach, of course). Brashly assuming the creatures were quite similar to humans, the teams used the OTTO program created for the HGP to find and annotate genes.

A)  Describe the two approaches OTTO uses to find and annotate genes.

B)  After running OTTO the results are:

Method / Sensitivity / Specificity / # of genes found
OTTO (RefSeq only) / 0.903 / 0.921 / ~5000
OTTO (homology) / 0.611 / 0.734 / ~30,000

(Assume a total genome of ~38,000 genes)

-Were the researchers justified in assuming that the two species are similar?

-Are the validation results as expected?
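
As a rough aid for reading the table in part B, the sketch below computes what fraction of the assumed ~38,000 genes each method recovered. It also shows how sensitivity and specificity are commonly computed in gene-prediction benchmarks, assuming the convention sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP). The function names are hypothetical and chosen only for illustration; the numbers in the dictionary are copied from the table above.

```python
# Values copied from the table in part B; the gene counts are approximate.
TOTAL_GENES = 38_000  # assumed total number of genes in the genome

results = {
    "OTTO (RefSeq only)": {"sensitivity": 0.903, "specificity": 0.921, "genes_found": 5_000},
    "OTTO (homology)": {"sensitivity": 0.611, "specificity": 0.734, "genes_found": 30_000},
}


def fraction_found(genes_found, total=TOTAL_GENES):
    """Fraction of the assumed gene total that a method recovered."""
    return genes_found / total


def sensitivity(true_pos, false_neg):
    """Share of real features that were predicted: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_pos, false_pos):
    """Share of predictions that are real (gene-prediction convention): TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


for method, row in results.items():
    share = fraction_found(row["genes_found"])
    print(f"{method}: {share:.1%} of the assumed gene total found, "
          f"sensitivity {row['sensitivity']}, specificity {row['specificity']}")
```

Comparing the fraction of genes found with the reported sensitivity and specificity of each method is one way to start on the two questions above.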

6. The humanoid creatures have not made it this far without making significant scientific progress of their own. They possess a substantial number of substances that are known to be very toxic, and humans need to know how these substances could interact with their genes. As human experiments would be extremely unethical, researchers want to use yeast to see whether any genes with human homologs are affected by these new poisons.

A) How could research teams obtain yeast strains that are each deficient for one particular gene?

B) After obtaining the suite of knock-out strains, each one is exposed to a spectrum of their most common poisons. For each poison there are around 50-100 strains that show increased fitness compared to the rest of the strains after exposure. What do these results mean? How can we figure out if these genes might be found in humans?

7. What are the benefits of yeast two-hybrid assays, and how could they produce false positives and false negatives?

8. When you nebulize the DNA you have extracted to run through a whole genome assembly, why is it important to shear the DNA to equal lengths, and how can knowing these lengths help you later in the WGA process?

9. You’re using the WGA to sequence the genome of a new organism. At one step, you realize that a certain unitig has 24 contigs. What is the problem and how do you solve it? Briefly describe the subsequent steps to finish sequencing the genome. Once you have finished the sequence, you want to know every single gene (and its purpose) that the genome codes for. How do you proceed?

10. You discover a yeast protein that you know causes a nasty disease. You suspect that it doesn’t act alone. Starting with cells that produce this protein, how do you proceed to test your hypothesis (using the techniques we discussed in class, not PCR)?

11. Transposon-based shuttle mutagenesis and site-directed mutagenesis are two techniques for studying genes in a particular genome. Explain how each would proceed, how they complement one another, and the limitations of each for studying proteins.

12. Describe Whole Genome Assembly and Compartmentalized Shotgun Assembly, and explain whether one method is better than, worse than, or equivalent to the other when sequencing a genomic library.

13. How does shuttle mutagenesis allow for the study of conditional mutants compared to site-directed mutagenesis? How does this allow for studying essential genes?

14. You are working in a lab studying correlations between Alzheimer’s disease (AD) and other cellular proteins in hopes of finding a “biomarker” that would enable early diagnosis and preventive treatment. You find that protein X is present at high concentrations in subjects diagnosed with AD, and you have successfully purified it. Suggest a method for determining the protein sequence using techniques learned in class. After you have identified this “biomarker”, propose a technique you could employ to see what other cellular proteins interact with protein X.

15. You are a researcher who studies a prevalent human disease. Recently, a new protein Z has been discovered that plays a role in the disease and you want to see what other proteins may be involved.

(a)  Of the two methods for detecting protein-protein interactions discussed in class, which would be more useful for this experiment? Why?

(b)  Describe the technique chosen in part (a) using paragraphs and/or diagrams.

(c)  Suppose that your results show that there is an interaction between all of the proteins tested. What could this result represent?

16. In the race to sequence the human genome, the private-sector effort, led by Craig Venter, used a WGA assembler that contained five stages. One of the major difficulties when sequencing the genome was repeats.

1.  Which stage is used to distinguish “true overlaps” from “repeat-induced overlaps”, and what are they? What term can be used to describe these “true overlaps”?

2.  Why are all the repeats not removed during the Screener stage?

3.  What is meant by the term overcollapsed unitigs and how are they identified?

4.  What are U-unitigs and what part of the assembler is used to find these in the genome?

17. A yeast two-hybrid assay is a discovery-based analysis. What does it mean to be a discovery-based analysis, and how does the yeast two-hybrid assay work?

18. What were some initial problems with the hierarchical clone-by-clone method for sequencing? With the Whole Genome Assembly approach? How were these problems resolved?

19. You have just discovered an unknown microorganism. You know every gene sequence within the yeast genome, and you want to find out whether this unknown organism has a homolog of a specific yeast gene that you are working with. What approach should you take to find out whether the unknown organism has a gene homologous to the yeast gene?

20. Besides mutagenesis, can transposons be used to add genes, thereby adding extra functions to the yeast? Is this an ideal method for inserting genes into a genome?

21. Critics of Craig Venter’s WGA claimed that it would be unable to deal with repetitive sequences. What problems did the project have, and what method did the project use to ultimately prove those critics wrong?

22. What is an example of yeast deletion mutants being used to effectively perform reverse genetics?

23. The multi-functional transposable element construct (Tn) can be used to randomly mutagenize a yeast genomic library. Transposable elements are pieces of DNA that can insert themselves into and excise themselves from other DNA molecules.

24. You have found a new organism called H. ardtosequence whose genome has yet to be sequenced, but a genomic library has just become available. Based on your previous work, you know that it is closely related to a type of yeast, E. asytosequence. You wish to create a total knockout library of H. ardtosequence ORFs. Outline a method to achieve this goal.

25. Protein Z is thought to interact with protein A, but definitive evidence of this interaction has yet to be found. Describe a method that could be used to determine whether or not these two proteins interact. How is the life cycle of yeast utilized in this method? What benefit (if any) does this method have for studying interacting proteins in humans?

26. Gene annotation is an important method used for identifying and characterizing genes in the human genome. Manual gene annotation was used to annotate the Drosophila genome, but annotation has since become an automated process. In order to enumerate the gene inventory of the human genome, a rule-based expert system called Otto was developed. Using your knowledge of Otto and its components, please answer the following questions:

a)  Name two ways in which the Otto system can promote observed evidence to a gene annotation.

b)  What do the terms ‘computational pipeline’ and ‘gene bin’ refer to, and by what process is a gene ‘bin’ created?

c)  How has the Otto system been validated, and how does its accuracy compare to other systems such as Genscan? (You may refer to figure 7 from The Sequence of the Human Genome.)

27. In the paper Emerging Technologies In Yeast Genomics, the authors describe a method of gene disruption used in yeast that exploits the high rate of DNA integration through homologous recombination. Using existing sequence data, a deletion cassette can be designed for precise gene replacement. However, a study of yeast deletion strains has indicated aneuploidy for whole chromosomes or chromosome segments in ~8% of these deletion strains. Why does this pose a significant problem for this type of gene deletion analysis in yeast?

28. Why is it so difficult to accurately determine the number of genes in the sequenced human genome, and why in some cases might the gene-prediction system Otto, which is an evidence-based system, split one gene into two or more genes?

29. You are working in a research lab that is studying the proteins that bind to a promoter in yeast to activate the IPA gene. The binding protein is known (HOProtein), but there are many candidates for the activation protein. Your boss assigns you the task of designing an experiment that will clearly demonstrate which protein is the activation protein. Please explain the experiment you would do and why you would perform each step.

30. Using what you know about the Whole Genome approach to sequencing, please answer the following.

a. In reference to Table 1 in “The Sequence of the Human Genome”, why were the majority of the fragments used 2 kbp fragments?

b. Why did Venter choose to use mate-pairs rather than chromosome walking?

31. As a recent graduate from WWU with a Bachelor of Science degree, you are hired in a cancer research laboratory at the Fred Hutchinson Cancer Research Center in Seattle. The lab you are working in has created cDNA from a cancer cell line. The cDNA appears to code for a gene that has a homologue in S. cerevisiae (yeast) whose function has yet to be characterized. As a first step in the characterization, you have been asked to perform an immunolocalization experiment to determine when and where the protein carries out its function in yeast. Your lab is equipped with a state-of-the-art fluorescent confocal microscope. Describe the method, including important steps along the way. What problems might occur that would interfere with interpreting the results?

32. In the Science paper The Sequence of the Human Genome, Venter developed an approach named Otto to enumerate the gene inventory. Describe how the Otto results were validated and compared to Genscan.

33. Table 4 on pg. 1315 (WGA paper)

(1a) What were the two genome-wide types of map information that were referenced/used to map scaffolds to the genome?

***Two scaffolds were created (by Celera) based on the markers (STS or BAC) on these maps.

(1b) Briefly summarize what the data in this table illustrate.

34. In the whole-genome assembly process (using the Celera data and the PFP data), neither the location of a BAC in the genome nor its assembly of bactigs was used. Why? And what are bactigs?

35. Describe how the human genome was sequenced by Venter. Explain figure 2 and figure 4 from the Science human genome paper. What do these boxes mean? Why were two computational approaches chosen (WGA and CSA)? Compare this to the Nature article. Are these essays opposed? Why does WGA, and most modern genome sequencing, reject the concept of mapping? Is mapping necessary? Is a hybrid an alternative approach? Of the two methods (see figure 1 from the Nature article, strategies for systematic sequencing), which is better: clone by clone or whole genome shotgun? How do you know?

36. If Venter gave you a protein sequence that he found during his boat trip around the world, how would you go about figuring out what the protein does, given the tools learned so far?

37. What is Sanger (dideoxy) sequencing?

1b) What were some of the evolutionary advances to the technology that occurred prior to the human genome project?

1c) What approaches were used by HGP researchers at Celera to overcome the remaining obstacles to applying this technology to sequencing the human genome?