Project: DNA Splicing

There exist patterns in a DNA strandthat can be split at a certain place with new DNA inserted.The original DNA strand is cut into two pieces with a new small DNA inserted to make a larger DNA string. In the real chemical process, the original strand may be split into several pieces at multiple binding sites.In some experiments, and in the simulation you'll run, new DNA material will be spliced into the separated strand wherever the pattern being searched for is found in the original DNA strand.

Consider an example where the original DNA strand is "…CGAATTCG…" (see below). The letters C, G, A, and T represent nucleotides, the molecules that when joined together, make up the structural units of RNA and DNA. In this first example experiment, the pattern to search forwill be "GAATTC". Your code will search for "GAATTC"in the original DNA strand and split that strand into two pieces. The new DNA "TGATA" is spliced into the original strand to increase the original DNA strand. Consider this happening when one of the patterns is found in the original strand:

1) Find the pattern representing the insertion point ("GAATTC" below)

2) index into the recombinant enzyme (if index == 1, go right once, in index == 2 go right twice)

3) insert the new DNA material, "TGATA" in this example, at the correct location

The index of the split in this example is 1 indicating that the first nucleotide/character of the restriction enzyme adheres to the left part of the split. The new DNA will be inserted after G in this case. The small DNA strandTGATA will nowbe spliced in like this.

When the spliced-in strand joins the split strand we see a new strand of DNA as shown below. The shaded areas indicate where the original strand was inserted into the new DNA material.

Your code will be a software simulation of this process: the restriction enzyme will be searched for anda small DNA strand will be spliced-in. Don't forget there is a number like 0, 1, or 3 to indicate how many nucleotides(letters G, A, T, or C) are to the left of the small spliced in strand.

IDnaStrandWe have designed a new type to represent DNA strands with a method to simulate the chemical experiment. Given this interface IDnaStrand, complete the @Test method below to insert TGATA everywhereGAATTC occurs at index 1GAATTC becomes GTGATAAATTC exactly two times.

publicinterface IDnaStrand {

/**

*Add all nucleotides in str to the end of this LinkStrand (convert each letter

* into a new node to be added at the end of the current Strand. This must run O(1).

* Note: with a linked structure, maintain a reference to the last node.

*/

publicvoid append(String str);

/**

* Return the number of nucleotides (G, A, T, or C) in this strand.

*/

publiclong size();

/**

*Return the nucleotides in the strand as a String such as "catgaatccat".

*/

public String getStrand();

/**

* Perform a cut at each occurrence of patternToSearchFor and insert toBeSplicedIn the

* correct number of characters into the patternToSearchFor. If the restrictionEnzyme

* is not found, no cutor splice is made. The Strand remains the same.

* Precondition: index < patternToSearchFor.length()

*/

publicvoid findPatternAndInsert(String patternToSearchFor,

String toBeSplicedIn,

int index);

}

@Test

public void testOneSimulationWithTwoFindsAndOInsert() {

IDnaStrand strand = newLinkStrand(); // Assume class LinkStrand implements IDnaStrand

To be arranged during Monday's lecture or section.

Or write your own tests to gain an understanding of this process.

Implementation Specifics

We could use a String to store the letters representing the strands. However, studies have shown that computers cannot handle the size of this problem (you run out of memory) and the simulation runsslowly. Instead we will use a linked list with a reference to the first node to begin searches and a reference to the last node so we can append nucleotides (actually single characters) to the end of the singly–linked structure O(1).

You'll be developing/coding classLinkStrand that implements interfaceIDnaStrand(copy and paste above) Your class is to simulate cutting a strand of DNA when the given pattern is found splicing-in a new small strand, also supplied in the call to this method

publicvoid findPatternAndInsert(String patternToSearchFor,

String toBeSplicedIn,

int index);

You must use a singly-linked structure to support the operations--specifically the classLinkStrand should maintain references to a singly-linked structureused to represent a strand. You should keep and maintain a reference to the first Node of the singly-linked structure and to the last node of the linked structure. Store single chars in the node. No generics needed.

Grading Criteria (Subject to Change)

___ / 100 points WebCat

Ways to lose points. Please read

-80 You did not use a singly linked list as your data structure

-20 append is O(n) rather than O(1). You can do this by maintaining a reference to the last node

-20 The data in each node is not char but rather String or any other type

-100 Your code did not compile soWebcat gives you 0 for unknown symbols

-100 Our tests exposed an infinite loop in your code so WebCat times out and gives you 0

-100 You tests do not pass when turned in so WebCat gives you 0 for this sometimes