STRUCTURAL VARIATION EXERCISE

1. Paired-end mapping is a technique for detecting structural variants in the DNA of an individual of interest: paired-end reads are obtained from cloned fragments, and their mapped positions in a reference genome are compared. You have three pairs of end reads, generated by Sanger sequencing of three clones from a fosmid library built from the DNA of a Japanese individual, with an average insert length of ~40 Kb. Use paired-end mapping to determine whether these fosmids contain any kind of structural variant with respect to the reference genome.
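
The logic of paired-end mapping can be sketched in a few lines of Python. This is only an illustrative sketch, not a published method: the function name `classify_pair`, the 10 Kb tolerance around the ~40 Kb expected insert length, and the example coordinates are all assumptions made for illustration. It assumes that the two end reads of a clone are sequenced from opposite strands, so a concordant pair maps to the same chromosome, on opposite strands, about one insert length apart.

```python
EXPECTED_INSERT = 40_000  # average fosmid insert length (~40 Kb, from the exercise)
TOLERANCE = 10_000        # assumed allowed deviation; not given in the exercise


def classify_pair(fwd, rev, expected=EXPECTED_INSERT, tol=TOLERANCE):
    """Classify one fosmid from the mapped positions of its two end reads.

    fwd and rev are (chrom, strand, start, end) tuples for the best hit
    of the forward and reverse read.
    """
    f_chr, f_strand, f_start, f_end = fwd
    r_chr, r_strand, r_start, r_end = rev

    if f_chr != r_chr:
        return "interchromosomal"   # ends map to different chromosomes
    if f_strand == r_strand:
        return "inversion"          # ends should map to opposite strands

    # Distance on the reference covered by the whole clone.
    span = max(f_end, r_end) - min(f_start, r_start)
    if span > expected + tol:
        return "deletion"           # reference has sequence the clone lacks
    if span < expected - tol:
        return "insertion"          # clone has sequence the reference lacks
    return "concordant"


# Made-up coordinates, for illustration only (not the answers to the exercise).
print(classify_pair(("chr1", "+", 100_000, 100_800), ("chr1", "-", 139_500, 140_300)))
print(classify_pair(("chr1", "+", 100_000, 100_800), ("chr1", "-", 199_500, 200_300)))
print(classify_pair(("chr1", "+", 100_000, 100_800), ("chr1", "+", 139_500, 140_300)))
```

Note the direction of the comparison: reads that map farther apart than the insert length point to a deletion in the sequenced individual, and reads that map closer together point to an insertion.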

The paired-end reads of the three fosmids are listed at the end of this file. To compare the sequence of these reads with the human reference genome sequence you will need a mapping tool named BLAT, available at the following link: http://genome.ucsc.edu/cgi-bin/hgBlat?command=start. Check that the Feb. 2009 (GRCh37/hg19) human genome assembly is selected, copy and paste the sequence of the read that you want to analyze into the window, and press submit.
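
If you prefer not to copy each read by hand, the read block at the end of this file can be split into individual sequences with a short script. The sketch below is an optional convenience and makes some assumptions: the function name `parse_fasta` is hypothetical, each read starts with a `>` header line, and any line that is not made only of the letters A, C, G, T or N (such as the "FOSMID 1" labels) is skipped.

```python
def parse_fasta(text):
    """Return {read_name: sequence} from FASTA-like text.

    Blank lines and lines that are not pure nucleotide sequence
    (for example section labels) are ignored.
    """
    reads = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            reads[name] = ""
        elif name is not None and set(line.upper()) <= set("ACGTN"):
            reads[name] += line.upper()
    return reads


# Tiny made-up example laid out like the read block in this file.
example = """
FOSMID 1

>Read_A

ACGTACGT
ACGT

FOSMID 2

>Read_B

ttgacc
"""

reads = parse_fasta(example)
for read_name, seq in reads.items():
    print(read_name, len(seq))
```

Each sequence returned by the function can then be pasted into the BLAT window one read at a time.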

Each line of the results page shows a region of the human reference genome that has sequence similarity with the query sequence. You may obtain more than one hit (several lines); in that case consider only the first one, which is the most significant. Once the most likely location of the read is identified, you only need the data under the labels IDENTITY (percent identity between your sequence and the reference genome), CHRO (chromosome where the sequence is located), STRAND (orientation of the DNA strand), and START and END (start and end positions of the region of similarity in the reference genome; these are the START and END columns to the right of STRAND). Run BLAT with the two reads of each fosmid, write the results in the table below, and answer the following questions:

READ / IDENTITY / CHR / STRAND / START / END
Fosmid1 Fwd
Fosmid1 Rev
Fosmid2 Fwd
Fosmid2 Rev
Fosmid3 Fwd
Fosmid3 Rev

(a)  Did you detect any structural variant?

(b)  Which kind of structural variant have you found in each case?

(c)  Can you give an estimate of the size of the detected structural variant?

(d)  With regard to the BLAT results, why is the identity between our reads and the reference genome not always 100%? Why are there sometimes several hits when we run BLAT with a single read?
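
For question (c), a first-order size estimate for a pair mapped to the same chromosome is the distance that the two reads span on the reference minus the expected insert length. The sketch below shows the arithmetic; the function name `estimate_sv_size` and the coordinates are made up for illustration. Because ~40 Kb is only the average insert length, the estimate is approximate.

```python
def estimate_sv_size(fwd_start, fwd_end, rev_start, rev_end, expected_insert=40_000):
    """Signed size estimate in bp: > 0 suggests a deletion, < 0 an insertion."""
    # Reference distance from the outermost start to the outermost end.
    span = max(fwd_end, rev_end) - min(fwd_start, rev_start)
    return span - expected_insert


# Made-up coordinates, for illustration only (not the answers to the exercise).
size = estimate_sv_size(100_000, 100_800, 199_500, 200_300)
print(f"estimated size: {size} bp")
```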


Fosmid paired-end sequences:

FOSMID 1

>Fosmid1_Forward

GAGGGTAGACTCTTATTAATTCCTTGGTAAACCTTGAGCCAATTGTTGTCTATGTTCTCTGCCTCTGTCTTGCTCCTTCCTTCTGGGATTCACTGTGGGAATGCGGGATTGTTAATCTGGGGATGCTGTCCAATCCTGCCTCTCTCAAGCTTTGCTATTGATCTCCCTCCCAGTGATAATAAAGCTTGAAGAAAATGAAAGTAGCGTTAGTATTGGTCCTCAAACTCAAGAACAGGATGAAACTTAAATCTTGAGTCATACAATTGTGTCTACATACTGCTCCCCAAAAAGAGAAGTAAAGAAGATGCTAACTTTCCCTTTTAAGTTGCAGTACTTAGCAATTTGTTTTCTTGAGGGTTAAGTAATAACAGTGGAAGAAAAAAGGGTTAAAATGCCACCAAGAACCCAATTCCATGTTTAGTTTGAAAGTGGGAAATCAGCTGCCACTGGGAAGTCTGAATCCAATGCCATGATGTTCTTTGAATCCTTCTGAGAAATAATCATGTGTAGCCATAACATACCTGTATAACAGAGCAGAGAACATAAACAAATGAAGGTGAAGGGAAGATTAAGACAGAAGAGAAAAATTCCAGAATCGACTGATCATTTTTATCTGTTTAGATGATTTCAGGCAGAATCCTAGAGACCAACTTTATCACAACTGAATTTTAAAAATCACCAGCTTTGTCATTGTGATGCAGCATCAGTTTCAGTATTATCCCTTGCAGTATTA

>Fosmid1_Reverse

CACTTTACGTAAATCTTCAGATTGAAATAAAACATTTGTTGACTTATCACATGTTTTGCTGGTACTTGTGAACTGATGAGATTGTAGCTGAGGTGATAACTGTGGACTCTGTTTTTTTTGTTTTTTTTTTTTAACTTCTGCCTTCATAAAAGGGTTTTTTGAACAATAAAGTATGGTTATCAAGGTTTTACAACACAATTGACCTTGCAGGTATATCTTAGAAGTGTTATGGGTTATTTGCTAGAGGCAGATTTCAGATCTATGCTCTTAAAAGCCAAGTATTTAACATATAATGTTTTTACATTGATAAAAGAAAAGAGAGAAGAAGGGTAAAGCAATATCCAAATAAGTCCATCAAAACAGCTTTTTATAATGACTTTGCAGATCCAAGTTAGTAACACATGGTAGTGAATCAGGATGATGCAAGCACATGCCTTTGAAACAACCTTGAATTGTGGTTGTTAGGTAGCTAAATTAATTAGAATTCACCTAAATGATGCCTCGGCAATTTGGTCTCTTGTCTGGACAATACATGCAGAAAAGCAGAGTAGGTTCTTATGGGTTCAATGCACGTGCTGTTCCAAAAACTTTAAATTTGTAGGGCTCATAGGATGATAAAAATCCAACTCAAAATAAAGGCGTGGTTAC

FOSMID 2

>Fosmid2_Forward

CCTATGCACTTTGTAATTTTTTTCAGATGTAATTTGGTTGGTATGTTTTTGTTACATTTACATGTCTAAAGGAATGAGGGCCTTTGGTATTTTCCTATGCCATCTTGTTTATGTAGTTTGTATAATAAGTTAGATTTGAAAACTTTGTCTGTGTCTAGCAAGTAATTTGTTATTTTTATTTCTTTCAGTTATATGTTCTCATTTTGCCCAAGACCTTTGGGCAGAGCAGGACATTAAAGATTCTTTTCAAGAAGCGATTCTGAAAAAATATGGAAAATGTGGACATGACAATTTACAGTTACAAAAAGGCTATAAAAGTGTGGATGAGTGTAAGGTACACAAAGAAGATGATAACAAACTAAACCTGTGTTTGATAACTACCAAGAGCAATATATTTCAATGTGATCCATATGAAAAAGTCTTTCATACATTTTCAAATTGAAATAGACATAAGATAAGACATACTAGAAAGAAACCTTTCAAATGTAAAAAATGTGAAAAATCATTCTGCATGCTTTTACACCTAACTCGACATAAAAGATTTCATATTACAGAGAATTCCTACCAATGTGAAGATTGTGGCAAAGCCTTCAACTGCTTCTCAATTCTTACTGAACACAGGAGAATTCATACTGGAGAGAAATCCTACCAATGTGAAGAATGTAAGAAAGAATTTAAAACGGTCCCCCCACCTTTACTCACATAAAGATAATTCCTACTTGGAGAGAAACCGTACAGAAGGTGAAGAAGGTGGGAAAGGCCTTTAACTGGGTGTTTCACCCCTCCCTAACCCCATATAAAAAAATTCCTTATTGGAAAAAAAACCCCTCCAAATGGTGAAAAAATGGTGGGCAAAACTTTTTTCCCCCAATCCCTTCAACCCCCTAACCTGGCCCCCATTAAAAAAAAACCCCTCGGGCGGGGAAAAAAAAAAAACC

>Fosmid2_Reverse

TTTAAAAAGTAAAATTTTTTGAGAGAAAAGGGCAAAAATTTAAAAAATTATTTAGTGAGGAGTAAATGAGACTGAGTAAGATGAGTAGACCTCACTTATCTTTTATGGTTTTCAGCTTAAGTTCTTCTATTTTTTCACATTGATATTGAGGACGTTCCTCTGGGCCATCAGGGGTTGCTCCCTCAGCTCTTCAGGCTTTGATTTGAGTGTCATGTATTCAGAACTTGATACCTGTAACTTTTACTGCTGAGGGAGAAGAACAGTGCAGGGCCCTTCCTAGCTTGGCTTAGGGAAGCAGAGAAGGGAGAGTTTTCACTAATACCAAATCTGGGTTAAATAGAGATGGTTCTATTTCCTGAGGTTGGACTTTTTCTAGTTGTGTTAATTACTGTTGGAAGTTCCTAAGTGAAAATCTTGGTGCAGGATTTTAATGACTTTCCATTGGCTGGAGGCTGGCAAATGGAGTTTGCCATCCCCTGACTGTAGCCATCCTGAGGGATGGAAAGTATGCCCTCAAGAAATGGCTTATTTTGGCTGGGCACAGAGGTTCATGCCTGTAATCCCAGCACTTTTGGAGGCCGAGGAGGGTGGATCACCTGAGGTAAGGAGTTTTGAGACCAGCCTGGCCAACATGGTTGAAACCCTATCTCTAATAAAAATTACAAAATATCAGCCAGGCATGGTCGCGGGCACCTGTAATTTCCAGCTACTTTGGGAGGCTGA

FOSMID 3

>Fosmid3_Forward

GGGGATTCTACAGACAGCCACGCTTGCTTGTATTCAGGTGCAAAATGTCAGGTTTGGTGAGACCACAGGACACATAAGCGCACGGTGGCAACAAATGGAGCAACAGATTCCATGATTAAAATGTGCAATTTCTAAACACAGGAAAGAGCCCCCAAAGCCTCACTTCAGTCCCCTAGAAGGTTCTAGGCAGAAACAGTAGGGCAGATCAAAGGAATGAGAGCATAGGAGCCATGATCACCCAGCGAGGGGAGTGAGCCCTGTTGCCAGCTTGGCCTGGCTGTGGCTTCTGAGTAAACACGTGTGTTTGTTAACTGTTAAGAACCTGGTAATGAATACTCCCTGGAAGATGCCATGGACCCTGGCCTCTGGCCAGTCCCCTCGCCCTTGAGTCCTCCTAACATCTTATTGCTCTGATTACTTCCCATGGAGAAGCCAGCCAGCAGGGACTCGGGAACCATGTGTCCCAAAACCTGGCTTGAATCCCCAAATCTGCTTCTTGACTGTGGGGCCTTGGGTAGCTGTAGAAGGGGATGATGGTGCTGATGACGGGGACAGCATGTATCCCACAAAGCTGTGGTCAGGATGTTAGTGCCATCTTTTCGAAACACCCATGCAGGCCCACCCTGCATACCTGGCAGCTTCCTACTCCCCAGCACCCCCAGCACGCCCTCGGCTGGCCAGCCCCATCGGCCCCTCAGCACGATCCTTCCCCGCCTTGCTGGGAT

>Fosmid3_Reverse

GCCTGGCTTCACTTGCTGGCTCATGAGTAGGCAGCACCTTGACCTTGAAGGGGCTGTGAGGGATTGGTGTTGTGAGCAGTCAGACAGGTTCTCAGCATCCAGCCCGGGCCACTCCCCACAGGCAGCAGGCCCTGCCTCTTACCTCCGGGGTACCTCTTCATCTCCATACAGTACTGAGATGCTGTAGGGCCCTTCTCGGCTGGGCACATAATTGACGGTCTGGGTGCCATCAGCGTTGTCTACCACGTCCACTGGCTCCACCAGGCCTGGCCCCAGCCCCAGGGACAGAGCATCAGCTAGTCTCCTGGGCCTCCATTCCTACCCCACCACACTGGACGGCCAGGACCCAGCCCCAGGCCTAGACCTCGCTTTATCCTCATCCACCCGACCCTGGGAATGGCCTTGACCCCCTCTGGCACGAAAGAACCCATCTGGGTGCCAGCTGGGACCCTTGCCTGCTGCCTTCCTGCCACTCTGCTCATGACAATCCCTCTGTCACCTCCTCCAGTCCCA