SEQUENCE ALIGNMENTS

Sequence alignment, searches for identity, for orthologous and paralogous genes, phylogeny, protein domains

ASSIGNED WORK:

  1. Study the Blast tutorial in order to understand how to use the program at NCBI.
  2. Study the first part of this page to become familiar with Megablast, blastn, blastp, translated Blast searches, Blast 2 sequences, and Search for short, nearly exact matches.
  3. Read the section entitled “Sample questions that can be answered with Blink”.
  4. ClustalW is a program that is useful when you have to align more than two sequences. There is a good description of the program at this website: (go to ClustalW help); the program runs much faster (though with fewer options) at this website from Japan:

Answer these questions:

  1. It is often difficult to select from among the many entries in Entrez. You will recall finding many entries for human tissue plasminogen activator (TPA), including the one with the accession number AAO34406.1. You will also recall that there were three isoforms for human TPA: 1 (GenBank NP_000921), 2 (GenBank NP_000922), and 3 (GenBank NP_127509). The sequence with accession number AAO34406.1 was submitted earlier, quite likely before the existence of several isoforms had been validated. Presumably it is one of the three isoforms, or it may represent another that has not yet been recognized. Use appropriate programs to find out. Show your work and explain your conclusion.
  2. Align the three human TPA isoforms and describe the relationship in terms of conserved regions. Are they similar? Identical?
  3. Say you are a teaching assistant and one of your students suggests that the three isoforms are encoded by three separate but homologous genes. A second student suggests that they are encoded by the same gene. Looking at the alignment, is this second option even possible? How?
  4. Since the human genome is completely sequenced, it should be possible to find out whether the three isoforms arise from different genes or from one. If different genes, note down the chromosome and the region (base #-base#) for each. If the same gene, explain how it came about that one gene gives rise to the three isoforms.
  5. Align human TPA isoform 1 with human anionic trypsinogen (GenBank P07478). Compare the Blast scores, the length over which the two sequences align, and the percent similarity. After finding out the function of trypsinogen, give a first guess as to whether you think that plasminogen activator and trypsinogen are evolutionarily related or whether the similarity is only by chance.
  6. Study the entry for human TPA in Swiss-Prot to learn about the other domains on the protein and their contribution to catalytic activity.
  7. Make hypotheses with regard to the catalytic activities of isoforms 1, 2 and 3.
  8. How would you find out if TPA is exclusively mammalian?

Tools: Blast, Mapviewer, Blink, ClustalW

Tips on using Blast

  1. It is very important to read and understand what the various Blast programs at NCBI are tailored for. They all run the same underlying program, but NCBI sets the parameters (choice of matrix, gap penalty, extension penalty, word size, etc.) that are most suitable for the different tasks of Megablast, blastn, blastp, tblastn, etc.
  2. Blast is very forgiving of spaces. After pasting your sequence into the query box, you do not need to remove the spaces.
  3. The default output of Blast (blastn or blastp) when searching the non-redundant database is pairwise. This means that you see the sequence you submitted first, then one by one the other sequences. Pay attention. Sometimes you will see multiple pairwise alignments for one comparison. This is because Blast works on segments. Keep track of the numbers as well so you can deduce whether the two sequences are aligned from end to end or whether the sequences match up only in one stretch of amino acid residues.
  4. In this output, when you click on the score, you get the alignment. Click on the accession number and you get to Entrez. You can easily navigate back.
  5. Blast output also shows the alignment in the form of aligned lines and graphs that are self-explanatory.
  6. You do not need to keep to the pairwise alignment. There are other possibilities. Experiment.
  7. You can search the database for just one organism, if that organism is in the drop-down menu.
  8. Study the scores (bit score and expect value), and also the alignment length and % similarity, to judge whether the similarity could have arisen by chance rather than through evolutionary relatedness.
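
Tips 3 and 8 come down to a little arithmetic, which can be sketched in Python. This is only an illustration and not part of Blast: the function names and the example numbers are hypothetical. The first function assumes the standard relation between bit score and expect value, E = m x n x 2^(-S'), where m is the query length and n is the total database length. It leaves out the length correction that Blast applies, so it will only approximate the E values in a real report. The second function adds up how much of your query is covered when Blast reports several segments for one comparison.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate expect value: E = m * n * 2**(-S').

    Assumption: raw lengths are used, with no edge-effect correction,
    so real Blast reports will differ somewhat.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def query_coverage(segments, query_len):
    """Fraction of the query covered by a list of (start, end) segments.

    Coordinates are 1-based and inclusive, as in Blast output.
    Overlapping segments are merged so no residue is counted twice.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(segments):
        start = max(start, last_end + 1)  # skip residues already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / query_len


if __name__ == "__main__":
    # Hypothetical hit: two overlapping segments on a 200-residue query.
    print(query_coverage([(1, 100), (50, 150)], 200))  # 0.75
    # Hypothetical bit score of 30 against a database of 2**20 residues.
    print(expect_value(30, 100, 2 ** 20))
```

A coverage close to 1.0 suggests the two sequences align from end to end; a small value suggests they match in only one stretch, such as a single shared domain.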

Tips on using Blink

  1. You enter Blink from the entry in Entrez.
  2. This is a very useful program at NCBI. It essentially runs a Blast search for you when you click on the word Blink. This is better than taking the sequence and entering it through the Blast program because you don’t have to deal with the redundancies, i.e., the multiple entries per RNA or protein sequence.
  3. You can ask Blink to show the list of related sequences by Blast score, starting from the most similar, or you can choose to do it by Taxonomy proximity.
  4. You can also ask Blink to give you only the Best Hits per species rather than All Hits.

Tips on using Mapviewer

  1. At the entry in Entrez of NCBI, you can link to the exact location on the chromosome sequence by clicking on Link and selecting Mapviewer. Look for the highlighted entry, and click on sv (sequence viewer).
  2. You will see a segment of sequence showing 2000 bases. The dark blue arrow represents the cDNA sequence; the light blue lines are bases in the gene but not in the cDNA. Pay attention to where the circle and the sharp ends are on the dark blue lines. That tells you the direction of transcription. Pay attention also to the numbers to see if they are going upward or downward. Together, the two pieces of information tell you in which direction you are viewing the chromosome. You can reverse the direction by clicking on View Reverse Complement.
  3. To get more bases on one page, change the 2000 number and click Refresh. You can also shift your view in either direction using the blue arrows either at the top or just below the sequence.
  4. When you reach the coding region, you will get the protein sequence as well.
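
If it helps to see what viewing the reverse complement does to a sequence and to its numbering, here is a minimal Python sketch. It is not Mapviewer's own code; the function names are hypothetical and the example sequence is made up. It assumes positions are counted from 1 within the displayed segment.

```python
# Translation table for the four bases plus N, in upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def reverse_position(pos, length):
    """Map a 1-based position on one strand to the same base pair
    counted from the other end of a segment of the given length."""
    return length - pos + 1


if __name__ == "__main__":
    print(reverse_complement("ATGC"))   # GCAT
    print(reverse_position(1, 2000))    # 2000
```

This is why the numbers run downward after you reverse the view: the first base of the window becomes the last.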

Tips on using ClustalW

  1. If you are using the program that is at the EBI site, be careful in submitting your sequences. It does not allow for even slight deviations. One way is to use the following format:

>Human_name_of_enzyme [notice that the underscores keep the words together; now press Enter]

Paste the sequence in FASTA format here. Delete any spaces at the end of each line and don’t leave any space at the end of the entry. Even if it looks blank, delete the blank space. Skip one line and enter the next sequence.

>Rabbit_name_of_enzyme [follow the same procedure]

  2. Inspect the alignment by eye. ClustalW aligns the whole sequence, unlike Blast, which looks at segments. Depending upon the parameters entered, multiple sequence alignment programs sometimes give too high a penalty for sequences that do not start at the same place. If that happens, trim the overhanging unaligned ends and repeat the alignment, but keep track of your residue numbers.
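
If you would rather not clean up the entries by hand, the formatting rules in the first tip can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the 60-residue line width is a common convention rather than a requirement stated here, and the sequences in the example are made-up placeholders, not real enzymes.

```python
def clean_fasta_record(name, sequence, width=60):
    """Return one FASTA record with a one-word header and no stray spaces."""
    header = "_".join(name.split())       # spaces become underscores
    residues = "".join(sequence.split())  # drop spaces, tabs and line breaks
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + header + "\n" + "\n".join(lines)


def build_alignment_input(records):
    """Join cleaned records with one blank line between entries
    and nothing after the last residue."""
    return "\n\n".join(clean_fasta_record(name, seq) for name, seq in records)


if __name__ == "__main__":
    # Placeholder names and sequences, for illustration only.
    records = [
        ("Human name of enzyme", "MDAM KRGL \nCCVL "),
        ("Rabbit name of enzyme", "MDAMKRGLCSVL"),
    ]
    print(build_alignment_input(records))
```

Paste the printed text into the submission box as it is.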