Class 9

Essentials of Next Generation Sequencing 2013 Page 1 of 3

Supplemental Exercise

Preparing MiSeq Reads for QIIME Analysis

MiSeq reads obtained from BaseSpace are in .fastq format and the reads have already been split according to barcodes and had the barcodes removed. Additionally, each 16S amplicon is represented by a forward (R1) and reverse (R2) read. For these reasons, MiSeq reads do not fit neatly into the QIIME pipeline, which takes multiplexed .fasta and .qual files, representing single-end reads that still have barcodes at the beginning of the sequences.

The dataset that we will use was generated using the Schloss amplicon protocol, which uses two 8-mer barcodes, one at each end of the amplicon (Schloss et al. 2013). QIIME is not designed to handle such dual indexing, so we must modify the barcodes so that they can be detected by the existing QIIME pipeline. In addition, individual MiSeq reads often do not span the entire PCR amplicon under study, and both paired ends are required to capture the complete sequence. The main pipeline in Class 9 does not include a step to merge paired-end reads, so we need to perform this operation manually.

To allow the use of MiSeq reads as inputs to the standard QIIME pipeline, we will do the following:

1.  Recombine the de-multiplexed reads back into a single sequence file

2.  Merge the forward and reverse reads into a single, contiguous sequence

3.  Concatenate the dual barcodes into a single entity

4.  Prepend the merged barcodes to the start of each sequence read

5.  Split out the libraries using split_libraries.py

6.  Generate a mapping file

After completing these steps, the sequences will appear to QIIME as if they were simple 454 reads.

¨  Make sure you are in the qiime_supplement directory. All of the following commands assume that you are in this directory. Remember, you do not need to change directories to list, copy or move files.

¨  Take a look at the file SampleSheet.csv. This file, which is produced by the MiSeq machine when the run is started, describes which barcodes go with which sequences.

¨  Read barcodes and add the correct barcode combination to the start of each sequence read:

·  perl Prepend_barcodes.pl SampleSheet.csv R1_reads/

The Prepend_barcodes.pl script performs the following operations:

  1. Reads the SampleSheet.csv file to determine barcode affiliations
  2. Opens and reads the sequence files
  3. Adds the appropriate barcodes to the beginning of each R1 sequence read
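
The source of Prepend_barcodes.pl is not shown here, so the following Python fragment is only a minimal sketch of the idea behind step 3, not the script itself. It assumes that the two barcodes have already been joined into one string and that the quality line is padded with a placeholder score so that it stays the same length as the sequence line. The function names and the padding character are illustrative assumptions.

```python
def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        yield tuple(stripped[i:i + 4])


def prepend_barcode(record, barcode, pad_quality="I"):
    """Return a FASTQ record with `barcode` added to the start of the read.

    The quality string is padded with one placeholder character per barcode
    base (an assumption) so sequence and quality remain equal in length.
    """
    header, sequence, plus, quality = record
    return (header, barcode + sequence, plus, pad_quality * len(barcode) + quality)


# Illustrative record and barcode, not taken from the class dataset.
example = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIHHHH\n"]
barcode = "TATAGCGATAGCGATG"
tagged = [prepend_barcode(rec, barcode) for rec in read_fastq(example)]
```

If the quality line were left unpadded, the record would no longer be valid FASTQ, because each base must have exactly one quality character.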

¨  Use cat to combine the modified reads into a single .fastq file:

·  cat R1_reads/*qiime* > MiSeq_R1.fastq

¨  Combine the R2 reads (in the R2_reads directory):

·  cat R2_reads/* > MiSeq_R2.fastq

¨  Merge the forward and reverse reads into a single sequence:

·  join_paired_ends.py -f MiSeq_R1.fastq -r MiSeq_R2.fastq -o Merged
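
To see what merging a read pair involves, here is a simplified Python sketch. It is not the algorithm used by join_paired_ends.py, which can tolerate mismatches in the overlap. This version assumes an exact overlap between the forward read and the reverse complement of the reverse read. The function names and the min_overlap value are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=6):
    """Join a forward read to the reverse complement of its mate.

    Returns the merged sequence, or None when no exact overlap of at
    least `min_overlap` bases is found.
    """
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, then shrink.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


# Illustrative 24-base amplicon covered by two 16-base reads.
amplicon = "AAACCCGGGTTTACGTACGATCGA"
forward_read = amplicon[:16]
reverse_read = reverse_complement(amplicon[8:])
merged = merge_pair(forward_read, reverse_read)
```

Because the barcodes were added to the start of the R1 reads before this step, they end up at the start of the merged sequence.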

¨  Rename the merged sequence file:

·  mv Merged/fastqjoin.join.fastq Merged/MiSeq_merged.fastq

¨  Generate .fasta and .qual files from the merged, re-barcoded .fastq file:

·  perl Fastq_2_Fasta_n_Qual.pl Merged/MiSeq_merged.fastq
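
The conversion itself is simple: the sequence goes into the .fasta file and the quality characters are decoded into numeric scores for the .qual file. The Python sketch below illustrates this under the assumption that the quality line uses the Phred+33 encoding. It is not a copy of Fastq_2_Fasta_n_Qual.pl, and the function name is hypothetical.

```python
def fastq_to_fasta_qual(record, offset=33):
    """Convert one FASTQ record into a FASTA entry and a QUAL entry.

    `record` is a (header, sequence, plus, quality) tuple. Each quality
    character is decoded as ord(character) - offset.
    """
    header, sequence, _plus, quality = record
    name = header[1:]  # drop the leading "@"
    scores = " ".join(str(ord(ch) - offset) for ch in quality)
    fasta = ">{}\n{}\n".format(name, sequence)
    qual = ">{}\n{}\n".format(name, scores)
    return fasta, qual


# Illustrative record: "I" decodes to 40, "5" to 20 and "!" to 0.
fasta_text, qual_text = fastq_to_fasta_qual(("@read1", "ACGT", "+", "II5!"))
```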

¨  View the SampleSheet.csv file and write down the two barcodes that go with each sample.

¨  Open the MiSeq_Map.txt file in a text editor.

¨  For each sample, enter the corresponding barcodes as a single string in the BarcodeSequence field. Match up the SampleID from MiSeq_Map.txt with the BarcodeNum from SampleSheet.csv, then concatenate the two barcode sequences in the order they appear. For example, if the two sequences in the spreadsheet are “TATAGCGA” and “TAGCGATG”, you would enter TATAGCGATAGCGATG as the barcode sequence.

¨  In the LinkerPrimerSequence field, for every sample enter A (make sure you retain the correct tab delimitation).
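
If you have many samples, the same edit can be expressed in a few lines of Python. This is a sketch only: it assumes a mapping row with just four tab-separated fields (SampleID, BarcodeSequence, LinkerPrimerSequence, Description), whereas your MiSeq_Map.txt may contain more columns. The sample name and description value are made up for illustration.

```python
def combined_barcode(first, second):
    """Join the two 8-mer barcodes into one 16-base barcode string."""
    return (first + second).upper()


def mapping_row(sample_id, first, second, linker_primer="A", description="NA"):
    """Build one tab-delimited mapping-file row (hypothetical 4-column layout)."""
    fields = [sample_id, combined_barcode(first, second), linker_primer, description]
    return "\t".join(fields)


# Barcodes from the example above; "Sample1" is a hypothetical SampleID.
row = mapping_row("Sample1", "TATAGCGA", "TAGCGATG")
print(row)
```

Joining the fields with a tab character keeps the delimitation that QIIME expects in the mapping file.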

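¨  Run split_libraries.py on the resulting .fasta and .qual files, using MiSeq_Map.txt as the mapping file and writing the output to the Split_Libraries folder. Because the combined barcodes are 16 bases long rather than the default 12-base Golay barcodes, set the barcode type option (-b) to 16.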

¨  Examine the seqs.fna file in the Split_Libraries folder to see how the read names are now prefixed by the Sample IDs.

¨  Take a look at the split_library_log.txt file to see how many reads were identified for each barcode.
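
As a cross-check on the log, the sample prefixes in seqs.fna can be tallied directly. The Python sketch below assumes that each header begins with the SampleID, an underscore and a running read number (for example Sample1_0). The header lines in the example are invented for illustration and are not from the class dataset.

```python
from collections import Counter


def reads_per_sample(fasta_lines):
    """Count reads per SampleID from the header lines of a seqs.fna file."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            label = line[1:].split()[0]          # e.g. "Sample1_0"
            sample_id = label.rsplit("_", 1)[0]  # drop the per-read number
            counts[sample_id] += 1
    return counts


# Invented headers and sequences in the assumed layout.
example_lines = [
    ">Sample1_0 readA orig_bc=TATAGCGATAGCGATG new_bc=TATAGCGATAGCGATG bc_diffs=0",
    "ACGTACGT",
    ">Sample2_1 readB orig_bc=CGATCGATCGATCGAT new_bc=CGATCGATCGATCGAT bc_diffs=0",
    "TTGACCGA",
    ">Sample1_2 readC orig_bc=TATAGCGATAGCGATG new_bc=TATAGCGATAGCGATG bc_diffs=0",
    "GGCATTAC",
]
counts = reads_per_sample(example_lines)
```

To run it on the real output, pass an open file handle for Split_Libraries/seqs.fna in place of example_lines.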
