AWC Summer Student Report 2011/12 by Mei Lin Tay

Invertebrate diversity on Hauturu

Introduction

Documentation of species’ spatial distributions along geographical and environmental gradients is important for ecological understanding and in particular for conservation efforts. Knowledge of spatial distributions may lead to better understanding of species richness, turnover and endemism, three important criteria for conservation assessment (Fisher 1999). A main focus of community ecology has been the pattern of species richness over elevational gradients. Two patterns are frequently documented: a decrease in species richness as elevation increases, and mid-elevation peaks (Sanders 2002, Sanders et al. 2003). However, different sample sites with similar characteristics may reveal different patterns of species richness across elevations (eg, Andrew et al. 2003). Therefore, more studies of species richness across elevational gradients will improve understanding of species’ spatial distributions.

Little Barrier Island (Hauturu) is a 2817 hectare island in the Hauraki Gulf, New Zealand. Feral cats were eradicated in 1981 and rats are thought to have been successfully eradicated through a massive operation in 2004. Hauturu is therefore effectively a pest-free reserve with important implications for biodiversity conservation efforts.

A large-scale project is being conducted by the Allan Wilson Centre to investigate biodiversity richness on Hauturu along an altitudinal gradient, with an aim of establishing technologies to sample ecosystems along the gradient. The objective of this summer studentship is to establish methods for sampling ecosystems (focusing on invertebrates in this instance) and also to provide some characterisation of invertebrate biodiversity (composition and richness) for a larger-scale metagenomic study. This will be done by obtaining DNA sequences from invertebrates collected from sample plots across an elevational gradient in Hauturu. Two candidate gene regions that are frequently used in invertebrate systematic studies, COI and 28S, will be trialed for efficiency in species identification and in reconstructing phylogenies.

Methods

DNA extraction

Samples were collected from soil and leaf litter from different plots from Hauturu across an altitudinal gradient. DNA samples were obtained from colleagues at Landcare Research. To obtain DNA, each sample was dipped in extraction buffer and then extractions were automated through QIAxtractor (Qiagen) (performed by technicians at Landcare Research).

28 plates of DNA samples were obtained in total (BOXes I, II, VII, VIII, IX, X, XI, XII, XIII, XIV, XV and XVI; and LBI inv 0001, 0003, 0004/0005, 0006, 0007, 0008, 0009, 0010, 0011/0014, 0012, 0013, 0015/0016, 0017/0018/0019, 0020/0021/0022, 0022/0023 and 0024). BOX IX did not contain any DNA (checked by PCR and by visualising samples on a gel) and is excluded from any further analysis.

In this study, 28S regions were amplified and sequenced for the 27 plates excluding BOX IX. COI regions were amplified and sequenced for 9 plates (those labelled BOX VII through to BOX XVI, excluding BOX IX), as the remaining plates have previously been processed by Leah Tooman (Plant & Food Research).

PCR was performed on a further three plates of DNA samples, labelled as “3N, 2C, 1L”, “2C” and “3N”, but due to time constraints, they have yet to be sequenced. The statistics for PCR success rates of these three plates can be found in the Appendix.

PCR amplification

PCR mastermixes were prepared either on the general lab bench (for COI samples) or in a sterile flowhood reserved for PCR preparations (for 28S samples). PCR amplifications were performed using 50uL reactions of the following: 1x PCR buffer (20mM Tris-HCl (pH 8.4), 50mM KCl) (Invitrogen); 2.5mM MgCl2; 200uM dNTPs; 1.5U Platinum Taq (Invitrogen) and 10pM of each primer. 200mM Cresol Red (in a solution of 60% glycerol, 50mM Tris-HCl) was initially included in the PCR mastermix for use as a loading buffer when visualising DNA on a gel. However, trials of PCR reactions with and without the dye revealed that adding dye after the PCR reaction led to better PCR results. Therefore, Cresol Red was removed from the PCR mastermix and PCR products were instead visualised using a 1x loading buffer added after the PCR reaction.

5uL of each template DNA was added to each reaction. However, DNA concentration was variable across samples. For some samples, the amount of DNA added to the PCR reactions needed to be doubled and in some cases it was reduced to 0.25uL or even less to achieve a high quality PCR product of the correct length.

Primer pairs used to amplify the COI region were: HCO2198 TAAACTTCAGGGTGACCAAAAAATCA and LCO1490 GGTCAACAAATCATAAAGATATTGG (Folmer et al. 1994); and for the D2/D3 region of 28S: 500F ACTTTGAAGAGAGAGTTCAAGAG and 501R TCGGAAGGAACCAGCTACTA (Nadler et al. 2000). The expected product sizes were around 710 base pairs (bps) for the COI region (eg, Folmer et al. 1994) and 750-800 bps for the 28S region, based on sequences of the invertebrate D2/D3 region of 28S downloaded from GenBank.

Amplification of the COI region was carried out with a thermocycling profile of an initial 5 min at 94°C, followed by 30 cycles of 30 sec at 94°C, 30 sec at 48°C, 1 min at 72°C, and ending with a final extension time of 10 min at 72°C. For the 28S D2/D3 region the same protocol was used but with an annealing temperature of 54°C.
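The two thermocycling profiles differ only in annealing temperature, which is easier to see when they are written out as data. The following is a minimal sketch, not part of the original protocol; the class name and field names are assumptions made for illustration, and the computed time covers only the programmed hold steps (ramping between temperatures is ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermoProfile:
    """Hypothetical container for the profile described in the text."""
    initial_denature_s: int
    cycles: int
    denature_s: int
    anneal_s: int
    anneal_temp_c: int
    extend_s: int
    final_extend_s: int

    def programmed_seconds(self) -> int:
        # Sum of hold times only; ramp time between steps is not included.
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extend_s


# Values taken from the text: 5 min, 30 x (30 s, 30 s, 1 min), 10 min.
COI_PROFILE = ThermoProfile(300, 30, 30, 30, 48, 60, 600)
S28_PROFILE = ThermoProfile(300, 30, 30, 30, 54, 60, 600)

if __name__ == "__main__":
    for name, profile in (("COI", COI_PROFILE), ("28S", S28_PROFILE)):
        minutes = profile.programmed_seconds() / 60
        print(f"{name}: anneal at {profile.anneal_temp_c} C, {minutes:.0f} min of hold time")
```

Under these assumptions both profiles have the same programmed hold time, so the two loci could be scheduled on the same basis if a gradient or dual-block cycler were available.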

PCR products were visualised on 1.5% agarose gels before being cleaned for sequencing. Gel images were printed out and these were used for counts of amplified bands.

Sequencing

SAPEX (0.6U SAP, 12U ExoI, 2x ExoI buffer [134mM Glycine-KOH, 134mM MgCl2, 20mM 2-Mercaptoethanol]; New England Biolabs) was used to purify PCR products at 37°C for 1 hour, followed by deactivation at 85°C for 15 min.

Cleaned PCR products were sequenced by Macrogen Korea (Geumcheon-gu, Seoul, Korea) using an ABI3730XL (Applied Biosystems Inc., Foster City, USA). Sequencing was performed in both directions for COI using HCO2198 and LCO1490. For the 28S region, sequencing was carried out in one direction using the reverse primer 501R.

Sequences were checked and aligned using Geneious Pro v5.5 (Biomatters, Auckland, New Zealand). Alignments were then exported in FASTA format and sent to the bioinformatics team at the University of Auckland for further analysis. An alignment of all 493 COI sequences was performed using Geneious. Due to time and resource (specifically computer RAM) constraints, an alignment of all the 1275 28S sequences was not done. Some trial alignments of the 28S sequences in smaller subsets of about 200 sequences were performed instead.
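The subsetting of the 28S sequences into batches of about 200 could be scripted along the following lines. This is a sketch only: the function names are hypothetical, the FASTA parsing is a simple standard-library reader rather than the Geneious export itself, and the batch size of 200 is taken from the text.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records: List[Record], size: int = 200) -> Iterator[List[Record]]:
    """Yield consecutive batches of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def to_fasta(records: List[Record]) -> str:
    """Serialise records back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

With 1275 sequences and a batch size of 200, this would give six full batches and a final batch of 75. Consecutive batching is the simplest choice; grouping by plot or by taxon before batching may give more informative trial alignments.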

Table 1. Summary of the success rates of PCR and sequencing of the COI region.

a. PCR success rate reported is the percentage of the number of amplified bands of the expected size over the number of samples in each plate; b. Sequencing success rate reported is the total number of sequences of the expected size over the number of amplified bands per plate; c. The number of amplified bands is not reported for these plates because the gels were not of good enough quality to count bands accurately; d. Repeated PCRs reported here are the samples that appeared as a smear (ie, too much DNA) at the first PCR attempt and were repeated with about 30x less template (ie, using 1.5uL of a 1:10 dilution of the sample in place of the standard 5uL).

Table 2. Summary of the success rates of PCR and sequencing of the 28S region.

a. PCR success rate reported is the percentage of the number of amplified bands of the expected size over the number of samples in each plate; b. Sequencing success rate reported is the total number of sequences of the expected size over the number of amplified bands per plate; c. These plates produced a successful sequence for one sample each marked as “extracontrxx”. These samples do not contribute towards the number of good sequences reported here; d. This plate produced a successful sequence for one negative control. For this plate there were 7 other negative controls that did not produce any sequences. This sample does not contribute towards the number of good sequences reported here; e. The number of amplified bands is not reported for this plate because the gel was not of good enough quality to count bands accurately.

Results and discussion

COI

For the COI region, 493 out of 686 samples (72%) were successfully sequenced. The number of amplified bands of the expected size obtained after one PCR run and the subsequent number of successful sequences obtained are reported in Table 1. For the five gels (BOXes VII, X, XI, XIV, XVI) that were of good enough quality for bands to be counted accurately, 76% of the samples yielded amplified PCR products. Of these PCR products, 85% were then successfully sequenced.

COI sequences ranged from 479 to 762 bps in length. 208 sequences spanned the entire product between the two primers. Three sequences were below 500bps and seven sequences did not align with the others. The remaining sequences were shorter by up to about 100 bases; this truncation occurred mostly at the beginning of the sequence.

28S

For the 28S region, 1275 out of 2067 samples (62%) were successfully sequenced. The number of amplified bands of the expected size obtained after one PCR run and the subsequent number of successful sequences obtained are reported in Table 2. For 26 gels that were of good enough quality for bands to be counted accurately (all, excluding BOX I), 68% of the samples yielded amplified PCR products. 91% of these PCR products were then successfully sequenced.

28S sequences ranged from 263 to 1041 bps in length. Due to time and resource constraints, an alignment of the 1275 sequences was not done and so an exact figure for the number of sequences that span the desired region is not reported here. However, 1140 sequences consisted of 600bps or more, and 1247 sequences consisted of 400bps or more. Therefore I predict that the number of sequences spanning the entire product lies somewhere between these two figures. Trial alignments of smaller subsets of the data suggest that the final aligned 28S dataset will contain multiple gaps.
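Counts of this kind can be produced directly from a list of sequence lengths. The sketch below assumes the lengths have already been extracted (for example from a FASTA export); the function name and the short list of example lengths are hypothetical and are not data from this study. Only the two thresholds (400 and 600 bps) and the totals 1140, 1247 and 1275 come from the text.

```python
from typing import Iterable


def count_at_least(lengths: Iterable[int], minimum: int) -> int:
    """Number of sequences whose length is at least `minimum` bps."""
    return sum(1 for n in lengths if n >= minimum)


# Hypothetical lengths for illustration only (not study data).
example_lengths = [263, 410, 599, 600, 780, 1041]
long_enough = count_at_least(example_lengths, 600)
usable = count_at_least(example_lengths, 400)

# Share of the 1275 28S sequences at each threshold, from the reported counts.
share_600 = round(100 * 1140 / 1275, 1)
share_400 = round(100 * 1247 / 1275, 1)
```

Expressed as shares of the 1275 sequences, the two reported counts bracket the full-length sequences between roughly 89% and 98% of the dataset.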

For the 28S region, mispriming was frequently observed after PCR amplification. Out of a total of 2004 samples (excluding BOX I), there were 193 instances of mispriming (10%) (Graph 1). This was observed as multiple bands in the regions of either 600-800bps, 300bps, 100bps or a combination of the three. The shorter sequences obtained after sequencing are probably a direct result of these non-specific bands appearing after PCR. Accordingly, many of these sequences were messy at one or both ends. Some optimisation of the PCR protocol may help abate this; reducing PCR primer concentration by up to a quarter did help reduce the strength of these non-specific bands.

Graph 1. Summary of total PCR reactions per locus. Gels that were of good enough quality for bands to be counted are included; these are BOXes VII, X, XI, XIV and XVI for COI (n=435), and all plates, excluding BOX I, for 28S (n=2004).

Graph 2. Summary of total sequencing success per locus for BOXes VII, X, XI, XIV and XVI.

The 28S primers were designed to be universal primers across invertebrates (Folmer et al. 1994). They also show affinity for other organisms including fish and bacteria (see Blankenship & Yayanos 2005). Thus, it is not surprising that some issues with contamination were observed during the amplification of the 2067 samples in this study. Replacement of all PCR reagents successfully removed the contamination and extra precautions were taken when preparing PCR reagents for the 28S region. These included the use of a sterile flowhood intended only for PCR mastermix preparation, and maintaining small aliquots of all the reagents to reduce the risk of contamination. Only PCR reactions that had good negative controls were sent for sequencing (initially there was one negative control per 96-well plate; after the discovery of contamination issues this was increased to 2-8 negative controls per 96-well plate). Only plates that were confirmed to be contamination free were sent for sequencing. Despite this, one negative control sent for sequencing produced a sequence, as did three samples marked as “extracontr” during the DNA sampling process. The source of these sequences warrants further investigation. It is worth noting that the “extracontr240811” from BOX I also resulted in sequences for COI, suggesting that there may be DNA in this particular sample marked as a negative control.

DNA concentration and PCR primer specificity

Based on the results of this study, it appears that variable DNA concentration (or a lack of DNA) may be a factor in the failure to amplify the desired products during PCR. In addition to the 10% of samples that yielded no DNA for both regions, a further 4% of samples yielded too much DNA, causing a failed PCR reaction that appeared as a smear on the gels (Graph 1). As mentioned above, the success rate of PCR amplification for the 28S region was also decreased by mispriming of the primers (10% of samples).

351 out of the 686 samples (51%) yielded sequences for both the COI and 28S regions (Graph 2). 71 out of the 686 samples (10%) did not produce a sequence for either the 28S or COI region. A probable cause is a lack of DNA in these samples. 122 samples only yielded sequences for 28S (18%) and 142 only for COI (21%). The high occurrence of samples that yielded sequences for only one of the two regions (264 in total; 38%) suggests that successful amplification of the two regions may also vary according to sample, perhaps an indication of decreased primer specificity for some taxa.
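The four categories above follow from two sets of sample IDs (those with a good COI sequence and those with a good 28S sequence). The sketch below shows the tally under the assumption that such ID sets are available; the function names are hypothetical, and the integer IDs are constructed purely so that the set sizes match the counts reported in the text.

```python
from typing import Dict, Set


def tally_by_locus(all_samples: Set[int], coi_ok: Set[int], s28_ok: Set[int]) -> Dict[str, int]:
    """Count samples sequenced for both loci, one locus only, or neither."""
    return {
        "both": len(coi_ok & s28_ok),
        "coi_only": len(coi_ok - s28_ok),
        "28s_only": len(s28_ok - coi_ok),
        "neither": len(all_samples - coi_ok - s28_ok),
    }


def percent(part: int, total: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / total)


# Hypothetical IDs arranged to reproduce the reported counts (not real sample IDs).
all_samples = set(range(686))
coi_ok = set(range(0, 493))     # 493 samples with a COI sequence
s28_ok = set(range(142, 615))   # 473 of these samples with a 28S sequence

counts = tally_by_locus(all_samples, coi_ok, s28_ok)

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name}: {n} ({percent(n, len(all_samples))}%)")
```

The four categories sum to 686, and the COI total (both plus COI only) gives the 493 sequences reported for that region.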

A previous investigation into the cause of PCR failures of the COI region (work done in November 2011) is also relevant here. For samples that amplified faint or very faint products, repeating the PCR with more DNA (1.5x or 2x more) led to PCR products 30% of the time. Where increasing the amount of initial template did not lead to amplified products, reducing it sometimes did (10x less sample was successful 11% of the time, and 2.5x less sample 44% of the time). For samples that smeared, reducing the initial template 20 times worked well for all samples, with one exception that required even less DNA (33.33 times less).
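The fold reductions quoted here can be related to pipetted volumes as follows. This is a minimal sketch assuming the standard 5uL of template per reaction given in the Methods; the function names are hypothetical.

```python
STANDARD_TEMPLATE_UL = 5.0  # standard template volume per reaction (from Methods)


def effective_template_ul(volume_ul: float, dilution_factor: float = 1.0) -> float:
    """Volume of undiluted extract actually added to the reaction."""
    return volume_ul / dilution_factor


def fold_reduction(volume_ul: float, dilution_factor: float = 1.0) -> float:
    """How many times less template than the standard volume."""
    return STANDARD_TEMPLATE_UL / effective_template_ul(volume_ul, dilution_factor)


def volume_for_fold(fold: float) -> float:
    """Undiluted volume (uL) that gives the requested fold reduction."""
    return STANDARD_TEMPLATE_UL / fold


if __name__ == "__main__":
    # 1.5 uL of a 1:10 dilution is equivalent to 0.15 uL of undiluted extract.
    print(f"{fold_reduction(1.5, 10):.2f}x less template")
    # A 20x reduction corresponds to 0.25 uL of undiluted extract.
    print(f"{volume_for_fold(20):.2f} uL undiluted")
```

On this basis, 1.5uL of a 1:10 dilution corresponds to the 33.33-fold reduction, and a 20-fold reduction corresponds to the 0.25uL volume mentioned in the Methods. Working from a dilution is likely to be more reproducible than pipetting sub-microlitre volumes directly.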

Conclusion

For a large-scale study, efficient standardised protocols and near-automated processes for taking the samples from collection through to sequencing will be essential for managing large numbers of samples. Universal primers are also needed in order to capture the large diversity of taxa.

From this study, it was found that the amplification of DNA through to sequencing was successful using standardised protocols for half (51%) of the samples. Failure to generate the desired product for the rest of the samples can be accounted for by variable DNA concentrations in the starting sample (10% lack of DNA, 4% too much DNA) and non-specificity of universal primers (38%).

Breaking these figures down, amplification of DNA during PCR was successful for 87% and 68% of the COI and 28S samples, respectively (Tables 1 & 2). Subsequently, 85% (COI) and 91% (28S) of these PCR products were successfully sequenced.

The protocols reported here have the potential to be extended to a much larger study; however, the results presented here indicate that some optimisation of the protocols and, if possible, standardisation of DNA samples will help improve success rates.

References

Andrew NR, Rodgerson L & Dunlop M (2003). Variation in invertebrate-bryophyte community structure at different spatial scales along altitudinal gradients. Journal of Biogeography 30: 731-746.

Blankenship LE & Yayanos AA (2005). Universal primers and PCR of gut contents to study marine invertebrate diets. Molecular Ecology 14: 891-899.

Fisher BL (1999). Improving Inventory Efficiency: A Case Study of Leaf-Litter Ant Diversity in Madagascar. Ecological Applications 9(2): 714-731.

Folmer O, Black M, Hoeh W, Lutz R & Vrijenhoek R (1994). DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology 3(5): 294-299.

Nadler SA, Adams BJ, Lyons ET, DeLong RL & Melin SR (2000). Molecular and morphometric evidence for separate species of Uncinaria (Nematoda: Ancylostomatidae) in California sea lions and northern fur seals: hypothesis testing supplants verification. The Journal of Parasitology 86(5): 1099-1106.

Sanders NJ (2002). Elevational gradients in ant species richness: area, geometry, and Rapoport’s rule. Ecography 25: 25-32.

Sanders NJ, Moss J & Wagner D (2003). Patterns of ant species richness along elevational gradients in an arid ecosystem. Global Ecology & Biogeography 12: 93-102.

Additional notes

BOX IX does not contain DNA/contains contaminants in the buffer. Does not show up as DNA when run on gel, and does not amplify any DNA products. Smells different from the buffers from the other plates.
BOX XIII, H4 has buffer in it and not labelled on spreadsheet – but no PCR products amplified. Possibly an empty well with buffer?
On plate VII, 2 samples labelled NZAC03012286 (F8 and A9). I have relabelled A9 to NZAC03012286_2.
In BOX VII there were some samples marked with EP21* non-destr (A10 – G10) – these did not work well for COI amplification; only 1 out of 7 produced a PCR band and was successfully sequenced. The samples marked as EP21* destr produced PCR bands and yielded good sequences for all 9 out of 9 samples.
The success of retrieving sequences for these samples was different for the 28S region: for the EP21* non-destr samples, 4 out of 7 produced a PCR band and were successfully sequenced. For the EP21* destr samples, 6 out of the 9 samples produced PCR bands and were subsequently sequenced.
For plate LBI inv 0024, column 6 has got 5 DNA samples (ie, A6-E6) – only A6-C6 is labelled in the spreadsheet. One of the two unlabelled samples produced a band that was not subsequently sequenced.
Sequencing of extracontr240811 (B9) from BOX I yields a sequence for both 28S and COI. Blasting the sequences against GenBank produced hits for either a beetle or a fungus.
A lot of smears were present when running a PCR for BOX XIV. (Does this equate to more DNA in the plot, ie, bigger insects?)
New box (Plot 2C) – B7 and B12 had too much sample in tube (~50x more). These did not yield amplified PCR products.

Appendix

Appendix 1.1. Summary of the success rates of PCR for boxes labelled “3N, 2C, 1L”, “2C” and “1L”.