Inference with Viral Quasispecies Diversity Indices

Josep Gregori, Miquel Salicrú, Esteban Domingo, Alex Sanchez, Francisco Rodríguez-Frías, Josep Quer

SUPPLEMENTARY MATERIAL

Sensitivity to filtering levels

To assess the sensitivity of this method to small changes in its parameters, we explored the results obtained when filtering the haplotypes at abundance levels of 0.2% and 1%, and when fringe trimming at 80%, 90% and 99% confidence.
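
As an illustration of the abundance filtering step, the following minimal Python sketch drops haplotypes below a given relative abundance and recomputes the Shannon entropy S on what remains. The haplotype names and read counts are invented for the example, the helper names `filter_by_abundance` and `shannon_entropy` are hypothetical, and keeping haplotypes at or above the threshold (rather than strictly above it) is an assumption. Fringe trimming, Sn and Mf are not covered here.

```python
import math

def filter_by_abundance(counts, min_fraction):
    """Keep haplotypes whose relative abundance is at least min_fraction.

    Assumption: the threshold is inclusive (>=); the text only says
    haplotypes are filtered "at" or "above" a given abundance level.
    """
    total = sum(counts.values())
    return {h: n for h, n in counts.items() if n / total >= min_fraction}

def shannon_entropy(counts):
    """Plug-in Shannon entropy (natural log) of a haplotype count table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)

# Invented read counts for one sample of 1000 reads (illustration only).
counts = {"hapA": 700, "hapB": 250, "hapC": 45, "hapD": 4, "hapE": 1}

# Compare the two filtering levels explored in this section: 0.2% and 1%.
for threshold in (0.002, 0.01):
    kept = filter_by_abundance(counts, threshold)
    print(threshold, sorted(kept), round(shannon_entropy(kept), 4))
```

In this toy sample the stricter 1% filter removes more of the low-abundance haplotypes than the 0.2% filter, so the entropy computed on the remaining haplotypes is lower.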

When filtering deeper into the noise level, at 0.2%, the differential bias is exacerbated for S and Sn, and a differential bias is introduced in Mf. Under these circumstances, fringe trimming alleviates both the absolute and the differential bias of S, Sn and Mf, but does not cancel them completely (Figures 1 to 3).

On the other hand, when filtering well above the noise level, at 1%, the absolute and the differential bias are rather limited, and the fringe trimming strategy alone is able to compensate for the differential bias, even at lower confidence levels (Figures 4 and 5).

In all cases, rarefaction of the big sample to the small sample size provides the least biased comparisons, although at the cost of a magnified absolute bias (Figures 6 and 7).
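
The rarefaction idea can be sketched as follows, assuming it is carried out by repeated random subsampling without replacement; the text does not say whether a resampling or an analytic procedure was used, so this is only one possible reading. The read counts and the helper names `rarefy` and `rarefied_entropy` are hypothetical.

```python
import math
import random
from collections import Counter

def shannon_entropy(counts):
    """Plug-in Shannon entropy (natural log) of a haplotype count table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)

def rarefy(counts, size, rng):
    """Draw `size` reads without replacement from a haplotype count table."""
    reads = [h for h, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, size))

def rarefied_entropy(counts, size, repeats=200, seed=0):
    """Average entropy over repeated subsamples of `size` reads.

    Assumption: rarefaction by resampling; `repeats` and `seed` are
    arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    values = [shannon_entropy(rarefy(counts, size, rng)) for _ in range(repeats)]
    return sum(values) / len(values)

# Invented big sample of 1000 reads, rarefied to the small sample size of 400.
big_sample = {"hapA": 700, "hapB": 250, "hapC": 45, "hapD": 4, "hapE": 1}
print("S on all 1000 reads:   ", shannon_entropy(big_sample))
print("S rarefied to 400 reads:", rarefied_entropy(big_sample, 400))
```

Bringing the big sample down to the small sample size puts both samples on the same footing for comparison, but rare haplotypes are frequently lost in the subsample, which is consistent with the larger absolute bias noted above.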

Figure 1 - Distribution of S, Sn and Mf values by sample size and reference population, as estimated on the filtered haplotypes above 0.2% abundance. A large differential bias is observed for all three diversity indices in the three reference populations.

Figure 2 - Distribution of S, Sn and Mf values for samples of size 400 and 1000 reads from the same population, after filtering haplotypes at 0.2% abundance, and fringe trimming at 90% confidence. A differential bias is still observed.

Figure 3 - Distribution of S, Sn and Mf values for samples of size 400 and 1000 reads from the same population, after filtering haplotypes at 0.2% abundance, and fringe trimming at 99% confidence. This provides a less biased comparison.

Figure 4 - Distribution of S, Sn and Mf values by sample size and reference population, as estimated on the filtered haplotypes above 1% abundance. Some differential bias is still observed in a few cases.

Figure 5 - Distribution of S, Sn and Mf values for samples of size 400 and 1000 reads from the same population, after filtering haplotypes at 1% abundance, and fringe trimming at 80% confidence. In this case, trimming at a lower confidence level already provides good results.

Figure 6 - Distribution of S and Sn values for samples of size 400 and 1000 reads from the same population, after filtering haplotypes at 0.2% abundance, and computing the diversity indices of the big sample by rarefaction to the small sample size. This provides the least biased comparison for S and Sn.

Figure 7 - Distribution of S and Sn values for samples of size 400 and 1000 reads from the same population, after filtering haplotypes at 0.5% abundance, and computing the diversity indices of the big sample by rarefaction to the small sample size. This provides the least biased comparison for S and Sn.