Supplementary Material for:

How host heterogeneity governs tuberculosis reinfection

M. Gabriela M. Gomes, Ricardo Águas, João Lopes, Marta C. Nunes, Carlota Rebelo, Paula Rodrigues, Claudio J. Struchiner

Parameter Estimation

I. Using a Markov Chain Monte Carlo approach

Preliminary results showed a strong correlation between the reinfection factor and the rate of endogenous reactivation (Figure S1), which led to estimates of these parameters with wide credible intervals.

Figure S1. Plot of the traces of the reinfection factor and the rate of endogenous reactivation against each other, from an MCMC run assuming the homogeneous model with uninformative prior distributions.

To overcome this problem, we used an MCMC method with an informative prior for the rate of endogenous reactivation, so that the estimation process was constrained towards lower values of this parameter, in accordance with the literature (e.g. [1,2]). For the remaining parameters we chose uninformative (i.e. uniform) prior distributions, as described in Table S1.

Table S1. Prior distributions used in the MCMC runs.

Parameter / Distribution
 / Uniform(0,1)
 / Uniform(0,1)
 / Uniform(0,1)
 / Exponential(1/0.0025)
Standard error of estimate (SEE) / Uniform(0,1)

We implemented an MCMC algorithm using the package pymc [3]. We used a single Metropolis-Hastings algorithm with a burn-in period of 10,000 iterations followed by 40,000 iterations. The recording interval was set to 10 so that the autocorrelation between samples was negligible. We ran two replicates for each model in order to verify convergence of the chains. The modes of the posterior distributions and their 95% credible intervals were calculated using the locfit package [4]. Our results are presented in Table S2.
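The sampling schedule described above (10,000 burn-in iterations, 40,000 further iterations, every 10th draw recorded) can be illustrated with a minimal random-walk Metropolis-Hastings sketch in plain Python. This is not the pymc code used for the paper. The Gaussian pseudo-likelihood, its values (obs, sd), the proposal step and the histogram-based mode (a simple stand-in for the locfit density estimate) are all hypothetical choices made for illustration. The prior Exponential(1/0.0025) from Table S1 is read here as an exponential with rate 1/0.0025, i.e. mean 0.0025, which is an assumption about the parameterization.

```python
import math
import random

PRIOR_RATE = 1 / 0.0025  # assumed reading of Table S1: exponential rate (mean 0.0025)

def log_prior(w):
    """Log-density of the exponential prior; -inf outside its support."""
    return math.log(PRIOR_RATE) - PRIOR_RATE * w if w >= 0 else -math.inf

def log_likelihood(w, obs=0.01, sd=0.004):
    """HYPOTHETICAL Gaussian pseudo-likelihood; not the model fitted in the paper."""
    return -0.5 * ((w - obs) / sd) ** 2

def metropolis_hastings(n_burn=10_000, n_iter=40_000, thin=10, step=0.004, seed=1):
    """Random-walk Metropolis-Hastings; returns the thinned post-burn-in draws."""
    rng = random.Random(seed)
    w = 0.0025
    lp = log_prior(w) + log_likelihood(w)
    kept = []
    for i in range(n_burn + n_iter):
        prop = w + rng.gauss(0.0, step)              # symmetric proposal
        lp_prop = log_prior(prop) + log_likelihood(prop)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            w, lp = prop, lp_prop
        if i >= n_burn and (i - n_burn) % thin == 0:  # recording interval
            kept.append(w)
    return kept

def summarize(samples, bins=40):
    """Histogram mode and central 95% interval of the retained draws."""
    s = sorted(samples)
    lo, hi = s[int(0.025 * len(s))], s[int(0.975 * len(s)) - 1]
    width = (s[-1] - s[0]) / bins
    counts = [0] * bins
    for v in s:
        counts[min(int((v - s[0]) / width), bins - 1)] += 1
    k = counts.index(max(counts))
    return s[0] + (k + 0.5) * width, (lo, hi)

# Two replicates with different seeds, mirroring the convergence check.
chains = [metropolis_hastings(seed=s) for s in (1, 2)]
summaries = [summarize(c) for c in chains]
```

With this schedule each chain retains 4,000 draws. Comparing the two entries of summaries is a crude version of the replicate check; a formal convergence diagnostic would be a natural extension.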

Table S2. Estimated parameters using MCMC for both the homogeneous and heterogeneous models. The 95% credible intervals are presented in square brackets.

Parameter / Heterogeneous model (replicate 1; replicate 2) / Homogeneous model (replicate 1; replicate 2)
 / 0.98 [0.67, 1.00]; 0.98 [0.67, 1.00] / NA; NA
 / 0.35 [0.13, 1.00]; 0.38 [0.12, 1.00] / NA; NA
 / 0.85 [0.00, 7.99]; 0.74 [0.00, 8.09] / 1.71 [0.03, 8.83]; 1.74 [0.02, 8.80]
 / 0.01 [0.00, 0.02]; 0.01 [0.00, 0.02] / 0.01 [0.00, 0.02]; 0.23 [0.12, 0.39]
SEE / 0.23 [0.12, 0.39]; 0.23 [0.12, 0.40] / 0.25 [0.17, 0.42]; 0.25 [0.17, 0.42]

These results suggest a value for the rate of endogenous reactivation higher than what is commonly considered in the literature. In fact, even with a constrained prior distribution with 95% density between 0.0007 and 0.0035, the posterior distribution has a skewed peak at around 0.01 for both the heterogeneous and homogeneous models (Figure S2).

Although a value of 0.01 for the rate of endogenous reactivation is higher than previously stated for European countries [1,2], recent studies in African [5] and Asian [6,7] countries suggest values that are compatible with our estimates. This may be due to a higher prevalence of HIV in those settings, or may simply reflect regional differences in nutrition, smoking patterns, environmental conditions, population structure or the natural history of tuberculosis [6,7,9,10].

Figure S2. Estimation of the rate of endogenous reactivation: prior distribution in full line; posterior distributions for both replicates of the homogeneous model in dotted lines; posterior distributions for both replicates of the heterogeneous model in dashed lines.

Regarding estimates of the reinfection factor, the MCMC results support different immunity mechanisms depending on the model assumed. In particular, under the heterogeneous model the estimates are well below 1 (i.e. infection reduces susceptibility), whereas under the homogeneous model the estimates are well above 1 (i.e. infection increases susceptibility). Nevertheless, these estimates have unsatisfactorily wide credible intervals (Figure S3). This lack of precision may be due to high stochasticity in the estimation procedure, resulting from trying to estimate the reinfection factor and the rate of endogenous reactivation simultaneously from a small dataset with potentially large experimental errors.

Figure S3. Estimation of the reinfection factor: prior distribution in full line; posterior distributions for both replicates of the homogeneous model in dotted lines; posterior distributions for both replicates of the heterogeneous model in dashed lines.

To decrease the uncertainty in the estimates of the reinfection factor, we followed up the MCMC approach with a Gauss-Newton algorithm based on a least-squares minimization criterion, with the rate of endogenous reactivation fixed. We chose the Gauss-Newton algorithm because it is a standard method for solving non-linear regression problems.

II. Using a Gauss-Newton algorithm (fixed rate of endogenous reactivation)

We implemented a Gauss-Newton algorithm using the nls function provided in the R package stats [11]. We used a tolerance level for the relative offset convergence criterion of 10^-4 and a minimum step-size factor of 10^-12. To calculate confidence intervals we chose a bootstrap approach, since this method does not rely on a linear approximation [12]. This was done using the nlsBoot function from the R package nlstools [11]. We used an F test and a log-likelihood test to assess whether the heterogeneous model provides a significantly better fit to the data. Table S3 summarizes the results obtained.
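As a rough illustration of the fitting procedure described above, the following Python sketch combines a Gauss-Newton iteration with step halving and a bootstrap of mean-centred residuals for percentile confidence intervals. It is not the R code used for the paper: the toy curve, its parameters (a, b), the simulated data and all function names are hypothetical, and the stopping rule is only in the spirit of the relative offset criterion rather than a reproduction of the nls internals. The tolerance and minimum step factor reuse the values quoted in the text.

```python
import numpy as np

def model(theta, x):
    """Toy curve y = a * (1 - exp(-b * x)); a stand-in, not the TB model."""
    a, b = theta
    return a * (1.0 - np.exp(-b * x))

def jacobian(theta, x):
    """Partial derivatives of the toy curve with respect to a and b."""
    a, b = theta
    e = np.exp(-b * x)
    return np.column_stack([1.0 - e, a * x * e])

def gauss_newton(x, y, theta0, tol=1e-4, min_factor=1e-12, max_iter=100):
    """Least-squares fit by Gauss-Newton with step halving."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = y - model(theta, x)
        rss = r @ r
        Q, R = np.linalg.qr(jacobian(theta, x))  # reduced QR of the Jacobian
        qtr = Q.T @ r                            # residual part in the tangent plane
        orth = max(rss - qtr @ qtr, 1e-300)      # residual part orthogonal to it
        if np.sqrt((qtr @ qtr) / orth) < tol:    # offset-style convergence check
            return theta
        step = np.linalg.solve(R, qtr)           # Gauss-Newton increment
        factor = 1.0
        while factor >= min_factor:              # halve until the RSS decreases
            cand = theta + factor * step
            rc = y - model(cand, x)
            if rc @ rc < rss:
                theta = cand
                break
            factor /= 2.0
        else:
            raise RuntimeError("step factor fell below min_factor")
    raise RuntimeError("no convergence within max_iter")

def residual_bootstrap(x, y, theta_hat, n_boot=500, seed=0):
    """Percentile 95% intervals from refits to resampled centred residuals."""
    rng = np.random.default_rng(seed)
    fitted = model(theta_hat, x)
    resid = y - fitted
    resid = resid - resid.mean()
    draws = []
    for _ in range(n_boot):
        y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        try:
            draws.append(gauss_newton(x, y_star, theta_hat))
        except RuntimeError:
            continue                             # skip non-converging resamples
    return np.percentile(np.array(draws), [2.5, 97.5], axis=0)

# Simulated data, for illustration only.
rng = np.random.default_rng(42)
x = np.linspace(0.5, 10.0, 14)
y = model((1.0, 0.5), x) + rng.normal(0.0, 0.03, size=x.size)
theta_hat = gauss_newton(x, y, theta0=(0.8, 0.3))
ci = residual_bootstrap(x, y, theta_hat)         # rows: lower and upper bounds
```

Because the intervals are read off the empirical distribution of refitted parameters, they can be asymmetric around the estimate, which is what a linear approximation would not capture.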

Table S3. Estimated parameters ().

Parameter / Heterogeneous model / Homogeneous model
 / 0.98 [0.95, 1.00] / NA
 / 0.15 [0.00, 0.56] / NA
 / 0.51 [0.00, 2.37] / 3.87 [1.61, 7.79]
RSS / 0.30 / 0.74
SEE / 0.16 / 0.24
F test / 8.12 (0.007) / NA
Log-likelihood test / 12.70 (0.002) / NA
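The two test statistics in Table S3 can be approximately recomputed from the reported residual sums of squares. The sketch below assumes Gaussian errors, three free parameters in the heterogeneous model and one in the homogeneous model (as the rows of Table S3 suggest), and 14 data points. The number of data points is not stated in this supplement; 14 is an assumption chosen because it is consistent with the reported SEE values. The closed-form tail probabilities used here hold only when the models differ by exactly two parameters.

```python
import math

N_POINTS = 14                  # ASSUMPTION: sample size is not given in the text
P_HET, P_HOM = 3, 1            # free parameters per model, read off Table S3
RSS_HET, RSS_HOM = 0.30, 0.74  # rounded values from Table S3

def f_test(rss_small, rss_big, p_small, p_big, n):
    """Extra-sum-of-squares F test for nested least-squares models."""
    dp, df_big = p_big - p_small, n - p_big
    if dp != 2:
        raise ValueError("closed-form p-value below requires dp == 2")
    f = ((rss_small - rss_big) / dp) / (rss_big / df_big)
    # Survival function of F(2, df): (1 + 2 f / df) ** (-df / 2)
    return f, (1.0 + 2.0 * f / df_big) ** (-df_big / 2.0)

def likelihood_ratio_test(rss_small, rss_big, p_small, p_big, n):
    """Likelihood-ratio test with the Gaussian error variance profiled out."""
    if p_big - p_small != 2:
        raise ValueError("closed-form p-value below requires dp == 2")
    lr = n * math.log(rss_small / rss_big)
    # Survival function of chi-square with 2 degrees of freedom: exp(-x / 2)
    return lr, math.exp(-lr / 2.0)

f_stat, f_p = f_test(RSS_HOM, RSS_HET, P_HOM, P_HET, N_POINTS)
lr_stat, lr_p = likelihood_ratio_test(RSS_HOM, RSS_HET, P_HOM, P_HET, N_POINTS)
see_het = math.sqrt(RSS_HET / (N_POINTS - P_HET))  # standard error of estimate
see_hom = math.sqrt(RSS_HOM / (N_POINTS - P_HOM))
```

Under these assumptions the sketch gives F of about 8.07 (p about 0.007) and a likelihood-ratio statistic of about 12.64 (p about 0.002), close to the tabulated 8.12 and 12.70; the small differences are what one would expect from the rounding of the RSS values.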

Since we fixed the value of the rate of endogenous reactivation, we back up our results with a sensitivity analysis on this parameter, considering a set of values between 0.0075 and 0.0125. The results are presented in Figure S4.

Figure S4. Sensitivity analysis on the fixed rate of endogenous reactivation regarding the estimation of the reinfection factor and its 95% confidence intervals: analysis assuming a homogeneous model in dotted line; analysis assuming a heterogeneous model in dashed line.

In accordance with the MCMC estimates of the reinfection factor and the rate of endogenous reactivation obtained with uninformative priors (Figure S1), we observe a strong correlation between these two parameters in the sensitivity analysis. Indeed, as the value considered for the rate of endogenous reactivation increases, the estimate of the reinfection factor also increases. This analysis also shows that the width of the confidence interval for the reinfection factor increases as the estimated value increases.

Furthermore, the sensitivity analysis shows that our estimates of the reinfection factor are robust for both the homogeneous and heterogeneous models. Within the considered range for the rate of endogenous reactivation, the reinfection factor is consistently larger than 1 when assuming a homogeneous model, whereas it is consistently lower than 1 when assuming a heterogeneous model.

References

  1. Vynnycky, E. & Fine, P.E.M. 1997 The natural history of tuberculosis: the implications of age-dependent risks of disease and the role of reinfection. Epidemiol. Infect. 119, 183-201.
  2. Tocque, K., Bellis, M.A., Tam, C.M., Chan, S.L., Syed, Q., Remmington, T. & Davies, P.D.O. 1998 Long-term trends in tuberculosis: comparison of age-cohort data between Hong Kong and England and Wales. AJRCCM 158, 484-488.
  3. Patil, A., Huard, D. & Fonnesbeck, C.J. 2010 PyMC: Bayesian stochastic modelling in Python. J. Stat. Softw. 35, 1-81.
  4. Loader, C. 1999 Local Regression and Likelihood. Springer, New York.
  5. Mills, H.L., Cohen, T. & Colijn, C. 2011 Modelling the performance of isoniazid preventive therapy for reducing tuberculosis in HIV endemic settings: the effects of network structure. J. R. Soc. Interface 8, 1510-1520.
  6. Vynnycky, E., Borgdorff, N.W., Leung, C.C., Tam, C.M. & Fine, P.E.M. 2008 Limited impact of tuberculosis control in Hong Kong: attributable to high risks of reactivation disease. Epidemiol. Infect. 136, 943-952.
  7. Peng, W., Lau, E.H.Y., Cowling, B.J., Leung, C.C., Tam, C.M. & Leung, G.M. 2011 The transmission dynamics of tuberculosis in a recently developed Chinese city. PLoS One 5, e10468.
  8. Lonnroth, K., Jaramillo, E., Williams, B.G., Dye, C. & Raviglione, M. 2009 Drivers of tuberculosis epidemics: the role of risk factors and social determinants. Soc. Sci. Med. 68, 2240-2246.
  9. Brooks-Pollock, E., Cohen, T. & Murray, M. 2010 The impact of realistic age structure in simple models of tuberculosis transmission. PLoS One 5, e8479.
  10. Trauer, J.M. & Krause, V.L. 2011 Assessment and management of latent tuberculosis infection in a refugee population in the Northern Territory. Med. J. Aust. 194, 579-582.
  11. Bates, D.M. & Watts, D.G. 1988 Nonlinear Regression Analysis and Its Applications. Wiley.
  12. Ritz, C. & Streibig, J.C. 2008 Nonlinear Regression with R. Springer Verlag.
