Supplemental Materials: Data-Driven Medication Dosing

A Data-Driven Approach to Optimized Medication Dosing: A Focus on Heparin

Authors:

Mohammad M. Ghassemi1, MPhil, <>

Stefan E. Richter2, MD, <>

Ifeoma M. Eche3, PharmD, <>

Tszyi W. Chen3, PharmD, <>

John Danziger3*, MD, <>

Leo A. Celi1,3*, MD, <>
Affiliations:

  1. Massachusetts Institute of Technology, Cambridge, MA
  2. Tufts Medical Center, Boston, MA
  3. Beth Israel Deaconess Medical Center, Boston, MA

* Please note that LAC and JD should be considered co-senior authors

Appendix: Additional Analysis for Validation of Proposed Approach and BIDMC Guidelines.

Section 1: Choice of Logistic Regression Approach

Ordinal logistic regression is a special form of multinomial logistic regression commonly used to model categorical outcomes with a meaningful order. Beyond the assumption that outcomes can be ordered meaningfully, ordinal regression also assumes proportional odds. Under this assumption the regression coefficients are shared across classes, with only the intercept terms of the models changing. That is, it assumes that explanatory features have identical effects across the range of the outcome. On the surface, modeling therapeutic aPTT categories would seem to lend itself nicely to this kind of approach. However, we did not feel that the assumption of proportional odds was valid here. Specifically, the constant regression coefficients enforce zero skew on the P(therapeutic) function, and thereby limit the flexibility of the approach.

Standard multinomial logistic regression (SMLR) does not make the assumption of proportional odds. In SMLR, each model may have its own intercept and regression coefficients, allowing greater flexibility in model form. In computing the coefficients for standard multinomial logistic regression, however, the categorical outcome is dummy coded: one category is selected as a reference, and the effects of the features on the probability of each class are computed with respect to that reference category.

Our approach is similar to SMLR with the exception that we utilize a flexible reference category. More specifically, the reference we use is not a single class, but all classes whose binary probability of outcome we are not modeling. For example, when we model the probability of sub-therapeutic aPTT, we reference the therapeutic and supra-therapeutic aPTT categories, whereas the SMLR approach would reference either therapeutic or supra-therapeutic aPTT alone. We found in our validation of the method that this simple difference in the choice of reference category can have a drastic effect on predictive performance, as we will now illustrate.
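
To make the distinction concrete, the following is a minimal sketch of the two formulations in Python. The use of scikit-learn, the synthetic single-feature data, and the 0/1/2 class coding are illustrative assumptions, not the features or tooling of our actual analysis.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic data: one explanatory feature (dose) and outcomes coded as
    # 0 = sub-therapeutic, 1 = therapeutic, 2 = supra-therapeutic.
    rng = np.random.default_rng(0)
    dose = rng.uniform(0, 10, size=(2000, 1))
    y = np.digitize(dose.ravel() + rng.normal(0, 1.5, size=2000), bins=[4.0, 6.0])

    # Standard multinomial logistic regression: one joint model in which every
    # non-reference class is contrasted against a single reference class.
    smlr = LogisticRegression(max_iter=1000).fit(dose, y)  # multinomial for 3 classes

    # Our approach: a separate binary model per class, each referenced against
    # *all* remaining classes, so every class has its own intercept and slope.
    per_class = {k: LogisticRegression(max_iter=1000).fit(dose, (y == k).astype(int))
                 for k in (0, 1, 2)}

    grid = np.linspace(0, 10, 41).reshape(-1, 1)
    p_ther_flexible = per_class[1].predict_proba(grid)[:, 1]  # P(therapeutic | dose)
    p_ther_smlr = smlr.predict_proba(grid)[:, 1]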

Section 2: Validation of Logistic Method

To compare the performance of our approach against standard multinomial and ordinal logistic regression, we performed several validation experiments. In these experiments we generated a model of therapeutic outcome using a multinomial distribution that described the therapeutic state of patients as a function of dose. The model generated the categorical outputs Sub-therapeutic, Therapeutic, and Supra-therapeutic according to the probability functions sub(dose), ther(dose), and supra(dose), where:

sub(dose) + ther(dose) + supra(dose) = 1, for any dose.

In general, we wish to select a value of dose that maximizes the probability of the model producing a therapeutic outcome. To do this, we must first estimate sub(dose), ther(dose), and supra(dose) from a set of observations. Because we know the ground truth in our simulations, we can compare the sum of squared errors between the estimated and true sub(dose), ther(dose), and supra(dose) curves produced by the various methods as a means of comparing their overall efficacy.
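
In code, once an estimate of ther(dose) is available, the dose-selection step is simply an argmax over a grid of candidate doses. The sketch below assumes the 0-10 grid used in our simulations; ther_hat is a placeholder for whatever estimate a given method produces.

    import numpy as np

    dose_grid = np.arange(0, 10.25, 0.25)

    def best_dose(ther_hat):
        """Return the grid dose that maximizes the estimated P(therapeutic)."""
        # ther_hat: estimated therapeutic probabilities, one per grid dose.
        return dose_grid[int(np.argmax(ther_hat))]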

Hence, we randomly generated regression parameters for 20 pairs of sub(dose) and supra(dose) functions, which we modeled as sigmoids. The ther(dose) function was modeled as 1 - [sub(dose) + supra(dose)]. For each value of dose from 0 to 10, in increments of 0.25, we generated 1000 outcome data points. These data were then provided to each of the methods, which attempted to reconstruct sub(dose), ther(dose), and supra(dose). We found that in 19 of the 20 simulations, our proposed method exhibited a smaller sum of squared errors than the ordinal or standard multinomial approaches.
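
A minimal sketch of one simulation run is given below. The sigmoid parameter ranges and the renormalization guard are conveniences of the sketch rather than the exact values used in our experiments.

    import numpy as np

    rng = np.random.default_rng(42)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Randomly parameterized ground truth: sub(dose) falls and supra(dose) rises
    # sigmoidally; ther(dose) is the remaining probability mass.
    a_sub, b_sub = rng.uniform(0.5, 2.0), rng.uniform(1.5, 3.5)
    a_sup, b_sup = rng.uniform(0.5, 2.0), rng.uniform(6.5, 8.5)
    sub   = lambda d: sigmoid(-a_sub * (d - b_sub))
    supra = lambda d: sigmoid( a_sup * (d - b_sup))
    ther  = lambda d: 1.0 - sub(d) - supra(d)

    # 1000 categorical outcomes (0 = sub, 1 = ther, 2 = supra) at each dose
    # from 0 to 10 in increments of 0.25.
    dose_grid = np.arange(0, 10.25, 0.25)
    doses, outcomes = [], []
    for d in dose_grid:
        p = np.array([sub(d), ther(d), supra(d)])
        p = np.clip(p, 0.0, None); p = p / p.sum()   # guard against tiny negatives
        outcomes.append(rng.choice(3, size=1000, p=p))
        doses.append(np.full(1000, d))
    doses, outcomes = np.concatenate(doses), np.concatenate(outcomes)

    # Each method estimates sub/ther/supra from (doses, outcomes); the sum of
    # squared errors against the known curves on the grid is the comparison metric.
    def sse(estimated_curve, true_curve, grid=dose_grid):
        return float(np.sum((estimated_curve(grid) - true_curve(grid)) ** 2))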

In Figures 5 and 6 we illustrate two examples of the logistic probability models for sub-therapeutic, supra-therapeutic, and therapeutic dosing as a function of dose. Tables A-1 and A-2 report the sum of squared errors for the estimation of these curves using each method. It is clear that our method improves upon the standard approaches in this estimation.

Section 3: Exclusion of Model Interaction terms

In the earliest phases of our work, we intended to include interaction terms in our model. To identify features, we utilized forward and backward stepwise regression with the Bayesian information criterion (BIC) of the model as the metric for adding or removing terms. Using this approach we identified only two interactions that met the BIC-based inclusion criterion:

  1. Age : Dose/Weight
  2. Dose/Weight : Creatinine

We initially included these interaction terms in the logistic regression (see Table A-11). However, we observed negligible effects from these terms on model AUC and predictive performance during method validation. Specifically, we partitioned our dataset into a training set (70%), which we used to generate model coefficients, and then attempted to predict categories in a testing set (30%). The terms were statistically significant according to an analysis-of-deviance test. However, we found that including the interaction terms during training had no impact on performance in the testing set. Tables A-3 and A-4 show the confusion matrices of the models with and without the interaction terms; both models exhibited equal overall performance on the testing set. Given these observations, we felt we could not justify including the interaction terms in our final model.
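
As an illustration, a single forward step of the BIC-based selection can be sketched as follows. The use of statsmodels and a binary logit, and the placeholders df, outcome, base_terms, and the candidate interaction strings, are illustrative assumptions rather than our exact implementation.

    import statsmodels.formula.api as smf

    def forward_bic_step(df, outcome, base_terms, candidate_interactions):
        """Add the candidate interaction that most improves BIC, if any improves it."""
        base_formula = outcome + " ~ " + " + ".join(base_terms)
        best_bic = smf.logit(base_formula, data=df).fit(disp=0).bic
        best_term = None
        for term in candidate_interactions:          # e.g. "age:dose_per_weight"
            bic = smf.logit(base_formula + " + " + term, data=df).fit(disp=0).bic
            if bic < best_bic:
                best_bic, best_term = bic, term
        return best_term, best_bic                   # best_term is None if no gain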

Section 4: Comparison of Methods

In addition to the logistic regression-based methods, we attempted several other forms of analysis as a means of validating our approach. Specifically, we compared our approach against single- and double-layer neural networks, two ensemble learning approaches, and a multiclass support vector machine. In each case, the methods were trained and validated on 70% of the data and tested on the remaining 30%.

Section 4.1: Ordinal Regression

To build on our results in Section 2, we trained an ordinal regression model and evaluated its performance on the testing set. The resulting confusion matrix is shown in Table A-5.
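
For reference, a minimal sketch of fitting a proportional-odds (ordinal logistic) model in Python is shown below. The choice of statsmodels' OrderedModel and the synthetic data standing in for our features and ordered aPTT categories are illustrative assumptions.

    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Placeholder data: 0 = sub-therapeutic < 1 = therapeutic < 2 = supra-therapeutic.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 4))
    y = np.clip(np.round(1 + X[:, 0] + rng.normal(0, 1, size=1000)), 0, 2).astype(int)
    X_train, y_train, X_test = X[:700], y[:700], X[700:]

    ordinal = OrderedModel(y_train, X_train, distr="logit")
    result = ordinal.fit(method="bfgs", disp=False)

    # Per-class probabilities on the held-out rows; the argmax is the predicted
    # category used to build the confusion matrix.
    probs = np.asarray(result.predict(X_test))       # shape (300, 3)
    predicted_class = probs.argmax(axis=1)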

Section 4.2: Ensemble Learning

Ensemble learning methods have been touted for their ability to provide impressive out-of-the-box performance on a range of classification problems. The essence of ensemble learning is to combine multiple models, trained on varying segments or perturbations of the training data, to achieve enhanced performance.

To validate our approach, we attempted two types of ensemble learning on our dataset: boosting and bagging. Bagging, also known as bootstrap aggregation, trains several models on bootstrapped replicas of the data and makes final predictions by averaging over all models. Boosting is similar to bagging in that it utilizes multiple classifiers trained on resampled versions of the data; in boosting, however, the training data are reweighted at each step in an attempt to maximize classification accuracy. Both approaches attempt to reduce misclassification errors by averaging over the classifications of the various models.

Ensemble techniques require the investigator to choose several learning parameters, including the class of base learners (decision trees, neural networks, etc.) and the size of the ensemble. In our case, we chose a decision-tree-based approach and set the ensemble size to 300. Tables A-6 and A-7 show the results of these approaches. We observed that the bagging approach came close to our approach (Table A-4) in overall classification performance, with our method performing only slightly better. Realistically, these differences are small enough that the bagging ensemble is a fair alternative to our method in terms of overall classification performance; however, the simplicity of our model makes it preferable in this case.
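
A sketch of the two ensemble configurations (decision-tree learners, ensemble size 300) is given below. The use of scikit-learn's bagging and AdaBoost implementations and the synthetic placeholder data are illustrative assumptions rather than the exact tooling used.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic placeholders standing in for our feature matrix and aPTT categories.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1500, 6))
    y = rng.integers(0, 3, size=1500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Bagging: 300 trees fit to bootstrapped replicas of the training data,
    # with predictions combined by voting.
    bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=300)

    # Boosting (AdaBoost): 300 shallow trees fit sequentially to reweighted data.
    boosting = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                  n_estimators=300)

    for name, clf in (("bagging", bagging), ("boosting", boosting)):
        clf.fit(X_tr, y_tr)
        print(name, confusion_matrix(y_te, clf.predict(X_te)), sep="\n")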

Section 4.3: Neural Networks

We observed that, for single- and double-layer neural networks (with hidden layer sizes of 1000), our method performed better in overall classification (see Tables A-4 and A-8). Some differences in class-specific performance existed, with the neural network classifying therapeutic patients better than our model. In general, neural networks are a useful machine learning approach but have a known tendency to over-fit without large training sets; hence, the size of our dataset may have played a role in the neural networks' lower performance.
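
A sketch of the network configurations (hidden layer sizes of 1000) is given below; scikit-learn's MLPClassifier and the synthetic placeholder data are illustrative assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Synthetic placeholders standing in for the heparin features and aPTT categories.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1500, 6))
    y = rng.integers(0, 3, size=1500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # One hidden layer of 1000 units, and a two-layer variant of 1000 units each.
    for hidden in [(1000,), (1000, 1000)]:
        net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500)
        net.fit(X_tr, y_tr)
        print(hidden, net.score(X_te, y_te))   # overall testing-set accuracy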

Section 4.4: Support Vector Machines

Support vector machines are a useful and flexible technique for classification. While support vector machines were classically applied to linear, two-class problems, multi-class, non-linear versions are now commonplace [1].

[1] K. Crammer and Y. Singer. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines. JMLR, 2001.

To validate our method, we compared its performance on the testing set against a multiclass SVM with a Gaussian radial basis function kernel. The resulting confusion matrix is shown in Table A-9. Here again, the overall classification performance is lower than that of our proposed approach.
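
A sketch of this comparison model is given below; scikit-learn's SVC, the feature standardization step, and the synthetic placeholder data are illustrative assumptions.

    import numpy as np
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic placeholders standing in for the heparin features and aPTT categories.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1500, 6))
    y = rng.integers(0, 3, size=1500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # SVC with a Gaussian radial basis function kernel; the multi-class extension
    # is handled internally by scikit-learn (one-vs-one).
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    svm.fit(X_tr, y_tr)
    print(confusion_matrix(y_te, svm.predict(X_te)))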

Section 4.5: Conclusion

We found that the more complex methods did not perform as well as our proposed approach on the testing portion of the classification task. This is not to say that the more sophisticated techniques are intrinsically worse than our approach; rather, it illustrates how heavily these more complex methods rely on investigator-specified learning parameters. The choice of these parameters can have a tremendous impact on the outcome of the learning algorithm and its resulting performance. We hope to have demonstrated that the simplicity of our method, in combination with its performance, makes it useful for this type of classification paradigm.

Section 5: Heparin Bolus Subgroup Analysis

Of our original cohort, only 485 patients had data available on their initial heparin bolus in addition to our selected features. After removing transfer patients, we were left with 353 patients who met the inclusion criteria for a subgroup analysis. For this subgroup, a weight-normalized bolus term was not found to be statistically significant (see Table A-10 of the Appendix). This suggests that our analysis is sound despite the lack of information on heparin boluses. A comparison of the patients who did and did not have data on heparin boluses is given in Table A-12 of the Appendix. It is worth noting that the population that verifiably received a bolus is not perfectly representative of the population as a whole.

Section 6: BIDMC Heparin Dosing Guidelines

1. Obtain baseline PT, PTT, platelet count, and Hct within 24 hours of initiation

2. If starting a new infusion for venous thromboembolism or for arterial thromboembolism other than acute coronary syndrome:

o Give an initial bolus of 80 units/kg

o Start the infusion at an initial rate of 18 units/kg/hr.

3. If starting a new infusion for acute coronary syndrome:

o Give an initial bolus of 60 units/kg with a maximum of 4000 units

o Start the infusion at an initial rate of 12 units/kg/hr.

4. If starting a new infusion for stroke (also used as the default for other indications):

o No initial bolus

o Start the infusion at an initial rate of 13 units/kg/hr.

5. If patient is currently on low molecular weight heparin, give the first IV heparin dose 8 hours after the last dose of low molecular weight heparin.

6. Check PTT (Process STAT) and adjust according to sliding scale with the following frequency:

o After infusion is begun, check PTT every 6 hours.

o After any dose change, check PTT every 6 hours.

o When PTT is therapeutic for two consecutive tests, check PTT once daily.

7. Adjust heparin infusion according to the following sliding scale (a code sketch of this adjustment logic appears after these guidelines):

For acute coronary syndrome:

PTT (sec) / Bolus (units/kg) / Hold (min) / Rate Change
Under 40 / 15 / --- / Increase infusion rate by 4 units/kg/hr
40 - 49 / --- / --- / Increase infusion rate by 2 units/kg/hr
50 - 80* / --- / --- / No change
81 - 100 / --- / --- / Reduce infusion rate by 2 units/kg/hr
101 - 120 / --- / 30 / Reduce infusion rate by 4 units/kg/hr
Over 120 / --- / 60 / Reduce infusion rate by 5 units/kg/hr

*Therapeutic

For all other indications:

PTT (sec) / Bolus (units/kg) / Hold (min) / Rate Change
Under 40 / 40 / --- / Increase infusion rate by 4 units/kg/hr
40 - 59 / 20 / --- / Increase infusion rate by 2 units/kg/hr
60 - 100* / --- / --- / No change
101 - 120 / --- / --- / Reduce infusion rate by 2 units/kg/hr
Over 120 / --- / 60 / Reduce infusion rate by 4 units/kg/hr

*Therapeutic

8. Notify 24/7 Critical Result Contact if:

• Two consecutive PTTs are greater than 150 seconds

• Two consecutive PTTs are less than the lower limit of the therapeutic range

• Change in neurological status or clinical signs of bleeding


9. Platelet monitoring

In general, patients determined to be at increased risk for developing heparin-induced thrombocytopenia (HIT) should have their platelet count monitored every 2-3 days from days 4-14 of heparin therapy. To ensure compliance with Joint Commission Venous Thromboembolism (VTE) Performance Measure #4, patients receiving intravenous unfractionated heparin for the treatment of deep venous thrombosis and/or pulmonary embolism will have automatic orders for platelet count monitoring on days 4, 7, and 10 of therapy.
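
For reference, the sliding-scale adjustment in item 7 can be encoded as a simple lookup. The sketch below transcribes the thresholds from the two tables above; the function name, data layout, and boundary handling are our own illustrative phrasing and are not part of the BIDMC guideline.

    # Each row: (exclusive PTT upper bound in sec, bolus units/kg, hold min,
    #            rate change in units/kg/hr).
    ACS_SCALE = [
        (40,           15, 0,  +4),
        (50,            0, 0,  +2),
        (81,            0, 0,   0),   # 50-80 sec: therapeutic for acute coronary syndrome
        (101,           0, 0,  -2),
        (121,           0, 30, -4),
        (float("inf"),  0, 60, -5),
    ]
    OTHER_SCALE = [
        (40,           40, 0,  +4),
        (60,           20, 0,  +2),
        (101,           0, 0,   0),   # 60-100 sec: therapeutic for all other indications
        (121,           0, 0,  -2),
        (float("inf"),  0, 60, -4),
    ]

    def adjust_heparin(ptt_sec, acs=False):
        """Return (bolus units/kg, hold minutes, rate change units/kg/hr) for a PTT."""
        scale = ACS_SCALE if acs else OTHER_SCALE
        for upper_bound, bolus, hold, rate_change in scale:
            if ptt_sec < upper_bound:
                return bolus, hold, rate_change

For example, adjust_heparin(125, acs=True) returns (0, 60, -5): no bolus, hold 60 minutes, and reduce the infusion rate by 5 units/kg/hr.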

Section 7: Heparin Indications

The population is quite heterogeneous with regard to the indication for heparin therapy. Unfortunately, the MIMIC database did not capture sufficient information to determine the exact indication for anticoagulation in this population of patients. Hence, we included the type of ICU (medical vs. surgical) in our analysis as a rough proxy. Not surprisingly, we found that admission to a medical ICU (MICU or cardiac ICU), as opposed to a surgical ICU (CTU or SICU), was an independent risk factor for supra-therapeutic aPTT at six hours.

For patients in whom the indication was known, the distribution of therapeutic outcome (goal aPTT range set by BIDMC guidelines) versus initial heparin dose is shown by indication in Figures 7 through 10. We have also included information on the ICD-9-based primary diagnosis for enrolled patients in Tables A-13 and A-14. In these tables we observe that 21.3% of patients had acute coronary syndromes as their primary diagnosis, followed by 12.8% with venous or arterial thrombosis (not in the brain or heart), 5% with primary valve disorders, and 3.4% with ischemic strokes. 1.6% of patients had atrial fibrillation listed as a primary diagnosis, while 55.7% had a primary diagnosis unrelated to anticoagulation. In the subgroups for which information on the primary indication was available, an indication of acute coronary syndrome or valvular disorder was associated with a higher probability of sub-therapeutic dosing and a lower probability of supra-therapeutic dosing. An indication of arterial or venous thrombosis was associated with a higher probability of supra-therapeutic dosing, with no change in the probability of sub-therapeutic dosing. Given the scarcity of indication data, we opted not to include indication in the final model. When models were constructed using the primary indication for heparin therapy as a covariate, the AUC increased in each case by <0.02. This analysis is shown in Table A-15 of the Appendix.

Tables

Table A-1: A comparison of SSE between estimated and actual sub(dose), ther(dose), and supra(dose) functions.

SSE / sub(dose) / ther(dose) / supra(dose) / Total
Ordinal / 2284 / 2782 / 895 / 5960
Standard Multinomial / 545 / 629 / 91 / 1265
Our Approach / 13 / 64 / 53 / 131

Table A-2: A comparison of SSE between estimated and actual sub(dose), ther(dose), and supra(dose) functions.

SSE / sub(dose) / ther(dose) / supra(dose) / Total
Ordinal / 273 / 1327 / 1143 / 2743
Standard Multinomial / 1247 / 1267 / 1241 / 3132
Our Approach / 70 / 1310 / 1279 / 2660

Table A-3: Performance of our method trained with the interaction terms Age:Dose/Weight and Dose/Weight:Creatinine on a novel testing set.

Number (% of total) / Predicted Subtherapeutic / Predicted Therapeutic / Predicted Supratherapeutic / Sensitivity
Actual Subtherapeutic / 184 (40.4%) / 59 (13.0%) / 27 (5.9%) / 68.1%
Actual Therapeutic / 20 (4.4%) / 25 (5.5%) / 16 (3.5%) / 41%
Actual Supratherapeutic / 21 (4.6%) / 31 (6.8%) / 72 (15.8%) / 58.1%
1 - Specificity / 81.8% / 21.7% / 62.6% / 61.8%

Table A-4: Performance of our method trained without interaction terms on a novel testing set.

Number (% of total) / Predicted Subtherapeutic / Predicted Therapeutic / Predicted Supratherapeutic / Sensitivity
Actual Subtherapeutic / 187 (41.1%) / 64 (14.1%) / 31 (6.8%) / 66.3%
Actual Therapeutic / 17 (3.7%) / 23 (5.1%) / 13 (2.9%) / 43.4%
Actual Supratherapeutic / 21 (4.6%) / 28 (6.2%) / 71 (15.6%) / 59.2%
1 - Specificity / 83.1% / 20% / 61.7% / 61.8%

Table A-5: Confusion matrix reflecting ordinal logistic regression classification performance on testing data.

Number (% of total) / Predicted Subtherapeutic / Predicted Therapeutic / Predicted Supratherapeutic / Sensitivity
Actual Subtherapeutic / 192 (42.2%) / 71 (15.6%) / 33 (7.3%) / 64.9%
Actual Therapeutic / 7 (1.5%) / 7 (1.5%) / 8 (1.8%) / 31.8%
Actual Supratherapeutic / 26 (5.7%) / 37 (8.1%) / 74 (16.3%) / 54%
1 - Specificity / 85.3% / 6.1% / 64.3% / 60%

Table A-6: Confusion matrix reflecting Bagging classification performance on testing data.

Number (% of total) / Predicted Subtherapeutic / Predicted Therapeutic / Predicted Supratherapeutic / Sensitivity
Actual Subtherapeutic / 185 (40.7%) / 53 (11.6%) / 34 (7.5%) / 68%
Actual Therapeutic / 19 (4.2%) / 30 (6.6%) / 16 (3.5%) / 46.2%
Actual Supratherapeutic / 21 (4.6%) / 32 (7.0%) / 65 (14.3%) / 55.1%
1 - Specificity / 82.2% / 26.1% / 56.5% / 61.5%

Table A-7: Confusion matrix reflecting Boosting classification performance on testing data.

Number (% of total) / Predicted Subtherapeutic / Predicted Therapeutic / Predicted Supratherapeutic / Sensitivity
Actual Subtherapeutic / 66 (14.5%) / 23 (5.1%) / 17 (3.7%) / 62%
Actual Therapeutic / 13 (2.9%) / 6 (1.3%) / 12 (2.6%) / 19.4%
Actual Supratherapeutic / 146 (32%) / 86 (18.9%) / 86 (18.9%) / 27%
1 - Specificity / 29.3% / 5.2% / 74.8% / 34.7%

Table A-8: Confusion matrix reflecting 2-Layer Neural Network classification performance on testing data.

Number (% of total) / Predicted Subtherapeutic / Predicted Therapeutic / Predicted Supratherapeutic / Sensitivity
Actual Subtherapeutic / 135 (29.7%) / 52 (11.4%) / 32 (7.0%) / 61.6%
Actual Therapeutic / 57 (12.5%) / 32 (7%) / 27 (5.9%) / 27.6%
Actual Supratherapeutic / 33 (7.3%) / 31 (6.8%) / 56 (12.3%) / 46.7%
1 - Specificity / 60% / 27.8% / 48.7% / 49.0%

Table A-9: Confusion matrix reflecting support vector machine classification performance on testing data.

Number (% of total) / Predicted Subtherapeutic / Predicted Therapeutic / Predicted Supratherapeutic / Sensitivity
Actual Subtherapeutic / 155 (34.1%) / 45 (9.9%) / 27 (5.9%) / 68.3%
Actual Therapeutic / 32 (7.0%) / 42 (9.2%) / 34 (7.5%) / 38.9%
Actual Supratherapeutic / 38 (8.4%) / 28 (6.2%) / 54 (11.9%) / 45.0%
1 - Specificity / 68.9% / 36.5% / 47.0% / 55.2%

Table A-10: Results of logistic regression for modeling supra-therapeutic (Table A-10a) and sub-therapeutic (Table A-10b) aPTT for the heparin bolus subgroup.

A-10a
Validation AUC Mean (Standard Deviation) = 0.79(0.06) / Odds Ratio / 95% Confidence Interval (Lower/Upper) / p-value
Age (Years) / 1.02 / 1.00 /1.04 / 0.02
SOFA Score / 1.08 / 1.01 / 1.15 / 0.02
Elixhauser / 1.03 / 0.99 / 1.08 / 0.17
Heparin Dose, units/kg/hr / 1.34 / 1.23 / 1.47 / <0.01
Measurement Time, hours b / 1.12 / 0.88 / 1.44 / 0.36
Creatinine mg/dLa / 0.94 / 0.77 / 1.14 / 0.51
Ethnicity, White / 0.66 / 0.35 / 1.23 / 0.19
Gender, Male / 0.50 / 0.30 / 0.84 / 0.01
ICU Type, Medical / 0.84 / 0.45 / 1.58 / 0.59
AST/ALT > 5x Upper Limit of Normal / 0.51 / 0.09 / 2.81 / 0.44
Heparin Bolus (Units / Kg) / 1.00 / 0.98 / 1.01 / 0.58
A-10b
Validation AUC Mean (Standard Deviation) = 0.79(0.07) / Odds Ratio / 95% Confidence Interval (Lower/Upper) / p-value
Age / 0.98 / 0.96 / 0.99 / 0.01
SOFA Score / 0.96 / 0.90 / 1.03 / 0.25
Elixhauser / 1.00 / 0.96 / 1.04 / 0.96
Heparin Dose, units/kg/hr / 0.80 / 0.73 / 0.86 / 0.00
Measurement Time, hours b / 0.93 / 0.73 / 1.20 / 0.59
Creatinine mg/dLa / 1.18 / 0.97 / 1.42 / 0.09
Ethnicity, White / 1.25 / 0.68 / 2.33 / 0.47
Gender, Male / 1.69 / 1.00 / 2.85 / 0.05
ICU Type, Medical / 0.52 / 0.28 / 0.95 / 0.03
AST/ALT > 5x Upper Limit of Normal / 1.73 / 0.39 / 7.65 / 0.47
Heparin Bolus (Units) / 0.99 / 0.98 / 1.00 / 0.11

a SI conversion factor: To convert creatinine to μmol/L, multiply by 88.4