Supplement 2. Expert Panel Commentary
VLBW volume
Validity:
•Definition/validity of measure is questionable
•No agreement in the literature that volume significantly affects outcomes; future research should attempt to better delineate the low-volume cut-off
•Could serve as a warning flag or a stratification tool, but is not suitable for a composite measure
Usability:
•Not clear how hospitals can improve on this measure
•Some states impose legal limits on access to care in rural areas
Antenatal steroids
Definition/reliability:
•A good measure given the limits of available data; future work should focus on how best to extract these data with respect to time frame and repeat courses
•Measure could be based on time of maternal admission; however, this complicates data collection as the information has to be collected from the maternal chart
Validity:
•Impact of repeated courses not clear; too many courses could be a sign of poor quality
•Potential for gaming by giving steroids when delivery is imminent
1st h temp measured
Definition/reliability:
•Panel largely supportive, but with concerns regarding validity and gaming
Validity:
•Not clear whether measurement of a temperature is associated with better outcome
1st h hypothermia
Definition/reliability:
•Concern that ratings may differ because of systematic differences in measurement site (skin vs. axillary vs. rectal); safety concerns about rectal temperature measurement
•Hard to measure reliably since multiple care procedures interfere with exact measurement during the first hour; need to better operationalize and define this measure
Validity:
•Not clear that hypothermia is detrimental; it may be beneficial
•Overheating may also result in poor patient outcome
•Temperature range for individual infant should be reported
•Varying local resuscitation practices may introduce bias (delivery room versus NICU)
Early surfactant
Definition:
•Denominator should better delineate patients that are eligible for surfactant, such as those in respiratory distress and needing >40% oxygen
•Some panelists proposed excluding patients started primarily on CPAP from the denominator
•Others argued against excluding patients with a primary course of CPAP, as it is unclear whether such a strategy is supported by evidence
Validity:
•Not all infants need surfactant
•Lack of evidence that early surfactant (versus rescue surfactant) results in lower rates of CLD
Timely ROP exam
Definition:
•Does not take into account patients unable to undergo an eye exam due to intercurrent illness
•Current data definitions cannot assess whether the ROP exam occurred at the recommended age; rather, they assess whether the patient was in hospital at the age an exam is recommended and whether an exam was performed before the initial disposition
Validity:
•Recommendation to track the timeliness of exams after dc from the NICU
Severe ROP
Definition/Reliability:
•Large amount of inter-rater variability in assigning degree of ROP
•Change variable definition to a severe ROP rate 3 standard deviations above the group mean
•One panelist proposed use as a sentinel indicator
Validity:
•Potential for transfer bias: transfer of healthier infants for convalescent care may artificially inflate the ROP rate in the remaining patients
•Highest stage of ROP may be recorded after discharge and not included in database
•Potential trade-off between ROP and other morbidities
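The proposed "3 standard deviations above the group mean" definition can be sketched as follows. This is a minimal illustration and assumes several things. Per-center severe ROP rates are taken as already computed. The group mean is an unweighted mean across centers, and the sample standard deviation is used. The function name and inputs are hypothetical and do not come from any VON specification.

```python
from statistics import mean, stdev


def flag_high_rop_centers(rates: dict[str, float], k: float = 3.0) -> list[str]:
    """Return centers whose severe-ROP rate exceeds the group mean by more than k SDs."""
    values = list(rates.values())
    mu = mean(values)   # unweighted mean of center rates (assumption)
    sd = stdev(values)  # sample SD; raises StatisticsError with fewer than 2 centers
    threshold = mu + k * sd
    return [center for center, rate in rates.items() if rate > threshold]
```

One design caveat applies to small groups. An outlying center inflates the standard deviation it is compared against. With the sample SD and n centers, no center can lie more than (n - 1)/√n SDs from the mean, so a 3-SD flag is only possible with at least 11 centers.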
ROP surgery
Definition/Reliability:
•A better measure might be one that assesses whether surgery was performed at the appropriate time for infants with pre-threshold disease (percentage of infants reaching criteria for surgery who had surgery within 3 days)
Validity:
•Rare event, likely poor discrimination
•Ophthalmologists may have different thresholds for proceeding with surgery
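The alternative measure suggested above can be sketched as the percentage of infants reaching criteria for surgery who had surgery within 3 days. It is a simple proportion, and the sketch assumes hypothetical per-infant records holding the date treatment criteria were first met and the date of surgery. The record layout and field names are illustrative only. Surgery dated before the criteria date is treated as not timely, which is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Infant:
    # Hypothetical record layout; field names are illustrative only.
    criteria_date: Optional[date]  # date treatment criteria were first met (None if never met)
    surgery_date: Optional[date]   # date of ROP surgery (None if not performed)


def timely_surgery_pct(infants: Iterable[Infant], max_days: int = 3) -> Optional[float]:
    """Percent of infants meeting surgery criteria who had surgery within max_days."""
    # Denominator: only infants who reached treatment criteria.
    eligible = [i for i in infants if i.criteria_date is not None]
    if not eligible:
        return None  # rate undefined when no infant met criteria
    # Numerator: surgery on or after the criteria date and no more than max_days later.
    timely = sum(
        1
        for i in eligible
        if i.surgery_date is not None
        and 0 <= (i.surgery_date - i.criteria_date).days <= max_days
    )
    return 100.0 * timely / len(eligible)
```

Restricting the denominator to infants who met criteria is what distinguishes this measure from a raw surgery rate. Infants who never reach pre-threshold disease do not affect the result.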
Any IH
Definition/Reliability:
•Documented poor inter-rater reliability
•Use of all patients rather than survivors only may be appropriate as most infants receive head ultrasound prior to death
•Consider limiting to inborn infants
•Consider changing the cut-off to 30 days due to variation in the timing of head imaging among NICUs
Validity:
•Different imaging modalities are differentially sensitive to recognizing IH
Usability:
•Unclear path to improvement, although improvement through the use of root cause analysis has been observed
Severe IH
Similar to any IH, but inter-rater reliability is better than for Measure 9
Cystic PVL
Definition/Reliability:
•Poor inter-rater reliability, panel feels that inter-institutional differences may reflect differences in reading and reporting rather than true incidence
•One panelist proposed use as a sentinel indicator since severe hypocarbia can increase the risk of PVL
Validity:
•Ultrasound is less sensitive than MRI, which may lead to bias against centers that use MRI
•Rare event, likely poor discrimination
Usability:
•Unclear path to improvement
Use of AV
Definition/Reliability:
•Unclear definitions
•Issues with accuracy of data collection
Validity:
•Unclear what proportion of AV is appropriate
•Useful for assessing local practice, not for comparison
Duration of AV
Definition/Reliability:
•Difficult to obtain accurate data because of multiple episodes of intubation/extubation
•Recommendation for use as a binary variable, “length of AV >2 weeks,” defined as length of AV in excess of a risk-adjusted expected value based on top-performing NICUs
Pneumothorax
Definition/Reliability:
•Concern about inter-rater reliability
•Recommendation to include present-on-admission codes to avoid bias against referral centers
Validity:
•There are no extant population data demonstrating what the baseline rate for spontaneous pneumothorax in VLBW infants actually is
Steroids for CLD
Definition/Reliability:
•Define which steroids to include and the total dose
•One panelist proposed use as a sentinel indicator
Validity:
•Different types of steroids/routes of administration may result in different outcomes
•Recent studies (use of intratracheal steroids with surfactant, and the DART study¹) may increase steroid use
Oxygen on day 28
Validity:
•VON uses an altitude adjustment in its risk-adjustment model
•Oxygen can be administered for various reasons of different severity
•Some centers use O2 to prevent progression of ROP
Malleability:
•No clear path to improvement
Oxygen at 36 wks
Preferred by most panelists over Measure 16; most comments similar
Definition/Reliability:
•Consider exclusion of intermittent oxygen
•Consider counting patients on CPAP, even if they receive room air
•A physiologic definition for CLD would be preferable
Oxygen at dc
Definition:
•Transfer bias against centers that transfer out healthier babies
Validity:
•Measure used by VON to assess discharge policies rather than quality
•Needs additional adjustment for socioeconomic status, postmenstrual age at dc, and altitude
•If appropriate infants can be sent home sooner on oxygen, a higher rate could be desirable
•Does not capture the reason for oxygen therapy
Malleability:
•No clear path to improvement
Dc on AV
Validity:
•Measure used by VON to assess dc policies rather than quality
•Rare event, poor discrimination
•Risk adjustment may not be adequate for special cases
•Ability to be discharged on AV may reflect a broader issue of accessibility of long-term facilities and have little to do with quality at an individual unit level.
Malleability:
•No clear path to improvement
NEC
Definition/Reliability:
•Since the cause of NEC is not understood, it is not suitable as a quality measure
•There is no agreement as to what constitutes NEC, since inter-rater reliability for abdominal films has not been quantified across a large number of centers
NEC surgery
Similar to Measure 20
Definition/Reliability:
•How will isolated perforation be distinguished from true NEC?
•The number of NEC cases should be the denominator
Validity:
•Large variability depending on pediatric surgeon
Only human milk at dc
Most panelists preferred Measure 23
Definition/Reliability:
•Recommendation to use this measure a few weeks after dc
Validity:
•Not a valid measure for perinatal centers because of distance issues
Any human milk at dc
Preferred to Measure 22; otherwise similar comments
Growth velocity
Definition/Reliability:
•Adjust for race and gender
•Measure gives a good idea of consistency in NICUs
Validity:
•May require risk adjustment development work beyond what is used by VON
•Not clear that overly fast growth is desirable
Infection
Definition/Reliability:
•Difficult to define infections in a way that "gaming" cannot occur
•Difficult to capture clinical sepsis
Malleability:
•Highly modifiable
Length of stay
Definition/Reliability:
•Easy to quantify
•Balancing measures of readmission and death rates are needed to make this a meaningful indicator
•Needs to be expressed as postmenstrual age at discharge
•A broader measure of the percent that exceeds the expected time could be useful
Validity:
•Do not want to incentivize inappropriate discharge
•Influenced by back transfer policies, socioeconomic status, and access to health care
28 day mortality
Definition/Reliability:
•Measure reflects decisions about life support; survival may not always be better than death
Validity:
•Potential for systematic bias depending on whether delivery room deaths or deaths before 12 hours of life are excluded
•Potential for gaming: NICUs may transfer out babies when death is imminent
NICU Mortality
Comments similar to Measure 27. Measure 28 was preferred by all but one panelist
Definition/Reliability:
•Value as a balancing indicator for competing outcomes was stressed
•More robust measure than Measure 27
•Consider a measure to assess unexpected mortality after discharge

AV - assisted ventilation, CLD - chronic lung disease, CPAP - continuous positive airway pressure, dc - discharge, h - hour, IH - intracranial hemorrhage, NEC - necrotizing enterocolitis, NICU - neonatal intensive care unit, PVL - periventricular leukomalacia, ROP - retinopathy of prematurity, temp - temperature, VLBW - very low birth weight, VON - Vermont Oxford Network, wks - weeks

Reference List

1. Doyle, L. W. et al. Outcome at 2 years of age of infants from the DART study: a multicenter, international, randomized, controlled trial of low-dose dexamethasone. Pediatrics 119, 716-721 (2007).
