Transcript of Cyberseminar
VIReC Database and Methods Seminar
Using VA Corporate Data Warehouse for Health Services Research
Presenters: Polly Noel, PhD, and Laurel Copeland, PhD
June 4, 2012
Margaret:Welcome to VIReC’s Database and Methods Cyber Seminar entitled “Using VA Corporate Data Warehouse for Health Services Research.” Thank you, CIDER, for providing technical and promotional support for this series. Today’s speakers are Polly Noel, PhD, and Laurel Copeland, PhD. Dr. Noel is an Associate Director, Veterans Evidence-Based Research Dissemination and Implementation, South Texas Veterans Health Care System.
Dr. Copeland is a Research Health Scientist at the Center for Applied Health Research, established in 2010 by Scott and White Health Care System and Central Texas Veterans Health Care System.
Questions, as Heidi mentioned, will be monitored during the talk in the Q&A portion of GoToWebinar and will be presented to Drs. Noel and Copeland at the end of their talk. A brief evaluation questionnaire will pop up when you close GoToWebinar. Please take a moment to complete it. I am pleased to welcome today’s speakers, Dr. Polly Noel and Dr. Laurel Copeland.
Dr. Polly Noel:Hello, I am Polly Noel, and I am going to be starting off the cyber seminar today. We would like to thank everyone for joining us, and also thank VIReC for inviting us back to make this presentation. This presentation is meant to serve as an introduction to the Corporate Data Warehouse. We are going to review the data contained in the Corporate Data Warehouse, some of the limitations in using the CDW, and our own experiences in using CDW data for health services research.
During our presentation, we are going to provide a brief overview of the CDW, review possible sources of error in CDW data, briefly describe two evaluations of the quality of CDW data, and relate our own experiences in using CDW anthropometric data in an HSR&D-funded research project. We will also describe a more limited experience using blood pressure data from the CDW, offer some recommendations to consider when using CDW data, and identify some additional resources that might be helpful.
Before we begin, we would be interested in having members of the audience rate their overall knowledge of the Corporate Data Warehouse.
Margaret:The responses are coming in. We are at about sixty-seven percent. I will give it a few more seconds before I close this out. [short silence] There are your responses, and if you could just take a second to talk through them. We do need them for the recording.
Dr. Polly Noel:Oh, yes. It appears that about a third of the participants indicate they have no prior knowledge of the CDW, an additional third indicate slightly more knowledge of the CDW, and another third indicate moderate to expert-level knowledge of the CDW. So, a range of participants today.
Margaret:I am just going to go right into your second poll, if that is okay?
Dr. Polly Noel:We were also interested in knowing from the participants, how many of you have ever used data from the CDW for research?
Margaret:The responses are coming in. We are at about sixty-six percent right now. I will give it a few more seconds before I close this out. There are your responses.
Dr. Polly Noel:Okay. About two thirds have never used CDW data, but hope to in the future. Whereas, about a third are either currently using the data for their research, or have even published papers using CDW data. Let’s see. I think we also had the whiteboard question or…?
Margaret:Yeah, but we don’t have a whiteboard so I will need you to bring your slides up so that we can see the question.
Dr. Polly Noel:Okay, okay.
Margaret:And to Polly, as people type answers to your question in the Q&A pane, I will read them to you, okay?
Dr. Polly Noel:Okay. So, yes. Now we are wanting people to indicate if they have never used the CDW, what you want to learn about the CDW or what data you want to actually get from the CDW, some day in the future.
Margaret:And as we no longer have a whiteboard available to us in our Webinar software, if our audience could use your Q&A pane to submit your questions to us, and we will put those out over the phone line. Thank you.
Okay, Polly, first question: Access TIU data. Second question: How do I access this data, and how do I get the necessary permissions to extract the data? How do I access data for HPDP, patient care and utilization data, pharmacy and/or lab data? It goes on and on. I think that Heidi can record this and we can get it to you. You probably want to continue with your talk.
Dr. Polly Noel:I think we can address some of those at the end of the presentation, or they will be covered by the presentation. Okay, we would like to start by giving an overview of the CDW. First, we want to just review the types of research that Corporate Data Warehouse is useful for. We knew that the Corporate Data Warehouse would be made available to researchers several years ago.
It generated a lot of excitement, because the CDW contains information that had not previously been available at the national level. Specifically relevant to my group, it included data from the Vitals Package of the CPRS, such as blood pressures, pain assessments, and heights and weights needed to calculate body mass index to assess obesity status.
This type of data is extremely useful for health services researchers. It can be used to define patient cohorts, control for disease severity or comorbidity, and/or assess patient outcomes. These are some examples of the types of health services research questions that might be addressed with data from the CDW, such as a study looking at quality of care, for example whether patients in a given BMI class receive recommended preventive screening. Other examples are listed on the slide.
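To make the body mass index calculation mentioned above concrete, here is a minimal sketch. It assumes weights in pounds and heights in inches and uses the standard adult cut points of 18.5, 25, and 30. The function names are hypothetical and are not part of the CDW or any VA tool.

```python
def bmi_from_pounds_inches(weight_lb: float, height_in: float) -> float:
    """Body mass index (kg/m^2) from weight in pounds and height in inches."""
    if weight_lb <= 0 or height_in <= 0:
        raise ValueError("weight and height must be positive")
    # The factor 703 converts lb/in^2 to kg/m^2.
    return 703.0 * weight_lb / (height_in ** 2)


def bmi_class(bmi: float) -> str:
    """Standard adult BMI categories (cut points at 18.5, 25, and 30)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    bmi = bmi_from_pounds_inches(160, 66)
    print(round(bmi, 1), bmi_class(bmi))
```

Because the height is squared, an error in a recorded height shifts the computed BMI more than a proportionally similar error in weight, which is one reason the quality of height data matters for obesity classification.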
The Corporate Data Warehouse is a national repository comprising data from several VHA clinical and administrative systems. So, it provides, in a sense, a nationwide view of the data contained in all CPRS and VistA systems. It was originally created to provide data and tools to support management decisions, performance measurement, and research objectives. The CDW contains information dating back to fiscal year 1999, and current data is added nightly.
This means that, unlike the Medical SAS datasets, the CDW files are not static, and files may not be precisely replicable if the same data is re-extracted at a later date. Also, unlike some of the other national databases such as the Medical SAS outpatient and inpatient datasets, which are organized into sets of files separated by fiscal year and data type, the CDW is a relational database.
Currently, the CDW includes consults, health factors, and vital signs data. It also now contains the DSS National Data Extracts for lab and pharmacy data, as well as inpatient and outpatient encounter data.
Our seminar today will focus primarily on the CDW’s Vitals data. I would like to start by clarifying that by vitals, we mean data originally derived from the Vitals Package of the VHA’s electronic medical record. These include vital signs such as blood pressure, respiration, and temperature; anthropometric measures such as height, weight, and waist circumference; as well as other measures such as pain scores. In particular, we will be discussing our own experiences in using anthropometric data and blood pressure data. Dr. Copeland is going to briefly review the history of the population of the vital signs within the CDW.
Dr. Laurel Copeland:This slide was actually prepared by Betsy Lancaster, a VISN sixteen programmer who has taken on a major role with the CDW. She did a lot of data validation work with the VISN sixteen data when the CDW was getting started. Here you can see that the proportion of patients included from 2007 to 2009 (keep in mind, this is a rather old slide) is large compared to the all-time numbers. That is to say, the inclusiveness has definitely gotten better over time. Polly?
Dr. Polly Noel:Okay. Now we would like to discuss the different types of data errors that might be present in CDW Vitals data and other similar types of data. Before we go into details, it is helpful to first summarize the process by which anthropometric and blood pressure data is generated in the VA and transferred to the CDW. First, vitals data and anthropometric data are assessed by clinical staff and hand-entered into CPRS, which of course is the user interface for VistA, the VA’s integrated system of local information systems.
The data are stored in VistA and transmitted via HL7 messages to the VA’s Health Data Repository, from which the CDW extracts, transforms, and loads selected data into its own structured query language data fields. In addition, many VISNs have, over time, created data marts or data warehouses that also support clinical and administrative functions. So, the same data may exist at several different levels: within local VistA systems, within VISN data warehouses, and also within the CDW. The CDW is updated daily, and any data values that might have been changed or written over are not maintained.
The CDW is a continuously, regularly updated warehouse holding no stable reference files, which is unlike the VA’s Medical SAS datasets. Furthermore, while out-of-range values are cleaned from the Medical SAS datasets, errors and values that are out of range will be found in the Corporate Data Warehouse, so it is important to be aware of this if you use CDW data for your research. Now we are going to spend a bit of time reviewing the different types of errors that might exist in CDW data. These include measurement or reporting errors, data entry errors, and data transfer or extraction errors. Although many of the examples we are going to give will be based on anthropometric data, they can also be applied to blood pressure data.
Errors can arise during the initial measurement or reporting process in a variety of ways. Specific to anthropometric data, equipment may be incorrectly calibrated or patients may be inconsistently measured with or without shoes or clothing. The clinicians who take the measurements may round the values up or down.
We know from our clinical colleagues that self-reported heights may be entered into CPRS instead of heights measured by staff or clinicians. Sometimes heights are not re-measured over time; there are reports, and patterns in the data, suggesting that the last entered value is simply carried forward without being formally re-measured.
In addition, data may be biased in that weight or height may be less likely to be measured for specific populations, such as morbidly obese patients, amputees, and inpatients, especially if appropriate equipment is not available. And this, we know, was a particular problem at our local facility a decade ago, in that the facility simply did not have scales that accommodated more obese patients.
So it was a problem to obtain accurate weight assessments for these individuals. But I believe that has changed over time. Other errors can arise during the data entry process. During data entry, clinical staff may inadvertently transpose numbers. For example, they might enter two hundred and eighty-one pounds instead of two hundred and eighteen pounds. They may accidentally hit a key adjacent to the target, such as typing in eight hundred and thirty-two pounds instead of five hundred and thirty-two pounds. Or they may add or delete numbers by mistake. For example, typing in one thousand, one hundred and sixty pounds instead of one hundred and sixty pounds.
We have also seen cases where staff may erroneously transform values. For example, entering in a height for someone who is five feet, six inches as fifty-six inches instead of sixty-six inches. Although the data fields in CPRS have filters or range checks to prevent the inadvertent entry of erroneous values, at least in some systems the ranges can be so extreme that they still leave substantial room for error.
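The data entry error patterns described above (transposed digits, an adjacent key, an added or dropped digit, and feet and inches typed as one number) can be checked for mechanically when a reference value is available, such as a patient's other recorded measurements. The following is a minimal sketch under stated assumptions: values are non-negative whole numbers, the keypad layout is an assumed standard numeric keypad, and all names are hypothetical. It only flags candidate explanations and cannot confirm what actually happened at entry.

```python
from typing import List

# Assumed numeric-keypad layout as (row, column); "0" is placed under "1".
KEYPAD = {
    "7": (0, 0), "8": (0, 1), "9": (0, 2),
    "4": (1, 0), "5": (1, 1), "6": (1, 2),
    "1": (2, 0), "2": (2, 1), "3": (2, 2),
    "0": (3, 0),
}


def _adjacent_keys(a: str, b: str) -> bool:
    """True if two different digits sit next to each other on the keypad."""
    (r1, c1), (r2, c2) = KEYPAD[a], KEYPAD[b]
    return a != b and max(abs(r1 - r2), abs(c1 - c2)) == 1


def _differing_positions(e: str, r: str) -> List[int]:
    return [i for i in range(len(e)) if e[i] != r[i]]


def _is_transposition(e: str, r: str) -> bool:
    """True if swapping two neighboring digits of r gives e."""
    if len(e) != len(r):
        return False
    diff = _differing_positions(e, r)
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and e[diff[0]] == r[diff[1]] and e[diff[1]] == r[diff[0]])


def _is_adjacent_key_slip(e: str, r: str) -> bool:
    """True if exactly one digit differs and the two keys are neighbors."""
    if len(e) != len(r):
        return False
    diff = _differing_positions(e, r)
    return len(diff) == 1 and _adjacent_keys(e[diff[0]], r[diff[0]])


def _one_digit_removed(longer: str, shorter: str) -> bool:
    """True if deleting a single digit from longer gives shorter."""
    if len(longer) != len(shorter) + 1:
        return False
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def possible_entry_errors(entered: int, reference: int,
                          is_height: bool = False) -> List[str]:
    """List the entry-error patterns that could turn reference into entered."""
    e, r = str(entered), str(reference)
    found = []
    if _is_transposition(e, r):
        found.append("transposed digits")
    if _is_adjacent_key_slip(e, r):
        found.append("adjacent key")
    if _one_digit_removed(e, r):
        found.append("extra digit")
    if _one_digit_removed(r, e):
        found.append("missing digit")
    if is_height:
        # Reference height in inches written as feet followed by inches.
        feet, inches = divmod(reference, 12)
        if f"{feet}{inches}" == e:
            found.append("feet and inches concatenated")
    return found


if __name__ == "__main__":
    print(possible_entry_errors(281, 218))
    print(possible_entry_errors(832, 532))
    print(possible_entry_errors(1160, 160))
    print(possible_entry_errors(56, 66, is_height=True))
```

The function returns a list rather than a single label because one entered value can fit more than one pattern. A height of fifty-six inches against a reference of sixty-six, for instance, matches both the feet-and-inches pattern and an adjacent-key slip.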
When we checked last year at our facility, the range checks on CPRS in our system allowed for entry of weights between zero and fifteen hundred pounds, and heights between zero and one hundred inches. And if these extreme values are entered in locally, then they will be rolled up into the Corporate Data Warehouse.
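To illustrate how much room such wide entry limits leave, the sketch below screens a value against two sets of limits: the entry limits reported above for one facility, and a tighter set of plausibility limits. The plausibility limits are hypothetical, analyst-chosen numbers used only for illustration. They are not VA or CDW standards, and any real project would need to justify its own.

```python
from typing import Tuple

Limits = Tuple[float, float]

# Entry limits reported in the talk for one facility's CPRS fields.
CPRS_WEIGHT_LB: Limits = (0.0, 1500.0)
CPRS_HEIGHT_IN: Limits = (0.0, 100.0)

# Hypothetical plausibility limits for adults (illustrative assumption only).
PLAUSIBLE_WEIGHT_LB: Limits = (75.0, 700.0)
PLAUSIBLE_HEIGHT_IN: Limits = (48.0, 84.0)


def screen_value(value: float, entry_limits: Limits,
                 plausible_limits: Limits) -> str:
    """Classify a value against the entry limits and the tighter limits."""
    entry_low, entry_high = entry_limits
    if not (entry_low <= value <= entry_high):
        return "rejected at entry"
    low, high = plausible_limits
    if not (low <= value <= high):
        return "accepted but implausible"
    return "plausible"


if __name__ == "__main__":
    for weight in (160, 1160, 1600):
        print(weight, screen_value(weight, CPRS_WEIGHT_LB, PLAUSIBLE_WEIGHT_LB))
    print(56, screen_value(56, CPRS_HEIGHT_IN, PLAUSIBLE_HEIGHT_IN))
```

Note that a height of fifty-six inches entered for a patient who is actually sixty-six inches tall passes both checks, so range screening alone cannot catch every entry error. Comparing a value with the same patient's other measurements is a useful complement.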
Finally, data errors can also occur during the data transfer or extraction process. As I mentioned before, the same data can exist in several different sources at different levels within the VA: VistA systems, VISN data warehouses, and the CDW. Variations can occur between the time data were originally entered in CPRS and the time that same data were uploaded into the CDW or a VISN data warehouse.
Variations among these sources can occur for a variety of reasons. For example, data can be lost in transmission. Different filters can result in the inclusion of slightly different subsets of data. So, it is possible for cases or records to be in one source and not in others, but these cases appear to be rare. It is also important to keep in mind that VistA systems and the CDW are constantly changing. The CDW is refreshed nightly, but similar updates are not performed simultaneously by all VISN data warehouses. So there may be temporary differences between the CDW and individual VISN data warehouses.
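One simple way to look for records that appear in one source but not another is to compare record keys from two extracts. The sketch below assumes each record can be identified by a hypothetical key of patient identifier, measurement date and time, and vital type. Actual CDW and VISN warehouse keys may differ, and the sample values are made up.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record key: (patient_id, measurement_datetime, vital_type).
RecordKey = Tuple[str, str, str]


def compare_sources(cdw_keys: Iterable[RecordKey],
                    visn_keys: Iterable[RecordKey]) -> Dict[str, Set[RecordKey]]:
    """Split record keys into those in both sources and those in only one."""
    cdw, visn = set(cdw_keys), set(visn_keys)
    return {
        "both": cdw & visn,
        "cdw_only": cdw - visn,
        "visn_only": visn - cdw,
    }


if __name__ == "__main__":
    cdw = [("P1", "2009-03-01T09:15", "WEIGHT"),
           ("P1", "2009-03-01T09:15", "HEIGHT")]
    visn = [("P1", "2009-03-01T09:15", "WEIGHT"),
            ("P2", "2009-03-02T14:00", "WEIGHT")]
    for name, keys in compare_sources(cdw, visn).items():
        print(name, len(keys))
```

Because the sources are refreshed on different schedules, a record found in only one source may reflect timing rather than loss, so it helps to note the extraction date of each source alongside the comparison.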
Other types of problems can arise during transfer and extraction of data from one source to another. Numeric data can be redefined as character data, or rounded or stored with a smaller number of decimal places. And specific to the CDW, anthropometric data are stored in both text and numeric form. The text field displays the value as-is from the VistA extraction, while the numeric field is generated by a very conservative transformation algorithm. So, it is important to know what you are requesting and what you get.
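To show what a conservative text-to-number transformation can look like, here is a minimal sketch. It is not the CDW's actual algorithm, which is not described in this talk. It simply converts a text value only when the whole string is a plain number and otherwise returns nothing, leaving the text for manual review. The function name is hypothetical.

```python
import re
from typing import Optional

# Digits with an optional decimal part; anything else is left unconverted.
_PLAIN_NUMBER = re.compile(r"\d+(\.\d+)?")


def conservative_to_number(text: str) -> Optional[float]:
    """Return a float only if the whole (trimmed) string is a plain number."""
    cleaned = text.strip()
    if _PLAIN_NUMBER.fullmatch(cleaned):
        return float(cleaned)
    return None


if __name__ == "__main__":
    for raw in ["218", " 66 ", "218.5", "218 lbs", "5'6\"", ""]:
        print(repr(raw), conservative_to_number(raw))
```

A rule this strict leaves entries such as "218 lbs" without a numeric value, so the numeric field can be missing even when the text field holds usable information. That is one reason to check which of the two fields an extract actually contains.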
Errors can also arise due to miscommunication between programmers and members of the research team. A programmer may misinterpret a request or the research team may fail to specify exactly what they need. So, it is important to be clear and to check the interpretations out on both ends.
Now I would like to briefly review two complementary projects that evaluated the quality of the CDW when the data first became available a couple of years ago. Generally, it is important to assess the quality of any novel database or data. But the assessment of the CDW data was complicated by the massive volume of data it contained and by the general lack of easy access to the over one hundred separate VistA systems maintained by the VA’s regional networks.
Since the VA no longer uses paper charts, VistA is considered to be the gold standard for checking the quality of the VA’s national or regional data repositories. Although both projects were able to access VistA at minimal levels, some of the primary quality assessments compared CDW data to VISN data warehouse data, viewing the VISN warehouses as a proxy for VistA data.
Our first evaluation was conducted at the VHA Support Services Center, or VSSC, by Betsy Lancaster. The VSSC monitors key indicators of quality, quantity, and cost of VA patient care, as well as compliance with mandated clinical practice guidelines. Because the usefulness of its work relies on the quality of the various data sources it uses, the VSSC undertook an evaluation of the CDW when it became available. In particular, it compared CDW data fields to one another and to overall patient utilization over time to see if they were populated as expected, examined the data for biologically implausible values, and compared data for ten facilities from both the CDW and one VISN warehouse.
So this slide summarizes the results of their evaluation, as to whether or not the data fields were populated as expected, over time. As a reference point, between 2004 and 2007, the number of unique patients in the VA grew from 4.8 million to 5.2 million. The most highly populated data fields included blood pressure and the pain assessments. Specific to blood pressure, the number of records increased from about 3.6 million to 35.3 million. In comparison, weights were recorded far less frequently and increased only from 14.7 million to 15.5 million weights recorded each year between 2004 and 2007.
There were approximately forty percent fewer heights recorded than weights. There were even fewer recorded assessments of waist circumference. We believe this pattern reflected general clinical practice during the time period, as both non-VA and VA studies have also found that heights tend to be recorded less often than weights in routine clinical practice. Another interesting pattern that we observed is that, of all the measures, only height decreased over time.