Collecting and Processing Data
Introduction
You have seen the kinds of data that will improve the management of a district health service, and that should, therefore, be collected as part of the district management health information system.
How do you collect this data? What is likely to go wrong when collecting data, and how could you avoid the traps that undermine the quality of data?
The study sessions in this unit look at various aspects of making sure that the data is collected and processed appropriately, minimising error in the process.
You will look at various data collection methods and tools and the problems commonly experienced with them, and find ways to organise and manage your information system to minimise problems and ensure collection of only the highest quality data. We will also introduce procedures to check data for errors and correct those errors. Finally, we will go sequentially through the processing of data, to render raw data into useful, meaningful information.
As part of this module, we have included a fairly extensive dataset for practical application of the data processing skills you will learn in the course of this, and the following units. We introduce and outline this dataset in the third study session – Introducing the Case Study. The actual dataset is to be found in the Reader. We strongly urge you to take the time to do the practical exercises; it will be well worth it. Make contact with your lecturers at once, if you feel uncertain of your calculations.
There are four Study Sessions in this Unit:
Study Session 1: Data Collection.
Study Session 2: Ensuring Data Accuracy.
Study Session 3: Introducing the Case Study.
Study Session 4: Analysis – Turning Data into Information.
Learning Outcomes of Unit 2
By the end of Unit 2, you should be able to:
- Relate the Information Cycle to the data collection process.
- Define the process of data collection.
- Outline the different types of data collection tools available and their uses.
- Describe common problems associated with different data collection tools.
- Select the best data collection tools for specific purposes in a routine information system.
- Argue the necessity for quality data.
- Critically gauge the quality of raw data.
- Detect and correct common errors found in routine data.
- Implement strategies that will ensure that errors do not reoccur.
- Explain the difference between collation and analysis.
- Calculate indicators.
- Explain the uses and assumptions underlying proxy indicators.
- Do basic calculations to analyse data.
- Explain terminology and categories of information used in the case study.
- Discuss the main features of one district in the case study.
Unit 2 – Session 1
Data Collection
Introduction
This session aims to alert you to the things that could go wrong in the collection of data, and how to correct or avoid them. We will consider what is involved in data collection.
There are many opportunities for things to go wrong at any point in the data collection process. This is something to guard against, as the more things there are that do go wrong, the poorer the value of the information produced. This could lead to incorrect decisions, with serious consequences. As with health problems, the sooner they can be detected, or even avoided, the better the prognosis. Since data collection is the first executive activity in the operation of a routine information system, one should try to avoid problems occurring. This means that we should carefully consider the logistics of data collection during the design stage of the routine information system, and continuously during its evaluation. Effectively dealing with potential or actual problems could strengthen the successful operation of a routine information system significantly.
We will start by introducing a framework known as the Information Cycle.
Session Contents
1 Learning outcomes of this session
2 Readings
3 The Information Cycle
4 Should all data in the routine information system be collected?
5 Types and uses of data collection tools
6 Common problems associated with data collection tools
7 Data Collation
8 Session summary
Timing of the session
This session contains two tasks and two readings. It could take you up to three hours to complete. A good point at which to take a break would be after section 4.
1 LEARNING OUTCOMES OF THIS SESSION
By the end of this session, you should be able to:
- Relate the Information Cycle to the data collection process.
- Define the process of data collection.
- Outline the different types of data collection tools available and their uses.
- Describe common problems associated with data collection tools.
- Select the best data collection tools for specific purposes in a routine information system.
2 READINGS
The readings for this session are listed below. It may be helpful to read the first one before you start the session. You are expected to read all of the readings provided.
Author/s / Publication details
Lippeveld, T., Sauerborn, R. & Bodart, C. / (2000). Design and Implementation of Health Information Systems. Geneva: WHO: 88 - 113.
Heywood, A & Rohde, J. / (2002). Using Information for Action. A Manual for Health Workers at Facility Level. Pretoria: Equity Project: 35 - 41.
3 THE INFORMATION CYCLE
The Information Cycle should be used to guide the management of the routine health information system in all districts (Heywood and Rohde, 2002). The Information Cycle describes the process from generating to using information in a succinct manner that follows a logical sequence. It can be represented in a diagram as shown in Figure 8. Note the similarities of the process to the Management Planning Cycle – both provide logical and systematic guidance for managers. Both are cyclical, i.e., while showing the step-by-step sequence to follow, they also demonstrate that the process has no true beginning or end, but loops around with each stage joined endlessly to the next.
The Information Cycle poses four questions which, if answered rigorously and systematically, will lead us through the processes involved in ensuring that the routine health information system yields usable information for managers in the management planning cycle.
- What do we collect?
The short answer to this question is: the Minimum/Essential Dataset. In order to refresh your memory of the MDS/EDS, you are advised to consult sections 5 and 6 of Study Session 2 in Unit 1. We learnt that the data contained in the MDS/EDS must consist mainly of essential-to-know information, with only a small amount of valuable-to-know information. Also, we saw that the MDS/EDS must be developed in a manner that will ensure that it is information-led. This means that we need to first determine our priorities in the situation analysis stage of the management planning cycle. For these priorities, we must state goals and specific objectives with indicators, and then determine what data would have to be collected. (See box in section 6 of Unit 1, Session 2 - Criteria to Judge Data Elements for Inclusion in the MDS/EDS).
- What do we do with it?
Once data has been collected, and at the correct level, it has to be processed. Processing starts with collation, which is the process of grouping similar data together and adding it up. This produces data summaries, or aggregated data (also known as group data). The aggregated data is then checked for accuracy. Once these checks have been applied and errors have been corrected, the data is analysed. At this point, data becomes information. The difference is that between discrete, unrelated pieces of data and information organised in meaningful patterns that show a clearer picture of the situation being examined. This means that the data has been placed in a standardised context and comparisons are now possible. All these processes will be explained in detail later.
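To make the idea of collation concrete, here is a minimal sketch in Python. It groups daily tallies by facility and data element, adds them up, and then collates one level further to the district. The facility names, data elements and counts are all invented for illustration and are not taken from the case study dataset.

```python
from collections import defaultdict

# Hypothetical raw tallies: (facility, data element, count for one day).
raw_records = [
    ("Clinic A", "measles_doses_under_1", 4),
    ("Clinic A", "measles_doses_under_1", 6),
    ("Clinic A", "antenatal_first_visits", 3),
    ("Clinic B", "measles_doses_under_1", 5),
    ("Clinic B", "antenatal_first_visits", 2),
    ("Clinic B", "antenatal_first_visits", 7),
]


def collate(records):
    """Group records by (facility, data element) and add the counts."""
    totals = defaultdict(int)
    for facility, element, count in records:
        totals[(facility, element)] += count
    return dict(totals)


# Aggregated data per facility and data element.
facility_totals = collate(raw_records)

# Collating one level further up gives district totals per data element.
district_totals = defaultdict(int)
for (facility, element), count in facility_totals.items():
    district_totals[element] += count
district_totals = dict(district_totals)
```

Note that collation only groups and adds. The totals are still raw numbers, and they only become information once they are analysed.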
Once you have completed the analysis stage, you would have produced indicators. As mentioned in the management planning cycle, the indicators are used to gauge achievement of specific objectives. This is useful to bear in mind when one starts with the next stage of the Information Cycle.
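The sketch below shows one way an indicator could be calculated and compared with a specific objective. It assumes a common form of indicator, a numerator divided by a denominator and multiplied by 100. The facility figures and the 70% target are hypothetical and serve only to illustrate the idea.

```python
def coverage_percent(numerator, denominator):
    """Return numerator as a percentage of denominator, or None if undefined."""
    if denominator <= 0:
        return None
    return 100.0 * numerator / denominator


# Hypothetical aggregated data for two facilities.
facilities = {
    "Clinic A": {"doses_given": 45, "target_population": 60},
    "Clinic B": {"doses_given": 120, "target_population": 200},
}

# Hypothetical specific objective: at least 70% coverage.
TARGET_PERCENT = 70.0

results = {}
for name, data in facilities.items():
    value = coverage_percent(data["doses_given"], data["target_population"])
    results[name] = {
        "coverage": value,
        "objective_met": value is not None and value >= TARGET_PERCENT,
    }
```

In this invented example, Clinic B gives more doses than Clinic A, yet its coverage is lower. This is what placing data in a standardised context makes visible.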
- How do we present it?
It goes without saying that information, once collected and organised, needs to be made available to potential users. There are numerous presentation techniques. These are described in detail in Unit 3, Session 3. The manner of presentation can help in provoking action that will improve service delivery. For example, explicitly showing the performance of indicators in relation to specific objectives determined by the managers during strategic planning focuses attention on the important things. The presentation could either be in written or verbal format, or be a combination of both. However, information must be presented regularly. It is advisable to stick to a specific technique determined initially for specific pieces of information. The following example shows a line graph that is related to a specific objective:
Verbal feedback at district management team meetings is crucial. This is where continuous monitoring and evaluation occurs. As a result, presentation is a joint effort between the information unit of a district office and the relevant managers. Managers should specify their information requirements and presentation formats to the information unit. The information unit, in turn, should ensure that the information is timeous and of acceptable quality. Of course, interpretation of the reports is entirely the responsibility of managers.
- How do we use it?
Managers who receive the information from the information unit in their district must accept the responsibility of interpreting the information and devising appropriate action to deal with problems identified by the information. Managers should then use the information to adapt their operational plans appropriately, according to the progress depicted by the information. This implies a dynamic aspect to operational plans, in that they need to be responsive to information from, and about, the operational environment. The relationship between operational plans, indicator values and specific objectives is as follows:
- The strategic plan states what we want to achieve, in numerical terms (specific objectives).
- Presented information is related to the specific objectives and states whether we have achieved, or will achieve, them.
- Action- or operational plans are adapted and reviewed to ensure that planned activities are geared towards the achievement of the specific objectives and presented to the district management team as a mechanism of ensuring accountability.
True use of information is the overall purpose of any routine health information system. It has been said that the effectiveness of an information system is gauged by the actions that it provokes. The ability to demonstrate improvement in indicator values over time is the only proof of information use.
4 OTHER SOURCES OF DATA
Should the routine information system be the only source of data? The short answer to this question is No. In certain situations, we can also use information gathered by people outside the system (usually as collated data), e.g. census data. It often happens that other sectors or organisations have the same information needs as the district health services.
When such a situation exists, one should make sure that the data collected by the other parties outside the district health services:
- Is accurate enough for our purposes
It is better to have no data than to have inaccurate data. Inaccurate data could be very costly, as it can lead to incorrect management decisions.
- Is in a useful format for our needs
Data is often used together with other pieces of data, for comparison, during analysis. If the format provided by the outside source does not allow us to do the necessary analysis, then we may have to collect the data ourselves so that it is in an acceptable format. Consider the following scenario: The department/ministry of education uses district boundaries that are different to those used by the department/ministry of health. The department of education collects useful data on the dental needs of scholars, but aggregates the data to ‘their’ district level. This data is made available as a total number of scholars requiring dental services in the district. It would then be difficult for the district health services to use the information if, say, a single district of the department/ministry of education encapsulates three districts of the department/ministry of health. Unless the data can be obtained per school, it would not be possible to determine the needs in a particular health district (with high-risk areas identified). For this reason, data gathered by outside agencies must be in a useful format for the district health services.
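The scenario above can be sketched in Python as follows. The school names, counts and the lookup from school to health district are all hypothetical. The point is only that per-school data can be regrouped to health district boundaries, whereas a single education district total cannot.

```python
from collections import Counter

# Hypothetical per-school counts of scholars needing dental services,
# all within one education district.
scholars_needing_dental_care = {
    "School 1": 40,
    "School 2": 15,
    "School 3": 60,
    "School 4": 25,
}

# Hypothetical lookup from each school to the health district it falls in.
health_district_of_school = {
    "School 1": "Health District X",
    "School 2": "Health District Y",
    "School 3": "Health District X",
    "School 4": "Health District Z",
}

# The education district total is a single number that cannot be split
# across health districts after the fact.
education_district_total = sum(scholars_needing_dental_care.values())

# With per-school data, the counts can be regrouped by health district.
needs_by_health_district = Counter()
for school, count in scholars_needing_dental_care.items():
    needs_by_health_district[health_district_of_school[school]] += count
```

If only the education district total were supplied, there would be no way to recover the three health district figures from it.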
- Is available to us
If an outside agency is not prepared to share the data they possess, then it is effectively useless to the district health services. In certain instances, data is also sold by outside agencies. If the district health services do not find the price to be cost-effective, then the data should rather be collected by the district health services themselves.
- Is broken down to the lowest levels required by the DHS
This is crucial for the identification of high-risk areas where intervention may have to be prioritised. For example, if we get figures for the entire district, we would not be in a position to detect areas where special care would have to be taken to meet the needs of the communities we serve.
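As a small illustration of why district-wide figures can hide high-risk areas, consider the sketch below. The area names, counts and the 80% cut-off are hypothetical assumptions chosen for the example.

```python
# Hypothetical figures per area: (children immunised, children under 1).
areas = {
    "Area North": (190, 200),
    "Area South": (180, 200),
    "Area East": (50, 100),
}

THRESHOLD_PERCENT = 80.0  # hypothetical cut-off for "needs attention"

# Coverage for the district as a whole.
total_immunised = sum(done for done, _ in areas.values())
total_children = sum(children for _, children in areas.values())
district_coverage = 100.0 * total_immunised / total_children

# Coverage per area, keeping only the areas below the cut-off.
high_risk_areas = [
    name
    for name, (done, children) in areas.items()
    if 100.0 * done / children < THRESHOLD_PERCENT
]
```

Here the district as a whole appears to be above the cut-off, while one area falls well below it. That area would go unnoticed if only the district figure were available.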
- Is timely and would be available at timeframes required by the DHS
One of the most difficult problems faced by district health services is that, when information is collected by other agencies, the data may not be available for long periods, so that by the time it reaches us, it is stale and does not allow any useful prospective planning or projections by the district health services. This limits the value of the data to the district health services. It is also important to ensure that the data will be continuously available for the purposes specified in the strategic plans of the district.
- Is valid and collected in a standardised manner
Different data collectors could have different interpretations of what constitutes accurate data. If the data elements are not clearly defined and explained to all data collectors involved, the data might then be collected in a manner that is not standardised. Standardisation of the criteria used in defining data elements is important to ensure that the collected data is valid.
Gathering data from other sources is an advisable strategy to keep the amount of data to be collected by health care providers to a minimum. If the load on the health care providers (usually at health facility level) is too great, the quality of the data will suffer, which renders it less useful. However, we must be mindful of the acceptability of data gathered from sources where the collection is outside our control. Therefore, it is useful to investigate the aspects described in this section before a decision is taken to gather data from other sources.
5 TYPES AND USES OF DATA COLLECTION TOOLS
Before you start with this section, you are advised to consult section 7 of Session 2 in Unit 1, to obtain a listing of all the data collection tools available to us. You should also study the following reading and perform the task provided below.
READING
Heywood, A. & Rohde, J. (2002). Using Information for Action. A Manual For Health Workers at Facility Level. Pretoria: Equity Project: 34 - 41.
TASK 1 - IDENTIFY POTENTIAL PROBLEMS WITH SPECIFIC DATA COLLECTION TOOLS
As you read the text written by Heywood and Rohde (2002), try to draw out the potential problems related to each of the data collection tools they describe.
FEEDBACK
Some of the problems that are often found with the use of the different tools described in the text are as follows:
- Births and Deaths
Undercounting is frequently noted because the parents of newborns are required to return the forms to the Department of Home Affairs. This requires time to go to busy offices and may involve the spending of money on taxi fees. As a result, undercounting is almost certain to occur, because returning the form is too inconvenient for the parents.
Death forms are usually completed with very vague diagnoses. This frequently causes the largest group of deaths to be classified as unspecified/unknown. Having this category as the leading cause of death makes intervention very difficult, since you cannot know how to intervene. Remember, one needs to know what to change before change can be planned and introduced. Also note that a relative of the deceased must take the form to the Department of Home Affairs, which leads to the same problems as described for the notification of births, i.e. undercounting due to time and cost implications.
The analysis of each perinatal death could be construed as a ‘witch-hunt’ and motivate staff to intentionally provide inaccurate data. This is problematic, as information in general could come to be seen as a stick with which staff are beaten. The motivational potential of information could be entirely lost, making it difficult to inculcate information use among lower categories of staff.
- Patient Record Cards
Patient record cards that are kept at health facilities are usually poorly filed and often get lost. Also, poor filing systems make the retrieval of patient cards time-consuming and stifle the flow of patients in busy health facilities. As a result, new cards are often opened for patients who already have a card, because the existing card could not be found, or could not be found quickly enough. This does not lead to double counting, but defeats the purpose of the patient record card, which is to maintain a history for the patient. Similar problems could also arise when patients use different health facilities during the course of their lives.
A further problem with these cards is incomplete or illegible entries, which make the data contained in the cards useless, even for medico-legal purposes. It is also worth mentioning that fewer than five medico-legal cases required patient cards in South Africa over the past ten years. This is in the context of a public health service that has dealt with far more than 800 million patient visits over the past ten years!