Designing a Programme Evaluation

Introduction

Welcome to Unit 3 of Monitoring and Evaluation in Health and Development Programmes. In this unit, we consider various evaluation study designs. You will receive guidance on a number of the practical processes in designing and implementing your evaluation. This includes, among others, deciding whether to use quantitative or qualitative evaluation, and considering the pros and cons of participatory evaluation. Much of this is not new to you, as you have covered these processes in Measuring Health and Disease II, and possibly in other modules too. In this instance, however, you are required to apply all these skills in the context of evaluation.

This unit offers greater flexibility in the order in which you work through the concepts. You could study the sessions in the order presented – this would give you an overview of quantitative evaluation design (as outlined in Session 1), followed by the sampling procedures for that type of design, and only then an overview of qualitative designs and their sampling processes (contained in Session 3). Alternatively, you might wish to gain a quick overview of quantitative designs (Session 1) and of qualitative designs (Session 3) in succession, in order to compare them, and only then turn to Session 2 for a detailed exploration of sampling procedures in quantitative designs. It’s your choice.

There are four study sessions in this unit:

Study Session 1: Designing a Quantitative Evaluation

Study Session 2: Sampling Procedures in Quantitative Designs

Study Session 3: Designing a Qualitative Evaluation

Study Session 4: Conducting Participatory Evaluation

Learning outcomes of Unit 3
By the end of this unit, you should be able to:
  • Consider the differences between quantitative and qualitative approaches.
  • Decide on a suitable approach to evaluation in your programme context.
  • Discuss the value and disadvantages of the participatory approach to evaluation.
  • Develop an appropriate evaluation design.
  • Discuss factors that may threaten the validity of your evaluation.
  • Choose an appropriate sampling procedure.

Unit 3 - Session 1

Designing a Quantitative Evaluation

Introduction

This session clarifies what we mean by design, and introduces selected evaluation designs for undertaking a quantitative evaluation of a programme. In addition, we consider the issue of validity in relation to these designs.

When we speak of design in the evaluation context, we mean the process of specifying what or who will be studied, how the units of study will be selected, the timing of the study, and so on.

Contents

1 Learning outcomes of this session

2 Readings

3 Evaluation designs and issues of validity

4 Experimental designs

5 Non-experimental designs

6 Quasi-experimental designs

7 Session summary

Timing

This session contains three readings and three tasks. It should take you about two and a half hours to complete, depending on your familiarity with the topic.

1 LEARNING OUTCOMES OF THIS SESSION

By the end of this session you should be able to:
  • Develop an appropriate evaluation design.
  • Discuss factors that may threaten the validity of an evaluation.

2 READINGS

There are three readings for this session.

Author/s / Publication details
Mwadime, R. et al. / (1999). Monitoring and Evaluation of Nutrition and Nutrition-Related Programmes. A Training Manual for Programme Managers and Implementers. The Applied Nutrition Programme, University of Nairobi; School of Nutrition and Policy, Tufts University: 1.23-1.24.
Weiss, C. H. / (1998). Ch 8 - Design of the Evaluation. In Evaluation. New Jersey: Prentice Hall: 180-188.
Key, M., Hudson, P. & Armstrong, J. / (1976). Evaluation Theory and Community Work. London: CPF Papers on Community Work and Youth Work: 19-27.

3 EVALUATION DESIGNS AND ISSUES OF VALIDITY

We have spoken of many aspects of planning programme evaluations, and you have, in fact, already made some of the design decisions for your evaluation.

On pages 1.23 – 1.24, Mwadime et al (1999) outline the steps within the evaluation process. While reading, locate the selection of an evaluation design in relation to other components of the programme evaluation process.

READING

Mwadime, R. et al. (1999). Monitoring and Evaluation of Nutrition and Nutrition-Related Programmes. A Training Manual for Programme Managers and Implementers. The Applied Nutrition Programme, University of Nairobi; School of Nutrition and Policy, Tufts University: 1.23-1.24.

What is design?

Now, what precisely do we mean by design in the context of evaluation? An evaluation design is “… the plan or structure which an evaluator develops to guide the study. … [It] specifies which groups to study, how many units in a group, by what means units are selected, at what intervals they are studied, and the kinds of comparisons that are planned.” (Weiss, 1998: 330)

Selecting a design

Identifying appropriate goals and objectives is the start of the design process.

The programme manager has the responsibility of choosing a study design that will help to answer questions about the effectiveness of the programme. The design specifies which people or units will be studied, how they will be selected, the kinds of comparison which should be made and the timing of the investigation.

When you evaluate a programme, you are trying to answer a particular question, e.g. in an intervention for primary prevention of cardiovascular disease, did a group of community health workers change their practices with regard to eating and taking exercise? Answering such a question involves measuring a change. You therefore need to select an evaluation design that enables the evaluator to determine whether the programme caused the change in the targeted beneficiaries.

Issues of validity

Read the chapter by Weiss (1998) up to page 188, making sure that you fully understand the concepts of validity and unit of analysis in this context.

READING

Weiss, C. H. (1998). Ch 8 - Design of the Evaluation. In Evaluation. New Jersey: Prentice Hall: 180-188.

When you select an evaluation design, the choice is based on the validity of the design, i.e. its capacity to reflect the way things actually are. As Weiss (1998) points out, there are two levels of validity:

Internal validity: This refers to the causal link between the independent variables (which, for example, describe the participants or the features of the service they receive) and the outcome. Internal validity indicates whether the observed relationship between the programme inputs and the observed outcome is causal.

External validity, or generalisability, is concerned with whether the findings of the evaluation can be generalised to other programmes of a similar type. For example, if the programme is successful in reducing case fatality rates, can we say the same would apply to other programmes of a similar type, and how far can the findings be generalised? If the findings can be generalised, the evaluation results have external validity.

Research designs differ in the degree to which they achieve internal and external validity; the evaluator should therefore consider factors that may affect or distort the results of the evaluation. These are referred to as threats to validity.

3.1 Threats to Validity

When conducting an evaluation, it is important to be aware of potential threats to the validity of the findings in relation to the design. There are a number of factors which should be borne in mind in this regard.

These factors are:

History: This refers to a change or event outside the programme that occurs during programme implementation and produces an effect that influences the evaluation results.

Selection bias: This arises when the selected units of study differ systematically, for example when the control group is selected quite differently from the experimental group, such as comparing groups drawn from clinic and outreach facilities, or comparing the responses of Christian and Muslim participants on an issue where cultural factors influence their perceptions.

Testing effect: This occurs when the same pre-test is given as a post-test; participants may remember or learn from the earlier test, which can affect the evaluation results.

Instrumentation bias: This occurs when there is a change in the way data is collected, for example in the way questions are asked or in the way instruments are used. Using different field workers who are not properly trained to collect evaluation data, for instance, may produce different results, as one interviewer might probe while another might not.

Maturation: This is an effect of the passing of time. People mature and change over the course of a programme for reasons unrelated to the programme, which results in changed responses (where comparisons are made between findings from evaluations conducted on the same issue at different times). For example, people may decide to make healthy choices about the food they eat, not because of their exposure to the programme but because they feel that they are now ready to make life changes.

Mortality / attrition: People drop out of a programme, others die, and some move away for reasons unrelated to the programme. The evaluation results may be affected if data, in an ongoing evaluation, is collected from the remaining participants only.

Regression artefacts: These are changes in measurements that arise because the groups were selected for the experiment on the basis of their extreme scores on other tests; on re-measurement, extreme scores tend to move (“regress”) towards the average, and this shift can be mistaken for a programme effect. For example, most of the CHWs who participated in the programme on primary prevention of CVD were found to be overweight; this was because the community members who selected them value women who are overweight.

The following reading extends and adds to this discussion on evaluation designs. Read up to page 25 at this stage.

READING

Key, M., Hudson, P. & Armstrong, J. (1976). Evaluation Theory and Community Work. London: CPF Papers on Community Work and Youth Work: 19-27.

In the next section, we will discuss the first of three kinds of designs: experimental designs, non-experimental designs and quasi-experimental designs.

4 EXPERIMENTAL DESIGNS

The most widely used design for quantitative evaluations is the true experimental design. The basis of such designs is that “[i]ndividuals (or precincts, work teams, or other units) are randomly assigned to experimental and control groups” (Weiss, 1998: 215).

4.1 Pre-test/Post-test Control Group Design or True Experimental Design

Random assignment (RA) ensures that the two study groups are, on average, equal on the main baseline variables before the start of the intervention. Thus, any difference observed between the measurements (Q2 and Q4 in the design below) is essentially due to X (the intervention or test variable). This is one of the strongest designs, as it controls for, or seeks to avoid, threats to internal validity.

Use this code to interpret the experimental design diagrams:

Q1 = measurement/test before the intervention in the experimental/intervention group

Q2 = measurement/test after the implementation of the programme in the experimental/intervention group

Q3 = measurement/test before the programme in the control group (no intervention)

Q4 = measurement/test after the programme period in the control group (the intervention was applied only to the experimental group)

X = intervention

The line indicates time

RA = random assignment
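
Using this key, the pre-test/post-test control group design can be set out as follows, in the same layout as the design diagrams later in this session:

Time

RA / Experimental group / Q1 / X / Q2

RA / Control group / Q3 / Q4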

Randomisation, or random assignment to experimental and control groups, is sometimes impossible from a practical standpoint; ethical or political issues might also arise, particularly in the field of health, e.g. denying one group a preventive programme while providing it to another raises problems. What usually happens instead is that the intervention or experimental group is exposed to the treatment being studied, e.g. training on primary prevention of CVD, while the control group is provided with another type of intervention, e.g. an education campaign to increase awareness about the causes and prevention of HIV/AIDS. In this way, the control group also benefits.

EXAMPLE 1: TRUE EXPERIMENTAL DESIGN

For the purpose of evaluating an intervention programme on the primary prevention of CVD (introduced in the case study in Unit 1), two groups (community health workers from Sites B and C) were established and randomly assigned to either the intervention group or the control group. Baseline data was gathered: the members of the two groups were measured as being, on average, equivalent in weight, height, knowledge and attitudes. Their Body Mass Index (BMI) was calculated (Q1 and Q3), and attitudes were checked through questions.

The intervention group (measured at Q1 and Q2) received training and the control group (measured at Q3 and Q4) did not. A year after completion of the intervention, both groups were measured again (Q2 and Q4). The group that received training had improved knowledge and attitudes about risk factors for CVD and a reduction in mean BMI compared to the group that did not receive training. It could therefore be concluded that the training was responsible for the changes in knowledge, attitudes and mean BMI in the intervention group of community health workers.
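
To make the comparison concrete, here is a minimal sketch in Python (the BMI figures are invented for illustration only, not data from the case study) of how the four measurements would typically be compared: the mean change within each group is calculated, and the difference between those two changes is the effect attributed to the intervention X.

    # Hypothetical illustration of comparing Q1-Q4 in a pre-test/post-test
    # control group design. All BMI figures are invented for demonstration.
    # BMI = weight (kg) divided by height (m) squared, measured for each CHW.

    def mean(values):
        return sum(values) / len(values)

    q1 = [31.2, 29.8, 33.5, 30.1]  # intervention group, before training
    q2 = [29.0, 28.1, 31.0, 28.9]  # intervention group, one year after training
    q3 = [30.8, 32.0, 29.5, 31.4]  # control group, at baseline
    q4 = [30.9, 31.7, 29.8, 31.2]  # control group, one year later

    change_intervention = mean(q2) - mean(q1)  # change due to X plus other influences
    change_control = mean(q4) - mean(q3)       # change due to other influences only

    # The effect attributed to the intervention X is the difference
    # between the two changes.
    effect = change_intervention - change_control
    print(f"Mean BMI change, intervention group: {change_intervention:.2f}")
    print(f"Mean BMI change, control group:      {change_control:.2f}")
    print(f"Change attributed to training (X):   {effect:.2f}")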

Having read about this design, consider whether there are any threats to validity by doing Task 1.

TASK 1 – CONSIDER THE VALIDITY OF AN EXPERIMENTAL DESIGN

Before you come to a conclusion about the effectiveness of the training in reducing body weight, can you identify any other factors that may have led to the reduction in the mean BMI in the intervention group?

FEEDBACK

These factors may include:

  • Maturation. As the CHWs matured, they may have become aware of the health consequences of being overweight and taken action to reduce their weight, independently of the training.
  • Testing effects. The CHWs may have remembered the questions that were asked in the pre-test, and this could be mistaken for a change in knowledge and attitudes.
  • Instrumentation. The reduction in BMIs may be due to the use of a different scale, or to differences in the techniques used by the researchers (one balanced the scale carefully while another did not).
  • Regression artefacts. Because this group of CHWs was selected on the basis of extreme BMI measurements (they were all overweight), their follow-up measurements would tend to move towards the average even without the training, and this shift could be mistaken for an effect of the programme.
  • Mortality/attrition. Those who dropped out of the programme might not have changed their behaviour towards adopting a healthy lifestyle, so the reduction in BMI observed among the remaining participants may not reflect the group as a whole.

Although this design is said to be the most reliable in terms of validity, you can see that these factors would still need to be taken into account when considering the validity of the results. A variant of the experimental design is presented in the next section.

4.2 Post-test-only Control Group Design

Since individuals are randomly assigned to the intervention and control groups, the groups are assumed to be similar before the programme intervention. This design allows the experimenter to measure the effect of the intervention on the intervention group by comparing it with the control group.

This, however, does not allow the researcher to measure the extent or magnitude of the change, since he or she does not have the baseline pre-test measurement.

EXAMPLE 2: POST-TEST-ONLY CONTROL GROUP DESIGN

An intervention to reduce the prevalence of smoking among youths is introduced in one school. At the end of the intervention, an evaluation is done in the experimental school and in another school where no intervention was introduced. The control school is assumed to be similar in smoking prevalence to the experimental school, but no baseline study is conducted before the evaluation.

5 NON-EXPERIMENTAL DESIGNS

The second type of design that can be used for quantitative evaluation is a non-experimental design (NED).

NEDs are used by many researchers, and are most appropriate for descriptive studies or small case studies of a particular situation. They are useful for diagnostic studies that try to determine why a problem (or success) exists. We will discuss three versions.

5.1 Post-test-only Design

Time

Experimental group / X / Q1

An intervention X has already taken place for a certain duration, after which measurement Q1 is made. Since a control group is not available, or a pre-test measurement was not made, there is no possibility of comparison. All that measurement Q1 can do is provide descriptive information. The threats to validity of history, maturation, selection and mortality are not controlled for, and this should be considered when examining the findings. Multivariate data analysis techniques can be used if comparative analysis is desired; these are statistical techniques in which many variables are examined together and ranked, using statistical software, in terms of the significance of their contribution to the outcome. You will learn about this later should you study advanced statistics.

EXAMPLE 3: POST-TEST-ONLY DESIGN (AN EXAMPLE OF A NED)

Assume that you are a new programme manager working in a health facility offering maternal and child health services. You want to assess the skills of health workers in growth monitoring and promotion activities. Since the skills of the community health workers were not assessed before their training (i.e. no baseline survey was conducted), you can collect data in order to describe their current activities, but you will not be able to say whether the training was effective in improving their skills or in initiating a change in their competences.

5.2 Pre-test/Post-test Design

In this instance, no control group is established, but available pre-test measurements can be used to make a comparison and examine changes over time.

Time

Experimental group / Q1 / X / Q2

EXAMPLE 4: EVALUATION OF THE PROGRAMME TO IMPROVE THE HOSPITAL MANAGEMENT OF MALNOURISHED CHILDREN BY PARTICIPATORY ACTION RESEARCH