Using Randomised Control Trials to Evaluate Public Policy

Workshop Summary

Introduction

On Thursday 31 January 2013, the Department of Industry, Innovation, Science, Research and Tertiary Education (DIISRTE) and the Department of Education, Employment and Workplace Relations (DEEWR) hosted a joint workshop (the workshop) on the use of randomised control trials (RCTs) in the evaluation of public policy.

Expert facilitation was provided by two leading academics with extensive experience in RCT policy evaluation: Professor Jeff Borland of the University of Melbourne; and Associate Professor Bruno Crépon of ENSAE-CREST, Paris.

This paper provides a summary of the key themes and outcomes of the workshop.

The summary is divided into two sections, reflecting the structure of the workshop program (Attachment A). The first section considers discussions arising from session one of the workshop, which introduced the case for RCTs in policy evaluation; the second section outlines outcomes from session two, which considered the practicalities of undertaking an RCT evaluation.

Section 1 - Why randomised control trials?

The first session of the workshop began with presentations (Attachments B and C) from Professor Jeff Borland and Associate Professor Bruno Crépon. The presentations outlined the principles underlying RCTs and described examples of RCT policy evaluations to generate and guide discussion. The following deliberations focused on a number of key themes, including:

The wider context of policy evaluation

The workshop facilitators opened the discussions by distinguishing between two related concepts of policy evaluation:

  • impact evaluation – which attempts to provide a measure of how outcomes are changed by participation in a program or initiative; and
  • cost-benefit evaluation – which is intended to provide an overall measure of the net benefit to society from a program or policy.

Impact evaluation is by its nature a narrower exercise than cost-benefit evaluation and forms only part of the information required to effectively measure the net benefit of a policy intervention. Impact evaluation does not typically attempt to value benefits, nor does it incorporate information on program costs; both are necessary elements of a cost-benefit analysis.
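As a stylised illustration (not drawn from the workshop papers; the notation is assumed for exposition), an impact estimate only becomes a cost-benefit result once the measured change in outcomes is valued and program costs are included:

    \text{Net benefit} = v \cdot \Delta Y - C

where \Delta Y is the estimated change in outcomes attributable to the program (the impact estimate), v is the dollar value assigned to a one-unit change in that outcome, and C is the total cost of delivering the program.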

RCTs provide an alternative to traditional techniques of impact evaluation.

Counterfactuals - the evaluation problem

After discussing the wider context of policy evaluation the workshop considered the fundamental challenge that motivates the use of RCTs: the evaluation problem.

Evaluating the impact of a policy initiative ideally involves observing and comparing outcomes for the same individual as both program participant and non-participant. The evaluation problem arises because it is only ever possible to observe one of these two possible outcomes.

RCTs address this problem by constructing a credible estimate of the counterfactual, which enables the impact of the intervention to be evaluated.
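In the potential-outcomes notation commonly used in the evaluation literature (this notation is illustrative and not taken from the workshop papers), the problem can be stated as follows: let Y_i(1) denote individual i's outcome as a program participant and Y_i(0) the outcome as a non-participant. The impact of interest is

    \tau_i = Y_i(1) - Y_i(0)

but only one of Y_i(1) or Y_i(0) is ever observed for any individual, so \tau_i cannot be computed directly; the unobserved outcome is the counterfactual that any evaluation must estimate.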

Impact evaluation using RCT

The RCT approach to impact evaluation involves the random assignment of a population between program participation and non-participation. The random assignment ensures that participation in the intervention is independent of other factors that may influence observed outcomes. For a sufficiently large sample, this approach addresses the evaluation problem by generating groups of participants and non-participants that, on average, share the same characteristics.

A comparison of the average outcome for participants with the average outcome for non-participants therefore provides a valid estimate of the impact of participation in the program or policy initiative.
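As a minimal sketch of this logic (hypothetical code, not part of the workshop materials; the variable names and simulated data are assumptions), the following Python snippet randomly assigns a sample between participation and non-participation and estimates the program impact as the difference in mean outcomes:

    import numpy as np

    rng = np.random.default_rng(42)

    # Simulate a population with a baseline outcome and a true program effect of 2.0
    n = 10_000
    baseline = rng.normal(loc=50.0, scale=10.0, size=n)
    true_effect = 2.0

    # Random assignment: each individual has a 50 per cent chance of participating
    participant = rng.random(n) < 0.5

    # Observed outcomes: participants receive the program effect, plus noise
    outcomes = baseline + true_effect * participant + rng.normal(scale=5.0, size=n)

    # Because assignment is random, the difference in mean outcomes is a valid
    # estimate of the average impact of participation
    impact_estimate = outcomes[participant].mean() - outcomes[~participant].mean()
    print(f"Estimated impact: {impact_estimate:.2f} (true effect: {true_effect})")

With a sufficiently large sample the estimate converges on the true effect because, on average, participants and non-participants share the same characteristics.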

A more detailed description of the RCT evaluation process is provided by Haynes et al. (2012) (Attachment D).

Alternatives to RCT

Before discussing the comparative advantages of RCTs, the facilitators provided a brief description of a number of alternative quasi-experimental evaluation techniques. These included:

  • regression discontinuity – an approach that estimates the impact of a policy by comparing outcomes for program participants and non-participants who are respectively ‘just above’ and ‘just below’ the threshold that defines eligibility for participation (a simple illustrative sketch appears below);
  • matching methods – which use data on outcomes of non-participants in the period after program commencement to estimate non-participation outcomes for the group of participants. Comparisons between the two outcomes are conditioned on observable characteristics that affect both the outcome and whether individuals are assigned to the program; and
  • natural experiments – which involve evaluating outcomes in cases where circumstances beyond the researcher’s control generate an effectively random allocation between treatment and non-treatment.

It was noted that while quasi-experimental approaches provide a valid evaluation approach in certain circumstances, they typically require stronger assumptions about the role of unobserved characteristics.

NB: the above descriptions are taken from Borland et al. (2005) (Attachment E), which provides an excellent treatment of experimental and quasi-experimental approaches to policy evaluation.
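To make the regression discontinuity idea described above concrete, the following hypothetical Python sketch (all names, thresholds and data are illustrative assumptions, not drawn from the workshop papers) compares mean outcomes for individuals just below and just above an eligibility threshold:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical eligibility score (e.g. a means-test score) with a cut-off for participation
    n = 50_000
    score = rng.uniform(0, 100, size=n)
    cutoff = 50.0
    participant = score >= cutoff

    # Outcomes rise smoothly with the score, with a jump of 3.0 at the cut-off for participants
    outcomes = 0.2 * score + 3.0 * participant + rng.normal(scale=2.0, size=n)

    # Simplest sharp regression discontinuity: compare mean outcomes in a narrow band
    # either side of the cut-off, where individuals are assumed to be comparable
    bandwidth = 1.0
    just_below = (score >= cutoff - bandwidth) & (score < cutoff)
    just_above = (score >= cutoff) & (score < cutoff + bandwidth)
    rd_estimate = outcomes[just_above].mean() - outcomes[just_below].mean()
    print(f"Estimated impact near the cut-off: {rd_estimate:.2f} (true jump: 3.0)")

In practice, regression discontinuity analyses also model the underlying trend in the score rather than relying on a simple comparison of means, but the bandwidth comparison captures the core idea.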

The comparative advantages of RCT

Session one of the workshop ended with a discussion on the advantages of RCTs relative to other evaluation techniques. The facilitators outlined a number of key benefits, including:

  • A clean and tractable procedure for addressing the issue of unobserved characteristics – unlike quasi-experimental approaches, RCTs require fewer assumptions to account for unobserved heterogeneity.
  • A targeted approach – RCTs allow the intervention to be evaluated exactly as it is intended to be implemented. In contrast, quasi-experimental evaluation techniques typically involve retrospective analyses of broadly similar interventions.
  • Flexibility – the prospective nature of RCTs provides a flexible approach to evaluation that allows researchers to:

  - consider different aspects of the proposed policy;
  - explicitly test the hypothesised causal mechanisms;
  - evaluate possible indirect consequences; and
  - integrate evaluation into the policy development cycle at an early stage.

This flexibility was considered to be an important advantage over retrospectively applied evaluation techniques, which are often constrained by the data available for analysis.

Section 2 - Undertaking randomised control trials

The second session of the workshop opened with a discussion on the principles that support best practice application of RCTs. The facilitators noted the preparatory workshop papers that addressed these issues (Attachments D to F) and endorsed the views presented in these papers. This led to further discussions on a number of practical issues that should be considered when conducting RCTs; these included:

The primacy of randomisation

Random assignment is the essential feature that enables RCT evaluation to deliver valid estimates of the impact of a policy or intervention. Features of implementation and administration that compromise the randomness of the allocation between treatment and non-treatment are likely to produce unreliable results. Given this, the facilitators emphasised the need to adopt a coordinated approach to design and implementation that preserved the integrity of the random assignment.

The facilitators also stressed the importance of ensuring that all stakeholders involved in the trial understood the critical role of randomisation in the evaluation process, particularly those engaged in delivering the program or initiative.

Research committees and functional structure

The facilitators also indicated that appropriate functional structures were critical to successful RCT evaluation, and advocated the use of a Research Committee to achieve these arrangements.

Ideally, the Research Committee provides a regular forum that enables the different groups involved in the evaluation to identify threats and methodological issues particular to the trial. This is likely to be particularly useful when evaluating policies with a cross disciplinary dimension, or in instances where an emerging issue requires the joint consideration of the designers, implementers, and analysts.

In discussing the importance of functional structures, the facilitators also suggested that successful RCTs typically require the services of a dedicated expert researcher, usually employed on a full-time basis.

Pilot phases

Pilot phases were also identified as a valuable tool in developing RCT evaluations. Pilot phases provide a number of important benefits, including the opportunity to:

  • identify factors particular to the policy or trial that may undermine the credibility of the evaluation;
  • consider unintended consequences of the policy or trial;
  • educate stakeholders on the need to preserve the integrity of the random assignments; and
  • test survey instruments.

More generally, pilot phases were considered to provide a low risk environment that allowed issues to be identified early and resolved cheaply.

Cultural challenges and the need for RCT champions

Cultural attitudes towards evaluation were also identified as an important factor in determining the success of an RCT evaluation. The facilitators suggested that employing RCT evaluation may require a cultural shift that provides the department with a greater licence to undertake prospective research.

It was noted that small pilots are often a successful way of introducing new policy ideas to the Minister and that a number of carefully selected evaluations of small interventions may provide an effective means of building confidence in the RCT approach.

The workshop also considered the importance of working with partner organisations involved in the delivery of the initiative to build support for the evaluation. The facilitators suggested the need for RCT champions: staff within these organisations who understand the value of RCTs and have the authority to influence how they are implemented.

Costs and Timeframes

Given the current political and fiscal environments, many workshop members were keen to establish the costs and timeframes associated with RCT evaluation. The facilitators noted that timeframes and costs typically depend on the nature of the policy being evaluated.

As a guideline, the facilitators suggested that a longer, more comprehensive RCT evaluation may extend over a period of three years at a cost of around $0.5 million.

However, the facilitators were also keen to stress that many evaluations require fewer resources and less time to complete. In addition, it was suggested that longer evaluations tend to produce intermediate results that often provide valuable feedback to the Minister.

Ethical considerations

The workshop noted potential objections to the use of RCTs on ethical grounds but reached a consensus that the ethical dimensions of RCTs are often overstated.

The facilitators indicated that there were a number of creative alternatives to absolute denial of participation, including randomised allocation to different stages of a staggered policy implementation, or randomised allocation to groups of differing intervention intensity.
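As a simple illustration of the staggered implementation option noted above (a hypothetical sketch; the sites, wave structure and timings are assumptions, not drawn from the workshop), eligible units can be randomly allocated to implementation waves so that every unit eventually receives the program, with later waves serving as the comparison group for earlier ones:

    import random

    random.seed(2013)

    # Hypothetical eligible sites; all will eventually receive the program
    sites = [f"site_{i:02d}" for i in range(1, 13)]

    # Random allocation to three implementation waves
    random.shuffle(sites)
    waves = {
        "wave 1 (month 0)": sites[0:4],
        "wave 2 (month 6)": sites[4:8],
        "wave 3 (month 12)": sites[8:12],
    }

    for wave, allocated in waves.items():
        print(wave, allocated)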

Discussions also noted that in cases where applications to a program are oversubscribed, random assignment provides an equitable means of allocating the provision of service.

Next steps

The second session closed with a discussion on the most appropriate next courses of action.

It was agreed that with the benefit of the day’s discussions members were now in a position to return to their respective departments and divisions to identify small scale interventions amenable to RCT evaluation. It was also agreed that this process should consider the appropriateness of such evaluations in the context of the electoral cycle.

List of Attachments

Attachment A: Workshop Agenda, Using Randomised Control Trials to Evaluate Public Policy

Attachment B: Workshop presentation by Professor Jeff Borland, University of Melbourne

Attachment C: Workshop presentation by Associate Professor Bruno Crépon, ENSAE-CREST, Paris

Attachment D: Haynes, L. et al. 2012, Test, Learn, Adapt: Developing Public Policy with Randomised Controlled Trials, UK Cabinet Office, London.

Attachment E: Borland, J. 2005, Experimental and Quasi-Experimental Methods of Microeconomic Program and Policy Evaluation, Working Paper No. 8/05, Melbourne Institute Working Paper Series, University of Melbourne, Melbourne.

Attachment F: List, J. 2011, ‘Why Economists Should Conduct Field Experiments and 14 Tips for Pulling One Off’, Journal of Economic Perspectives, Vol. 25, No. 3, pp. 3-16.