Changing bureaucrats and politicians: the transformative potential of an experimental public administration*

Peter John

Department of Political Science

University College London

The Rubin Building,
29/31 Tavistock Square,
London, UK
WC1H 9QU.

Tel: +44 (0)20 7679 4999
Fax: +44 (0)20 7679 4969

*Versions of this paper have been presented at the University of Southampton, C2G2 seminar, 30 October 2013; to the departmental seminar, University of Exeter, 4 December 2013; to the UK Political Studies Association Annual Conference, Manchester, 14-16 April 2014; and to the workshop “Causal effects in political science: The promise and pitfalls of experimental methods” held at the Nordic Political Science Association Conference in Gothenburg, 12-15 August 2014. I thank participants at these events for their comments and reactions.

Abstract

Randomised controlled trials have a number of advantages for policy-makers in testing out interventions and improving standard administrative procedures. The use of a more robust form of evidence can challenge old ways of doing business and can lead to the redesign of existing administrative systems. Moreover, the greater use of experiments can set off chain reactions within bureaucracies, encouraging innovation to become a more common practice, whereby bureaucrats and their political principals start to learn through testing and adapting what they do, creating a more experimental public administration. Policy experiments have been used more frequently since the 1960s, but the recent interest in theories of behaviour change has given them a new opportunity to test out nudges, which can help improve relationships between citizens and the public sector. However, implementing behavioural interventions with RCTs involves challenges as well as successes, and the paper gives examples of both from the UK at local and national levels.

One of the constant themes of public administration—both in theory and practice—is the tension between stability and innovation (see Kelman 2005). For the former, the terms path dependence, standard operating procedures, and incrementalism are often referred to; for the latter, punctuations and lurches are frequently used concepts. Much of the experience of public administration is a consequence of the counter-pressures between these two tendencies, with some bureaucracies engaging in reform, others embarking on periods of change and then returning to more conventional pathways. Such a tension is unlikely to be resolved, and exploring it will engage researchers for many years to come, just as it has done in the past. But recent developments in the social sciences—greater use of randomised controlled trials (RCTs) and more generalisable findings about individual behaviour—have the capacity to tip the balance more towards reform and innovation. The two interlocking developments of RCTs and the science of behaviour change—sometimes called nudging—have the potential to open up public administration and to set off cycles of positive feedback based on sorting evidence and using it to design new policies and administrative procedures.

This paper makes an argument about the salience of the diffusion of new practices of acquiring knowledge within bureaucracies and the knock-on effects of their extended use, in particular in the way in which politicians and bureaucrats make and implement policies. It suggests that behaviourally-informed policies tested by randomised evaluations can empower those who are prepared to risk refutations or confirmations of their ideas, and can help create a more knowledge-hungry public bureaucracy, which may be called an experimental public administration, committed to continual testing and improvement, and less concerned with self-justification and the preservation of existing practices. Because this approach focuses on understanding individual behaviour, bureaucrats and politicians can take a more citizen-focused approach to public policies as they are encouraged to think outside their own assumptions.

These claims amount to an ideational argument: new practices and intellectual ideas have the power to shape the behaviour of bureaucrats and politicians. With its strong claims for causal inference (Gerber and Green 2012), the randomised research design has become an accepted and influential method within the academic disciplines of economics, political science and social policy, a development also linked to the rediscovery of the experimental foundations of statistical analysis and the growing interest in natural and quasi-experiments (Dunning 2012). The same can be said for the influence of behavioural approaches that have spread across disciplines beyond economics. As a result of these intellectual confluences, there has been a re-engagement of social science with more practical applications of theory, especially since it is not possible to do field trials without some manipulation of the world, which requires the assistance of policy-makers. Given the influences on the modern policy process from experts, the media and international organisations, through which ideas about behaviour change and the utility of experiments have diffused, it is no surprise that they should be readily taken up by public agencies, especially given how many social scientists are employed by governments. Some even talk of the emergence of the ‘psychological state’ using these new techniques as a more precise form of social control (Jones et al 2013).

The more difficult question is how these ideas intersect with the interests of politicians and bureaucrats, who face pressures to claim credit, to use evidence selectively and to advance their careers—classic themes in the study of the utilisation of evidence by policy-makers (e.g. Weiss 1972, chapter 6). There is also resistance to using randomised evaluations within bureaucracies. In other words, in spite of the improvements, there is no nirvana on the horizon: politics intervenes and remains an integral part of policy-making and evaluation.

To make these arguments the paper is divided into the following sections. The first reviews experimental methods and their advance in the study of public policy and in political science; the second discusses the more frequent use of trials in the policy world and the emergence of policy experiments; the third charts the behavioural revolution in the social sciences, the adoption of behavioural public policies by government and their direct link to randomised evaluations; the fourth sets out the practical dimensions of the process just described, using examples and setting out limitations; and the conclusion summarises the developments overall, assesses the balance of forces currently in play and draws implications for understandings of public bureaucracies.

Experimental methods

An experiment occurs when human beings manipulate the world in order to understand causal relationships. It is a common means by which scientists acquire knowledge. The secret in science is to have precise control of the different elements of the experiment, and then to measure carefully what happens during and after the intervention. It is this precision of knowledge based on careful measurement that is the big attraction of experiments, and it explains why other disciplines have sought to emulate the experimental method. Experiments unlock knowledge in ways that cannot always be achieved by observation alone.

In social science the researcher looks for or creates some random variation and ensures that the differences in outcomes across the subject pool or sample are associated only with the variable or variables of interest. By observing the difference in outcomes between randomly allocated groups of subjects it is possible to infer that any difference has been caused by the intervention or variable of interest and nothing else—other than random variation. In a natural experiment, random allocation can happen as a result of accidental differences between populations or areas, created by government or by nature, such as the impact of boundaries or cut-offs for eligibility in a policy programme. More often researchers and policy-makers need to create the random variation themselves in what is called a randomised controlled trial, where individuals or communities are randomly allocated to two or more groups, with those receiving an intervention compared to a control or no-intervention group. These trials are often called field experiments, which may be contrasted with laboratory experiments done in controlled settings or survey experiments carried out with population samples and standard interviewing techniques. From this simple design it is possible to make a causal inference, provided all the conditions for the trial have been satisfied. Even though trials are hard to do, it is the security of the knowledge they produce, especially when done many times and in many places, that is the source of their attraction. Most other research designs in the social sciences do not get close to this strength of inference and usually have to rely on complex manipulation of their data to approximate it, or admit that the relationships they find are not causal. In fact, the goal of most statistical analysis in the social sciences is to create a research design that resembles an experiment as closely as possible (Angrist and Pischke 2008). Little wonder that social scientists should wish to do the actual experiments themselves.
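To make the logic concrete, the following minimal sketch—illustrative only, with hypothetical numbers rather than data from any study—simulates the design just described: subjects are randomly allocated to treatment or control, and the difference in mean outcomes recovers the causal effect of the intervention.

```python
import random
import statistics

random.seed(42)

N = 1000             # hypothetical subject pool
TRUE_EFFECT = 2.0    # assumed effect of the intervention

# Random allocation: each subject has an equal chance of treatment.
subjects = [{"treated": random.random() < 0.5} for _ in range(N)]

# Simulated outcomes: a noisy baseline, plus the effect for treated subjects.
for s in subjects:
    baseline = random.gauss(10.0, 3.0)
    s["outcome"] = baseline + (TRUE_EFFECT if s["treated"] else 0.0)

treated = [s["outcome"] for s in subjects if s["treated"]]
control = [s["outcome"] for s in subjects if not s["treated"]]

# Because allocation was random, the two groups differ only by chance,
# so the difference in mean outcomes estimates the causal effect.
ate = statistics.mean(treated) - statistics.mean(control)
print(f"Estimated effect: {ate:.2f} (true effect: {TRUE_EFFECT})")
```

The point of the sketch is that nothing other than the intervention and chance distinguishes the two groups, which is why the simple difference in means carries causal weight.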

The RCT—sometimes called artificial randomisation—has been around for a long time now. As Hacking (1988) discusses, it goes back even before R. A. Fisher published The Design of Experiments (Fisher 1935), based on his famous agricultural experiments. The origins are in psychological research done in the 1850s, which was taken forward by Charles Sanders Peirce and his student Joseph Jastrow. In the twentieth century, such experiments increasingly held sway. In the medical and health worlds, they became the norm for tests of new medicines and procedures; then they gradually expanded to cover social and other policy interventions (see Torgerson and Torgerson 2008).

In spite of their advantages, the use of experiments by social scientists and those in government has been surprisingly infrequent. Despite an interest in the interwar period, experiments were the exception rather than the rule. Other techniques were preferred, such as mass surveys of public opinion, which became cheaper and more efficient to run in the post-1945 period and gave social scientists and government agencies a range of opportunities for answering research questions that were easier than direct intervention: survey companies carried out the surveys using standardised methods of sampling and interviewing, then supplied the data (Green and Gerber 2003). All the researchers or governments had to do was to ensure the costs of the survey were met. As computing improved through larger memory and customised software, it became easier—even routine—to analyse these surveys. The collection of more reliable data by government agencies encouraged observational studies, which established ordinary least squares and related estimators as the familiar methods for social scientists and government. Many social scientists tended to follow the methods of economics and quantitative sociology, using regression and related techniques to try to understand a range of phenomena. Once these methods became embedded it is easy to see how they were sustained over time, and why other techniques did not find favour. In any case these methods appeared to offer answers to the big questions in social science or evaluation, and it was only later that social scientists came to realise that some of these questions are in fact hard to answer without improved methods. Moreover, surveys became increasingly expensive and suffered from declining response rates.

Field experiments in public policy and political science

A policy experiment is a special kind of RCT or field trial carried out by policy-makers, where the public authority carries out the intervention itself by varying the tools of government under its control. It contrasts with other field experiments done by researchers using their own resources, though these often involve the collaboration of community groups and some kind of partnership with the relevant public authority. Although policy experiments can be done in house, with government or agency researchers carrying out the work, they are often done in collaboration with researchers and academics, either in an official evaluation or in another form of partnership. The main aim is to test the impact of a policy or existing procedure, or to trial a new way of doing business in the public authority. In that sense they are practical exercises, but they can at the same time answer more general questions in social science theory.

There was an initial interest by policy-makers and academics in policy experiments in the 1920s, which declined and then revived again. One famous example shows the challenges of doing experiments in close collaboration with policy-makers. The Lanarkshire school milk experiment in 1930 tested whether providing free school milk would improve the health of children, as reported and criticised by Student (1931). In this experiment, 20,000 children in 67 schools participated, with half not getting free milk. Assignments were done alphabetically, but teachers adjusted them and swapped the allocations, causing the control students to do better. This trial shows the challenges of working with policy-makers, especially at the delivery end, and the difficulty of reconciling standard administrative procedures and discretion with the demands of randomised evaluation. In political science, the 1920s was also the time when field experiments took place, undertaken by Harold F. Gosnell (1926) on stimulating voter registration (see also Gosnell 1927). But attention faded thereafter (except Eldersveld 1956).
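The Lanarkshire failure mode can be illustrated with a short simulation—entirely hypothetical numbers, not the original data—in which sympathetic teachers swap needier (lower-baseline) children into the milk group. The groups then cease to be comparable, and the naive comparison can make the controls look healthier even when milk genuinely helps.

```python
import random
import statistics

random.seed(1930)

N = 20000            # roughly the scale of the Lanarkshire trial
MILK_EFFECT = 1.0    # assumed true benefit of free milk (hypothetical)

# Intended design: allocation that is effectively random.
children = [{"baseline": random.gauss(20.0, 4.0),
             "treated": random.random() < 0.5} for _ in range(N)]

# Teacher interference: needy controls are swapped into the milk group,
# and some well-nourished treated children are swapped out of it.
for c in children:
    if not c["treated"] and c["baseline"] < 17 and random.random() < 0.5:
        c["treated"] = True
    elif c["treated"] and c["baseline"] > 23 and random.random() < 0.5:
        c["treated"] = False

# Outcomes: baseline health plus the genuine benefit of milk.
for c in children:
    c["outcome"] = c["baseline"] + (MILK_EFFECT if c["treated"] else 0.0)

treated = [c["outcome"] for c in children if c["treated"]]
control = [c["outcome"] for c in children if not c["treated"]]

# The naive comparison is now biased: the milk group started out needier,
# so the controls can look healthier despite the true positive effect.
print("Naive estimate:",
      statistics.mean(treated) - statistics.mean(control))
```

Under these illustrative assumptions the estimate comes out negative even though the true effect is positive, which is precisely the pattern Student diagnosed: well-intentioned discretion at the delivery end destroyed the comparability that the design was meant to guarantee.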

The period from the 1930s saw the supremacy of the medical trial, in which medicines and procedures are tested; it became the official standard for almost all medical treatments and procedures and in effect shaped understandings and uses of the RCT. Over 350,000 trials had been done up to 2002 according to one estimate (cited by Bloom 2005: 12), and medicine and medical practice remain at the heart of systematic reviewing, as in the Cochrane reviews. About 25,000 trials are published each year, with the numbers doubling every ten years (Heneghan 2010).

Gradually the use of experiments has expanded beyond medicine. Influential was the work of Campbell on the statistical properties of experiments and quasi-experiments (Campbell and Stanley 1963) and their use in social settings (Campbell 1957). An important period was the expansion of welfare policies in the 1960s and the demands for stronger evaluations (see Greenberg et al 2003). An early, prominent example was the negative income tax experiments (Munnell 1987), which originated in the Office of Economic Opportunity and were done in New Jersey. These suffered from administrative and organisational problems, a familiar experience with policy experiments. Another early example is housing assistance in the 1970s, which tested the impact of direct financial aid (see Orr 1999). In the 1980s, there was an expansion in the range of social programmes evaluated with randomised allocation, in particular welfare-to-work policies (Riccio and Bloom 2001), job training (Bloom et al 1997, Bloom et al 1993) and reemployment bonuses (Robins and Spiegelman 2001). Another early productive area is crime, with evaluations done to test hot spots policing (Sherman et al 1995, Sherman and Weisburd 1995, Weisburd and Green 1995) and peer mentoring (Petrosino et al 2002). Education is another area of expansion (e.g. education subsidies: see Angrist et al 2002, 2006). The growing interest in trials led scholars to call for more social experimentation as a way of life for government agencies (Campbell 1969, Greenberg et al 2003), though such periodic advances are also met with the realisation of the high likelihood of implementation failure with RCTs (see Berk et al 1985).