Some Reflections on the Production and Usage of Cross-National Governance Data
Jan Teorell
Department of Political Science, Lund University
The Quality of Government Institute, University of Gothenburg
October 20, 2010
Memorandum prepared for the Conference on “Democracy Audits and Governmental Indicators,” Goldman School of Public Policy, University of California, Berkeley,
October 30-31, 2009.
I have in this short memorandum assembled five disparate but still inter-related reflections on the current (and, potentially, future) status of cross-country data on “governance” (broadly understood). These reflections stem in part from my experience as a user of such data (mostly, but not exclusively, in preparing a book manuscript on the causes of democratization). But they are also drawn from my experience as a producer of novel governance data. Last, but not least, my reflections are rooted in my experience of putting together the so-called Quality of Government Dataset (awarded the Lijphart, Przeworski, Verba Prize for Best Dataset by the APSA Comparative Politics Section in 2009). This dataset is a freely available compilation of extant datasets on governance or “quality of government” (among other things), in both a cross-sectional and a time-series cross-sectional format (from 1946 onward). Putting together these data blends, as it were, the user and producer perspectives, since none of the data taken by themselves are new to the field; the compilation of them, however, is.
My first reflection concerns how critically important it is that cross-national governance data cover a (preferably long) historical time span. This stems in part from the nature of the phenomena I have been most interested in studying, such as democracy and corruption. Democracy, we know by now, is a sticky phenomenon. That is, countries experiencing democratic rule this year are very likely to continue doing so the next. Regime change does occur, however, sometimes by a sudden swing of the pendulum in a more democratic direction, sometimes by an unexpected reversal. Other (but, I believe, fewer) countries experience more gradual change, inching their level of democracy up (or down) incrementally. Since the nature and timing of these fairly rare changes in democracy levels are what we want to explain (or study the consequences of), purely cross-sectional data on democracy have become close to useless for anything other than simple descriptive purposes. Basically the same argument applies to corruption. Although we do not know for sure, owing to the very lack of such historical data, anecdotal and qualitative evidence suggests that corruption is even stickier (i.e., more strongly auto-correlated) than democracy. From the little we know, curbing corruption has been successful in only a handful of cases, and in most instances it seems to have been a slow reform process taking at least a decade, perhaps several. There are few known examples of shocks or pendulum-like swings in corruption levels. I would thus argue that there is even more need for long time-series data on corruption levels, although (paradoxically) the extant data sources are by and large purely cross-sectional in nature.
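To fix ideas, stickiness can be stated in stylized form as a first-order autoregressive process (purely illustrative notation of my own, not a model estimated here). Letting $g_{i,t}$ denote country $i$'s level of democracy (or corruption) in year $t$,

$$ g_{i,t} = \rho\, g_{i,t-1} + \varepsilon_{i,t}, \qquad \rho \text{ close to } 1, $$

the claim that corruption is stickier than democracy amounts to saying that its $\rho$ lies even closer to 1. And since a single cross-section carries no information at all about $\rho$, only time-series data can tell us when, and why, the rare changes occur.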
I would press the deep need for long-term historical data on corruption even to the point of sacrificing cross-national measurement equivalence. That is, I would prefer having inter-temporal data on corruption levels over, say, the last 200 years for a handful of countries where we are fairly sure substantial change has occurred – even if these data were produced in a way that excluded cross-country comparisons. This is my second argument. We have become accustomed to thinking about cross-national measurement equivalence first, and about acquiring time-series data on these measures only second. I think the field of governance indicators may need to reverse that order of priority. I am, for example, currently involved in a project collecting data on electoral fraud in the history of Sweden, from 1719 until around 1911, through a hitherto largely unexplored source material: petitions against alleged fraud and misconduct filed with the authorities. Similarly long time-series of data on electoral fraud have been collected in other projects for Costa Rica, Britain and Germany, and could very well be collected for the US (through the series of contested election cases in the House) and France. I believe each of these series in and of itself tells a large part of the story of how election fraud has waxed and waned over the course of history in each country. Both the particular circumstances favoring fraud and the specific conditions that helped abolish it could thus (I believe) be tracked on a country-by-country basis. I would, however, be very skeptical towards pooling these time-series together and treating them as comparable measures of the presence of fraud. The institutional circumstances under which petitions could be filed, the nature of the judicial or other bodies receiving and deciding upon them, and the laws governing electoral conduct simply vary too much, even among this small set of countries, for any meaningful comparison of, for example, the extent of fraud across countries. I do, however, believe that meaningful comparisons could be attempted across time (and, at the sub-national level, across space) within each single country, which brings me back to the point: inter-temporal measurement equivalence is sometimes more important than cross-sectional equivalence. The same argument possibly applies to other governance dimensions. Corruption could, for example, be studied historically through legal cases of public officials being caught and prosecuted. Although by no means unproblematic (potentially the biggest challenge being how to separate the extent of corruption from the effectiveness of the legal system), this could yield meaningful long-term time-series data on historical trajectories of corruption levels hitherto unexplored within the social sciences. I would again doubt, however, that meaningful cross-national comparisons could be made on the basis of such data.
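The identification problem just mentioned can be stated compactly (again in illustrative notation of my own). If $P_t$ is the number of corruption cases prosecuted in year $t$, $c_t$ the true extent of corruption, and $d_t$ the probability that a corrupt act is detected and prosecuted, then to a first approximation

$$ P_t = c_t \, d_t , $$

so a rise in observed prosecutions is equally consistent with rising corruption and with a more effective legal system. Within-country inferences over time would thus require holding $d_t$ roughly constant, for example by confining comparisons to periods of stable legal institutions.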
At the risk of contradicting myself, I now turn to a source of governance data that does not allow any long time-series to be extended back in time, namely expert polls. Although this is the standard source of governance indicators today, a recent experiment with collecting such data has convinced me that it still has some largely untapped potential. This is my third argument. I am probably not alone in thinking that the main problem with extant expert poll data on governance, such as the Worldwide Governance Indicators (WGI), apart from their almost exclusive reliance on perceptions rather than experience, is their poor conceptual foundations. In the numerous and mostly very convincing papers written by Kaufmann and co-authors to defend their approach, there are two queries I have never seen addressed satisfactorily: what the theoretical definitions are of underlying constructs such as “voice and accountability”, “rule of law” and “government effectiveness”; and what empirical evidence shows that the indicators sorted under each of these headings really tap into that very construct and not the others. This is not to say that these data are useless. On the contrary, they are extremely useful (and I use them a lot). I just believe they are not well anchored in theory, and I still have not seen any convincing evidence that the six dimensions of the WGI are what would come out of a dimensional analysis performed on the disaggregated indicators.
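The kind of dimensional analysis I have in mind could, in principle, be run directly on the disaggregated source indicators. A minimal sketch in Python of what such a test might look like (the file name and data layout are hypothetical; this illustrates the technique, not an analysis I have performed):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical input: one row per country, one column per disaggregated
# governance indicator (the kind of source data underlying the WGI).
indicators = pd.read_csv("wgi_source_indicators.csv", index_col="country")

# Standardize, then extract principal components as a first-pass
# dimensional analysis (exploratory factor analysis would be analogous).
X = StandardScaler().fit_transform(indicators.dropna())
pca = PCA().fit(X)

# If the six WGI dimensions were empirically "real", one would expect
# roughly six components with eigenvalues above 1 (the Kaiser criterion),
# and loadings that sort the indicators into the six postulated clusters.
eigenvalues = pca.explained_variance_
print("Components with eigenvalue > 1:", (eigenvalues > 1).sum())
```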
In part as a response to this, we decided at the Quality of Government Institute to craft an expert survey of our own. The purpose was to produce novel and theory-driven data on the structure and performance of the public administration in a cross-section of countries. To obtain a sample of experts, we drew up a list of persons registered with four international networks of public administration scholars (NISPACEE, EGPA, EIPA, and SOG), complemented with searches on the Internet, personal contacts, the list of experts recruited for a pilot survey, and a small snowballing component. All in all, this resulted in a sample of 1,361 persons, whom we contacted by email between September 2008 and May 2009, including a clickable link leading to the web-based questionnaire in English. 529 experts (39 percent) from 58 countries responded, with an average response time of about 15 minutes. (Not very surprisingly, considering the sampling frame, the selection of countries was heavily geared towards Western European and post-communist countries.) This collection of data can be used for many purposes (and will soon be made publicly available on the web), and is already being used in several papers at the Institute. I will mention one here in order to illustrate the critical importance of conceptualization: in a paper presented at APSA in Toronto (posted on the website for this conference and at ssrn.com), I employed an index of government impartiality assembled from five indicators from the expert poll specifically designed to tap into this very concept. By analogy with political equality as the basic norm underlying the input side of the political system, impartiality – defined such that “when implementing laws and policies, government officials shall not take into consideration anything about the citizen/case that is not beforehand stipulated in the policy or the law” – is argued by Rothstein and Teorell 2008 (Governance 21(2): 165-190) to be the norm on the output side that is most compatible with the normative principle of treating everyone with equal concern and respect. What I then did in the APSA paper was to compare the performance of this impartiality index to that of the WGI in predicting valued societal outcomes such as income growth, institutional and interpersonal trust, and subjective well-being. The index assembled from our expert poll fared surprisingly well in this comparison, with few exceptions performing as well as or even better than the WGI measures of rule of law, government effectiveness and control of corruption. With clearer conceptual underpinnings, expert polls thus remain a potentially very valuable source of cross-country (but not historical) data on governance. And, quite importantly, such data can be collected at low cost. Since this was a web-based survey, the only costs of collecting the data stemmed from constructing the questionnaire, assembling the sample of experts, and programming the questionnaire into the web survey platform. A rough estimate is that this cost us at most about 300 man-hours in all.
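For concreteness, the index construction and benchmarking just described can be sketched as follows (all file and variable names are hypothetical, and the actual paper's procedure differs in its details):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical columns: five expert-poll items designed to tap
# impartiality, plus outcome and WGI benchmark variables, aggregated
# to country means.
df = pd.read_csv("qog_expert_survey_country_means.csv")

items = ["imp1", "imp2", "imp3", "imp4", "imp5"]

# Simple additive index: standardize each item, then average.
z = (df[items] - df[items].mean()) / df[items].std()
df["impartiality"] = z.mean(axis=1)

# Benchmark: does the impartiality index predict a valued outcome
# (here, interpersonal trust) as well as a WGI measure does?
for predictor in ["impartiality", "wgi_rule_of_law"]:
    sub = df[[predictor, "interpersonal_trust"]].dropna()
    X = sm.add_constant(sub[[predictor]])
    fit = sm.OLS(sub["interpersonal_trust"], X).fit()
    print(predictor, "R^2 =", round(fit.rsquared, 3))
```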
My two final arguments concern the need for new standards that would ease both the future production and the future usage of cross-national governance data. The first is a more established convention for how to cite the data sources one is using. My impression is that too little recognition is given today to those laboring with original data collection efforts in this field. Sometimes people simply download your data without ever citing it; sometimes they cite only the compilation of data but not the original source. And even for all those honest people who want to cite the data source, there are oftentimes no clearly established standards for how to do so (the days when all data sources were stored with the ICPSR are, as we all know, gone). To begin with, it is of course up to the producer to clearly state how he or she would like the data to be cited. But even here some firmer conventions would help. Some prefer to demand that a published paper on the data be cited (presumably in order to boost citation rankings?), but not all data produced end up in such a paper, and other conventions exist. Of real help here would thus be if a large organization, such as APSA, drafted a citation convention for data sources that the main journals could then follow.
My final point is more technical, but nonetheless troublesome for both data users and producers. It concerns the lack of a standard for how to treat cases of country mergers and splits in cross-national time-series datasets. How should, for example, West Germany before and after the merger with East Germany in October 1990 be organized in the data? Should there be one case for West Germany and one for united Germany, or should they be treated as the same case? If the decision is made to separate them into two cases, and the data are annual (which is usually the case), in what year should the case of united Germany start and the case of West Germany end – in the year of the merger or in the year after? The same questions have emerged over the last decades in deciding how to handle Yemen in 1990 and Vietnam in 1975-76 (where the question of the timing of the merger is even more complicated). And then what about countries splitting up? Should, for example, Ethiopia be treated as the same case before and after the secession of Eritrea in 1993? Or what about Russia: is the USSR a predecessor of that case or a different one? If different, in what year should the USSR end and Russia start? Or, if you have assembled data on each post-Soviet country and decided to extend them backwards in time, should you aggregate these data for the entire Soviet Union before 1991, or continue to provide them on a post-Soviet “country-by-country” basis?
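One way to impose a consistent convention, sketched here purely as an illustration (the codes and cut-off rules are assumptions of mine, not the solution adopted in the Quality of Government Dataset), is to treat each territorial configuration as a separate case with explicit start and end years:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Case:
    """One territorial configuration of a state in a country-year panel."""
    name: str
    code: str                        # dataset-internal identifier (hypothetical)
    start: int                       # first year the case appears in the data
    end: Optional[int] = None        # last year, or None if still existing
    successor: Optional[str] = None  # code of the case it merged into

# Illustrative convention: the old case ends the year *before* a merger
# takes effect; the united case starts in the merger year itself.
cases = [
    Case("West Germany", "DEU-W", 1949, 1989, successor="DEU"),
    Case("Germany",      "DEU",   1990),
    Case("USSR",         "SUN",   1946, 1990, successor="RUS"),
    Case("Russia",       "RUS",   1991),
]

def case_for(code: str, year: int) -> Optional[Case]:
    """Return the case that a (code, year) observation belongs to, if any."""
    for c in cases:
        if c.code == code and c.start <= year and (c.end is None or year <= c.end):
            return c
    return None
```

Whichever cut-off one prefers, the point is that the rule is stated once and then applied mechanically across all variables.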
When working with pooled data, my experience is that these decisions are of very little consequence for the results obtained (a handful of observations are lost or gained, but that is about all). When data from different sources are put together, however, these decisions have enormous consequences, since there are about as many solutions to these problems in use as there are available datasets. Very few data producers seem to have reflected on these problems, and even fewer have tried to set up rules for solving them, implying that solutions are adopted on a case-by-case basis (the number of combinations thus multiplying as the number of country mergers/splits in each dataset increases). This concern might at first glance appear abstruse to the outside observer. I would argue, however, that in putting together the Quality of Government Dataset, this one problem was the most difficult to decide how to handle, and for that matter the most time-consuming to actually handle – hours upon hours have been spent just checking and rechecking the data for errors on country mergers and splits. This is thus another area where I think the scientific community could benefit greatly from some agreed-upon standards and recommendations that future data producers could follow. And this applies even more strongly, coming back to my first point, to datasets covering historical times (where country mergers and splits are even more numerous).
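When compiling data from sources that have made different choices, the practical task boils down to maintaining an explicit crosswalk between each source's case definitions and the compilation's own. A minimal sketch (source names, codes and year ranges are all invented for the example):

```python
import pandas as pd

# Hypothetical crosswalk: how two sources code Germany around 1990.
# A compilation needs one such mapping per source, per merger or split.
crosswalk = pd.DataFrame(
    [
        ("source_a", "GERMANY", 1946, 1990, "DEU-W"),  # counts 1990 as West Germany
        ("source_a", "GERMANY", 1991, 2009, "DEU"),
        ("source_b", "GFR",     1946, 1989, "DEU-W"),  # switches already in 1990
        ("source_b", "GER",     1990, 2009, "DEU"),
    ],
    columns=["source", "source_code", "year_from", "year_to", "target_code"],
)

def harmonize(obs: pd.DataFrame, source: str) -> pd.DataFrame:
    """Map one source's country-year observations onto the target cases.

    `obs` is assumed to have columns 'country' and 'year' plus data columns.
    """
    rules = crosswalk[crosswalk["source"] == source]
    merged = obs.merge(rules, left_on="country", right_on="source_code")
    in_range = merged["year"].between(merged["year_from"], merged["year_to"])
    data_cols = [c for c in obs.columns if c not in ("country", "year")]
    return merged.loc[in_range, ["target_code", "year"] + data_cols]
```

The particular mapping matters less than the fact that it is written down once, in one place, where it can be checked and rechecked – rather than being re-derived ad hoc for every variable added to the compilation.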