“Internal and External validity: implicit assumptions about causal inference from experiments”

Abstract:

Few terms are employed as often as “internal” and “external” validity in commentary on social science experiments. These notions are used both by experimentalists and by commentators on experiments, such as social science methodologists and philosophers of science. The terms were coined by Donald T. Campbell in 1957, in the context of discussions of the applied methodology of field try-outs of social ameliorative programs, which he termed quasi-experiments.

Probably the most commonly quoted definition of the terms is found in Cook and Campbell's classic work (1979, p. 37), where internal validity “refers to the approximate validity with which we infer that a relationship between two variables is causal or that the absence of a relationship implies the absence of cause”, and external validity “refers to the approximate validity with which we can infer that the presumed causal relationship can be generalized to and across alternate measures of the cause and effect and across different types of persons, settings, and times”.

Though definitions vary in their details, today most social scientists, methodologists, and philosophers of social science interested in the relationship between causality and experimentation can produce, off the top of their heads, a workable definition of the internal/external validity dyad. Internal validity immediately brings to mind the distinction between genuine and spurious causes and the idea of causal isolation associated with experiments. External validity, in turn, makes us think of the generalizability of our results and of our causal inferences from the artificially isolated experimental setting to the outside world, which is ultimately what interests us.

Starting from his initial formulation in 1957, and with the help of a series of collaborators, Donald Campbell gradually reformulated the distinction between internal and external validity. Over the years he added further validity types, arriving at what is now a stable four-fold validity typology well known in social research methodology: statistical conclusion validity, internal validity, construct validity and external validity.[1] In practice, the first two are usually grouped together, and construct and external validity are likewise often conflated. This suggests, in accordance with Campbell himself (Shadish, Cook and Campbell 2002), that all four categories can be subsumed under a broader, more encompassing internal/external pair.

Campbell drew his distinction as a guide for researchers engaged in quasi-experiments, defined as large field try-outs of educational or socio-ameliorative programs in which randomization of subjects across treatments was often not feasible or practical; hence their “almost” experimental nature. Although the allocation of groups to treatments was not perfectly random, these field try-outs did retain a crucial aspect of experimentation: they were based on observing the exogenous introduction of a disruption into a system in order to note the changes it produced.

As Mark (1986) has put it, the history of quasi-experimentation cannot be separated from the history of the typology of validity. Yet the terms internal and external validity eventually spread across social scientific fields, and they are now central to thriving experimental disciplines such as behavioural economics. Since the turn of the century, the distinction has also gradually permeated the jargon of philosophers of social science, and more recently it has been increasingly used or referenced by general philosophers of science (Woodward 2003, Cartwright 2007) and even by philosophers of biology (Sullivan 2009). The terms, however, are almost never used by experimenters working in the biomedical or natural sciences, as if they were irrelevant to their practices, in clear contrast to their colleagues in the social sciences.

The picture that emerges is thus somewhat puzzling. Can a term be central to the discussion of experiments in the social sciences, be used by philosophers of science to describe general problems of experimentation across the sciences (or even problems specific to the natural sciences), and yet not be used at all (or even go unnoticed) by natural scientists themselves? Are experimentalists in the natural sciences missing out on something important? Are their practices affected by the absence of the internal/external validity distinction from their conceptual repertoire?

Our paper suggests a negative answer to these questions. Given its numerous conceptual problems, we argue that the distinction between internal and external validity, as currently used in fields like behavioural economics, philosophy of social science, or general philosophy of science, is of very limited usefulness. Though the internal/external dyad is currently central to methodological debates about experiments in these disciplines, its use is very often based either on misunderstandings of the original definitions and their purposes, or on definitions of internal and external validity that are riddled with conceptual problems. We show that, at best, the terms merely remind us of important problems in experimentation, such as the reliability or generalizability of findings. At worst, they serve purely rhetorical purposes and can lend support to ill-founded methodological advice. The paper therefore recommends the use of viable, less problematic terminological substitutes.

The paper and presentation are structured as follows:

Section 1 further explores the origins of the distinction between internal and external validity and analyses Campbell’s purposes in coming up with these concepts. This introductory section also traces how the distinction gradually penetrated the philosophical literature on experiments. The second section of the paper examines some important conceptual problems of the internal/external validity distinction. First, it addresses the ambiguities in the definition of external validity. Second, it addresses some of the inconsistencies in the definitions of both external and internal validity regarding the object to which they are supposed to refer: experiments themselves (i.e., experimental designs), the data generated by experiments, or the inferences drawn from experiments. Favoring the latter interpretation, whereby internal and external validity refer to inferences from experiments, we present in the fourth section a critique of the assumptions implicit in common uses of the distinction. In particular, the paper argues that the distinction presupposes an unrealistic conception of experimentation in which the relationship between inference and design is rigid, and that it underplays and misinterprets the role of background knowledge in the interpretation of experimental data.

BIBLIOGRAPHY AND REFERENCES

Caamaño Alegre, M. (2009). Experimental Validity and Pragmatic Modes in Empirical Science. International Studies in the Philosophy of Science, Vol. 23, No. 1, March 2009, pp. 19-45.

Cartwright, N. (2006). Well-Ordered Science: Evidence for Use. Philosophy of Science, Vol. 73, No. 5, pp. 981-990.

Cartwright, N. (2007). Hunting Causes and Using Them. Cambridge: Cambridge University Press.

Campbell, D. T. (1957). Factors Relevant to the Validity of Experiments in Social Settings. Psychological Bulletin, 54, 297-312.

Campbell, D. T. and J. C. Stanley (1963), Experimental and Quasi-Experimental Designs for Research, Chicago, Rand McNally and Company.

Campbell, D. T. (1986). Relabeling Internal and External Validity for Applied Social Scientists. In Trochim, W. M. K. (ed.), Advances in Quasi-Experimental Design and Analysis. San Francisco: Jossey-Bass.

Cook, T. D. and D. T. Campbell (1979), Quasi-Experimentation: Design & Analysis Issues for Field Settings, Boston, Houghton Mifflin Company.

Cook, T. D. and D. T. Campbell (1986), The Causal Assumptions of Quasi-Experimental Practice, Synthese, 68(1), p. 141.

Cronbach, L. (1982). Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass.

Guala, F. (2003), “Experimental Localism and External Validity”, Philosophy of Science, vol. 70, pp. 1195-1205.

Guala, F. (2005), The Methodology of Experimental Economics, Cambridge, Cambridge University Press.

Hammersley, M. (1993). A note on Campbell’s distinction between internal and external validity. Quality & Quantity 25: 371-387.

Hammersley, M. (1993). Abandoning internal and external validity: a response to Swanborn. Quality & Quantity 27: 217-218.

Hogarth, R. B. (2005), “The challenge of representative design in psychology and economics”, Journal of Economic Methodology, vol. 12 (2), pp. 253-263.

Lucas, J. W. (2003), “Theory-Testing, Generalization and the Problem of External Validity”, Sociological Theory, vol. 21 (3), pp. 236-253.

Mark, M. (1986), Validity Typologies and the Logic and Practice of Quasi-Experimentation. in Trochim, W. M. K. (ed). Advances in Quasi-Experimental Design and Analysis. San Francisco: Jossey-Bass.

Mayo, D. (2008). Some Methodological Issues in Experimental Economics. Philosophy of Science, 75 (December 2008), pp. 633-645.

Schram, A. (2005), “Artificiality: The tension between internal and external validity in economic experiments”, Journal of Economic Methodology, vol. 12 (2), pp. 225-237.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.

Sullivan, J. (2009). “The multiplicity of experimental protocols: a challenge to reductionist and non-reductionist models of the unity of neuroscience”. Synthese.

Swanborn, Peter G. (1993) External validity abandoned? Quality & Quantity 27: 211-215.

Thye, S. R. (2000), “Reliability in Experimental Sociology”, Social Forces, vol. 78 (4), pp. 1277-1309.

Trochim, W.M.K. (1986). Advances in Quasi-Experimental Design and Analysis. San Francisco: Jossey-Bass.

Woodward, J. (2003). Making Things Happen: A Causal Theory of Explanation. Oxford: Oxford University Press.

[1] Statistical conclusion validity has been defined as the validity of inferences about the correlation between treatment and outcome, and construct validity as the validity of inferences about the higher-order constructs that represent sampling particulars (Shadish, Cook and Campbell, 2002). These concepts, in particular that of construct validity, are common currency in social psychology and related fields in the social sciences, but less so in more recent experimental fields such as behavioural economics (Morton and Williams 2008).