Response to Stephen Gorard: 'The widespread abuse of statistics by researchers: What is the problem and what is the ethical way forward?'
Victor H. P. van Daal (Edge Hill University)
Herman J. Adèr (Johannes van Kessel Advising)
Summary
Gorard makes two claims: (1) there is widespread abuse of statistics in the social sciences, and (2) researchers almost universally report results incorrectly. According to Gorard, the way forward is to correct the reporting, so that the abuse will disappear.
Response
We have several concerns about this paper, which are listed below.
1. Gorard does not provide any hard data (and here we do not need any inferential statistics) on how widespread the abuse of statistics in the social sciences is, nor does he tell us how 'universal' the poor reporting is. We are therefore left with no idea of the scale of either problem. Furthermore, even if statistical methods were abused and reporting were poor, it remains unclear in what way and to what extent this would invalidate the interpretation of published research results. Finally, is there any evidence that statistical abuse and/or faulty reporting necessarily lead to bad decisions by policy makers or other stakeholders? In our opinion, wrong decisions arise from an inability to assess the methodological quality of research and to judge which pieces of research are relevant to the decision at hand.
2. The take on probabilistic reasoning presented at the beginning of the paper is not convincing. Taken to its conclusion, it would imply, for example, that weather forecasts should not be trusted, because they are based on probabilistic reasoning. For an in-depth discussion of probabilistic reasoning, see Pearl (2000). Morgan and Winship (2007) provide a more accessible treatment, while Shadish, Cook, and Campbell (2002) focus on causal inferential reasoning in social science research.
3. Unlike in physics, where truly random samples are used, convenience samples dominate in the social sciences. The main problem with convenience samples concerns generalisation: what is found in a convenience sample cannot be generalised to the population from which the sample is drawn. In medical research, a so-called RCT (randomised clinical trial) design is often used, in which different treatments (usually two) are randomly assigned to different patients, so that alternative explanations for any treatment effect found can be ruled out. However, random assignment can still go wrong, though statistical techniques are available to correct for such problems (Van Renswoude, 2013). Statistical techniques cannot and should not be abandoned, because hypothesis testing and effect size estimation are essential for the interpretation of any RCT. Generalisation in medical research is achieved by conducting meta-analyses, in which the studies relevant to a specific topic are systematically combined.
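The design logic in point 3 can be illustrated with a minimal Python sketch. This is our own illustration, not a method taken from any of the cited works. It randomly assigns hypothetical patients to two arms, estimates a standardised effect size (Cohen's d) for each trial, and then combines several trials with a fixed-effect, inverse-variance meta-analysis, which is one common way of pooling studies. The function names, the simulated outcomes and the large-sample variance approximation for d are assumptions made for illustration; a real meta-analysis would also have to consider between-study heterogeneity (for instance with a random-effects model).

```python
import math
import random
import statistics


def randomly_assign(patient_ids, seed=None):
    """Randomly split patients into two (near-)equal arms.

    Shuffling before splitting gives every patient the same chance of
    ending up in either arm, so no systematic factor decides who is treated.
    """
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def cohens_d(treated, control):
    """Standardised mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    v1, v2 = statistics.variance(treated), statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def d_variance(d, n1, n2):
    """Common large-sample approximation to the sampling variance of d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se


if __name__ == "__main__":
    rng = random.Random(1)
    effects, variances = [], []
    for trial in range(3):  # three hypothetical trials of 40 patients each
        arm_a, arm_b = randomly_assign(range(40), seed=trial)
        # Hypothetical outcomes: arm A (treatment) shifted up by 0.5 SD.
        treated = [rng.gauss(0.5, 1) for _ in arm_a]
        control = [rng.gauss(0.0, 1) for _ in arm_b]
        d = cohens_d(treated, control)
        effects.append(d)
        variances.append(d_variance(d, len(treated), len(control)))
    pooled, se = fixed_effect_pool(effects, variances)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"pooled d = {pooled:.2f}, approx. 95% CI = [{low:.2f}, {high:.2f}]")
```

Pooling narrows the confidence interval relative to any single small trial, which is the sense in which combining studies supports generalisation.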
4. Gorard seems to join others who time and again point to the inadequacy of statistical reasoning. However, this is not very helpful to researchers. Instead, researchers should be systematically taught to scrutinise their conclusions in view of the limitations of the statistical techniques used (so-called content robustness; Adèr, 2008) and, equally important, in view of the influence of potential confounders in their designs. In general, a researcher should take a methodological rather than a statistical point of view.
5. Another perspective on quantitative data analysis is completely ignored in Gorard's paper: the difference between exploratory and confirmatory data analysis. His discussion is confined to the latter, whereas most analyses in the social sciences are exploratory in nature. Every researcher in the field is aware of this difference and of the different statistical techniques appropriate to each.
6. Finally, even in an exploratory data analysis, statistical inference can be applied by using cross-validation. A relatively large data set (though this requirement can be weakened) is randomly split into two parts. One part is used to describe the sample and the data, and to generate hypotheses. The corresponding statistical hypotheses are then tested on the other part of the data set.
References
Adèr, H. J. (2008). Methodological quality. In H. J. Adèr & G. J. Mellenbergh (Eds.), Advising on research methods: A consultant's companion (pp. 49-70). Huizen, The Netherlands: Johannes van Kessel Publishing.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. New York: Cambridge University Press.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Van Renswoude, D. R. (2013). Random or non-random assignment: What difference does it make? In H. J. Adèr & G. J. Mellenbergh (Eds.), Advising on research methods: Selected topics 2013. Huizen, The Netherlands: Johannes van Kessel Publishing.