Clinical significance of patient-reported questionnaire data:
Another step toward consensus
Jeff A. Sloan, Ph.D.1
David Cella, Ph.D.2
Ron D. Hays, Ph.D.3,4
1. Mayo Clinic, Rochester, MN.
2. Northwestern University and Evanston Northwestern Healthcare, Evanston, IL
3. University of California Los Angeles, Los Angeles, CA
4. RAND, Santa Monica, CA
Introduction
Much has been written and discussed in recent years on the “clinical significance” of quality of life (QOL) differences. In this issue, Yost et al1 provide an example of how to go about estimating minimally important differences when comparing groups of patients for the Functional Assessment of Cancer Therapy – Colorectal (FACT-C). The article discusses the application and limitations of various approaches to obtaining estimates for minimally important effects. It also provides an opportunity to discuss the differences, or lack thereof, between the “competing” estimation methods. The hope is that in doing so, a common ground can be found that refocuses research efforts in this arena on the dissemination of practical application of QOL assessments. That is the purpose of this accompanying editorial.
Why do we need to define clinical significance?
Drawing from the seminal early summary of Lydick and Epstein2, Yost et al divide the various methods for estimating a minimal clinically significant difference into two groups: anchor-based and distribution-based. Anchor-based methods link changes in the QOL measure to other important variables (anchors), while distribution-based approaches link the changes in QOL to the probability distribution of scores from statistical theory. Guyatt et al.3 provide a detailed summary of the various approaches used to date.
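The distinction between the two families can be illustrated with a minimal sketch. It is not the procedure used by Yost et al; the function names, the anchor wording ("a little better") and all scores below are hypothetical and chosen only to show the arithmetic. The distribution-based estimate here is a fraction of the baseline standard deviation, and the anchor-based estimate is the mean change among patients who gave a particular anchor rating.

```python
from statistics import mean, stdev


def distribution_based_estimate(baseline_scores, fraction=0.5):
    """Return a fraction (default one half) of the sample SD of baseline scores."""
    return fraction * stdev(baseline_scores)


def anchor_based_estimate(change_scores, anchor_ratings, target_rating):
    """Return the mean QOL change among patients whose anchor rating matches target_rating."""
    selected = [c for c, a in zip(change_scores, anchor_ratings) if a == target_rating]
    if not selected:
        raise ValueError("no patients gave the target anchor rating")
    return mean(selected)


# Hypothetical data for five patients (illustration only, not from any study).
baseline = [60, 70, 80, 90, 100]
change = [2, 8, 6, 0, 10]
anchor = ["same", "a little better", "a little better", "same", "much better"]

print(distribution_based_estimate(baseline))
print(anchor_based_estimate(change, anchor, "a little better"))
```

In this invented example the two estimates land close to one another, which is the kind of agreement the editorial goes on to discuss; real data need not behave so neatly.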
Some clinical colleagues who are interested in understanding the meaningfulness of patient-reported data have expressed frustration with the lack of clear guidelines on how to interpret their data.4,5,6 QOL researchers have made great efforts to convince them that QOL assessments are an important and vital part of modern medicine.7 By the early 1990s, a vast array of new QOL assessments had entered the scientific literature. Clinicians became more familiar with the concept of including QOL in clinical trials and began asking practical questions regarding their usage.8 If QOL is an important clinical outcome, then how does one interpret the results? How does one apply the results from clinical trials comparing groups to the individual patient in the clinic?9 Can clinical pathways include these tools? Clinicians began asking the same questions they would about any other clinical measure, such as blood pressure or a component of a complete blood culture analysis.
Foremost among these questions were “what do these scores mean?” and “how do we tell when a change in a QOL score is important?” Jaeschke and Juniper were among the first to tackle the concept of clinical significance while evaluating an asthma questionnaire10. What followed was a series of disconnected attempts in the literature to define clinical significance.3,4,11 This led to concerns that QOL assessments were not “hard science” and therefore could not be trusted in the same way that laboratory assays were.12 The combination of a lack of familiarity with self-reported endpoints and an unclear message from the QOL “industry” led to clinical research articles questioning the value of QOL assessments.13-17 One might say we were being told to “put up or shut up.” While this may seem harsh, it is reasonable for clinicians to expect a scientific integrity out of QOL assessments equal to other clinical indicators so that they may be incorporated into the gestalt of medical practice.
A clinical significance Tower of Babel
QOL researchers have admirably taken up the task of defining clinical significance. Numerous authors have discussed the various approaches and potential definitions.3,11,18,19,20 Articles began to appear suggesting cutoff points for numerous QOL assessments. The inherent problem in attempting to find a definition of clinical significance for each tool in each clinical population raised the question of practicality. Several authors attempted to find a unifying theme underlying the various approaches.3,11,18,21,22
Recently, Norman et al23 presented an analysis that indicated a “remarkable universality” among estimates of clinical significance that centered around roughly ½ times the standard deviation (½ SD) of the QOL measure involved. Sloan19,24 used an analogy of “worms, ducks, and elephants” to classify effect sizes as “small, moderate or large” and suggested that the ½ SD was indeed a point of convergence among the various approaches that would represent a clinically significant effect size. A consensus would seem to be evolving.
Not all voices were in accord. Some authors questioned the simplicity of a single estimate for clinical significance,25 pointing out that a range of scores might be more appropriate across different applications and/or different populations. Perhaps this debate26 can be resolved by considering the simple, single value as a guideline rather than a rule. Further research on any clinical “rule of thumb” is informative, and may modify a guideline, but a common starting point can facilitate progress. A clinical example in oncology might be useful. For decades oncologists have considered 50% tumor shrinkage to be a meaningful response to therapy. This is used in oncology despite the fact that a 50% tumor shrinkage has more meaning in some tumors than others, and within a given tumor type, meaningfulness of tumor response depends on many factors and can have many different values. If a patient realizes 40% shrinkage, this may in the individual case be meaningful (e.g., symptom reduction), but today’s convention would not deem this as a response. Nevertheless, tumor response rate is enabled in clinical trials because of the acceptance of this imperfect convention. Perhaps symptom and quality of life response rates could be similarly encouraged in such a way to facilitate new research.
A further confusion in this field has arisen regarding the term “minimal.” Numerous alternative terms including “minimal important difference,” “clinically important difference,” “minimum clinically important difference,” “clinically significant difference” and other word combinations appeared in the literature. Some authors suggested that the ½ SD may be “too large to be minimal” and that perhaps 1/4 or 1/3 SD was more representative of a “minimal difference.”27 Indeed, Cohen’s widely used rules of thumb for interpreting the magnitude of differences offer 0.50 SD as a “medium” effect size and 0.20 SD as a “small” effect. Some have suggested that “minimal” differences might be closer to Cohen’s small effect size than to a medium effect,24,28 while others have found “medium” effects the same size as “minimally important differences.”10,23,24 Whether or not the different definitions were supportable methodologically or intuitively was unclear.
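To make the thresholds concrete, a difference in QOL scores can be divided by the standard deviation of the measure and compared with the 0.20 SD and 0.50 SD reference points mentioned above. The sketch below is illustrative only: the function names and category labels are hypothetical, and the cutoffs are the two values quoted in the text, treated as guidelines rather than rules.

```python
def standardized_effect(mean_difference, sd):
    """Express a difference in QOL scores in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_difference / sd


def classify_effect(effect_size):
    """Label an effect against the 0.20 SD and 0.50 SD reference points."""
    magnitude = abs(effect_size)
    if magnitude >= 0.5:
        return "at least medium"
    if magnitude >= 0.2:
        return "small"
    return "below small"


# Hypothetical example: a 5-point difference on a scale whose SD is 10 points.
effect = standardized_effect(5, 10)
print(effect, classify_effect(effect))
```

Under these assumptions a 5-point difference on a scale with a 10-point SD sits exactly at the ½ SD mark, while a 3-point difference on the same scale would fall in the "small" band.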
The sum total of this discourse was that the basic question of interpreting QOL assessments remained unanswered. QOL researchers need to recognize that if clinical colleagues perceive that the psychometricians cannot agree among themselves on basic scientific issues then this can feed the opinion that perhaps QOL is indeed too soft a science to receive the same priority as traditional clinical parameters.
Can we agree?
In fact, the differences may not be as stark as one might think. As Yost et al1 demonstrate, it is possible to carry out a detailed analysis of clinical significance for a single tool and derive estimates using the various approaches. The strength of the scientific method is inherent in the fact that all the approaches give reasonably similar answers. This is an important point, because if the various methods indicated wildly variable results, we would have to question the underlying process of obtaining QOL scores. Statistical theory tells us that ultimately all roads converge asymptotically if they are indeed assessing the same construct. Individual variability across specific samples is to be expected, characterized and then incorporated into the analysis rather than cited as a weakness.
The differences among the various definitions can be resolved in part with a convergence of the nomenclature involved, especially with respect to the term “minimum.” Although many authors have asked patients how much of a difference might be clinically important, few have incorporated the concept of a “minimum” in a statistical sense. Many have asked the question more in terms of what might be an important difference and then attached the term “minimum”. Perhaps the best way to resolve these grammatical differences is to remove the term “minimum” entirely and just talk in terms of a clinically meaningful or clinically significant effect. If these terms became the common vernacular, it would be more palatable to our consumers, the clinicians, and remove us from the myriad of acronyms associated with the term clinical significance.
A unifying clarification?
We hope we have clarified these issues to some extent. More importantly, we hope that the science can move beyond psychometric minutiae to clinical necessity so that our clinical colleagues may eventually come to accept QOL assessments as enthusiastically as we do, enabling patient voices to be heard in the decision-making equation. Towards these ends, we propose the following guidelines:
1) The method used to obtain an estimate of clinical significance should be scientifically supportable.
2) The ½ SD is a conservative estimate of an effect size that is likely to be clinically meaningful. An effect size greater than ½ SD is not likely to be one that can be ignored. In the absence of other information, the ½ SD is a reasonable and scientifically supportable estimate of a meaningful effect.
3) Effect sizes below ½ SD, supported by data regarding the specific characteristics of a particular QOL assessment or application, may also be meaningful. The minimally important difference may be below ½ SD in such cases.
4) If feasible, multiple approaches to estimating a tool’s clinically meaningful effect size in multiple patient groups are helpful in assessing the variability of the estimates. However, the lack of multiple approaches with multiple groups should not preemptively restrict application of information gained to date.
The four points are intended as guidelines, not rules. Perhaps we can all agree on these starting points.
The encouraging message is that the evidence to date suggests that all approaches to estimating clinical significance converge more than they diverge. This tenet is important to keep in mind when one is engrossed in the morass of psychometric anomalies. While all of us in the QOL “industry” enjoy discovering and describing the sources of variability, the “noise,” in QOL assessment, users of these instruments require practical guidance on “the signal” to apply and use them in the clinical world. We hope this editorial has helped clarify when the signal can be considered important.
References
1. Yost KJ, Cella D, Chawla A, Holmgren E, Eton DT, Ayanian JZ, West DW. Minimally important differences were estimated for the Functional Assessment of Cancer Therapy – Colorectal (FACT-C) instrument using a combination of distribution- and anchor-based approaches.
2. Lydick E, Epstein RS. Interpretation of quality of life changes. Qual Life Res. 2:221-226, 1993.
3. Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR, Clinical Significance Consensus Meeting Group. Methods to Explain the Clinical Significance of Health Status Measures. Mayo Clin Proc 371-383, 2002.
4. Moser DK. Psychosocial factors and their association with clinical outcomes in patients with heart failure: why clinicians do not seem to care. Eur J Cardiovasc Nurs. Oct;1(3):183-8, 2002.
5. Unruh ML, Weisbord SD, Kimmel PL. Health-related quality of life in nephrology research and clinical practice. Semin Dial. Mar-Apr;18(2):82-90, 2005.
6. Morreim EH. The impossibility and necessity of quality of life research. Bioethics 6:218-32, 1992.
7. Osoba D, Bezjak A, Brundage M, Zee B, Tu D, Pater J; Quality of Life Committee of the NCIC CTG. Analysis and interpretation of health-related quality-of-life data from clinical trials: basic approach of The National Cancer Institute of Canada Clinical Trials Group. Eur J Cancer. 41(2):280-7, 2005.
8. Ayanian JZ, Chrischilles EA, Wallace RB, Fletcher RH, Fouad MN, Kiefe CI, Harrington DP, Weeks JC, Kahn KL, Malin JL, Lipscomb J, Potosky AL, Provenzale DT, Sandler RS, van Ryn M, West DW. Understanding Cancer Treatment and Outcomes: The Cancer Care Outcomes Research and Surveillance Consortium. Journal of Clinical Oncology, Comments and Controversies, Vol 22(15):August 1, 2004.
9. Hays, R. D., Brodsky, M., Johnston, M. F., Spritzer, K. L., & Hui, K. Evaluating the statistical significance of health-related quality of life change in individual patients. Evaluation and the Health Professions, 28, 160-171, 2005.
10. Jaeschke R, Singer J, Guyatt GH. Measurements of health status: ascertaining the minimal clinically important difference. Cont Clin Trials 10:407-415, 1989.
11. Osoba D. A Taxonomy of the Uses of Health-Related Quality-of-Life Instruments in Cancer Care and the Clinical Meaningfulness of the Results. Med. Care. 40(6)(Supplement):III-31-III-38, 2002.
12. Frost MH, Sloan JA. Quality of Life Measurements: A soft outcome-or is it? The American Journal of Managed Care 8(18):S574-S579, 2002.
13. Movsas B, Scott C. Quality-of-life trials in lung cancer: past achievements and future challenges. Hematology - Oncology Clinics of North America. 18(1):161-86, 2004.
14. Bottomley A, Vanvoorden V, Flechtner H, Therasse P; EORTC Quality of Life Group, EORTC Data Center. The challenges and achievements involved in implementing Quality of Life research in cancer clinical trials. European Journal of Cancer. 39(3):275-85, 2003.
15. Giesler RB, Williams SD. Opportunities and challenges: assessing quality of life in clinical trials. Journal of the National Cancer Institute. 90(20):1498-9, 1998.
16. Baars RM, van der Pal SM, Koopman HM, Wit JM. Clinicians' perspective on quality of life assessment in paediatric clinical practice. Acta Paediatrica. 93(10):1356-62, 2004.
17. Welsh M. Parkinson's disease and quality of life: issues and challenges beyond motor symptoms. Neurologic Clinics. 22(3 Suppl):S141-8, 2004.
18. Wyrwich KW. Minimal Important Difference Thresholds and the Standard Error of Measurement: Is There a Connection? Journal of Biopharmaceutical Statistics 14(1):97 - 110, 2004.
19. Sloan J, Symonds T, Vargas-Chanes D, Fridley B. Practical guidelines for assessing the clinical significance of health-related quality of life changes within clinical trials. Drug Information Journal 37:23-31, 2003.
20. Tugwell P, Boers M, Brooks PM, Simon L, Strand CV. OMERACT 5: International Consensus Conference on Outcome Measures in Rheumatology: Minimal Clinically Important Difference Module. J Rheumatol 28:395-460, 2001.
21. Sloan JA. Assessing the Minimally Clinically Significant Difference: Scientific Considerations, Challenges and Solutions. Journal of Chronic Obstructive Pulmonary Disease 2:57-62, 2005.