EVIDENCE BASED PRACTICE

Evaluating the Quality of Evidence

Client Assessment and Risk Evaluation (CARE)

The CARE:

  • Was developed to assist practitioners in weighing the quality of assessment procedures.
  • Is designed to compute a single index of the quality of a client assessment procedure that reflects the assessment’s ease of use, reliability (particularly inter-rater), and predictive validity.
  • Higher scores indicate a higher-quality assessment procedure.
  • Provides an index of what to look for in a practical, useful assessment procedure.
  • Instructions:
  • Please read the Explanation for each criterion on the CARE form with an eye to applying the criterion to any assessment procedure. The form is intended to rate the quality of any assessment procedure, regardless of whether the procedure follows a published measure, an interview guide, an agency procedure, or common practice. Give one point for each check mark. The form is based on the assumption that any assessment procedure ought to be simple to apply, reliably scored, and of predictive value regarding what clients will do in the future or against a more valid measure. Scores can range from 0 to 100 (see the scoring sketch after this list). This is only an ordinal scale, meaning that a score of 20 is higher than a score of 10 but not necessarily twice as high. No norms exist for the CARE form.
  • The CARE assumes background that does not appear in most practice and research texts. If items on the CARE form appear unfamiliar to you, please read the Detailed Explanation for CARE Criteria that follows this form. Criteria can be rated on the CARE form from documentation that accompanies assessment procedures without understanding the specifics in the Detailed Explanation. Standards for risk assessment and standards for judging the validity of an assessment against a more valid assessment procedure follow the same pattern here; for consistency, the discussion concerns risk.
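
To make the scoring rule concrete, the following minimal sketch (in Python; the function name, variable names, and example values are illustrative and not part of the CARE form) computes the 0-100 score from 19 checked/unchecked criteria:

  # CARE scoring sketch: one point per checked criterion,
  # score = (number checked / 19) x 100. Illustrative names and values only.
  def care_score(criteria_checked):
      """criteria_checked: a list of 19 booleans, one per CARE criterion."""
      if len(criteria_checked) != 19:
          raise ValueError("The CARE form has exactly 19 criteria.")
      number_checked = sum(bool(c) for c in criteria_checked)
      return number_checked / 19 * 100   # ordinal scale, 0 to 100

  # Example: a made-up procedure meeting criteria 1-5 and 9 only.
  checked = [True] * 5 + [False] * 3 + [True] + [False] * 10
  print(round(care_score(checked), 1))   # -> 31.6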

Source in APA format:______

______

Criterion / Points (1 Point for Each Criterion Checked) / Explanation
Utility of Assessment Procedure for Practice.
1. Assessment procedure easy to learn. / Physically examine the assessment procedure and its scoring procedure to rate this. Estimate whether you and your coworkers could do an assessment and score it without confusion simply by following the procedure's instructions.
2. Assessment takes less than 10 minutes. / The assessment's administration would take less than 10 minutes of additional time above what the client contact would generally take. To estimate time, do a trial with a few actual cases or with a role-played interview and time how long the assessment requires, or rely on published reports.
3.Assessment’s scoring less than five minutes. / Try scoring a few assessments to see how long the scoring takes. Allow for experience as a way to shorten scoring time. Scoring should take less than 5 minutes. You may rely on published reports of time required to score.
Reliability (i.e., Consistent with Each Administration Over Time, Across Raters, or Internally Across the Instrument’s Items)
4. Assessment procedure was checked for inter-rater reliability. / This means that two or more raters arrived at their assessments without conferring at all with the other raters. Give no points unless the authors state explicitly that assessments were done independently. Inter-observer, cross-observer, and across-raters all mean inter-rater.
5. Some (any) inter-rater reliability coefficient computed. / Any inter-rater coefficient will do here, so long as the assessments were made independently and some coefficient of agreement was computed.
6. Kappa coefficient of inter-rater reliability for assessment exceeds 0.70. / The authors must both compute a Kappa to rate the agreement of assessments made by independent raters and report a Kappa that exceeds 0.70. Since decisions in practice are binary (act/do not act), inter-rater reliability and its most appropriate statistic (Kappa) are criteria here.
7. Assessment procedure checked for a form of reliability other than inter-rater reliability (e.g., test-retest, split-half, internal consistency). / This criterion is met if the authors check for reliability using any procedure other than inter-rater reliability.
8. A reliability coefficient other than Kappa was computed and exceeds 0.70 (or 70%). / Give the point here if the authors compute a coefficient of reliability other than Kappa (e.g., Pearson r, Cronbach's alpha, Kuder-Richardson Formula 20) and the value is above 0.70.
Predictive Validity (The client's assessment demonstrates that it can actually predict how the client will perform in the future. The following discussion refers to risk, which is the probability of an undesirable behavior, but the same principles can apply to other standards for judging accuracy against a more valid criterion).
9. Those who developed the assessment procedure did a systematic review of studies to isolate indicators that might have predictive value to estimate risk. / Look here for a tabular literature review that lists studies and which indicators were of predictive value in each study. Give no points if you cannot find such a table in the report.
10. The authors clearly describe criteria for including clients of a particular type in their risk-assessment study. / Merely stating the client type (e.g., suicidal or depressed persons) is not enough. Authors must state the specific criterion or measure (e.g., specifically defined prior suicidal behaviors, Zung Self-Rating Depression Scale) for including subjects in the study. Knowing inclusion criteria allows practitioners to judge whether study findings apply to their clients.
11. The risk-assessment study's results were collected prospectively. / This means that indicators of risk were collected; then clients were followed to see what they would do, and then the indicators were evaluated for predictive efficiency against what the clients actually did or against another gold standard.
12. The risk-assessment study was done prospectively, and greater than 80% of subjects were contacted at follow-up. / Divide the number who were contacted at the end of the study regarding their actual behavior by the number who took the risk-assessment measure at the beginning of the study, and multiply by 100.
13. During the data analysis, those who recorded each subject's actual behavior were blind to what that subject's risk-assessment score had been. / This analysis compares the risk assessment's earlier results against what actually happened later to judge whether the assessment was accurate. Give a point only if the authors state that those who recorded the actual behavior were blind to what the prediction had been.
14. The risk-assessment measure's predictive accuracy was checked in at least one validation study. / Risk scales may predict well where they were developed but sometimes do not elsewhere. To meet this criterion, the measure's accuracy needs to be tested on a sample other than the one on which it was developed.
15. The risk-assessment scale's positive predictive value (PPV) was higher than the prevalence rate (base rate, prior probability) by at least 10%. / Applying a risk-assessment procedure that will not predict better than chance (the prevalence rate) makes no sense.
16. PPV is greater than .80. / The study computed the positive predictive value, or gives sufficient data to compute it, and PPV is greater than .80. If PPV was computed more than once, then the average PPV must be greater than .80.
17. NPV is greater than .80. / The study computed the negative predictive value, or gives sufficient data to compute it, and NPV is greater than .80. If NPV was computed more than once, then the average NPV must be greater than .80.
18. Using the same subjects, the authors compared the positive predictive value (PPV) of practitioners' predictions against the PPV of the risk-assessment scale's predictions, and the latter is higher.* / This kind of study pits the predictive accuracy of practitioners' assessments against a risk-assessment scale. This kind of evaluation assumes that the practitioners do not know the risk-assessment scale's score when they make their judgment.
19. The authors state specifically that they used a receiver operating characteristic (ROC) curve analysis to establish the risk assessment's cutoff or division criteria (e.g., the dividing point between high- and low-risk categories). / Any risk-assessment scale involves a trade-off. If you want to maximize your instrument's sensitivity to detect the positives, you will also increase your number of false positives. ROC analysis allows practitioners to make an informed judgment about where best to set the scale's division point(s). For a detailed description, consult MedCalc at
Total number checked (19 possible)
Score = (number checked / 19) x 100
Summary Statistics for Assessment Procedure
Inter-rater reliability Kappa for assessment procedure
Positive predictive value for assessment procedure
Negative predictive value for assessment procedure
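
For filling in the summary statistics above, the following minimal sketch (in Python; the function names and example data are illustrative assumptions, not prescribed by the CARE form) computes Cohen's Kappa from two raters' independent binary decisions, PPV and NPV from counts of predicted versus actual outcomes, and a simple ROC-style cutoff chosen by maximizing sensitivity + specificity - 1 (Youden's index):

  # Summary-statistics sketch for the CARE form. Illustrative names and data only.
  def cohens_kappa(rater_a, rater_b):
      """Cohen's Kappa for two raters' binary (0/1) decisions made independently."""
      n = len(rater_a)
      observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
      p_a1 = sum(rater_a) / n      # rater A's rate of "1" decisions
      p_b1 = sum(rater_b) / n      # rater B's rate of "1" decisions
      expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)   # agreement expected by chance
      return (observed - expected) / (1 - expected)

  def ppv_npv(tp, fp, fn, tn):
      """PPV and NPV from counts of true/false positives and true/false negatives."""
      return tp / (tp + fp), tn / (tn + fn)

  def best_cutoff(scores, outcomes):
      """Cutoff maximizing sensitivity + specificity - 1 (one ROC-based choice)."""
      positives = [s for s, o in zip(scores, outcomes) if o]
      negatives = [s for s, o in zip(scores, outcomes) if not o]
      best_c, best_j = None, None
      for c in sorted(set(scores)):
          sensitivity = sum(s >= c for s in positives) / len(positives)
          specificity = sum(s < c for s in negatives) / len(negatives)
          j = sensitivity + specificity - 1
          if best_j is None or j > best_j:
              best_c, best_j = c, j
      return best_c

  # Made-up examples only:
  print(cohens_kappa([1, 1, 0, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0, 0, 0]))  # -> 0.75
  print(ppv_npv(tp=40, fp=10, fn=5, tn=45))                                # -> (0.8, 0.9)
  print(best_cutoff([3, 7, 9, 2, 8, 4], [0, 1, 1, 0, 1, 0]))               # -> 7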

* Some studies report only sensitivity, specificity, and prevalence. You can still compute PPV with Bayes’s Theorem as follows:

PPV = [(prevalence)(sensitivity)] / {[(prevalence)(sensitivity)] + [(1 - prevalence)(1 - specificity)]}
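
A minimal sketch of that computation (in Python; the example prevalence, sensitivity, and specificity values are made up, not taken from any study):

  # PPV from prevalence, sensitivity, and specificity via Bayes' theorem.
  def ppv_from_bayes(prevalence, sensitivity, specificity):
      p_pos_with_condition = prevalence * sensitivity                 # positive test, condition present
      p_pos_without_condition = (1 - prevalence) * (1 - specificity)  # positive test, condition absent
      return p_pos_with_condition / (p_pos_with_condition + p_pos_without_condition)

  # Example with made-up values: prevalence .10, sensitivity .90, specificity .85.
  print(round(ppv_from_bayes(0.10, 0.90, 0.85), 3))   # -> 0.4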


With permission from Gibbs, L. (2003). Evidence-based practice for the helping professions: A practical guide with integrated multimedia. Pacific Grove, CA: Brooks/Cole-Thomson Learning.