Darr (2005a) A hitchhiker’s guide to validity
What does validity refer to? What is the recent view of validity?
The degree to which an assessment tool measures what it claims to measure.
The recent view focuses more on the interpretations and decisions made from assessment results.
How can we work out whether an interpretation/decision has an appropriate degree of validity? What would we look for?
- Wide enough range of content.
- Enough items to test scope.
- Requires use of desired skills/ processes.
- Tasks matching the learning intention
- Emphasis on deep knowledge
- Assessment is clear and unambiguous
- The effects of time limits considered.
- Not favouring certain groups (e.g. the role English plays in a maths assessment, which can favour English-speaking groups).
- Fair reading demands
- Suitable language.
How do we know what kind of evidence to look for?
- The evidence we collect should address considerations relevant to:
- Content.
- Construct.
- Criterion.
- Consequential.
Explain each of the following and provide an example:
Content considerations:
We want a fair sample of the area of learning we are interested in. Does your assessment tool, for example, cover the scope of the learning domain? If we have a fair sample we can make valid inferences and construct a validity argument.
Construct considerations:
This is about establishing whether certain characteristics or traits have been developed. We need to show that a particular construct (i.e. trait) is essential for success in the assessment; in other words, success should depend on the desired construct.
E.g. in maths, "the ability to reason with numbers" might be a trait you want to assess. In reading, the ability to "make inferences from the text" might be a trait for assessment.
Consequential considerations:
- The consequences of teaching and learning
- We should question an assessment's use if its consequences are detrimental to educational goals or limit classroom experience.
- A negative example of a consequential consideration would be “teaching to the test”.
Select three factors from Fig. 1 and explain how each affects the validity of an interpretation
· Do the tasks match the learning intention? It is important to test on what is being taught.
· Are the questions ambiguous? If so you may be testing the ability to interpret questions, not the intended learning area.
· Time limits: You might end up testing how well your students respond to time pressures, not their ability in the learning area you are interested in.
Darr (2005a) A hitchhiker’s guide to reliability
What is reliability concerned with?
- The consistency of results we obtain from assessment.
Explain briefly each of the three types of consistency:
· Consistency across time - Would the student get a similar result at a different time (for example, a different day or time of day)?
· Consistency across tasks. - Are the tasks relevantly similar in nature in relation to the learning area being assessed?
· Consistency across markers- Is each person marking and assessing the test in the same way? This is why we have moderation.
How is the degree of consistency related to reliability?
The higher the level of consistency the more reliable the results.
What is a reliability coefficient and what does it tell you?
- A reliability coefficient expresses, on a scale from 0 to 1, how consistent an assessment's results are (how closely repeated administrations or different markers agree). It tells us whether an assessment or test is appropriate for a given purpose (more detail below).
Explain the relationship between the importance of a decision and the degree of reliability that is required.
High reliability is required for significant decisions (ones that will affect the confidence of a child, for example); lower reliability can be tolerated for lower-stakes decisions. This relates to the above comment on the appropriateness of assessments.
What is the standard error of measurement?
The standard error of measurement tells us how far an observed score is likely to be from a student's "true" score. This helps us to see student achievement as lying within a range, rather than as a single test score.
Why do you think the standard error of measurement is important?
As above: it allows accurate reporting of student achievement within a range, rather than as a single, possibly misleading, score.
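The article does not give the formula, but the standard classical-test-theory relationship SEM = SD × √(1 − r) shows how the reliability coefficient and the standard error of measurement connect. A minimal sketch, with invented numbers for illustration:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# A hypothetical test with a score standard deviation of 10
# and a reliability coefficient of 0.91:
sem = standard_error_of_measurement(10, 0.91)  # about 3 score points

# Report a student's score of 55 as a range rather than a single number:
score = 55
band = (score - sem, score + sem)
print(f"Score {score}, likely true-score range {band[0]:.0f}-{band[1]:.0f}")
```

Note how higher reliability shrinks the band: a perfectly reliable test (r = 1) has an SEM of 0, while lower reliability widens the range within which we should report achievement.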
What is triangulation and why is it important?
- Using three (or more) different sources of information as the basis of decision making. It is important because it reduces the chance that error in any single assessment distorts the decision.
Darr provides a list of factors that affect reliability – select two and explain.
· Suitability of questions: We want to test students in a specific learning area, not just how well they interpret the questions in a specific assessment.
· Training of assessors: Trained assessors produce fair, equitable, and consistent results (this is also why we need moderation).