
Assessment Plus[1]

A selected review of the assessment criteria literature

Hannah Robinson and Lin Norton

Liverpool Hope University College

Context

The idea for this paper arose from two separate but related tasks for the newly appointed project worker at Liverpool Hope: i) a review of the literature and ii) arranging interviews with lecturers about assessment. The implications presented at the end of this paper come from this review and from informal discussion with colleagues in preparing the interview schedule.

Literature review

There is a great deal in the literature relating to assessment. Norton & Brunas-Wagstaff (2000) characterised assessment as a multi-faceted process that has several aims:

  1. Providing a means by which students are graded, passed or failed
  2. Licensing students to proceed or practise
  3. Enabling students to obtain feedback on the quality of their learning
  4. Enabling teachers to evaluate the effectiveness of their teaching
  5. Maintaining academic standards of awards and award elements

The review that follows relates specifically to the purpose most commonly characterised as assessment used for grading students (Samuelowicz & Bain, 2002).

Categorizing assessment criteria

Assessment criteria are widely used in the education system when students’ work is being marked. Nevertheless, exact definitions of core criteria vary across departments and between institutions. There is no over-arching definition of what core criteria are, or how they should be weighted, and the views of students and tutors on ‘what makes a good essay’ vary considerably (Norton, 1990; Norton, Brunas-Wagstaff & Lockley, 1999).

Using assessment criteria

Freeman and Lewis (1998) observe that one of the main barriers to the successful use of assessment criteria is that criteria are insufficiently explicit. This can lead to a mismatch between students’ and tutors’ interpretations of the language of assessment criteria (Higgins, Hartley and Skelton, 2002; Merry, Orsmond and Reiling, 1998; 2000) and between interpretations made by tutors in the same department, marking the same essay (Mazuro & Hopkins, 2000; Webster, Pepper and Jenkins, 2000). This problem arises, in part, from the vague nature of the words used when outlining assessment criteria. ‘Adequate’, ‘well-developed’ and ‘weak’ may convey some meaning but they are imprecise. Individuals can mean something different even when they use the same words (Hand and Clewes, 2000).

A related difficulty with understanding assessment criteria is the tendency for different assessors to attribute different levels of importance to criteria – the familiar student complaint that ‘different markers want different things’. Norton (1990) observed that whilst tutors clearly look for a range of things when marking essays (such as answering the question and structure), and there is some consistency in what different assessors view as important, there are also ‘idiosyncratic concerns’, such as style and content. At present, published guidelines tend not to set out explicit weightings for each criterion, allowing tutors some freedom in their judgment, but this leads to difficulties for students. It has also been noted that whilst departments may assure students that work is criterion-referenced, in practice peer-referencing and norm-referencing are still used by many tutors. Norton and Norton (2001) found that this was especially the case when tutors were marking borderline essays. There are also reports of tutors reassessing papers they had marked early in a batch of essays if they were surprised by a particularly good or bad performance from a student cohort (Ecclestone, 2001).

Ecclestone (2001) has argued that assessment criteria may improve students’ learning and motivation as well as the standard of academic work produced. Again, though, a lack of explicit discussion about what is meant by assessment criteria can restrict their usefulness. O’Donovan, Price and Rust (2000) found that students needed help in using the criteria they were given, and that each criterion needed to be explained, preferably with examples. Students look for reliable maps to guide them through the minefield of essay writing (Webster et al., 2000); they seek to gain the best marks they can and the majority would happily follow a ‘checklist’ if a good grade were assured. Nevertheless, calls for ever more explicit guidelines are open to the suggestion that students will merely know ‘what to do to get a 2:1’ rather than gain a meaningful understanding of their chosen subject.

Most tutors seek to encourage their students to take a deep rather than a surface approach to learning (Biggs, 1999), and it has been argued that explicit criteria facilitate a deep approach. Gipps (1994) has suggested that students should be regarded as ‘novice assessors’ who need to internalize marking guidelines in order to produce work of a good standard. However, assessment criteria alone cannot be seen as the answer to problems of student motivation. In some cases students seem to regard the criteria as a hindrance to their academic development. An unreferenced quotation from a student illustrates this point: ‘I follow the criteria to get good marks but when I finish I’ll write the way I want.’

Democratic practices and fairness

Norton & Brunas-Wagstaff (2000) suggested that if students perceive the assessment system as unfair, they will be more inclined to take a surface or strategic approach to their learning. Ecclestone (2001) has also suggested that precise definitions of teaching outcomes and criteria can lead to more democratic practices in assessment as the process is demystified. In general, the development from novice to expert, in any area, is characterised by a declining dependence on rules, routines, and explicit deliberation. Experts are more intuitive, less deliberative and less able to articulate the tacit knowledge on which much of their decision-making has come to depend.

Wolf (1995) has claimed that ‘habits’ develop when assessors mark work regularly. In some cases, a marker may allow personal bias to affect judgment or allow knowledge of a student to influence the marking, perhaps by compensating for an uncharacteristically poor performance by an otherwise able student. Wolf believes these habits are widespread and that markers tend to be oblivious to them. However, a lack of discussion about this topic blurs the distinction between ‘unjustified prejudice’ and ‘justified interpretation’.

Thus although criteria make assessment more amenable to moderation and standardization between markers, gaining explicit agreement about criteria and communicating this to staff and students needs to be an ongoing process. This need for reconstruction can be attributed in part to the way that assessors acquire and refine their own internalized model of assessment over time. This ‘mental model’ is affected by previous experience of marking as well as by previous experience of being assessed. Wolf (1995) has gone so far as to suggest that the ‘mental model’ of quality is applied irrespective of written guidelines and that it is especially powerful when new assessment guidelines are introduced. Furthermore, Hand and Clewes (2000), whilst acknowledging the value of criterion referencing, have pointed out that too many criteria could diminish the importance of tutors’ judgments (they refer specifically to the marking of dissertations) and lead to an increase in ‘marking fatigue’, which is itself a cause of much of the variability found in assessment quality. Laming (2003) offered some interesting evidence, from a comparison of judgment in psychophysical experiments with judgment in marking, to support his contention that human markers find it difficult to distinguish reliably between more than five discrete categories.

Assessment guidelines can be seen as an important tool for giving novice assessors confidence to take part in the moderation process. This is important as many academics report feelings of discomfort and fear when participating in exam boards or when double-marking work (Hand and Clewes, 2000). Partington (1994) has gone so far as to suggest that explicit assessment criteria that are freely available to staff and students should negate the need for double-marking. In practice though, many tutors see exam boards as a staff development opportunity where individuals’ views on standards and qualities can be discussed and novice tutors can gain insight into the marking process (Hand and Clewes, 2000).

The case for specific guidelines

A strong case can be, and has been, made for the use of explicit assessment guidelines (e.g., Elander, 2002; Elander & Hardman, 2002). Studies have highlighted both the lack of validity (Newstead and Dennis, 1990) and the lack of inter-rater reliability between assessors (Caryl, 1999; Newstead and Dennis, 1994). It is argued that clear guidelines should reduce these problems (Elander, 2002). Marking is subject to all kinds of influences and biases (such as order and practice effects, fatigue, and knowledge of the student), which are well documented in the literature. There is also a need for novice markers to be given advice and training.

The possible barriers to specific guidelines

Yet there is also evidence in the research literature that academic staff may well resist anything that looks like ‘extra work’, given the heavy marking loads that they are currently facing (Norton & Norton, 2001; Newstead, 2003). Added to this is the increasing pressure to give quality feedback (Mutch, 2003), which represents a considerable investment in time. Even if the practicalities of the workload problems can be overcome, there is still the issue of persuading experienced staff to use detailed assessment criteria guidelines. Many academics relish their ‘expert status’ and have a distaste for specified assessment criteria, especially if all criteria are to be considered in each case (Webster et al., 2000). It is claimed that such an approach is formulaic, constraining and even artificial, and thus capable of producing a final mark which bears little relationship to anything that would be given in other circumstances (DeVries, 1996).

Implications for markers

The following list of questions is not comprehensive, but is intended to be a stimulus to aid readers’ thinking:

·  How are we going to persuade experienced lecturers, comfortable in their ‘expert status’, to use explicit assessment criteria in their marking?

·  How are we going to persuade lecturers (regardless of experience) that using detailed criteria will not impose extra work and pressurize them even more in meeting tight turn-around times?

·  What do we think about the Gestalt maxim that ‘the whole is greater than the sum of its parts’ when marking? Currently, staff tend to use their discretion to adjust the final overall mark, but consistent use of criteria will prevent them from doing this.

·  Are we sure that explicit assessment criteria are actually encouraging students to take a deep approach to their written work, or might we not inadvertently be causing them to take a strategic approach?

·  Are there issues around applying specified core assessment criteria to highly individual pieces of written work, such as negotiated learning agreements, dissertations, portfolios, etc?

·  The literature refers to discrepancies between tutors’ comments and the marks awarded (Mazuro & Hopkins, 2002; Webster et al., 2000). Does this pose a problem when thinking about constructive feedback that is at the same time commensurate with the grade awarded according to the core assessment criteria (i.e., how do we balance positive feedback with not scaring students by the bluntness of the core criteria)?


References

Biggs, J. (1999) Teaching for Quality Learning at University. Buckingham: Society for Research into Higher Education and Open University Press.

Caryl, P.G. (1999) Psychology examiners re-examined: a 5-year perspective. Studies in Higher Education, 24, 61-74.

DeVries, P. (1996) Could ‘criteria’ in quality assessments be classified as academic standards? Higher Education Quarterly, 3, 193-206.

Ecclestone, K. (2001) ‘I know a 2:1 when I see it’: Understanding degree standards in programmes franchised to colleges. Journal of Further and Higher Education, 25, 301-313.

Elander, J. (2002) Developing aspect-specific assessment criteria for examination answers and coursework essays in psychology. Psychology Teaching Review, 10, 1, 31-51.

Elander, J. and Hardman, D. (2002) An application of judgment analysis to examination marking in psychology. British Journal of Psychology, 93, 303-328.

Freeman, R. & Lewis, R. (1998) Planning and implementing assessment. London: Kogan Page.

Gipps, C.V. (1994) Beyond testing. London: Falmer.

Hand, L. & Clewes, D. (2000) Marking the difference: an investigation of the criteria used for assessing undergraduate dissertations in a business school. Assessment & Evaluation in Higher Education, 25, 5-21.

Higgins, R., Hartley, P. and Skelton, A. (2002) The conscientious consumer: reconsidering the role of assessment feedback in student learning. Studies in Higher Education, 27, 1, 53-64.

Laming, D. (2003) Marking university exams. Presentation at a one-day seminar on Assessment in Psychology degrees, St Barts Hospital, London, 21 March 2003.

Mazuro, C. and Hopkins, L. (2002) Different assessment criteria for different levels of a psychology degree: Does a 2.1 at 1st year have to meet different criteria than a 2.1 at 3rd year? Paper given at the Psychology Learning and Teaching Conference, University of York, York, 18-20 March 2002.

Merry, S., Orsmond, P. and Reiling, K. (1998) Biology students’ and tutors’ understanding of ‘a good essay’. In C. Rust (Ed.) Improving Student Learning: Improving Students as Learners. Oxford: The Oxford Centre for Staff and Learning Development.

Merry, S., Orsmond, P. and Reiling, K. (2000) Biological essays: how do students use feedback? In C. Rust (Ed.) Improving Student Learning: Improving Student Learning through the Disciplines. Oxford: The Oxford Centre for Staff and Learning Development.

Mutch, A. (2003) Exploring the practice of feedback to students. Active Learning in Higher Education, 4, 1, 24-38.

Newstead, S.E. (2003) The purposes of assessment. Presentation at a one-day seminar on Assessment in Psychology degrees, St Barts Hospital, London, 21 March 2003.

Newstead, S.E. & Dennis, I. (1990) Blind marking and sex bias in student assessment. Assessment & Evaluation in Higher Education, 15, 132-139.

Newstead, S.E. & Dennis, I. (1994) Examiners examined: the reliability of exam marking in psychology. The Psychologist, 7, 216-219.

Norton, L.S. (1990) Essay writing: What really counts? Higher Education, 20, 4, 411-442.

Norton, L. and Brunas-Wagstaff, J. (2000) Students’ perceptions of the fairness of assessment. Paper presented at the first annual conference of the Institute for Learning and Teaching in Higher Education, ILTAC 2000, York, 27-29 June 2000. (Paper available on the ILTHE website: www.ilt.ac.uk)