BERA Conference, University of Lancaster, 12-15 September 1996

DRAWING OUTRAGEOUS CONCLUSIONS FROM NATIONAL

ASSESSMENT RESULTS: WHERE WILL IT ALL END?

By

Professor Roger Murphy

School of Education, University of Nottingham

ABSTRACT

This paper reviews the rising availability and misuse of league tables of assessment results in education in the UK. It calls for a government health warning to be introduced for such tables. It also recommends the creation of a national database of assessment results along with other kinds of data about individuals and institutions to improve the likelihood of informed rather than ill-informed analyses of these data.

KEYWORDS

Assessment; Results; League Tables; Examinations


1. The Growth of Large Scale Assessments in the UK

Forty years ago, many students went through the years of compulsory schooling in the UK with the 11+ examination as the only major encounter they had with any national educational assessment system. O level and A level examinations were then taken by a much smaller elite group, who stayed on at school after the statutory school leaving age.

A long sequence of reforms has led to the removal of 11+ selection (in most parts of the country) and the introduction of a considerable number of new assessment schemes, intended to be taken by most, if not all, students in school education in the UK.

Following last week's announcement concerning the introduction of national baseline assessments for 5 year olds from September 1998, students attending schools and colleges will in due course encounter six phases of national assessments at 5, 7, 11, 14, 16 and 18. Four of these assessment points (7, 11, 14 and 16) come at the end of the four stages of the National Curriculum and the other two occur during the first term of primary schooling and in the final term for those who stay in education from 16-18.

2. The Possible Uses of Large Scale Assessments

The introduction of new tiers of national assessments has often resulted from a variety of motives. The National Curriculum Consultation Document (DES, 1987), which heralded the introduction of new assessments at 7, 11, 14 and 16, referred to a wide range of purposes, including informing pupils, parents and teachers about the progress of individual students, as well as providing a basis for comparing the performance of individual schools, LEAs, and even the national education system from year to year (Murphy, 1989).

As with any kind of information, national assessment results can be used appropriately and inappropriately by a variety of users, with different requirements and different levels of understanding of the strengths and limitations of the numerical scores that are available to them. Goldstein and Myers (1996) have addressed aspects of this dilemma in proposing a tentative `code of ethics' for those producing league tables and other summaries of these so-called `educational performance indicators'.

The most appropriate use of this large scale assessment data is undoubtedly by and for individual pupils. Large scale and regular assessments can be used to inform educational choices and strategies, and will in many cases motivate and give added direction to the activities and efforts of individual pupils. The fact that the results are produced as part of a national system should lend them greater credibility, and at best this can give them greater meaning for those who are concerned to enable individual pupils to achieve their potential.

3. The Strengths and Weaknesses of Large Scale Assessments

Although some educationalists have opposed any increases in the use of educational assessment on principle (e.g. Holt, 1969), many others have argued for the benefits of appropriate assessments. Educational assessment can at best be a major contributor to good quality educational provision and school and pupil improvement, and at worst it can set inappropriate goals, demotivate unsuccessful individuals, and give quite misleading information to the public at large about the state of a nation's education system.

No educational assessment procedure is free from limitations, and it is now widely accepted that national assessment schemes have limited reliability (Wilmut, Wood and Murphy, 1996). Even within one area of the curriculum in a single year, any results produced give only an approximation of the achievements of the pupils who are assessed. Assess them again with a similar set of assessments and the results will change. Assess them again on other aspects of that part of the curriculum and the results can be wildly different. Assessment results in education are therefore only ever approximations at best, giving a broad view of the achievements of students in relation to the things covered in their particular assessment. Such assessments are not precise, and they are not at all robust when it comes to using them to make sophisticated comparisons or to answer complex questions about pupils' educational progress.
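To illustrate how much of the variation in such results can come from the assessment instrument rather than from the pupils, the short simulation below uses invented figures (it is a minimal sketch, not modelled on any real national assessment): each of 1,000 simulated pupils has a fixed underlying level of attainment and sits two parallel tests that differ only in their measurement error. Even though nothing about the pupils changes between the two sittings, a noticeable proportion of them end up in a different grade band.

    import random

    random.seed(1)

    def grade_band(score):
        # Crude illustrative grade bands on a 0-100 scale (hypothetical).
        return min(int(score // 10), 9)

    pupils = 1000
    changed = 0
    for _ in range(pupils):
        true_attainment = random.gauss(55, 12)                  # pupil's underlying level
        first_sitting = true_attainment + random.gauss(0, 6)    # test score plus measurement error
        second_sitting = true_attainment + random.gauss(0, 6)   # parallel test, new error
        if grade_band(first_sitting) != grade_band(second_sitting):
            changed += 1

    print(f"{changed / pupils:.0%} of simulated pupils change grade band between sittings")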

Assessment results are at their very weakest when it comes to comparing performance across different aspects of the curriculum - say, achievements in French with achievements in Mathematics - or across different years - say, the achievements of 16 year olds in History in 1996 with those of 16 year olds in 1966 (Goldstein, 1986; Nuttall, 1986; Cresswell, 1996; Murphy, Wilmut and Wood, 1996). A further area where assessment results are particularly weak is in comparing the quality of schools, colleges and individual teachers, or indeed the impact of government initiatives.

All of these shortcomings of national assessments can be very frustrating for those whose expectations have been raised by the availability of so many assessment results. At last these `consumers' of the educational system have the information which the government has been promising them, so that they can exercise those `choices' and, like the readers of Which? magazine, go out and find a `best buy'. The reality, however, is that most of the assessment information available on a national scale is too crude and indigestible to inform us without a great deal of analysis, interpretation and, crucially, cross-referencing with other types of information. In short, league tables based upon raw assessment results are rarely worth the paper they are printed upon in terms of the detailed insights they can give into the effectiveness of the institutions which they rank order. This conclusion was rather belatedly reached by the DFE in 1995, when it finally came to see the need for `value-added' analyses of GCSE, AS and A level examinations (DFE, 1995) - analyses which compare the achieved assessment results of any group of pupils either with their previous achievements or with other factors which can help to predict their expected levels of educational achievement.
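As a rough illustration of what such a `value-added' analysis involves - a minimal sketch using invented figures, not a description of the DFE's actual method - one simple approach is to regress each pupil's outcome score on a measure of prior attainment and to treat a school's average residual as its value added: the gap between what its pupils achieved and what pupils with similar starting points achieved elsewhere.

    # Invented data: (prior attainment score, outcome score, school).
    pupils = [
        (42, 45, "School A"), (55, 60, "School A"), (63, 64, "School A"),
        (40, 48, "School B"), (52, 61, "School B"), (68, 75, "School B"),
        (45, 41, "School C"), (58, 55, "School C"), (70, 66, "School C"),
    ]

    # Fit outcome = a + b * prior by ordinary least squares across all pupils.
    n = len(pupils)
    mean_x = sum(p[0] for p in pupils) / n
    mean_y = sum(p[1] for p in pupils) / n
    b = (sum((p[0] - mean_x) * (p[1] - mean_y) for p in pupils)
         / sum((p[0] - mean_x) ** 2 for p in pupils))
    a = mean_y - b * mean_x

    # A school's value added is the average gap between actual and predicted outcomes.
    value_added = {}
    for prior, outcome, school in pupils:
        value_added.setdefault(school, []).append(outcome - (a + b * prior))

    for school, residuals in sorted(value_added.items()):
        print(f"{school}: value added = {sum(residuals) / len(residuals):+.1f} points")

Ranking schools on these residuals can look quite different from ranking them on raw outcomes, which is precisely the point of the value-added approach.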

4. The Current Use of Large Scale Assessment Results

The rapid increase in national assessments, referred to in Section 1, is now producing a regular outpouring of results on almost a monthly basis. August is still the peak period, with GCSE, AS, A level and GNVQ results all hitting the pages of the media in quick succession. No sooner have the newspapers recovered from one set of sensational stories about "falling educational standards" or "examination boards setting papers that are too easy" than they are launching into another attempt to create something else out of another big data set resulting from a further round of national assessments.

For a developed country, with a highly developed education system and a sophisticated set of national assessment procedures, all of this is quite incomprehensible. We are apparently channelling large quantities of resources into producing assessment data, which we then do not bother to analyse beyond handing it out to media correspondents in a raw, undigested form and inviting them to create some stories out of it.

In the last few years these trends have led to increasingly outrageous claims being made on the basis of such data. Examination boards in particular have been accused of corrupting their grading standards to attract higher entries. Certain members of parliament have regularly attempted to rubbish the value of any of the qualifications introduced since they took the school leaving certificate in the 1940s. In 1996 we even had the spectacle of a major national newspaper claiming to have discovered from the GCSE results (the day before they were due to be released) that schools were "dumping pupils" - withdrawing them from GCSE entry - in order to improve their position in the league tables: an allegation that could have no substance, as the league table percentages are based upon GCSE examination results as proportions of pupils on a school roll rather than of those entering the examination. Since then we have had Dr John Marks re-working 1995 Key Stage 2 results in English and Mathematics to produce another shock horror story, which cannot be graced with the description of serious research.
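The arithmetic behind that allegation is worth spelling out. The figures below are invented purely for illustration: because the published percentage uses the number of pupils on roll as its denominator, withdrawing weaker pupils from the examination cannot raise it, although it would inflate a percentage calculated on entrants alone.

    # Invented figures: 200 pupils on roll, of whom 100 gain the grades that count.
    on_roll = 200
    achieving = 100

    for entered in (200, 150):  # before and after `dumping' 50 weaker pupils from entry
        pct_of_roll = 100 * achieving / on_roll
        pct_of_entrants = 100 * achieving / entered
        print(f"entered={entered}: {pct_of_roll:.0f}% of roll, "
              f"{pct_of_entrants:.1f}% of entrants")

    # The roll-based figure used in the league tables stays at 50% either way;
    # only the entrant-based figure (which the tables do not use) goes up.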

5. The Way Ahead

The situation as I have described it is, I believe, quite untenable and calls for rapid reform. I agree entirely with Goldstein and Myers (1996) that the answer is not to prevent the release of the data - that would be a retrograde step, which would be likely to fuel suspicion about why the data were not being released and would prevent the public from having the opportunity to benefit from this particular brand of `freedom of information'. What we need are measures to increase the likelihood of an informed use of national assessment results.

I would now like to propose two different strategies. The first is an insistence that a government health warning should be printed alongside any league table or simplistic analysis of raw assessment results. The scope and exact formulation of this warning will need some refinement, but it should encompass requirements that:

(a) All league tables of schools based upon raw assessment results should point out that no meaningful comparisons can be drawn about the quality or effectiveness of the schools unless their assessment results are interpreted in relation to information about the characteristics and prior achievement of the pupils entering them.

(b) All comparisons made between the achievements of pupils in different subjects should point out that they are likely to be severely hampered by the impossibility of equating assessment results across subjects.

(c) All comparisons of results in national assessments in different years should point out that such comparisons are influenced by the demographic characteristics of particular year groups, and may vary for demographic as well as educational reasons.

The second strategy is for a national database to be established, offering accredited educational researchers access to undertake analyses which explore the relationships between the assessment results achieved by groups of pupils and factors such as school type, home background, ethnicity and gender - analyses which could lead to a more meaningful reading of the overall data set of results achieved by pupils at 5, 7, 11, 14, 16 and 18, and where possible in further and higher education beyond those years. The DfEE and SCAA have already taken some steps towards developing such a database for their own purposes, but access needs to be assured for a wider range of users, and potential users need to have some say in deciding what sort of data should be available and how it should be configured.
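To make concrete the kind of linked, pupil-level structure such a database implies - a purely hypothetical sketch, with field names and values invented rather than taken from any DfEE or SCAA system - each pupil's record would carry background characteristics alongside results from successive assessment points, so that results can be broken down by any factor held on the record.

    # Hypothetical pupil-level records linking background factors to assessment results.
    records = [
        {"school_type": "comprehensive", "gender": "F", "ks2_level": 4, "gcse_points": 44},
        {"school_type": "comprehensive", "gender": "M", "ks2_level": 3, "gcse_points": 30},
        {"school_type": "selective", "gender": "F", "ks2_level": 5, "gcse_points": 52},
        {"school_type": "selective", "gender": "M", "ks2_level": 5, "gcse_points": 50},
    ]

    # A simple breakdown: average outcome score grouped by any background factor.
    def average_by(records, factor):
        groups = {}
        for record in records:
            groups.setdefault(record[factor], []).append(record["gcse_points"])
        return {key: sum(values) / len(values) for key, values in groups.items()}

    print(average_by(records, "school_type"))
    print(average_by(records, "ks2_level"))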

The responsibility for managing and maintaining this database should rest with the DfEE in conjunction with the Government Statistical Office. Highlights of the best analyses could then be published in an HMSO publication - one which would ideally contain more analysis and cross-referencing of data sets than the former DES Statistics of Education, and could be more along the lines of Social Trends. Educational researchers would also remain free to publish complementary analyses in independent academic journals and reports.

6. Conclusion

Much as people would like national assessments to operate as some kind of educational barometer, letting us know from month to month how things are going, they can never be that. Assessing educational achievement is much more complex than measuring rainfall levels or average temperatures. What we need in the future are more `value-added' analyses of assessment results, which have the potential to enlighten us about real progress and achievements, and fewer league tables of raw results, which are rarely worth the paper that they are printed upon. In this, as in many other cases, too little information can be a very dangerous thing.

We have already seen far too many outrageous conclusions being drawn from national assessment results. If we are going to justify all of the effort that goes into assembling them, then we need quickly to ensure that they are investigated much more systematically, so that better-informed interpretation can be encouraged. I hope that the proposals put forward in this paper can help to point the way, and the means, towards a better end.

References

Cresswell M (1996) Defining, setting and maintaining standards in curriculum embedded examinations: judgemental and statistical approaches. In: Goldstein H and Lewis T (Eds) Assessment: Problems, Developments and Statistical Issues, John Wiley, Chichester.

DES (1987) The National Curriculum 5-16. A Consultation Document, HMSO, London.

DFE (1995) GCSE to GCE A/AS Value Added - Briefing for Schools and Colleges, HMSO, London.

Goldstein H and Myers K (1996) Freedom of information: towards a code of ethics for performance indicators, Research Intelligence, 57, 12-16.

Goldstein H (1986) Models for equating test scores and for studying the comparability of public examinations. In: Nuttall D L, Assessing Educational Achievement, Falmer Press, Lewes.

Holt J (1969) The Underachieving School, Pitman, London.

Murphy R J L (1989) National assessment proposals: analysing the debate. In: Flude M and Hammer M, The Education Reform Act 1988: Its Origins and Implications, Falmer Press, Lewes.

Murphy R J L, Wilmut J and Wood R (1996) Monitoring A level standards: tests, grades and other approximations, The Curriculum Journal, 7 (3) (In Press).

Nuttall D L (1986) Problems in the measurement of change. In: Nuttall D L, Assessing Educational Achievement, Falmer Press, Lewes.

Wilmut J, Wood R and Murphy R J L (1996) A Review of Research Into the Reliability of Examinations. A report prepared for SCAA, London.