In one of our recent webinars for school leaders, we heard from Oxfordshire headteacher Steve Dew about improving assessment with comparative judgement – both within his school and in collaboration with 14 other local primary schools.
The history of comparative judgement
Although Comparative Judgement has only gained popularity as an assessment tool relatively recently, the Law of Comparative Judgement was first published by psychologist Louis Thurstone in 1927. The ranking of students through professional consensus, rather than marking individual pieces of work, dates back further still – with examinations at Cambridge prior to 1792 reputedly judged by a team of academics who would collaboratively and qualitatively decide upon a rank order of students1. In the late 1800s it was the quest for fairer, more reliable, and – crucially, with rising student numbers – scalable assessment that led to the introduction of the more quantitative system that we think of as 'traditional marking' today. Thanks to advances in digital technology, however, the modern incarnation of Adaptive Comparative Judgement allows us to combine the benefits of both approaches.
The problem with marking
One huge challenge for schools, of course, is the teacher workload involved in marking and moderation. The Education Endowment Foundation (EEF) identified in their report 'A Marked Improvement' an urgent need to investigate more effective marking approaches. There is a growing realisation that traditional marking simply doesn't stack up well in a cost-benefit analysis: teachers spend vast swathes of their non-teaching time on it, yet there is little evidence that marking has a significant impact on student performance.
Indeed, we know that it is particularly difficult for teachers to make consistently accurate judgements about the standard of students' work against a universal marking framework. Such frameworks are inevitably open to interpretation and even bias, and can also produce a skewed analysis of ability – if, for example, a pupil is able to evidence the criteria demanded by the marking framework but is lacking in other respects. Steve Dew's experience of marking moderation against the Key Stage 2 teacher assessment guidance for writing offers a perfect example of this. Talking about the writing moderation sessions that used to take place within a local partnership of primary schools prior to the adoption of a Comparative Judgement approach, he says: "Where the difficulties came in was the reliability of the judgements that people were making... so whereas one person might say 'this is expected', another might say about the very same piece of work that that was 'working towards' or 'greater depth' – and we'd have a whole range of views by the end of that session about where that work should be placed against expectations."
A new approach
The schools decided to look for an alternative approach to moderation – one that would not only increase reliability but also allow them to feed their learning from each moderation process back into improving writing in each of their schools. Using RM Compare software to allow each teacher to make better/worse judgements against specific criteria (e.g. narrative flow) between successive pairs of scripts, the partnership found that they were able to achieve both aims. When faced with just two scripts to assess at a time, teachers were more likely to agree, and in the rare cases where there was a wide disparity of opinion, the algorithm would resurface that piece of work for further judgements until a clear consensus was reached. The final output was a highly accurate ranking of work and an array of analytic reports by skill, by school, and even by pupil, which schools could use to better inform the focus of their teaching and development of writing.
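To make the idea concrete, here is a minimal sketch of how many pairwise "better/worse" judgements can be combined into a single ranking, using a simple Bradley-Terry model (closely related to Thurstone's approach). This is illustrative only: RM Compare's actual adaptive algorithm is its own, and the script names and judgement data below are invented.

```python
# Illustrative only: a basic Bradley-Terry fit over made-up pairwise
# judgements. Each tuple records one judgement as (winner, loser).
judgements = [
    ("script_A", "script_B"),
    ("script_A", "script_B"),
    ("script_B", "script_C"),
    ("script_B", "script_C"),
    ("script_C", "script_A"),
    ("script_A", "script_C"),
]

scripts = sorted({s for pair in judgements for s in pair})
strength = {s: 1.0 for s in scripts}  # initial ability estimates

# Iterative (minorisation-maximisation) update: each script's strength
# becomes its win count divided by the sum of 1/(own + opponent strength)
# over all comparisons it took part in.
for _ in range(100):
    for s in scripts:
        wins = sum(1 for winner, _ in judgements if winner == s)
        denom = sum(
            1.0 / (strength[s] + strength[opp])
            for winner, loser in judgements
            for opp in ((loser,) if winner == s else (winner,) if loser == s else ())
        )
        if denom > 0:
            strength[s] = wins / denom
    # Rescale so strengths sum to a constant (only ratios are meaningful).
    total = sum(strength.values())
    for s in scripts:
        strength[s] *= len(scripts) / total

ranking = sorted(scripts, key=strength.get, reverse=True)
print(ranking)
```

The more judgements a script accumulates, the more stable its estimated strength becomes – which is why resurfacing a disputed script for extra judgements, as described above, drives the partnership towards consensus.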
Described by Ross Morrison McGill, founder of Teacher Toolkit, as "an assessment system that goes places traditional marking cannot reach,"2 Adaptive Comparative Judgement could well be the 'more effective system of assessment' that the EEF exhorts us to look for.