The exams algorithm is dead, long live the teacher
As 2020 draws to a close, the dust is still a long way from settling on the qualifications chaos that ensued after the coronavirus pandemic took hold in March. The National 5 exams for next year were cancelled in October and, this week, the Scottish government announced the Higher and the Advanced Higher exams would also not be going ahead. The Welsh government has taken a similar decision - albeit somewhat earlier - saying it will be turning to teacher judgement rather than exams for its qualifications.
However, grade estimation is something that we are told teachers do badly - indeed, it is often put forward as an argument for externally marked end-of-year exams. But is that accusation fair and, if so, what does Scotland need to do to ensure that an A in Aberdeen is the same as an A in Annan or anywhere else, as education secretary John Swinney has promised?
This year, when the Scottish Qualifications Authority (SQA) was setting out its rationale for moderating teacher estimates - a moderation process that was, of course, ultimately thrown out - it said it had looked at how accurately teachers had predicted their pupils’ final awards in the previous year. It found that just 48 per cent of the grades estimated at National 5 in 2019 matched the grades ultimately awarded. The equivalent figure for Higher was 44 per cent; at Advanced Higher, it was 43 per cent.
These findings prompted SQA chief executive Fiona Robertson to remark that “the accuracy of teacher estimates was below 50 per cent”, so the SQA “had a responsibility to consider the moderation of teacher estimates where appropriate”.
Then, on results day in August 2020, we were told that teacher estimates alone would have resulted in record levels of attainment, with the pass rate up 10.4 percentage points at National 5, 14 percentage points at Higher and 13.4 percentage points at Advanced Higher. Asked to explain the overestimation, Robertson said that “some teachers and lecturers may have been optimistic, given the circumstances”.
This echoed language used in an Ofqual document published the month before in England, which laid the groundwork for major changes to be made to teacher estimates. It said that the “optimistic” grades submitted by schools and colleges would have resulted in big improvements in GCSE and A-level results this year.
Ultimately, the furore over the so-called “mutant algorithms” used to moderate grades across the UK meant that the awarding bodies were forced to accept teacher estimates - optimistic or not.
But were overoptimistic teachers really the issue here? What about the possibility that judging teacher estimates against exam results is like comparing apples with pears - and that if you ask a teacher to make “a holistic professional judgement based on a candidate’s attainment in all aspects of the course”, as the SQA did, you will get a different outcome than from an exam, but not necessarily a wrong one?
In short, could teacher judgement actually be a better indication of a pupil’s true ability?
Accurate assessment?
It is a viewpoint to which Mark Priestley and Marina Shapira - the academics who carried out the independent rapid review of the 2020 exams fiasco that reported in September - are sympathetic. In their report, they note that several respondents said “the issue with the divergence of estimates and historical performance this year may not be due entirely to inaccurate estimation by centres (as SQA have consistently stated)”.
Rather, these respondents suggested that the difference could have been a sign that the push to close the attainment gap was working - and that teacher estimation might provide “a more accurate assessment of achievement than exams (which are said to disadvantage some learners)”.
Priestley expanded on this point when he and Shapira gave evidence to the Scottish Parliament’s Education and Skills Committee in November. He raised the possibility that “teacher estimates measure something different” to exams, adding: “The issue is not so much about how teacher estimates can predict an exam performance - it is about whether they provide a more valid measure of student performance over time.”
It is an argument that Jo-Anne Baird, a professor of educational assessment at the University of Oxford, has heard before, and one she is quick to dismiss. She says the research shows not just that teachers are bad at predicting their students’ exam results, but also that they are inconsistent in their judgements of pupil performance.
“Some argue that [teacher estimates] are more valid because the teachers know students’ knowledge and understanding better,” she says. “It is hard to argue that they are more valid if they are not reliable.”
Gill Wyness - an associate professor of economics, and deputy director of the Centre for Education Policy and Equalising Opportunities at the UCL Institute of Education - is similarly unconvinced. Her research - based on three years of UK university applications data - showed that only 16 per cent of applicants’ predicted grades were accurate, with the vast majority (75 per cent) receiving overestimated grades. It also showed, however, that teachers are not always optimistic, and that high-attaining, disadvantaged students are significantly more likely to receive pessimistic grade predictions.
Wyness says: “I don’t have much sympathy for the argument that teacher estimates could be more valid than exams, simply because that wouldn’t explain why we see biases in them. My research shows that high attainers, low SES [socioeconomic status] and state school kids receive less generous predictions than high SES and private school kids. The implication of that hypothesis would be that teachers’ estimates are somehow more valid for more advantaged pupils, and that doesn’t seem correct to me.”
However, there does seem to be general agreement that none of this is easy. Teachers might come in for criticism for inaccuracy in grade estimates but, equally, we all witnessed what happened when it was left to algorithms to allocate students’ grades.
Wyness was involved in research that tried to improve the accuracy of predictions using statistical and “machine learning” methods (which apply artificial intelligence) based on pupils’ prior achievement. In the event, these made only “modest improvements” on teacher estimates: even the best models correctly predicted the grades of only about one in three pupils and, overall, the predictions generated by the models were incorrect for 74 per cent of students.
Wyness says: “Notably, we find that high-achieving non-selective state school pupils are more likely to be underpredicted compared with their selective state and private school counterparts from our models, suggesting these pupils may just be harder to predict.” She concludes that “predicted grades are quite inaccurate”, adding: “This is not great news given that predicted grades are here to stay in Scotland at least.”
We still do not know why roughly a quarter of teacher judgements in Scotland were not in line with the SQA’s expectations, says Louise Hayward, an assessment expert based at the University of Glasgow. She says there should have been a research project following up on what happened in the summer to investigate why teacher judgements diverged. “It’s important to remember that 75 per cent were not different, but 25 per cent were, and it seemed to be there were more differences of view in some of the more disadvantaged areas,” notes Hayward.
“That triggers questions rather than providing answers: why is that the case? If it turns out there are differences in terms of understanding the standards, then what you might be doing is building in disadvantage because teachers in areas of deprivation are feeding back to pupils ‘this meets the standard’ when it does not. Another scenario is that the way you discern standards does not give a dependable profile that relates to young people’s achievements, and that’s particularly obvious in areas of poverty.
“The problem, though, is we don’t know and we won’t know, and we will continue to guess unless we go in and investigate.”
More research required
Priestley’s rapid review recommended further research into the exams debacle, but that was the only recommendation not to be accepted in full by the government. Ministers said that they might consider this as part of the government’s future education research strategy, but that it was not a priority because there was “no intention to have a similar model in support of awarding in 2020-21”.
What model the government does intend to have remains unclear. At the time of writing, the detail is expected to be published before Christmas. Swinney has said “there will be no algorithm” and there will be “quality assurance”. Priestley’s proposal of “a nationally recognised, fully transparent and proportionate system for moderation of centre-based assessment” was accepted by the government.
Hayward, who started her career in education as an English teacher, says that the best way to promote a common understanding of standards is to bring together groups of teachers equipped with rubrics - basically, the grading criteria - and sample scripts, and get them discussing the grade they would award. “When I was teaching, I didn’t understand what the standard was until I worked for the SQA. After I saw a few examples of young people’s writing, I began to understand what a Higher A was, and how that was different to a B and a C. You don’t learn that by osmosis - there’s a process you have to go through.”
Baird says if you want to know how to do teacher assessment well, the model to look to is in Queensland, Australia. High-stakes public external exams were scrapped there in 1972, in favour of teacher-devised assessments and judgement. At the heart of the model is teachers getting together and having conversations about standards, says Baird - which chimes with the approach advocated by Hayward.
Despite the challenge that lies ahead, Baird - who was born and bred in Scotland - believes cancelling the exams was the right thing to do because it gives certainty and allows more time for a robust replacement system. But this time around there will be no “mutant algorithm” to act as a buffer between schools and families if students believe they have been disadvantaged.
Edinburgh headteacher David Dempster, in an article for Tes Scotland in November, said that teachers will “have the job of defending the [N5] grades they award against unhappy students and parents, some of whom will be convinced that there would have been a better outcome for them if the exam had been sat”.
Schools are also likely to find themselves responding to appeals next year, as opposed to being the ones making them. This means that they will have to be scrupulous about recording the steps that lead up to every estimate they make, warns Baird.
So, should we brace ourselves for another SQA exams debacle in 2021?
Exam results always come under scrutiny, and rightly so, given that they dictate which pathways are open and closed to pupils after school. But this summer taught us that not having exams - and having little time to implement a robust replacement system - makes results day even more contentious.
Not much has changed since then. There is still little time to prepare for this new approach to assessing pupils, and the prevalence of the coronavirus means there could be more disruption to come.
The key thing, Baird believes, is that the time available is used “to do a better job than we did last year”. For all the complexity of awarding qualifications in the era of Covid-19, that, surely, is a goal that should not prove beyond the Scottish education system.
Emma Seith is a reporter for Tes Scotland
This article originally appeared in the 11 December 2020 issue under the headline “The mutant algorithm is dead, long live the teacher”