Will a machine soon be doing your marking?

With an increasing number of AI-powered feedback generators in development, machine learning looks set to cut teachers’ marking workloads. But how close are we to that reality, and would the sector welcome it? Zofia Niemtus and Kate Parker investigate
2nd November 2022, 5:00am

When Rishi Sunak was on the campaign trail earlier this year, he made a promise that might have caught the attention of overworked teachers.

In August, he said that if he became prime minister, he would encourage the use of artificial intelligence (AI) to cut teachers’ workload.

Whether he will keep this promise now that he is in No. 10 remains to be seen. But even if he wanted to, does AI really have the power to give teachers back their work-life balance? And, if it does, is this something that teachers would actually want? 

In the past decade or so, there have been movements in this area, with programmes being developed to help teachers with some of the more time-consuming parts of the job, like marking. 

One longstanding example is No More Marking. Founded by psychometrician Dr Chris Wheadon, it uses an algorithm to rank students’ work in order of quality. 

The process is called comparative judgement and sees teachers take two pupils’ scripts, decide which one they think is better, and feed this information into computer software. Based on their initial judgements, the computer algorithm is then able to rank many scripts in order of quality. According to No More Marking, over 2,000 schools already use its programme. 
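To make the idea concrete, here is a minimal sketch of how a ranking can be recovered from paired judgements. It assumes a Bradley-Terry model, a standard statistical approach to paired comparisons; nothing here confirms which model No More Marking actually uses. The function name `rank_scripts`, the script labels and the small smoothing constant are illustrative assumptions.

```python
from collections import defaultdict


def rank_scripts(judgements, iterations=100, smoothing=0.1):
    """Rank scripts from (winner, loser) pairs with a Bradley-Terry fit.

    Hypothetical sketch: `smoothing` is an assumed small constant added to
    each win count so that a script with no wins keeps a positive strength.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Start with every script equally strong, then refine iteratively.
    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        updated = {}
        for s in scripts:
            # Sum over every opponent this script was actually compared with.
            denom = sum(
                pair_counts[frozenset((s, other))] / (strength[s] + strength[other])
                for other in scripts
                if other != s and pair_counts[frozenset((s, other))] > 0
            )
            updated[s] = (wins[s] + smoothing) / denom if denom > 0 else strength[s]
        # Rescale so the strengths keep a fixed total between iterations.
        total = sum(updated.values())
        strength = {s: v * len(scripts) / total for s, v in updated.items()}

    return sorted(scripts, key=lambda s: strength[s], reverse=True)


# Three teacher decisions: A beat B, A beat C, B beat C.
print(rank_scripts([("A", "B"), ("A", "C"), ("B", "C")]))
```

The appeal of this kind of approach is that no teacher has to assign an absolute mark: each decision is only "which of these two is better?", and the model turns many such decisions into a single ordering.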

Other innovations, however, promise to take things a step further. 

At the University of Birmingham, postgraduate students have developed ‘Graide’, a piece of software that, according to its creators, can mark coursework, homework and exams in maths and science. It is claimed to reduce marking time by 89 per cent, while offering students seven times as much feedback as traditional methods. It can even read handwriting (although not long paragraphs of text). 

Manjinder Kainth is one of the postgraduate students who developed the platform, and he explains that the idea behind it was to cut out some of the repetitive parts of marking while maintaining the more personal elements.

The system is able to learn an assessor’s marking style, so they don’t have to mark the same answer twice; as the assessor progresses through marking an assignment, the system learns to automate more and more of the feedback until all the marking is complete. 

“The AI suggests feedback based on the way that you’ve marked previous questions. So based on the previous work that you’ve marked, and the degree of similarity, we can give you various amounts of confidence as to how likely the feedback that we’re suggesting is going to be correct. And you can accept this and it will update the model and increase the confidence as you mark work. But you can also reject this and it will unlearn just as fast,” says Kainth.

“It’s completely customisable by the educator, the tutor or the marker. I can add in multiple different elements and I can edit them, which allows me to be retroactively consistent with all the other previous marking.” 
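The loop Kainth describes (suggest feedback from similar, previously marked answers, attach a confidence, then learn from acceptance or rejection) can be sketched roughly as follows. This is not Graide's implementation, which is not described here; the class `FeedbackSuggester`, its method names, the 0.8 threshold and the use of plain string similarity from Python's standard `difflib` module are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


class FeedbackSuggester:
    """Hypothetical sketch: reuse feedback from the most similar marked answer."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed minimum similarity for a suggestion
        self.marked = []            # (answer, feedback) pairs the marker has confirmed

    def learn(self, answer, feedback):
        """Record feedback the marker wrote or accepted."""
        self.marked.append((answer, feedback))

    def unlearn(self, answer, feedback):
        """Forget a pairing the marker has rejected."""
        if (answer, feedback) in self.marked:
            self.marked.remove((answer, feedback))

    def suggest(self, answer):
        """Return (feedback, confidence), or None if nothing is similar enough."""
        best_feedback, best_score = None, 0.0
        for seen_answer, feedback in self.marked:
            score = SequenceMatcher(None, seen_answer, answer).ratio()
            if score > best_score:
                best_feedback, best_score = feedback, score
        if best_feedback is not None and best_score >= self.threshold:
            return best_feedback, best_score
        return None


suggester = FeedbackSuggester()
suggester.learn("x = 4", "Correct, and clearly set out.")
print(suggester.suggest("x = 4"))
print(suggester.suggest("an unrelated answer about photosynthesis"))
```

In this sketch the similarity score doubles as the confidence value, and the marker always has the final say: a suggestion is only ever offered, never applied automatically.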

Increased consistency is one of the major benefits for staff and students, explains Professor Nicola Wilkin, who supervised the project at the University of Birmingham’s School of Physics and Astronomy.  

“For markers, understandably, by the time they get to the bottom of a pile, essentially, the feedback will be the same that they’re giving to a particular question, but it will be abbreviated and it’ll be much more terse than it was in booklet one,” she says. 
 
“The friendliness and the bits that make the student feel good when they’ve got halfway there can get lost. Your encouragement and feedback can be off even though it’s numerically right and tells them technically where they went wrong. Because of the way [Graide] works, you write out a full, well phrased and a good toned piece of feedback at the start. Then all of the students get that equivalent encouragement.”
 
Graide is currently being piloted in universities, but, according to Kainth, there are plans to adapt it for schools. And there are other programmes already in development specifically for the primary and secondary sectors. 

Progressay, for example, is the brainchild of Moktar Alqaderi, a former teacher at Kensington Aldridge Academy in London. It is what he calls a “grading assistant”: patent-pending machine learning designed to help cut marking workload. 

The programme works by building a rubric based on a sample of teacher-marked essays and exam board mark schemes, which it then uses to auto-grade new essays and generate diagnostic feedback. 

As with Graide, teachers can override suggested grades and edit the feedback the programme generates - although, in the latest trials, there was an 85 per cent computer-to-human agreement rate.

There are some promising tools on the horizon, then. But will teachers be willing to use them? After all, can we really trust a machine with a task as fundamental to education as marking? 

Many in the sector are not so sure. 

 

For Alqaderi, it makes sense that teachers would question the accuracy of AI marking, particularly for those subjects where assessment objectives are more nuanced. He points out that English teachers, for example, are often cautious about using Progressay.

“They ask about inference, the higher cognitive processes; there’s lots of doubt around this, and it makes sense. I can relate to that,” he says.

However, he also points out that human marking is not 100 per cent reliable either: according to a paper Ofqual published in 2018, the reliability of examiner marking in some subjects is just 52 per cent.  

“That was our benchmark: can we try and agree that our machine is at least as reliable as an examiner? Today, the system we developed is grading papers at least as reliably as an expert examiner who is being employed to do that very job,” he says.

But Pontus Stenetorp, an associate professor in Artificial Intelligence at University College London, is more sceptical: he disagrees that AI can ever be as reliable as a human marker.

“That’s a misunderstanding of what a teacher does,” he says. “With humans, there is accountability and exercise of power. What am I going to do, fire the AI if it’s not correct? Who takes responsibility? That’s the key problem.”

What about arguments of human bias? Couldn’t a machine mark more objectively? Again, Stenetorp is not convinced.

“Everyone is biased to some degree,” he says. “Look at the judicial process, for example. Judges are biased. But should we replace them with AI? No, because any AI that is sufficiently powerful to compete with human intelligence can’t explain itself. So it can’t explain the reason why it’s come to a judgement. A human can.”

This isn’t to say that AI doesn’t have a place in the classroom, he argues - but the focus needs to be on developing tools that can enhance what teachers do, rather than taking over certain tasks, like marking, completely.

He gives the example of planning lessons based on reading comprehension. A common approach is to have every pupil in the class read and analyse the same text, for ease of planning. This means the teacher only has to have knowledge of one text at a time, and can formulate a single set of comprehension questions around that text.

But what if the chosen class reader doesn’t appeal to every pupil in the class? What if you teach a wide range of abilities and find it difficult to select a text and devise questions that are appropriate for all pupils?

This, Stenetorp says, is where AI could help; the right programme could read several different texts and produce comprehension questions in line with a pupil’s ability level.

“This is a very different kind of technology, right? It allows personalisation of learning. It can remove difficult aspects that a teacher can’t necessarily scale,” he says. “A teacher can’t read 15 or 30 books for every single student in the class, but there’s no reason why every student shouldn’t be able to pick their own book.”

He says that AI could also be transformational in giving early feedback to students before teachers mark an assignment.

Isabel Fischer, an associate professor at the University of Warwick, has developed an AI tool which does just that: AI Essay-Analyst. Unlike some of the other programmes in this space, Fischer’s tool is not designed to be a substitute marker, but to provide university students with an initial level of helpful formative feedback.

Students using AI Essay-Analyst submit their essays ahead of the deadline and receive feedback on things like word choice, readability and sentence length, as well as how well key concepts have been described and related to each other.
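As a rough illustration of the simplest kind of check mentioned here, sentence length, the sketch below counts words per sentence and flags long ones. It is not how AI Essay-Analyst works, which is not described in detail; the function name `sentence_length_feedback`, the 25-word default limit and the naive punctuation-based sentence splitting are assumptions.

```python
import re


def sentence_length_feedback(text, max_words=25):
    """Hypothetical sketch: report average sentence length and flag long sentences."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    average = sum(lengths) / len(lengths) if lengths else 0.0
    too_long = [s for s, n in zip(sentences, lengths) if n > max_words]
    return {"sentences": len(sentences), "average_words": average, "too_long": too_long}


report = sentence_length_feedback("Cats sleep. Dogs bark loudly at night.", max_words=4)
print(report)
```

Feedback on how well key concepts are described and related to each other would need far more than this; the point is only that surface measures such as sentence length are straightforward to compute before a teacher ever sees the essay.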

Fischer says the programme can also “feed-forward on how future assignments can be improved”, and allows students to “track their progress from one assignment to the next”.

One of the benefits, she says, is that it helps to level the playing field - by giving students from disadvantaged backgrounds the type of initial feedback on their work that their peers from more affluent backgrounds are more likely to receive at home.

“Students from high socioeconomic backgrounds often have adults who can proofread assignments for them. Those from disadvantaged backgrounds may not have this. With this tool, we offer everyone that proofreading process and extensive feedback, independent of background,” she explains.

Yet teachers remain a key part of the marking process; when the final essays are submitted, they can offer additional feedback that the AI system could not.

Fischer gives the example of a student using the third-person phrase “the researcher” when talking about their own work. This is not common practice in academic writing, she says; the AI tool didn’t recognise that, but the teacher did. 

She, therefore, sees AI as a tool to augment the marking process, rather than to automate it.

It’s an important distinction, and one that others are also keen to highlight.

 

Jonnie Noakes, for example, is the director of teaching and learning at Eton College. He was involved in some early trials of Progressay and admits that, although he was impressed by what the tool could achieve, he won’t be using it for all of his marking going forward.

The problem, he explains, is that the purpose of marking goes beyond just giving feedback on a particular piece of work: it helps teachers to see how students are progressing, what their level of understanding is, and, in turn, shapes teaching going forward. It also fosters good relationships, he adds. 

“Students need to believe I’m invested in them. If they don’t think I’m interested in them and willing to make an effort on their behalf, then they’re much less likely to make an effort. So how do you do this? You prepare your material, you know what you’re talking about, you teach as well as you can, and you mark. You write annotations that show you’ve understood what they’re saying, you take an interest in their strengths and weaknesses,” he says. 

“That human investment of my time and attention in their efforts is part of what you do to build a relationship with pupils. And I wouldn’t want to throw that out. I think I’d find it very difficult to replicate it in another way.”

Rebecca Mace is a former teacher who specialises in digital pedagogy and is an education lecturer at both the University of West London and University College London. She works as a consultant for Progressay, and says she can relate to Noakes’s concerns. 

“Having work submitted to you on time and then returned in a timely fashion is a language of care that is replicated over and over in classrooms throughout the country. As well as a knowledge check, it can serve as a pastoral measure and content (or lack of it) can often be the way that pastoral issues are picked up. If a computer entirely replaced this then that would be a negative,” she says.

AI could never replicate all of the qualities of a human marker. But, Mace says, the right programme could create more time for teachers to focus on those aspects of the job that technology can’t touch.

“Teachers are best placed to instil a lifelong love for learning, foster real passion for a subject, facilitate difficult conversations, and demonstrate real care. The important thing is that they are freed up to do this,” she says.  

And, in the face of an ongoing recruitment and retention crisis, the potential AI offers in this area is something we perhaps can’t ignore. 

“More and more people are leaving the profession due to unmanageable workloads, and AI is one way to help deal with that,” Mace says.

Karine George, a former headteacher and active research practitioner, who co-authored the book AI For Teachers with Professor Rose Luckin, agrees - and points out that, regardless of how teachers feel about it, the rise of AI is inevitable.

She draws attention to the fact that algorithms, AI and the like are already used throughout education, from automating admin to adjusting for grade inflation at exam boards. 

“As AI continues to weave into every aspect of our life, we’re going to see more policymaking governance and the need to train stakeholders,” she says. 
 
And so, she continues, it is incumbent on educators not to bury their heads in the sand about this kind of technology. 
 
“Our job is to prepare children for their futures, not our past, so we have to be interested in technology because it’s interwoven in our daily lives, whether we like it or not. And if we’re going to make our students life-ready and work-ready, schools have a responsibility to demystify AI.” 
 
That doesn’t mean ceding control over assessment completely, she continues. Instead, it’s about getting the AI to “do the heavy lifting” in procedural tasks.

“The teacher is still key to teaching and learning; there’s going to be no AI that’s ever going to be able to react with the human emotional empathy that we exhibit every day to our young people,” she says. 

“What we want is a landscape where our time is used more effectively and efficiently so that our expertise is better focused, and we can deploy it better.” 

Let’s hope that if Sunak does decide to follow through on his campaign promise, this is something he will recognise.