GCSEs 2024: Exam board to trial AI in summer exams

The country’s biggest exam board will test how AI can be used to provide ‘quality assurance’ to human marking
15th March 2024, 5:00am

Exclusive

England’s biggest exam board will probe how artificial intelligence (AI) can be used to provide “quality assurance” to human marking in a trial this summer.

AQA will use data from this year’s GCSE and A-level exams to check to what extent marks given by AI match those of senior markers.

The exam board said it plans to use this process to identify and correct “any aberrancy in the marking of particular individuals”.

It is still designing the precise specification for the work but expects to apply the method to subjects with tens of thousands of entries for upcoming GCSE and A-level exams, Tes understands.

The work builds on an AQA research project that has used past exam questions to measure AI marks against senior examiners’ marks.

Exam board testing AI marking

Talking through the findings at an event last week, Alex Scharaschkin, AQA’s director for assessment research and innovation, said: “We could predict senior examiner marks quite accurately for constructed responses, such as the types of questions where students have to write a few sentences.”

GCSE biology questions worth three and four marks were among those tested, Mr Scharaschkin told attendees at the Association of School and College Leaders’ (ASCL) annual conference in Liverpool.

Among these shorter questions, 80 per cent of senior examiners’ marks matched those of AI.

But the proportion fell to 65 per cent for longer responses worth six marks, Mr Scharaschkin said.
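
AQA has not published how these agreement figures are calculated. As a purely illustrative sketch, assuming “matching” means the AI awarded exactly the same mark as the senior examiner, agreement by question tariff could be computed along these lines (all data below is invented):

```python
# Illustrative sketch only: hypothetical data, not AQA's methodology or results.
from collections import defaultdict

# (max_mark, senior_examiner_mark, ai_mark) for a handful of made-up responses
responses = [
    (3, 2, 2), (3, 3, 3), (3, 1, 2), (3, 0, 0),
    (6, 4, 4), (6, 5, 3), (6, 6, 6), (6, 2, 4),
]

matches = defaultdict(int)   # exact agreements per question tariff
totals = defaultdict(int)    # responses seen per question tariff

for max_mark, senior_mark, ai_mark in responses:
    totals[max_mark] += 1
    if ai_mark == senior_mark:
        matches[max_mark] += 1

for max_mark in sorted(totals):
    agreement = 100 * matches[max_mark] / totals[max_mark]
    print(f"{max_mark}-mark questions: {agreement:.0f}% exact agreement")
```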

Now AQA plans to trial the same method on “live data” from this year’s exams.

Mr Scharaschkin told conference attendees: “We want to be able to identify if there are any problems with the marking in that period so we can sort them out in real time.”

While the testing will run alongside this year’s summer exam series, it will not be used to inform marking in any way, Tes understands, and remains a research exercise at this stage.

When asked whether there would be an ethical issue if students’ marks were not changed to reflect the findings, AQA told Tes that it doesn’t “yet know whether the approach is robust or not, which is why we need to do further testing before we actually implement it in marking properly”.

The exam board added: “It’s quite promising as a quality assurance methodology, and we are going to be doing more work on this in this coming summer 2024 exam series to look at how we can scale this up and make it really impactful in terms of improving quality of marking.”

However, AQA is “not proposing that we replace human marking of exams with robot marking of exams”.

Last year the exam board warned that the use of AI was “no good” in marking high-stakes assessment.

The board’s stance remains that AI is not suitable for marking assessments themselves in the short term, but that it has potential for “marking the marker”, Tes understands.

AQA said: “To date, we have tested AI’s potential use in marking through a specific research project.

“This summer we will build on this by repeating the exercise using the live data from the summer series. However, this will be purely for research purposes - we will not use the results for awarding in any way.”

AI marking ‘has enormous potential’

Tom Richmond, founder and director of education think tank EDSK, said: “I don’t think there is any desire among teachers or leaders to remove humans from the exam marking process altogether.”

But he added that the use of AI has “enormous potential” to add value to the examination system, given the “difficulties in recruiting enough exam markers, as well as the desire for greater reliability in the marking process”.

Mr Richmond said that it remains to be seen whether AI marking could extend from maths and sciences into the humanities, “seeing as long essays in subjects such as English and history present the greatest barriers to reliable marking”.

‘Still work to do’ on AI feedback

AQA has also explored whether AI is able to give effective feedback, but Mr Scharaschkin admitted that there was still “work to do” on this.

“AI systems could reproduce the marking scheme - which isn’t very helpful for the students - and didn’t understand the mark scheme in many places,” he told ASCL attendees.

High-stakes exam questions are not the best examples to use for testing feedback, Mr Scharaschkin added.

He was speaking after AQA last week delayed the rollout of digital languages exams.
