Sats: How to fix Year 6 writing moderation
“The devious spirit towered over the dead corpse.”
Is this sentence, written by an 11-year-old pupil, an example of working at “greater depth”?
The vocabulary choice appears effective and it hits the “appropriate register”.
But “dead corpse”? The word “dead” is redundant: it’s a corpse; of course the person is dead. And that indicates a lack of control. So, is the pupil just working at the expected standard?
Debates like the above will be returning to the Year 6 classroom when writing moderation for key stage 2 begins again. The dates for 2023 have been released as part of the key stage 2 assessment and reporting arrangements guidance.
The process might seem like a brutal way to judge the ability of a primary school child. You might even start to wonder how your own writing would fare under this scrutiny. Yet this is the reality of Sats moderation - assessments carried out in school are subject to an external moderation process, under which 25 per cent of primary schools receive a visit from their local authority to have their judgements checked.
More on Sats:
- Sats 2022: “Disappointing but expected” drop in attainment
- Time to stop setting Sats targets
- KS2 Sats results: what secondary leaders need to know
It’s a process that was back in the spotlight in the early summer after the publication of the KS2 writing results, which took a significant dip from the 2019 results: the proportion of pupils meeting the expected standard dropped 9 percentage points to 69 per cent.
This figure has been, quite understandably, used as evidence to demonstrate the impact of learning loss on primary pupils during the pandemic.
But while the moderation process that underpins these results is straightforward in theory, in practice there are many concerns around inconsistencies in the judgements and metrics being applied. This makes using the data from KS2 Sats scores complicated.
After all, some argue that we cannot call a process “moderation” when there are clear differences between not just the method used to moderate but also the conditions in which the assessment is undertaken.
Whether a pupil is “working towards” or “working at” the expected standard can vary depending on the postcode of the school, too, as the local authority’s moderators can vary in their approach: what counts as “working towards” in one school won’t cut it 30 miles down the road in another LA.
All this means many are questioning why we bother with moderation at all.
Sats writing moderation: why do it?
The approach that schools are currently working with is relatively new.
Before 2012, writing levels were decided using a one-off writing task undertaken in test conditions, alongside maths and reading papers. This was externally marked using national curriculum levels.
When the external assessments were dropped in 2012 in favour of teacher assessment, schools were issued guidance to assign national curriculum levels to pupils based on a portfolio of evidence. Then, in 2015, levels were dropped altogether.
Speaking at the time, Nick Gibb, who was then schools minister, said that schools “need to develop their own assessments which provide clear evidence of attainment and progression” and that “levels were just too vague and imprecise”.
Instead, schools could grade writing as either:
- “Working towards the expected standard”.
- “Working at the expected standard”.
- “Greater depth”.
Broader strokes, and more straightforward assessment, right? Perhaps - but how do you ensure consistency in a system that is all about perception and the eyes of the marker?
This is where moderation comes in. In theory, it works as follows:
- By the middle of May, schools will be told if they’re on the list for a visit from the LA moderator. Guidance from the Department for Education says schools can expect to be visited once every four years, and more frequently if necessary - for example, if they have a teacher new to Year 6.
- External moderation happens during a three-week window in June, and the list of pupils who will be in the moderation sample (15 per cent of the cohort or five pupils, whichever is the greater) is only shared before the visit takes place.
- Moderation happens before marks are submitted to the LA, and all schools submit their teacher assessment results by the end of June.
- The LA moderators are “standardised” using a set of exercises to gain Standards and Testing Agency (STA) approval to moderate, and these are undertaken annually.
- During an external moderation visit, each child in the sample will be discussed individually, with the teacher sharing examples of their writing with the moderator. A professional discussion takes place on whether the moderator agrees with the judgement made by the school.
- For schools not being externally moderated, internal moderation should take place, but this is non-statutory.
All this sounds fine in theory, but it still does not accommodate the fact that judgements will vary between moderators, and that will invariably create issues that, while small on the individual level, can add up to major concerns.
The problem of consistency
As our example of the “dead corpse” demonstrates, even with the efforts made to standardise the moderators, judging a piece of work can be highly subjective - and teachers report inconsistencies in their experiences of external moderation.
The teacher whose pupil penned those words - Gillian Fraser, a Year 6 lead at an all-through school in London - says she very much found that the outcome depended on the attitude of the moderator.
“This year, there was an atmosphere of challenge in the room,” she says.
“When they looked at the story, they told me that a ‘greater depth writer’ would know not to put these two words together…they mean the same thing. And yet, in previous years I had pupils marked as ‘greater depth’ whose work had similar ‘errors’.”
It’s because of “luck of the draw” experiences like this that Matilda Browne, headteacher of Reach Primary Feltham in West London, says the process needs to be changed.
“I think it really matters if LAs are not moderating in a similar way,” she says. “The training is DfE branded - so, theoretically, the approach to moderation should be the same. But, in reality, it isn’t.”
So, where do teachers see the opinions of moderators differing? Browne gives the example of spelling lists.
“Spelling tests are not always accepted by moderators as evidence that the child has hit the expected standard,” she explains.
“So, although it would be fine with one moderator in one LA, you might have another school in a different LA with a different moderator, and they won’t accept it.”
And it’s not just spelling lists. When speaking with teachers for this article, we were given a range of examples of inconsistencies: whether greater-depth pupils can make grammatical errors, whether peer-assessment corrections are accepted, and how many times a standard has to be evidenced.
So, how can this problem be overcome?
Better training for moderators?
Browne says we should start by changing training for the moderators. She says when the moderators are being trained, the process they’re tested on is the equivalent of ticking answers without asking to see the working out.
“When a teacher is doing the moderator qualification assessment, they share their judgement, but no one checks how they came to those conclusions,” she says. “The judgement is checked but not the process.”
Browne says adjusting the process to ensure that the trainee moderators explain the rationale behind their judgements would ensure “more consistency” and that “we could be sure the standards are being applied in the way they were intended”.
She also says the current system could be “tweaked” to “increase accountability and ensure consistency” by moving to a “peer moderation model”.
“This would mean all schools would be moderated in a more supportive process where every teacher in Year 6 was moderated with a peer,” she says.
“All Year 6 teachers would need to sign up for a half-day moderation session and the LA would pair the more experienced with the less experienced, and then there would be an official moderator who has passed an assessment, and they would check the process as an ‘arbitrator’.”
Under this model, Browne says teachers would be given more guidance, such as suggestions for how evidence can be gathered, and it would allow them to share resources to create a more collegiate professional process.
Problems with the mark scheme
Of course, no matter what process you put in place, if your starting point is a mark scheme that isn’t fit for purpose, then the moderation process will be doomed.
When the writing test moved from exam to teacher assessment, the mark scheme changed from something designed to mark a single piece of work to a set of descriptors designed to come to a judgement on a portfolio of work collected over time.
In practice, though, that isn’t how teachers mark - as primary headteacher Nigel Attwood, from Bellfield Junior School in Birmingham, makes clear.
“The expected standards have been created to evaluate a portfolio of work across time - so you’re forcing people to use a mark scheme not designed for a single piece of work,” he notes.
In particular, he says trying to fit the three descriptors - “working towards the expected standard”, “working at the expected standard” and “greater depth” - to all pieces of work is very hard.
“For example, where it says pupils need to demonstrate an ability to shift in formality, that isn’t easy to do in every single text style. And then the strictness around how often you have to evidence that skill makes it hard to judge whether or not that standard has been achieved,” Attwood adds.
Not only does he think it is a struggle to apply the standards to single pieces of work, but he also describes some of the wording in the guidance as “pretentious”.
“Although some is easy to follow, the wording in places is not clear at all; for example, where it says ‘use verb tenses consistently and correctly throughout their writing’…does this mean in every single piece of writing? Is one error not allowed? Children will have strengths and weaknesses in different types of genres. It is tricky to apply to all.”
What could be changed?
Attwood says the criteria need to be rewritten and “simplified”.
“I would look at changing the wording in the criteria and build in an allowance for different styles and genres,” he says.
Maaria Khan is a deputy head and KS2 teacher at a primary school in Yorkshire, where she is also English lead.
Khan also feels the mark scheme itself needs to be changed, and says the “greater depth” statements are “vague”, with some focusing on the criteria for creativity and flair and others placing more importance on grammatical features.
“Teachers go into the moderation meeting never truly knowing if the children will be awarded ‘greater depth’ because you can never be sure what that specific moderator will be looking for across those main criteria,” she says.
Fraser agrees, and argues that “greater depth”, in particular, needs more specific guidance and examples.
“The statements are ambiguous, and no one really has any idea what greater depth actually looks like,” she says.
“They talk about a ‘voice,’ but what about actually giving concrete reasons why a child is or isn’t ‘greater depth’? It needs to be more specific to stop creating a game out of achieving ‘greater depth’.”
However, this is not to suggest that the system is all bad. The moderation process is evolving, and the DfE has made changes to the systems in place to check up on moderators.
From this year, the STA will no longer visit a sample of LAs during the moderation process to monitor the procedures. Instead, it will provide online moderator training only.
This more relaxed approach could be the reason why some teachers speaking to Tes reported that their visits this year were more positive than in previous years.
For example, Attwood says that his school’s moderation visit in 2021 had an entirely different feel to its last visit, four years ago.
“This year, the external moderation visit was about having a professional conversation,” he says. “We had a very different experience when we were last visited and it knocked the confidence of the school. And this is good - it should be a positive meeting where dialogue is allowed to take place.”
The DfE says it welcomes this positive feedback, and that this is an area under continuous review.
“We are encouraged that the process is easier to follow this year than in previous years. We remain committed to working with partners across the system to ensure that we are continuously improving,” says a spokesperson.
However, despite this positivity, it seems clear that, for many, the moderation process remains unloved.
Fears about accountability
Another big issue is accountability. Khan says primary teachers can feel “under pressure to reach a set target” and that this can lead to back-to-front approaches to teaching writing.
“[Because of the way writing moderation works] you’re then somewhat dictating your writing curriculum to get those criteria to reach the assessment grade,” Khan explains.
Alex Quigley is the national content manager at the Education Endowment Foundation, and also a former English teacher and author of Closing the Writing Gap.
He says there is “no doubt” that when “results get used to judge the school…it will directly and indirectly affect the reliability of those judgements”.
Given the concerns above, Quigley is happy to advocate for a return to a writing test because it would bring “parity and fairness nationally” to the KS2 Sats process and remove the issue that moderation creates.
He concedes, though, that he would be “wary of increasing the assessment burden” on teachers by returning to this model.
He’s not the only researcher in this field to feel that better systems are possible.
Alice Bradbury, professor of sociology of education and co-director of the Helen Hamlyn Centre for Pedagogy (0-11) at University College London, agrees that while the Sats data produced by writing moderation might be useful, it has its limits.
“The Sats data might tell us something at the individual level about where a child is, which is potentially useful for transition and planning,” she says - but beyond that, little else.
Little chance of change - for now
When Tes asked the DfE to comment on the inconsistencies in moderation, or whether a different approach might be taken to writing moderation in the future, it declined to comment.
However, Attwood feels there are steps that could be taken at a policy level to address this problem. He says the issues with moderation would be solved by decoupling the writing judgement from the overall Sats score.
“My big criticism is how the results from the writing judgements are tied into the test,” he says. “The teacher assessment shouldn’t affect their overall Sats score.”
Attwood also says the high-stakes nature of the league tables is partly to blame.
“I don’t think the scores should inform the league tables. Contexts change, and when working in areas with high levels of deprivation, you can’t compare and judge them - it should be judged on the whole-school experience,” he says.
“Performance tables are not a true representation of schools. It’s about teaching the child across a whole curriculum and life skills so they’re ready for citizenship.”
Bradbury agrees that the process of assessment for primary schools needs to be changed, and says that her research supports moving more of the Sats to teacher assessment.
“In the research we conducted with More than a Score pre-Covid, we found that, generally, heads were in favour of more teacher assessment as an alternative to Sats, or a system of using tests to inform teacher assessments,” she says.
“There were workload implications noted, though, and external moderation was seen as a source of stress. However, the fact that Sats are high-stakes assessments means everyone teaches to the test, so they may not even tell us that very well.”
Replacing Sats at KS2 perhaps seems unlikely for now - so we shouldn’t expect to be picking over their “dead corpse” any time soon.
However, if Sats scores derived from a moderation process that causes so much consternation are going to be relied on to tell us how our youngest learners are performing, then perhaps change is needed to make the system more reliable for all involved.
Grainne Hallahan is senior analyst at Tes