‘Sats 2023 data isn’t comparable - or particularly useful’

If year-on-year comparability isn’t possible, then what purpose do KS2 tests serve, asks James Pembroke
12th July 2023, 12:00pm

The big news about the latest round of key stage 2 results was that there was no big news.

Nationally, 73 per cent of pupils met the expected standard in reading, a drop of two percentage points from the 2022 figure.

Meanwhile, writing and maths went the other way, both results increasing by two points to 71 per cent and 73 per cent respectively.

The combined measure - the percentage of pupils achieving expected standards in reading, writing and maths - remains stuck at 59 per cent.

Behind the numbers

So, what can we make of these numbers?

Let’s consider the reading test. It was widely reported that the 2023 reading paper was tough, with a lot of text to read, 38 questions to answer and some fairly obscure references to bats under Texan bridges.

Social media was filled with teachers expressing concerns about the length and difficulty of the test and, sure enough, this has been reflected in the pass threshold, which has dropped from 29/50 in 2022 to 24/50 in 2023.

The threshold for maths, on the other hand, fell by just two marks, while grammar, punctuation and spelling saw its threshold rise by one mark.

The aim of these changes in the pass threshold is to ensure that the expected standard is maintained and that results are comparable year on year. And yet, despite the considerable drop in the pass mark for the reading test, the national proportion of pupils meeting the standard has still gone down.

Should we therefore conclude that national standards in reading have declined since last year? Or - on the basis that 2022 had the joint highest reading results since the new key stage 2 tests were introduced in 2016 - that reading was unaffected by or even benefited from the pandemic?

These are the simplistic sorts of conclusions we are likely to see reported in the press, but are they true?

Setting standards in sand

At the heart of the problem is the standards-setting process. We have come to accept performance thresholds - the expected standard, or greater depth in the case of the English primary system - as an unavoidable part of the educational furniture and we can be forgiven for thinking that they are founded upon a robust, even scientific, method.

The fact is they rely on human decision: the best judgement of a panel of people charged with the task of drawing a precise line in the sand. And then comes the challenge of attempting to redraw that line in the same place a year later despite the sands shifting with time.

It is unfair to suggest that the setting of standards is a simple process - it is highly complex and rigorous - but it is also wrong to believe we can use outcomes to reliably compare performance over time and between subjects.

Phrases like “expected standards” and the scaled scores that underpin them give an illusion of accuracy and of comparability between subjects; in reality, they give us an impression of school performance, not a precise picture.

And it is worth noting that if a different method for setting standards had been chosen back in 2016 then the results today would be higher or lower, but they would not be the same.

Have standards declined?

So, we will be confronted by headlines that suggest that standards in reading have declined, while in maths they have improved a bit but still lag a long way behind the 2019 high point. And if 59 per cent of pupils met the standard in the three core subjects, then surely this means that two-fifths of pupils can’t read, write or add up properly.

This is, of course, not true. 

Not only should we bear in mind that there is little to no difference between a pupil who scores 99 on a key stage 2 test and one who scores 100; we should also remember that if that pupil took the test on a different day, or if their paper was marked by a different person, their score would be different. It would be different again if they were a year older and took last year’s paper rather than this year’s.

You get the point.

In 2022, school performance data came with a health warning: due to the “uneven impact of the pandemic on 2021-22 school, college and multi-academy trust performance data, we recommend not making direct comparisons with data from previous years or between schools/colleges or MATs”.

This year sees the return of performance tables for primary schools and, with that, the inevitable comparisons of school performance - yet the statement above makes it clear that we should avoid doing exactly that.

Considering the arcane and imprecise nature of setting and maintaining standards, we should probably avoid comparing one year’s results with the next as well. On top of that, we have a combination of teacher assessment for writing and tests for the other subjects, which makes comparisons between subjects highly problematic, too.

Which raises the question: what exactly can we do with these numbers?

James Pembroke is the founder of Sig+, an independent school data consultancy
