Today Ofsted released the results of its pilot study into inspection reliability.
Put simply, it has sought to answer the question that has weighed on many school leaders’ minds: if two different inspection teams inspected the same school at the same time, would they arrive at the same judgement?
To answer it, Ofsted did exactly that with a sample of lucky schools, conducting two inspections at once. And the answer seems to be that about nine out of 10 judgements were the same.
Firstly, Ofsted should be commended for this study, and for publishing the results. It is a signal of a more thoughtful, evidence-based approach to inspection, and it should be welcomed. Long may this continue.
Secondly, we should be cautious about drawing too many conclusions: this was a very small sample of just 24 schools.
Is a reliability of 90 per cent good? It feels high. Respected education researchers, including Rob Coe, have praised it. It is probably at the upper limit of what independent human judgements about complex organisations can achieve.
But let’s flip it around: one in 10 schools received conflicting judgements. This study was conducted on short-form inspections, not full “section fives”, but if similar rates were found across the whole inspection system, then hundreds of schools might have received different judgements had they been inspected by a different team.
When we consider the possible consequences of inspection, including sackings, conversions and disruption (or, indeed, praise and promotion if the judgement is positive), this is a sobering thought. A different team, a different destiny.
The main conclusion for me, therefore, is that government agencies (as a whole, for Ofsted is not responsible for all the consequences that follow from its judgements) need to be far more humble in the judgements they form.
Inspection, like exam data, should only be one contribution to the building of a rounded picture over time.
Accountability ‘often relies on false certainties’
Too often our accountability system relies on false certainties. Those who hold schools to account should be much more cautious; they should be transparent in their methods and open to appeals and challenges; above all, given the real risk they are wrong, they should offer support before sanction.
Rather than pursuing diminishing marginal returns on reliability, Ofsted and those who use inspection results should now focus on their consequences.
Indeed, too much reliability could be a bad thing.
There are two obvious tactics for increasing reliability in inspection. One would be to err on the side of the most conservative judgements - to second-guess what other inspectors would be most likely to find. To their credit, the judgements in this reliability pilot do not appear more conservative than usual.
But the second obvious tactic to increase reliability would be to rely more on the data - to derive mechanical and automatic conclusions from the published results.
This could achieve 100 per cent reliability; the only problem is that it negates the whole point of inspection in the first place, which is to challenge what the surface data may show.
Poor results may disguise impressive efforts to turn the situation around; fantastic results may conceal unacceptable sacrifices.
So, too much reliability would be a sign of something quite wrong in the inspection process. We should be able to welcome a diversity of views and perspectives.
It is entirely possible for two credible experts to look at a school and draw out different strengths and weaknesses based on their different perspectives, and for them both to be right. In a different climate, schools would learn more rather than less from this.
So, I applaud Ofsted for studying, publishing and improving its reliability, but let’s not go too far.
Instead, we should accept a certain irreducible inconsistency in human judgements. We should reduce the consequences of these judgements to a proportionate level so that we can celebrate and use their diversity.
All agencies need a degree of humility in how they present and use their findings. This would make for weak headlines, I know, but better schools.
Russell Hobby is general secretary of the NAHT headteachers’ union. He tweets as @russellhobby