Ofsted is no more scientific than a theatre critic

Do scientific validity and reliability matter as much as people think when making a value judgement about a school?

21st May 2019, 10:55am

One of the major criticisms levelled at Ofsted inspection is its lack of reliability and validity. Many academic critics, as well as the Headteachers’ Roundtable, focus on this purported deficiency. Ofsted responds by claiming to use research to improve the validity and reliability of its inspection judgements.

Almost everyone agrees that validity and reliability matter, but do they? Can they be straightforwardly applied to the making and justification of inspection judgements? I don’t believe they can.

Accepting that no analogy is perfect, otherwise it wouldn’t be an analogy, let us think of one close to school inspection: theatre criticism.

Theatre critics comment on a performance or on a “run” of performances, as school inspectors do, based on a series of observations. Critics judge the quality of the acting; likewise, inspectors comment on the quality of teaching. Critics judge how far the performance reflects the content and intentions of the play text; Ofsted does something similar with the curriculum. Critics judge the reactions of the audience; likewise, inspectors report on students’ responses.

Critics judge the quality of what they see; so do inspectors. They do not measure what they see on any numerical scale; nor do, or can, inspectors. Critics make their judgements based partly on their previous experience of similar, though never identical, productions; likewise their educational counterparts.

The criteria theatre critics use are largely intuitive, impressionistic, and cannot be reduced to a tick list of agreed items. The same is true of school inspection, as revealed by the weaknesses of all inspection frameworks that have attempted to characterise "quality".

Significantly for those in the acting profession, critics’ reports can influence the run of a play; they can help determine the reputation of individual actors and directors. There is a clear parallel here with high-stakes inspection.

Lastly, it is perfectly reasonable, and perhaps to be expected, that different theatre critics will use their expert judgement to reach legitimately different interpretations and judgements of aspects of the same performance – as indeed do inspectors.

Does it make sense to ask of theatre criticism that it be valid and reliable in the way such terms are usually used in educational assessment? I would argue not. It is a value-laden enterprise with the concept of "quality" at its heart and thus subject to a different kind of assessment logic.

A question of reliability

The terms validity and reliability are most often used to describe the characteristics of tests or measures. A test or measurement is reliable when it gives the same result repeatedly under the same conditions. But theatre criticism does not involve tests or measures, and the performances with which it is concerned are not the same from one day to the next. It cannot be reliable in the sense that a test or measure can be reliable.

It is, of course, possible to ask whether critics can be relied upon to offer sound judgements, but this involves consideration of the depth of their expertise, the breadth of their experience, and how far their appraisals were shared by others and borne out by subsequent events.

There is a clear parallel here with school inspection.

A test is valid if it measures what it claims to measure. But a piece of theatre criticism cannot be valid or invalid as a measure since it does not measure, but instead appraises – a subtly different activity. It can, however, be judged in terms of whether it has “a sound basis in logic or fact” – “valid”, as defined in the Concise Oxford English Dictionary.

The same is true of school inspection. It needs to have a sound evidential basis reflecting the reality of the schools being inspected and needs to be valid in that basic sense, but more than that is required. Crucially, Ofsted’s appraisal involves judging the worthwhileness of what has been observed, and that is not a matter of validity as the term is normally used in educational assessment.

Just as it does not make sense to criticise the art of theatre criticism for its lack of reliability and validity, so it is with the art of school inspection. Both involve value judgements that are not susceptible to measurement, and thus to claims of reliability and validity, as those terms are usually used in educational assessment. But they are susceptible to informed criticism from those with appropriate experience and expertise.

Ofsted’s academic critics need to recognise this, as do those within the organisation who are trying (vainly) to bolster its "academic" and "scientific" credentials.