When I read about large educational trials, such as those run by the Education Endowment Foundation (EEF), I’m often struck by how hard it is to implement interventions “correctly” - or, as researchers would sometimes say, “with fidelity”.
In part, this is because schools are highly complex organisations, with lots of contextual variables to take into account. It’s not always realistic, for example, to expect teachers to adhere to a thick manual of instructions in order to meet “fidelity” criteria.
But there’s another problem, too: if a trial is about an idea or concept that almost every teacher already uses, or at least knows about, this makes it incredibly difficult to run a fair trial.
It complicates matters both for the intervention schools implementing the programme being trialled and for the comparison schools: if the latter are already using elements of the concept being tested, it is arguably impossible to get a true control group.
This poses a real challenge for researchers. Of course, we want to do trials of the popular approaches or programmes, but we also want comparison schools where teachers haven’t heard about them.
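To see how much this can matter, here is a minimal back-of-the-envelope simulation in Python. The effect size, sample size and contamination rates are invented purely for illustration, and none of the numbers are drawn from the trials discussed below; the point is simply that the more of the comparison group that is already exposed to an approach, the smaller the difference a trial can measure, even if the approach works just as well as ever.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulated_effect(n_per_arm=1000, true_effect=0.2, contamination=0.0):
    """One simulated two-arm trial, with pupil outcomes in standard-deviation units.
    'contamination' is the share of control pupils already exposed to the approach
    being tested (all numbers are invented for illustration)."""
    # Every pupil in the intervention arm receives the (assumed) true effect.
    treated = rng.normal(loc=true_effect, scale=1.0, size=n_per_arm)
    # Some control pupils effectively receive it too, because their teachers
    # already use the approach.
    exposed = rng.random(n_per_arm) < contamination
    control = rng.normal(loc=np.where(exposed, true_effect, 0.0), scale=1.0)
    # The trial can only estimate the difference between the two arm means.
    return treated.mean() - control.mean()

for c in (0.0, 0.3, 0.6):
    estimates = [simulated_effect(contamination=c) for _ in range(500)]
    print(f"{c:.0%} of control pupils exposed -> "
          f"average measured effect {np.mean(estimates):+.2f} SD")
```

With these made-up numbers, an approach genuinely worth about 0.2 standard deviations shows up as only around 0.08 once 60 per cent of the control group is already using it - the effect hasn’t disappeared, but the trial can no longer see most of it.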
Over the years, I have seen this problem crop up in several EEF trials of widespread approaches.
In 2014, for example, the EEF reported limited effects from an efficacy trial of Changing Mindsets (an intervention that aims to improve attainment by developing a growth mindset in pupils).
One of the report’s conclusions, however, was that “there was some evidence from the process evaluation that teachers in the control groups of both trials were already aware of growth mindsets theory and using it to inform their practice”.
So, in 2019, a re-grant study was completed, again with limited effects. But, once more, the study came with the caveat that “most teachers in the comparison schools (that did not receive the intervention) were familiar with this, and over a third reported that they had attended training days based on the growth mindset approach”.
A similar thing happened with the EEF’s trial of Embedding Formative Assessment (a professional development programme that helps schools to embed formative assessment practices). The results were non-significant, but became statistically significant when the evaluators analysed a sub-group called “non-TEEP”: the teachers who had no previous involvement in another intervention, the Teacher Effectiveness Enhancement Programme (TEEP), which strongly influenced the delivery of the formative assessment trial.
One final example is the recent trial of the Read Write Inc phonics programme, which noted that schools in the control group had used similar materials, as well as a variety of other phonics materials - and that this may have affected the results.
We don’t want pervasiveness to become an easy excuse to ignore trial results, but with popular approaches, it will always be hard to run a decent trial.
Unfortunately, I am not sure there is a way around this problem. It is perhaps a fact we just have to live with.
It’s also worth pointing out that big randomised controlled trials are very expensive. As researchers Angus Deaton and Nancy Cartwright explain in a 2018 paper, “the lay public, and sometimes researchers, put too much trust in randomised controlled trials over other methods of investigation”.
The pair highlight how observed and unobserved variables can thwart any equalisation of groups - as illustrated by the examples I have given above.
Yet this doesn’t make such trials useless. As Deaton and Cartwright conclude, they can “play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative programme, combining with other methods, including conceptual and theoretical development, to discover not ‘what works’, but ‘why things work’.”