Concerns around the way Pisa results are calculated have been “overblown”, an academic has concluded.
The methodology has come under fire in the past, with some academics claiming that the Programme for International Student Assessment (Pisa) is “fundamentally flawed”.
But in a blog published today, Professor John Jerrim writes that, having studied the methodology himself, he has concluded that the method used to calculate country scores is actually “OK”.
Pisa scores are partly calculated using a scaling model, which draws on pupils’ responses to the subset of test questions they sit to estimate their achievement across the three domains tested: reading, science and mathematics.
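The OECD’s actual scaling is considerably more elaborate (an item response theory model, from which “plausible values” are drawn for each pupil), but the basic idea can be shown in a few lines. The sketch below is purely illustrative: it uses the simplest such model, the Rasch model, with invented item difficulties, and is not the OECD’s implementation.

```python
import math

def rasch_prob(theta, difficulty):
    """Probability of a correct answer under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def estimate_ability(responses, difficulties, iterations=25):
    """Newton-Raphson maximum-likelihood estimate of a pupil's ability
    from 0/1 responses to items with known difficulties."""
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta

# Invented difficulties; this pupil answers the easier items and misses the hardest two.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.8]
responses = [1, 1, 1, 1, 0, 0]
print(f"estimated ability: {estimate_ability(responses, difficulties):+.2f}")
```

In essence, the model asks: given which questions this pupil got right, and how hard those questions are known to be, what ability score best explains that pattern?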
But critics have claimed that the scaling model is not reliable for Pisa because some questions have “different degrees of difficulty in different countries”.
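What that criticism means in practice can be shown with a toy example (all figures invented): if pupils in one country find a particular question much harder than the gap between the two countries on the rest of the test would predict, that question arguably measures something different there, breaking the scaling model’s assumption of a common difficulty.

```python
# Crude screen for country-varying question difficulty ("differential item
# functioning"). All figures are invented for illustration.
pct_correct = {
    #            q1     q2     q3     q4
    "Country A": [0.80,  0.62,  0.55,  0.41],
    "Country B": [0.78,  0.60,  0.30,  0.39],   # q3 is far harder in B
}

# Average gap between the two countries across all four questions.
overall_gap = (sum(pct_correct["Country A"]) - sum(pct_correct["Country B"])) / 4

for i, (a, b) in enumerate(zip(pct_correct["Country A"], pct_correct["Country B"]), 1):
    gap = a - b
    flag = "  <-- possible country-specific difficulty" if abs(gap - overall_gap) > 0.10 else ""
    print(f"q{i}: gap = {gap:+.2f}{flag}")
```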
How reliable is Pisa?
Even Professor Jerrim, who wrote the 2015 Pisa national report for England, Wales and Northern Ireland, has been critical of the “complexity and opaqueness” of the methodology the OECD uses to create the Pisa scores.
But the UCL Institute of Education academic says his own research shows that country-level Pisa results are robust: they are barely altered by changes to the scaling model.
Specifically, the research investigated how Pisa results would change if (the second of these checks is mimicked in the code sketch after this list):
- Some questions were given more weight than other questions;
- Test questions that were not reached by pupils were excluded from the calculation of Pisa scores instead of being marked as incorrect;
- The difficulty of test questions was altered.
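Professor Jerrim’s analysis used the real Pisa data, but the logic of such a robustness check can be mimicked in miniature. The simulation below is a sketch of the second variation only, with invented countries, pupil counts and item difficulties: the same simulated responses are scored twice, once with not-reached questions marked incorrect and once with them excluded, and the country ordering is then compared across the two rules.

```python
import math
import random

random.seed(1)

ITEMS = [-2.0 + 0.2 * i for i in range(20)]      # invented item difficulties

def p_correct(theta, b):
    """Rasch (1PL) probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(pairs):
    """Newton-Raphson MLE of ability from (response, difficulty) pairs,
    clamped to [-4, 4] so all-correct pupils do not diverge."""
    theta = 0.0
    for _ in range(30):
        grad = sum(x - p_correct(theta, b) for x, b in pairs)
        info = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b)) for _, b in pairs)
        theta = max(-4.0, min(4.0, theta + grad / max(info, 1e-6)))
    return theta

def simulate_pupil(true_theta):
    """0/1 responses for reached items, None for items the pupil never reached."""
    reached = random.randint(12, 20)             # pupil may run out of time
    return [(1 if random.random() < p_correct(true_theta, b) else 0) if i < reached else None
            for i, b in enumerate(ITEMS)]

def score(responses, rule):
    """'incorrect' marks not-reached items wrong; 'exclude' drops them."""
    pairs = [(0 if x is None else x, b) for x, b in zip(responses, ITEMS)
             if x is not None or rule == "incorrect"]
    return estimate_ability(pairs)

# Simulate each invented country's pupils once, then score the same responses two ways.
countries = {"Alphaland": 0.5, "Betaville": 0.0, "Gammaton": -0.5}
pupils = {name: [simulate_pupil(random.gauss(mu, 1.0)) for _ in range(300)]
          for name, mu in countries.items()}

for rule in ("incorrect", "exclude"):
    means = {name: sum(score(r, rule) for r in resp) / len(resp)
             for name, resp in pupils.items()}
    ranking = sorted(means, key=means.get, reverse=True)
    print(f"{rule:9s} ranking: {ranking}")
```

Individual scores shift when the scoring rule changes, but the ordering of the three invented countries comes out the same under both rules, which illustrates the sense in which country comparisons can be robust to such choices.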
The analysis found that changes in technical aspects of the scaling model created “almost no change” in country comparisons.
“Now, none of this means that Pisa is perfect,” Professor Jerrim concludes. “Like any study, it has both strengths and limitations.
“Yet it has led me to the conclusion that concerns about the Pisa scaling methodology have been somewhat overblown.”
He acknowledges that the technical details could be more transparent. But while he remains doubtful about the comparability of scores between Pisa 2012 and Pisa 2015, he argues that the scaling methodology is not the main problem.
“The overall approach to producing the test scores seems OK, and fairly robust to some of the technical decisions made,” he said.
“Hence, to those who wish to critique the Pisa methodology, I would suggest that there are probably bigger fish to fry than the scaling model.”