Education rankings: a home run or a swing and a miss?
If you are a secondary school teacher in England, then from next summer your job security will ultimately depend on a measure born of the global education rankings.
The new “world-class” GCSE grade 5 that your students will be expected to achieve for key accountability measures has been set to match the achievements of teenagers from Finland and other top-performing nations in the Programme for International Student Assessment (Pisa).
And if you teach in a primary that is adopting an East Asian “mastery” approach to teaching in maths, then you, too, are experiencing the power of the world education league tables. It is the success of the likes of Singapore and Shanghai in these rankings that has persuaded ministers here of the need for pedagogical change.
On consecutive Tuesdays in the next fortnight, both Pisa and its rival, the Trends in International Mathematics and Science Study (Timss), will publish their latest results. You’ll see a lot in the national press and on television about which countries they have ranked top and which have fallen down the order. And you’ll see a lot about the further changes we need to make to “catch up” with those at the top.
What you are unlikely to encounter are the growing concerns over how these surveys are compiled, the reliability of the rankings that they produce and whether they are even asking the right questions. You are unlikely to be told about the research that strongly suggests ministers are making a category error by assuming that it is what happens in Asian classrooms that leads to table-topping scores, rather than a wider culture that could be far harder to replicate.
This is the other side of that story.
Transparency is key
In recent years, internet superpowers like Google and Facebook have gained enormous influence over how we live. Just this month, fears emerged that “fake news” inadvertently promoted by the two companies had helped Donald Trump to win the US presidential election. And as their power has grown, so have concerns over a lack of proper accountability and transparency about what they do.
It is hard not to see parallels with the rise of international studies like Pisa, Timss and the Progress in International Reading Literacy Study (Pirls), which have started to exert a similar level of power over what happens in schools and education ministries. As with the internet itself, the origins of these studies go all the way back to Cold War publicly funded research. And, much like the internet, they have flourished in a new globalised era.
Andreas Schleicher, the Organisation for Economic Co-operation and Development (OECD) education director who developed and runs Pisa, and his academic counterparts behind Timss and Pirls at the IEA (the International Association for the Evaluation of Educational Achievement), are not cold warriors. They did not set out to acquire power. They are mathematicians and statisticians, technocrats - “geeks”, even - who, like Facebook’s Mark Zuckerberg, developed globally applicable products that happened to be perfectly timed to take off in an interconnected world where the barriers are down.
They began as evangelists for sharing knowledge. They wanted, as Schleicher has put it, “radical openness” in education. But, as the years have passed, their projects have evolved beyond mere evidence-gathering exercises. Professors Heinz-Dieter Meyer and Aaron Benavot from the State University of New York wrote in 2013 that Pisa had “assumed a new role as arbiter of global education governance”.
It’s true that today Pisa and, to a lesser extent, Timss and Pirls have an increasingly direct influence over education policy in some of the world’s most economically advanced countries.
The “Pisa shock” experienced by Germany when it found itself much lower down the rankings than expected in 2001 has turned out to be far from unique. Wales became a more recent victim as it slid down the rankings in successive Pisa surveys in 2009 and 2012. Both countries changed their education policy as a result of their ranking.
Top-performing Pisa nations are not immune either, according to Professor Yong Zhao from the University of Oregon. “It is putting everybody at risk,” the long-standing Pisa critic tells TES. “Countries that do well are put up on a pedestal so that they are not really [willing] to reform their education for fear of losing their spot. Pisa is putting people in a very awkward position.”
He cites the example of Japan. “It tried to reform its education in the last decade,” Zhao says. “But then they saw they were slipping a bit in Pisa, so they went back on the reforms.”
And while countries like Germany and Japan may have changed education policy because of information from international surveys, there also seems to have been a shift to the point where governments are starting to see the survey ranking as an end in itself, as a key policy goal.
Last year, Nicky Morgan, then education secretary, made an ambitious pre-election pledge that England would reach the Pisa top five by 2020.
In Wales, the government is intent on “embedding Pisa skills in Welsh education” and has enlisted Schleicher to help explain to its teachers why Pisa is “so important” and how “GCSEs and the curriculum in Wales” are being changed to test “Pisa Skills”. In this case, Pisa has crossed the line between simply measuring school performance and actually determining what is taught there.
And Wales is not the only country that has attempted such a direct approach. Last year, at a seminar held by Pearson in London, attended by TES, an executive from the OECD commented: “We are increasingly getting requests from governments from all corners of the world saying ‘Help’ and ‘How can we improve our position in the league tables for Pisa?’ But, obviously, they just want us to force them to teach to the test.”
Growing scepticism
With greater power comes greater scrutiny. In 2013, TES revealed damning new allegations against Pisa from statistical and mathematical experts who said that what had become the world’s most influential education league tables were, in fact, “useless”, produced “meaningless” rankings and were compiled using techniques that were “utterly wrong”.
In response, the OECD admitted that “large variation in single country ranking positions is likely” because of the methods it used. For example, in 2009 the organisation said that the UK’s Pisa ranking out of a total of 74 countries was between 19th and 27th for reading, between 23rd and 31st for maths, and between 14th and 19th for science.
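To see how a single set of results can produce a range of ranking positions rather than a fixed one, consider a minimal simulation. The numbers below are invented and the code is only a sketch of the general idea of sampling error, not the OECD’s actual procedure:

    # Minimal simulation with invented numbers - not real Pisa data and not the
    # OECD's method. It illustrates how sampling error in each country's mean
    # score turns a single "rank" into a range of plausible ranking positions.
    import numpy as np

    rng = np.random.default_rng(0)

    n_countries = 20
    means = rng.normal(500, 25, size=n_countries)   # hypothetical mean scores
    std_errors = np.full(n_countries, 3.0)          # hypothetical standard errors

    focus = 10                                      # the country whose rank we track
    ranks = []
    for _ in range(10_000):
        draw = rng.normal(means, std_errors)        # one plausible set of "true" means
        ranks.append(int((draw > draw[focus]).sum()) + 1)   # rank 1 = highest score

    low, high = np.percentile(ranks, [2.5, 97.5])
    print(f"95% of simulations place the country between rank {int(low)} and {int(high)}")

The OECD’s machinery is far more sophisticated, but the underlying point is the same: a single ranking position conceals a band of statistically indistinguishable ones.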
But this admission has made little difference to how the rankings are reported in the media, and Pisa does not put the caveats centre stage. The resulting league tables are often taken as definitive statements on the quality of school systems. And the stakes have become so high that a slip of a few positions caused by statistical variation might today spell the end of an education policy or even a minister’s career.
It is not just Pisa that fuels the media’s hunger for simplistic comparisons. Asked to comment on the fact that most people will consume Timss and Pirls only in terms of where their country finishes in a rank order, the studies’ joint executive director Ina Mullis admits to TES: “We know that if we left out that table there would be quite a bit of uproar. It’s an attention gatherer…The media likes that table a lot. You have to display it.”
For Pisa, despite an initial reluctance to admit any fault with its methods or to engage with some academic critics, there are signs that, this year, those concerns may have been listened to.
The OECD has now revealed that, for the latest Pisa, it has started to move away from the Rasch model - the method it has used to calculate each country’s score and ranking.
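For readers who have not met the term, the Rasch model in its standard form says that the probability of pupil i answering question j correctly depends only on the gap between that pupil’s ability, θ_i, and the question’s difficulty, b_j. In LaTeX notation:

    P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}

Country scores are built from estimates of these parameters rather than from raw marks, which is one reason the modelling choices matter so much to the final rankings.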
Schleicher is presenting the shift as a response to those original academic criticisms. But the time it took to admit that there was an issue to address suggests that the OECD is a world away from the “radical openness” he has promoted.
And the more that one attempts to understand the workings of international comparisons, the greater the issue of transparency becomes.
It is not just that the full details have not always been publicly available. It is that, even when they are in the public domain, they are so technical and so involved that they are often beyond the comprehension of the vast majority of people whose lives they affect.
Just as hardly anyone really understands the Google algorithm, how many teachers, politicians, educationalists or policymakers genuinely understand how Pisa works?
Do teachers realise, for example, that many of the test scores used to calculate Pisa rankings are not from real pupils answering questions, but from a computer running a statistical program to work out how a pupil who didn’t actually take the test would probably have answered?
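As a rough illustration of that logic - and only an illustration, since the real Pisa machinery is far more elaborate and conditions on background questionnaires as well as test answers - the “plausible values” idea can be sketched in a few lines of Python:

    # Rough illustration only - not the OECD's code. Each pupil's proficiency is
    # never observed directly, so several values are drawn from a distribution
    # of what their score probably is, and national figures are averaged over
    # those draws.
    import numpy as np

    rng = np.random.default_rng(1)

    n_pupils = 1000
    point_estimates = rng.normal(500, 90, size=n_pupils)   # hypothetical per-pupil estimates
    uncertainty = np.full(n_pupils, 40.0)                   # hypothetical per-pupil uncertainty

    # Draw five "plausible values" per pupil (these surveys use a small fixed number).
    plausible_values = rng.normal(
        loc=point_estimates[:, None],
        scale=uncertainty[:, None],
        size=(n_pupils, 5),
    )

    # The national mean is the average over pupils and over the draws.
    print(f"Estimated national mean: {plausible_values.mean():.1f}")

The point is not that this approach is improper - it is a standard technique in large-scale surveys. The point is that the headline numbers are model outputs, not simple tallies of right and wrong answers.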
There is a comprehension deficit for all but the most knowledgeable. And if most people don’t understand how these studies really work then how can their creators possibly be held properly accountable, whatever institutional structure is in place?
‘Potentially misleading’
The need for real accountability has become more acute as Pisa has grown increasingly bold in the way it uses its influence. In September, Schleicher made a pointed intervention into England’s contentious grammar school debate, saying that in most European countries “academic selection ultimately becomes social selection”.
These strong policy steers emanating from Pisa are not confined to macro education policy; they are pitched at classroom level as well. And there are concerns that the study has a pedagogical agenda that is not always backed up by its own evidence.
An OECD working paper published this year states: “Pisa defines the three dimensions of good teaching as: clear, well-structured classroom management; supportive, student-oriented classroom climate; and cognitive activation with challenging content.”
Greg Ashman, a teacher in Australia and prominent edu-blogger, noted this definition as he analysed the Pisa data that the OECD used to produce a guide on classroom strategies for maths teachers, published last month. But he found that, instead of supporting one of Pisa’s three pillars of good teaching, the figures actually indicate that the less student orientation there is in a country, the higher its Pisa maths score is likely to be.
This correlation goes unmentioned in the OECD teachers’ guide, which instead recommends that “all students should have the opportunity to be exposed to some student-oriented strategies”.
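For the statistically minded, the kind of check Ashman describes is conceptually simple, even if the OECD’s indices are not. A sketch with invented numbers - not the real data and not his analysis - might look like this:

    # Illustrative only: the numbers are invented and are not the OECD's data or
    # Ashman's analysis. It simply shows what a country-level correlation check
    # between a "student orientation" index and mean maths scores looks like.
    import numpy as np

    student_orientation_index = np.array([0.4, -0.1, 0.8, -0.5, 0.2, -0.3, 0.6, -0.7])
    mean_maths_score = np.array([480.0, 515.0, 455.0, 540.0, 500.0, 525.0, 470.0, 555.0])

    r = np.corrcoef(student_orientation_index, mean_maths_score)[0, 1]
    print(f"Country-level correlation: {r:.2f}")   # negative here, by construction

Even where such a pattern holds in the real data, it is a relationship between national averages; on its own it says nothing about cause and effect in any individual classroom.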
Christian Bokhove, a maths education expert who has studied Pisa and Timss, argues that it is potentially misleading for Pisa to recommend classroom strategies at all because the study does not question teachers about their methods. It relies instead on questionnaires filled in by pupils.
“Not many people realise that these are not [lesson] observations,” the University of Southampton academic says. “They are not directly from the horse’s mouth.”
He further warns that there is no guarantee that the pupils who participated were taught in the same class - and even if they were, the numbers could be too small to be statistically representative.
“You can never really say something about the classroom level with Pisa,” Bokhove adds. “But they still do. They basically say more than they can actually say based on the data they collect.”
But the OECD and the IEA show no signs of doing less with this data, and both continue to expand their output. Pisa is now being positioned as an assessor of school systems. This is a particularly significant shift, as Schleicher has readily admitted in the past that Pisa was an assessment of applied knowledge and was never designed as a measure of school effectiveness.
“There are many different forms of a student’s work - school is one, but it can be private tutoring, it can be learning reading outside school with parents - and we should look at this holistically,” he told TES in 2012. “I agree with the criticism that you can’t say that the school system is entirely responsible for Pisa results.”
But an OECD paper called Beyond Pisa 2015 suggests that things have since changed. Pisa’s main outputs, it says, “provide internationally comparable evidence on the quality, equity and efficiency of school systems”.
Culture versus classroom
There was evidence that Pisa might not live up to that billing in a paper by John Jerrim, from the UCL Institute of Education, two years ago. It looked at the Pisa performance of pupils who were second-generation East Asian immigrants in Australia and had been educated only in Australian schools.
Jerrim found that, as a group, these pupils - mainly from a Chinese background - outperformed almost every other school system participating in Pisa, including Singapore, Hong Kong, Taiwan and South Korea, which were ranked second to fifth respectively in 2012. Only Pisa table-topper Shanghai fared better, and only by a marginal amount. Yet all of the pupils had been taught in Australia, which Pisa ranked 19th, just above the global average.
To say that this poses a serious question for Pisa is an understatement. It suggests that the study may have confused correlation with causation on a grand scale, with potentially devastating results. Schleicher has spent the past six years travelling and extolling East Asian education generally, and the virtues of Shanghai’s school system in particular, following its stellar performance in Pisa 2009 and 2012.
As a result, countries like England have invested time, energy and tens of millions of pounds into trying to learn from East Asian pedagogy. But Jerrim’s results strongly suggest that it isn’t Shanghai’s schools that have made the difference at all, but rather the background and culture of the pupils who attend them.
The finding is borne out by those with close-up knowledge of both Eastern and Western school systems. Richard Nunns, deputy head of Dulwich International High School, Suzhou, in China, says he feels like “punching the computer” when he reads Pisa-inspired accounts of how schools in China excel at maths teaching.
“They reinforce stereotypes rather than say how things actually are,” the maths teacher, whose daughter is going through the Chinese education system, tells TES. He says that what actually makes the difference is Chinese families and society attaching greater importance to education.
“In China, it is the job of the students to keep up with the teacher,” Nunns says. “It is the job of the stronger classmates to make sure their weaker classmates keep up. And if they are not [doing so], then it is the job of the parents to get an outside tutor.”
Jerrim’s paper also raises questions for the IEA, which is trying to recruit Chinese provinces to its assessments and has seen East Asian systems top every Timss so far. Mullis admits that “the cultural differences are important to consider”.
“Improving education systems involves a lot of work and energy and perseverance and all of the things that perhaps the Asian cultures are a little more suited to, frankly,” she says.
But Zhao takes issue with any idea that Shanghai’s performance in international rankings is about an “improving education system”. The academic - who was born and educated in China and taught English there - says that if Shanghai pupils had taken part in a Pisa-like survey at any point in the past, “they would have made top scores then as well, because the magical ingredients have been present for thousands of years”.
He argues in his book Who’s Afraid of the Big Bad Dragon? Why China has the best (and worst) education system in the world that the Chinese education system is perfect for Pisa, but unsuited to nurturing the qualities needed to succeed in the 21st century.
“Unless Pisa scores are the ultimate goal of education, there is no reason to admire, envy, or copy education in China,” he writes. “Behind the illusion of excellence is an insufferable reality that the Chinese have long been trying to escape…Chinese education stifles creativity, smothers curiosity, suppresses individuality, ruins children’s health, distresses students and parents, corrupts teachers and leaders, and perpetuates social injustice and inequity.”
He accuses Schleicher of romanticising a system that subjects children to “extreme hardships, some amounting to child abuse”.
And now, as we enter another period of fevered Pisa and Timss reporting and fallout, Zhao believes things are getting worse. He fears that the risk from the Chinese-inspired “illusion” is intensifying because of Pisa’s attempts to respond to criticism about the narrowness of its assessments by measuring “21st-century skills”.
“I think Pisa might be becoming more dangerous because it now claims to measure creativity and problem-solving,” Zhao tells TES. He predicts that Chinese pupils will continue to ace these assessments but that their results will again be misinterpreted.
“Remember that whatever these tests are, they are still testing,” he says. “A test by no means reflects your true creativity - it just measures your capacity to take a creativity test. That actually can become more dangerous and drive governments to do even crazier things.”
William Stewart is news editor at TES. He tweets @wstewarttes.