How to avoid the testing traps and make assessment better

Both national and classroom assessments can become meaningless with the wrong approaches or context, argues leading expert Therese Hopfenbeck

2nd April 2025, 5:00am

Helen Amass

Therese Hopfenbeck first became interested in the science of assessment when she was working as an English as a foreign language (EFL) teacher in Norway and happened to notice something about her students.

“I started being curious about why it was that I could design a classroom-based assessment and some students would run with it and understand everything, but if I were to use the same assessment in another class, those students would struggle,” Hopfenbeck explains.

Now professor in educational assessment at the Assessment and Evaluation Research Centre at the University of Melbourne, a visiting fellow at Kellogg College, Oxford and former lead editor of the journal Assessment in Education: principles, policy and practice, Hopfenbeck has spent much of her career investigating this area further.

Assessment in schools

She has since come to realise that the answer to the question she had as an EFL teacher, like the answers to many of the other questions that teachers and school leaders commonly grapple with around assessment - including which methods are most effective, how to balance a need for rigour against supporting the mental health of young people and how to ensure testing is equitable - is, inevitably, “all about which context we’re talking about and knowing your students”.

“I’m still struggling with those questions,” Hopfenbeck admits, “but now, I have turned a lot of them into research.”

So, what has this research taught her about how schools can make the most of their assessments, and what might need to change about national assessment systems more broadly? Tes sat down with Hopfenbeck to find out.

Tes: How far does context really matter when we are thinking about how we use assessment?

Hopfenbeck: What I have found with a lot of the research I’ve done is that context matters so much more than we have previously acknowledged.

So much depends on which country and context we are talking about, but also, if we’re discussing the UK, for instance, it makes a huge difference whether we are talking about a class in East London or a school in Cardiff. Then socioeconomic background also plays a role.

So, why is it that an assessment might work in some classes and not in others? It is because learning and assessment are related to a number of factors that are placed together, and I don’t think everyone is acknowledging that.

Aren’t standardised tests designed to level the playing field, though?

In theory, yes. But, in reality, a lot of language experts, for instance, will point to the fact that, if we’re talking about cultural knowledge, we know that children who are growing up surrounded by books and grown-ups who read to them have a major head start in the game.

When I worked on international large-scale studies, such as the Programme for International Student Assessment (Pisa), we knew there was one factor for students that correlated most highly with achievement, and that was the classic question: “How many books do you have at home?”

If you have a child who has spent a lot of time reading and being read to, not only will they have learned things from that reading, but their vocabulary will also be much more extensive than that of a child who has not been read to. The child who has been read to will automatically better understand a lot of the tasks on standardised tests, just because of that knowledge they have.

This is why what we call “item writers” (the people who develop tasks for standardised tests) know that to be fair, you need to have tests where most students will understand the words.

What else can help to make standardised assessment fairer?

Pisa, for instance, says it tries to have “context independent” tests. If there is a question that involves finance, it will never use real currency like pounds or dollars. Instead, it has developed its own “currency” called “zed” so that, globally, none of the kids would have any advantage.

That, again, is just saying that where we’re coming from and what we’re bringing to the assessment in terms of our prior knowledge is so important.

I think the real question we need to ask is how we find the balance between having standardised tests that are fair for everyone and making assessments that are worth teaching to.

A lot of the things that are happening in schools haven’t changed for 100 years. So, you have a classroom with 25 students. They get the same tests, and still, surprisingly, in many countries, they’re doing them on pen and paper, not online. The fairness isn’t there and it’s kind of old fashioned.

If you go to university and do your PhD, there are no PhD students being compared with the others in the same way because everyone has their own project and their own research question. That’s the real learning; that’s the deep learning. That’s what we know works.

But for children, very often, we just do it the traditional way, which doesn’t really work that well.

What does the global picture suggest about how schools can make assessment more effective overall?

What the evidence says about what works will differ depending on who you’re asking.

Although we live in a world where some politicians act like they have the truth, as scientists, we have to say, and as researchers, we have to live with this fact: there are things that we know are better than other things, but there’s always uncertainty, and we can always improve.

When I talk to teachers, I always tell them to be careful with anyone who says they’re going to sell you the Holy Grail and that this is the way of doing it, because it’s usually not. It’s more complicated than that.

What we do know, if we’re putting together a lot of the research that people trust, is that if you want to make assessment more effective, you need to develop a learning environment where children feel safe, and safe to make errors. That has a huge impact on their willingness to try new things.

We know that in those classrooms where students are not learning as much as they potentially could, sometimes it’s just because the stress and anxiety kick in and they would rather sit and pretend they understand than be called out in a situation where they are concerned that the teacher would think they’re not smart enough.

Fear of failure is a huge challenge across many classrooms, globally. So, developing and facilitating a safe environment is key.

And the other thing is vocabulary, as I said. Unless you have the vocabulary to sit in groups and talk together about how you understand a problem and unless you’re able to ask effective questions, you will not be good with peer assessment, you will not be good with self-assessment, you will not be good at developing your own metacognitive awareness about what you know and what you don’t know.

So, vocabulary training is still really important, and so is literacy. Being able to read well is the foundation, and good teachers are also providing that.

Sometimes, I feel that politicians can be very black or white when they’re selling their ideas: it’s either all about 21st-century skills or it’s all about knowledge. But most of the research shows that we actually need both.

You mentioned vocabulary training. Some people might interpret this as simply getting pupils to commit lots of words to memory, but it sounds like it’s also about explicitly training them in how to use language effectively. Is that right?

Yes. We’re now in an era where we are teaching our children how to use artificial intelligence ethically, and so language is even more important because dialogues and the kinds of questions we are asking are becoming more important when integrating AI into education.

In the context of learning, unless you know how to ask a deep question, rather than just questions that will get you a factual “yes” or “no” response, you will learn less. Decades of research have shown us these are the basic things that are good for learning, but we seem to forget that when new things, such as AI, arrive in the classroom and we’re panicking.

We should be talking together about how to have solid, ethical dialogues with the robots and teach our students how to do that, too. I’ve published an open access paper that goes into more detail about how we might do this.

Another recent technological development is the rise of online testing. What do we know about the implications of that?

We struggle in the field of education to have enough empirical studies showing what is working well and what is working less well.

There are a few international, large-scale studies that have tried to look at trends in countries on how assessment has worked when they have compared pen and paper and online testing.

For instance, John Jerrim at University College London has some empirical data on this. For a 2016 paper, he took data from the 2012 Pisa tests, where more than 200,000 children from 32 economies completed both paper and computer versions of the maths assessment.

He found that, in some countries, scores dropped when students were conducting the test online.

Internationally, we know this is related to access to technology and computers at home, and students’ familiarity with technology - something that was also evident during Covid-19 lockdowns, when not all students had access to their own computers for remote learning.

But when it comes back to the classroom, there are mixed results for online testing, and very often, it’s about the quality, again, of the dialogue.

If we’re talking about formative assessment processes, it’s all about how the teacher is able to have dialogues with students and whether that is online or in person; it’s about the quality of the questions being asked and how the teacher is able to facilitate that learning process.

It shows that online learning is just a tool. If you’re not good at asking questions, then online technology is not going to help you. So, you have to support teachers to know how to be good teachers, whether that’s online or in person. It is that hard and that easy.

Do you think there is such a thing as the perfect assessment system?

No. I think teaching is an art that we will always try to improve, but there will always be new challenges.

We are now in a post-Covid era where we’re still tackling a lot of mental health issues. Globally, we have more and more students persistently absent from school, and teachers are trying to figure out how to assess their learning.

Those teachers who are following the science know that to have a really good assessment, they need to know the student, what they’re thinking and what their misconceptions are in order to tailor feedback to the individual. And in this environment, it’s getting more and more challenging because of the complex world situation we find ourselves in.

Are there any countries that you think demonstrate particularly effective approaches to assessment that other systems could learn from?

I’ve always been very fascinated by what we call “student participation in learning” processes in Norway, and the longer I live abroad, the more I think that there’s some really good work in that.

Norway was one of the first countries to make it a legal requirement for students to be active partners in assessment. We had formative assessment movements in Norway, just like in many other countries, informed by the work of researchers like Paul Black, Dylan Wiliam, Louise Hayward and John Hattie.

But very quickly, the Norwegians said, “If we’re going to do this, students need to understand what’s going on.” So, we involved students in developing assessment criteria and made them active participants in the processes.

We evaluated this national strategy 10 years ago, and one finding was that having a balance between trust and accountability was essential for this kind of work.

You see lots of examples in Norwegian schools of students actively working together with teachers on how to develop assessments, how to use assessment criteria and how to use self- and peer assessment. I haven’t seen many other countries doing similar things on a large scale.

Does the global picture suggest there are any assessment practices that we should be trying to avoid?

We should be trying to avoid assessments that increase anxiety in children.

Every system can be abused, and tests can be given in ways that are not helpful for children at all. That can sometimes be happening on a systemic level, where people do not understand that what they are doing is actually not enhancing learning but creating anxiety.

For example, if teachers are forced to administer tests solely to collect data for accountability purposes for the school - to be used in league tables, for instance - where students do not get feedback on their learning, and there is less information about the purpose of the test, this can create uncertainty and anxiety for teachers and students.

Ethical assessment practices involve informing students of the purpose of the test; how results will be used; and how the process will inform learning for both the individual child and the class, and improve teaching.

I don’t like to blame individuals, but sometimes, you will come across an organisation - a county, a school or a group of schools - who have developed systems that seem to have focused too much on accountability and testing in ways that hinder learning.

In part, it all goes back to the politicians. If they decide that in every year, there’s a certain amount of the curriculum that teachers are under pressure to get through, and there are always standardised tests hanging over them, we know from the research that it’s really hard for teachers to use some of the approaches to teaching and assessing that are good for learning.

So, a lot of things are outside of the control of teachers. But in some systems, policymakers are collaborating and working more with teachers and researchers, and that knowledge exchange is more helpful for students.

In general, is it helpful to have less standardised testing?

That depends completely on which country we’re talking about. In some countries, I don’t think they have enough standardised testing. In other countries, they have too much, so there’s not enough time for learning.

One reason why I think standardised testing is very important comes back to language and reading.

For example, we did some research where we took the datasets for England from the 2016 Progress in International Reading Literacy Study (Pirls) tests and looked at the same students when they were young and took the phonics screening check.

We saw that there were correlations. So, those who performed better on the screening tests when they were young fared better on the Pirls test when they were 11.

This shows that, sometimes, it’s good to screen and figure out who struggles.

The earlier you can help children who are not able to read well, the better it will be for them later. And we have some systems that are not testing children at all in the first year of school, and suddenly, when they’re 11 or 12 years old, the teacher realises that they’re struggling with numeracy and literacy and are already very much behind.

Good assessment is about knowing at which point in time it’s important to assess and what is important to assess - for instance, reading and numeracy - so you can have early interventions if needed. But then, further down the line, be more careful and ask yourself, “How much data do we need about this child? And at which point?”

Is there anything that a teacher who is working in an imperfect system can do to improve how they use assessment?

The work I’ve been doing in the past few years is about bridging self-regulation and metacognitive awareness in children with the way they are assessed. And I think the most important thing teachers can do is to understand how students are learning, know what they’re not able to do and talk to them about what they can do to improve their learning.

This is easier if teachers have time to have a dialogue with the students, but those dialogues can be chats that are happening in different online spaces. They can be within a whole classroom context or in groups.

No matter how we are doing things, talk to the children. Talk to them about what the standardised test will be like and what makes a typical task. Comfort them. Make sure that they feel safe.

Those things are not new, but they’re still very true for what makes good practice.