The Power of Formative Testing to Retrieve What We Have Learned

This is the second in a series of blogposts on Peter C. Brown, Henry L. Roediger III & Mark A. McDaniel’s seminal book on the science of learning, Make It Stick: The Science of Successful Learning (Harvard University Press, 2014). This week’s blogpost focuses on the second chapter, ‘To Learn, Retrieve’. For a discussion of the first chapter, see here.

Jonathan Beale, Researcher-in-Residence, CIRL

Retrieval practice

‘Retrieval practice’ (‘RP’) is the process of recalling from memory concepts, events or facts (p. 3). This chapter puts forward learning advantages of regular, formative testing, especially for RP. It focuses on the ‘testing effect’: the power of tests to develop retrieval as a learning tool (p. 28). Drawing upon evidence from the science of learning, the authors argue that the use of testing as an effective learning method is rare and ‘remains little understood or utilized by teachers or students’ (p. 29).

‘Formative testing’ is a form of formative assessment where students take tests to evaluate their learning during a course. Summative assessment is used to evaluate a student’s overall learning – for example, an exam taken at the end of a course. (On the distinction between formative and summative assessment, see this article; for an interesting read on how we should understand formative assessment, see this article by Dylan Wiliam.)

While the authors share the common frustration at standardised tests that are ‘given for the sole purpose of measuring learning’ (which is often the use to which summative tests are put), they worry that this ‘steers us away from appreciating one of the most potent learning tools available to us’ (p. 30). They argue that the use of testing as an effective learning method is rare and little understood (p. 29) – a claim this chapter supports using evidence from the science of learning.

Benefits of retrieval practice

The authors note that students are generally unaware of the effectiveness of RP (p. 41). The chapter puts forward the following four main benefits of RP.

1. Memory

Practising ‘retrieving new knowledge or skill from memory is a potent tool for learning and durable retention’. This is the case for anything we retrieve from memory (p. 43). RP has major long-term learning benefits: it helps to consolidate learning in our long-term memory and ‘interrupts’ our process of forgetting what we’ve learned (p. 43). Startling statistics are given about memory: we quickly lose around 70% of what we have just heard or read, and the remaining 30% goes away more slowly (p. 28).

2. Recall

RP makes the knowledge or skills we retrieve from memory ‘easier to call up again in the future’ (p. 28). This is because the ‘act of retrieving a memory changes the memory, making it easier to retrieve again later’ (p. 41). RP can also make knowledge or skills more accessible when needed in specific contexts.

A particularly effective RP method is formative testing:

‘[T]esting, compared to rereading, can facilitate better transfer of knowledge to new contexts and problems, and … it improves one’s ability to retain and retrieve material that is related but not tested’ (pp. 41-2).

The way that formative testing can facilitate better transfer of knowledge to new contexts relates to the point in the previous chapter concerning ‘contextualisation’: learning is most effective when it is contextualised (see claim 10 in the previous blogpost).

3. Retrieval practice is better than re-studying

RP makes learning stronger and more durable than learning by re-studying (p. 28).

4. Repeating retrieval further consolidates learning

Repeating RP further consolidates learning in memory and enhances our abilities to recall knowledge or skills (p. 28):

‘Repeated retrieval … makes memories more durable [and] produces knowledge that can be retrieved more readily, in more varied settings, and applied to … wider … problems’ (p. 43).

Benefits of regular low-stakes classroom testing

The authors argue that there are several learning benefits to regular, low-stakes classroom testing. An example of low-stakes testing is a multiple-choice quiz which does not count towards the final grade.

First, low-stakes classroom testing helps with retention. Second, it benefits subsequent study. The authors cite studies showing that after tests, students typically ‘spend more time restudying the material they missed, and they learn more from it than do their peers who restudy the material without having been tested’ (p. 42). Third, students who have taken low-stakes tests have a ‘double advantage’ over those who haven’t:

1. they have ‘a more accurate sense of what they know and don’t know’;

2. they strengthen the ‘learning that accrues from retrieval practice’ (p. 42).

Concerning the first of these advantages, developing students’ awareness of what they know also helps them to identify areas where they need to improve (p. 42). As we saw in the previous blogpost (see claim 8), developing this awareness is important because we are subject to ‘knowledge illusions’, where we believe we know or have learned more than we actually have. Such illusions are sometimes a product of learning practices that foster these beliefs even though the practices are generally not effective for learning (pp. 15-16).

Regular low-stakes testing also:

‘improves student attendance’;
‘increases studying before class’;
‘increases attentiveness during class’;
offers an ‘antidote’ to ‘mistaking fluency … resulting from repeated readings, for mastery of the subject’ (a type of knowledge illusion);
‘enables instructors to identify gaps in students’ understanding and adapt their instruction to fill them’;
helps reduce ‘test anxiety’ because ‘no single test is a make-or-break event’ (pp. 42-3).

The benefits listed above ‘accrue whether instruction is delivered online or in the classroom’ (p. 43).

The testing effect

The authors draw upon empirical evidence to illustrate the testing effect (‘TE’). One example is an intervention at a middle school which investigated the impact of occasional no-stakes short quizzes on retrieval (p. 33). It found that students achieved a full grade higher on material on which they had been quizzed than on material on which they had not. The findings from the intervention also suggest that re-reading is not an effective learning strategy (p. 35).

Another intervention showed long-term benefits of TE: regular formative testing positively affected the results of end-of-year exams taken eight months after the initial intervention (p. 35).

A false dichotomy between memorisation and developing higher order thinking skills

A common objection to the use of tests is that while they are useful for memorisation – i.e., increasing propositional knowledge (knowledge of facts) – they are not particularly useful for developing higher order thinking skills. Such skills include critical and creative thinking skills.

The authors argue, however, that testing can develop higher order skills (pp. 29-30). Their argument attempts to show that the dichotomy often drawn between accruing basic knowledge and developing higher order thinking skills is false. The authors argue that for either of these to be developed, both need to be. This argument is based on the claim that higher order thinking skills related and applied to a particular subject can be better developed if one possesses a stronger knowledge of that subject. For example, when it comes to developing creative thinking skills, the ‘stronger one’s knowledge about the subject at hand, the more nuanced one’s creativity can be in addressing a new problem’ (p. 30).

Practical strategies

We can extract the following teaching and learning strategies from this chapter.

1. Make retrieval effortful

‘Effortful retrieval makes for stronger learning and retention’. The greater our cognitive effort when we successfully retrieve learning, the ‘more that learning is strengthened by retrieval’ (p. 43). The authors note that any kind of RP benefits learning, but ‘where more cognitive effort is required for retrieval, greater retention results’ (p. 41).

It is important to note that effortful retrieval has to be successful for it to be most beneficial. This helps to adumbrate what the limits of effort should be. Effortful retrieval is futile if the effort is doomed to fail; we can only make rational efforts to retrieve what it is possible to retrieve from our memories. So, to maximise the benefits of retrieval, it is best to employ methods where students make the most effort they can reasonably be expected to make to retrieve their learning.

2. Space out retrieval practice

Spacing out RP improves retention. An example is spaced testing: formative tests taking place at various stages throughout a course, rather than a condensed set of tests at the end (the latter being summative testing).

Spacing out RP enhances the benefits of effortful retrieval. To be ‘most effective’, the authors write, ‘retrieval must be repeated again and again, in spaced out sessions so that recall … requires some cognitive effort’ (p. 28). Delaying RP ‘is more potent for reinforcing retention than immediate practice, because delayed retrieval requires more effort’ (p. 43).

Spacing out learning involves the risk that some learning is forgotten. But the authors argue that this is beneficial for learning:

‘When retrieval practice is spaced, allowing some forgetting to occur between tests, it leads to stronger long-term retention than when it is massed’ (p. 32).

One study showed that ‘after a delay of two days, … students who took the initial test recalled more … material than those who simply restudied it’. This was also shown to happen after a week’s delay. The differences in learning benefits between spaced RP and re-studying are significant: in one study, the level of material forgotten by students reduced by 42% (p. 39).

3. Give corrective feedback

It is more beneficial for learning to give students corrective feedback after tests rather than only their mark. Corrective feedback after tests ‘strengthens retention more than testing alone does’ (p. 39); it prevents students from ‘incorrectly retaining material they have misunderstood’; and it ‘produces better learning of the correct answers’ (p. 44).

4. Delay feedback

Delaying feedback yields better long-term learning than immediate feedback. Delaying feedback both supports and can function as spaced practice: delayed feedback ‘gives the student practice that’s spaced out in time’, and spacing out RP improves retention (p. 40).

5. Schedule regular formative testing

The authors recommend scheduling formative tests: build them into the structure of a course so that students know when the tests will take place. Schedules are most beneficial if they utilise the benefits of spaced practice. For example, a test on an area of learning should not be taken too soon after another test on that same area, and adequate space should be left between tests so that delayed feedback can be given to and processed by students (p. 32). It is particularly beneficial to follow a schedule of quizzes before and after lessons, together with a review (p. 36).

6. Even occasional tests can be highly beneficial

There are benefits to occasional testing, and even one test during a course. The authors cite evidence that shows that even ‘a single test in a class can produce a large improvement in final exam scores, and gains in learning continue to increase as the number of tests increases’ (p. 41).

7. Use testing rather than getting students to re-study material

Formative testing is far better for learning than re-studying. The benefits are partly metacognitive: ‘students who take practice tests have a better grasp of their progress than those who simply reread the material’ (p. 44). The authors recommend no-stakes quizzing as a strategy (p. 39).

8. The most effective tests require learners to supply answers

While simple multiple-choice tests are highly beneficial for learning (p. 41), the most effective tests ‘require the learner to supply the answer’, such as those involving ‘an essay or short-answer test’. These are more effective than multiple-choice tests because supplying answers requires more effort from students. But much simpler exercises where the learner has to supply the answer can also be effective, such as learners doing RP independently using flashcards. The authors write that using flashcards appears to be ‘more effective than simple recognition tests like multiple choice or true/false tests’ (pp. 40-1).

9. Embed formative testing into a course rather than making it unexpected

Studies on university students showed that when formative tests are embedded into a course,

students enjoy the course more;
students take the quizzes more seriously;
students prepare more for quizzes;
results in quizzes tend to be better;
tests can have positive impacts on other areas of student performance (such as quality of essays and class discussions) (p. 38).

Other benefits include the following:

cumulative learning effects accrue over time ‘when course material is carried forward in a regime of quizzes across an entire semester’ (p. 39);
‘in all studies of testing that reported students’ attitudes, … students who were tested frequently rated their classes more favorably at the end … than those tested less frequently’;
students ‘who were frequently tested [in studies] reached the end of the semester on top of the material and did not need to cram for exams’ (p. 42);
in university classes where instructors incorporated low-stakes quizzing, students were reported to ‘embrace the practice’ and rated ‘their classes more favorably’ (p. 44).

10. Testing does not need to be initiated by the teacher

Self-testing and peer testing are useful learning tools. Since learning is most effective when it is effortful, self-testing is most beneficial when it demands greater effort from the student. The authors recommend simple strategies such as self-quizzing and flashcards (p. 44).

11. Even extremely simple tasks can help with retrieval

On this point, the authors discuss the ‘generation effect’: the way in which the effort required to generate a cued answer can strengthen memory. The authors illustrate this with a study that showed that ‘simply asking a subject to fill in a word’s missing letters resulted in better memory of the word’ (p. 32).

12. Employ reflection as a form of practice

Reflecting upon our personal experiences with the ways we have used knowledge and skills is useful for both consolidating and developing our learning. The authors describe this process, ‘reflection’, as an ‘essential kind of learning’ (p. 26). Reflection is useful for RP, because it involves ‘retrieving knowledge and earlier training from memory’. It also involves connecting what is retrieved with ‘new experiences’, and ‘visualizing and mentally rehearsing what you might do differently next time’ (p. 27).

Reflection connects with a point in the previous chapter, that learning is strongest when we make it personal (p. 11). Why? Because we learn something best when we see how it matters – such as the ways in which it relates to our lives.

Discussion

Here are four questions we might consider.

1. Test anxiety

The authors write that regular low-stakes testing helps to reduce ‘test anxiety’ because ‘no single test is a make-or-break event’ (pp. 42-3). This only applies to courses where there is not a high-stakes test at the end.

Could regular low-stakes testing reduce test anxiety even when there is a high-stakes test at the end?

2. Delaying feedback

The authors suggest that delaying feedback yields better long-term learning than immediate feedback (p. 40). Evidence ‘shows that delaying the feedback briefly produces better long-term learning than immediate feedback’ (p. 39). With immediate feedback, ‘the learner quickly comes to depend on the continued presence of the correction’. Another theory holds that immediate feedback can become part of a task students learn, such that its later absence (e.g., in real-world settings) constitutes ‘a gap in the established pattern that disrupts performance’ (p. 40).

What are the implications for giving feedback to students as quickly as possible, as teachers are generally encouraged to do?

3. Educational technology

The pandemic has increased our reliance on technology and has resulted in far more online and hybrid teaching. This may lead to permanent and widespread new approaches towards teaching and learning after Covid-19. One risk that some educators fear is an over-reliance on technology in education.

The authors write that the benefits of low-stakes regular testing ‘accrue whether instruction is delivered online or in the classroom’ (p. 43). Their examples of effective learning strategies ‘can be done without technology’: for example, writing down facts learned after reading; using flashcards; self-quizzing; and students elaborating on explanations of the meaning of passages in texts given by other students (p. 36).

Are there lessons we can learn from this about how best to teach online without over-relying on technology?

4. Learning loss

In an intervention outlined in this chapter, the TE was shown to have positive effects on end-of-year exams taken eight months after the initial intervention. The authors suggest that the effect would ‘doubtless have been greater if the retrieval practice had continued and occurred once a month, say, in the intervening months’ (p. 35).

School closures have precipitated a significant learning loss. Recent studies revealed that a fifth of UK children did less than an hour of schoolwork a day during the lockdown earlier this year.

Should we make more use of formative testing to reduce the risk of learning loss when students spend long periods out of school?

As we explore the book further, we may return to address these questions.
