2015-08-13

(Note: A version of this article was first published in Scientific American, under the title “A New Vision for Testing.”)

Who was the first American to orbit Earth?
A) Neil Armstrong
B) Yuri Gagarin
C) John Glenn
D) Nikita Khrushchev

In schools across the U.S., multiple-choice questions such as this one provoke anxiety, even dread. Their appearance means it’s testing time, and tests are big, important, excruciatingly unpleasant events.

But not at Columbia Middle School in Illinois, in the classroom of eighth grade history teacher Patrice Bain. Bain has lively blue eyes, a quick smile, and spiky platinum hair that manages to look at once punkish and pixieish. After displaying the question on a smartboard, she pauses as her students enter their responses on numbered devices known as clickers.

“Okay, has everyone put in their answers?” she asks. “Number 19, we’re waiting on you!” Hurriedly, 19 punches in a selection, and together Bain and her students look over the class’s responses, now displayed at the bottom of the smartboard screen. “Most of you got it—John Glenn—very nice.” She chuckles and shakes her head at the answer three of her students have submitted. “Oh, my darlings,” says Bain in playful reproach. “Khrushchev was not an astronaut!”

Bain moves on to the next question, briskly repeating the process of asking, answering and explaining as she and her students work through the decade of the 1960s.

The failed Bay of Pigs invasion involved the United States and which country?
A) Honduras
B) Haiti
C) Cuba
D) Guatemala

When every student gives the correct answer, the class members raise their hands and wiggle their fingers in unison, an exuberant gesture they call “spirit fingers.” This is the case with the Bay of Pigs question: every student nails it.

“All right!” Bain enthuses. “That’s our fifth spirit fingers today!”

The banter in Bain’s classroom is a world away from the tense standoffs at public schools around the country. Since the enactment of No Child Left Behind in 2002, parents’ and teachers’ opposition to the law’s mandate to test “every child, every year” in grades three through eight has been intensifying. A growing number of parents are withdrawing their children from the annual state tests; the epicenter of the “opt-out” movement may be New York State, where as many as 90 percent of students in some districts reportedly refused to take the year-end examination last spring. Critics of U.S. schools’ heavy emphasis on testing charge that the high-stakes assessments inflict anxiety on students and teachers, turning classrooms into test-preparation factories instead of laboratories of genuine, meaningful learning.

In the always polarizing debate over how American students should be educated, testing has become the most controversial issue of all. Yet a crucial piece has been largely missing from the discussion so far. Research in cognitive science and psychology shows that testing, done right, can be an exceptionally effective way to learn. Taking tests, as well as engaging in well-designed activities before and after tests, can produce better recall of facts—and deeper and more complex understanding—than an education without exams. But a testing regime that actively supports learning, in addition to simply assessing, would look very different from the way American schools “do” testing today.

What Bain is doing in her classroom is called retrieval practice. The method has a well-established base of empirical support in the academic literature, going back almost 100 years—but Bain, unaware of this research, worked out something very similar on her own over the course of a 21-year career in the classroom.

“I’ve been told I’m a wonderful teacher, which is nice to hear, but at the same time I feel the need to tell people: ‘No, it’s not me—it’s the method,’ ” says Bain in an interview after her class has ended. “I felt my way into this approach, and I’ve seen it work such wonders that I want to get up on a mountaintop and shout so everyone can hear me: ‘You should be doing this, too!’ But it’s been hard to persuade other teachers to try it.”

Then, eight years ago, she met Mark McDaniel through a mutual acquaintance. McDaniel is a psychology professor at Washington University in St. Louis, a half-hour’s drive from Bain’s school. McDaniel had started to describe to Bain his research on retrieval practice when she broke in with an exclamation. “Patrice said, ‘I do that in my classroom! It works!’” McDaniel recalls. He went on to explain to Bain that what he and his colleagues refer to as retrieval practice is, essentially, testing. “We used to call it ‘the testing effect’ until we got smart and realized that no teacher or parent would want to touch a technique that had the word ‘test’ in it,” McDaniel notes now.

Retrieval practice does not use testing as a tool of assessment. Rather, it treats tests as occasions for learning, which makes sense only once we recognize that we have misunderstood the nature of testing. We think of tests as a kind of dipstick that we insert into a student’s head, an indicator that tells us how high the level of knowledge has risen in there—when in fact, every time a student calls up knowledge from memory, that memory changes. Its mental representation becomes stronger, more stable and more accessible.

Why would this be? It makes sense considering that we couldn’t possibly remember everything we encounter, says Jeffrey Karpicke, a professor of cognitive psychology at Purdue University. Given that our memory is necessarily selective, the usefulness of a fact or idea—as demonstrated by how often we’ve had reason to recall it—makes a sound basis for selection. “Our minds are sensitive to the likelihood that we’ll need knowledge at a future time, and if we retrieve a piece of information now, there’s a good chance we’ll need it again,” Karpicke explains. “The process of retrieving a memory alters that memory in anticipation of demands we may encounter in the future.”

Studies employing functional magnetic resonance imaging of the brain are beginning to reveal the neural mechanisms behind the testing effect. In the handful of studies that have been conducted so far, scientists have found that calling up information from memory, as compared with simply restudying it, produces higher levels of activity in particular areas of the brain. These brain regions are associated with the so-called consolidation, or stabilization, of memories, and with the generation of cues that make memories readily accessible later on. Across several studies, researchers have demonstrated that the more active these regions are during an initial learning session, the more successful is study participants’ recall weeks or months later.

According to Karpicke, retrieving is the principal way learning happens. “Recalling information we’ve already stored in memory is a more powerful learning event than storing that information in the first place,” he says. “Retrieval is ultimately the process that makes new memories stick.” Not only does retrieval practice help students remember the specific information they retrieved, it also improves retention for related information that was not directly tested. Researchers theorize that while sifting through our mind for the particular piece of information we are trying to recollect, we call up associated memories, and in so doing, strengthen them as well. Retrieval practice also helps to prevent students from confusing the material they are currently learning with material they learned previously, and even appears to prepare students’ minds to absorb the material still more thoroughly when they encounter it again after testing (a phenomenon researchers call “test-potentiated learning”).

Hundreds of studies have demonstrated that retrieval practice is better at improving retention than just about any other method learners could use. To cite one example: in a study published in 2008 by Karpicke and his mentor, Henry Roediger III of Washington University, the authors reported that students who quizzed themselves on vocabulary terms remembered 80 percent of the words later on, whereas students who studied the words by repeatedly reading them over remembered only about a third of the words. Retrieval practice is especially powerful compared with students’ most favored study strategies: highlighting and rereading their notes and textbooks, practices that a recent review found to be among the least effective.

And testing does not merely enhance the recall of isolated facts. The process of pulling up information from memory also fosters what researchers call deep learning. Students engaging in deep learning are able to draw inferences from, and make connections among, the facts they know and are able to apply their knowledge in varied contexts (a process learning scientists refer to as transfer). In an article published in 2011 in the journal Science, Karpicke and his Purdue colleague Janell Blunt explicitly compared retrieval practice with a study technique known as concept mapping. An activity favored by many teachers as a way to promote deep learning, concept mapping asks students to draw a diagram that depicts the body of knowledge they are learning, with the relations among concepts represented by links among nodes, like roads linking cities on a map.

In their study, Karpicke and Blunt directed groups of undergraduate volunteers—200 in all—to read a passage taken from a science textbook. One group was then asked to create a concept map while referring to the text; another group was asked to recall, from memory, as much information as they could from the text they had just read. On a test given to all the students a week later, the retrieval-practice group was better able to recall the concepts presented in the text than the concept-mapping group. More striking, the former group was also better able to draw inferences and make connections among multiple concepts contained in the text. Overall, Karpicke and Blunt concluded, retrieval practice was about 50 percent more effective at promoting both factual and deep learning.

Transfer—the ability to take knowledge learned in one context and apply it to another—is the ultimate goal of deep learning. In an article published in 2010, University of Texas at Austin psychologist Andrew Butler demonstrated that retrieval practice promotes transfer better than the conventional approach of studying by rereading. In Butler’s experiment, students engaged either in rereading or in retrieval practice after reading a text that pertained to one “knowledge domain”—in this case, bats’ use of sound waves to find their way around. A week later, the students were asked to transfer what they had learned about bats to a second knowledge domain: the navigational use of sound waves by submarines. Students who had quizzed themselves on the original text about bats were better able to transfer their bat learning to submarines.

Robust though such findings are, they were until recently almost exclusively made in the laboratory, with college students as subjects. McDaniel had long wanted to apply retrieval practice in real-world schools, but gaining access to K–12 classrooms was a challenge. With Bain’s help, McDaniel and two of his Washington University colleagues, Roediger and Kathleen McDermott, set up a randomized controlled trial at Columbia Middle School that ultimately involved nine teachers and more than 1,400 students. During the course of the experiment, sixth, seventh and eighth graders learned about science and social studies in one of two ways: 1) material was presented once, then teachers reviewed it with students three times; 2) material was presented once, and students were quizzed on it three times (using clickers like the ones in Bain’s current classroom).

When the results of students’ regular unit tests were calculated, the difference between the two approaches was clear: students earned an average grade of C+ on material that had been reviewed, and A– on material that had been quizzed. On a follow-up test administered eight months later, students still remembered the information they had been quizzed on much better than the information they had reviewed.

“I had always thought of tests as a way to assess—not as a way to learn—so initially I was skeptical,” says Andria Matzenbacher, a former teacher at Columbia who now works as an instructional designer. “But I was blown away by the difference retrieval practice made in the students’ performance.” Bain, for one, was not surprised. “I knew that this method works, but it was good to see it proven scientifically,” she says. McDaniel, Roediger and McDermott eventually extended the study to nearby Columbia High School, where quizzing generated similarly impressive results. In an effort to make retrieval practice a common strategy in classrooms across the country, the Washington University team (with the help of research associate Pooja K. Agarwal, now at Harvard University) developed a manual for teachers, How to Use Retrieval Practice to Improve Learning.

Even with the weight of evidence behind them, however, advocates of retrieval practice must still contend with a reflexively negative reaction to testing among many teachers and parents. They also encounter a more thoughtful objection, which goes something like this: American students are tested so much already—far more often than students in other countries, such as Finland and Singapore, which regularly place well ahead of the U.S. in international evaluations. If testing is such a great way to learn, why aren’t our students doing better?

Marsha Lovett has a ready answer to that question. Lovett, director of the Eberly Center for Teaching Excellence and Educational Innovation at Carnegie Mellon University, is an expert on “metacognition”—the capacity to think about our own learning, to be aware of what we know and do not know, and to use that awareness to effectively manage the learning process.

Yes, Lovett says, American students take a lot of tests. It is what happens afterward—or more precisely, what does not happen—that causes these tests to fail to function as learning opportunities. Students often receive little information about what they got right and what they got wrong. “That kind of item-by-item feedback is essential to learning, and we’re throwing that learning opportunity away,” she says. In addition, students are rarely prompted to reflect in a big-picture way on their preparation for, and performance on, the test. “Often students just glance at the grade and then stuff the test away somewhere and never look at it again,” Lovett says. “Again, that’s a really important learning opportunity that we’re letting go to waste.”

A few years ago, Lovett came up with a way to get students to engage in reflection after a test. She calls it an “exam wrapper.” When the instructor hands back a graded test to a student, along with it comes a piece of paper literally wrapped around the test itself. On this paper is a list of questions: a short exercise that students are expected to complete and hand in. The wrapper that Lovett designed for a math exam includes such questions as:

How much time did you spend reviewing with each of the following:
Reading class notes? _____ minutes
Reworking old homework problems? _____ minutes
Working additional problems? _____ minutes
Reading the book? _____ minutes

Now that you have looked over your exam, estimate the percentage of points you lost due to each of the following:
_____ % from not understanding a concept
_____ % from not being careful (i.e., careless mistakes)
_____ % from not being able to formulate an approach to a problem
_____ % from other reasons (please specify)

Based on the estimates above, what will you do differently in preparing for the next test? For example, will you change your study habits or try to sharpen specific skills? Please be specific. Also, what can we do to help?

The idea, Lovett says, is to get students thinking about what they did not know or did not understand, why they failed to grasp this information and how they could prepare more effectively in advance of the next test. Lovett has been promoting the use of exam wrappers to the Carnegie Mellon faculty for several years now, and a number of professors, especially in the sciences, have incorporated the technique into their courses. They hand out exam wrappers with graded exams, collect the wrappers once they are completed, and—cleverest of all—they hand back the wrappers at the time when students are preparing for the next test.

Does this practice make a difference? In 2013 Lovett published a study of exam wrappers as a chapter in the edited volume Using Reflection and Metacognition to Improve Student Learning. It reported that the metacognitive skills of students in classes that used exam wrappers increased more across the semester than those of students in courses that did not employ exam wrappers. In addition, an end-of-semester survey found that among students who were given exam wrappers, more than half cited specific changes they had made in their approach to learning and studying as a result of filling out the wrapper.

The practice of using exam wrappers is beginning to spread to other universities, and to K–12 schools. Lorie Xikes teaches at Riverdale High School in Fort Myers, Fla., and has used exam wrappers in her AP Biology class. When she hands back graded tests, the exam wrapper includes such questions as:

Approximately how much time did you spend preparing for the test? (BE HONEST!)

Was the TV/radio/computer on? Were you on any social media site while studying? Were you playing video games? (BE HONEST!)

Now that you have looked over the test, check the following areas that you had a hard time with:
• applying definitions ________
• lack of understanding concepts ______
• careless mistakes ________
• reading a chart or graph ________

Based on your responses to the questions above, name at least three things you will do differently in preparing for the next test. BE SPECIFIC.

“Students usually just want to know their grade, and that’s it,” Xikes says. “Having them fill out the exam wrapper makes them stop and think about how they go about getting ready for a test and whether their approach is working for them or not.”

In addition to distributing exam wrappers, Xikes also devotes class time to going over the graded exam, question by question—feedback that helps students develop the crucial capacity of “metacognitive monitoring,” that is, keeping tabs on what they know and what they still need to learn. Research on retrieval practice shows that testing can identify specific gaps in students’ knowledge, as well as puncture the general overconfidence to which students are susceptible—but only if prompt feedback is provided as a corrective.

Over time, repeated exposure to this testing-feedback loop can motivate students to develop the ability to monitor their own mental processes. Affluent students who receive a top-notch education may acquire this skill as a matter of course, but this capacity is often lacking among low-income students who attend struggling schools—holding out the hopeful possibility that retrieval practice could actually begin to close achievement gaps between the advantaged and the underprivileged.

This is just what James Pennebaker and Samuel Gosling, professors at the University of Texas at Austin, found when they instituted daily quizzes in the large psychology course they teach together. The quizzes were given online, using software that informed students whether they had responded correctly to a question immediately after they submitted an answer. The grades earned by the 901 students in the course featuring daily quizzes were, on average, about half a letter grade higher than those earned by a comparison group of 935 of Pennebaker and Gosling’s previous students, who had experienced a more traditionally designed course covering the same material.

Astonishingly, students who took the daily quizzes in their psychology class also performed better in their other courses, during the semester they were enrolled in Pennebaker and Gosling’s class and in the semesters that followed—suggesting that the frequent tests accompanied by feedback worked to improve their general skills of self-regulation. Most exciting to the professors, the daily quizzes led to a 50 percent reduction in the achievement gap, as measured by grades, among students of different social classes. “Repeated testing is a powerful practice that directly enhances learning and thinking skills, and it can be especially helpful to students who start off with a weaker academic background,” Gosling says.

Gosling and Pennebaker, who (along with U.T. graduate student Jason Ferrell) published their findings on the effects of daily quizzes in 2013 in the journal PLOS ONE, credited the “rapid, targeted, and structured feedback” that students received with boosting the effectiveness of repeated testing. And therein lies a dilemma for American public school students, who take an average of 10 standardized tests a year in grades three through eight, according to a recent study conducted by the Center for American Progress. Unlike the instructor-written tests given by the teachers and professors profiled here, standardized tests are usually sold to schools by commercial publishing companies. Scores on these tests often arrive weeks or even months after the test is taken. And to maintain the security of test items—and to use the items again on future tests—testing firms do not offer item-by-item feedback, only a rather uninformative numerical score.

There is yet another feature of standardized state tests that prevents them from being used more effectively as occasions for learning. The questions they ask are overwhelmingly of a superficial nature—which leads, almost inevitably, to superficial learning.

If the state tests currently in use in the U.S. were themselves assessed on the difficulty and depth of the questions they ask, almost all of them would flunk. That is the conclusion reached by Kun Yuan and Vi-Nhuan Le, both then behavioral scientists at RAND Corporation, a nonprofit think tank. In a report published in 2012, Yuan and Le evaluated the mathematics and English language arts tests offered by 17 states, rating each question by the cognitive challenge it poses to the test taker.

The researchers used a tool called Webb’s Depth of Knowledge—created by Norman Webb, a senior scientist at the Wisconsin Center for Education Research—which identifies four levels of mental rigor, from DOK1 (simple recall) through DOK2 (application of skills and concepts) and DOK3 (reasoning and inference) to DOK4 (extended planning and investigation).

Most questions on the state tests Yuan and Le examined were at level DOK1 or DOK2. The authors used level DOK4 as their benchmark for questions that measure deeper learning, and by this standard the tests are failing utterly. Only 1 to 6 percent of students were assessed on deeper learning in reading through state tests, Yuan and Le report; 2 to 3 percent were assessed on deeper learning in writing; and 0 percent were assessed on deeper learning in mathematics. “What tests measure matters, because what’s on the tests tends to drive instruction,” observes Linda Darling-Hammond, emeritus professor at the Stanford Graduate School of Education and a national authority on learning and assessment. That is especially true, she notes, when rewards and punishments are attached to the outcomes of the tests, as is the case under the No Child Left Behind law and states’ own “accountability” measures.

According to Darling-Hammond, the provisions of No Child Left Behind effectively forced states to employ inexpensive, multiple-choice tests that could be scored by machine—and it is all but impossible, she contends, for such tests to measure deep learning. But other kinds of tests could do so. With her Stanford colleague Frank Adamson, Darling-Hammond wrote the 2014 book Beyond the Bubble Test, which describes a very different vision of assessment: tests that pose open-ended questions (the answers to which are evaluated by teachers, not machines); that call on students to develop and defend an argument; and that ask test takers to conduct a scientific experiment or construct a research report.

In the 1990s, Darling-Hammond points out, some American states had begun to administer such tests, but that effort ended with the passage of No Child Left Behind. She acknowledges that the movement toward more sophisticated tests also stalled because of concerns about logistics and cost. Still, assessing students in this way is not a pie-in-the-sky fantasy: Other nations, such as England and Australia, are doing so already. “Their students are performing the work of real scientists and historians, while our students are filling in bubbles,” Darling-Hammond says. “It’s pitiful.”

She does see some cause for optimism: A new generation of tests is being developed in the U.S. to assess how well students have met the Common Core State Standards, the set of academic benchmarks in literacy and math that have been adopted by 43 states. Two of these tests—Smarter Balanced and Partnership for Assessment of Readiness for College and Careers (PARCC)—show promise as tests of deep learning, says Darling-Hammond, pointing to a recent evaluation conducted by Joan Herman and Robert Linn, researchers at U.C.L.A.’s National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Herman notes that both tests intend to emphasize questions at and above DOK2 on Webb’s Depth of Knowledge scale, with at least a third of a student’s total possible score coming from questions at DOK3 and DOK4. “PARCC and Smarter Balanced may not go as far as we would have liked,” Herman conceded in a blog post last year, but “they are likely to produce a big step forward.”

Nate Kornell would like to see a big step forward in the methods students use to prepare for tests. Kornell is a professor of psychology at Williams College whose research focuses on effective learning strategies. Teachers and students often do not employ such scientifically supported strategies, he notes—in part because superficial tests do not make them necessary. He offers two examples of instructional practices that could come into wider use if tests themselves made more rigorous demands.

The first is distributed practice: spacing out exposures to the material to be learned over time. The opposite of distributed practice—cramming—is the technique that now reigns in schools, and that’s true for a reason. “When tests emphasize superficial knowledge of facts, reviewing the day before, or cramming the night before, actually works pretty well,” Kornell says. “It’s all really fresh in the students’ minds, and they can disgorge it and get a decent score on the test.”

But, he says, that crammed-in material does not stay in the student’s memory for long. “A day or two after the test, it’s gone—seriously gone, as if they’d never learned it in the first place,” he says. Tests that ask more thoughtful and complex questions would be resistant to the strategy of cramming and last-minute review. Instead, students and teachers might discover the rewards of distributed practice, returning again and again to the same material while adding more depth and nuance each time.

A significant body of research shows that such distributed practice leads to more accurate and more durable learning. For example, in a study of fifth-graders published in Applied Cognitive Psychology in 2011, lead author Hailey Sobel of McGill University reported that students who learned definitions of vocabulary words on a spaced-out schedule remembered three times as many definitions as students who spent the same amount of time learning the material in a single session.

One instructor who has seen its benefits firsthand is Samantha Carr, a French teacher at Arroyo High School in El Monte, Calif. She uses a mobile flash card and quiz app called StudyBlue to spread out her students’ learning over the course of the semester. “I know that my students can pull an all-nighter and then come into class on test day and dump it out on the page,” Carr says. “But I also know that cramming like that won’t lead them to retain that knowledge over the long term. The app makes it easy for them to encounter the same material repeatedly over time, right on their own smartphones.”

Another instructional practice that might attract more adherents if tests were made more challenging is interleaving, or mixing up different types of problems during practice sessions. The way math is currently taught, “students typically work a set of practice problems devoted to the immediately preceding lesson, which means they often know the appropriate strategy for each problem before they read the problem,” says Doug Rohrer, professor of psychology at the University of South Florida. “For instance, students might watch their teacher solve a few equations by factoring, and then solve a dozen equations by factoring.”

When problem sets are arranged this way, students’ performance rapidly improves. If they take a test with a similar format, they will likely do well. But they haven’t learned much that will stick with them in the long term, because they haven’t practiced the most important skill of all: figuring out at the outset what type of problem this is. In order to give students practice at this kind of discernment, problems within class and homework assignments need to be interleaved—mixed up, all jumbled together, so that the student never knows in advance which type of problem she will be confronting but needs to figure it out afresh each time.

Rohrer has conducted a number of studies of interleaving in the laboratory; more recently, he has been trying out the technique with seventh-graders at Liberty Middle School in Tampa, Fla. In a study published last year in the Journal of Educational Psychology, Rohrer and his two co-authors asked half of the 126 participating students to complete daily practice problems that were arranged by type: a set of graph problems, followed by a set of slope problems. The other students got the same problems, but mixed up: slope problems and graph problems presented in an unpredictable shuffle.

After three months all the students were led through a review session, and a day later took a test. The students who had been engaging in interleaved practice got 80 percent of the test questions right, compared with 64 percent for students who had been completing blocked assignments—a not-inconsiderable difference. But the real value of interleaving became apparent when the students were tested a full month after the review session. On that test the interleaved students scored 74 percent, the blocked ones a paltry 42 percent.

Both distributed practice and interleaving enhance learning in part because they introduce what University of California, Los Angeles, psychologist Robert Bjork has termed “desirable difficulties”—that is, they make learning harder. For that reason, these techniques may require some adjustment. Jen DeMik is one of the teachers at Liberty Middle School who participated in Rohrer’s study; the experiment is now over, but she continues to interleave the problems in her students’ assignments as often as possible. “It took a while for the kids to get used to it,” DeMik notes, “but after a while they could see that it was a better way to learn.”

There’s one additional reason why testing in American schools is not promoting learning but rather diminishing it: anxiety. Students feel worried because so much is made of tests and their consequences. Teachers, too, are concerned about tests, and they can’t help but pass on this feeling to their pupils. And this anxiety has consequences for the use of tests as vehicles for learning. In a study published last year, two psychologists found that the benefits of retrieval practice were eliminated when practice testing was turned into a high-pressure, anxiety-provoking experience. Scott Hinze, of Virginia Wesleyan College, and David Rapp, of Northwestern University, reported in the journal Applied Cognitive Psychology that retrieval practice improved students’ memory on a final test only when such practice was carried out in a low-key, low-pressure way.

“Low-key” and “low-pressure” is certainly not how anyone would describe America’s testing regime. What we have right now are high-pressure, high-stakes tests—when what we need are lots of low-stakes or even no-stakes tests, given frequently. This is what Kathleen McDermott, the Washington University professor who co-authored the study conducted at Columbia Middle School, does with her own undergraduate psychology students. The students are given a short quiz at every class meeting, the results of which count only a small amount toward their final grade. She wants to give students a full semester’s worth of experience with retrieval practice, says McDermott, because its effects are not immediately apparent. Indeed, research shows that people intuitively feel that studying is more effective at promoting recall than practicing retrieval, even though that’s not actually the case.

McDermott reports that every year since she instituted regular retrieval practice, her end-of-semester student evaluations contain comments that carry the same gist. “The students all say, ‘The quizzes seemed annoying and pointless at first, but by the end of the course, I could see that they were really helping me learn,’” she says.

At Columbia Middle School, Patrice Bain works hard to make sure that the low- and no-stakes quizzes she gives her students are not only helpful, but downright enjoyable. In addition to her clicker quizzes, Bain also turns her eighth-graders’ homework assignments into impromptu class quizzes. She cuts up the previous night’s list of questions into strips—one question per strip—and then mixes up the strips in a small basket. She draws out five of the strips at random, and these become the day’s quiz; the students fill out their answers on a slip of paper smaller than an index card.

It’s quick and easy for her to grade these “mini-quizzes,” says Bain, and each one counts for just a fraction of the student’s grade for the course. Each day’s strips then go into a bigger basket, and every couple of weeks Bain gives the class a “big basket quiz”—a technique that manages to combine, in one low-tech package, both retrieval practice and distributed practice, since students are as apt to hear questions from last month as from yesterday.

Annelise Koch, a 14-year-old student in Bain’s class, says all these activities make for a lively class. “Mrs. Bain really wants us to learn,” says Annelise. “She explains why she’s always quizzing us—it’s because having to pull stuff out of our memories makes it easier for us to remember it again later on.”

Annelise spoke those words back in March, when her teacher was gearing up to teach the school year’s big final unit, on the U.S. Constitution. In Illinois, eighth-graders must pass a test on the Constitution in order to go on to high school. Already, students were getting nervous about the high-stakes test. Fortunately, Patrice Bain noted, research shows that a regular program of retrieval practice in the classroom actually reduces students’ test anxiety, by desensitizing them to being tested and by reassuring them that they’re ready.

Bain was already planning how she was going to help her students prepare. “We’re going to have daily clicker quizzes, lots of mini-quizzes, a big basket quiz every week. We’re going to be doing retrieval practice all the time,” said Bain. “The students are going to learn so much, and it’s going to be so much fun.”

_______________________________________________

Want to learn more?

You’ve just been reading about affirmative testing, a new approach to assessment developed by cognitive scientists and psychologists at the nation’s leading universities. Affirmative testing is a set of simple, practical techniques that neutralize the negative impact of testing, replacing it with experiences of deep understanding, empowerment, and mastery.

For the first time, these techniques are being presented in an accessible, user-friendly format: an e-course designed by acclaimed author and journalist Annie Murphy Paul, who reports on social science research for The New York Times, Time magazine, and Scientific American. The material in this course is not available anywhere else.

Through a series of lessons delivered to your email inbox or accessed on the course website, you’ll quickly get up to date on the latest research regarding affirmative testing. Each lesson takes just a few minutes a day to complete—but at the end of 28 days, you’ll find that you’ve acquired an entirely new perspective on testing: as an ideal occasion for expanding students’ cognitive and non-cognitive capacities.

You’ll also have at your disposal more than 20 practical, research-based techniques to employ with your students, enabling you to use testing to:

• Focus student attention

• Improve student memory

• Increase student motivation

• Enhance student confidence

• Relieve student anxiety

• Deepen student metacognition (that is, students’ awareness of what they know and don’t yet know)

• Bolster student perseverance and “grit”

Accompanying the course is a 100-page workbook that, when completed, forms a customized reference guide and plan of action for implementing affirmative testing in your particular classroom or school. You can start the course at any time, and work through it at your own pace.

To request an invitation to the course, email annie@anniemurphypaul.com.

About Annie

Annie Murphy Paul is a journalist, author, consultant and speaker. A frequent contributor to The New York Times, Time magazine, and Slate, Paul is the author of two previous books, The Cult of Personality and Origins, a New York Times Notable Book. Her next book, titled Brilliant: The Science of How We Get Smarter, will be published by Crown in 2017. She is the founder of BrilliantEd LLC, a training and consulting company, and speaks frequently at schools and companies around the country; her TED Talk has been viewed more than 1.3 million times. She is also a lecturer at Yale University, where she teaches the craft of writing and the science of intelligence. Paul is a recipient of the Rosalynn Carter Mental Health Journalism Fellowship, the Spencer Education Reporting Fellowship, and the Bernard L. Schwartz Fellowship at the New America Foundation; she is currently a Future Tense fellow at New America. A graduate of Yale University and the Columbia University Graduate School of Journalism, she lives in New Haven, Connecticut, with her husband and children.
