
Has the Research on Formative Assessment Been Oversold?


Over the last decade, the teacher practice of using "formative assessments" has become a huge topic of interest.

Though called assessments, in practice they're more like exercises teachers use to gather immediate feedback on whether a student is responding to an instructional technique, with reference to a particular curricular objective.

Proponents say the practice has a strong research base showing it can dramatically improve student achievement. (And now that testing companies are labeling a lot of products as "formative," it's a big moneymaking endeavor, too.)

But recently, some experts have suggested that it may be time to take a closer look at the practice and its research base. At an event held by the Educational Testing Service company earlier this month, ETS Distinguished Presidential Appointee Randy Bennett walked attendees through the research literature.

And, as in the children's game of telephone, something seems to have been lost in the retelling.

In 1998, two researchers from King's College London, Dylan Wiliam and Paul Black, published an article in the journal Assessment in Education based on a review of hundreds of studies on formative assessment. In the article, they noted that the studies were too diverse to be meaningfully summarized through a meta-analysis (a wonky term for a scientific research synthesis) into a single effect-size statistic. In fact, they noted that only a handful of the studies were rigorous quantitative ones.

But they went on to publish a second article, in Phi Delta Kappan, claiming that the positive effect sizes of formative assessment ranged from 0.4 to 0.7 across some 40 quantitative studies, a medium-to-large gain.
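For readers unfamiliar with the statistic: an effect size of this kind is typically a standardized mean difference, the gap between a treatment group's average and a control group's average, divided by a pooled standard deviation. The sketch below uses Cohen's d, one common definition; the numbers are hypothetical for illustration only, not drawn from the studies in question.

```python
import math

def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
        / (n_treat + n_ctrl - 2)
    )
    return (mean_treat - mean_ctrl) / pooled_sd

# Hypothetical study: a test scored on a 100-point scale with an SD of 15.
# A 6-point average gain then corresponds to an effect size of 0.4.
d = cohens_d(mean_treat=76.0, mean_ctrl=70.0,
             sd_treat=15.0, sd_ctrl=15.0,
             n_treat=100, n_ctrl=100)
print(round(d, 2))  # 0.4
```

On this scale, 0.4 to 0.7 is indeed a medium-to-large gain by conventional benchmarks, which is part of why the claim attracted so much attention.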

Subsequently, some of the biggest experts in the testing industry have cited this article to support the practice of formative assessment. But they haven't gotten the details of the review right. They have called the review a meta-analysis. (It wasn't.) The reported effect sizes have crept up to as high as 1.0. And the number of quantitative studies supposedly reviewed has jumped to 250.

And a handful of more recent studies, Mr. Bennett indicated, suffer from selection bias and other methodological issues.

"The research is not as unequivocally supportive of the effects of formative assessment as it is sometimes made to sound," he said.

What does this mean for teachers? In essence, it means formative assessment, though promising, isn't necessarily a silver bullet.

It's an old theme, but there is still a lot more work to be done to make sure that such assessments are valid, well designed, and yield useful results for teachers.


I am not convinced that "the biggest experts in the testing industry" have a firm grip on what Black and Wiliam were describing as "formative assessment." What they are selling to schools is something more like mini summative assessments, given quarterly. There are folks buying these because they look like the state assessments that they cannot figure out how to get around and somehow think that the increased "practice" will be helpful. One might suspect that such systems are made up of rejected state items that the industries have on hand. Easy to do, since they are selling the product without the kinds of rigorous scrutiny that a state testing contract is likely to require.

What Black and Wiliam seem to be describing looks a whole lot more like "backward design" and "performance-based assessment": setting clear goals for student accomplishments and offering just-in-time reteaching opportunities as teachers realize that students don't get it. These things are intensely appropriate for the classroom, but require collaboration and planning time. It's far cheaper for a concerned superintendent to purchase some off-the-shelf system that gives data to upper echelons at regular intervals. More often doesn't make it formative--just more often.

We could use some formative assessment of "formative assessment." Wiliam and Black noted that "the studies were too diverse to be meaningfully summarized through a meta-analysis" (a polite way of saying that the research stinks). Randy Bennett is reported here as confirming that conclusion.

One shouldn't need a "data warehouse" to respond to this information. As Margo/Mom says, feedback mechanisms are needed that illuminate each child's instructional status with respect to a defined capability.

But academic accomplishments that are transparent don't require extrinsic, artificial, bubble-item "assessments."
Methodology for constructing such indicators is available. But the fog generated by present testing practice protects the unaccountables at the top of the EdChain.

A bit of the debate here may stem from varied definitions of what "formative assessment" really is. Some (including the textbook manufacturers mentioned) seem to think it is simply assessing more frequently. True formative assessment includes follow-up instruction. Instruction and assessment should not be viewed as mutually exclusive. A meaningful definition of formative assessment in this context is "a process used by teachers and students as part of instruction that provides feedback to adjust ongoing teaching and learning to improve students' achievement of core content" (from http://eabbey.blogspot.com/2009/05/5-characteristics-of-effective.html). Without looking at the meta-analysis myself, I wonder whether each of these studies involved true formative assessment.

I am confused as to what you're actually saying. Are you suggesting that the research has been co-opted by testing companies, or that the research hasn't stood up to further scrutiny? " 'The research is not as unequivocally supportive of the effects of formative assessment as it is sometimes made to sound,' he said." That's a bit of a cliff-hanger, isn't it? Please, we need a little more detail.

Here's a quick summary of what I said related to effectiveness: The central claim being made by many well-respected educators is that "formative assessment" will improve student learning by large (even "whopping") amounts. However, the evidence for that claim of large learning gains is suspect, to say the least. After reading the original research, it's clear to me that too many people have misinterpreted it, are inaccurately describing it, and are drawing indefensible conclusions from it. My main concern is that the field will be badly disappointed, and that a promising practice will be abandoned, when the typical teacher does not experience the outsize learning gains that are being promised. The main message is twofold: we should moderate our claims and moderate our expectations.


