How, Exactly, Is Research Supposed to Improve Education?
This week Dylan Wiliam, eclectic Wales native and emeritus professor at University College London, takes over the blog. Dylan began his career as a math teacher in London (having followed his "jazz-folk" band to the capital city) before eventually stumbling into academe. His books include Creating the Schools Our Children Need and Leadership for Teacher Learning. Across a varied career, he has taught in urban public schools, directed a large-scale testing program, spent three years as senior research director at the Educational Testing Service (ETS), and served in a number of roles in university administration, including dean of a school of education at King's College London. Dylan, whose advice usually costs a pretty penny, will spend the week offering pro bono thoughts and tips to educators struggling to get school going this fall.
As educators, schools and districts work to overcome the damage to students' education caused by the coronavirus pandemic, it seems obvious that our efforts should be guided by the best evidence we can find about what is likely to be most effective.
The good news is that, over the last 20 years or so, there have been substantial improvements in the way research findings are summarized and made available to practitioners and policymakers. Increasingly, educational researchers have stepped out of their "ivory towers" to tackle issues of immediate and direct relevance. Just as importantly, they have taken seriously the task of communicating their findings to those who might actually use them in real settings.
The bad news is that producing meaningful syntheses of research turns out to be much more difficult in education than it is in, say, medicine. Moreover, too little attention is given to figuring out how even well-established research findings might be implemented in real schools and classrooms.
Let's start with research synthesis. For many years, research reviews typically took a "narrative" approach: researchers read the relevant studies and then figured out the best story to tell. Some reviewers were more systematic, tallying the number of positive and negative results in a particular field, but such an approach treated all experiments as if they were the same size and had the same size of impact. In 1976, Gene V. Glass proposed a technique that he called "meta-analysis," by which the results of different studies could be expressed on a common metric, called "effect size," and thus compared more meaningfully. It is now the standard approach to research synthesis in the health sciences.
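To make the contrast between tallying results and using a common metric concrete, here is a minimal Python sketch. It assumes the effect size is the standardized mean difference with a pooled standard deviation (often called Cohen's d; Glass's original version divided by the control group's standard deviation instead) and that studies are pooled with a simple fixed-effect, inverse-variance weighted average. Real meta-analyses often use more elaborate random-effects models, and the three studies below are hypothetical numbers invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Summary statistics for one hypothetical two-group study."""
    mean_treatment: float
    mean_control: float
    sd_treatment: float
    sd_control: float
    n_treatment: int
    n_control: int


def cohens_d(s: Study) -> float:
    """Standardized mean difference, using the pooled standard deviation."""
    pooled_var = (
        (s.n_treatment - 1) * s.sd_treatment ** 2
        + (s.n_control - 1) * s.sd_control ** 2
    ) / (s.n_treatment + s.n_control - 2)
    return (s.mean_treatment - s.mean_control) / math.sqrt(pooled_var)


def variance_of_d(s: Study, d: float) -> float:
    """Large-sample approximation to the sampling variance of d."""
    n1, n2 = s.n_treatment, s.n_control
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def fixed_effect_pool(studies: list[Study]) -> float:
    """Inverse-variance weighted mean effect size: larger studies count more."""
    weighted = []
    for s in studies:
        d = cohens_d(s)
        weighted.append((1 / variance_of_d(s, d), d))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * d for w, d in weighted) / total_weight


def vote_count(studies: list[Study]) -> tuple[int, int]:
    """The older tally: positive vs. non-positive results, ignoring size."""
    effects = [cohens_d(s) for s in studies]
    positive = sum(1 for d in effects if d > 0)
    return positive, len(effects) - positive


# Hypothetical studies: two small ones with sizable positive effects,
# and one large one with a small negative effect.
studies = [
    Study(55, 50, 10, 10, 20, 20),    # d = 0.5
    Study(56, 50, 10, 10, 15, 15),    # d = 0.6
    Study(49, 50, 10, 10, 500, 500),  # d = -0.1
]

print(vote_count(studies))                     # positive vs. other tally
print(round(fixed_effect_pool(studies), 3))    # weighted average effect size
```

In this made-up example the tally reads two "wins" to one, while the weighted average comes out slightly negative, because the one large study carries far more weight than the two small ones. That is the kind of distinction a simple count of positive and negative results cannot see.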
Unfortunately, as I have explained elsewhere, meta-analysis is much harder to do well in education, for a variety of reasons. Combining different reports on "cooperative learning" might group together studies with very different approaches to cooperative learning; studies with younger students tend to produce larger effect sizes; and different ways of assessing student achievement can produce very different results for the same experiment. Also, because it is much easier to get studies published if the results are dramatic, the published studies tend to be the ones where everything went just right, so they tend to overstate the effects that can be expected in other settings.
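The publication-bias point can be illustrated with a small simulation. This is a minimal sketch under loudly hypothetical assumptions: a true effect of 0.1 standard deviations, many small studies with 30 students per group, and a crude "publication filter" that lets through only studies whose result is positive and statistically significant (a z-score above 1.96, treating the standard deviation as known to be 1). None of these numbers come from real research literature.

```python
import random
import statistics

TRUE_EFFECT = 0.1   # hypothetical true effect, in standard-deviation units
N_PER_GROUP = 30    # hypothetical number of students per group in each study
N_STUDIES = 5000    # number of simulated studies


def run_study(rng: random.Random) -> float:
    """Simulate one two-group study and return the observed mean difference.

    Scores are drawn with standard deviation 1, so the difference in means
    is already on an effect-size scale.
    """
    treatment = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    return statistics.fmean(treatment) - statistics.fmean(control)


def is_published(effect: float) -> bool:
    """Crude filter: only 'significant' positive results get published.

    With known SD 1, the standard error of the difference is sqrt(2 / n).
    """
    standard_error = (2 / N_PER_GROUP) ** 0.5
    return effect / standard_error > 1.96


rng = random.Random(42)  # fixed seed so the run is repeatable
all_effects = [run_study(rng) for _ in range(N_STUDIES)]
published = [e for e in all_effects if is_published(e)]

print(f"mean effect, all studies:       {statistics.fmean(all_effects):.2f}")
print(f"mean effect, published studies: {statistics.fmean(published):.2f}")
print(f"share of studies published:     {len(published) / N_STUDIES:.0%}")
```

Across all simulated studies the average lands close to the assumed true effect, but only the studies that happened to overshoot clear the filter, so the "published" average sits far above it. Real publication decisions are much less mechanical than this filter, so the size of the distortion here is illustrative only.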
These problems are compounded when the results of different meta-analyses are themselves combined, in a process sometimes called "meta-meta-analysis." The number of assumptions made in these analyses makes it impossible to determine what is really going on. The whole thing is reminiscent of the old joke about someone who, after a speed-reading course, said, "I was able to go through War and Peace in 20 minutes. It's about Russia."
Even when studies are conducted and reported carefully, it is not at all obvious that their results will be relevant in different contexts. It may seem obvious that studies conducted on undergraduate students are unlikely to provide meaningful insights about how to teach kindergarten, or that those conducted in urban settings may not generalize to rural settings, but that intuition can easily get lost when trying to digest the large amounts of information presented in a meta-analysis. It is also important to note that even a well-designed randomized controlled trial tells us only about the differences between the control and experimental groups. The schools and districts that chose to participate in the experiment may be different from those that did not.
Class-size-reduction studies are a case in point. Any class-size-reduction program requires additional teachers, so the quality of those teachers is a crucial determinant of the program's success. Probably the best-known such study, the Tennessee STAR study, required recruiting an extra 50 teachers, and it is reasonable to suppose that these were as good as those already employed. However, when a program requires hundreds or thousands of extra teachers, it is not at all clear that the additional teachers will be as good as those already working in the schools. Worse, the quality of available teachers is likely to vary from district to district. A class-size-reduction program may increase student achievement in one district yet reduce it in another, because of the difficulty of recruiting good teachers. Similar arguments apply to multitiered systems of support. More intensive instruction in smaller groups is likely to be effective when those teaching the smaller groups are as effective as those teaching the class from which the students were withdrawn. But if those teaching the smaller groups are less effective, then a multitiered system may actually reduce student achievement.
The important point here is that those "on the ground"—the administrators in that district—will know far more about teacher recruitment than those in a state department of education. They have to look at whether the research solves a problem that the district has, how much extra student achievement will be generated, how much it will cost (in money and in educator time), and whether it is possible to implement the reform in their own setting. Research is essential in helping districts avoid things that are unlikely to benefit students, like catering to students' preferred learning styles, and can identify some "best bets" for schools and districts, but research can never provide step-by-step instructions on how to improve student outcomes.
To simplify somewhat, everything works somewhere, and nothing works everywhere. The important question is, "Under what circumstances might this work?" That is something that those "on the ground" are best placed to determine.
Educators cannot afford to ignore research evidence, because there are just too many "blind alleys" that look attractive but are unlikely to help students. But they have to choose judiciously. Some interventions may have small effects, but if they do not take up too much time, they can be highly cost-effective. Others may have larger effects but take considerable time and energy to implement, and, crucially, what works best for one district may not work for the next. Now, in the midst of unprecedented educational challenges, it is more imperative than ever that district leaders become "critical consumers" of research.