Tomorrow, Vanderbilt University's National Center on Performance Incentives will publish the Project on Incentives in Teaching (POINT) study, reporting the results of a major three-year teacher pay experiment in the Metropolitan Nashville Public Schools. The study examines the effect of merit pay for middle school math teachers who were eligible for bonuses of up to $15,000 per year based on student test score gains.
The study will, unfortunately, tell us nothing of value.
Actually, it's worse than that. The study will confuse the issue, obscure the actual question of interest, and (depending on the results) lend either simple-minded advocates or performance-pay skeptics a cudgel that they will henceforth freely misuse in the name of "evidence." Either proponents will start asserting that crude rat-teaches-harder-for-food-pellet pay systems are now "evidence-based," or skeptics will argue that we've seen proof that performance pay "doesn't work."
In either event, I fear the study will make it more difficult to pursue sensible reform of teacher pay. This is not because the study was inexpertly executed; it's because the fascination with "gold standard" randomized field trials is not matched by any awareness of what they can do well, how to use them thoughtfully, or how to recognize their limitations.
There are two schools of thought when making the case for performance pay.
The first is the Skinnerian conviction that paying people to raise test scores will lead them to work "harder." This presupposes that the kinds of cash-for-sales bonuses used to incent encyclopedia salesmen in the 1950s are a way to improve teacher performance. If the POINT study shows a big test bounce, it won't strengthen my faith in performance pay one iota. After all, positive results may well simply reflect that the math teachers in question shifted their time and energy from other tasks to tested material. And, should the results be negative, it may simply reflect that teachers don't respond to cash bonuses like rats do to food pellets. That wouldn't diminish my confidence that it's good for schooling if teacher pay better reflects teacher contributions.
The second school of thought, and the one that interests serious people, is the proposition that rethinking teacher pay can help us reshape the profession to make it more attractive to talented candidates, more adept at using specialization, more rewarding for accomplished professionals, and a better fit for the twenty-first century labor force. And, whether or not bonuses linked to test scores had any effect on measured achievement tells me absolutely nothing on this score.
Whether the merit pay experiment shows big test jumps or none at all, it won't tell us a damn thing about the ability of performance pay to attract new talent to teaching, undergird efforts to promote professionalism, retain talent, or boost regard for the profession--much less how to craft systems that will do any of this. This is not because I have any problems with the talented researchers who led the study or with the study design; it's because this kind of randomized field trial is horribly suited for understanding what we care about when it comes to performance pay. What frustrates me is seeing increasingly sophisticated and thoughtful research designs employed to address the wrong question simply because it lends itself to clean evaluations.
"The notion that education ought to hold science in the same high regard as do medicine and engineering would seem axiomatic. In principle, IES's mission to transform education 'into an evidence-based field'...is entirely to the good... However, [there] lurks the risk that the pendulum will swing too far, that the lure of 'scientifically based research' will cause certain methods of study--especially randomized field trials--to be demanded even when ill-suited for the issue at hand...
It is vital to recognize that there are really two kinds of 'reforms' in medicine or education--and that the proper role of science and scientifically based research is very different from one to the other. One kind of reform relates to specialized knowledge of how the mind or body works, and the other relates to the manner in which we design and operate organizations, governments, and social institutions.
In education, the former category deals with the science of learning and with behaviors and programs that induce it. Such measures include pedagogical and curricular practices and interventions that relate to the development, knowledge, skills, and mastery of individual students. Relevant approaches would include methods of literacy instruction, bilingual education, sequencing mathematical subjects, and so on. Each of these entails the application of discrete treatments to identifiable subjects under specified conditions in order to achieve specific ends. Such interventions are readily susceptible to field trials, and findings on effectiveness can reasonably be extrapolated to other populations...
The second category of reform entails governance, management, or policy innovations intended to improve organizational effectiveness. It includes such innovations as permitting mayors to appoint school boards,...paying employees based on performance, and so on. None of these changes is unique to education. They draw upon a mass of experience gained in other sectors... Since the results of these structural reforms will be contingent on the context and manner in which they are implemented, even well-designed studies will find it problematic to draw lessons from isolated experiments that trump our broader body of knowledge... [W]hatever the results of small-scale experiments with merit pay or educational competition, this existing body of knowledge ought to weigh more heavily than the results of one or another context-specific study."
More recently, as I wrote in 2008 in Educational Leadership, it's imperative that policymakers and practitioners avoid the temptations proffered by new evidence and take care to be thoughtful when seizing new research and data. I argued:
"Educators should be wary of allowing data or research to substitute for good judgment. When presented with persuasive findings or promising new programs, it is still vital to ask the simple questions: What are the presumed benefits of adopting this program or reform? What are the costs? How confident are we that the promised results are replicable? What contextual factors might complicate projections? Data-driven decision making does not simply require good data; it also requires good decisions."
I also noted:
"We must understand the limitations of research as well as its uses. Especially when crafting policy, we should not expect research to dictate outcomes but should instead ensure that decisions are informed by the facts and insights that science can provide. Researchers can upend conventional wisdom, examine design features, and help gauge the effect of proposed measures. But education leaders should not expect research to ultimately resolve thorny policy disputes over school choice or teacher pay any more than medical research has ended contentious debates over health insurance or tort reform."
As regular readers know, I've long championed the radical proposition that good educators deserve to be paid more than bad educators. Crazy stuff, I know. At the same time, I'm not comfortable distinguishing teacher quality simply on the basis of reading and math test scores, especially given the crude state of even today's most sophisticated value-added systems. Tomorrow's results, positive or negative, will move my stance on this not a whit. This expensive and meticulous project, for all the exertions of the talented investigators, was essentially an effort full of sound and fury, signifying nothing--because the study didn't address the questions that matter. More to the point, there's a huge likelihood that opportunists, the gullible, and those in the throes of "the new stupid" are going to misread or misuse the findings.
Could a randomized field trial be designed that would address the questions we really care about? In theory, sure. But it would start by identifying a couple thousand high school students, follow them for fifteen or twenty years, and study whether alterations to the compensation structure of teaching impacted who entered teaching, how they fared, and how it changed their career trajectory. Problem is that such research wouldn't tell us what to do today, wouldn't generate much in the way of findings until the 2020s, and isn't likely to get funded anytime soon. So, we'll keep studying the wrong things and misapplying the wrong lessons. Swell.