Guest post by John Thompson.
The Measures of Effective Teaching Project (MET) is the Gates Foundation's flagship effort to fill what they believe is a huge void in the teaching profession. According to them, up until this project, there was no way to know how effective any given teacher is. Their goal has been to develop scientifically accurate means to accomplish this.
I would have no problem with the Gates Foundation's Measures of Effective Teaching process if it were conducted as pure research. The MET's Tom Kane, in "Capturing the Dimensions of Effective Teaching," illustrates the good that could have come from the experiment had "reformers" considered evidence before imposing their theories on teachers across the nation.
The MET is a $45 million component of the "teacher quality" movement which studies test scores, teacher observations, and student survey data to isolate the elements of effective teaching. That's great. But the MET's assumptions about the outcomes they anticipated have been the basis for Arne Duncan's test-driven policies -- which require test scores to be a "significant part" of teacher evaluations in order for states to receive waivers for NCLB. Then, as evidence was gathered, preliminary reports noted problems with using test score growth for evaluations. The MET has continued to affirm the need for value-added (VAM) as a necessary component of their unified system of using improved instruction to drive reform, even as it reported disappointing findings.
If the MET had been seen as basic research, as opposed to a rushed set of mandates (that have already been enacted into laws), Kane's assumptions could have been phrased more precisely. He could have deleted the word "the" and issued the then-accurate statement that one "goal of classroom observations is to help teachers improve practice, and thereby improve students' outcomes." Kane could have then acknowledged that, in the real world, evaluations are also driven by ego, power, vindictiveness, and the full range of human emotions. An academic study, being an academic study, could assume that evaluations would only be used for righteous purposes. Actual policy should never have been built on such a naïve proposition.
Had the Duncan Administration not jumped the gun and forced districts to attach high stakes to not-ready-for-prime-time metrics, Kane could have written, "the shallowness of the items on the test does not necessarily translate into shallow teaching," but we know that it often (or usually) does. He could have then reported:
Our (MET) results did raise concerns about current state tests in English language arts. ... Current state ELA assessments overwhelmingly consist of short reading passages, followed by multiple-choice questions that probe reading comprehension. Teachers' average student-achievement gains based on such tests are more volatile from year to year (which translates to lower reliability) and are only weakly related to other measures, such as classroom observations and student surveys.
It would have been easier to deal with the finding that "state ELA assessments are less reliable and less related to other measures of practice than state math assessments" if districts were not already using those flawed results to sanction ELA teachers and schools. Similarly, if the MET were pure research, there would be nothing wrong with waiting until the last year of the project before reporting on the results of the 9th grade value-added experiment. On the contrary, if "reformers" had not leaped before they looked at the evidence, MET scholars would have been free to warn against the dangers of using value-added for high schools or for schools with large populations of students who are unable to read for comprehension.
It is especially hard to understand why the Gates Foundation's opinions about "teacher quality" have already been imposed on urban schools. Three years after the Gates Foundation's theories became law in many states, the MET will issue a final report on "the most vexing question we face [which] is whether or not our results were biased by the exclusion of important student characteristics from the value-added models." The MET sample of students was only 56% low income, with 8% being on special education IEPs, so it is even harder to see how it could provide evidence relevant to schools serving intense concentrations of extreme poverty. Moreover, it seems that the MET's economists have overlooked the likelihood that value-added will drive the top teaching talent out of the schools where it is harder to meet test score growth targets.
Finally, if the MET were pure research, its faith in "multiple measures" would have been appropriate. Under research conditions, multiple measures can ensure greater accuracy. But, in actual systems, they have an equal potential for becoming multiple hoops to be jumped through. The MET merely estimated the possible upside of investing in teacher evaluations conducted under no-stakes conditions. It ignored the predictable downsides, which are especially dangerous in an era of data-driven "accountability."
Kane began his article with a metaphor: "When the world is in danger," according to his six-year-old son, "it's time to summon the superheroes to save the day." It seems that the Gates Foundation assumed the need for a Super Model to save the day. It sought a single system of multiple measures coordinated and designed to drive accountability, professional development, and instruction.
Kane seems never to have considered the possibility that each of those functions might be better improved singly and incrementally, as opposed to being a part of a coordinated and transformative system of multiple measures. In the quest to divine the best of all possible Super Models, it apparently never occurred to the MET's researchers that they could also be producing a "perfect storm," where their preferred accountability regime contaminates other methods of improving teacher quality.
When district leaders push value-added models that are systematically unfair to neighborhood schools, it is likely that principals will follow their lead and remain insensitive to the effects of generational poverty. When the central office, distant from the classroom, assumes that good instruction in the toughest schools looks identical to instruction in effective schools, evaluators will be pressured to conform to those preconceptions. When teachers in the schools with the least social capital fail to meet their targets, they will be subjected to even more worksheet-driven test prep (which the Gates research has shown to be ineffective).
Had the Gates Foundation seen the MET as basic research, perhaps it would have considered a modest proposal. Perhaps the best goal for evaluations is evaluating what people actually do. If teachers were not doing their jobs, they should be dismissed. If principals (as was obviously true) lacked the time and skills to evaluate instruction, the logical response would have been to address those problems. But, if evaluators were not up to the simpler task of removing teachers whose observable behavior was wanting, why place them in charge of a multidimensional Super Model, utilizing a variety of proxies for student achievement, to achieve multiple goals, in a system that had yet to be studied, much less designed?
What do you think? Why did reformers push to change teacher evaluation laws before evidence was gathered? Why did they not create a methodology to test whether value-added could be made valid for schools with intense concentrations of trauma and poverty? Did they not believe they carried the burden of proving that their theories would do more good than harm?
John Thompson was an award-winning historian, with a doctorate from Rutgers, and a legislative lobbyist when crack and gangs hit his neighborhood, and he became an inner-city teacher. He blogs for This Week in Education, the Huffington Post, and other sites. After 18 years in the classroom, he is writing his book, Getting Schooled: Battles Inside and Outside the Urban Classroom.