Aftermath: My Note to the Gates Foundation
A few weeks back, I wrote about a note I'd received from teacher and blogger John Thompson. At a Shanker Institute event in February, I'd told a skeptical audience that the people I know at education foundations are smart, well-intentioned, and entirely willing to hear from critics, so long as criticism is offered up constructively and in a spirit of mutual respect. The problem, I said, was how rarely skeptics reached out in that spirit. I'd said that if critics did so and hit a wall, I'd be willing to see if I could help.
Thompson wrote with some reasonable concerns about the Gates Foundation's Measures of Effective Teaching project and asked if I'd help him connect with the foundation. I wrote an RHSU post that sought to do just that, and was impressed at how promptly and seriously the Gates folks reached out to Thompson. Anyway, Thompson and Gates research honcho Steve Cantrell ultimately had an extended, robust exchange of views. They didn't "solve" anything, but I think this kind of honest, civil disagreement makes it a helluva lot easier to think about finding workable ways forward. So, when they offered to share their takes on the whole deal, I was game. Here's what they had to say.
Rick Hess, in "A Small Request for My Friends at Gates," was kind enough to urge the Gates Foundation to consider three of my research proposals. Their purpose is to better estimate the bias resulting from the inability of value-added models to adequately control for peer effects. If it worked out, I hoped, Rick might invite me to his next encounter with a "UFO," or Unidentified Federal Official, where we try to explain that policy people and teachers who question the current school reform policies are not "wingnuts."
I am pleased to say that Steve Cantrell, the Senior Program Officer for Research and Data at the Gates Foundation, responded and we talked for nearly 1-1/2 hours. I would first like to thank him for his time. During that conversation, I learned that the data from the Gates Foundation's Measures of Effective Teaching (MET) project is now available at the University of Michigan.
Dr. Cantrell and I began with my policy recommendations, and I started sounding like a wonk. Fortunately, we rushed through the academic topics and agreed that what we really need is discussions informed by research.
I explained why the predictable result of value-added evaluations, even when balanced by "multiple measures," would be to drive talent out of the most challenging schools. When a basic research study is x% inaccurate for individuals, that may be a huge success. But who would commit to a teaching career where such a chance PER YEAR could damage or destroy it?
Shifting the conversation to politics, Dr. Cantrell raised the issue of the very real fears of teachers and said that the focus must shift from individual teachers to school level improvement. He cited quiet efforts to encourage states and systems not to misuse test scores.
I responded that I have no doubt that the efforts by the Gates Foundation can encourage better policies. Especially in states and districts that have the confidence born of a history of success, the MET findings can be used properly. For instance, I'm cautiously optimistic about the Tulsa/Gates/Kaiser Foundation collaboration.
The policy issue, however, is how the findings will be used, constructively or destructively. How, I asked, can teachers not oppose reforms, even potentially beneficial ones, before concrete checks and balances against the inevitable misuses are nailed down? Teachers must fight, politically and legally, against evaluations in which the administrators who set policies unilaterally decide whether those policies or the individual teacher is to blame for missing test score growth targets.
Dr. Cantrell listened to my arguments about why data-driven accountability is much more likely to damage fearful districts with a history of failure. Powerless districts, suffering from a culture of compliance, are the systems that need help, but they will react predictably to testing, circle the wagons, and impose primitive worksheet-driven instruction in order to cover their rear ends. That helps explain why reform has benefited some students while damaging others.
What parent, I asked, would agree to an experiment that was likely to benefit one of his children, but injure another?
I wish we could have been talking in person, so I could have read Dr. Cantrell's body language as we discussed whether it was possible for schools to transition to Common Core, as they also begin value-added evaluations. I wish I could have gauged whether Dr. Cantrell agrees with the statement, "No Moratorium, No Common Core," or whether he sees a scenario where a train wreck is avoided.
I was very pleased that we discussed Vergara v. California, which would strike down teachers' due process rights in that state. I said that the MET research is more consistent with the defendants' case than the plaintiffs'. Gates-affiliated witnesses testified only for the plaintiffs, however. This helps explain why teachers often see foundations as pushing for "corporate reform" and ignoring the damage done when it backfires.
Following the telephone conversation with Dr. Cantrell, I reread the final MET report, which he co-authored. I was struck by the precise wording of the question which it asked and answered, "Can measures of effective teaching identify teachers who better help students learn?" However, I still find no evidence in the MET report to support the idea that its measures can identify ineffective teachers without damaging and/or destroying the careers of good teachers, guilty of nothing but committing their careers to schools where it is harder to raise test scores.
I would like a further opportunity to discuss the test-driven accountability that outrages teachers like me, as well as a rapidly growing number of parents. Why do persons who value data keep insisting that stakes must be attached to those metrics? And, I am still waiting for an explanation of why Common Core requires tests worth teaching to. If we really believe in high standards and excellent assessments, why can't we have tests worth teaching with?
My big surprise in the conversation was actually no surprise. Whenever I speak with reformers, I'm always struck by the way we and they use a very few words in slightly different ways, and how extreme and emotional misunderstandings result. Our telephone conversation was a reminder of the need for more personal communications.
In many ways, I'm sending a message similar to the one that Rick Hess articulated in "In Which I Debate Federal Ed Policy with a UFO." He wrote, "The federal government can make people do things, but it can't make them do 'em well. And when it comes to all the stuff we're talking about, how you do it matters infinitely more than whether you do it." The same applies to edu-philanthropy. Of course, we teachers are completely innocent of contributing to any misunderstanding ... Seriously, I hope the dialogue with the Gates Foundation continues. And, once again, I would like to thank both Steve and Rick.
At Rick Hess's prompting, I recently had the pleasure of a 90-minute chat with John Thompson, a thoughtful critic of the many efforts aimed at improving teaching and learning. John has been a vocal opponent of teaching effectiveness measures used for consequential decisions such as tenure or termination. While I disagree, I do believe that the power and promise of such measures extend far beyond teacher accountability. This is supported by evidence that the bulk of the work of improving teaching and learning lies in helping nearly all teachers improve their practice. Teaching effectiveness measures have great potential to provide teachers with feedback as they work to hone their craft and to help school system leaders understand where support for better teaching and learning is needed, whether that support is effective, and, ultimately, how to design a system of supports to get better results.
John is primarily concerned about error. He believes the new evaluation systems are in the hands of administrators (and statisticians) who through intent or incompetence inaccurately judge teachers in ways that negatively impact their careers. John mentioned the need to put safeguards in place before teaching effectiveness measures are used for consequences. I couldn't agree more. But while I certainly don't want to see effective teachers labeled ineffective, it would be a grave mistake to simply abandon teaching effectiveness measures.
The issue isn't whether we can measure effective teaching. The Measures of Effective Teaching (MET) project proved it is possible to measure teaching effectiveness. MET also published a set of feedback and evaluation design principles. School systems can greatly reduce error by using multiple measures in accordance with these design principles.
Nobody wants an appraisal system where teachers are wrongly deemed ineffective--but managing this challenge is relatively straightforward. Every school system needs a mechanism to assess and verify the accuracy of those who observe classrooms and give feedback to teachers. Otherwise, inaccuracy will threaten the quality of the data, undermine trust in the feedback and evaluation system, and return us to an era of perfunctory appraisal systems.
A desire to use evaluation measures to rank teachers is the real problem here, and is why so many teachers are fearful that they will be inaccurately labeled as ineffective. To address this fear, we should ensure that efforts to identify ineffective teachers are not overzealous. It is dangerous to exaggerate ineffectiveness by assuming teachers within the bottom quartile or bottom quintile of performance are ineffective. We found, in the MET project and in subsequent implementations of new evaluation systems, that the real number of truly ineffective teachers hovers around 5 percent. There's a big difference between 25 percent and 5 percent. Teachers are likely to be far less concerned about being misclassified as ineffective when they know the category is reserved for the single lowest performer among every twenty teachers.
Five percent isn't a magic number, and when school systems intervene with teachers who are not serving students well, this number should drop over time. In the MET data, this group consisted of teachers who scored ineffective on all three measures (classroom observation, student assessment, and student perception surveys). One year later, the signal proved both clear and correct: Students taught by the bottom 5 percent fell five school months behind similar students taught by the average teacher. Student learning gains in these classes were remarkably poor, not just below average. Indeed, students in classes taught by the next lowest 5 percent fell just two months behind. Falling two months behind is unfortunate but not irreversible; falling five months behind will likely have a long-term impact on those students' lives.
The most under-discussed, but valuable, aspect of teaching effectiveness measures is their ability to inform system-wide teaching improvement efforts. If a school system is not well equipped to understand the state of teaching practice within its classrooms, then it will have a difficult time planning and implementing strategies for improvement to meet the development needs of the vast majority of its teachers. These teachers are not significantly underperforming, but they still desire meaningful opportunities to improve. When school systems begin to use measures of effective teaching to assess the effectiveness of their own efforts, teachers will understand that the burden for improving teaching does not sit upon their shoulders alone. The Gates Foundation is sponsoring a growing network of districts to do just that--to use performance feedback for teachers as the starting point for improving teacher development. In the end, I believe John shares my deep concern that the great potential of feedback and evaluation systems to improve the quality of teaching will be lost if their sole purpose becomes teacher accountability.