Guest post by John Thompson.
As more and more schools implement various forms of value-added model (VAM) evaluation systems, we are learning some disturbing things about just how unreliable these methods can be.
Education Week's Stephen Sawchuk, in "'Value-Added' Measures at Secondary Level Questioned," explains that value-added statistical modeling was once limited to analyzing large sets of data. These statistical models projected students' test score growth, based on their past performance, and thus estimated a growth target. But now 30 states require teacher evaluations to use student performance, and that has expanded the use of such algorithms for high-stakes purposes. Value-added estimates are now being applied to secondary schools, even though the vast majority of research on their use has been limited to elementary schools.
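To make the mechanics concrete, here is a deliberately simplified sketch of the value-added idea, with made-up numbers. Real state models add demographic controls, multiple prior years of scores, and statistical shrinkage, so treat this purely as illustration: project each student's score from prior performance, then credit the teacher with the average gap between her students' actual and projected scores.

```python
# Simplified value-added sketch (not any state's actual model).
# All data below are hypothetical.

def fit_growth_model(prior, current):
    """Least-squares fit of current = a + b * prior across all students."""
    n = len(prior)
    mean_x = sum(prior) / n
    mean_y = sum(current) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(prior, current))
    var = sum((x - mean_x) ** 2 for x in prior)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

def value_added(a, b, prior, current):
    """Teacher's score: mean gap between actual and projected scores."""
    residuals = [y - (a + b * x) for x, y in zip(prior, current)]
    return sum(residuals) / len(residuals)

# District-wide prior-year and current-year scores (hypothetical).
prior = [50, 60, 70, 80, 55, 65, 75, 85]
current = [55, 63, 74, 82, 52, 60, 78, 90]
a, b = fit_growth_model(prior, current)

# One teacher's classroom: her students beat their projections on
# average, so her value-added estimate comes out positive.
va = value_added(a, b, [55, 65, 75], [60, 70, 82])
print(round(va, 2))
```

The high-stakes question the studies below raise is whether that residual really isolates the teacher's contribution, or whether it also absorbs everything the projection left out.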
Sawchuk reports on two major studies that should slow this rush to evaluate all teachers with experimental models. This month, Douglas Harris will be presenting "Bias of Public Sector Worker Performance Monitoring." It is based on six years of Florida middle school data covering 1.3 million math students.
Harris divides classes into three types: remedial, midlevel, and advanced. After controlling for tracking, he finds that between 30 and 70 percent of teachers would be placed in the wrong category by normative value-added models. Moreover, Harris discovers that teachers who taught more remedial classes tended to have lower value-added scores than teachers who taught mainly higher-level classes. "That phenomenon was not due to the best teachers' disproportionately teaching the more-rigorous classes, as is often asserted. Instead, the paper shows, even those teachers who taught courses at more than one level of rigor did better when their performance teaching the upper-level classes was compared against that from the lower-level classes."
Harris's takeaway is that applying value-added to middle school will probably lead to "pretty large errors." He also said that it isn't yet clear whether adding controls could account for all the ways tracking could introduce bias.
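Harris's tracking concern can be illustrated with a toy simulation (my own sketch, not his model): give the same teacher two sections with identical true effectiveness, but let an unmodeled track effect depress measured growth in the remedial section. A value-added model that ignores course level then scores the two sections very differently.

```python
import random

random.seed(1)

# Toy illustration of tracking bias. The true teacher effect is
# identical (zero) in both sections; only an unmodeled track effect
# differs. All parameters are invented for illustration.

def simulate_section(track_effect, n=500):
    gains = []
    for _ in range(n):
        prior = random.gauss(70, 10)
        # Actual growth: the projected gain, plus a track effect the
        # model never sees, plus student-level noise.
        current = prior + 5 + track_effect + random.gauss(0, 5)
        # The district's model projects a flat 5-point gain for
        # everyone, so the residual is the "value-added" signal.
        gains.append(current - (prior + 5))
    return sum(gains) / n

remedial_va = simulate_section(track_effect=-3)  # e.g. unmeasured peer effects
advanced_va = simulate_section(track_effect=+3)

# Same teacher, same true effectiveness -- yet the remedial section's
# value-added estimate comes out clearly lower.
print(round(remedial_va, 1), round(advanced_va, 1))
```

This is only one mechanism by which tracking can bias the estimates, which is consistent with Harris's caution that it is not clear added controls could account for all of them.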
C. Kirabo Jackson's "Teacher Quality at the High School Level" is also based on a huge database, one covering five years of North Carolina data. Similarly, Jackson questions whether value-added can become reliable for teacher evaluations. He concludes that high school models "have about 14 percent of the out-of-sample predictive power" of elementary school models.
Jackson finds that value-added was not able to distinguish between the top and bottom teachers in Algebra I and English I. Moreover, "the estimates were generally poor at predicting how successive classes of students taught by those same teachers would do."
Jackson also questioned the benefits of value-added evaluation: "We know there are other ways in which we could be spending our energy to improve student outcomes," he said. "My takeaway is that this is not it."
It is one thing to fire elementary school teachers using value-added models, but what were reformers thinking when they applied them to secondary teachers?! Seriously, even if an experimental system may be less unreliable for some teachers, that is not an argument for adopting it.
What do you think? Even if "reformers" thought that value-added would prove to be valid for elementary schools, did they not anticipate the additional problems that sorting would create for secondary schools? What other profession is held in such low esteem by some that outsiders would impose such an untested high-stakes system? As the evidence comes in, can we expect the accountability hawks to rethink their fire now, ask questions later approach to teaching?
John Thompson was an award-winning historian, with a doctorate from Rutgers, and a legislative lobbyist when crack and gangs hit his neighborhood, and he became an inner-city teacher. He blogs for This Week in Education, the Huffington Post and other sites. After 18 years in the classroom, he is writing his book, Getting Schooled: Battles Inside and Outside the Urban Classroom.