On Student Achievement and Teacher Evaluations
We're evidently headed for a lot of wrangling on this topic, given the focus on student-teacher data in the proposed Race to the Top criteria. So once again, Teacher Beat provides you with a cheat sheet to help you make sense of it.
First off, we must start by assuming, as the federal government does, that it is appropriate to consider student achievement at least to some degree in evaluating teachers. (I fully realize there are people and groups out there who vociferously disagree. If you are one of them, I invite you to leave a comment below to tell us all why, but this would be a short blog item if we didn't start from that assumption.)
Next, how do we define student achievement? This is the place where things really start to get dicey, because most of the annual testing is done in math and language arts. But only perhaps a third of teachers explicitly teach those subjects. So how do we get estimates about student performance in non-tested grades and subjects?
The National Council on Teacher Quality, in this report on Colorado's bid for the Race to the Top funding, elaborates on a few interesting alternatives. It suggests randomly sampling student work, as long as these samples are reviewed independently and audited centrally to ensure consistency.
As for test scores, probably the most promising option is to use "value-added" models that track growth over time rather than absolute proficiency levels, so that teachers aren't penalized off the bat for having poor-performing students.
Now, we've all heard that value-added estimates of teacher performance are problematic. The estimates of a teacher's effectiveness can vary from one year to the next. Sometimes tests aren't appropriately scaled to give good estimates. And the models are typically better at identifying outliers (very good or very weak teachers) than at making finely graded distinctions in the middle.
Still, there is a possibility of reducing error here by focusing only on the top and bottom teachers and comparing results over time (i.e., if you are a bottom-quartile teacher for three consecutive years, something's wrong).
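For the data-minded, here is a minimal sketch of that "three consecutive years" idea. Everything in it is an assumption made for illustration: the teacher IDs, the scores, and the simple rank-based definition of "bottom quartile" are hypothetical, and a real value-added system would be far more involved.

```python
def bottom_quartile(scores):
    """Return the IDs of the teachers in the lowest quarter of one year's scores.

    `scores` maps a teacher ID to a value-added estimate (hypothetical units).
    The quartile is rank-based: the lowest len(scores) // 4 teachers.
    """
    ranked = sorted(scores, key=scores.get)  # lowest score first
    cutoff = len(ranked) // 4
    return set(ranked[:cutoff])


def flagged_every_year(scores_by_year):
    """Return the teachers who land in the bottom quartile in every year given."""
    yearly = [bottom_quartile(scores) for scores in scores_by_year]
    return set.intersection(*yearly) if yearly else set()


# Made-up scores for eight teachers over three consecutive years.
year1 = {"A": -0.9, "B": -0.7, "C": 0.1, "D": 0.2, "E": 0.3, "F": 0.4, "G": 0.5, "H": 0.6}
year2 = {"A": -0.8, "B": 0.2, "C": -0.6, "D": 0.1, "E": 0.3, "F": 0.0, "G": 0.4, "H": 0.5}
year3 = {"A": -0.5, "B": -0.4, "C": 0.2, "D": 0.0, "E": 0.1, "F": 0.3, "G": 0.6, "H": 0.4}

flagged = flagged_every_year([year1, year2, year3])
print(flagged)
```

In this made-up data, teachers B and C each dip into the bottom quartile in some years but not all three, so only teacher A is flagged. That is the point of comparing over time: a single noisy year is not enough to land on the list.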
Additionally, such scores could be compared with ratings from trained observers (principals and/or peer teachers) that describe, for instance, whether a teacher effectively engages students in content, makes the purpose of the lesson clear, and uses formative assessment to check that students have mastered concepts.
Finally, we have this important question: Just how reliable should we expect teacher-evaluation systems to be? What margin of error are we willing to accept? Right now, districts lean toward one end, rating nearly all teachers as proficient, even those who are very poor. Clearly we don't want to go the other way, either, and misidentify scores of good teachers as ineffective.
But if we expect a system to be infallible we're probably going to be disappointed. As any good scientist will remind you, measurement comes with error. Are stakeholders, especially teachers and teachers' unions, willing to accept a system that is highly reliable but not perfect? (If 95 percent of judgments are accurate, is that high enough? What if 90 percent are accurate?)
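To make those percentages concrete, here is a back-of-the-envelope sketch. The district size of 1,000 teachers is a hypothetical figure chosen only to keep the arithmetic easy, and the function name is my own.

```python
def expected_misjudged(n_teachers, accuracy):
    """Expected number of teachers judged incorrectly at a given accuracy rate."""
    return round(n_teachers * (1 - accuracy))


# Hypothetical district of 1,000 teachers.
at_95 = expected_misjudged(1000, 0.95)
at_90 = expected_misjudged(1000, 0.90)
print(at_95, at_90)
```

Under that assumption, a 95-percent-accurate system would misjudge about 50 teachers and a 90-percent-accurate one about 100. Whether either number is tolerable is exactly the question.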
Now that I've put all that out there, let's hear your thoughts. Is this doable, or should we all give up and go home?