Neither policymakers nor the public understands the complexity of estimating value-added models, so I preferred educating lawmakers and the public about the conditions that would have to be in place for these measures to be used validly, rather than formally nixing the use of test scores. Perhaps that was naive on my part, as Joel Klein wanted to ignore these limitations and move ahead with value-added (see his op-ed above).
But I worried that formally barring test scores from consideration would give union bashers another opportunity to distract attention from the larger problems faced by public education. And now the union piñata match is on. Joe Williams' post stands out for its histrionics. Featuring a mushroom cloud, Williams prognosticates, "When we are all standing at public education's funeral someday in the near future, remember to do a cough-chant of 'murderer' when Dick Ianuzzi or anyone else from NYSUT tries [to] eulogize." Kevin Carey digs deep and pulls out Paris Hilton-worthy dramatics: "It's hard to imagine a more unambiguous declaration of the union's total disregard for student learning when its members' jobs are at stake." Socrates calls the legislators "union-mouthpieces." Joel Klein, in his op-ed, even blames unions for the existence of achievement gaps:
Protecting grownups rather than making sure students can read and do math is how our country has gotten into the educational mess it's in today. It's the reason we have shameful racial achievement gaps separating our white and Asian students from our African-American and Latino students.
That's why there are no achievement gaps in North Carolina and Texas!
Yet none of these guys acknowledges the elephant in the room in New York: tests are given in January. That means that a value-added measure would estimate the effects of teacher pairs, not individual teachers: one teacher teaches students from January to June, and another from September to January. Even if two 4th grade teachers are equally effective, a novice who receives students from a ten-year superstar 3rd grade teacher is going to look better than a novice who receives students from another novice.
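A toy simulation makes the arithmetic of the pairing problem concrete. All of the effect sizes below are invented for illustration; the only assumption carried over from the post is that a January-to-January score gain sums the spring effect of last year's teacher and the fall effect of this year's teacher:

```python
import random

random.seed(0)

# Hypothetical half-year teacher effects, in test-score points (invented numbers).
STAR_3RD_SPRING = 5.0    # superstar 3rd grade teacher, January-June
NOVICE_3RD_SPRING = 1.0  # novice 3rd grade teacher, January-June
NOVICE_4TH_FALL = 1.0    # both 4th grade teachers are equally effective, September-January

def jan_to_jan_gain(spring_effect, fall_effect, n=1000):
    """Mean gain between two January tests: spring half + fall half + noise."""
    gains = [spring_effect + fall_effect + random.gauss(0, 2) for _ in range(n)]
    return sum(gains) / n

# Two equally effective novice 4th grade teachers, credited with the full gain:
va_after_star = jan_to_jan_gain(STAR_3RD_SPRING, NOVICE_4TH_FALL)
va_after_novice = jan_to_jan_gain(NOVICE_3RD_SPRING, NOVICE_4TH_FALL)

print(f"'Value-added' after a superstar 3rd grade teacher: {va_after_star:.1f}")
print(f"'Value-added' after a novice 3rd grade teacher:    {va_after_novice:.1f}")
# The gap between the two estimates reflects the 3rd grade teachers,
# not the (identical) 4th grade teachers being evaluated.
```

The two 4th grade teachers are identical by construction, yet their estimated "value-added" differs by the gap between their feeder teachers.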
If NYC wants to get serious about value-added, tests need to be given in September and June, and these tests need to be designed to measure growth, which NY state's tests are not.
The good news is that principals are actually pretty good at identifying which teachers have high or low value-added, even in the absence of these data, and they can use this insight to inform their tenure decisions. Take a look at this paper by Brian Jacob and Lars Lefgren, based on a study in which the authors estimated value-added models, but also had principals conduct subjective performance evaluations. They found that principals can identify teachers with high and low value-added; for tenure, the goal is to deny tenure to teachers with especially low value-added. Moreover, Jacob and Lefgren found that, "a principal’s overall rating of a teacher is a substantially better predictor of future parent requests for that teacher than either the teacher’s experience, education and current compensation or the teacher’s value-added achievement measure." They concluded:
To the extent that the most important staffing decisions involve sanctioning incompetent teachers and/or rewarding the best teachers, a principal-based system may also produce achievement outcomes roughly comparable to a test-based accountability system. In addition, increasing a principal’s ability to sanction and reward teachers would likely improve educational outcomes valued by parents but not readily captured by standardized tests.
See below the fold for more wonky stuff on the testing calendar.
Is there any way to accurately estimate teacher effects given the current testing calendar? If 3rd grade students were randomly assigned to 4th grade teachers in the same school, each 4th grade teacher would get an even share of experienced and inexperienced 3rd grade teachers' former students, and the 4th grade teachers could be compared with one another. Of course, we know that principals do not randomly assign students to teachers, so we have a problem.
But what if we wanted to compare teachers across schools, as Joel Klein would like to do? If teacher quality varies across schools (which it does), two identical 4th grade teachers - one at a school with experienced 3rd grade teachers, the other at a school with novice 3rd grade teachers (to simplify, I'm using experience as a proxy for teacher effects) - will have very different value-added estimates, and this problem cannot be solved by randomly assigning 3rd grade students to 4th grade teachers.
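The cross-school version of the problem can be sketched the same way, again with invented effect sizes. Within-school randomization is irrelevant here: every 4th grade classroom in a school inherits that school's 3rd grade feeder effect no matter how students are shuffled among classrooms, so two identical 4th grade teachers at different schools still come out looking different:

```python
import random

random.seed(1)

# Hypothetical half-year effects, in test-score points (invented numbers).
SCHOOL_A_3RD_SPRING = 4.0  # school A has experienced 3rd grade teachers
SCHOOL_B_3RD_SPRING = 1.0  # school B has novice 3rd grade teachers
SAME_4TH_FALL = 2.0        # the 4th grade teacher effect is identical at both schools

def estimated_va(feeder_spring_effect, own_fall_effect, n=1000):
    """January-to-January mean gain, all of it credited to the 4th grade teacher."""
    gains = [feeder_spring_effect + own_fall_effect + random.gauss(0, 2)
             for _ in range(n)]
    return sum(gains) / n

# Randomizing 3rd graders across 4th grade classrooms *within* each school
# changes nothing: every classroom still inherits its own school's feeder effect.
va_school_A = estimated_va(SCHOOL_A_3RD_SPRING, SAME_4TH_FALL)
va_school_B = estimated_va(SCHOOL_B_3RD_SPRING, SAME_4TH_FALL)

print(f"Identical teacher at school A looks like: {va_school_A:.1f}")
print(f"Identical teacher at school B looks like: {va_school_B:.1f}")
```

The gap between the two estimates is entirely the difference between the schools' 3rd grade staffs, which is exactly what makes cross-school comparisons under this testing calendar untrustworthy.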