More Signs of the Apocalypse!
Here's my take on the New York tenure law discussion going on around the blogs:
1) The backdoor process was unsavory, and now threatens to displace an important discussion about the limits of value-added measures in New York. Sherman Dorn offers some fertile thoughts on the process issue. Also worth noting that last week's outragists were hardly outraged about the secrecy surrounding NYC's teacher experiment.
2) Critics would do well to separate the likely effects of this law from their unhappiness with the process. Consider Robert Gordon's post, which interprets the law's effects as follows:
This means that in deciding whether to give a teacher a presumptive right to teach for 30+ years, a principal may not consider evidence of whether the teacher is helping students learn. The principal can consider whether the teacher maintains neat bulletin boards, whether the teacher attends meetings on how to pay for pencils, and whether the teacher is sufficiently deferential in the hallway. But the principal may not consider, based on achievement data, whether children are learning.
Do classroom observations provide no "evidence of whether the teacher is helping students learn?" Value-added measures, after all, are simply a proxy for student learning, and observations also provide proxy data on student learning. Gordon assumes that principals cannot identify teachers with especially low value-added in the absence of test score data. But if value-added measures mean anything, very low performers should be getting poor subjective evaluations too. It turns out that principals are actually pretty good at identifying teachers with low value-added based on subjective evaluations (see this post). If a teacher is a consistent low performer, the three admissible forms of evidence in tenure decisions - 1) observations, 2) peer review, and 3) an evaluation of how teachers use data to inform instruction - already provide lots of information about how teachers affect student learning.
3) To my knowledge, no one has provided a viable technical solution to the middle of the year testing issue. Given existing problems with value-added and the added complication of midyear testing dates, it would be wildly irresponsible to put these measures into place in NY without further study.
If you want new reasons (not related to testing dates) to sweat about the fallibility of value-added, check out this paper, which was presented last weekend at AEFA by Tim Sass (in collaboration with RAND's J.R. Lockwood and Dan McCaffrey). They looked at the year-to-year stability of value-added estimates in Florida, and found that it's often the case that teachers who are in the bottom 20% of value-added estimates in one year are not in the bottom 20% the next year. In Broward County, only 41.4% of teachers who were in the bottom 20% in one year were in the bottom 20% the next year, too. In Orange County, only 31.7% of the teachers who were in the bottom 20% in one year were also there the next year!
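To get a feel for how measurement noise alone can produce this kind of churn, here is a toy simulation. It is not the method used in the Sass, Lockwood, and McCaffrey paper; it is only a sketch under assumptions I am making up for illustration: each teacher has a fixed "true" effect drawn from a standard normal distribution, and each year's estimate is that true effect plus independent normal noise. The function name `bottom_quintile_persistence` and the parameter `noise_sd` are hypothetical.

```python
import random


def bottom_quintile_persistence(n_teachers, noise_sd, seed=0):
    """Share of year-1 bottom-20% teachers who are also bottom-20% in year 2.

    Assumption (illustrative only): true teacher effects never change, so
    any movement out of the bottom quintile is caused purely by noise.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]

    # Each year's estimate is the true effect plus fresh, independent noise.
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]

    k = n_teachers // 5  # size of the bottom quintile

    def bottom(scores):
        # Indices of the k teachers with the lowest estimates.
        order = sorted(range(n_teachers), key=lambda i: scores[i])
        return set(order[:k])

    return len(bottom(year1) & bottom(year2)) / k


if __name__ == "__main__":
    for sd in (0.0, 0.5, 1.0, 2.0):
        share = bottom_quintile_persistence(5000, sd)
        print(f"noise_sd={sd}: {share:.1%} stay in the bottom 20%")
```

With no noise, everyone in the bottom quintile stays there; as `noise_sd` grows, persistence falls toward the 20% you would expect from pure chance. The simulation says nothing about how noisy real value-added estimates are. It only shows that low persistence is what noisy estimates look like even when teachers themselves do not change.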
Update: Robert Gordon cherrypicks a finding from the Jacob and Lefgren paper to make his point. Perhaps if he'd read beyond the abstract and looked at the magnitude of the value-added advantage over principal ratings in predicting future student achievement (a whopping .036 SD in reading and .074 SD in math), he would realize that all is not lost. And again, this minuscule value-added advantage is coming from the middle of the distribution, not the top and bottom - and the bottom is the relevant issue in tenure decisions. From the same paper:
While value-added measures of teacher effectiveness generally do a better job at predicting future student achievement than principal ratings, the two measures do about equally well in identifying the best and worst teachers. With regard to parent satisfaction, we find that a principal’s overall rating of a teacher is a substantially better predictor of future parent requests for that teacher than either the teacher’s experience, education and current compensation or the teacher’s value-added achievement measure.
Moreover, what kind of predictive advantage can we expect inaccurate/noisy value-added estimates to have over principals' evaluations?


Comments
I look forward to some ridiculous cases in which the courts are asked to interpret the meaning of the phrase "student performance data." If the New York state legislature wants to preclude administrators making tenure decisions based on standardized test scores, they should say so explicitly. I recently attended an event in which student artwork was on display, and a music teacher directed a small ensemble of children performing. That artwork and the students' performances are student performance data, and I'd like to think that their teachers had something to do with the quality of the work. The parents certainly thought so.
(A brief pause here for Andy Rotherham, Charlie Barone and their minions to say, "Aha! No evidence that NCLB chased art and music out of that school!")
Posted by: skoolboy | April 15, 2008 5:38 AM
SB -
Thanks for saying it for us!
Welcome to the minions.
I think you're getting the idea.
--- Charlie
Posted by: Charles Barone | April 15, 2008 9:37 AM
Point #2 is hard to take seriously, given the ferocity with which the anti-accountability crowd protests any time someone wants to give a principal any sort of evaluative or, God-forbid, hiring/firing responsibility. Principals, we are to believe, are all incompetent and capricious demagogues who will wield any smidgen of power with the malevolence of a robber baron. Now the story changes: who needs objective measures of accountability when we have teacher observations?
The smugness of the anything-but-accountability folks as they have their anti-testing cake and wolf it down is hardly bearable when the stakes are so high. To pretend that everything is hunky dory in unionized teacher land as long as the principal is doing the evals is disingenuous at best.
Posted by: Socrates | April 15, 2008 6:30 PM
Socrates, I don't see this as an "anything but accountability" argument. My point in #2 is simply that principals/APs are not operating in the dark, as some pols/bloggers would have us believe.
As is clear from point 1, I'm no more happy with the process than you are. But I don't think you're acknowledging the difficulty of producing these "objective measures" even when we have September-June tests, and NY's testing schedule makes these issues even more complicated.
Posted by: eduwonkette | April 15, 2008 6:46 PM
Yeah, I'm sorry about that - my comments were more directed at the typical opponents of accountability than they were at you. You accurately state the limits of testing, but the majority of those in the blogosphere who support point #2 don't address said limitations with much nuance. For them (and for all the knee-jerk anti-NCLB-ites), testing is bad, period. And similarly, principal evaluations are bad, period. Thus, the argument goes, all accountability is suspect and should be completely abolished in the name of protecting teachers.
I agree with you that the mid-year testing dates are preposterous, and that testing as a single measure of accountability is similarly problematic. Most of those who support this legislature's utterly absurd determination, though, don't approach the issue with any degree of subtlety; in their minds, the only good teacher evaluation is no evaluation. Clearly, what we need is good, objective measures of accountability to be used alongside the more subjective measures that provide color to the stark numbers we get from test scores.
Posted by: Socrates | April 15, 2008 8:08 PM
Chris Cerf presented the NYC DOE's value added work to the Panel for Educational Policy, the body responsible for approving policy decisions for the city's public schools. As the Manhattan representative I've not taken a position on the value added work, mostly because my requests to see the technical information, including the specifics of the designed test, have not been granted.
That said, I can understand the teachers union's point of view. The Bloomberg administration's DOE has no credibility with regard to the use of data and statistics. Data is routinely manipulated and deceptively presented to bolster the administration's policy agenda. My biggest complaint is with how parent survey results were manipulated to purport to show class size was not the primary concern of parents. But you only have to look at the previous post on this blog to see another example. The DOE press release heralded the new gifted and talented policy as a major step forward for closing the achievement gap, while even a cursory review of acceptance numbers showed lower income districts fell further behind.
Posted by: Patrick Sullivan | April 15, 2008 9:23 PM
The question is not whether "all is lost" without value-added data. The question is whether value-added data contribute useful information or should be categorically barred from use.
Similarly, the question is not whether it is possible to make an accurate subjective judgment without value-added data. The question is whether complementing subjective judgment with value-added data is likely to yield a more accurate conclusion. Common sense suggests that the more information you have, the more likely you are to be accurate.
We are not talking about pornography here. (Not to imply that you would advocate banning pornography.) We are talking about data on student performance. Why is it okay to ban using student performance data to inform decisions affecting student performance? For goodness sake!
By the way, I discussed the Jacob & Lefgren paper at some length on a panel at CAP two years ago. The sharpest critic there argued that the paper showed subjective ratings are "like chance" and not to be trusted.
Take care.
Posted by: Robert Gordon | April 16, 2008 12:05 AM
Mr. Gordon: eduwonkette is on record that she is not opposed to the use of value-added data for decision-making in schools. Trust me, she likes value-added data. And I think she'd agree with me that the New York state legislature ban on the use of any student performance data in the making of tenure decisions is ridiculous. (But she can speak for herself, and I'm sure she will.)
But here's the thing, and I'm delighted that you've weighed in. The only value-added system currently under development in New York City is fundamentally flawed, for the reasons that eduwonkette has discussed -- especially the reliance on mid-year testing. I have yet to hear any defense of the NYC value-added system, let alone a persuasive defense. Are you, as a consultant to the NYC DOE, prepared to offer one? The issue in New York City is not value-added in general -- it's the particulars of this system.
And may I encourage you to encourage the NYC DOE to do a better job of explaining the value-added system under development to teachers, parents, principals, researchers, and the general public? Performance evaluation systems have the best chance of being perceived as legitimate by diverse stakeholders if they are perceived to be fair. Stakeholders can only judge the fairness of an evaluation system if they understand what goes into the evaluation. Right now, the biggest obstacle to perceiving the NYC system as fair is that a teacher's estimated value-added depends on both the performance of the teacher her students had in the previous year and the performance of the teacher her students will have in the following year. That's not fair!
Posted by: skoolboy | April 16, 2008 6:01 AM
Robert Gordon is the author of the highly flawed "fair student funding" system, which in the name of equity, would have cut the budget of half of the failing schools in NYC by an average of $400,000.
Similarly, every formula this administration has come up with has been simplistic and statistically illiterate -- and without any awareness of its damaging effects on schools, teachers and students.
Take the school grading system and the merit pay scheme -- with more than half of the grade or reward based upon gains or losses of one year's worth of test scores alone at the school level, which nearly every expert has said is statistically invalid.
So Gordon is asking us to trust the DOE to use this test score data responsibly in conjunction with other information in making tenure decisions?
They are the last people I'd trust to use any sort of data in a transparent, reliable manner, since they get this stuff wrong nearly 99% of the time. It's like giving a gun to a convicted serial killer.
Fool me once, shame on you; fool me twice, or three times, or four times, or in this case, ad infinitum -- shame on me.
Posted by: Leonie Haimson | April 16, 2008 12:28 PM
The finding you selected from the paper of Tim Sass, J.R. Lockwood, and Dan McCaffrey is just what we all hope for in the best of systems: Deficient performers are unobservable the following year, after choosing other careers; or they are coached, mentored, improved, matured, etc., into becoming better teachers.
I'm dubious of these performance assessment methods as well, but I would choose as evidence of the random error leading to regression the fact -- if it is so -- that large fractions of the BEST teachers, as indicated by these metrics, drop out of that top tier the following year. Very effective teachers should be revealed as effective year after year. Even policy makers would have to concede that the assessment methods are worthless if the performance statistics say otherwise.
But all of that is modeling-theoretical. Can't we learn to accept more about the fallibility of statistical models from the "surprises" on Wall Street? Good teachers, no differently from other good workers, run into problems, perhaps with a challenging mix of students; and they seek and get help, in good systems, in overcoming them.
DemostiX
Posted by: DemostiX | October 5, 2008 8:07 PM