A Level-Headed (But Still Narrow-Minded) Take on VAM
Value-added has become both the poster child and the whipping boy in the debate over how to improve teacher evaluations. This editorial from some very bright people makes a sane case for not giving up on VAM ("Value-Added: It's Not Perfect, But It Makes Sense," EdWeek, December 15). The authors argue that VAM, while imperfect, is still quite useful and should not be ignored. Among their key points:
- Teachers are vastly overrated on performance evaluations, with over 99% receiving satisfactory ratings.
- VAM isn't perfect, but the solution is to improve it, not stop trying to use it.
- Teacher evaluations get better when we add relevant information, so controversy shouldn't stop us from using VAM.
- There's a lot of fear that VAM will cause good teachers to be misclassified as bad teachers. However, we should be more concerned about the students who are being taught by all the bad teachers who have for years been rated satisfactory under our current evaluation systems. It's in the best interests of students to be tougher on teachers.
- Statistical measures used to predict performance in other industries are also fairly weak; for example, the year-to-year correlation of MLB batting averages is 0.36. Yet those industries still give such measures great weight. We should therefore not be afraid to use VAM, and we should not expect VAM scores to become more stable.
- VAM isn't any worse than other statistical efforts to predict teaching effectiveness, such as teacher test scores, method of teacher training, or years of experience, and in fact it may be better.
On this last point, the authors explain their conclusion:
When teacher evaluation that incorporates value-added data is compared against an abstract ideal, it can easily be found wanting in that it provides only a fuzzy signal of teacher effectiveness. But when it is compared to performance assessment in other fields or to evaluations of teachers based on other sources of information, it becomes obvious that even a fuzzy signal of teacher effectiveness, if it is the best available signal, can be a vast improvement over no signal.
Let's be clear about the logical leap that the authors are making here:
- Teachers are overwhelmingly classified as satisfactory, leading to "false positives," or the inaccurate rating of poor teachers as satisfactory.
- To address this problem, we need to use statistical methods to identify the teachers who are actually poor so that they can be targeted for improvement or firing.
- VAM is better than any other statistical method for identifying bad teachers, so we should use it.
The key term I've inserted is "statistical method," because the authors seem to ignore the fact that we have many other measures of effective teaching, and new methods such as student and peer input are looking more and more promising. Yes, batting averages affect baseball player salaries, but other factors count for a lot more. Look at the players with the top batting averages over the past few decades, and chances are you haven't heard of many of them. It makes sense to collect VAM data, but not to formulaically convert it into an overall effectiveness rating.
I agree that it's a huge problem that we have so many false positives in teacher evaluation. This is nothing less than a crisis, one that affects both the quality of the teaching students experience each day and the credibility of the profession.
But rather than use poorly predictive statistical techniques to replace the judgment of principals, we need to figure out what will help principals make better judgments, including those that result in an unsatisfactory evaluation of a teacher, and we need to hold principals accountable for false positives.
This might turn out to be the best use of value-added data: If my supervisor can see that I've consistently rated teachers with low VAM scores as excellent, clearly we need to talk. That's not to say that any given low-VAM teacher is not in fact excellent—the volatility of VAM scores virtually ensures that this will often be the case—but it does indicate that my process for evaluating teachers needs some work.