Ed. Dept. Official Says Teacher Evaluations Shouldn't Rest on Test Scores Alone


Vis-à-vis this recent blog item, Education Department official Judy Wurtzel apparently won plaudits today from educators for reiterating that teacher evaluations should be based on several different measures of performance, not on test scores alone. (She was speaking at the Association for Supervision and Curriculum Development's legislative conference, which I've been following on Twitter.)

Wurtzel, the deputy assistant secretary for planning, evaluation, and policy development at ED, added, however, that such data should not be excluded from the evaluation process.

Now that that's cleared up, the question becomes to what extent test scores (or other indicators of student growth) should be weighted in making determinations of teacher effectiveness. I went back through the draft Race to the Top guidelines to find out, and unfortunately the language here is fairly vague. Student-growth data should be "a significant factor" in such decisions, the proposed criteria state.

Now just what does that mean? Some teachers and administrators, no doubt, would think that basing 10 percent of an evaluation on test scores would be significant. But that's a far cry from specifying that student data should make up the majority of the data sources weighed in an evaluation, or making it the preponderant criterion. (Those terms would leave no doubt that such data would make up at least 50 percent of the rating.)

Does ED want to leave this decision to states and districts? That's a possibility, but it seems a bit odd given that the RTTT application is so detailed in all other respects.

If you support the use of these measures, how much weight do you think they should be given?


It seems that objections from the teaching community to this sort of plan are usually based on these basic ideas:

1. We don't trust the high-stakes assessments, because multiple-choice items don't accurately reflect student achievement.

2. We don't trust all administrators to evaluate us fairly and objectively.

3. Teaching is more complicated than private-sector work, so we shouldn't be subject to private-sector-style accountability measures.

Perhaps keeping this vague is what could make RTTT the first initiative to implement a successful large-scale merit-based hiring and compensation model. This is the time for states to engage union officials in the process of developing these applications. Ask teachers to design the accountability measures themselves. Perhaps the formula could use a mix of high-stakes data and teacher-created or district-created formative assessments. Why not include peer reviews and walkthroughs in addition to administrator review? If teachers are asked to help create the rules, these measures have a chance to succeed. If the measures are forced on them, they will not.

The majority of teachers are good teachers. They want to elevate their profession, and they want to make sure that only the good ones are able to stick around. They also want to enjoy the many benefits that RTTT grants will provide to them and their students. They will have to accept some things they might not love (i.e., inclusion of high-stakes data in the accountability model), but they can also help SEAs create a plan that will be palatable to the teaching community.

Another widespread concern is how teachers of subjects other than math and reading will become eligible for merit pay. It is neither reasonable nor desirable for teachers of art, physical education, or even science to be evaluated primarily on test scores in other content areas.

Will "other measurements" include teacher-labor-intensive projects, such as dubious, easily faked student portfolios?

If performance pay is to be based on a series of test scores from state or district assessments, should the district have someone other than the teachers assess the students? Won't this lead to cheating, time misused on test prep, etc.? Which teachers will get the kids who are more difficult to reach and teach?

There are a couple of other ideas that I think are important to teachers, Sherman.
3. If student test scores on a single, rough proxy instrument become the basis for teacher evaluation or pay, then that instrument will become the focus, crowding out broader learning to an even greater extent than it already has. It would supercharge the influence tests already have on what is taught and how it is taught, an influence that educators feel is already distorting the enterprise in detrimental ways.
4. While test score data can be very useful and could be a large part of what is discussed in a teacher's evaluation (it's hard to dismiss, and it calls attention to issues of teaching quality), it must never constitute the judgement. Test scores do not speak for themselves. The judgement must take a variety of factors into consideration. Richard Rothstein's latest book, "Teachers, Performance Pay, and Accountability," examines the myth that quantitative measures are widely used for performance evaluation in the private sector and finds that they are not. When we designed a rigorous evaluation system in Montgomery County, Maryland, we specifically avoided having test scores appear on a teacher's evaluation. The scores must not speak for themselves, because too many factors other than the teacher affect the data, and too much of what the teacher does to produce outcomes is not reflected in the data. I agree with Judy Wurtzel.

In many cases, there is no test data that can be used to evaluate a teacher. But when test data is available, it should be used with caution. Year-to-year growth in test scores often has no real meaning, and there is no validity information for its use with small groups of students. What really counts is the extent to which students improve beyond what could have been expected, but that information is almost never available.
So, when test data is available, use it for less than 10 percent of the total evaluation.

Robin Kuykendall and Mike Stahl both have a point. Any merit pay system that relies on student performance will reward cheaters first and foremost. Honest teachers will be labelled "ineffective" and fired. Educational decisionmaking and research will be corrupted by inflated test data. Future citizens will be dumber than bricks. School reformers and "turnaround" administrators will make millions. Politicians will point to high test scores and win re-election.

Think this can't happen? It already has.

Stephen, what you are terming "vague" is what I would call a refusal to fall into the trap of specifying what kinds of evaluation systems states and districts should come up with. We know that the current systems fail to distinguish between teachers in any meaningful way. Some states have systematically removed any connection between student achievement as measured by standardized tests and the teachers who taught those students (via a "firewall"). Not only is the data not to be used in evaluations, but legally it cannot even be collected. This is a ridiculously anti-intellectual prohibition.

One of the difficulties in engaging in discussions of evaluation is the sort of slip made by Doug in asserting that RttT is the first to "implement a successful large-scale merit-based hiring and compensation model." I'm sorry, does it do that? Did I miss something? I believe (and I could be wrong) that it sets a prerequisite of teacher evaluation and specifies that student growth/learning, as measured by standardized tests, be among the criteria. However, evaluation is a long way from merit-based hiring and compensation. And herein lies the rub.

When RttT (or anyone else) SAYS teacher evaluation based on student outcomes (or words to that effect) and teachers HEAR merit-based hiring and compensation, well, what we have here is a failure to communicate. I cannot deny that there exists an idiots' chorus somewhere chanting "identify the losers and fire them." But I have to speak up and point out that this makes no more sense than setting the grading curve to ensure that the bottom 10% always fail. This doesn't aid teaching or learning, merely the building of a competitive (or cannibalistic) esprit de corps (or fatalism for those always on the bottom).

The whole point of evaluation (of anything) is to assist in decision-making. Hiring, firing, and setting salaries are only a very small part (granted, a much larger part when it's YOUR job or salary) of the whole aim of evaluation in the realm of developing human resources. And all of these other things (guiding class or course assignments, determining professional development needs, understanding effective vs. ineffective pedagogies, curricula, and methods, and their interaction with specific teachers and students) are unavailable to us when we allow ourselves to make decisions about what and how to evaluate teachers based on their angst about (possibly) being fired or about their compensation. Doug is right in identifying a lack of trust in administrators to do our evaluation of teachers. Certainly that suggests a need to inject hard data into the process. But there are many evaluative methodologies that go beyond the meeting with the boss. 360-degree evaluations include the viewpoint of co-workers and (dare I say it) "customers" or others. But the aim of this qualitative data is not to see who is fired or given raises. It is to set improvement goals, based on the real assumption that none of us can ever stop learning and growing (or is immune to slippage over time).

Of course, any system can be gamed, and where that is the intent, I don't know what we can do. But perhaps then we should be asking the question--why are teachers so eager to game the system? Why are they so opposed to improvement?

