Using Test Scores Tends to Lower Teacher-Evaluation Ratings, Study Shows

If you’ve been following the teacher evaluation debates at any point over the last decade or so, you know that states and districts have waffled on whether and how student achievement should be incorporated into a teacher’s rating.

Should value-added scores—which aim to isolate how much a teacher has contributed to a student’s learning, as measured by tests—be a part of the calculation? Should they count for 50 percent of the evaluation rating? Or 35 percent? Or 10?

And there are less-discussed questions about the elements of teacher evaluation: Should student perceptions of a teacher be included? How much should they count for? And out of the total number of points available on an evaluation, how many should a teacher need to earn to be considered effective?

It’s fair to say that these decisions have often been made at the state and district levels somewhat arbitrarily. If test scores made up 50 percent of an evaluation and that seemed too high, the state or district might dial that back to 35 percent or 20 percent. (See Washington, D.C. and Tennessee as examples.)

A new study takes an in-depth look at how the weights and thresholds used in an evaluation system affect teachers’ ratings—and finds that teachers with similar underlying scores (including observation, value-added, and student survey measures) can get significantly different outcomes from one place to the next.

“We’ve invested huge amounts of time and money and political capital in redesigning teacher evaluation systems, and there’s a lot of moving parts in those systems,” said Matthew Kraft, an assistant professor of education and economics at Brown University and co-author of the study. “These features are critical.”

The researchers looked at data from about 1,300 teachers who participated in the Measures of Effective Teaching study, a large-scale, multiyear research project that was funded by a $45 million grant from the Bill and Melinda Gates Foundation.

“We find that teacher proficiency rates change substantially as the weights assigned to teacher performance measures change,” says the study, recently published in Educational Researcher. Proficiency rates also change substantially when “the same teachers are evaluated using different performance ratings thresholds.”

A chart in the report puts evaluation systems from eight districts onto a common scale. (These were systems used in 2013 and 2015.) Teachers are proficient if they are deemed Level 3 or 4. According to the chart, teachers in Fairfax County, Va., and Philadelphia needed to earn 50 percent of the total available evaluation points to be deemed proficient. But in Miami-Dade and New York City, they needed to earn closer to 75 percent of the total points.
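To make the arithmetic concrete, here is a minimal sketch of how weights and thresholds can interact. It is not the study's method or any district's actual formula. The teacher's scores, the two weighting schemes, and the function names are all hypothetical, chosen only for illustration; the 50 percent and 75 percent thresholds echo the figures cited above.

```python
# Hypothetical illustration only: these scores and weights are invented,
# not taken from the study or from any real evaluation system.

def composite_score(scores, weights):
    """Weighted average of measure scores, each a fraction of available points."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[measure] * scores[measure] for measure in weights)

def is_proficient(scores, weights, threshold):
    """True if the composite score meets or exceeds the proficiency threshold."""
    return composite_score(scores, weights) >= threshold

# One hypothetical teacher, scored as a fraction of the points on each measure.
teacher = {"observation": 0.80, "value_added": 0.50, "survey": 0.70}

# Two hypothetical weighting schemes.
observation_heavy = {"observation": 0.90, "value_added": 0.10, "survey": 0.00}
test_heavy = {"observation": 0.50, "value_added": 0.35, "survey": 0.15}

for name, weights in [("observation-heavy", observation_heavy),
                      ("test-heavy", test_heavy)]:
    score = composite_score(teacher, weights)
    for threshold in (0.50, 0.75):
        verdict = is_proficient(teacher, weights, threshold)
        print(f"{name}: composite={score:.2f}, threshold={threshold:.2f}, "
              f"proficient={verdict}")
```

In this made-up case, the same teacher clears a 50 percent bar under both schemes but clears a 75 percent bar only under the observation-heavy scheme, which is the kind of divergence the study describes.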

Ratings Still High Overall

Among the most important findings from this new study: When value-added scores are incorporated into evaluations, the ratings tend to go down. And the more weight a system puts on value-added scoring, the lower the scores are likely to be, the study showed.

That’s because value-added scores tend to be relative measures, explained Kraft. Value added is generally “designed to just compare you to your peers,” he said. “Everybody can’t be good with value added.”

On the other hand, with classroom observation scores, everyone can be excellent, he said.
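Kraft's distinction between relative and absolute measures can be sketched in a few lines. This is a simplified illustration, not how any real value-added model is computed: it treats "good" on the relative measure as scoring above the group median, and "good" on the absolute measure as meeting a fixed rubric bar. All numbers and function names are hypothetical.

```python
# Simplified, hypothetical illustration of relative vs. absolute measures.
from statistics import median

def share_above_median(values):
    """Share of the group scoring strictly above the group's own median."""
    mid = median(values)
    return sum(v > mid for v in values) / len(values)

def share_meeting_bar(values, bar):
    """Share of the group meeting a fixed bar that does not depend on peers."""
    return sum(v >= bar for v in values) / len(values)

# Invented growth estimates for four teachers, before and after
# every teacher improves by the same amount.
growth = [1.0, 2.0, 3.0, 4.0]
improved_growth = [g + 10.0 for g in growth]

# Invented observation scores on a 1-4 rubric, with 3 as the bar.
observation = [4, 4, 4, 4]

print(share_above_median(growth))
print(share_above_median(improved_growth))
print(share_meeting_bar(observation, 3))
```

When every teacher improves by the same amount, the share above the median stays the same, because the comparison is to peers. On a fixed rubric, nothing prevents the whole group from meeting the bar.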

And that leads to a huge caveat in all of this: As it stands, despite the variation in systems, almost all teachers across the country continue to get positive ratings.

That’s largely because observation scores make up the meat of most teacher evaluation systems, said Kraft. And as we know from previous research, principals tend to rate their teachers highly.
