Different Measures of Effectiveness Shown to Be Complementary

By Stephen Sawchuk, October 19, 2011

Both a value-added teacher-effectiveness measure and a series of scored teacher observations bear a positive relationship to students’ future academic achievement, according to a recently published paper in the journal Labour Economics.

In plain English, teachers who scored well on one measure of teaching ability also tended to score well on the other, and both measures carried information about how their students would later perform. That’s encouraging news as states and districts go about the difficult task of designing evaluation systems that incorporate both kinds of information.

“The value-added information is useful information, but it’s imperfect; the subjective complements it and makes us more certain in the overall evaluation,” said Jonah E. Rockoff, one of the study’s authors. “If someone is performing highly on both of these metrics, we can be more confident they’re actually truly outstanding.”

For the study, Rockoff and his co-author analyzed teacher-student data from New York City between 2003 and 2008. Using a value-added method, they looked at first-year teachers’ performance in the classroom.

Then, they analyzed two forms of subjective, observation-based evaluations for these teachers:
• Information from teachers hired through the city’s Teaching Fellows program, who were rated on a 5-point scale based on a mock teaching lesson and other criteria; and
• Information from a district mentoring program, where mentors would periodically observe and provide monthly feedback to the new teachers, also based on a 5-point scale.

The authors then looked to see how well the measures predicted teachers’ future performance.

They found that the observations and the test-score-based measures were correlated, or related; that both types picked up effectiveness information; and that the information was complementary, meaning the two gauged different facets of teacher effectiveness.

The study also found that the effectiveness calculations became more precise when they were combined, which thereby “increases our confidence in each measure,” the study states.

The findings also mean that if someone scored well on one measure and not on the other, it could point to a problem in the evaluation. For instance, perhaps the teacher got lucky on test scores that year or had an evaluator he didn’t get along with. Essentially, the two forms of information can serve as a check on each other.

The paper underscores the importance of using observations rather than relying on value-added alone, because observations can pick up on teaching skills not captured by test scores.

Indirectly, the paper also points to the crucial role of observer training.

When examining the scores the mentor teachers gave out, the authors found that some were generally more lenient graders than others. Future teacher performance wasn’t related to teachers’ average score; it was related to how teachers were scored relative to other teachers scored by the same grader.
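To make the idea of scoring "relative to other teachers scored by the same grader" concrete, here is a minimal sketch. It is not the method used in the paper; it assumes one simple approach, subtracting each mentor's average score from the scores that mentor gave. The function name, mentor labels, and ratings are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def rater_relative_scores(ratings):
    """Center each score on the average given by the same rater.

    ratings: list of (rater, teacher, score) tuples (hypothetical format).
    Returns a dict mapping (rater, teacher) to score minus that rater's mean.
    """
    scores_by_rater = defaultdict(list)
    for rater, _teacher, score in ratings:
        scores_by_rater[rater].append(score)
    rater_mean = {r: mean(s) for r, s in scores_by_rater.items()}
    return {
        (rater, teacher): score - rater_mean[rater]
        for rater, teacher, score in ratings
    }

# Made-up data: mentor A grades leniently, mentor B grades strictly.
ratings = [
    ("A", "t1", 5), ("A", "t2", 4), ("A", "t3", 3),
    ("B", "t4", 3), ("B", "t5", 2), ("B", "t6", 1),
]
relative = rater_relative_scores(ratings)
```

In this invented example, teachers t3 and t4 both received a raw score of 3, but t3 sits below mentor A's average while t4 sits above mentor B's, which is the kind of within-grader comparison the paragraph above describes.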

Ideally, you want all raters to be trained to see the same thing when observing performance. This inter-rater consistency is something long-standing teacher-evaluation systems (like Cincinnati’s or the TAP system) have underscored in their training.

“The lesson here is you do need to get the training right so the norming is right,” Rockoff said.

Stephen Sawchuk is an assistant managing editor for Education Week, leading coverage of teaching, learning, and curriculum.