Both a value-added method and principal observations tied to a teaching framework identified the same teachers as particularly high or low-performing under Chicago's teacher-evaluation pilot, a new study concludes.
But principals struggled to provide high-quality "coaching" and support to teachers based on the results, the report says—a finding that gives a sense of just how difficult it will be to use the new teacher evaluation systems springing up across the nation to improve teaching and learning.
Released last weekend, the report from the Consortium on Chicago School Research is the year-two analysis of Chicago's pilot teacher-evaluation system. (Follow the embedded link here to read my write-up of the year-one findings.)
In brief, the study looks at the performance of teachers in select elementary schools in which the system was piloted from 2008-09 to 2009-10. The system uses the Framework for Teaching developed by Charlotte Danielson.
To determine reliability, the study's authors compared ratings given by principals with those of external evaluators. To look at the system's validity, they analyzed the relationship between principals' observations and value-added estimates for the teachers, where available.
Value-added, you'll remember, is a statistical technique that estimates how much growth an average student should make in a year and then compares that prediction with students' actual performance. The idea is that students who gained more or less than the predicted amount must have had a particularly strong or weak teacher.
Finally, the authors conducted case studies in eight of the pilot elementary schools.
Here's a rundown of the findings, plus a discussion to help you situate them within the literature on teacher evaluations.
• Across most of the framework standards, students showed the greatest growth in test scores in the classrooms in which teachers received the highest observation ratings from their principals, and the least growth in those where they received the lowest ratings. This finding is similar to the conclusion of several studies on Cincinnati's teacher-evaluation system.
• Principals and external observers gave similar numbers of lower scores, but principals gave the top rating more often than the other observers did, across all 10 of the evaluation standards. Interestingly, much of this variation disappeared when researchers controlled for the teachers' prior evaluation scores, suggesting that principals may be drawing on background knowledge in assigning scores. While this doesn't exactly fit the narrative of vindictive principals, it does show that who you get as an observer potentially matters.
• Most of the principals were close to the external observers in terms of how strictly they applied the evaluation standards, but there were a few outliers on both ends: 11 percent of principals regularly rated teachers lower than the observers did, while 17 percent tended to rate them higher. That's another reason to consider using more than one observer in a teacher-evaluation system.
• The case studies revealed that teachers and principals generally reported more meaningful, evidence-based discussions about teaching practice than they'd had under the old "checklist" evaluation. Teachers' experiences were largely dependent on whether they felt the principal had applied the framework fairly.
• Principals struggled to ask high-level, complex questions during evaluation conferences to elicit better reflection from teachers on their scores. Principals felt they weren't given enough training on this "coaching" aspect, and they tended to dominate the conversations.
Much more in the report, so check it out. In the meantime, it will be interesting to see how this pilot informs the development of a new teacher evaluation system in the city, which has lurched back and forth now between a couple of different frameworks. We'll be watching here with interest.