Race to Inflate: The Evaluation Conundrum for Teachers of Non-tested Subjects
In honor of my last post, which chronicled my personal perils under Chicago's teacher evaluation system, I thought it would be interesting to take a deeper look into the reform measures being considered to replace the current evaluation system. Quantifying quality teaching on a large scale is a complex mission, and I've invited a friend who has been studying the issue to explain a major sticking point: evaluating teachers of non-tested subjects based on student growth.
Guest Blogger: Alex Seeskin
Alex is the English Department Chair at Lake View High School in Chicago. He is a Teach Plus Policy Fellow and a National Board Certified Teacher.
Every year, the vast majority of teachers in the United States are rated "excellent" or "superior," and little support or accountability ever reaches our lowest performing teachers.
In an effort to fix the system, President Obama's Race to the Top Fund offered states the chance to win money if, among other things, they made "student growth" - the degree to which a teacher's students grow over the course of a school year - a significant factor in the evaluation process. According to a recent report by the National Council on Teacher Quality, it appears that Race to the Top has been effective: 23 states have required that student growth play a significant role in teacher evaluations.
For most English and math teachers, student growth means using standardized test scores and complex statistical algorithms to measure the effect a teacher has on his or her students. Unfortunately, however, the raucous debate over the efficacy of these "value-added" measures has overshadowed critical discussions about how to measure student growth for teachers of non-tested subjects (such as K-2, foreign language, and special ed), who make up more than two-thirds of the profession.
Currently, many states plan to have teachers of non-tested subjects use a makeshift version of value-added measures where teachers identify learning objectives, choose assessments that correspond with these objectives, monitor student progress over the course of the year, and then present that progress as evidence of student growth.
While this process may very well improve the overall quality of teaching, it is an ineffective way to evaluate teachers. If it takes complex statistical algorithms to measure student growth for English and math teachers, what makes us think that teachers of non-tested subjects can validly and reliably measure student growth on their own?
Imagine a high school history teacher, Ms. Wilson. In her district, half of her yearly evaluation is based on observations using Charlotte Danielson's Framework for Teaching and the other half is based on student growth. Since there are no official state or district tests to determine growth in history, Ms. Wilson is required to set objectives at the beginning of the year, write her own assessments, and use the data from those assessments to demonstrate student growth.
During the second week of school, Ms. Wilson gives her students a pre-assessment - an essay that she grades with a nationally accepted rubric. Over the course of the school year, she monitors data from several similar essays, and by the last week of school, the students' average score has improved 24 percent. She submits the assessments and the scores to her principal, arguing that the results are a clear indication of growth. Her principal looks through the assessments and, impressed by the data, gives her a "superior" rating on the student growth portion of her evaluation.
Great, right? Not so fast. Ms. Wilson, her principal, and almost anyone without an advanced degree in statistics don't have the background or tools necessary to determine student growth - indeed, it is much more complicated than just subtracting the average score on the first assessment from the average score on the second. First, you have to determine the expected growth of the students. In other words, Ms. Wilson's students improved by 24 percent, but if they should have improved by 34 percent, then Ms. Wilson may not deserve her superior rating.
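To make the arithmetic concrete, here is a minimal Python sketch of the naive calculation. The essay scores and function names are invented for illustration, and the 34 percent expected growth is simply the hypothetical figure from the paragraph above, not something a teacher or principal could look up.

```python
from statistics import mean


def percent_growth(pre_scores, post_scores):
    """Naive growth: percent change from the pre-assessment average
    to the post-assessment average."""
    pre_avg = mean(pre_scores)
    post_avg = mean(post_scores)
    return 100 * (post_avg - pre_avg) / pre_avg


def growth_gap(observed, expected):
    """Observed growth minus expected growth, in percentage points.
    A negative value means the class fell short of expectations."""
    return observed - expected


# Hypothetical essay scores for four students (invented numbers).
pre = [40, 50, 60, 50]
post = [52, 62, 72, 62]

observed = percent_growth(pre, post)

# Hypothetical expected growth, taken from the example in the text.
# Estimating it for real would require prior scores, comparable
# students, and a value-added model.
expected = 34.0

print("Observed growth:", observed)
print("Gap versus expected:", growth_gap(observed, expected))
```

With these made-up numbers the naive calculation reports 24 percent growth, yet the class is 10 points short of the hypothetical expectation. The number Ms. Wilson submits says nothing on its own about which side of the line she falls on.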
And the only way to determine expected growth of Ms. Wilson's students would be to have their previous scores on history assessments, the scores of students with similar backgrounds on Ms. Wilson's assessments, and access to a reliable value-added algorithm. Otherwise, stating that Ms. Wilson's students improved by 24 percent is almost meaningless in the context of an evaluation.
Actually, it's completely meaningless because what Ms. Wilson never told her principal is that the stress of a high stakes evaluation led a normally principled teacher to make some ethically dubious decisions. For starters, when Ms. Wilson distributed the essay assignment at the beginning of the year, she told her students it wouldn't affect their grade, while she made the final essay assignment a major part of their exam, thus ensuring higher motivation. Then, when she graded the final essay, she was more lenient in her scoring. Finally, when it came time to calculate her students' average score, she left out two or three students from each of her classes. They were perpetually late and though they passed with D's, she felt it would be unfair to include their scores.
Indeed, evaluating teachers of non-tested subjects on student growth places good teachers like Ms. Wilson in ethically challenging positions with high-stakes decisions in the balance. Terrified of losing their jobs and with little chance of getting caught, many teachers will undoubtedly push the boundaries of ethical responsibility, with a few going even further than Ms. Wilson and blatantly fabricating student data.
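How much can leaving out a few students move an average? The short Python sketch below uses an invented class of ten final-essay scores and assumes, for illustration only, that the excluded students are the two lowest scorers. The function name is hypothetical.

```python
from statistics import mean


def mean_dropping_lowest(scores, n_dropped):
    """Average of the scores after leaving out the n_dropped lowest.
    Assumes the excluded students are the lowest scorers, which is an
    illustrative simplification."""
    kept = sorted(scores)[n_dropped:]
    return mean(kept)


# Hypothetical final-essay scores for a class of ten (invented numbers).
final_scores = [55, 58, 70, 72, 75, 78, 80, 82, 85, 85]

everyone = mean_dropping_lowest(final_scores, 0)
without_two = mean_dropping_lowest(final_scores, 2)

print("Average with every student:", everyone)
print("Average without the two lowest:", without_two)
```

In this made-up class, the average rises from 74 to a little over 78 without a single student writing a better essay. Combined with a low-stakes pre-assessment and lenient final grading, a reported growth figure can drift a long way from reality.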
Let's be honest - at the end of the day, what teacher is going to admit that his or her students didn't grow and what principal is going to have the time, the expertise, or even the resolve to question a teacher's results?
Ironically, all of this means that evaluating teachers of non-tested subjects on student growth will only serve to preserve the status quo of inflated teacher evaluations. And the problem with inflated teacher evaluations is that they make it difficult to identify, support, and in rare cases, fire struggling teachers. That is not a recipe for "student growth."
Given these concerns and how little student growth measures in non-tested subjects have been tried and tested, we need to pause and proceed with extreme caution before these measures are applied to hundreds of thousands of teachers all over the country. The Center for American Progress and the Education Trust suggest giving states until the 2016-17 school year to fully implement student growth measures in high-stakes teacher evaluations. At the very least, we should delay full implementation until then, while piloting different approaches in the interim.
After all, if we are going to hold teachers accountable for student growth, it is only fair that we hold teacher evaluation measures to the same standards of excellence.
Photo by Procopio Photography