Debating the Merits of Standardized Testing
This week, Michelle and Jack will be discussing the divide over standardized tests—addressing their quality, their impact on instruction, and their use in evaluating schools and teachers.
SCHNEIDER: You believe in standardized testing, Michelle. I don't. Maybe you can tell me what you think the role of such tests should be?
RHEE: I think it's important to have a measure of student academic knowledge and progress. In order to provide high-quality instruction, we need a picture of what kids know and are able to do.
I think one mistake people make is saying "assessments take time away from the classroom." I believe that good assessments are a critical part of good teaching and instruction.
I also think it's important to say that there are lots of ways to assess students. While standardized tests are one way to do this, and an important way, they certainly should not be the only way.
SCHNEIDER: I agree with you there. I do think there's a role for assessment. And I think that assessments that are specific to a school site are not sufficient—because as much as they may have meaning within a school, they will have little significance outside of that school. So district or state leaders are going to have trouble figuring out who needs help and what, exactly, they need help with. And parents are going to have trouble figuring out how to advocate for their kids.
RHEE: I couldn't agree more. I visited with a mother whose daughter got all A's through elementary school. When it came time to head to middle school, the daughter was denied admission to the city-wide magnet. The mom was confused. The school told the mom her daughter was far below grade level. The mom was distraught because the school had always told her that her daughter was excelling academically. That's why we need to ensure that parents have accurate information about their children's academic achievement levels.
SCHNEIDER: But the real question here is how such tests should be used. Because they've been used primarily to beat schools up. And that runs completely counter to what I believe their usefulness is.
RHEE: First, tests should be used to modify instructional practices to better differentiate for individual students and inform a teacher's practice. Second, the growth on the tests should be used to determine a teacher's impact on student learning (value added). In combination with a variety of other measures, this should be used for a teacher's evaluation. Third, again as part of a multiple-measure assessment, test score growth (controlling for factors outside the purview of the school/educator) should be used to assess whether students are learning and schools are meeting the needs of all students. The same types of measures should be used at the district and state levels.
SCHNEIDER: On the first point, I totally agree with you. Assessment is most useful when it can inform instructional practice. On the other two points, however, I have some serious reservations. You say that student growth should be used to determine a teacher's impact on student learning. But I see huge problems there. Now, I do agree that growth scores are better than raw scores. Raw scores, we know from research, are an inherently unfair way of comparing schools; they correlate too highly with factors that are outside of the school's control—income, race, social capital, language. Growth scores are better. But they are still totally imperfect. They can't really separate out the influence of the 10th grade history teacher, say, from that of the 10th grade English teacher. They can't account for peer effects or for non-random sampling. And you really need to pay attention to the tests themselves—because some tests are easier for low-achievers to show growth on, and some are easier for high-achievers to show growth on.
Next you say that these tests should be used to evaluate teachers. And I will agree that they should have some role in the process. But directly? I don't think so. Testing performance should be used to guide our thinking. It should not replace thinking. Instead, it should tell us where to look. But we have to actually look. The tests are too imperfect and too limited to cut out the actual human process of getting in a classroom and seeing whether a teacher is actually struggling, and then trying to figure out why.
RHEE: I don't disagree with anything you said about the challenges with using growth scores to determine student learning. That's why we need to continue to improve the quality of standardized assessments. However, the critiques that you have of standardized tests aren't any different from the issues with other types of assessments. They are all imperfect.
SCHNEIDER: That's a good point. But there is far, far too much certainty about the efficacy and fairness of these tests. There is a way to use them effectively. And there is a way to make them fair. Achieving those aims, however, is hard. We're not even close to being there yet. And when we pretend to be, it demeans teachers, it harms students, and it creates a backlash against testing in general.
RHEE: Again, I totally agree. Test score growth can be used to help determine where we should be looking at a teacher's practice. When we implemented the IMPACT model in DC, the "value added" scores were able to give our classroom teachers, master educators and principals a lot of information about what the teacher's strengths and weaknesses were. Of course, those were combined with high-quality classroom observations by well-trained master teachers who were experts in the grade level/subject area in which the teacher was teaching.
SCHNEIDER: I don't think that's the national conversation, though. The national conversation is about ostensibly "objective" measures of teacher quality, conducted by machine. But there's nothing objective about measures like value-added models (VAM). It's just that all of the subjective human choices are made on the front end. It looks objective by the time it's applied. But choices about tests, questions, weighting, and the like are all subjective in nature. This scares teachers.
You'd be much more likely to get teachers on board if what they heard was this: VAM is going to be used as a tool by trained professionals who will use it like a radar—to figure out where to direct the search team. The value-added model, based on test scores, is simply not a sufficient mechanism for evaluating teachers.
To be continued...