Between a Political Rock and a Statistical Hard Place
Some days, skoolboy feels bad for the hard-working folks in the New York City Department of Education. They’re caught between a political rock and a statistical hard place. The political rock is the New York State accountability system, which complies with No Child Left Behind’s requirements to test students annually in grades 3-8 in Mathematics and English Language Arts, and to classify students, based on their test scores, as either Not Meeting Learning Standards (Level I), Partially Meeting Learning Standards (Level II), Meeting Learning Standards (Level III), or Meeting Learning Standards with Distinction (Level IV). The performance of students, and subgroups of students, is then aggregated to assess the school’s progress toward the goal of 100% proficiency for all students by the year 2014. The mechanism for this is a series of grade-specific exams, with a broad (but arbitrary, as Dan Koretz explains in Measuring Up) standard-setting process that defines the scores on the exam that correspond to the four proficiency levels. Whatever a student’s scale score on the exam, he or she is classified into a particular proficiency level.
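To make the classification step concrete, here is a minimal sketch in Python of how a scale score gets collapsed into a proficiency level. The cut scores are hypothetical, chosen only for illustration; the real cutoffs come out of the state's standard-setting process and differ by grade and subject. The function name is likewise an assumption, not anything the state or the DOE publishes.

```python
from bisect import bisect_right

# Hypothetical cut scores for one grade-specific exam (illustration only).
# Each value is the minimum scale score for Levels II, III, and IV.
CUT_SCORES = [600, 650, 700]
LEVELS = ["Level I", "Level II", "Level III", "Level IV"]


def proficiency_level(scale_score, cut_scores=CUT_SCORES):
    """Map a scale score to one of the four proficiency levels."""
    # bisect_right counts how many cutoffs the score has reached or passed.
    return LEVELS[bisect_right(cut_scores, scale_score)]


# Two students with different scale scores land in the same level.
at_cutoff = proficiency_level(600)  # right at the Level II cutoff
halfway = proficiency_level(625)    # halfway between the II and III cutoffs
```

Under these made-up cutoffs, both students are reported as Level II even though their scale scores are 25 points apart. The level discards exactly the information that distinguishes them.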
The statistical hard place is that the proficiency levels are only part of the story. The NYC DOE has found that the scale scores matter, such that a student whose scale score is halfway between the cutoffs for Level II and Level III, and therefore whose proficiency level is Level II, has a higher probability of graduating from high school on time than a student whose scale score is right at the cutoff for Level II. The scale scores have predictive validity—that is, they predict educational outcomes that we think of as important—but they don’t have the political currency of the proficiency levels specified by the state and the federal government.
There’s no evidence, to skoolboy’s knowledge, that the proficiency levels on NCLB-style exams have any predictive validity over and above the scale scores on which they are based. (Another regression discontinuity design study waiting to happen.) I’ll wager that they don’t.
Whether or not the state/NCLB proficiency levels matter, the NYC DOE is stuck. They have to pay homage to the state standards, even though their internal evidence shows that partial progress—“learning quite a bit,” in skoolboy’s terms—really does matter for students’ futures, and therefore is something that schools should be held accountable for.
And I don’t disagree. I would be comfortable (though not ecstatic) with school progress reports that used changes in scale scores to quantify how much students had learned from one year to the next, under two conditions: (a) if the exams were vertically linked, and (b) if the uncertainty in the estimates of school-level effects on the average change were taken into account. Neither of these conditions is met in the current New York City School Progress Reports.
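As a rough illustration of condition (b), the sketch below computes a school's average year-to-year change in scale scores together with an approximate 95% interval around it. This is not the DOE's method. The student gains are invented for the example, the function name is hypothetical, and the whole calculation assumes condition (a) holds, so that subtracting one year's scale score from the next is meaningful. The normal approximation is also crude for a group this small.

```python
import math
import statistics


def mean_gain_with_interval(gains, z=1.96):
    """Return the mean gain and an approximate 95% interval for it."""
    n = len(gains)
    mean = statistics.mean(gains)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(gains) / math.sqrt(n)
    return mean, (mean - z * std_err, mean + z * std_err)


# Invented year-to-year scale score changes for eight students
# in one hypothetical school (illustration only).
school_gains = [4, -3, 10, 1, -6, 8, 2, -1]

mean_gain, (low, high) = mean_gain_with_interval(school_gains)
```

With these invented numbers the average gain is positive, but the interval around it runs from below zero to well above it. A report that printed only the average would suggest progress that the data cannot distinguish from no change at all.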
Navigating between the political rock and the statistical hard place is definitely a challenge, both rhetorically and in the construction of the School Progress Reports. Rhetorically, the DOE is obliged to argue that a student who is Level III in fourth grade and Level II in fifth grade has lost ground (that student has fallen off the sharp Level III cliff) because the state and federal accountability metrics treat this as a sharp discontinuity. But as a practical matter, the student may not have fallen off a cliff; rather, she may be just a little bit lower on a gradual hill in fifth grade than we’d like, but still higher on the hill than she was in fourth grade. And the DOE’s internal analyses document that anyone who is higher on the hill is better off than someone lower.
What’s the DOE to do? Well, it could continue to escalate the rhetoric directed toward its critics. (I note with alarm that the DOE went from calling me by my blogging name “skoolboy” on Monday to calling me “Professor Pallas of Teachers College” on Wednesday, a professor whose proclivity for giving A’s to all of his students will come as a surprise to many of them. What’s next? Examining my teeth?) Or it could speak honestly and openly about the challenge of incorporating political and technical realities into the School Progress Reports. I think readers know which path skoolboy recommends.