
# School Progress Grade Effects on NYC Achievement: Tame, Fierce, or a Hot Mess?

skoolboy ventured into the rarified air of NYC’s Harvard Club yesterday to hear Marcus Winters present his new Manhattan Institute research on the effects of the 2006-07 New York City School Progress Reports on students’ 2008 performance on state math and English tests in grades four through eight. The analysis uses a regression-discontinuity design, capitalizing on the fact that schools received a continuous total score summarizing their performance on school environment (15%), student performance (30%) and student growth (55%), but there are firm cut-offs that distinguish schools receiving an F from those receiving a D, those receiving a D from those receiving a C, etc. This means that there might be schools that are very similar in their total scores, and presumably on other school characteristics, on either side of a given cut-off, allowing researchers to study the test-score consequences of obtaining a specific letter grade.
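The regression-discontinuity logic can be sketched with simulated data. Everything below is made up for illustration: the cutoff value, the bandwidth, and the assumed +2-point response of F schools are hypothetical, not the actual NYC Progress Report parameters.

```python
import random

random.seed(0)

# Illustrative sketch of the regression-discontinuity design described above.
# All scores, cutoffs, and effects are simulated, not the NYC data;
# D_CUTOFF is a hypothetical score separating F schools from D schools.
D_CUTOFF = 30.0

schools = []
for _ in range(4000):
    score = random.uniform(10, 50)    # continuous Progress Report score
    got_f = score < D_CUTOFF          # letter grade assigned at the cutoff
    # Suppose receiving an F triggers a response worth +2 outcome points.
    outcome = 0.5 * score + (2.0 if got_f else 0.0) + random.gauss(0, 1)
    schools.append((score, outcome))

def fit_line(points):
    """Ordinary least squares for y = a + b*x over (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    b = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    return my - b * mx, b             # intercept, slope

# Fit a line on each side of the cutoff within a narrow bandwidth, using
# (score - cutoff) as the regressor so each intercept is the value at the cutoff.
BW = 3.0
below = [(s - D_CUTOFF, o) for s, o in schools if D_CUTOFF - BW <= s < D_CUTOFF]
above = [(s - D_CUTOFF, o) for s, o in schools if D_CUTOFF <= s < D_CUTOFF + BW]

jump = fit_line(below)[0] - fit_line(above)[0]
print(f"Estimated effect of receiving an F: {jump:.2f}")
```

Fitting a line on each side of the cutoff, rather than simply comparing means in the two bands, keeps the trend in the running variable from being mistaken for a grade effect.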

The two tables below summarize the impact of the Progress Report grades on student math and English proficiency, respectively. Both tables contrast the consequences of getting an A, B, D or F with a reference category, a C grade. A green up-arrow indicates that students in a school that received a particular Progress Report Grade did better than students in C schools, whereas a red down-arrow indicates that students did worse than students in C schools. An X indicates that student performance did not differ significantly from that of students in C schools at the p<.05 level.

There’s a lot of X’s. In math, students in F schools did better than students in schools receiving higher grades, although this seems to be primarily due to an effect in grade 5. Students in D schools likewise did better than those in schools receiving higher grades, apparently again due to an advantage in grade 5. In English, the letter grade a school received did not have any consequences for student performance.

Although both Winters and discussant Jonah Rockoff were careful to note limits both to the analyses and to what they can tell us about the incentive effects of accountability systems, both characterized the results as pretty clear evidence that schools reacted to receiving an F or a D in ways that boosted student achievement. This was particularly noteworthy, they argued, because so little time had elapsed between when a school learned that it had received a D or F and when students were tested—January, for English, and March, for mathematics.

Well, yeah, the short time between receiving the grade and the testing is certainly an issue, and surfaced as the likely explanation for why no effects of the School Progress Report grades were found in English. But skoolboy is still worried about math. There were no statistically reliable consequences for getting a D or an F in grades 4, 6, 7 and 8; only in grade 5 is there a test-score boost. How are we to make sense of this? If the letter grades are such a powerful incentive, wouldn’t they affect the performance of students in all of the grades in a school, not just fifth-graders?

Cool person Amy Ellen Schwartz posed a very smart question from the audience. "What about those A and B schools doing worse than the C schools in 5th grade math? What does that mean?" she asked. The panelists didn’t want to address that head-on, in skoolboy’s view, but he will: Looking at 5th grade mathematics, there’s as much evidence of the receipt of an A or a B causing a school to coast as there is evidence of the receipt of a D or an F causing a school to be more productive. Probably not a popular interpretation among the true believers in the power of incentives in the room.

But the bigger story is one of what Winters called "tame" effects. No effects of the School Progress Report grades in English, and limited evidence of effects in Math. A short time-horizon between the “treatment” of receiving the grades and student testing. Ambiguous incentives, both positive and negative, associated with the grades. A very weak theory of how the grades would be expected to increase student performance. It’s a wonder that Winters found anything at all.

A last point: Winters suggested that there were dire predictions that schools would "give up" if they got low Progress Report grades, and his findings, he said, did not show that. Although there were editorials at the time of the initial release of the Progress Reports last fall expressing concern that schools might be stigmatized by getting a C, D or F when students were performing at generally high levels, I question whether anyone thought that schools, and the educators who work in them, would "give up." The more predictable reaction—which I think was borne out—was that principals, teachers and parents would simply not believe the Progress Report grades accurately characterized what they saw on a day-to-day basis. A lot of stakeholders don’t believe that the Progress Report grades are reliable measures of school performance, and given what eduwonkette and I have shown about the instability in the student progress measures at the heart of the system, those beliefs are well-founded.

A brief version of the research can be found here. The technical version is now available at the same location.

When he says there's a boost in 5th grade math scores, is he comparing this year's 5th graders to last year's 5th graders, or this year's 5th graders to themselves when they were in 4th grade last year?

Also, what's the dependent variable -- did he compute a gain score using a test that isn't vertically equated, or is he using overall proficiency?

Lastly, I've also never heard the theory that schools would give up if they received a low grade -- I'm not really sure why or how that would happen.

There's a cute video making the rounds showing what a stop-sign would look like if it were designed by a big corporation.

I think there is plenty of room to create a much funnier video showing what a stop-sign would look like if it was designed by the educational community.

Corey: The analysis regresses student i's score in school s at the end of year t on a cubic function of the student's prior year test score and a bunch of covariates. The analyses that pool students across grades are doing something I wouldn't recommend, given the lack of vertical equating of the test across grades.
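skoolboy's description of the specification can be sketched on simulated data. The variable names, the covariates, and the assumed +3-point effect below are all illustrative assumptions, not Winters's actual model or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sketch of the student-level specification described above:
# student i's end-of-year score regressed on a cubic in the prior-year
# score plus covariates and a treatment dummy. All data are made up;
# the +3-point "F school" effect is an illustrative assumption.
n = 5000
prior = rng.normal(0.0, 1.0, n)         # prior-year score (standardized)
poverty = rng.binomial(1, 0.6, n)       # example covariate
in_f_school = rng.binomial(1, 0.1, n)   # attended a school graded F

score = 0.8 * prior - 0.3 * poverty + 3.0 * in_f_school + rng.normal(0, 1, n)

# Design matrix: intercept, cubic in prior score, covariate, treatment dummy.
X = np.column_stack([np.ones(n), prior, prior**2, prior**3,
                     poverty, in_f_school])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"Estimated effect of attending an F school: {beta[-1]:.2f}")
```

Note that this sketch uses a standardized prior score; with raw scale scores, centering the prior score before forming the cubic terms keeps the design matrix well-conditioned.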

It seems to me that the statistics here are even more marginal than they first appear...

They make 48 independent measurements, and 6 of them appear to be significant at the 95% confidence level -- did I get that part right?

What are the chances of that happening even if there were no actual underlying effect?

Rachel: I wouldn't characterize the analyses pooling grades 4 through 8 as independent of grade-specific regressions, since the data in the grade-specific regressions also appear in the pooled all-grades regression. So it's basically 4 coefficients out of 40 that are reliably different from zero, which could easily be due to chance.
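Rachel's chance question has a direct answer under the null hypothesis of no effect anywhere: with 40 independent grade-specific coefficients each carrying a 5% false-positive rate, we'd expect about 2 "significant" results by chance alone, and 4 or more is not at all unusual. A quick check:

```python
from math import comb

# Probability of k or more "significant" results out of n independent tests
# when each has a 5% false-positive rate and there is no real effect.
def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_four_or_more = prob_at_least(4, 40, 0.05)
print(f"P(4+ significant out of 40 by chance) = {p_four_or_more:.3f}")
```

The answer is about 0.14, so roughly one accountability study in seven would produce at least this many "effects" from pure noise.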

The Manhattan Institute is to be commended for this research. Researchers are seldom willing to do the extensive data preparation necessary for this kind of work: collapsing, combining, weighting, and rescaling scaled, collapsed, and weighted variables over and over again. Just organizing a regression with so many control variables can be difficult. They are to be applauded for their willingness to report that, time and time again, the Progress Report results had no effect on student performance. This research reflects the kind of transparency long overdue in educational accountability.

However, despite their meticulous attention to detail, these researchers may have overlooked a basic relationship in the data. See their Table 3. Look at the data by letter grade from 2006-2007. Consider just two variables: Overall Progress Report Score and Percent Black. Using scientific statistical methods, we can investigate these data. Let me modestly propose, in keeping with the rigor of educational accountability research, that this little data set is most certainly worthy of intensive scrutiny. Below is the table, reformatted.

| Grade | Overall Progress Report Score | Percent Black |
|-------|-------------------------------|---------------|
| F     | 23.6                          | 45.5%         |
| D     | 35.0                          | 44.5%         |
| C     | 44.7                          | 36.3%         |
| B     | 56.6                          | 32.2%         |
| A     | 72.6                          | 26.4%         |

Only a trained statistical eye could possibly see the trend.

A regression analysis employing Percent Black as a predictor and Overall Progress Report Score as the dependent measure results in an unadjusted r-square of 0.9586, a statistically significant effect (Pearson correlation = -.97910, p < .05).

A planned follow-up analysis indicates that the effect is particularly strong among the middle to high scoring schools. Subsetting the data to include only C or better schools (n=3), the regression model results in an unadjusted r-square of 0.9998 (Pearson correlation = -0.99990, p < .05), yielding the equation:

146.99002 + (-2.81423 x Percent_Black) = Overall_Progress_Report_Score

Based on observed data, these estimates are highly accurate to within one half of one percent across the three grades.

| Grade | Percent Black | Observed Overall Score | Predicted Overall Score |
|-------|---------------|------------------------|-------------------------|
| C     | 36.3          | 44.7                   | 44.83                   |
| B     | 32.2          | 56.6                   | 56.37                   |
| A     | 26.4          | 72.6                   | 72.69                   |

(Predictions rounded to two decimal places.)

Given that the vast majority of the schools in New York are awarded C or better scores, this equation will allow us to predict with high accuracy the Overall Progress Report Score of almost any school using only the percent of students who are black.

Who needs ARIS when you have a pencil? Imagine the money we could save. With the budget crisis upon us, the cost savings could be used to fund enough independent research organizations to provide full employment for every economist in Manhattan. With more time and more money, just imagine what an entire building full of Ph.D.'s might discover.

An alternate hypothesis which may explain a lot of the failure of American public education systems in general to respond to objective negative numbers with significant improvement:

The industry is systemically and pervasively incompetent to do what it is expected to do. It has—successfully—ignored research from relevant non-education fields (psychology, neuroscience, psychiatry, social and group sociology and psychology, etc.) for dogs' years, while failing to allow or perform its own rigorous education-specific research. And where valid education-specific research does exist, the industry has successfully rejected all efforts to mandate significant use of it.

Educrats and speducrats know what they know. And it just isn't good enough. Time to break down the artificial and highly fictitious boundaries they have created between human life in the rest of the world and that in the allegedly unique, rarified world of an American public school. Cognition is cognition; behavior is behavior; group dynamics is group dynamics. Surprise! These have all been studied - well - elsewhere. Time to bring those lessons into the school environment.

> A planned follow-up analysis indicates that the effect is particularly strong among the middle to high scoring schools. . . .
>
> Given that the far majority of the schools in New York are awarded C or better scores, this equation will allow us to predict with high accuracy the Overall Progress Report Score of almost any school using only the percent of students who are black.

That's a good analysis. However, if we look at home-buying patterns for parents with children who can help schools achieve high rankings, we see that these parents intuitively seem to gravitate to neighborhoods with low diversity features. I wonder what rule of thumb they're using?

Comments are now closed for this post.

