Come on Feel the Noise!
Last week, New Yorkers scratched their heads and tried to make sense of the Progress Report results. What does it mean, for example, when 77% of schools that received an F last year jump to an A or a B? Michael Bloomberg has a resolute answer to this question, “Not a single school failed again....The fact of the matter is it’s working.”
Last week, skoolboy and I took to our computers with the newly released data. Of particular concern is the progress measure, which makes up 60% of a school’s grade. Both skoolboy and Dan Koretz have already identified serious flaws in DOE’s test progress model. Even in the absence of these problems, we know that all models of year-to-year growth must contend with measurement error present in two different tests.
What the heck is measurement error? Bear with us for two paragraphs, because this is critical to understanding the central problem with the Progress Reports. A test score is just a proxy for a student's underlying skills and competencies. If you give a student a test, the test score represents the combination of her "true" level of skill plus measurement error. This error may be a function of idiosyncratic factors like not eating breakfast (which might hurt your score), having the good fortune to have studied the material that happens to be on the test (which would raise your score above your true level of skill), or a dog barking during the test (which might decrease the scores of all students in a classroom). A "gain score" is the difference between two test scores, both of which are measured with error, so it provides a noisy estimate of true growth.
If measurement error were constant, it would simply cancel out when we take the difference between the two scores. But we know that measurement error is likely to be random, so the two errors do not cancel out. Another kind of error stems from sampling variation, which I have discussed here before. In short, the more measurement error (or “noise”) in the results, the harder it is to detect the “signal” that represents a school’s actual contribution to growth in student learning.
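To make the cancellation argument concrete, here is a minimal simulation sketch in Python. Every number in it (the score scale, the 5-point true gain, the size of the errors) is a made-up assumption for illustration, not a value from the Progress Report data. It compares gain scores when both tests share the same constant error with gain scores when each test has its own random error.

```python
import random
import statistics

def simulate_gain_scores(n_students=10_000, true_gain=5.0, error_sd=10.0, seed=0):
    """Simulate gain scores under constant vs. random measurement error.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    constant_error = 3.0  # assumed: the same error on both tests
    gains_constant, gains_random = [], []
    for _ in range(n_students):
        skill_year1 = rng.gauss(650.0, 30.0)   # "true" skill in year 1
        skill_year2 = skill_year1 + true_gain  # every student truly gains the same amount
        # Constant error: identical on both tests, so it drops out of the difference.
        gains_constant.append((skill_year2 + constant_error) - (skill_year1 + constant_error))
        # Random error: drawn separately for each test, so it does not drop out.
        score_year1 = skill_year1 + rng.gauss(0.0, error_sd)
        score_year2 = skill_year2 + rng.gauss(0.0, error_sd)
        gains_random.append(score_year2 - score_year1)
    return gains_constant, gains_random

gains_constant, gains_random = simulate_gain_scores()
spread_constant = statistics.pstdev(gains_constant)
spread_random = statistics.pstdev(gains_random)
print(f"spread of gains, constant error: {spread_constant:.2f}")
print(f"spread of gains, random error:   {spread_random:.2f}")
```

In this sketch every student truly gains the same amount, so any spread in the measured gains is pure noise. With independent random errors, that noise has a standard deviation of roughly the square root of 2 times the error in a single test, because the two errors add up rather than cancel.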
In what follows, we demonstrate that there is almost no relationship between NYC schools' progress scores in 2007 and 2008. The progress measure, it appears, is a fruitless exercise in measuring error rather than the value that schools themselves add to students. If we believe that the Progress Reports are in the business of cleanly identifying schools that consistently produce more or less progress, this finding is rather troublesome.
First, some sunnier results: Below, we provide scatterplots of the relationship between the overall environment and performance-level scores in 2007 and 2008 for the 566 elementary schools that received overall grades in both years. In both cases, last year’s score is a strong predictor of this year’s score. To quantify the extent to which two variables move together, we can use a measure called a correlation coefficient. A correlation of 0 implies that the variables have no linear relationship, a correlation of 1 represents a perfect positive relationship, and a correlation of -1 represents a perfect negative relationship. We find that the correlation is .82 for the performance score and .75 for the environment score. This is exactly what we would expect: a school’s performance and climate do not change wildly from year to year.
But the relationship between the 2007 and 2008 progress scores is quite different: the correlation is -.02. In other words, there is almost no relationship! This is precisely what we would expect to see if the growth measures were primarily capturing measurement error. (The corresponding correlations for K-8 and middle schools are slightly larger but still low: .11 and .15, respectively.)
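For readers who want to see how signal and noise translate into a year-to-year correlation, here is a rough Python sketch. It is not the DOE's calculation and uses no real school data: the `two_years` helper, the signal and noise sizes, and the random seed are all hypothetical. The only number taken from the analysis above is the count of 566 schools. Each simulated school gets a stable "signal" plus fresh noise in each year.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def two_years(n_schools, signal_sd, noise_sd, rng):
    """Hypothetical model: a stable school signal plus new noise each year."""
    signal = [rng.gauss(0.0, signal_sd) for _ in range(n_schools)]
    year1 = [s + rng.gauss(0.0, noise_sd) for s in signal]
    year2 = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return year1, year2

rng = random.Random(2008)
n_schools = 566  # number of elementary schools cited in the text

# A measure dominated by stable differences between schools (assumed sizes).
stable_2007, stable_2008 = two_years(n_schools, signal_sd=10.0, noise_sd=5.0, rng=rng)
# A measure dominated by noise (assumed sizes).
noisy_2007, noisy_2008 = two_years(n_schools, signal_sd=1.0, noise_sd=5.0, rng=rng)

r_stable = pearson(stable_2007, stable_2008)
r_noisy = pearson(noisy_2007, noisy_2008)
print(f"mostly-signal measure: r = {r_stable:.2f}")
print(f"mostly-noise measure:  r = {r_noisy:.2f}")
```

Under this simple model, the expected year-to-year correlation equals the share of the variance that comes from the stable signal: 100/125 = .80 for the first measure and 1/26, or about .04, for the second. A correlation near zero is what a measure looks like when very little of it is signal.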
We are left with three possible explanations:
1) The poorly constructed progress measure is simply measuring noise.
2) The DOE tweaked the progress measure somewhat for this year, so the results are not comparable.
3) The receipt of and publicity around last year’s progress measures fundamentally changed how New York City’s elementary schools do business, so that schools that were more successful in raising student achievement in 2007 suddenly became less so, and schools that were less successful in raising student achievement in 2007 suddenly became more so.
New Yorkers are left with three courses of action:
* If explanation 1 is correct, we should ignore these report cards altogether because they are primarily (60%) measuring error.
* If explanation 2 is correct, we should not compare schools' grades in 2007 with their grades in 2008, because they are measuring fundamentally different dimensions of school performance. In this case, the collective hysteria that ensued in NYC schools last week about why grades are up or down is all for naught.
* And if explanation 3 is correct, eduwonkette and skoolboy should shut up and get out of the way of the silent revolution that has transformed public schooling in New York City.
Thanks to skoolboy’s masterful analysis of the data, we present evidence below the fold to suggest that the likely culprit is measurement error. The evidence is not conclusive, because every single element of the progress measure (and there are 16 of them in this year’s student progress measure) changed slightly from last year to this year. The strategy that we pursue below is to compare those elements of the progress measure that were used in both years: for example, the percentage of students making at least one year of progress, or the average change in proficiency scores. Again, we stress that these measures were not identical across years, but one would expect them to be moderately related. Needless to say, that is not what we found. Given the analyses described in detail below, we think it extremely unlikely that this is simply due to a tweaking of the progress report measures.
And what of the third explanation—a fundamental overhaul in the effectiveness of New York City’s elementary and middle schools over the past year that reshuffled the effective and ineffective schools? Magical transformations that shift schools from low to high-progress, or vice versa, are the fabled stuff of Hollywood movies, not reality. Real school change, unfortunately, is not an overnight affair.
Where does this leave NYC parents, teachers, and principals, all of whom are trying to make sense of what these measures mean? Bottom line: It's impossible to know what your A or your F means, because these grades are dominated by random error. Let's hope that the DOE heads back to the drawing board rather than continuing to defend the indefensible.
A key measure in both last year’s and this year’s student progress measure is the percentage of students making at least one year of progress in ELA and in Math, where a year of progress is defined as attaining the same or higher proficiency rating in 2008 in the subject as the student received in 2007, with a minimum proficiency rating of 2.00 in 2008. Three changes to this definition are new this year: (a) if a student scored at Level IV in both 2007 and 2008, that student is counted as making one year of progress, even if the proficiency rating declined from 2007 to 2008; (b) all students who were designated Special Education in 2007 receive a +0.2 addition to their 2007 proficiency rating before calculating whether a year of progress was achieved; and (c) any middle school student earning an 85 or higher on the Math A or Integrated Algebra Regents exam is automatically classified as making one year of progress in Math.
For elementary schools, the correlation between the peer horizon score for the percentage of students making at least one year of progress in ELA in 2007 and in 2008 is -.10, and the correlation for the citywide horizon score over the two years is -.09. There is essentially no stability over time in which elementary schools were successful in advancing their students a year in ELA achievement. The story is even more surprising at the K-8 and middle school levels: the K-8 peer horizon correlation is -.15 and the citywide horizon correlation is -.16, whereas the middle school peer horizon correlation is -.24 and the citywide horizon correlation is .01.
The stability in a school’s ability to advance its students a year of progress in Math in 2007 and 2008 is a bit higher, especially at the middle school level. For elementary schools, the correlation of the peer horizon score in 2007 and 2008 is .09, and for the citywide horizon score it’s .16. Among K-8 schools, the peer horizon score correlates at -.03, and the citywide horizon score correlates at .11. The greatest stability is seen at the middle school level, where the over-time correlation for the Math peer horizon score is .33, and for the citywide horizon score it is .32.
We did the same kind of over-time calculation for the average change in proficiency scores from 2007 to 2008, which also involved the Special Education adjustment in 2008. Five of the six correlations for the average change in ELA proficiency, which range from -.16 to -.37, are negative and statistically significant. What this means is that the schools that were judged to be more effective in raising students’ ELA proficiency in the 2007 report card were significantly less successful in producing ELA gains in 2008 than the schools that were less effective in 2007.
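One mechanism that could produce negative, rather than merely zero, correlations is worth sketching. Suppose the 2007 gain is the 2007 score minus the 2006 score, and the 2008 gain is the 2008 score minus the 2007 score (an assumption made for illustration, not a documented DOE formula). Then the 2007 score enters the two gains with opposite signs, so a school that got lucky in 2007 looks like a big gainer in 2007 and a big loser in 2008. The Python sketch below uses invented numbers and assumes there are no real differences in school growth at all.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(7)
n_schools = 566   # number of elementary schools cited in the text
error_sd = 5.0    # hypothetical size of the error in a school's average score

gains_2007, gains_2008 = [], []
for _ in range(n_schools):
    true_level = rng.gauss(650.0, 20.0)  # stable true level; no true growth differences
    score_2006 = true_level + rng.gauss(0.0, error_sd)
    score_2007 = true_level + rng.gauss(0.0, error_sd)
    score_2008 = true_level + rng.gauss(0.0, error_sd)
    gains_2007.append(score_2007 - score_2006)  # +2007 error
    gains_2008.append(score_2008 - score_2007)  # -2007 error

r_gains = pearson(gains_2007, gains_2008)
print(f"correlation of consecutive gains: {r_gains:.2f}")
```

Under these assumptions the expected correlation is -.50: the shared 2007 error contributes a covariance of minus one error variance, while each gain has a variance of two error variances. Negative correlations that fall between zero and that pure-noise benchmark are consistent with, though they do not prove, a gain measure in which error outweighs stable differences in school effectiveness.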
At best, there is no correlation over time in the DOE’s reports of which schools are good at inducing growth in ELA achievement. At worst, the DOE’s system finds that the schools that were better than average in 2007 were actually worse than average in 2008.