Why skoolboy Is Uncertain about the NYC School Progress Reports
It’s election season, which means that we’re being inundated with polls. The reporting of poll results drives statisticians nuts, because the press often reports the percentage of those surveyed who favor one candidate or another without taking into account the poll’s margin of error. The margin of error is a way of quantifying the uncertainty in the poll numbers, because even a well-designed poll that surveys a random and representative sample of the population can generate only an estimate of the true proportion of those in the population who favor a particular candidate. The general rule of thumb is, the more information available in a sample, the less uncertainty in the estimate. A smaller batch of information will yield a more uncertain, or imprecise, estimate than a larger batch of information. This is as true for estimates of the relative performance of schools and teachers, whether in the form of a complex value-added assessment model or a simple percentage, as it is for political polls.
With apologies to anyone who’s had an introductory statistics course, suppose that we were trying to estimate the average age of the teachers in a very small school (one with only four teachers), but we can only draw a sample of three of the teachers to estimate that average. The four teachers are 25, 30, 30, and 55 years old, and the true average age is (25+30+30+55)/4=35. If our sample was the teachers who are 25, 30 and 30, our estimate of the average age of teachers in the school would be (25+30+30)/3=28.33. If our sample was the teachers who are 30, 30 and 55, our estimate of the average would be (30+30+55)/3=38.33. It’s a simple example, but it shows that different samples drawn from a given population can produce quite different estimates that can be some distance away from the true population value. You wouldn’t want to place too much confidence in a particular estimate if you knew that another, equally valid sample of the same size could generate an estimate that was quite different.
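For readers who like to check the arithmetic, here is a minimal Python sketch that lists every possible sample of three teachers from the four-teacher school and the average age each one produces. The variable names are my own, and the two 30-year-olds are treated as distinct teachers, so one sample shows up twice.

```python
from itertools import combinations
from statistics import mean

# Ages of the four teachers in the example school.
ages = [25, 30, 30, 55]
true_mean = mean(ages)

# Every possible sample of three teachers. The two 30-year-olds are
# different people, so (25, 30, 55) appears twice.
samples = list(combinations(ages, 3))
sample_means = [round(mean(s), 2) for s in samples]

print("true average:", true_mean)
for s, m in zip(samples, sample_means):
    print(s, "->", m)
```

None of the sample averages lands exactly on the true average of 35, and the lowest and highest are about ten years apart.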
That same logic applies to estimates of school and teacher performance, such as the New York City School Progress Reports. Most of the elements of the Progress Reports are estimates (for an explanation why, see here), but the calculation of the overall letter grades, which receive so much attention, does not take the uncertainty in these estimates into account. Today, I’ll show that using the 2008 School Progress Reports.
One of the indicators of student progress on the School Progress Reports is the percentage of students who made a year’s worth of progress in English (ELA) and in math from 2007 to 2008. In a given school, each child who was tested in both years can be classified as having made a year’s worth of progress or not, and by totaling up those students who made a year’s worth of progress and dividing by the number of students who were tested in both years, a percentage can be calculated. (There’s an additional wrinkle for students who transferred from one school to another, but it doesn’t affect the logic I’m writing about.)
Each school is compared to a group of 40 peer schools that are judged to be similar based on their demographic and other characteristics. A school’s percentage of children making a year’s progress in ELA is compared to the highest and lowest values in its peer group, and the school gets a peer horizon score that represents its location between the high and low peer group values. For example, if a school had 55% of its students make a year’s progress in ELA, and the percentage for the lowest school in its peer group was 47%, and the percentage for the highest school in its peer group was 71%, the school was located one-third of the way between the lowest and highest schools (8 percentage points above the minimum, out of a possible 24 percentage points above the minimum in the peer group). That peer horizon score of one-third (.33) would be multiplied by the 5.625 points that this component is worth in the calculation of the overall letter grade of the school, yielding a net contribution of 1.875 to the school’s overall score.
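The peer horizon arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the calculation as described here; the function name is hypothetical, and the official formula may handle details (such as transfers or values outside the peer range) that this sketch ignores.

```python
COMPONENT_WEIGHT = 5.625  # points this component is worth, per the text

def peer_horizon(school_pct, peer_min_pct, peer_max_pct):
    """Fraction of the way from the lowest to the highest peer school.

    Hypothetical helper: a plain linear position between the peer
    minimum and maximum, with no special cases.
    """
    return (school_pct - peer_min_pct) / (peer_max_pct - peer_min_pct)

# The worked example: school at 55%, peers ranging from 47% to 71%.
score = peer_horizon(55, 47, 71)
points = score * COMPONENT_WEIGHT

print(round(score, 3), round(points, 3))
```

Because the score depends only on the point estimates for the school and for the two extreme peers, a small shift in any of the three moves the result.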
The problem is that this calculation doesn’t take into account the fact that all of these percentages are estimates. The chart below looks at one elementary school in particular, Senator John Calandra School (08X014), and compares it to its peer group of 40 schools. At Calandra, 58.3% of the students made a year’s worth of progress in English in 2008. But the standard error of that percentage is 3.5 percentage points, which means that Calandra’s true percentage could plausibly be anywhere from 51.3% to 65.3% (two standard errors on either side of the estimate), a wide range. (This range is shown in the “error bars” above and below the estimated percentage for each school.) The same is true for most of the other schools in the peer group. In fact, only two of the 40 schools in the peer group (the ones with the blue markers in the chart) have a percentage that we are confident is higher than Calandra’s percentage. For the other 38 schools in the peer group, we can’t rule out the possibility that Calandra’s percentage is equal to the estimated percentage in those schools. There’s a tremendous amount of overlap among these schools.
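Here is a rough sketch of where a range like 51.3% to 65.3% comes from: the estimate plus or minus two standard errors. The second function uses the textbook standard error of a proportion, which is my assumption about how a figure like 3.5 could arise; the number of tested students is not given here, so the value of 200 is purely hypothetical and chosen only because it yields a standard error close to 3.5.

```python
import math

def plausible_range(pct, se, k=2.0):
    """Estimate plus or minus k standard errors (k=2 is a common rule of thumb)."""
    return pct - k * se, pct + k * se

def se_of_percentage(pct, n):
    """Textbook standard error of a percentage based on n students.

    Assumption: simple binomial formula, sqrt(p * (1 - p) / n).
    """
    p = pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)

# Calandra's figures from the text.
low, high = plausible_range(58.3, 3.5)
print(round(low, 1), round(high, 1))

# HYPOTHETICAL enrollment, not taken from the Progress Report.
HYPOTHETICAL_N = 200
se_200 = se_of_percentage(58.3, HYPOTHETICAL_N)
se_800 = se_of_percentage(58.3, 4 * HYPOTHETICAL_N)
print(round(se_200, 2), round(se_800, 2))
```

Under this formula, quadrupling the number of students only halves the standard error, which is why percentages from small schools are so imprecise.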
And yet Calandra received a peer horizon score of .463, and other schools in the peer group whose percentages of students making a year’s worth of progress in English did not differ statistically from Calandra received peer horizon scores ranging from .169 to .903. Calandra’s peer horizon score of .463 counted for 2.6 out of a possible 5.625 points toward the overall score on the School Progress Report. Other peer schools whose percentages did not differ significantly from Calandra’s received from 1.0 to 5.1 points out of a possible 5.625 points on this component of the overall score. Differences of this magnitude could easily make the difference between an overall grade of A and of B, or of B and of C—just due to chance. An accountability system such as the New York City School Progress Reports that doesn’t acknowledge the importance of chance and uncertainty is fundamentally misleading the public about its ability to distinguish the relative performance of schools. Some schools are likely doing significantly better than other schools; the problem is that the School Progress Reports don't provide enough information to judge which ones.
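The point totals quoted above follow directly from multiplying each peer horizon score by the 5.625-point weight. A minimal sketch, with labels of my own for the three scores mentioned in the text:

```python
COMPONENT_WEIGHT = 5.625  # maximum points for this component, per the text

# Peer horizon scores quoted in the text; the dictionary keys are my labels.
scores = {
    "Calandra": 0.463,
    "lowest overlapping peer": 0.169,
    "highest overlapping peer": 0.903,
}

# Convert each score to points toward the overall Progress Report score.
points = {name: round(s * COMPONENT_WEIGHT, 1) for name, s in scores.items()}
spread = round(points["highest overlapping peer"] - points["lowest overlapping peer"], 1)

for name, p in points.items():
    print(name, p)
print("spread among statistically indistinguishable schools:", spread)
```

That spread of roughly four points on a single component separates schools whose underlying percentages cannot be told apart statistically.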