I’m as much a fan of data as the next guy. But I worry that proponents of data-driven decision-making are understating just how hard it is to use data thoughtfully.
I’d like to describe the strategy championed by the New York City Department of Education, and point out the difficulties involved. The logic that the DOE is promoting is (a) use data to identify an area where a school is lagging, either relative to some absolute standard or relative to other similar schools; (b) use the available data systems to identify similar schools that are doing better in this area; (c) ask these more effective schools what they are doing that accounts for their success; and (d) adapt their suggestions for use in the lagging school.
It’s not as easy as it looks to determine which schools are doing better than others. Two different criteria are relevant. First, is the difference in performance between two schools large enough to matter? This is sometimes termed educational significance or practical significance. Second, is the difference in performance between two schools real, or could it just be due to chance? This is typically described as statistical significance. Ideally, we are interested in differences that are both practically and statistically significant. But a difference can be large without being statistically significant (often the case when we have only a small sample of information about performance), or statistically significant but very small (in which case we are pretty sure that the difference is real, but it’s just not very important). (Yes, statistical significance does matter!)
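To make the distinction concrete, here is a minimal sketch in Python of a pooled two-proportion z-test, one standard way to check whether a gap between two percentages could plausibly be due to chance. The function name and all of the inputs are hypothetical, chosen only to illustrate the two cases above: a big gap measured on very few respondents, and a tiny gap measured on a huge sample.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (difference, z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value


# Hypothetical: a 20-point gap, but only 10 respondents per school.
gap_big, z_big, p_big = two_proportion_z(0.60, 10, 0.40, 10)
# Hypothetical: a 2-point gap, but 20,000 respondents per group.
gap_small, z_small, p_small = two_proportion_z(0.52, 20_000, 0.50, 20_000)

print(f"large gap {gap_big:.2f}: z = {z_big:.2f}, p = {p_big:.3f}")
print(f"small gap {gap_small:.2f}: z = {z_small:.2f}, p = {p_small:.5f}")
```

With these made-up numbers, the 20-point gap on 10 respondents per school yields a p-value of about 0.37, nowhere near conventional significance, while the 2-point gap on 20,000 respondents per group is highly significant (p well below 0.001) but arguably too small to matter.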
This is kind of abstract, so here’s an example, drawn from the NYC Department of Education’s Survey Access tool, which reports the results of the system’s first round of Learning Environment Surveys in the spring of 2007. The Department’s spiffy PowerPoint presentation imagines the principal and a group of teachers in (mythical) IS 402 identifying teacher engagement as an issue. In particular, teachers in this school generally disagreed that “Obtaining information from parents about student learning needs is a priority at my school.” Using the Survey Access tool, it’s possible to identify 12 similar NYC schools (i.e., middle schools with an enrollment over 700 and at least 25% ELL students), seven of which have more positive scores on this question. In the top school, the Eleanor Roosevelt School, 71% of the teachers strongly agreed or agreed with the statement, whereas in the bottom school, 13% of the teachers strongly agreed or agreed. (In mythical IS 402, 36% of the 31 teachers who responded to the survey strongly agreed or agreed.)
So why not just look at the seven schools above IS 402? Because the percentage of teachers strongly agreeing or agreeing is only an estimate of the true percentage that would be observed if all teachers in the school responded to the survey. (In these 12 schools the teacher response rate ranged from 26% to 53%; in mythical IS 402, 40% of the teachers responded.) Our interest is in the population of teachers in the school, not just the sample that chose to respond, and there’s a degree of uncertainty in these estimates. If, just by chance, a different group of 31 teachers in IS 402 had responded, we might well have obtained an estimate other than 36%. In fact, with 31 teachers responding and a sample estimate of 36%, the percentage of all of the teachers in IS 402 agreeing or strongly agreeing could plausibly range from 23% to 49%. (There’s a finite population correction in there, for those who care about such things.) That’s a pretty big range, and the ranges of plausible values are similarly wide for the other dozen schools.
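For those who do care about such things, here is a sketch of how an interval like that can be computed. It assumes a 95% normal-approximation interval for a proportion with a finite population correction, and it assumes IS 402 has about 78 teachers, inferred from 31 respondents at a 40% response rate. Neither detail is spelled out above, but under those assumptions the sketch reproduces the 23% to 49% range.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci_fpc(p_hat, n, population, confidence=0.95):
    """Normal-approximation confidence interval for a proportion.

    The finite population correction shrinks the standard error when the
    respondents make up a sizable share of a small population.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)
    fpc = sqrt((population - n) / (population - 1))
    margin = z * se * fpc
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# IS 402: 31 respondents, 36% agreeing or strongly agreeing.
# Assumption: about 78 teachers in total (31 respondents / 0.40 response rate).
low, high = proportion_ci_fpc(0.36, 31, 78)
print(f"{low:.0%} to {high:.0%}")  # prints "23% to 49%"
```

Setting the correction aside (that is, treating the staff as if it were an infinite population) widens the interval to roughly 19% to 53%, which is why the correction matters when a sizable share of a small staff responds.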
Of the seven schools above IS 402, just one, the Eleanor Roosevelt School, is really head-and-shoulders above it in a statistical sense. The other six are statistically indistinguishable from IS 402, because the intervals in which the true percentage of all teachers strongly agreeing or agreeing is likely to lie overlap so much.
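The overlap screen described here can be sketched the same way: a peer school counts as clearly ahead only if the bottom of its interval sits above the top of IS 402’s. The peer figures below (percentages, respondent counts, staff sizes) are entirely hypothetical; only the IS 402 inputs come from the example, and its staff size of 78 is again an assumption inferred from the response rate.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci_fpc(p_hat, n, population, confidence=0.95):
    """Normal-approximation interval for a proportion, with finite population correction."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    fpc = sqrt((population - n) / (population - 1))
    margin = z * se * fpc
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


def clearly_above(peer, target, confidence=0.95):
    """True if the peer's whole interval lies above the target's interval (no overlap).

    Each school is a tuple (share agreeing, respondents, total teachers).
    """
    peer_low, _ = proportion_ci_fpc(*peer, confidence=confidence)
    _, target_high = proportion_ci_fpc(*target, confidence=confidence)
    return peer_low > target_high


is_402 = (0.36, 31, 78)   # from the example; staff size inferred (assumption)
peer_a = (0.71, 35, 90)   # hypothetical peer school
peer_b = (0.45, 28, 80)   # hypothetical peer school

for name, peer in [("peer_a", peer_a), ("peer_b", peer_b)]:
    print(name, clearly_above(peer, is_402))
```

Requiring no overlap at all is a conservative rule: two intervals can overlap a little even when a direct test of the difference (such as a two-proportion z-test) would call it significant, so a school that narrowly fails this screen may deserve a second look.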
Would the principal and teachers in IS 402 learn something from asking the staff in these seven other schools how they do things? Sure! It doesn’t hurt to think about new ways of doing business. Will doing so raise performance in IS 402? Probably not, because an assessment of statistical significance suggests that, with the exception of Eleanor Roosevelt, these other schools really aren’t doing better, so there’s no reason to think that adopting their practices will yield genuine improvements.
Data-driven decision makers, beware of spurious comparisons.