One way to get leverage on this question is to consider how other fields approach the issue of accountability. Doctor and hospital accountability for cardiac surgery - also the topic of a NYT commentary today - is instructive in this regard. Borrowing heavily from previous work, let me outline how state governments have approached doctor and hospital accountability in medicine. In subsequent posts this week, I'll write about the outcomes of medical accountability systems, as well as some of their unintended consequences.
Medicine makes use of what is known as “risk adjustment” to evaluate hospitals’ performance. Since the early 1990s, states have rated hospitals performing cardiac surgery in annual report cards. The idea is essentially the same as using test scores to evaluate schools’ performance. But rather than reporting hospitals’ raw mortality rates, states “risk adjust” these numbers to take patient severity into account. The idea is that hospitals caring for sicker patients should not be penalized because their patients were sicker to begin with.
In practice, what risk adjustment means is that mortality is predicted as a function of dozens of patient characteristics. These include a laundry list of medical conditions out of the hospital’s control that could affect a patient’s outcomes: the patient’s other health conditions, demographic factors, lifestyle choices (such as smoking), and disease severity. This prediction equation yields an “expected mortality rate”: the mortality rate that would be expected given the mix of patients treated at the hospital.
While the statistical methods vary from state to state, the crux of risk adjustment is a comparison of expected and observed mortality rates. In hospitals where the observed mortality rate exceeds the expected rate, patients fared worse than they should have. The result of this comparison, the “adjusted mortality rate,” is then used to make apples-to-apples comparisons of hospital performance.
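For the quantitatively inclined, here is a minimal sketch of the observed-versus-expected comparison in Python. Everything in it is an assumption for illustration: the function names, the two hospitals, and all of the numbers are made up, and the per-patient risks are simply given rather than estimated from a real prediction equation. The adjusted rate is computed as the observed rate divided by the expected rate, multiplied by the statewide rate, which is one common convention; as noted above, the actual methods vary from state to state.

```python
def observed_rate(deaths, cases):
    """Raw mortality rate: deaths divided by number of patients treated."""
    return deaths / cases


def expected_rate(predicted_risks):
    """Expected mortality rate: the average predicted risk of death
    across the patients a hospital actually treated."""
    return sum(predicted_risks) / len(predicted_risks)


def adjusted_rate(deaths, predicted_risks, statewide_rate):
    """Assumed convention: (observed / expected) * statewide rate."""
    observed = observed_rate(deaths, len(predicted_risks))
    expected = expected_rate(predicted_risks)
    return observed / expected * statewide_rate


# Hypothetical data. Hospital A treats sicker patients (4% predicted
# risk each); Hospital B treats healthier ones (1% predicted risk each).
STATEWIDE_RATE = 0.02
hospital_a_risks = [0.04] * 100
hospital_b_risks = [0.01] * 100

# A has the higher raw mortality rate (4 deaths vs. 2 deaths per 100)...
raw_a = observed_rate(4, 100)
raw_b = observed_rate(2, 100)

# ...but once patient mix is taken into account, A performs as expected
# while B's patients died at twice the expected rate.
adj_a = adjusted_rate(4, hospital_a_risks, STATEWIDE_RATE)
adj_b = adjusted_rate(2, hospital_b_risks, STATEWIDE_RATE)

print(f"Hospital A: raw {raw_a:.3f}, adjusted {adj_a:.3f}")
print(f"Hospital B: raw {raw_b:.3f}, adjusted {adj_b:.3f}")
```

In this made-up example the ranking flips: the hospital with the worse raw number turns out to be the one performing as expected for the patients it serves.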
Accountability systems in medicine go even further to reduce the chance that a good hospital is unfairly labeled. Hospitals vary widely in size, for example, and in small hospitals a few aberrant cases can significantly distort the mortality rate. So, in addition to the adjusted mortality rate, confidence intervals are reported to illustrate the uncertainty that stems from these differences in size. Only when these confidence intervals are taken into account are performance comparisons made between hospitals.
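To see why hospital size matters, here is a rough sketch in Python. It is a simplification under stated assumptions: I use a Wilson score interval on the raw mortality rate, which is a standard textbook interval for a proportion and not necessarily what any state uses, and the hospitals, the statewide rate, and the function names are all hypothetical.

```python
import math


def wilson_interval(deaths, cases, z=1.96):
    """Approximate 95% Wilson score interval for a mortality rate.
    (An assumed method for illustration, not any state's actual one.)"""
    p = deaths / cases
    denom = 1 + z * z / cases
    center = (p + z * z / (2 * cases)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / cases + z * z / (4 * cases * cases)
    ) / denom
    return center - half_width, center + half_width


def differs_from_statewide(deaths, cases, statewide_rate):
    """Flag a hospital only if its interval excludes the statewide rate."""
    low, high = wilson_interval(deaths, cases)
    return statewide_rate < low or statewide_rate > high


# Two hypothetical hospitals with the same 4% mortality rate.
STATEWIDE_RATE = 0.02
small = (2, 50)      # 2 deaths among 50 patients
large = (40, 1000)   # 40 deaths among 1,000 patients

for name, (deaths, cases) in [("small", small), ("large", large)]:
    low, high = wilson_interval(deaths, cases)
    flagged = differs_from_statewide(deaths, cases, STATEWIDE_RATE)
    print(f"{name}: {deaths / cases:.3f} "
          f"[{low:.3f}, {high:.3f}] flagged={flagged}")
```

Both hospitals have identical point estimates, but only the large one can be distinguished from the statewide rate; the small hospital's interval is too wide to support any conclusion.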
Contrast this approach with that used by the New York City Department of Education's progress reports, where "point estimates" are used to array schools on an A-F continuum with no regard for measurement error. Readers know well that your friendly neighborhood "statistical nut" has no beef with the use of sophisticated statistical methods to compare schools. But I would just ask that we have some humility about what these methods can and cannot do. (Sidenote: The only winners when we ignore these issues are educational researchers, who can then write regression discontinuity papers using these data. Thanks for the publications, Joel and Mike!)
And it's quite eye-opening to compare the language that state and federal governments use to explain their accountability systems with the rhetoric we hear in education. Consider this statement from the Department of Health and Human Services explaining the rationale behind risk adjustment:
The characteristics that Medicare patients bring with them when they arrive at a hospital with a heart attack or heart failure are not under the control of the hospital. However, some patient characteristics may make death more likely (increase the ‘risk’ of death), no matter where the patient is treated or how good the care is. … Therefore, when mortality rates are calculated for each hospital for a 12-month period, they are adjusted based on the unique mix of patients that hospital treated.
If you replace the word "hospital" with "school" above, you can imagine the reception this statement would receive in the educational accountability debate. Soft bigotry of low expectations, and you probably kill baby seals for fun, too.
Readers, why is the educational debate so different? Full disclosure: I will shamelessly appropriate your thoughts in my dissertation, which attempts to answer this question, and also establish the effects of each of these systems on race, gender, and socioeconomic inequalities in educational and health outcomes.