It may be an elegantly executed study, or it may be a terrible study. The trouble is that based on the embargoed version released to the press, which many of today's news articles will draw on, it's impossible to tell. There is a technical appendix, but that wasn't provided up front to the press with the glossy embargoed study. Though the embargo has been lifted now and the report is publicly available, the technical appendix is not.
Once the study's main findings have been widely disseminated, some sucker with expertise in regression discontinuity may find a mistake while combing through that appendix, one that could alter the results of the study. But the news cycle will have moved on by then. Good luck interesting a reporter in that story. And even when researchers working in the policy advocacy industry make sloppy, indefensible errors - for example, when Greene and Winters used data that the Bureau of Labor Statistics warned against using to show that teachers are overpaid - they're not approached with caution by the press when the next report rolls around.
So as much as I like to kvetch about peer review and the pain and suffering it inflicts, it makes educational research better. It catches many problems and errors before studies go prime time, even if it doesn't always work perfectly.
As for the Winters, Greene, and Trivitt study, the jury is still out - as it should be until we have more information. I'll get back to you once I've read the technical appendix.
Let me first apologize to Jay Greene and my readers for shooting off a short post before teasing out all of the complexities around thinktanks, research, and the reporting of research in the popular media. I used Greene's paper as a vehicle for doing so, and that may have made it appear that I was criticizing the quality of that study when I was not in a position to do so. I shouldn’t have raised questions, even hypothetical ones, about the methods in that paper until the technical report was available for review, and you should definitely read Greene's response here.
This issue, however, is much larger than this particular Manhattan Institute report, and I want to use Greene's critique - that I have posted on working papers from the National Bureau of Economic Research (NBER) - to point out some important differences between papers issued by outlets like NBER and thinktank reports:
1) With NBER papers, everything is on the table up front. They are scholarly papers that include extensive methods sections and robustness checks in every paper. Greene writes that, “If [reporters] requested the technical report, they could get that.” But the press release makes no mention of a technical report at all. The key difference is that it takes an extra step to get to the detailed methods, which reporters could ostensibly circulate to other scholars for comment before writing their articles.
2) There is no PR machine behind NBER papers. It’s one thing for me to write about a study on my blog. It’s entirely another to send press releases to reporters at newspapers and other media outlets, who in turn – and this is their fault, not Greene’s – cover his report like it’s a final product. The more complex the methods are, the more there is a need for peer review because it becomes more difficult to eyeball the problems from the sidelines – and Greene and his colleagues are using sophisticated methods in this report.
3) NBER papers generally aren’t trying to persuade anybody in particular of anything. They are not intended to sway public policy. Contrast this with the press release approach of policy advocacy thinktanks. For example, the press release for this study said, “In this report, Winters, Greene, and Trivitt dispel the myth that high stakes testing in reading and math will harm student proficiency in low-stakes subjects. The data from Florida provides further evidence for policy makers considering the renewal of No Child Left Behind, showing that national testing incentives improve overall educational achievement levels.”
4) NBER has implicit quality controls. It is a community of scholars to which one must be invited, one that has strong norms about how research is conducted and reported. The quality of the average NBER working paper is extremely high. There is much less variation in the quality of NBER papers than there is in thinktank reports. On some level, this is an issue of the trustworthiness of institutions; for example, I trust a report coming out of RAND or Mathematica more than I do one coming out of the Heritage Foundation, because neither RAND nor Mathematica has a stated ideological agenda.
For the best treatment of the thinktank issue I’ve ever seen, see this post by Dean Millot, and his preceding posts here and here.