Field testing for the Common Core-aligned PARCC and Smarter Balanced assessments is now underway, and dozens of states are preparing to make consequential decisions based on the results come next spring. I keep getting excited e-mails about all this. Me? I'm frustrated (and a little astonished) that, four years after the creation of the testing consortia, I still can't get meaningful answers to some practical questions about how all this is going to play out. When I bring these up, I mostly get accused of nitpicking.
Am I? You be the judge. After all, we're now talking about results that will be used to make decisions about job security, pay, and which programs are effective. Here are three questions that I can't seem to get answered:
How will we compare the results of students who take the assessment using a variety of different devices? There will be variability in screen sizes, keyboards, and potentially in the visual display. Some students will be using certain kinds of devices for the first time. And many states will be administering tests to some number of students using paper and pencil in 2015, and likely beyond. What do we know about how to account for all this variation in order to produce valid, reliable results?
How will PARCC and SBAC account for vastly different testing conditions? While there are always questions about consistency of testing conditions, these get super-sized when the stakes climb and variation is non-random. Limited access to the required devices means that all the usual questions get accentuated. Depending on testing infrastructure, some schools will be able to assess students in their regular classrooms while other schools will have to shuffle students around the building, to schools across town, or to independent testing centers. How much does this matter? What do we know about how to track and then account for the impact of such factors on outcomes?
How will we account for the fact that we're apparently looking at testing windows that will stretch over four or more weeks? Students in schools that administer the test toward the end of the testing window will have had a lot more instructional time than students in schools that test at the beginning. The variation could be 10 percent of the instructional year, or more. How is this going to be tracked and accounted for when comparing teachers, schools, programs, and vendors?
The stakes attached to standardized assessment have become more significant while at the same time the push to introduce national, computer-based assessments has raised important questions about the validity and reliability of results. My point is NOT that the above questions are deal-breakers, but that they should have been addressed during the design phase--and that they need to be answered in some reasonable fashion before much weight is put on the results. As I noted back in February 2011, after helping to convene a gathering of state officials and assessment types, "It'll be challenging to provide the technology infrastructure, school facilities, or computing devices needed to allow states to test all students in a small window...It struck me that no one in attendance had much thought about how this kind of design would compromise current efforts to use assessment results for accountability or teacher evaluation, or about how this would sow legitimate doubts among teachers and parents regarding fairness in a high-stakes environment." Now, more than three years later, nothing's changed.
The National Assessment of Educational Progress is credible in part because it's careful about this stuff. When students take NAEP, they do so under identical, controlled testing conditions. My concern is that, with the Common Core assessments, we're on the cusp of a lose-lose proposition. One possibility is that these kinds of issues won't really get attention until things start to go live next year, at which point the tests, and much that depends on them, will suffer an enormous reversal. Another is that we'll wind up putting a lot of weight on results of dubious value. Setting aside all the larger debates about the Common Core, a failure to provide reasonable, reassuring answers to these questions is cause enough to question whether we ought to attach consequences to PARCC or SBAC results in 2015 (and maybe beyond).