Smarter Balanced Test Scores Are Mostly Flat. Is That a Problem?
Most states that administered tests from the Smarter Balanced Assessment Consortium in 2016-17 found that achievement rates barely budged from the previous year. Now a battle has broken out over how to interpret those findings.
In an opinion piece for Real Clear Education, former California testing whiz Douglas McRae and Williamson Evers, a former U.S. Department of Education appointee under George W. Bush, argued that, on average, scores improved in PARCC states but didn't improve in the states using Smarter Balanced. They claimed that it's unlikely this happened by chance, and concluded that something might be wrong with the SBAC exam. (PARCC stands for the Partnership for Assessment of Readiness for College and Careers. It's SBAC's main competitor.)
SBAC officials, as you can probably guess, say that's just not so. They made their case in a commentary written in response for The 74.
Smarter Balanced and PARCC may have fewer states using their tests than they did a few years ago, but these test scores are still vitally important, since states base public school progress reports on them. In a significant number of states, they underpin a lot of policy—including plans under the federal Every Student Succeeds Act. And test construction mishaps, while uncommon, aren't unheard of. (Just ask the District of Columbia.)
So who's right? Well, as much as it pains me to admit, there isn't a clear answer.
My illustrious colleague and predecessor on the testing beat, Catherine Gewertz, and I teamed up to get the reaction to the memo from four testing experts (psychometricians, for you nerds out there), and their conclusion was that it's nearly impossible to say without diving into the (very) wonky technicalities of test construction. Among other things, they said, such an investigation would require an examination of the pool of test questions, and probably a look at the technical reports from last year's administration of the exam.
SBAC folks are still working on a technical analysis of the 2016-17 results.
So until we know more, the best thing to do, it seems, is to lay out the arguments—and let you, readers, guide us to what else you'd like us to look into.
Why might test scores have flattened?
McRae's and Evers' analysis relies on average gains across grade levels in SBAC states, so there's actually more variation in achievement scores than might appear at first glance. But overall, it's correct in noting that scores on SBAC tests in both English/language arts and math were generally flat in the 2016-17 school year compared with 2015-16.
There are a few ways to interpret flat scores. They could genuinely reflect a lack of growth in student learning, sometimes called the "plateau effect." The term describes how achievement on tests can flatten once students have gotten used to them, or after schools and teachers have picked the low-hanging instructional fruit and are struggling to implement new teaching methods to move kids forward.
But the argument the critics are making is a different one: that there is a technical problem with how the test measures student learning. In the memo, they suggest that the test may not be sensitive to the full range of student abilities. If you don't have enough questions to measure knowledge among students who are particularly high- or low-scoring, then you might lose the ability to capture their learning growth, they say. And SBAC's test is adaptive, adjusting to each test-taker's ability level, which means it generally needs more items than a "fixed-form" test, where most kids take the same set of questions.
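To make the critics' concern concrete, here is a minimal sketch of why a thin item pool at the extremes can blur measurement on an adaptive test. It assumes a simple Rasch (one-parameter) model, in which an item is most informative for students whose ability matches its difficulty, and a simplified selection rule that picks the most informative items for a fixed ability level. The item banks, the student's ability of 2.5, and the 10-item test length are all hypothetical illustrations, not SBAC's actual item pool or selection algorithm.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch model: chance a student of ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: p * (1 - p), largest when b == theta."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def pick_items(theta: float, bank: list[float], n: int) -> list[float]:
    """Choose the n most informative items for this ability level.
    Simplification: theta is held fixed instead of being re-estimated after each answer."""
    return sorted(bank, key=lambda b: item_information(theta, b), reverse=True)[:n]

def standard_error(theta: float, items: list[float]) -> float:
    """Approximate measurement error of the ability estimate: 1 / sqrt(total test information)."""
    total = sum(item_information(theta, b) for b in items)
    return 1.0 / math.sqrt(total)

# Hypothetical item banks (difficulties on the same scale as ability); not real SBAC data.
full_bank = [x / 4 for x in range(-12, 13)]          # -3.0 to 3.0 in steps of 0.25
thin_top_bank = [b for b in full_bank if b <= 1.0]    # same bank with no items harder than 1.0

theta = 2.5  # a hypothetical high-achieving student
for name, bank in [("full bank", full_bank), ("thin top", thin_top_bank)]:
    items = pick_items(theta, bank, n=10)
    print(f"{name}: SE = {standard_error(theta, items):.2f}")
```

With these made-up numbers, the standard error for the high-achieving student rises from about 0.69 with the full bank to about 1.17 when the hard items are missing. A larger error means a real year-over-year gain is harder to tell apart from noise, which is the mechanism the critics have in mind. Whether SBAC's bank actually has such gaps is the kind of question the psychometricians said would require examining the item pool itself.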
How is SBAC responding?
SBAC folks say they take the concerns seriously and are testing a variety of theories on the flat results.
But they reject the hypothesis that the flat scores are a function of the number of test items or the test's adaptive nature. Why? Because the consortium actually added test questions to Smarter Balanced's item bank in the 2016-17 school year, including some on the easier end of the scale, they said.
There is some evidence to back this up. A report on SBAC's website shows that the consortium added about a third more items to the bank, and, according to that report, simulations showed that the new items were easier. An appendix to the report also shows the range of item difficulty for the test at each grade.
SBAC Executive Director Tony Alpert said that one of the things the group is looking at is whether individual test items are performing as they did in field testing—whether they are really as "easy" or "hard" as results from those pilots indicated.
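The check Alpert describes, comparing how items behave in live testing with how they behaved in field testing, can be sketched roughly as follows. This is a hypothetical illustration, not SBAC's procedure: the item IDs, difficulty values, and the 0.5 flagging threshold are invented, and the sketch assumes both sets of difficulty estimates have already been placed on a common scale, which real analyses must establish first.

```python
from dataclasses import dataclass

@dataclass
class ItemStats:
    item_id: str                   # hypothetical item label
    field_test_difficulty: float   # difficulty estimated from field-test (pilot) data
    operational_difficulty: float  # difficulty re-estimated from the live administration

def flag_drifting_items(items: list[ItemStats], threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (item_id, shift) for items whose difficulty moved more than `threshold`.
    A positive shift means the item played harder than the field test suggested;
    a negative shift means it played easier."""
    flagged = []
    for it in items:
        shift = it.operational_difficulty - it.field_test_difficulty
        if abs(shift) > threshold:
            flagged.append((it.item_id, round(shift, 2)))
    return flagged

# Invented example data for illustration only.
example = [
    ItemStats("item-A", -1.20, -1.10),  # barely moved
    ItemStats("item-B", 0.30, 1.05),    # harder in live testing
    ItemStats("item-C", 1.50, 0.80),    # easier in live testing
]
print(flag_drifting_items(example))
```

If many items turned out to be noticeably harder or easier in live testing than the pilot data indicated, scores computed from the pilot estimates could misstate what students know, which is why this is one of the theories worth ruling in or out.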
SBAC will be releasing a report around March that should contain more technical information on test-item performance, he said. (The full technical report on the 2016-17 SBAC administration is due out in August.)
So that's where we are for now. I'll be looking forward to seeing these forthcoming reports—but as always, we'd love to hear if there's anything else you want to know about this situation. I'm sure SBAC folks will want to know, too.