A few weeks ago, I asked three questions about how confident we should be that the results of the new, quasi-national, computer-assisted Common Core tests will be valid and reliable enough to support stuff like teacher evaluation and school accountability. These are questions that I'd been publicly asking for several years with little result. I'm pleased to report that, in the last couple of days, I've received serious responses from thoughtful executives at PARCC and Smarter Balanced. Today, I'm publishing the response from SBAC's Joe Wilhoft (yesterday I published the response from PARCC's Jeff Nellhaus). As I'll discuss briefly on Thursday, the responses don't fully satisfy me--I have follow-up questions and would like a few clarifications. But my primary aim was a more transparent discussion about how the Common Core effort is supposed to play out. As I've often said, we know vastly less than we should on this score. For that reason, I want to extend my appreciation to Joe and to Jeff for their constructive, reasoned responses. Here's what Joe Wilhoft, executive director at SBAC, had to say:
By Joe Wilhoft
Your questions--along with many others--are well worth asking, and I'm providing some responses and commentary below. Before addressing your specific questions, however, I think a few general comments are in order.
At Smarter Balanced we recognize that the stakes for this project are quite high. Member states are counting on the assessments to play an integral role in their strategies to prepare all students to be college and career-ready. That's made it all the more important that our states work together to leverage their expertise in designing and implementing assessments. I believe that through our collaboration with hundreds of experts across the country, Smarter Balanced has been able to do what no single state could accomplish alone--and we are committed to getting it right.
On March 25, students in our member states began taking our Field Test. As of last Friday, we've had almost 2.4 million students complete a full mathematics and/or a full English language arts/Literacy test. Our Field Test design presents students with a fully blueprint-compliant form of an assessment, meaning schools and students can experience the full array of item types, testing time, and logistical demands.
From the beginning, Smarter Balanced has focused on creating an assessment system that provides valid, reliable, and fair information about student achievement. We rely heavily on the advice of our Technical Advisory Committee, which includes national experts in large-scale assessment design (including NAEP), computer adaptive testing, and educational measurement for diverse student populations. We have been developing a comprehensive validity framework that establishes a series of research questions to be addressed over the next several years. In short, we take the technical issues seriously and welcome the opportunity to provide more information about our work.
Now, on to your questions:
"How will we compare the results of students who take the assessment using a variety of different devices?"
As millions of students participate in this spring's Smarter Balanced Field Test, we are capturing extensive information about the performance of students using different devices--from desktop computers to tablets--and using different operating systems. Upon the conclusion of the Field Test, we'll be able to use those data to determine the extent to which these factors have a differential impact on student performance. As you point out, the distribution of device types is not random across schools, and our analysis needs to be stratified by important school-level features. I should also point out that both consortia collaborated on identifying minimum technology requirements for devices. For example, iPads and other tablets are required to have an external keyboard because we found through one-on-one trials that students had difficulty using the on-screen keyboard exclusively. In addition, we require that devices have a screen size of at least a "10-inch class." Would some have liked to use smaller, less expensive tablets to administer the assessments? Yes, but these devices would pose challenges, particularly for English language arts items where reading passages and questions are displayed side-by-side. Finally, I want you to know that we've established a device/operating system certification program that we are asking device manufacturers to use. This certification assures that the Smarter Balanced assessment items and supports will be properly displayed on the device/operating system pairing. Manufacturers are not required to use the certification, but we do make available to member states the list of device/operating system pairings that have passed certification. (There have been reports, for example, of ACT's Aspire not rendering properly on some device/operating system pairings. We have not had reports of such issues with the more than 2.5M students who have participated so far in the Smarter Balanced Field Test.)
You also asked about the comparability of online assessments with paper-and-pencil tests. Smarter Balanced will offer paper-and-pencil versions of the summative assessment during a three-year transition period as schools and districts upgrade their technology. States that have already moved to 100% online assessments--including Smarter Balanced members Delaware, Hawaii, Idaho, and Oregon--and states that have been transitioning to online testing have already faced this issue of paper/online comparability. This turns out to be fundamentally similar to many other comparability situations, such as a state adding a new item type to an existing test, or NAEP deciding to use the same reporting scale on its Reading test even though the test had been realigned to new content frameworks. In other words, though the online/paper issue does have unique features, it is not a new type of comparability question. Smarter Balanced has embedded data collection on the online/paper issue into our Field Test designs, in order to help guide the design of operational paper forms and to inform the feasibility of various linking and equating strategies.
"How will PARCC and SBAC account for vastly different testing conditions?"
Consistent testing conditions, such as clear instructions and access to embedded and external resources, are an important part of ensuring valid results, although there is not much evidence that the physical setting, such as the size of the room or the number of test takers, has much bearing on results. Currently, state accountability tests are administered in a variety of settings. As long as the number of proctors is sufficient to assure proper coverage, the important point is likely the fidelity with which test administration protocols are followed. That's why Smarter Balanced has developed detailed test administration procedures to help district and school assessment coordinators (you can download the administration manuals for the Field Test on our website). The manuals cover everything from how to download the secure Internet browser (so students cannot surf outside websites) to how to create an appropriate testing environment free of distractions.
States already have extensive experience with administering assessments, and the principles of good administration are the same whether the test is given on paper or online. In nearly all cases, we believe that students will take the Smarter Balanced assessments in their schools--either in classrooms with mobile computers (laptops and tablets) or in existing computer labs or libraries.
"How will we account for the fact that we're apparently looking at testing windows that will stretch over four or more weeks?"
States have the flexibility to administer the Smarter Balanced year-end assessments at any point during the last 12 weeks of the school year. We established this testing window to ensure that schools with limited technology could maximize the use of existing resources and to accommodate the considerable variation in state and district calendars. As schools upgrade their technology, this window may be shortened.
At a Consortium-wide meeting in September, member states discussed whether or not Smarter Balanced should set required grade-level windows within the broader 12-week window--for example, setting the first four weeks for grade 3, the second four weeks for grade 4, and so on. The benefit of this would be to standardize how many weeks into the school year each grade takes its assessment, allowing year-to-year growth estimates to be based on a common number of intervening months of instruction. After lengthy discussion, member states decided not to have a Consortium-imposed calendar, but did agree that states could require individual grade levels to be assessed in a shorter window than the overall 12-week window.