The Golden Mean
This post is by Joan L. Herman, co-director emeritus of the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) at the University of California, Los Angeles.
Last week marked an important milestone in both assessment consortia's progress in developing new accountability assessments of the Common Core State Standards. The Partnership for Assessment of Readiness for College and Careers (PARCC) field test will involve more than one million students in 14 states and the District of Columbia, while the Smarter Balanced Assessment Consortium's test will involve three million students in more than 20,000 schools. Already, in the first three days, each consortium had assessed hundreds of thousands of students.
We all know that the purpose of these field tests is to assure the quality of the test items, tasks, and technology delivery systems--not to evaluate students, teachers, or schools. No score reports, in fact, will be available. Nonetheless, the experience is likely to raise some anxieties.
Some technology glitches are inevitable--after all, working these out is one purpose of the test--but still, these will occasion naysayers claiming that schools aren't and can't be ready to move into the 21st century. Others will look at the items (sample or otherwise), observe the difficulties students are experiencing, and say "my kids can't do this"--and want to pull back. Some policymakers will look at the costs and say "it's more than we spend now, we shouldn't do it--and why would we if it's going to make us look bad?" They too will want to pull back.
Still others--and I worry about some of my deeper learning colleagues here--are going to look at the items and tasks and say "Is this all there is? The assessment doesn't go far enough for real deeper learning." And they too may join the chorus of naysayers or, if not, stay quiet in the debate. I'd like to urge that we not do that. PARCC and Smarter Balanced may not go as far as we would have liked, but given the constraints they are working under, they are likely to produce a big step forward. And the immediate alternative very likely would be a return to low-level testing.
Consider where most state tests currently are in assessing deeper learning. Recent studies by RAND and by Norman Webb and colleagues provide a telling portrait. In both cases, the studies used Webb's 1-4 Depth of Knowledge (DOK) index to conduct item-by-item reviews of test items and tasks, where:
- DOK1: Recall
- DOK2: Simple Application (some mental processing)
- DOK3: Reasoning, inference
- DOK4: Extended planning and investigation
RAND's study examined released items and tests from the 16 states that reputedly best addressed complex thinking and problem solving, while Webb and colleagues studied the alignment of six high school tests and the Common Core. In mathematics, the great preponderance of items were classified at DOK1 or 2, even including the constructed-response items. Similarly, in reading, the great majority of selected-response items were at the two lower levels. Even for the constructed-response tasks, only about a third were at DOK3 and less than 10 percent were at DOK4. Clearly--to me at least--this is insufficient to signal either deeper learning goals or the knowledge, skills, and capabilities that students will need for college and career success.
In contrast, based on current plans, both PARCC and Smarter Balanced will emphasize items and tasks at and above DOK2, and at least a third of a student's total possible score will be dependent on items and tasks at DOK3 and DOK4. Based on the sample tasks that both consortia have provided, their performance task components will address DOK4.
For me, the performance task component is the "can't live without." Although the performance task components may be "on demand" and may not reflect the full range of deeper learning--or even the full expectations of the Common Core--they do point us in the right direction and I am hopeful that they will be sensitive to instruction that focuses on deeper learning. Unfortunately, the performance task component also is the expensive component, and states looking to do assessment on the cheap--in terms of testing time and/or cost--will want to do without it. With the current accountability regimen, research on the effects of testing tells us that that would be a penny-wise and pound-foolish decision. We'd be back to a test-driven, lower-level assessment.
I am optimistic that PARCC and Smarter Balanced will lead us toward assessments that are sensitive to rich curriculum and instruction that engage students in deeper learning. I believe that such programs necessarily have to help students develop the specific knowledge and skills articulated in the Common Core, even as they help students reach the deeper learning vision: students who are independent and adept at critical thinking, problem solving, communication, and collaboration, and who have the motivation and persistence to achieve college and career success. My hypothesis is that students, teachers, and schools who are engaged in rigorous and effective project-based and other programs of deeper learning also will do well on the consortia assessments. It's not an either-or proposition, but students need to be able to show their expertise in both. (The reverse is not necessarily true: doing well on on-demand and performance tasks does not necessarily predict that students will excel in deeper learning domains.) As PARCC and Smarter Balanced are field testing their plans, I think we ought to be developing plans to test this critical hypothesis--care to join me?