Is Robo-Grading Driving the Design of Common Core Tests?
As we read of the 17 states across the country where a strong backlash has grown against Common Core, my own state of California is conspicuous by its absence. California has yet to see a strong reaction against the Common Core, for several reasons. The State Superintendent of Education, Tom Torlakson, has taken the slow approach recommended by leaders like Randi Weingarten. As a result, California students will take only trial tests from the Smarter Balanced Assessment Consortium (SBAC) for the first time this coming spring, and they will not be used for accountability purposes. And the state will not ask students to take the old STAR tests, aligned to the old state standards. The California Teachers Association has thrown its support behind Common Core implementation, and the state is spending $1.25 billion to prepare for the transition.
Lots of open questions remain, however. There has been a strong emphasis on the new standards being "more rigorous." In other states, that has translated into difficult tests, and a sharp drop in the number of students considered proficient. English learners, who are numerous in California, have done very poorly on the Common Core tests used thus far in other states. So the first question is whether the SBAC tests will yield similar drops in proficiency.
Another big question has to do with the qualities of the tests, and how we are measuring skills. One of the selling points of this shift has been that these tests will be better able to measure critical thinking, less prone to narrow the curriculum to test preparation. So California teachers are using the extra time we have to take a closer look at the questions on these tests.
(source: Sample Items | Smarter Balanced Assessment Consortium)
Alice Mercer explains her reaction.
The item above is a sample item from the SBAC (Smarter Balanced Assessment Consortium) website, and it's supposed to assess the writing of an argument (what used to be called persuasive writing).
I'm sharing this because I administered a similar writing task at my site this week. Students were given a sheet of facts and statistics, and a sheet of arguments for the two positions they had to choose from, and were asked to write a five-paragraph essay.
When we were previewing the tasks, it seemed questionable to give students a list of "arguments," since this makes the task much less cognitively demanding and requires no analysis on the part of students. When you look at this type of task, you have to wonder why it was designed this way, and how anyone can insist that these tasks sit at a "higher" cognitive level and are more demanding than past writing prompts.
Of course, in the past, writing was only assessed at the state level in California in 4th and 7th grades, and now we're including it in tests every year. I know, I know: teachers in classrooms, and many school sites, were already doing summative writing assessments, but there's a difference. Common Core calls for wholesale standardized assessment of writing. That has drawbacks: it's time-consuming, and it's expensive. But not if you figure out a way to machine-score essays and writing, just as current multiple-choice tests are scored. That would explain much about why the standards were structured the way they were, and why some of the tasks we're seeing look the way they do. It also ensures that what we're asking students to do will be so constrained by the parameters of the testing environment that it will be useless for judging authentic student learning.
I'm not the only one who thinks that the assessment has determined the standards. Here is an excerpt from Tom Hoffman:
But... what's the deal with that fourth grade standard, particularly (my emphasis)
Determine a theme of a story, drama, or poem from details in the text...
I read these things and even now they seem perfectly reasonable at first, but then they worm around in my brain for a day and a half or so and stop making sense. Why would you tell a fourth grader to determine the theme from details? ... I'm sure it is just the testing guys throwing in the "details in the text" so they could write the kinds of questions they had in mind, e.g.:
Part 1: The theme is a) Love; b) Death; c) Both
Part 2: Which of the following details support your answer...
(end of excerpt from Tom Hoffman)
There has been talk about computer grading of these assessments, including writing, since they were first being developed. This piece that appeared in the NY Times a while back discusses some of the limitations of robo-readers, and Tom Hoffman gives a delightful example of the shortcomings here: Computer Scoring Open Ended History Questions - Tuttle SVC. Basically, the programs can judge grammar and usage errors (although I suspect this will lead to a very stilted form of writing that only a computer could love), but they are in no position to judge the facts, assertions, or content of an essay. The only way around that is to limit which "facts" students can use by handing them a list. It would also explain the love for a certain type of close-reading instruction in Common Core, and the hostility to background knowledge and information, that Paul Horton writes about in this piece: Common Core and the Gettysburg Address - Living in Dialogue - Education Week Teacher.
Even if I am wrong that the standards were written to accommodate testing, and more specifically machine scoring of writing, these are still lousy tasks: low-level, not "rigorous," and not cognitively demanding.
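The limitation Alice describes, that a program can judge surface mechanics but not content, is easy to illustrate. The scorer below is a toy sketch of my own, not any vendor's actual algorithm: it rewards essay length, vocabulary variety, and sentence fluency, all purely surface features. Because none of those features model meaning, a factually wrong essay earns exactly the same score as a correct one.

```python
import re

def surface_score(essay: str) -> float:
    """Toy scorer for illustration only. Rates an essay on surface
    features (length, vocabulary variety, sentence length) with no
    model of meaning or facts. Returns a score on a 6-point rubric."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return 0.0
    length_pts = min(len(words) / 50, 1.0)                     # reward length, capped
    variety_pts = min(len(set(words)) / len(words) * 2, 1.0)   # type/token ratio
    avg_sentence_len = len(words) / len(sentences)
    fluency_pts = 1.0 if 8 <= avg_sentence_len <= 25 else 0.5  # "fluent" sentence length
    return round((length_pts + variety_pts + fluency_pts) / 3 * 6, 1)

true_essay = ("The Gettysburg Address was delivered by Abraham Lincoln in 1863. "
              "It honored soldiers who died at Gettysburg during the Civil War.")
false_essay = ("The Gettysburg Address was delivered by George Washington in 1912. "
               "It honored soldiers who died at Gettysburg during the Revolutionary War.")

# The scorer cannot tell that the second essay is factually wrong:
# both essays earn identical scores.
print(surface_score(true_essay), surface_score(false_essay))
```

Real automated scorers are more sophisticated than this, but the structural problem is the same: as long as the features are statistical proxies for quality rather than judgments of content, the only way to keep essays "gradeable" is to constrain what students can say.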
I have some concerns to add to the ones Alice mentions.
When we, as teachers, write tests we will grade ourselves, we can offer open-ended questions, because we are able to understand a wide range of responses. Therefore, these tests can allow and even encourage creativity and critical thinking. But when tests are standardized, and even more when they are designed to be scored by computer, they are training our children to think like the computer and predict what the computer wants them to write. The tests are designed to be scored by computer because this is by far the least expensive means available -- and it will yield uniform results. No human errors possible -- unless the human error is in trusting machines to make such judgments in the first place.
Tests have become so consequential that every one of them is in essence a training exercise. We are training our children to respond as the test-maker wishes. And as our evaluations begin to be based in part on their scores, we are strongly motivated to train them in this manner.
This reveals one of our basic fears as educators and parents about the Common Core and associated tests. The project is an attempt to align and standardize instruction and assessment on an unprecedented scale. The future, according to the technocrats who have designed these systems, involves computer-based curriculum and tests, and frequent checks, via computer, on student performance. And as this report in EdWeek indicates, there is a great deal of money to be made. Los Angeles Unified has already spent a billion dollars on iPads, and one of the chief justifications was to prepare for computer-based assessments such as these.
Here we are beginning to see the ways in which grading technology may be shaping the tests, and the very way we ask students to show how they are applying the skills they have learned. If this is the "Smarter" test, it seems far less intelligent than a qualified teacher, capable of challenging students with an open-ended question. And if we are sacrificing intelligence, creativity and critical thinking for the sake of the efficiency and standardization provided by a computer, this seems a very poor trade.
What do you think of the observations made by Alice Mercer? Have these tests been designed so computers can grade them? Is this "smarter"?
Continue the dialogue with Anthony on Twitter.