Thanks for “nailing” Nicholas Kristof. Another very well-meaning ally. With friends like Kristof we...
Kristof ought to read Rothstein et al. more carefully on the complexity of the relationship between the economy and schooling. The lapse between schooling data and economic data is one source of error. Confusing correlation with causation is another. Besides, if all the people in the world became well-educated, would that be tragic?
What data is Kristof referring to that shows that teachers with a better education and who stay in teaching longer don’t “teach” better? What does “teach better” mean? Suppose it turned out that college degrees don’t raise productivity in engineering, medicine, law, Wall Street, or politics?
There’s been a lot of talk on the business pages of our media about the “data problem.” It ought to give the “data-driven” school reformers pause to reconsider. Maybe we are just creating a bubble that too will burst if we continue to base our actions on the belief that scores on standardized instruments are evidence of success.
Even the technical meaning of a “good test” is open to dispute. Margo (a frequent commenter on our blog) states that tests are at least more “reliable” than professional judgment. How can she tell?
I want a nation of citizens who are less inclined to think that the “truth” can be captured in one of four feasible answers: a, b, c, or d. I mention “feasible” because in constructing such tests it is crucial not to have one “right” and three absurd alternatives. They are designed to produce differentiated responses. There’s a peculiar science/logic to this arrangement. On both IQ/ability and traditional achievement tests we’re promised ahead of time a population that fits a normal curve. We’ve replaced these in K-12 schools with judgments about benchmarks, which must still rest on a numerical rank order based on a, b, c, d. The big new invention is that there is often no technical back-up for the validity or reliability of such exams. Many big-name psychometricians shun them.
All “reliability” tells us is that the student would get a similar score on a similar test if given at another time or place. The “garbage in,” “garbage out” dilemma. All scores on old or new tests also have a substantial measurement error. As with Wall Street's numbers, we have no independent basis for relying on the scores; validity is in the eye of the beholder (human judgment). Since the scores correlate with family income, wealth, parental education, and race and are gatekeepers to prestigious institutions and jobs, it’s a circular game.
When parents told me that their child seemed to read well, but scored poorly, they often believed the indirect evidence (test score) and not the direct evidence (listening to their child read). Parents had been trained to distrust judgment and rely on “real evidence.” That’s how I became a testing maven: my own 8-year-old son “failed” a 3rd grade reading test even though I “knew” he could read fluently.
We need schools that “train” our judgment, that help us become adults who are in the habit of bringing judgment to bear on complex phenomena. This includes judging which expertise to “trust,” and defending such choices; it includes being open-minded about one’s judgments, as well as one’s favorite experts. It includes acknowledging that even experts must live with a substantial degree of uncertainty.
"I think we are lying to children and families when we tell children that they are meeting standards and, in fact, they are woefully unprepared to be successful in high school and have almost no chance of going to a good university and being successful." —Arne Duncan, U.S. News and World Report
Duncan seems more comfortable lying with statistics. What, after all, is his definition of a “good university” but one that’s hard to get into, thus consigning most people to failure? Similarly, what’s his definition of “success”? Doing “better than average”? Thus consigning most of us to failure. I know too many successful adults who don’t meet Duncan’s definition to call such teachers liars.
Some folks (Diane?) think that my skepticism about tests should make me a fan of a single national test. All of these dilemmas are even worse, after all, when we are dependent on data collected from states using 50 different tools, each reporting scores in different ways, and on and on. I think I’d be a sympathizer if we could go back to the old NAEP tests, developed in the 1960s, which tried to use sampling in the interest of collecting better data (standardized prompts and open-ended tasks) that opened the door to more authentic responses. Such an instrument could provide some common data out of which we might develop uncommon responses and interpretations. But NAEP went from a promising beginning to being another standardized test. In our eagerness for simpler data, when only complex data will do, we lost a useful research instrument.
We turn classroom teaching into a “test-like” setting. When we script teaching and pre-code children’s responses we have simply another form of standardized testing. I see it daily: when teachers tell children to put on “their thinking caps.” The kids shift into that special “school-mode” of so-called thinking: trying to guess what answer the teacher wants to hear.
It’s not what was needed in the 19th century, or the 21st.
P.S. Take a look at Imagining Possibilities on just this subject.