If only everyone would stop using the word "achievement" as a synonym for scores on tests. It's a sleight of hand that justifies so much that's gone wrong. Meanwhile, we've discounted the work of real live children as "soft" data.
Having a "normal" temperature may be an indicator of health, but beware when we start treating it as the definition of health. We wouldn't be so stupid, would we? A high score on a multiple-choice driving test means something different from passing a road test in an actual car, so if we value safety we prefer the latter. Do we value intellectual achievement less?
Standardized multiple-choice tests can probably measure some things somewhat accurately if they are designed simply to tell us how many terms, names, places, and dates from a particular list we can identify correctly. That's the driver's written test. For that we don't need complex psychometric tools, just short-answer tests. Perhaps, Diane, this is what Hirsch has in mind?
On the tests we now use, each question has one right answer and four wrong ones. So the precise wording of each question, and of all five possible answers, makes a big difference. No matter how many real teachers or academic experts were involved in setting the "standard" or the benchmark, only the test-makers write the wording, try the items out on a sample population, revise them, and decide the rules that tell us whether they "work." Those rules are designed to differentiate among test-takers on a statistically sound and "credible" basis. (That may explain why Jay Rosner of the Princeton Review found that, in the pool of pretested questions from which final items are selected, those that African-Americans answered correctly more often than whites were rarely if ever used.)
Even after all this fiddling, test-makers know that there is substantial built-in error for individuals and small groups (like a single classroom). They know that scores will vary by chance between different forms of the same test. Chance here includes having happened to study, or not study, the items covering one subfield rather than another; accidentally filling in the wrong box; misreading a word; or just a run of good or bad luck. On normed tests ("the gold standard" in testing), grade level means the midpoint, and chance error alone can shift a score by plus or minus six months. (These resemble the problems we have with much public polling: we can't tell whether people oppose health reform because it is too "socialistic" or the opposite.)
Short-answer tests could be more reliable but are harder to score "objectively." Essays, as many scorers have warned us, are even messier. Yes, we could do better on essays if we were willing to invest substantially more in them, and if we trusted professional judgment rather than artificial formulas such as the number of words in a sentence, the use of paragraphing, proper opening and closing paragraphs, and so on. The same is true of reading "levels." The opening sentence of "Pride and Prejudice," "It is a truth universally acknowledged, that a single man ...," is composed of words at a fifth-grade level, but only a sophisticated reader of literature is likely to recognize it as satire.
We schoolteachers then invent formulas to help students score well, e.g., for selecting "the main idea" or the "best title" for a short reading passage. (No actual publishing house, mind you, would ever use the "right" titles.) As a result, we agree to steer students toward learning "test-like" tricks, and the higher the stakes, the more we conform.
As a teacher, I was intrigued by the outliers--scores that seemed surprisingly high or low for particular students. I could learn something useful by going over the test with such students. But I couldn't catch an outlier if I didn't already know the students.
When my son's third-grade teacher told me he needed a remedial reading class, I knew she needed a remedial teaching class. She had never once read with him. He was a sophisticated, fluent reader who had his own odd theories about how best to answer test questions.
Teachers should have higher expectations? Probably, but when we focus on testing we offer lower expectations, above all to the least advantaged kids. That's what I discovered in my two years of subbing in Chicago K-8 schools nearly 50 years ago, pre-Reform. The children's very lively intelligence went untapped after first grade. Teachers were programmed even back then, with a few exceptions in every school: teachers who closed their doors and ignored the lesson plans. No, it's only secretly rebellious teachers who have ever done right by our least advantaged kids. That's why, Diane, I disagree with your title ("the main idea"). It was never a "great system" for most of those who bravely stepped into their first-grade classroom. Teachers were never respected.
That's what all our fancy new Reforms aren't tackling. How can we make schools places where teachers, parents, and kids engage in serious intellectual challenges, respectful of their own histories and inclinations, buttressed by the vast knowledge and know-how of many others, past and present? And how can they gain the confidence, perseverance, and curiosity to push beyond their boundaries? That's what drives some kids to spend hours throwing basket after basket, others to practice the clarinet long after their parents might like them to stop, and on and on. Even David Brooks agrees that social trust is the crux ("The Sandra Bullock Trade," The New York Times, March 30, 2010).
It's doable. Yes, I actually know that from experience at the schools where I've worked. We also had much to learn from our own failures. But we can keep improving only if we want schools to be lively places where adults, like students, learn from their experience, exercising individual and collective judgment. Schools can become powerful learning sites for young and old alike, encouraging boundless curiosity about many things and expertise in some.
Note, of course, that such a description hardly fits the generic goal: "college or work ready." Ugh.