
When Tests Don't Measure Well What They Appear to be Measuring


Dear Diane,

Your Tuesday column set out good reasons for not using standardized test scores to reward teachers. Every year I sent a letter home to parents on the 10 factors that influenced their child's reading scores. I'll send it to you one day soon; it still applies.

Incidentally, some readers may not realize that the high-scoring nations, to the extent they use standardized tests at all, use a different kind from the ones you were describing. Those tests often consist of written and oral cross-examination, graded by well-qualified judges. (Readers should also realize that the international scores we read about come from low-stakes tests given to samples of students.)

In short, the varied U.S. tests whose scores we so often hear about don't measure well what they appear to be measuring. I'm not talking about the short-term versus long-term memory issue that lies behind the charming little comedy routine about the five-minute university (which tests you only on what college students remember two years later). The best rationale for national standards and tests lies precisely in the lack of equivalence among current "standardized" state tests. If standardized tests were used properly, two different reading tests for students in the seventh month of 4th grade would be largely interchangeable, unless there were some fundamental philosophical disagreement about the nature of reading. Their sole merit is that they let us compare oranges to oranges. (This is yet another good reason not to test young children who are still learning to read: at that stage, scores must reflect the method of teaching, not reading achievement.)

Psychometric design of multiple-choice items requires plausible alternative answers that pick up reasonable alternative viewpoints, rather than simple-minded right/wrong choices. It also requires test-makers to eliminate questions that don't properly discriminate. Note: "discriminate" here has a narrow psychometric meaning. (The unused passages and questions that Jay Rosner of the Princeton Review found in the pool of potential SAT items, ones that black students more often got right than white students, were items that didn't discriminate properly in this statistical sense.)
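For readers unfamiliar with the narrow psychometric sense of "discriminate," here is a minimal Python sketch of one textbook statistic, the upper-lower discrimination index: the share of top scorers who got an item right minus the share of bottom scorers who did. The function name and the 27% group size are illustrative assumptions, and this is not the procedure any particular test publisher uses.

```python
def discrimination_index(item_correct: list[bool],
                         total_scores: list[float],
                         group_fraction: float = 0.27) -> float:
    """Upper-lower discrimination index for a single test item.

    item_correct[i] is True if student i answered this item correctly;
    total_scores[i] is student i's total score on the whole test.
    Returns p(correct | top group) - p(correct | bottom group), in [-1, 1].
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("need one item response per total score")
    if not total_scores:
        raise ValueError("no students")
    n = len(total_scores)
    # Size of each comparison group (at least one student). The 27% default
    # is a common textbook convention, used here as an assumption.
    k = max(1, int(n * group_fraction))
    # Rank students from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    bottom, top = order[:k], order[-k:]
    p_top = sum(item_correct[i] for i in top) / k
    p_bottom = sum(item_correct[i] for i in bottom) / k
    return p_top - p_bottom


if __name__ == "__main__":
    totals = list(range(1, 11))  # hypothetical total scores for 10 students
    # An item that only the higher scorers got right "discriminates" well.
    print(discrimination_index([False] * 5 + [True] * 5, totals))
```

An item whose index sits near zero or below it does not separate high and low scorers on the test as a whole, and that is the kind of item that tends to be dropped, whatever else it may be measuring.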

As E.D. Hirsch and I both note, such tests also abound in passages that require background knowledge to which neither home nor school has equally exposed kids (a problem Hirsch and I want to solve in different ways). All of these "faults" are built into the requirement to rank-order students along a particular curve. These are not designed as pass/fail tests. Reliable psychometrics can only rank you by percentile, nothing more and nothing less: X percent of students taking the test at the same time and under the same conditions got more "right" answers than you did. There are no statistical methods for arriving at "proficiency" and the like. Those are "subjective," i.e., human judgments.
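The arithmetic behind that kind of ranking can be sketched in a few lines of Python. This is a minimal illustration, assuming a single cohort of raw scores from one sitting under the same conditions; the function names, the sample scores, and the cut score are all hypothetical, and actual score reports often use scaled scores rather than raw counts.

```python
def percent_scoring_higher(score: float, cohort: list[float]) -> float:
    """Percent of the cohort whose raw score is strictly higher than `score`."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    higher = sum(1 for s in cohort if s > score)
    return 100.0 * higher / len(cohort)


def percentile_rank(score: float, cohort: list[float]) -> float:
    """Conventional percentile rank: percent scoring below, counting half of ties."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    below = sum(1 for s in cohort if s < score)
    ties = sum(1 for s in cohort if s == score)
    return 100.0 * (below + 0.5 * ties) / len(cohort)


# A "proficient" label needs a cut score that someone chooses.
# This value is hypothetical: nothing in the cohort's data produces it.
HYPOTHETICAL_PROFICIENCY_CUT = 20


def is_proficient(score: float, cut: float = HYPOTHETICAL_PROFICIENCY_CUT) -> bool:
    """True if the score meets the externally chosen cut score."""
    return score >= cut


if __name__ == "__main__":
    cohort = [12, 15, 15, 18, 20, 22, 25, 25, 28, 30]  # hypothetical raw scores
    print(percent_scoring_higher(22, cohort), percentile_rank(22, cohort))
```

The two percentile functions need nothing but the cohort's own scores, while `is_proficient` cannot run without a cut score supplied from outside the data, which is the human judgment the paragraph above describes.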

Data distortion, as anyone studying our current economic crisis can tell us, is a serious problem, in education as in economics. The way we report data can also distort it, as the term "grade level" has done. That is why I am so often baffled by international comparisons: who is quoting what? (For example, I'm skeptical when I read that China scored high on one of the recent tests, given that a high percentage of Chinese children, above all in still-immense rural China, aren't in school at all.)

When it comes to NYC test scores, I'm more of an expert. Or I was, back when tests came to schools with the publisher's background information, including a warning not to prep kids, and before we began to believe in test miracles (scores that went up by leaps and bounds one year and down the next). I used to be amused when schools that housed district gifted programs bragged about getting higher scores than their sister schools, a detail reporters somehow overlooked in their stories.

Yes, Diane, the intellectual discipline needed to exercise good judgment can be the enemy of improved standardized test scores. Twenty years ago the Coalition of Essential Schools embarked on a different path, one that included "standards" of a different sort. We "invented" examinations that sought to judge students through publicly accessible exercises of judgment by adults. A panel "judged," and documented, how students defended their actual work in a variety of fields. They did so in ways that seemed appropriate both to the "discipline" and to the mission of the school.

The Rhee/Klein/Duncan/et al. traveling show would be amusing if it weren't potentially influential. Our educational leaders have an immense capacity to promote their views with the support, that is, the financial resources, of precisely the big-money boys whose accountability to the American public in their own sphere of expertise has proven so shamefully inept. Inept at best, and corrupt at worst. They have now transferred the same mindset to a field they know precious little about. Juan Gonzalez's "exposé" of the funds provided to Al Sharpton's alliance (National Action Network) by hedge-fund allies of Bloomberg helps explain that "odd" coalition of test "believers." But we can't all attend closely to everything, and repeated untruths or half-truths can become "common sense."

Mike Rose's latest blog posts are treasures, and they belong (alas) to an entirely different world from the one we mostly blog about, Diane. Mike's close attention to children's learning seems passe. His original book, "Lives on the Boundary," is a must-reread. He describes how the kind of education we are intensifying today led to the high college dropout rates of the '70s and '80s, when he wrote it. The students who were arriving, even at selective colleges like UCLA, were woefully unprepared for the fundamental work of "higher" education. KIPP, I fear, will discover Rose's point too late. Gerald Graff's "Clueless in Academe" is a newer and differently oriented book making a similar point; it also includes a flattering chapter on the late CPESS (Central Park East Secondary School). Yes, we could all get higher "scores" while becoming a stupider nation.

The Manhattan Institute study is just another example of how selectively we pick and choose "data," just as Goldman Sachs' latest figures, heralding better times for the firm, rest on a decision to change the calendar for comparison purposes! It reminds me of how we raised high school attendance some years ago: by taking attendance in third period instead of first.



I think I'm repeating a comment I made on an earlier post, but here goes anyway.

I often say, tongue only partly in cheek, that we should pass a law requiring any elected government official to pass a standardized test on tests, statistics, and the use of data before being allowed to vote on or sign any law regarding tests or the use of their data. It seems only fair. And I nominate you to head the committee that devises the test. You can choose your own committee members. And if in your wisdom you chose to make the test criterion-referenced instead of norm-referenced, that would be OK with me, as long as the test included some questions showing that lawmakers understand the difference and why it matters.

Deborah, there is one sentence in your post that surprises and puzzles me. In speaking of Mike Rose's blog you say "Mike's close attention to children's learning seems passe." Why is it passe? I think of "passe" as referring to something we used to do but have gotten away from for one good reason or another. From my perspective it is quite the opposite. "Close attention to children's learning" is not something we used to do. It's not something we're getting away from; it's something we've never gotten around to. And I think it's very important that we do get around to it. That is my argument in my article "The Lack Of Description In The Study Of Education", which is on my website.

I only recently discovered Mike Rose's blog, and I am impressed by what he has to say. I intend to keep reading him, along with whatever he has written in the past. The first time I read his blog, what I liked most was that he included some actual description, a taste of the nitty-gritty of teaching and learning. Has John Dewey ever done as much? I don't understand at all how "passe" fits in.

Apparently my perspective on a lot of things is a little off center. I think it was about a year ago that I started reading the ed blogs. It took me quite a while to discover what perhaps should have been obvious from the start: most of the discussions on these blogs are about education policy, not practice, and not pedagogical analysis or theory. So who cares about educational policy? Has educational policy ever taught a kid to read or add fractions?

Over time it occurred to me that many readers and writers see educational policy as the way to educational improvement. The more I thought about that, the more I realized that I disagree. I do not see educational policy as leading to educational improvement. From my perspective (and I don't claim to have thought this through very deeply at this point), the best educational policy can do is to permit and enable good practice. The best policy, it seems to me, would be laissez faire. That is the policy that let my teachers follow their intuition and common sense and give me a reasonably good education. There were plenty of educational fads when I was young, but my teachers fortunately did not indulge in them. I don't see NCLB as leading to educational improvement, and I don't see national standards as having any promise of educational improvement.

Dick Shutz, whose opinions I respect and with whom I often agree, said something along these lines the other day. He said, ". . . 'merit' is a function of the team of personnel at the school level, not at the individual teacher level." That puzzles me as much as your use of the word "passe." It seems just the opposite to me. Has a team ever taught a kid to read or add fractions? It seems to me it's all up to the individual teacher. And since it's all up to the individual teacher, the best educational policy is laissez faire, and the best hope of educational improvement is to look more closely at teaching and learning in the real world, which is what Mike Rose is doing.

I'm used to being a minority of one, but maybe I'm more isolated in my thinking than I previously realized. Then again, thinking about these differing perspectives might be productive.


I think you are right when you say that educational policy doesn't educate students. I especially like your comment about how the best it can do is enable and permit the best educational practices.

That is precisely why the debate about educational policy is so important: what educational policy can do, and is doing right now, is make things a whole lot worse. Educational policy right now is standing in the way of teachers using their best educational practices. These debates about school governance, merit pay, and the like may seem trite, and they ought to be, but their potential effects are disastrous, as we have seen in one urban setting after another, despite the phony statistics cited by the self-described reformers.

If anyone were to take seriously just one of the points you make in this thread, Deborah, there would be "change we can believe in."

I dunno. It seems to me that what people refer to as "educational policy" is thinly veiled opinion and ideology. I can understand Federal, State, and LEA legislation and regulation, and I can understand dialog and debate about these. But typically "policy" pieces are nothing more than armchair rhetoric.

As you point out, standardized achievement tests reference students' learning, not teachers' instruction. All of the results, whatever their form, focus on students, not on the instruction they've received. So any diagnosis or feedback holds previous instruction harmless and provides no cues about the next optimal instructional course of action.

When the focus of measurement is on the instruction rather than on the students, the information concerns are of the same sort as in other sectors of life: transparency of outcomes; significance of outcomes; reliability of delivery; time and cost of attainment.

None of these matters are reasonably addressed by the artificial, external collections of test items and the arcane test construction and statistical scaling.

Few instructional accomplishments worthy of recording in elementary education can reasonably be attained in one academic year. When I speak of the school team, I'm referring to the personnel who had a direct hand in delivering the accomplishment. All these personnel need to be on the same page about the instructional course of action being followed, to have continuous information about the instructional status of each child in terms of the expertise that's to be delivered, and to have a transparent means of demonstrating when the accomplishment has been attained.

Students will differ in their rate of attainment. And teachers can have latitude and exercise judgment in their instructional decisions. But what this perspective quickly illuminates is that the deficits pertain to the instructional products and protocols, not to the students.


Could you send us all that letter home to parents re reading?



While the standardized testing game can indeed be destructive, from my perspective as a school leader I can say that our attention to data has led to great LEARNING gains for students. However, that may be because we remain committed to raising our children, not test scores.

In all the discussion about school reform, which seems to take place so far away from schools, there is one very critical missing piece. President Obama's education plan addresses pre-school, K-12, and access to an affordable college education. Yet even if he were to implement every component of his "5 Pillars" successfully, we would still not achieve the reforms we need to eliminate the achievement gap in America. That can only happen if he achieves his plan for universal health care.

Just as the economic recovery plan hinges on health care reform, so too does significant education reform. I can only imagine a day when my students can have access to complete medical, dental, vision, hearing, dietary, mental health, and critical care. Talk about an equalizer.

My rationale, if it isn't self-evident, is here: "El Milagro Weblog": http://kriley19.wordpress.com/

When the focus of measurement is on the instruction rather than on the students, the information concerns are of the same sort as in other sectors of life: transparency of outcomes; significance of outcomes; reliability of delivery; time and cost of attainment.

None of these matters are reasonably addressed by the artificial, external collections of test items and the arcane test construction and statistical scaling.

How would you test "transparency of outcomes, significance of outcomes, reliability of delivery, time and cost of attainment" without using standardised testing?

If you don't have a standardised test, what is transparent about outcomes? For example, how do you tell if an "A" for writing refers to neatness of handwriting, or vividness of word choice, or grammatical accuracy? And if you don't send some external force in to measure, how can you tell how transparent a school's outcome is, when one school's principal might be scrupulously honest and insist on her staff being so, and another school's principal might be cheating every which way and putting pressure on staff to do the same?

If you want to test the significance of outcomes, you validate the test. If you don't have a standardised test, you can't compare outcomes from one test to another, so how do you validate it?

I don't know what you think is the difference between "reliability of delivery" and "transparency of outcomes".

Time and cost of attainment, again, depend on knowing what educational delivery actually attains.

I don't know any sector in society where transparency of outcomes, significance of outcomes, etc., are assessed without using standardised tests, to the extent that they can standardise. Sports bodies standardise equipment (e.g. the height of the basketball hoop) and measures of performance in those areas where international comparisons are wanted. Scientists are compulsive about agreeing on standardised measures of distance, weight, volume, temperature, etc. Engineers who want to test how something responds to an impact set up machinery to drop the equipment from the same height each time, so the test is standardised.

Tests are of course artificial. You artificially test the lifebelt because you want to be pretty confident it will work when it's really needed. You artificially test the food because you don't want to poison your customers. You artificially test students' arithmetic ability because you don't want them stuffing up their use of money as adults.

You externally test because people regularly fail at being objective about their own work, as in the example of principals who cheat on tests, but also in more subtle cases of bias.

You use arcane test methods if those are more accurate than commonplace ones, because you want to see if something really works (would you prefer that your doctor only used medicines that could be easily understood by the layperson, rather than using medicines that arcane testing methods like double-blinding have shown to be the most effective?)

Standardised tests as currently applied have their problems. But if you don't standardise you are lost.

I would amend your last sentence to say that even if you standardize you may be lost anyway. The problem is that even with standardization, someone, somewhere, is making judgments about what the standard ought to be, the standard of course being an artificial stepping stone to some agreed-upon outcome. If there is widespread consensus about what that outcome ought to be, then the process of working toward a standard might proceed in a relatively straightforward, rational (though not necessarily smooth) way. In education, I don't believe there is widespread consensus about what the outcomes ought to be (see Cremin, Labaree, Grubb and Lazerson, etc.) except in the vaguest "all kids should learn" kind of way. Thus the role of standardization is often to smuggle in one's own version of the "right" outcomes in the guise of rational, central accountability, because an actual democratic consensus couldn't be reached. In the business world an owner may legitimately impose his view of the right outcomes, thus setting the stage for the development of standards, but in the public sphere of education, "ownership" is shared by all.

The other broad danger of standardization is that standards become a fetish. We come to think that the standard is the outcome, forgetting that we set the standard to achieve the outcome. What Deb has championed over the years is the idea that "regular" teachers, students, parents, and communities are smart enough to devise their own ways to tell whether the standardized reading test is leading to the desired educational outcome. After all, our testing experts are experts in making standardized tests, but they are no more expert than you or I in determining the proper educational outcomes for schools. This question of educational purposes, no matter how much science and pseudo-science we surround it with, is at its core a question of values. Thus it must always remain in the sphere of public debate and the democratic process. It cannot be farmed out to experts in testing and accountability. Finally, as evidenced by this New York Times piece (http://www.nytimes.com/2009/04/26/business/26corner.html?pagewanted=2&em), we are probably overestimating the degree of standardization in the business world.

P.S. Lest I be accused of equivocating on the ideas of standard and standardized, I write this in the context of the standards-based standardized tests that have become the norm in recent years.

Thanks, Keith, for relieving me of having to respond to Tracy. Does the paucity of responses to this letter suggest that people feel it's not worth agreeing or disagreeing? In any case, well said.

Brian R., I was at first surprised at your comment, and then wondered if my view is based on my limited personal history. I entered teaching in the mid-'60s, when an enormous number of books were being written by teachers and close observers about life inside the school and classroom. They were largely exploratory, undefensive, and comfortable making the story itself the point of the story: Kozol, Kohl, Holt, Herndon, and so on. There have been others since, but these were my "introduction" to teaching, rather than finding myself in a policy setting. The best magazine from that period was put out by home-schoolers in Boston; it maybe had one piece each issue on how to protect home schooling politically, but the rest were accounts of efforts to "teach" x to y. In the '60s and '70s, David Hawkins put out a marvelous equivalent called OUTLOOK that still amazes me. Try "Learning How to Crawl," by Tony Kallett, which came out of that work. Then there was a whole genre of in-betweens: Duckworth, Mike Rose, Rob Fried, even Sam Freedman's story of one high school in NYC. I read such accounts voraciously, which is why I enjoyed Mathews's book on KIPP and Tough's on the Harlem Children's Zone school. These were books written by people who, mostly, had a visionary idea of the capacity of children to be serious and enthralled learners of the highest order. They felt we were on the edge of taking "educability" seriously. Rereading the material that came out of the City College (NYC) Workshop Center in the '60s and '70s amazes me. Some articles addressed room arrangements, handing out pencils if you will, and other broad thoughts about how parents and teachers interacted. No one dreamed of discussing math education without exploring it, trying it out, and quoting kids. It's a milieu that I've lost track of. Maybe it's still alive and well. And maybe (or certainly) it didn't inundate the world that perhaps most teachers were exposed to in the "good old days."

The "anti-intellectual," so-called "hands-on" frame that most teacher "education" is stuck in was well entrenched then too. So was its reverse, Marxistical tracts on class oppression and teaching (which I also sometimes read and appreciated). What was remarkable to me was how healthy the third alternative was for a while: a world where ideas and practice were truly honored in their complexity and in their relationship to each other, as I, Thou and It came together around something worthy of close observation.

Thanks for getting me to think more about this; and I hope you've found places like this today, too.


Our comments are getting tangled up in the verbiage of "standards," "standardization," and "standardized tests" and drifting from the substance of Deborah's post.

The terms mean different things in US el-hi education than in general usage.

STANDARDS in EdTalk are CONTENT standards, for the most part rhetorical, and always wishful. No consideration is given to the instructional time or to the nature of the instruction needed to accomplish the intent, which is seldom clear. The talk of "raising standards" is empty. These are not performance standards as they would be in general usage.

STANDARDIZATION refers to the process of generating test norms, today a statistical scale derived using Item Response Theory. The test construction and scaling process results in measures that are sensitive to racial/SES categories, but not to instruction. For a recent confirmation of that contention, see the best database available, the 8th grade results of the Early Childhood Longitudinal Study:


STANDARDIZED TESTS are the achievement tests used to determine "Adequate Yearly Progress." AYP is a statistically impossible federal mandate.

Deborah, I appreciate your responding to my comment. I have been thinking about it, and I now see your reason for the word "passe". I also started teaching in the sixties. I didn't stay in teaching, however, and I have only a passing acquaintance with the books that came out of that era. You mention four names: Kozol, Kohl, Holt, and Herndon. It took a few minutes, but I realized those names are familiar to me. Indeed, perhaps those books are sitting on my shelves right now. I looked and immediately found Herndon's "The Way It Spozed To Be". I remember it as an interesting book, but when I wrote my article a year or two ago about the lack of description in the study of education, that book did not come to mind. I opened it at random and read a page or two. There is indeed description there, very interesting description. I think when I first read it I thought of it as human interest, not pedagogical theory. I think that assessment still holds, but the description is indeed there, and as time permits I will reread it and give it a lot more thought this time.

My impression is that Mike Rose goes a step further than Herndon and others do. But I will have to give a lot more thought to that as time permits.

This brings up an interesting question. What are the classics of pedagogical theory? I am not talking about educational policy or politics here. Nor am I talking about educational controversy. I am talking about pedagogical theory, our ideas of how teaching and learning actually occur. What books, or writings, have actually shaped pedagogical thinking? And how? Did Dewey really have ideas that influenced pedagogical theory or practice? If so what were these ideas? Call me a cynic, but the ideas I associate with Dewey are ideas that I have long dismissed, ideas that are more in the nature of attractive nuisances than foundations of anything.

I can nominate one book that I believe did indeed influence education, but I'm sure I'll get a lot of disagreement on this. "Why Johnny Can't Read", for all its shortcomings and limitations, did influence thinking about teaching and learning, at least my thinking. Some might say that it was a book about controversy, but it did make an argument about the nature of learning to read, and how to best teach reading.

My thinking along these lines is very much influenced by one classic bit of writing, "The Project Method", by William Kilpatrick, published in 1918. It was tremendously influential but, in my humble opinion, harmful. I found it on the internet a few years ago and used it as the centerpiece of the article I mentioned above, "The Lack Of Description In The Study Of Education". I argued that it is wishful imagining with no grounding in reality. Perhaps it should be studied by prospective teachers, but only as a bad example.

So I would be interested in knowing what others might consider the "must read" books for pedagogical theory.
