
In Defense of Judgment


Dear Deb,

It's no big surprise that "standards" involve judgments. Only standards related to physical objects are fixed, like systems of weights and measures (e.g., the metric system).

But any standard that involves decision-making, real decision-making, means that human judgment is required. People make decisions about what is considered a passing score on the medical boards, on the law school admissions tests, even on the pass mark for the written test to get a driver's license. Some group of fallible human beings decides what constitutes the appropriate body of knowledge, and how much of that knowledge the applicant should possess. This may be a "viewpoint," but there could be no standards at all without relying on the judgment of people who are hopefully, presumably "knowledgeable" about what constitutes a passing mark.

As part of my own ideological evolution, which I referred to in an earlier post, I have come to believe that no single measure should be completely determinative and that students (and others) should have not only multiple measures of their capacity but multiple opportunities to demonstrate it. There was a time when my faith in testing was greater than it is today, when I thought that a single test could serve as a proxy for a pile of judgments. But my view now is that the test matters—it reveals whether students have mastered what was taught (math, for example) or what they learned in their home environment (vocabulary, concepts). But it is also the case, as I readily admit, that some students are not good test-takers and that some tests are simply not very good tests. Therefore, I would not want to have anyone's life or career or year in school judged by a single test score.

We know that college admissions, for example, depend on a range of measures. Admissions officers consider many pieces of evidence: grades, test scores, the student's essay, letters of recommendation, and other signifiers of the student's readiness and motivation for college-level work. The more competitive the college, the likelier it is to have this elaborate process of review. Granted, schools do not have the time or resources to make so nuanced a decision about every student every year. Tests are a shortcut for saving time and resources. The danger—and I agree with you that the danger has turned into a present-day reality—is that the tests become not only a shortcut, but the only means of judgment, a substitution for human decision-making and for the multiple measures that should be considered when the stakes are high (promotion, graduation).

I disagree with you about the "expertise" involved in deciding what kids "should" be reading. Those decisions are made all the time. They are made by textbook publishers and editors. The results of those decisions are to be found in the literature textbooks used in the majority of American public schools. When I was writing "The Language Police," I ordered every mass-market literature textbook used in the schools. I was really chagrined; the decisions had been made, and from my point of view, they were uniformly awful. I am a strong believer that literature taught in schools should be a mix of the classic and the modern, but that it should all be really wonderful literature by someone's lights. What I found instead was about 30-40 percent good literature, classic and new, a lot of assorted trivia, and a huge proportion of the books devoted to graphics and blank space.

I don't know whether or not you agree with me about the importance of classic literature (I am referring here not to the ancients, but to English and American writers who are generally acknowledged to be worth reading like Shakespeare, Donne, Mill, Thoreau, Dickinson, Longfellow). To make my case, I prepared two anthologies, "The American Reader" (1991), which is multiculturally American, and "The English Reader" (2006), which I edited with my son Michael. Sorry to be self-referential, but these books were my attempt to gather the wonderful things that are usually left out of the prescribed curriculum in the literature textbooks, and to save busy teachers the time required to find all these pieces. To anyone who says, "but that's not what I would choose," I say, "okay, choose your own." No problem.

I know, having studied the history of curriculum, that the content of history textbooks and literature textbooks changes from generation to generation. So it has been and so it ever will be. But we should not throw out all we have known in the past as irrelevant. That way lies a generation with no common grounds for discussion and debate, with no songs to sing communally. That way leaves to commercial interests all that we know.



The problem with literature choice is that so many English teachers still think of their job as teaching literature--specific literature--rather than teaching the standards of their state. This makes curriculum very personal, and therefore very arbitrary. I teach "To Kill a Mockingbird" because...why? Because I love it? Because I think everyone should love it? Or because it allows me to teach certain skills and concepts necessary at that grade level?

The lack of clarity in the language of the standards doesn't help. It's often unclear exactly which skills and concepts need to be taught--or the skills are clear but the concepts are left blank. Some states have provided additional performance language, which helps on the skills side, and some districts have worked, internally or with help, to better define what teaching those standards really means, week to week. But clearer standards regarding what IDEAS are essential would help make the case for certain texts, or at least certain classes of texts (like the classics).

People often look to the field of certification testing as proof that good tests can be developed to measure complex performance. There are some similarities to educational testing, but some important differences. Just a few examples:

Some excellent statistical work has been done on the LSAT, but it is different in some interesting ways. An old master item pool can include 5,316 test items for a paper-and-pencil test that requires only 100 of them (van der Linden, Ariel, & Veldkamp, JEBS, Spring 2006, pp. 81-99; see pp. 89-90). How many states have that many items for any test? For any content area, even summing over grades? That is one of the reasons why the LSAT is very, very different.

As for tests for medical professionals, Tony LaDuca of the National Board of Medical Examiners (Ed Measurement: Issues and Practice, Summer 2006, pp. 31-33) addresses the problems of developing a sound test for such a wide array of professionals, even within a specialty. He explains the weaknesses of common task inventory analysis for medical licensure. A typical task inventory analysis asks professionals working in the field to list what they do and to rate how important it is. That information is then used to develop a good test, and perhaps even to weight test results to emphasize items of greater importance. But LaDuca says that task inventory analysis oversimplifies jobs like those of doctors and nurses, who have a high degree of complexity and autonomy in their work. He gives the example of three different kinds of nurses -- home residential, in-patient care, neonatal -- and how they might rate different tasks. "If we can imagine that very different importance ratings come from respondents because their practice contexts are markedly different, then we must be concerned that the mean of their ratings is not a good representation of any rater's view" (p. 33). We see the same lack of consistency among teachers at NCLB standard-setting sessions for the same reasons -- and hide it by taking the mean or median.
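LaDuca's point about means can be made concrete with a toy calculation. In this sketch the nurse categories echo his example, but the ratings themselves are invented for illustration: when practice contexts diverge sharply, the mean rating lands in a middle ground that matches no actual rater's view.

```python
# Hypothetical importance ratings (1-5 scale) for a single task,
# from three groups of nurses with very different practice contexts.
ratings = {
    "home_residential": 5,  # critical in home care
    "inpatient": 3,
    "neonatal": 1,          # nearly irrelevant in this context
}

mean_rating = sum(ratings.values()) / len(ratings)
print(mean_rating)  # 3.0

# How far the mean sits from each rater's actual view:
distances = {group: abs(r - mean_rating) for group, r in ratings.items()}
print(distances)  # two of the three groups are 2 full points away
```

The same arithmetic applies to averaging teachers' judgments at a standard-setting session: the reported mean can be a number that no participant actually endorsed.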

Wang, Witt, and Schnipke (same issue) counter that "In spite of the varying degrees of reliability in ratings of task importance, however, task inventory remains a widely used method" (p. 36). They do offer a number of worthwhile suggestions for improving reliability -- more ratings, more carefully collected ratings, and, finally, accepting that some professions are more heterogeneous than others. "Anyone who has worked with a variety of licensure and certification clients can testify that some groups seem to share a common understanding of their field that leads to quick consensus in activities like job analysis, while other groups require much more discussion to work through their diverse perceptions" (p. 36). One thing is certain: People who imagine that certification testing has answered all of the riddles do not know certification testing.

As for cut scores in certification testing, people should know that a great deal of certification testing is done in a quasi-norm-referenced manner. The profession decides how many new members it needs, and how many it can admit without eroding the earning potential of those already employed, and a cut score is set so that only that number of people pass. You must admit that it has a certain evil beauty.
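The quota mechanism described above is simple enough to sketch in a few lines. The scores and quota here are invented for illustration: the cut score is not derived from any judgment about competence; it is simply whatever score happens to pass the desired number of candidates.

```python
# Hypothetical exam scores for ten candidates.
scores = [88, 92, 75, 64, 81, 95, 70, 59, 83, 77]

# Suppose the profession decides it can absorb only four new members.
quota = 4

# Set the cut score at the quota-th highest score, so exactly
# `quota` candidates pass -- regardless of how competent the rest are.
cut_score = sorted(scores, reverse=True)[quota - 1]
passed = [s for s in scores if s >= cut_score]

print(cut_score)   # 83
print(len(passed)) # 4
```

Note that if the same cohort scored ten points higher across the board, the cut score would simply rise by ten and the same number would pass, which is what distinguishes this from a genuinely criterion-referenced standard.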

What is the current hot topic in certification testing? Simulated performance testing -- tests that measure simulated real-world performance, not standardized paper-and-pencil tests as in the current NCLB model. Perhaps we should look to certification testing after all.

Maybe what education needs is something more like a driver's license test. A driver's license test does not consist of just a written test. Depending on your state, it can involve a physical (eye test); a paper-and-pencil or computerized test of knowledge (written rules test); an oral test (identify this sign); a performance test (the driving part); provisional passing status for initial test takers; multiple options to repeat any part of the test; additional testing for similar but different driving (motorcycle, commercial truck); regularly required retesting at certain ages; additional testing at any age if you move to a different state; and the possibility of complete revocation of your passing status if you fail the second performance measure, taken day after day in the field (loss of license for very bad driving). It is the ultimate in multiple measures! It really isn't anything at all like a standardized educational assessment -- except that each state can make its own test. At least, they still can today.

Which raises the only really important question, Diane: Given that you feel NCLB has placed too much attention on standardized tests, how would the national standardized tests you seem to favor help alleviate that problem?

In response to the idea of one test fitting all: it does not give us accurate information. I would agree with some of the comments made by certifiable. We need to start giving state and district tests using a variety of assessments. Yes, the reality is that this is not how it occurs now, but reality also shows that all students do not learn the same way. If all students don't learn the same way, they should not be assessed the same way. This high-stakes testing does nothing to aid student development.
