
Through the lens of social science, eduwonkette takes a serious, if sometimes irreverent, look at some of the most contentious education policy debates. (Find eduwonkette's complete archives prior to Jan. 6, 2008 here.)


Why We Should Care About Test Score Inflation

Kevin Carey’s dismissal of “test score inflation” provides an ideal opportunity to talk about the book I finished this weekend, Measuring Up: What Educational Testing Really Tells Us, by Dan Koretz, a psychometrician at the Harvard Grad School of Education – hardly an opponent of testing.

Koretz calls “test score inflation,” in which gains on tests used for accountability dramatically outpace gains on low-stakes tests, the “dirty secret of high-stakes testing.” If you compare NAEP trends with state score trends, you’ll see that state scores have increased significantly more than NAEP scores since NCLB was adopted.

To understand why test score inflation is a serious problem, you have to understand the sampling principle of testing. Koretz provides the following example: Suppose we want to evaluate students’ vocabulary. A typical high school student knows 11,000 root words, but a test can only include a sample of these words – maybe 40. If we design our test well, we can still learn something about the breadth of each student’s vocabulary. But we don’t really care if the student knows the 40 words on the test; rather, we care about the larger domain from which these words are sampled.

Now imagine that for weeks before our test, I drilled students incessantly on those 40 words. Voila! They perform exceptionally on the test. Yes, their vocabularies have grown by as many as 40 words. Maybe these are 40 really important words, the so-called "test worth teaching to." But proficiency in the domain that my test is intended to measure has not expanded by anything like the same amount. I’ve seen this over and over again: administrators and teachers figure out which concepts are consistently on the test and which aren’t, and they alter their instruction accordingly. The trouble is that if we administer a slightly different test, drawing on a broader range of concepts from the domain we care about, we find that students haven't mastered those concepts.
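To make the sampling arithmetic concrete, here is a small Python sketch. It borrows only two numbers from Koretz's example (an 11,000-word domain and a 40-word test); everything else is a hypothetical assumption for illustration: the student starts out knowing a random half of the domain, drilling teaches exactly the words on one test form and nothing else, and a second randomly drawn 40-word form stands in for the "slightly different test." It illustrates the argument and is not a model taken from Koretz's book.

```python
import random

# From Koretz's example: the domain we care about vs. the sample on one test form.
DOMAIN_SIZE = 11_000  # root words in the vocabulary domain
TEST_SIZE = 40        # words that fit on a single test form


def score(known, words):
    """Fraction of `words` that the student knows."""
    return sum(w in known for w in words) / len(words)


def drill_experiment(known_fraction, seed=0):
    """Hypothetical illustration: gains on a drilled test vs. gains in the domain."""
    rng = random.Random(seed)
    domain = list(range(DOMAIN_SIZE))  # word IDs 0..10999 stand in for real words

    # Assumption: the student knows a random share of the domain.
    known = set(rng.sample(domain, round(known_fraction * DOMAIN_SIZE)))

    # Two different 40-word forms sampled from the same domain.
    test_a = rng.sample(domain, TEST_SIZE)  # the high-stakes form that gets drilled
    test_b = rng.sample(domain, TEST_SIZE)  # a "slightly different" audit form

    def snapshot():
        return {
            "test_a": score(known, test_a),
            "test_b": score(known, test_b),
            "domain": len(known) / DOMAIN_SIZE,
        }

    before = snapshot()
    # Assumption: weeks of drilling teach every word on form A and nothing else.
    known.update(test_a)
    after = snapshot()

    result = {f"{k}_gain": after[k] - before[k] for k in before}
    result["test_a_after"] = after["test_a"]
    return result


if __name__ == "__main__":
    for name, value in drill_experiment(known_fraction=0.5).items():
        print(f"{name:>14}: {value:.4f}")
```

Under these assumptions, drilling pushes the drilled form's score to 100%, while the share of the domain the student knows rises by at most 40/11,000 (about 0.4 percentage points) and the second form barely moves. Each newly learned word is worth 1/40 of the drilled test but only 1/11,000 of the domain, so the test gain is always 275 times the domain gain.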

Carey argues that this is just a standards-mismatch problem, i.e., the standards behind state tests are not the same as those behind national tests. Koretz takes on that argument head-on in this passage:

"Alignment is a lynchpin of policy in this era of standards-based testing. Tests should be aligned with standards, and instruction should be aligned with both....And alignment is seen by many as insurance against score inflation. For example, a principal of a local school that is well known for the high scores achieved by its largely poor and minority students gave a presentation to the Harvard Graduate School of Education a few years ago. At one point, she angrily denounced critics who worry about 'teaching to the test.' We had no reason to be concerned about teaching to the test in her school, she asserted, because the state’s test measures important knowledge and skills. Therefore, if her faculty teaches to the test, students will learn important things.

This is nonsense, and I have a hunch about what I would find if I were allowed to administer an alternative test to her students. Alignment is just reallocation by another name. Certainly it is better to focus instruction on material that someone deems valuable, rather than frittering time away on unimportant things. But that is not enough. Whether alignment inflates scores depends also on the importance of the material that is deemphasized. And research has shown that standards-based tests are not immune to this problem. These tests too are limited samples from larger domains, and therefore focusing too narrowly on the content of the specific test can inflate scores." (pp. 253-254)

We only care about test scores if they translate into general improvements in children’s academic skills that generate meaningful improvements in their life chances. If these gains don’t carry over to other tests that measure similar skills (basic reading and math competencies), what are the chances that they will help children succeed in the workplace or in college? That is a very good reason to worry about test score inflation.

Spoiler alert: NY state test scores are out next week, if not sooner. What should we make of NYC's flat NAEP scores alongside state test improvements so large they're unbelievable? Kind of makes you wonder.


Comments

Thanks for a terrific and enlightening post. I'm looking forward to reading the Koretz book.

If only the more vocal (and more lavishly funded) bloggers took the time to digest the relevant work in this area. Among this crowd, it's almost a badge of honor NOT to pay attention to noted experts who have devoted their lives to understanding these issues.

But heck, if you're Kevin Carey, why go through all the trouble to read when you can spout policy recommendations for free?

I didn't say that test score inflation doesn't exist. In fact, I said that large differences between NCLB tests and other tests would be cause for concern. What I said was that divergence doesn't seem like, to quote Eduwonkette (or Linda Darling-Hammond), "prima facie" evidence of inflation, since there's another plausible explanation: standards misalignment. "Prima facie" means (per American Heritage) "true, authentic, or adequate at first sight" or (and I think this is closer to the author's intent) "Evident without proof or reasoning; obvious."

I'm puzzled by Eduwonkette's persistent unwillingness to represent opposing arguments honestly.

Also, the pre-rebuttal of test score improvement in NYC? Hilarious.

I think readers will have to decide who is more reasonable, thorough, and honest: Eduwonkette, who consistently turns to the research of noted experts and the social science knowledge base for guidance, or Kevin Carey, who has little to offer beyond a holier-than-thou tone, personal attacks, and a desire to be right at all costs.

"Prima facie" evidence of this: Kevin asserts that the diverging trends in NCLB and state test scores are really not significant until that divergence becomes "wild," "e.g. a 50% increase on the state test while SAT-10 scores plummet."

Where does that magic threshold come from? Kevin's mind. What research exactly tells us that this threshold is the tipping point at which the divergence becomes test inflation? And if 50% isn't big enough to suit his argument (say, if actual test scores diverged 65%), he'd make it 70%, and insist Eduwonkette is a union member for good measure (something he frequently hints at, apparently with no basis for that either). Give me a break.

At least when Eduwonkette looks to write about test inflation she turns to the work of a Harvard psychometrician who actually has devoted his life to studying these issues.

When NYC scores on NAEP were flat, Joel Klein refuted the obvious implication by saying that NYC students study only for the NYC test, not for NAEP. But if the skills learned in test prep are so specific that they don't transfer to other tests of the same subjects, what has been learned? Nothing other than the mendacity of the NYC Department of Education spin machine.

eduwonkette

The opinions expressed in eduwonkette are strictly those of the author and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.
