Why We Should Care About Test Score Inflation

Kevin Carey’s dismissal of “test score inflation” provides an ideal opportunity to talk about the book I finished this weekend, Measuring Up: What Educational Testing Really Tells Us, by Dan Koretz, a psychometrician at the Harvard Graduate School of Education who is hardly an opponent of testing.

Koretz calls “test score inflation,” in which gains on tests used for accountability dramatically outpace gains on low-stakes tests, the “dirty secret of high-stakes testing.” If you compare trends on NAEP (the National Assessment of Educational Progress, a low-stakes national test) with trends on state tests, you’ll see that state scores have risen significantly more than NAEP scores since NCLB was adopted.

To understand why test score inflation is a serious problem, you have to understand the sampling principle of testing. Koretz provides the following example: Suppose we want to evaluate students’ vocabulary. A typical high school student knows 11,000 root words, but a test can only include a sample of these words – maybe 40. If we design our test well, we can still learn something about the breadth of each student’s vocabulary. But we don’t really care if the student knows the 40 words on the test; rather, we care about the larger domain from which these words are sampled.

Now imagine that for weeks before our test, I drill students incessantly on those 40 words. Voilà! They perform exceptionally on the test. Yes, their vocabularies have grown by as many as 40 words. Maybe these are 40 really important words, the so-called "test worth teaching to." But proficiency in the domain that my test is intended to measure has not expanded by anything like the same amount. I’ve seen this over and over again: administrators and teachers figure out which concepts consistently appear on the test, and which don't, and they alter their instruction accordingly. The trouble is that if we administer a slightly different test, drawing on a broader range of concepts from the domain we care about, we find that kids haven't mastered those concepts.
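To make the sampling principle concrete, here is a toy Python simulation of Koretz's vocabulary example. Only the 11,000-word domain and the 40-word test come from the text; everything else is an assumption made for illustration: words are stand-in integer IDs, the hypothetical student initially knows each word with probability 0.5, a student answers an item correctly exactly when they know the word (no guessing), and "drilling" means learning every word on one test form. The sketch compares scores on the drilled form, on a second form sampled from the same domain, and on the whole domain.

```python
import random

DOMAIN_SIZE = 11_000  # root words known by a typical high school student (from the text)
TEST_LENGTH = 40      # words that fit on one test form (from the text)


def score(known: set[int], items: list[int]) -> float:
    """Percent of `items` the student knows; assumes knowing a word = answering correctly."""
    return 100 * sum(w in known for w in items) / len(items)


def simulate(seed: int = 0, share_known: float = 0.5):
    """Toy model of drilling on one test form. `share_known` is a hypothetical value."""
    rng = random.Random(seed)
    domain = list(range(DOMAIN_SIZE))  # each integer stands in for one word

    # The accountability test: a random sample of 40 words from the domain.
    form_a = rng.sample(domain, TEST_LENGTH)
    # A second form drawn from the same domain, sharing no words with form A.
    drilled = set(form_a)
    form_b = rng.sample([w for w in domain if w not in drilled], TEST_LENGTH)

    # Hypothetical student who knows each word independently with probability share_known.
    known = {w for w in domain if rng.random() < share_known}

    before = (score(known, form_a), score(known, form_b), score(known, domain))
    known |= drilled  # weeks of drilling: the student learns every word on form A
    after = (score(known, form_a), score(known, form_b), score(known, domain))
    return before, after


if __name__ == "__main__":
    before, after = simulate()
    labels = ("form A (drilled)", "form B (new form)", "whole domain")
    for label, b, a in zip(labels, before, after):
        print(f"{label:18} {b:6.2f} -> {a:6.2f}")
```

Whatever the seed, the drilled form jumps to 100, the second form does not move at all (the two forms share no words), and the domain score rises by at most 40/11,000, or about 0.36 percentage points. The gap between the first number and the other two is the inflation.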

Carey argues that this is just a standards-mismatch problem, i.e., the standards behind state tests are not the same as those behind national tests. Koretz takes on Carey’s critique head-on in this passage:

"Alignment is a lynchpin of policy in this era of standards-based testing. Tests should be aligned with standards, and instruction should be aligned with both....And alignment is seen by many as insurance against score inflation. For example, a principal of a local school that is well known for the high scores achieved by its largely poor and minority students gave a presentation to the Harvard Graduate School of Education a few years ago. At one point, she angrily denounced critics who worry about 'teaching to the test.' We had no reason to be concerned about teaching to the test in her school, she asserted, because the state’s test measures important knowledge and skills. Therefore, if her faculty teaches to the test, students will learn important things.

This is nonsense, and I have a hunch about what I would find if I were allowed to administer an alternative test to her students. Alignment is just reallocation by another name. Certainly it is better to focus instruction on material that someone deems valuable, rather than frittering time away on unimportant things. But that is not enough. Whether alignment inflates scores depends also on the importance of the material that is deemphasized. And research has shown that standards-based tests are not immune to this problem. These tests too are limited samples from larger domains, and therefore focusing too narrowly on the content of the specific test can inflate scores." (pp. 253-254)

We only care about test scores if they translate into general improvements in children’s academic skills that meaningfully improve their life chances. If these gains don’t carry over to other tests that measure similar skills (basic reading and math competencies), what are the chances that they will help students succeed in the workplace or in college? That is a very good reason to worry about test score inflation.

Spoiler alert: NY state test scores are out next week, if not sooner. What should we make of NYC's flat NAEP scores alongside state test improvements so large they're unbelievable? Kind of makes you wonder.

Thanks for a terrific and enlightening post. I'm looking forward to reading the Koretz book.

If only the more vocal (and more lavishly funded) bloggers took the time to digest the relevant work in this area. Among this crowd, it's almost a badge of honor to NOT pay attention to noted experts who have devoted their lives to understanding these issues.

But heck, if you're Kevin Carey, why go through all the trouble to read when you can spout policy recommendations for free?

I didn't say that test score inflation doesn't exist. In fact, I said that large differences between NCLB tests and other tests would be cause for concern. What I said was that divergence doesn't seem like, to quote Eduwonkette (or Linda Darling-Hammond), "prima facie" evidence of inflation, since there's another plausible explanation: standards misalignment. "Prima facie" means (per American Heritage) "true, authentic, or adequate at first sight" or (and I think this is closer to the author's intent) "Evident without proof or reasoning; obvious."

I'm puzzled by Eduwonkette's persistent unwillingness to represent opposing arguments honestly.

Also, the pre-rebuttal of test score improvement in NYC? Hilarious.

I think readers will have to decide who is more reasonable, thorough, and honest: Eduwonkette, who consistently turns to the research of noted experts and the social science knowledge base for guidance, or Kevin Carey, who has little to offer beyond a holier-than-thou tone, personal attacks, and a desire to be right at all costs.

"Prima facie" evidence of this: Kevin asserts that the diverging trends in NCLB and state test scores are really not significant until that divergence becomes "wild," "e.g. a 50% increase on the state test while SAT-10 scores plummet."

Where does that magic threshold come from? Kevin's mind. What research exactly tells us that this threshold is the tipping point at which the divergence becomes test inflation? And if 50% isn't big enough to suit his argument (say if actual test scores diverged 65%), he'd make it 70%, and insist Eduwonkette is a union member for good measure (which he frequently likes to hint at, apparently with no basis for that either). Give me a break.

At least when Eduwonkette writes about test score inflation, she turns to the work of a Harvard psychometrician who has actually devoted his life to studying these issues.

When NYC scores on NAEP were flat, Joel Klein refuted the obvious implication by saying that NYC students study only for the NYC test, not for NAEP. But if the skills learned in test prep are so specific that they don't transfer to other tests of the same subjects, what has been learned? Nothing other than the mendacity of the NYC Department of Education spin machine.

Recent Comments

  • anonymous: When NYC scores on NAEP were flat, Joel Klein refuted…
  • Doug Douglass: I think readers will have to decide who is more…
  • Kevin Carey: I didn't say that test score inflation doesn't exist. In…
  • Doug Douglass: Thanks for a terrific and enlightening post. I'm looking forward…



