What Does Educational Testing Really Tell Us? An Interview with Daniel Koretz

Daniel Koretz, a professor who teaches educational measurement at the Harvard Graduate School of Education, generously agreed to field a few questions about educational testing. He is the author of Measuring Up: What Educational Testing Really Tells Us.

EW: What are the three most common misconceptions about educational testing that Measuring Up hopes to debunk?

DK: There are so many that it is hard to choose, but given the importance of NCLB and other test-based accountability systems, I'd choose these:
* That test scores alone are sufficient to evaluate a teacher, a school, or an educational program.

* That you can trust the often very large gains in scores we are seeing on tests used to hold students accountable.

* That alignment is a cure-all - that more alignment is always better, and that alignment is enough to take care of problems like inflated scores.

EW: I'm intrigued by your third point about alignment. For example, we often hear that because state testing systems are directed towards a particular set of standards, we should primarily be concerned with student outcomes on tests aligned with those standards. This is the common refrain about a "test worth teaching to." What's missing from this argument?

DK: Up to a point, alignment is a clearly good thing: we want clarity about goals, and we want both instruction and assessment to focus on the goals deemed most important.

However, there are two flies in the ointment. The first is that the achievement tests we are concerned with, no matter how well aligned, are small samples from large domains of performance. That means that most of the domain, including much of the content and skills relevant to the standards, is necessarily omitted from the test. As I explain in Measuring Up, this is analogous to a political poll or any other survey, and it is not a big problem under low-stakes conditions. Under high-stakes conditions, however, there is a strong incentive to focus on the sampled content at the expense of the omitted material, which causes score inflation. Aligned tests are not exempt. Score inflation does not require that the test include poorly aligned content. Even if the test is right on target, inflation will occur if the accountability program leads people to deemphasize other material that is also important for the conclusions based on scores. And to make this concrete: some of the most serious examples of score inflation in the research literature were found in Kentucky's KIRIS system, which was a standards-based testing program.

The second problem is predictability. To prepare students in a way that inflates scores, you have to know something about the test that is coming this year, not just the ones you have seen in the past. The content, format, style, or scoring of the test has to be somewhat predictable. And, of course, it usually is, as anyone who has looked at tests and test preparation materials should know. Carried too far, alignment actually makes this problem worse, by focusing attention on the particular way that knowledge and skills are presented in a given set of standards. Think about 'power standards,' 'eligible standards,' and 'grade level expectations,' all of which can be labels for narrowing in on the specifics of how a set of skills appears on one state's particular assessment.

Why is this bad? Because many of those specifics are not relevant to the students' broader competence and long-term well-being. Scores on a test are a means to an end, not properly an end in themselves. Education should provide students knowledge and skills that they can use in later study and in the real world. Employers and university faculty will not do students the favor of recasting problems to align with the details of the state tests with which they are familiar. As Audrey Qualls said some years ago: real gains in achievement require that students can perform well when confronted with "unfamiliar particulars." Improving performance on the familiar but not the unfamiliar is score inflation.

EW: What are the implications of score inflation for both measuring and attenuating achievement gaps? Because schools serving disadvantaged students face more pressure to increase test scores via the mechanisms you describe, I worry that true achievement gaps may be unchanged - or even growing - while they appear to be closing based on high-stakes measures.

DK: I share your worry. I have long suspected that on average, inflation will be more severe in low-achieving schools, including those serving disadvantaged students. In most systems, including NCLB, these schools have to make the most rapid gains, but they also face unusually serious barriers to doing so. And in some cases, the size of the gains they are required to make exceeds by quite a margin what we know how to produce by legitimate means. This will increase the incentive to take short cuts, including those that will inflate scores. This would be ironic, given that one of the primary rationales for NCLB is to improve equity. Unfortunately, while we have a lot of anecdotal evidence suggesting that this is the case, we have very few serious empirical studies of this. We do have some, such as the RAND study that showed convincingly that the "Texas miracle" in the early 1990s, supposedly including a rapid narrowing of the achievement gap, was largely an illusion. Two of my students are currently working with me on a study of this in one large district, but we are months away from releasing a reviewed paper, and it is only one district.

I have argued for years that one of the most glaring faults of our current educational accountability systems is that we do not sufficiently evaluate their effects, instead trusting - despite evidence to the contrary - that any increase in scores is enough to let us declare success. We should be doing more evaluation not only because it is needed for the improvement of policy, but also because we have an ethical obligation to the children upon whom we are experimenting. Nowhere is this failure more important than in the case of disadvantaged students, who most need the help of education reform.

Inflation is not the only reason why we are not getting a clear picture of changes in the achievement gap. The other is our insistence on standards-based reporting. As I explain in Measuring Up, relying so much on this form of reporting has been a serious mistake for a number of reasons. One reason is that if one wants to compare change in two groups that start out at different levels - poor and wealthy kids, African American and white kids, whatever - changes in the percents above a standard will always give you the wrong answer. This particular statistic confuses the amount of progress a group makes with the proportion of the group clustered around that particular standard, and the latter has to be different for high- and low-scoring groups. I and others have shown that this distortion is a mathematical certainty, but perhaps most telling is a paper by Bob Linn that shows that if you ask whether the achievement gap has been closing, NAEP will give you different answers - very different answers - depending on whether you use changes in scale scores, changes in percent above Basic, or changes in percent above Proficient. This is not because the relative progress has been different at different levels of performance; it is simply an artifact of using percents above standards. This is only one of many problems with standards-based reporting, but in my opinion, it is by itself sufficient reason to return to other forms of reporting.
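
To make the percent-above-a-standard problem concrete, here is a minimal sketch in Python. It is an illustration only, not data from NAEP or from any study mentioned above. The group means, the standard deviation, the size of the gain, and the two cutoffs are all hypothetical numbers, and scores are assumed to be normally distributed. Both groups gain exactly the same number of scale-score points, so the gap in scale scores does not change at all.

```python
from statistics import NormalDist


def pct_above(mean, sd, cutoff):
    """Percent of a normally distributed group scoring above the cutoff."""
    return 100 * (1 - NormalDist(mean, sd).cdf(cutoff))


def gap_change(low_mean, high_mean, gain, sd, cutoff):
    """Change in the (high minus low) gap in percent above the cutoff
    after BOTH groups gain the same number of scale-score points."""
    before = pct_above(high_mean, sd, cutoff) - pct_above(low_mean, sd, cutoff)
    after = (pct_above(high_mean + gain, sd, cutoff)
             - pct_above(low_mean + gain, sd, cutoff))
    return after - before


# Hypothetical values chosen only for illustration.
LOW_MEAN, HIGH_MEAN, SD, GAIN = 40.0, 60.0, 10.0, 5.0

# The scale-score gap is identical before and after the gain.
print("scale-score gap before:", HIGH_MEAN - LOW_MEAN)
print("scale-score gap after: ", (HIGH_MEAN + GAIN) - (LOW_MEAN + GAIN))

# The percent-above gap moves in opposite directions at the two cutoffs.
for label, cutoff in [("lower cutoff", 50.0), ("higher cutoff", 60.0)]:
    change = gap_change(LOW_MEAN, HIGH_MEAN, GAIN, SD, cutoff)
    print(f"{label} ({cutoff}): gap in percent above changes by {change:+.1f}")
```

Under these assumptions the gap in percent above the lower cutoff shrinks by roughly 6 percentage points, while the gap in percent above the higher cutoff grows by roughly 15, even though the two groups made identical progress in scale scores. The answer depends on where each group's scores sit relative to the cutoff, which is the artifact described above.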
1 Comment

We could use some discussion on what those other forms of reporting are.
