Last April, near the end of the year I spent observing a third-grade classroom at Bancroft Elementary School for a book I'm writing about school choice in Washington, D.C., I watched the students run off their lunch on the playground before a final afternoon of preparation for the city's high-stakes standardized exam, the DC-CAS.
Their teacher, Rebecca Lebowitz, sat next to me. I asked her what it felt like to teach in a system that placed such high stakes on a single test. "I think the emphasis we place on this test is out of proportion," she said as her students began lining up to head back inside. "But I also think as long as the teachers aren't overly nervous, the kids won't be, either. We're playing the game so they can be players in the game."
I was reminded of Ms. Lebowitz's observations yesterday, when the District of Columbia's parallel school systems - its centralized district of neighborhood schools, and its decentralized network of charter schools - issued parallel press releases in which each side touted its own rise in DC-CAS scores.
These pronouncements of victory raise an essential question, and it's one I think we have yet to sufficiently answer: When it comes to evaluating the overall health of a school - whether you're a prospective parent or a state agency - which data is most relevant, and why?
Since the start of the century, federal policy in America has provided a clear answer: what matters most are a child's scores on standardized exams in reading and math. Schools and states have since adjusted their schedules and priorities accordingly, resulting in a modern landscape of public education in which many children experience daily deep dives into the intricacies of numbers and letters - and barely skim the surface of everything else.
The ongoing willingness of policymakers - and, by extension, the general public - to judge schools based on this single metric of success is one of the more surreal features of modern American school reform. By comparison, much of the private sector has moved away from using net income as a company's sole benchmark, and many businesses have adopted a "balanced scorecard" approach that features both financial and non-financial metrics, and both inputs and outcomes. In doing so, these businesses have rightly heeded the 1976 warnings of social psychologist Donald Campbell, who said: "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
This insight, which has come to be known as "Campbell's Law," does not mean measurement has no place in organizational improvement. It does mean, however, that if policymakers are serious about evaluating whether or not schools are successful, they need to go a lot deeper than reading and math scores. As the Fordham Institute's Kathleen Porter-Magee puts it, "If we value learning in other areas, we need to measure it. And that doesn't mean simply adding testing hours but rather being more deliberate and creative about the assessments we administer and the content they measure."
What, then, should we measure, and how? And what role should standardized tests continue to have in our efforts to transform American public education?
On the second question, I defer to Harvard's Daniel Koretz. For years, Koretz has been researching the effects of high-stakes testing programs on how teachers teach - and students learn. In 2008, he shared his insights in the book Measuring Up: What Educational Testing Really Tells Us. "Careful testing can give us tremendously valuable information about student achievement that we would otherwise lack," he says. The question is how well we understand what standardized tests can, and cannot, tell us about American schools. "There is no optimal design," he asserts. "Rather, designing a testing program is an exercise in trade-offs and compromise - and a judgment about which compromise is best."
"Critics who ignore the impact of social factors on test scores miss the point," Koretz argues. "The reason to acknowledge their influence is not to let anyone off the hook but to get the right answer. Certainly, low scores are a sign that something is amiss. . . . But the low scores themselves don't tell why achievement is low and are usually insufficient to tell us where instruction is good or bad, just as a fever by itself is insufficient to reveal what illness a child has. Disappointing scores can mask good instruction, and high scores can hide problems that need to be addressed."
More than sixty years ago - 1951, to be precise - the University of Iowa's E.F. Lindquist argued for similar caution. Like Koretz, Lindquist was a researcher on core issues of educational assessment; unlike Koretz, he was arguably the person most responsible for fostering the development and use of standardized tests in the United States, having helped design not just several of Iowa's state assessments, but also the ACT, GED, and National Merit Scholarship test. He and his colleagues even invented the first optical scanner for scoring tests - an innovation with cotton gin-like implications for the exponential spread of standardized testing in the decades that followed.
Lindquist was, in other words, about as far from "anti-testing" as you could be. Yet he also understood a fundamental principle about assessment, and about teaching and learning itself. "The only perfectly valid measure of the attainment of an educational objective," he wrote, "would be one based on direct observation of the natural behavior of individuals."
Lindquist's point relates back to the first question - what else should we measure, and how? And on that topic I defer to veteran educator Ron Berger, who says, "To build a new culture, a new ethic, you need a focal point - a vision - to guide the direction for reform. The particular spark I try to share as a catalyst is a passion for beautiful student work and developing conditions that can make this work possible."
"I have a hard time thinking about a quick fix for education," Berger explains in his book An Ethic of Excellence, "because I don't think education is broken. Some schools are very good; some are not. Those that are good have an ethic, a culture, which supports and compels students to try and to succeed. Those schools that are not need a lot more than new tests and new mandates. They need to build a new culture and a new ethic."
To build a new ethic at a school - whether it's a new charter school or an aging neighborhood school - one must begin somewhere. And Berger believes high-quality student work (as opposed to high-rising student scores) is the logical place to start. "Work of excellence is transformational," he writes. "Once a student sees that he or she is capable of excellence, that student is never quite the same. We can't first build the students' self-esteem and then focus on their work. It is through their own work that their self-esteem will grow. If schools assumed they were going to be assessed by the quality of student behavior and work evident in the hallways and classrooms - rather than on test scores - the enormous energy poured into test preparation would be directed instead toward improving student work, understanding, and behavior. And so instead of working to build clever test-takers, schools would feel compelled to spend time building thoughtful students and good citizens."
In sum, having a debate about whether data is good or bad misses the larger point; what matters is which data schools are using, and to what end. Even John Dewey, the founder of the Chicago Lab School and the man who is generally considered to be the father of progressive education, believed data was an essential tool adults should use to make informed decisions that would support the development of the children in their charge. But when Dewey spoke about "data," he understood it less as a proxy for a single skill, and more as a reflection of its original Latin meaning: "something given."
So it's instructive that test scores have gone up in the District of Columbia - and that the increase tells us so little about the overall health of our city's schools. We talk about test scores because we're still not sure how to talk about - and measure - anything else. We pronounce victory because it's how we, as Rebecca Lebowitz put it, play the game of modern school reform. And until we develop the collective capacity Ron Berger speaks about - of elevating and evaluating high-quality, challenging, relevant, engaging, experiential student work - teachers like Rebecca Lebowitz will keep doubling down on reading and math, and the rest of us will keep wondering why the change we seek continues to elude our collective grasp.
Follow Sam on Twitter.