Through the lens of social science, eduwonkette takes a serious, if sometimes irreverent, look at some of the most contentious education policy debates. (Find eduwonkette's complete archives prior to Jan. 6, 2008 here.)

January 23, 2009

Wish #2: The End of Proficiency Only Accountability Systems

The No Child Left Behind Act may represent the largest threshold-based government accountability system in the country. Schools are evaluated not by how much progress students make, but by their success in pushing students over the proficiency bar. By now, you’re probably familiar with the discontents of this system: states can game it by setting that proficiency bar low; some schools have triaged their students, essentially reallocating resources to the kids most likely to become proficient in the very short term; and policymakers can make misleading claims about declining racial achievement gaps based on proficiency rates, even as these gaps are unchanged or growing.

Proficiency-based accountability systems leave us in a terrible spot. On the one hand, we want to push kids and raise the bar for proficiency. But on the other hand, we want to make sure that the lowest performing students aren’t kicked to the curb. The higher you raise that bar, the more likely you are to have a significant proportion of students in any given school below proficiency. And those are precisely the conditions under which it makes sense for educators to allocate their time and attention strategically.

All of this, of course, should have been expected in a system focused on proficiency rather than growth. And contrary to popular belief, NCLB's growth model pilot doesn't allow true value-added models; it is instead based on a “projection model,” which requires all students to reach a fixed proficiency target regardless of their initial achievement levels.
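To make the distinction concrete, here is a minimal sketch contrasting a projection-style check with a plain growth measure. It assumes a vertically linked scale score, a fixed proficiency cut of 650, a three-year window, and straight-line extension of each student's most recent one-year gain; those numbers, the function names, and the projection rule are all hypothetical, not the pilot's actual specification.

```python
# Hypothetical values for illustration only; not any state's actual rules.
PROFICIENCY_CUT = 650  # fixed target on an assumed vertically linked scale
WINDOW_YEARS = 3       # assumed projection window


def growth(prior: float, current: float) -> float:
    """Growth measure: how many scale-score points the student gained this year."""
    return current - prior


def on_track(prior: float, current: float,
             cut: float = PROFICIENCY_CUT, years: int = WINDOW_YEARS) -> bool:
    """Projection-style check: extend this year's gain in a straight line and
    ask whether the student reaches the same fixed cut within the window."""
    return current + growth(prior, current) * years >= cut


if __name__ == "__main__":
    # Two students with identical 30-point gains but different starting points.
    for prior, current in [(520, 550), (580, 610)]:
        print(prior, current, growth(prior, current), on_track(prior, current))
```

Both students gain 30 points, but only the one who started closer to the cut counts as on track, because the target is the same no matter where a student began.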

What am I suggesting? The new Department of Education would do well to let states experiment with a few different accountability systems:

1) Dump proficiency altogether and identify schools as in need of improvement based on whether they are making less growth than expected. In other words, drop NCLB’s arbitrary targets and evaluate schools based on how they are doing compared to the schools we already have.

2) Keep proficiency around, but focus improvement efforts on schools that are both low-growth and low-proficiency, not relative to an arbitrary standard, but perhaps those in the bottom 15% of both categories. (That number should be set based on the number of schools to which states can provide targeted support.)
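The second option amounts to a simple two-way screen. The sketch below assumes each school already has a growth estimate and a proficiency rate (how those are computed is exactly the hard part discussed next); the School type, the field names, and the tie rule (schools at or below the cut are flagged, so ties can add a school or two) are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    growth: float       # assumed school-level growth estimate
    proficiency: float  # assumed percent of students at or above proficient


def bottom_cut(values: list[float], share: float) -> float:
    """Value at or below which roughly `share` of schools fall (at least one school)."""
    ordered = sorted(values)
    k = max(1, round(len(ordered) * share))
    return ordered[k - 1]


def flag_low_growth_low_proficiency(schools: list[School], share: float = 0.15) -> list[str]:
    """Flag schools in the bottom `share` on BOTH growth and proficiency."""
    growth_cut = bottom_cut([s.growth for s in schools], share)
    prof_cut = bottom_cut([s.proficiency for s in schools], share)
    return [s.name for s in schools
            if s.growth <= growth_cut and s.proficiency <= prof_cut]
```

Here `share` is the policy lever: a state would set it according to how many schools it can actually support.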

Either of those options would require significant new investments in better tests that are designed to measure growth, and careful attention to building a value-added model that is both valid and reliable. New Yorkers know well that a poorly designed value-added model at the center of the Progress Reports wreaks more havoc than no value-added model at all.

My recommendations will surely fail to impress the “no excuses” crowd (or, more aptly, the “nuke the system” crowd, my belated entry into Elizabeth Green’s name-the-reformer contest), who see anything short of “100% proficiency” as not radical enough. “No excuses” is great rhetoric, but in the end it’s just that. So my wish #2 is that we move past this bravado in the next four years and develop a more reasonable and effective way of identifying low-performing schools and helping them improve.

PS: Check out Richard Rothstein's related op-ed, Getting Accountability Right, which speaks to Wish #4 (integrating a broad set of goals of public schooling into accountability systems).

January 13, 2009

Lies, Damned Lies, and Bush Administration Accomplishments

Yesterday, President George W. Bush, as part of his swan song, released a compendium entitled “Policies of the Bush Administration 2001-2009.” Not surprisingly, No Child Left Behind is the centerpiece of the administration’s accomplishments in K-12 education, and the fact sheets detail the administration’s claims about progress.

Skoolboy’s favorite section is the one on Reading First. You remember Reading First, don't you? The program whose interim impact evaluation, sponsored by the Institute of Education Sciences, found no evidence of effects on reading comprehension test scores in grades 1 to 3? The text reads:

NCLB established the principle that Federal funding should be invested in programs that have rigorous research demonstrating their effectiveness. Reading First has provided more than $6 billion to fund scientifically-based instructional programs, valid and reliable diagnostic assessments, and professional development for teachers. State data shows that Reading First students from nearly every grade and subgroup have made impressive gains in reading proficiency. For first grade, 44 of 50 States reported increases in the percentage of students proficient in reading comprehension; for second grade, 39 of 52 States reported improvement; and for third grade, 27 of 35 States reported improvement.

“52 States”? Maybe we should have invested a bit more in Math First.

Okay, cheap shot: there are 54 state education agencies (SEAs) that received funds under Reading First, including American Samoa, the Bureau of Indian Education, the District of Columbia, and the Virgin Islands.

But seriously: how did “44 of 50 States” report increases in the percentage of first-grade students proficient in reading comprehension when, according to the American Institutes for Research compilation of Reading First Annual Performance Reports for 2003-2007, only 40 of the 54 SEAs even reported reading comprehension proficiency for first-grade students for two or more years?

December 4, 2008

Bubble, Pony, or Lone Star?: Portraits of the Secretary of Education

A month ago, Flypaper asked us to come up with appropriately silly backdrops for Margaret Spellings' portrait, which will be unveiled on December 18th.

All you lame-duck Department of Education staffers: here's something to post on the water cooler tomorrow morning. Enjoy.

September 30, 2008

No Child Left Behind: Looking Back, Looking Forward

I'm knee-deep in old NCLB documents, and ran across the Department of Education's NCLB song. NCLB represented not only a major shift in federal education policy, but also an embrace of policy/PR boosterism that's enough to make all of us giggle (remember Armstrong Williams?). From 2002, here are the NCLB lyrics:

We're here to thank our president,
For signing this great bill,
That's right! Yeah,
Research shows we know the way,
It's time we showed the will!
No matter how catchy the ditty, a song can't carry a fundamentally flawed law. That's where Tom Toch and Doug Harris come in. They've penned a thoughtful commentary in this week's Ed Week about the future of NCLB (Salvaging Accountability). It's an important one, because it recognizes that NCLB conflates the school's contribution to student learning with what students bring to the school to begin with. Essentially the argument is that:

1) "It’s critical in any accountability system that the metrics used to judge performance reflect accurately the contributions of those being judged."

2) "As a measure of school performance, however, [the NCLB] snapshot strategy is flawed. Because student populations vary greatly from school to school, and because family income, parental education, and a host of other non-school-related factors have a major influence on students’ learning, some schools have to improve student achievement a lot more than others to get their students up to state standards. The federal law is unforgiving of such schools. As a result, it gives an unfair advantage to schools with students from privileged backgrounds, and it fails to measure what matters most: how much students learn during the school year."

3) The Department of Education's Growth Model Pilot offers little improvement over the current rating system because it relies on a projection model (i.e., are students on target to be proficient within a 3-year window?) rather than a true growth model.

4) The new NCLB should dump the projection model and focus its sanctions on schools that are low both in terms of their growth and in terms of their proficiency. And there's no reason to wait for reauthorization: this could all happen through regulation.

No commentary can do it all, so here are some issues to ponder for their next round. The goal of Toch and Harris' proposed system is to make measurement of school performance a more fair and effective enterprise. Why not take the leap and dump 100% proficiency altogether? That way, we could narrowly tailor our sanctions to schools that are low-performing compared to the schools we already have.

And if we're going to go full throttle on value-added models, we can't just punt the measurement problems. For example, Toch and Harris write, "value-added calculations have larger margins of error than NCLB’s proficiency ratings, but because they measure what’s most important in judging schools—student learning gains—their statistical shortcomings are more than worth tolerating."

A poorly designed growth model is no better than the poorly designed proficiency model that we have now, and no one knows this better than New Yorkers. A value-added system in which a school's measures from one year bear literally no relationship to its measures from the next is still bad public policy. In short, beware the silver bullet.

September 23, 2008

What Does Educational Testing Really Tell Us? An Interview with Daniel Koretz

Daniel Koretz, a professor who teaches educational measurement at the Harvard Graduate School of Education, generously agreed to field a few questions about educational testing. He is the author of Measuring Up: What Educational Testing Really Tells Us.

EW: What are the three most common misconceptions about educational testing that Measuring Up hopes to debunk?

DK: There are so many that it is hard to choose, but given the importance of NCLB and other test-based accountability systems, I'd choose these:
* That test scores alone are sufficient to evaluate a teacher, a school, or an educational program.

* That you can trust the often very large gains in scores we are seeing on tests used to hold students accountable.

* That alignment is a cure-all - that more alignment is always better, and that alignment is enough to take care of problems like inflated scores.

EW: I'm intrigued by your third point about alignment. For example, we often hear that because state testing systems are directed towards a particular set of standards, we should primarily be concerned with student outcomes on tests aligned with those standards. This is the common refrain about a "test worth teaching to." What's missing from this argument?

DK: Up to a point, alignment is clearly a good thing: we want clarity about goals, and we want both instruction and assessment to focus on the goals deemed most important.

However, there are two flies in the ointment. The first is that the achievement tests we are concerned with, no matter how well aligned, are small samples from large domains of performance. That means that most of the domain, including much of the content and skills relevant to the standards, is necessarily omitted from the test. As I explain in Measuring Up, this is analogous to a political poll or any other survey, and it is not a big problem under low-stakes conditions. Under high-stakes conditions, however, there is a strong incentive to focus on the sampled content at the expense of the omitted material, which causes score inflation. Aligned tests are not exempt. Score inflation does not require that the test include poorly aligned content. Even if the test is right on target, inflation will occur if the accountability program leads people to deemphasize other material that is also important for the conclusions based on scores. And to make this concrete: some of the most serious examples of score inflation in the research literature were found in Kentucky's KIRIS system, which was a standards-based testing program.

The second problem is predictability. To prepare students in a way that inflates scores, you have to know something about the test that is coming this year, not just the ones you have seen in the past. The content, format, style, or scoring of the test has to be somewhat predictable. And, of course, it usually is, as anyone who has looked at tests and test preparation materials should know. Carried too far, alignment actually makes this problem worse, by focusing attention on the particular way that knowledge and skills are presented in a given set of standards. Think about 'power standards,' 'eligible standards,' and 'grade level expectations,' all of which can be labels for narrowing in on the specifics of how a set of skills appears on one state's particular assessment.

Why is this bad? Because many of those specifics are not relevant to the students' broader competence and long-term well-being. Scores on a test are a means to an end, not properly an end in themselves. Education should provide students knowledge and skills that they can use in later study and in the real world. Employers and university faculty will not do students the favor of recasting problems to align with the details of the state tests with which they are familiar. As Audrey Qualls said some years ago: real gains in achievement require that students can perform well when confronted with "unfamiliar particulars." Improving performance on the familiar but not the unfamiliar is score inflation.

EW: What are the implications of score inflation for both measuring and attenuating achievement gaps? Because schools serving disadvantaged students face more pressure to increase test scores via the mechanisms you describe, I worry that true achievement gaps may be unchanged - or even growing - while they appear to be closing based on high-stakes measures.

DK: I share your worry. I have long suspected that on average, inflation will be more severe in low-achieving schools, including those serving disadvantaged students. In most systems, including NCLB, these schools have to make the most rapid gains, but they also face unusually serious barriers to doing so. And in some cases, the size of the gains they are required to make exceeds by quite a margin what we know how to produce by legitimate means. This will increase the incentive to take shortcuts, including those that will inflate scores. This would be ironic, given that one of the primary rationales for NCLB is to improve equity. Unfortunately, while we have a lot of anecdotal evidence suggesting that this is the case, we have very few serious empirical studies of this. We do have some, such as the RAND study that showed convincingly that the "Texas miracle" in the early 1990s, supposedly including a rapid narrowing of the achievement gap, was largely an illusion. Two of my students are currently working with me on a study of this in one large district, but we are months away from releasing a reviewed paper, and it is only one district.

I have argued for years that one of the most glaring faults of our current educational accountability systems is that we do not sufficiently evaluate their effects, instead trusting, despite evidence to the contrary, that any increase in scores is enough to let us declare success. We should be doing more evaluation not only because it is needed for the improvement of policy, but also because we have an ethical obligation to the children upon whom we are experimenting. Nowhere is this failure more important than in the case of disadvantaged students, who most need the help of education reform.

Inflation is not the only reason why we are not getting a clear picture of changes in the achievement gap. The other is our insistence on standards-based reporting. As I explain in Measuring Up, relying so much on this form of reporting has been a serious mistake for a number of reasons. One reason is that if one wants to compare change in two groups that start out at different levels - poor and wealthy kids, African American and white kids, whatever - changes in the percents above a standard will always give you the wrong answer. This particular statistic confuses the amount of progress a group makes with the proportion of the group clustered around that particular standard, and the latter has to be different for high- and low-scoring groups. I and others have shown that this distortion is a mathematical certainty, but perhaps most telling is a paper by Bob Linn that shows that if you ask whether the achievement gap has been closing, NAEP will give you different answers - very different answers - depending on whether you use changes in scale scores, changes in percent above Basic, or changes in percent above Proficient. This is not because the relative progress has been different at different levels of performance; it is simply an artifact of using percents above standards. This is only one of many problems with standards-based reporting, but in my opinion, it is by itself sufficient reason to return to other forms of reporting.
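Koretz's point can be checked with a few lines of arithmetic. The sketch below assumes two groups whose scores are normally distributed with the same spread, one mean a full standard deviation below the other, and gives both groups exactly the same 0.2 SD gain. The group means, the gain, and the two cut points are hypothetical, and this is an illustration of the artifact, not a reproduction of Linn's NAEP analysis.

```python
from statistics import NormalDist

GAIN = 0.2  # identical gain for both groups, in SD units (hypothetical)
GROUP_MEANS = {"higher-scoring": 0.0, "lower-scoring": -1.0}  # hypothetical
CUTS = {"low cut": -1.0, "high cut": 1.0}  # hypothetical standards


def pct_above(mean: float, cut: float, sd: float = 1.0) -> float:
    """Percent of a normal distribution scoring above the cut."""
    return 100 * (1 - NormalDist(mean, sd).cdf(cut))


def change_in_pct_above(mean: float, cut: float, gain: float = GAIN) -> float:
    """Change in percent above the cut when the whole group gains `gain`."""
    return pct_above(mean + gain, cut) - pct_above(mean, cut)


for cut_label, cut in CUTS.items():
    changes = {g: round(change_in_pct_above(m, cut), 1) for g, m in GROUP_MEANS.items()}
    print(cut_label, changes)
```

Both groups improve by the same amount on the scale, yet at the low cut the lower-scoring group appears to gain more (the gap seems to close), while at the high cut it appears to gain less (the gap seems to widen). The only thing that differs is where the standard sits relative to each distribution.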

September 17, 2008

Between a Political Rock and a Statistical Hard Place

Some days, skoolboy feels bad for the hard-working folks in the New York City Department of Education. They’re caught between a political rock and a statistical hard place. The political rock is the New York State accountability system, which complies with No Child Left Behind’s requirements to test students annually in grades 3-8 in Mathematics and English Language Arts and to classify students, based on their test scores, into one of four levels: Not Meeting Learning Standards (Level I), Partially Meeting Learning Standards (Level II), Meeting Learning Standards (Level III), or Meeting Learning Standards with Distinction (Level IV). The performance of students, and of subgroups of students, is then aggregated to assess the school’s progress toward the goal of 100% proficiency for all students by 2014. The mechanism for this is a series of grade-specific exams, with a broad (but arbitrary, as Dan Koretz explains in Measuring Up) standard-setting process that defines the scores on each exam that correspond to the four proficiency levels. Whatever a student’s scale score on the exam, he or she is classified into a particular proficiency level.

The statistical hard place is that the proficiency levels are only part of the story. The NYC DOE has found that the scale scores matter: a student whose scale score is halfway between the cutoffs for Level II and Level III, and who is therefore classified as Level II, has a higher probability of graduating from high school on time than a student whose scale score is right at the cutoff for Level II. The scale scores have predictive validity (that is, they predict educational outcomes that we think of as important), but they don’t have the political currency of the proficiency levels specified by the state and the federal government.

There’s no evidence, to skoolboy’s knowledge, that the proficiency levels on NCLB-style exams have any predictive validity over and above the scale scores on which they are based. (Another regression discontinuity design study waiting to happen.) But I’ll wager that they don’t.

Whether or not the state/NCLB proficiency levels matter, the NYC DOE is stuck. They have to pay homage to the state standards, even though their internal evidence shows that partial progress—“learning quite a bit,” in skoolboy’s terms—really does matter for students’ futures, and therefore is something that schools should be held accountable for.

And I don’t disagree. I would be comfortable (though not ecstatic) with school progress reports that used changes in scale scores to quantify how much students had learned from one year to the next, under two conditions: (a) if the exams were vertically linked, and (b) if the uncertainty in the estimates of school-level effects on the average change were taken into account. Neither of these conditions is met in the current New York City School Progress Reports.

Navigating the political rock and the statistical hard place is definitely a challenge, both rhetorically and in the construction of the School Progress Reports. Rhetorically, the DOE is obliged to argue that a student who is Level III in fourth grade and Level II in fifth grade has lost ground—that student has fallen off of the sharp Level III cliff—because the state and federal accountability metrics treat this as a sharp discontinuity. But as a practical matter, the student may not have fallen off a cliff; rather, she may be just a little bit lower on a gradual hill in fifth grade than we’d like, but still higher on the hill than she was in fourth grade--and the DOE’s internal analyses document that anyone who is higher on the hill is better off than someone lower.
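The cliff-versus-hill problem comes down to how a continuous scale score is turned into a level. The sketch below assumes a vertically linked scale (which, as noted above, the current exams are not) and hypothetical grade-specific cut scores; the numbers only show the mechanics and are not New York's actual cuts.

```python
from bisect import bisect_right

# Hypothetical minimum scale scores for Levels II, III, and IV, by grade,
# on an assumed vertically linked scale.
CUTS_BY_GRADE = {
    4: [600, 650, 700],
    5: [610, 660, 710],
}


def proficiency_level(scale_score: float, grade: int) -> int:
    """Map a scale score to Level 1-4; a score exactly at a cut earns that level."""
    return 1 + bisect_right(CUTS_BY_GRADE[grade], scale_score)


if __name__ == "__main__":
    fourth, fifth = 655, 658  # she is higher on the hill in fifth grade...
    print(proficiency_level(fourth, 4), proficiency_level(fifth, 5))  # ...but drops a level
```

The student gains 3 points yet falls from Level III to Level II because the fifth-grade bar is higher; the information about how much she learned lives in the scale scores, not the levels.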

What’s the DOE to do? Well, it could continue to escalate the rhetoric directed toward its critics. (I note with alarm that the DOE went from calling me by my blogging name “skoolboy” on Monday to calling me “Professor Pallas of Teachers College” on Wednesday, a professor whose proclivity for giving A’s to all of his students will come as a surprise to many of them. What’s next? Examining my teeth?) Or it could speak honestly and openly about the challenge of incorporating political and technical realities into the School Progress Reports. I think readers know which path skoolboy recommends.

September 12, 2008

Cool People You Should Know: Doug Downey

To many observers of public education, there is no doubt about which schools are failing - it's the schools with low rates of students passing state tests, stupid!

Of course, this assumes that students' achievement is a direct measure of school quality. "Yet we know that this assumption is wrong....It follows that a valid system of school evaluation must separate school effects from nonschool effects on children's achievement and learning" writes Doug Downey, a cool Ohio State sociologist of education you should know, in his recent paper (in collaboration with Paul von Hippel and Melanie Hughes), "Are 'Failing' Schools Really Failing?"

Analyzing data from the Early Childhood Longitudinal Study, Kindergarten Cohort (a national sample of 21,000 kindergarteners who were then followed through 5th grade), Downey and colleagues set out to isolate the effects of schools on student learning. The ECLS data are uniquely suited for this task because the study evaluated students in the fall and spring of kindergarten, and again in the fall and spring of first grade. It turns out that summers, a time when students are affected only by non-school influences, are the key to teasing apart school and nonschool factors.

Downey and colleagues look at schools' effectiveness in four different ways. First, they examine NCLB's method: overall test score levels. They then turn to 12-month learning rates; think growth models, which measure test score growth, for example, between a test given in April 2007 and a test given in April 2008. They contrast those rates with 9-month learning rates; imagine a test given in September, and then again in May. Finally, they introduce a measure called impact, which is the difference between the school-year and summer learning rates.

"Impact" is attractive because it doesn't require us to measure and statistically control for all of the different aspects of children's nonschool environments that may affect school success, as cardiac surgery report cards do for patients. It captures what we need to know about students' out-of-school environments without bogging us down in the methodological and political problems associated with introducing these controls. It also helps us adjust for "soft" factors like innate student motivation, which are difficult to measure and control for. Moreover, it holds schools harmless for what happens to their students over the summer, which currently serves as a confounding factor in growth models.

What percentage of schools in the bottom 20% on overall achievement are actually in the bottom 20% on measures of impact and learning? Less than half! High-achieving schools are concentrated in more affluent communities, but "high impact" schools exist across the socioeconomic spectrum. The reverse also holds: there are plenty of schools with good test scores that are skating by simply because they had advantaged kids to begin with.
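The four measures, and the bottom-20% comparison, can be sketched as follows. This is a simplified illustration, not Downey and colleagues' estimation method: it assumes tests fall exactly at the start and end of each school year, a 9-month school year and 3-month summer, and school-average scores on a common scale. The Scores type, the function names, and the example inputs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """School-average scores on an assumed common scale."""
    spring_k: float  # end of kindergarten
    fall_1: float    # start of first grade
    spring_1: float  # end of first grade


def measures(s: Scores, school_months: float = 9.0, summer_months: float = 3.0) -> dict[str, float]:
    school_rate = (s.spring_1 - s.fall_1) / school_months  # 9-month learning rate
    summer_rate = (s.fall_1 - s.spring_k) / summer_months  # non-school learning
    return {
        "level": s.spring_1,
        "12-month rate": (s.spring_1 - s.spring_k) / (school_months + summer_months),
        "9-month rate": school_rate,
        "impact": school_rate - summer_rate,  # school-year minus summer rate
    }


def bottom_set(values: dict[str, float], share: float = 0.20) -> set[str]:
    """Names of the schools in the bottom `share` on one measure."""
    k = max(1, round(len(values) * share))
    return set(sorted(values, key=values.get)[:k])


def overlap_share(by_level: dict[str, float], by_impact: dict[str, float],
                  share: float = 0.20) -> float:
    """Fraction of bottom-`share` schools by level that are also bottom by impact."""
    low_level = bottom_set(by_level, share)
    return len(low_level & bottom_set(by_impact, share)) / len(low_level)
```

A school whose students gain a lot over the summer gets credit for it under the 12-month rate but not under impact, which subtracts the summer rate; that is the out-of-school advantage the measure is designed to strip out.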

What does this all mean for NCLB? Downey and colleagues put it like this:
Our results raise serious concerns about the current methods that are used to hold schools accountable for their students' achievement levels. Because achievement-based evaluation is biased against schools that serve the disadvantaged, evaluating schools on the basis of achievement may actually undermine the NCLB goal of reducing racial/ethnic and socioeconomic gaps in performance. If schools that serve the disadvantaged are evaluated on a biased scale, their teachers and administrators may respond like workers in other industries when they are evaluated unfairly - with frustration, reduced effort, and attrition. Under a fair system, a school's chances of receiving a high mark should not depend on the kinds of students the school happens to serve.
Crystal-clear, creative thinking is the distinguishing feature of Downey's work; see, for example, his paper on school effects on child obesity, or his paper asking if schools are "the great equalizer."

Wonks can rest a little easier tonight with the knowledge that Downey's now turned his attention to NCLB.

Schools Restructuring under NCLB: Blow ‘em up Good?

This morning, the Center on Education Policy in Washington, DC is issuing the latest in a series of state-level reports on the fate of schools restructuring under NCLB policy. Today’s report, authored by Brenda Neuman-Sheldon (a one-time student of skoolboy’s, but I hear that she’s back on solid food), examines restructuring schools in Maryland. In 2007-08, Maryland had 38 schools in restructuring planning, a huge increase over the four schools the preceding year, and 64 schools in restructuring implementation, a 7% decline from the preceding school year. The restructuring schools are concentrated in a small number of Maryland’s 24 school districts, with 61% of the restructuring schools in Baltimore City, and an additional 30% in Prince George’s County, which adjoins Washington, DC. This concentration has stretched the capacity of the state and these districts to support restructuring planning and implementation. Prince George’s County, for example, soared from one school in restructuring planning in 2006-07 to 21 in 2007-08.

Neuman-Sheldon identifies a major shift in the form that restructuring schools in Maryland is taking. Whereas 58% of the schools in restructuring implementation in 2007-08 relied primarily on the appointment of a school “turnaround specialist” as the engine of restructuring (already a decline from the 73% using this option in 2005-06), all of the schools in restructuring planning that had submitted a plan at the time the report was written were proposing some form of “zero-based staffing”—i.e., replacing most or all of the staff in the school or asking all staff to reapply for their positions. It’s the neutron bomb theory of school reform!

But is it a good theory? That remains to be seen. What mechanism will bring highly-qualified teachers to these failing schools? Where will the tenured teachers who leave the schools go? In schools that replace only some of their staff, how will decisions about who stays and who leaves be made?

Beyond these logistical questions, though, lies another fundamental challenge: will changing the staffing, including the principals, fundamentally alter the context for teaching and learning in the school when other powerful forces shaping teaching and learning aren’t changing at all? (Principals, Neuman-Sheldon reports, are often surprised to learn that when they select zero-based staffing as an option, they’re placing their own jobs on the line.)

September 9, 2008

Lessons for No Child Left Behind from "No Cardiac Surgery Patient Left Behind"

New AYP numbers are out, folks. In California, only 48% of schools made AYP, and only 34% of middle schools did so. In Missouri, only about 40% of schools made AYP. Pick almost any state, and you'll see that there are soaring numbers of schools designated as "in need of improvement." With numbers like these, it's worth considering whether NCLB's measurement apparatus is accurately identifying "failing schools."

One way to get leverage on this question is to consider how other fields approach the issue of accountability. Doctor and hospital accountability for cardiac surgery, also the topic of a NYT commentary today, is instructive in this regard. Borrowing heavily from previous work, let me outline how state governments have approached doctor and hospital accountability in medicine. In subsequent posts this week, I'll write about the outcomes of medical accountability systems, as well as some of their unintended consequences.

Medicine makes use of what is known as “risk adjustment” to evaluate hospitals’ performance. Since the early 1990s, states have rated hospitals performing cardiac surgery in annual report cards. The idea is essentially the same as using test scores to evaluate schools’ performance. But rather than reporting hospitals’ raw mortality rates, states “risk adjust” these numbers to take patient severity into account. The idea is that hospitals caring for sicker patients should not be penalized because their patients were sicker to begin with.

In practice, risk adjustment means that mortality is predicted as a function of dozens of patient characteristics. These include a laundry list of factors outside the hospital’s control that could affect a patient’s outcomes: the patient’s other health conditions, demographic factors, lifestyle choices (such as smoking), and disease severity. This prediction equation yields an “expected mortality rate”: the mortality rate that would be expected given the mix of patients treated at the hospital.

While the statistical methods vary from state to state, the crux of risk adjustment is a comparison of expected and observed mortality rates. In hospitals where the observed mortality rate exceeds the expected rate, patients fared worse than they should have. These “adjusted mortality rates” are then used to make apples-to-apples comparisons of hospital performance.
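The observed-versus-expected comparison can be sketched in a few lines. This assumes each patient's predicted risk of death has already come out of a prediction equation like the one described above (fitting that model is the hard part and is not shown), and it expresses the result in one common way: the observed-to-expected ratio multiplied by the statewide mortality rate. The function name and the example hospitals are hypothetical.

```python
def risk_adjusted_rate(deaths: int, predicted_risks: list[float],
                       statewide_rate: float) -> dict[str, float]:
    """Compare observed deaths with the deaths expected given the hospital's patient mix."""
    n = len(predicted_risks)
    expected = sum(predicted_risks)  # expected deaths = sum of patients' predicted risks
    return {
        "observed rate": deaths / n,
        "expected rate": expected / n,
        "O/E": deaths / expected,
        "risk-adjusted rate": deaths / expected * statewide_rate,
    }


if __name__ == "__main__":
    STATEWIDE = 0.02  # hypothetical statewide mortality rate
    sicker = risk_adjusted_rate(4, [0.04] * 100, STATEWIDE)     # high-risk patients
    healthier = risk_adjusted_rate(2, [0.01] * 100, STATEWIDE)  # low-risk patients
    print(sicker)
    print(healthier)
```

The hospital treating sicker patients has the higher raw mortality rate (4% versus 2%) but performs just as expected, while the hospital with healthier patients has twice as many deaths as its patient mix predicts.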

Accountability systems in medicine go even further to reduce the chance that a good hospital is unfairly labeled. Hospitals vary widely in size, for example, and in small hospitals a few aberrant cases can significantly distort the mortality rate. So, in addition to the adjusted mortality rate, confidence intervals are reported to illustrate the uncertainty that stems from these differences in size. Only when these confidence intervals are taken into account are performance comparisons made between hospitals.
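Here is a minimal sketch of why interval width matters. It uses a simple normal approximation: under the prediction model, the variance of the observed death count is the sum of p(1 - p) over patients, and the interval for O/E is the observed count plus or minus z standard deviations, divided by expected deaths. States use a variety of methods (often exact intervals), so treat this as an illustration of the idea rather than any state's formula; the function names and example hospitals are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def oe_interval(deaths: int, predicted_risks: list[float],
                level: float = 0.95) -> tuple[float, float, float]:
    """Approximate confidence interval for the observed-to-expected (O/E) ratio."""
    expected = sum(predicted_risks)
    variance = sum(p * (1 - p) for p in predicted_risks)  # variance of deaths under the model
    z = NormalDist().inv_cdf(0.5 + level / 2)              # about 1.96 for 95%
    half_width = z * sqrt(variance)
    return ((deaths - half_width) / expected,
            deaths / expected,
            (deaths + half_width) / expected)


def classify(interval: tuple[float, float, float]) -> str:
    """Label a hospital only when its interval excludes O/E = 1."""
    low, _, high = interval
    if low > 1:
        return "worse than expected"
    if high < 1:
        return "better than expected"
    return "not distinguishable from expected"


if __name__ == "__main__":
    small = oe_interval(2, [0.02] * 50)     # O/E of about 2...
    large = oe_interval(80, [0.02] * 2000)  # ...same O/E, far more patients
    print(classify(small), classify(large))
```

Both hospitals have twice the expected deaths, but only the large one is labeled; for the small one, two deaths instead of one could easily be chance.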

Contrast this approach with that used by the New York City Department of Education's progress reports, where "point estimates" are used to array schools on an A-F continuum with no regard for measurement error. Readers know well that your friendly neighborhood "statistical nut" has no beef with the use of sophisticated statistical methods to compare schools. But I would just ask that we have some humility about what these methods can and cannot do. (Sidenote: The only winners when we ignore these issues are educational researchers, who can then write regression discontinuity papers using these data. Thanks for the publications, Joel and Mike!)

And it's quite eye-opening to compare the language state and federal governments use to explain their accountability systems with the rhetoric we hear in education. Consider this statement from the Department of Health and Human Services explaining the rationale behind risk adjustment:
The characteristics that Medicare patients bring with them when they arrive at a hospital with a heart attack or heart failure are not under the control of the hospital. However, some patient characteristics may make death more likely (increase the ‘risk’ of death), no matter where the patient is treated or how good the care is. … Therefore, when mortality rates are calculated for each hospital for a 12-month period, they are adjusted based on the unique mix of patients that hospital treated.
If you replace the word "hospital" with "school" above, you can imagine the reception this statement would receive in the educational accountability debate. Soft bigotry of low expectations, and you probably kill baby seals for fun, too.

Readers, why is the educational debate so different? Full disclosure: I will shamelessly appropriate your thoughts in my dissertation, which attempts to answer this question, and also establish the effects of each of these systems on race, gender, and socioeconomic inequalities in educational and health outcomes.

August 29, 2008

Why the Achievement Gap Matters

skoolboy has explained, much more eloquently than I can, why achievement gaps matter even if the scores of white, African-American, Hispanic, and Asian students are all rising equally:

There are a great many social institutions that sort and rank individuals on the basis of test scores and the competencies they represent. Most of these institutions don’t have an unlimited number of positions or slots—rather, individuals are competing against one another for access. When these institutions rely on test scores, and there is an achievement gap among racial/ethnic groups on these tests, the lower-scoring group will be underrepresented. Raising everybody’s scores doesn’t change the rankings of individuals, which is the only way to change the representation of minority groups among those who are selected. Only by reducing the achievement gap can we increase the chances that members of racial/ethnic minority groups can get ahead in society via selective social institutions.
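The key step in this argument, that raising everyone's scores leaves rankings (and therefore who gets selected) unchanged, can be checked in a few lines. The scores and the 25-point gain are hypothetical; the helper `select_top` is an illustrative name, not part of any real system.

```python
def select_top(scores, k):
    """Return the indices of the k highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


scores = [512, 478, 530, 495, 541, 460, 503, 520]  # hypothetical test scores
raised = [s + 25 for s in scores]                   # everyone gains 25 points

# Uniform gains leave the ranking, and therefore the selected group, unchanged.
print(select_top(scores, 3) == select_top(raised, 3))  # True
```

Only changing scores differentially, i.e., narrowing the gap between groups, can change who lands in the top slots.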

In K-12 and postsecondary education, examples of these selective processes abound. Kindergarteners are often placed in reading groups when they first arrive at school, students are selected to be part of gifted and talented or magnet programs, and even less selective colleges have limited capacity. The same processes apply in the workplace. Whether you are looking for a job at Walmart or Goldman Sachs, there are usually more applicants than positions.

Taking inspiration from the conclusion of skoolboy’s earlier analysis, I performed a basic simulation to demonstrate how black and Hispanic students fare in selection processes given existing NYC achievement gaps. I randomly generated 100,000 student scores, which roughly represents the size of a NYC cohort. In NYC schools, approximately 15% of students are Asian, 15% are white, 39% are Hispanic, and 31% are African-American. I generated these data to mirror the achievement gaps that exist in 8th grade math NAEP scores in NYC: the average African-American and Hispanic student scores .83 and .72 standard deviations below the average white score, respectively, and the average Asian student scores .28 standard deviations above it. I then asked what the racial composition of selected students would look like if we selected 5% of students, 10% of students, and so on.
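The simulation described here can be re-created roughly as follows. This is a minimal sketch, not the original code: it assumes each group's scores are normal with a within-group standard deviation of 1, uses the group shares and mean gaps stated above, and fixes an arbitrary random seed so results are reproducible.

```python
import random

# Group shares and mean gaps (in SD units relative to white students), as stated above.
GROUPS = {
    "Asian": (0.15, 0.28),
    "white": (0.15, 0.0),
    "Hispanic": (0.39, -0.72),
    "black": (0.31, -0.83),
}


def simulate(n=100_000, seed=1):
    """Draw n (score, group) pairs; assumes normal scores with SD 1 in every group."""
    rng = random.Random(seed)
    names = list(GROUPS)
    weights = [GROUPS[g][0] for g in names]
    students = []
    for _ in range(n):
        g = rng.choices(names, weights)[0]
        students.append((rng.gauss(GROUPS[g][1], 1.0), g))
    students.sort(reverse=True)  # highest scores first
    return students


def composition(students, share_selected):
    """Racial composition of the top `share_selected` fraction of students."""
    k = int(len(students) * share_selected)
    counts = {g: 0 for g in GROUPS}
    for _, g in students[:k]:
        counts[g] += 1
    return {g: c / k for g, c in counts.items()}


students = simulate()
for share in (0.05, 0.10, 0.25, 0.50):
    comp = composition(students, share)
    print(share, {g: round(v, 2) for g, v in comp.items()})
```

Sorting once and slicing at different cutoffs makes it cheap to ask the same question for 5%, 10%, and so on up to 50% of students.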

Let’s start with a relatively selective process. Imagine that we choose the top 10% of students in NYC for a gifted and talented program based solely on their test scores. The graph below shows the difference in the percentage of students that would be selected in each racial group compared to their representation in the NYC student population. Though NYC schools are only 30 percent white and Asian, 66 percent of selected students would be white or Asian under this scenario. Only 17% would be Hispanic and 10% would be black.

race_simulation2.jpg

Because of the achievement gap, black and Hispanic children are left behind in many selection processes that potentially - and significantly - affect their life chances. When we make these processes even more selective - think, for example, about admission to schools like Stuyvesant or Bronx Science - white and Asian students will be even more overrepresented, as the table below demonstrates.

And this is not only relevant to very selective institutions. Below, I progressively increase the share of students our program selects. Even if 50% of students were given a slot, white and Asian students would still represent 44% of selected students, though they are only 30% of the population.

race_simulation.jpg

In short, the achievement gap has real implications for black and Hispanic kids as they move through their educational careers: they are more likely to be placed in low ability groups in integrated schools, less likely to be selected for gifted and talented programs, and less likely to attend the college of their choice. And, I stress again, this applies to both very selective and relatively non-selective institutions. Gains in proficiency that do not also close achievement gaps on continuous measures do little to help black and Hispanic kids get ahead.

See previous takes from Mike Petrilli and Jay Mathews here.

A note on method: To keep this simple, I assumed that the scores of all groups follow a normal distribution with equal variances. Since achievement gaps are often larger at the top of the distribution than in the middle and bottom, white and Asian students likely have an even larger advantage than this simulation suggests.

August 27, 2008

Guest Blogger Bruce Fuller: The Benefits and Dilemmas of Centralized Accountability

Bruce Fuller, sociologist and professor of education and public policy at the University of California, Berkeley, has co-edited a new book, Strong States, Weak Schools: The Benefits and Dilemmas of Centralized Accountability. Below, he provides a Q&A on the book’s findings.

Q. Media reports summed up your findings by saying that teacher responses to the No Child Left Behind Act and state accountability efforts have been “haphazard”, and teachers are feeling demoralized. Didn’t we know this already?

A. We do know that teacher associations are eager to revamp No Child following the November elections, and even recraft Washington’s role in education. And the Bush Administration, business groups, and some civil rights advocates claim that No Child is working.

The seven research teams that came together to produce Strong States, Weak Schools set the stage by first showing that student achievement has inched up at a glacial pace since No Child was enacted in 2002, more slowly even than in the 1990s, when state-led accountability and school finance reforms were successfully pursued. Progress is more discernible in certain states.

But few researchers have hung out in schools, interviewed teachers and principals, and asked how front-line educators interpret new accountability regimes. This includes how teachers try to address state curricular standards, how they might use more textured data on what students are learning (or not), and the extent to which principals (and their district superintendents) motivate their teachers to focus on improving their pedagogies.

Earlier ethnographic studies tended to be conducted by scholars with a priori agendas, hoping to detail how teachers felt overly controlled by accountability measures, or how they held deep affection for them. Instead, our seven contributing teams probed different parts of the implementation elephant. Do front-line educators in elementary versus secondary schools hold different viewpoints? Do exit exams prompt different responses inside our high schools? Do the rules and tools of accountability programs operate differently to boost average student achievement, in contrast to factors that narrow racial gaps inside schools?

Q. So, does teacher resistance to top-down accountability programs help to explain the tepid gains in student test scores?

All seven teams found that teachers and principals have redoubled their efforts to assist low-performing students, in part because of accountability programs advanced from either state capitals or Washington. The spotlight placed on how student subgroups are doing, the availability of richer data on individual student competencies, and the threat of sanctions are motivating teachers to buckle down and collaborate to devise new pedagogical approaches and build stronger relationships with students.

Yet two factors constrain whether teacher responses are coordinated and effective over time. First, the RAND study, led by Laura Hamilton, found that the attention that teachers pay to curricular standards, whether they study student data, and the value they place on accountability pressures vary enormously within schools. The good news is that teachers in poor communities are neither more nor less responsive to accountability rules and tools than those in middle-class neighborhoods. The bad news is that teacher responses are highly variable and eclectic within schools. This suggests that relatively few principals motivate their staff to pull in the same direction and employ new training and data tools that accountability programs often support.

Second, the uneven leadership of district superintendents and the stickiness of school institutions – especially high schools – tend to disempower principals. Tom Luschei and Gayle Christensen probed deep into these dynamics, hanging out over time in a few districts. They found that district leaders often respond to accountability demands in ritualized fashion, failing to work intensively with their principals to mobilize rules and tools. Two studies of high school responses, appearing in Strong States, Weak Schools, detail how growth targets, program improvement triggers, and exit exams turn teacher attention to low-achieving adolescents. But these individual-level responses rarely lead to innovative structural change in balkanized high schools.

Q. What is working to motivate teachers and raise student achievement, then?

Two studies in the book offer insights here: Melissa Henne and Heeju Jang examined what worked in 111 California elementary schools as they variably succeeded in closing achievement gaps between Anglo and Latino students. They show that disparities narrow when teachers report that their principal motivates staff to focus on raising achievement and delivers tools that make everyone feel efficacious. This is not simply a mechanical process: more equitable schools have teachers who report strong, respectful relationships with their principal and colleagues.

And Soung Bae went deeper into a California school district that had narrowed ethnic achievement gaps over time. She discovered district leaders who banked heavily on in-service teacher training, hammering on state curricular standards and inventive pedagogies. Then, district staff followed teachers back into their classrooms to provide ample clinical follow-up.

Q. So, what do these implementation studies say to state and federal policy makers who will soon be debating changes in accountability programs?

Pay attention to what motivates teachers, who, like other professionals, seem eager to pursue shared goals if they are trusted to improve their craft. The link between district staff and principals appears to be key. If district leaders are simply messengers of government – with little agility in adapting to rules and mobilizing tools – then their principals will have less capacity to motivate their teachers.

Teachers do report enormous dissatisfaction, at least in California, Georgia, and Pennsylvania, in being forced to ignore certain subjects and topics if they do not appear on state tests. Somehow, policy makers must face the sharp-edged dilemma of simplifying tests and the curriculum, while recognizing that tying the hands of teachers may erode everyone’s motivation.

All seven empirical studies can be viewed here.

August 22, 2008

Yes, Beltway Wonks, Sampling Error Does Matter

It's in vogue these days to declare the building blocks of statistical inference irrelevant to assessing the performance of schools. For example, Joel Klein recently argued that statistical significance is "a game." Yesterday, Kevin Carey argued that accounting for sampling error - the idea that there is statistical uncertainty in measures from a sample rather than the full population - in the context of NCLB is "silly" because "unlike opinion polls, NCLB doesn't test a sample of students. It tests all students. The only way states can even justify using [margins of error] in the first place is with the strange assertion that the entire population of a school is a sample, of some larger universe of imaginary children who could have taken the test, theoretically."

Dan Koretz, Harvard psychometrician and author of Measuring Up: What Educational Testing Really Tells Us, provides a very clear explanation of why Carey is wrong:
A few readers might be wondering: if all students in a school (or at least nearly all) are being tested, where does sampling error come into play? After all, in the case of polls, sampling error arises because one has in hand the responses of only a small percentage of the people who will actually vote. This is not the case with most testing programs, which ideally test almost all students in a grade.

This question was a matter of debate among members of the profession only a few years ago, but it is now generally agreed that sampling error is indeed a problem even if every student is tested. The reason is the nature of the inference based on scores. If the inference pertaining to each school...were about the particular students in that school at that time, sampling error would not be an issue, because almost all of them were tested. That is, sampling would not be a concern if people were using scores to reach conclusions such as "the fourth-graders who happened to be in this school in 2000 scored higher than the particular group of students who happened to be enrolled in 1999." In practice, however, users of scores rarely care about this. Rather, they are interested in conclusions about the performance of schools. For the inferences, each successive cohort of students enrolling in the school is just another small sample of the students who might possibly enroll, just as the people interviewed for one poll are a small sample of those who might have been. (p. 170)
Addressing complexities like sampling error is not just exploiting a "loophole" to avoid NCLB sanctions. Rather, it's an assurance that when we label a school as "in need of improvement," we're not wrongly assigning that label. It strikes me as deeply ironic that even as NCLB endorses "scientifically-based" research, many wonks continue to turn their noses up at the central conventions of the science of statistics.
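Koretz's point, that each cohort is just one sample of the students who might enroll, can be illustrated with a small simulation. It assumes a hypothetical school whose underlying quality never changes: every student is independently proficient with probability 0.60 (a simplifying assumption). Even though every student in every cohort is "tested," the school's proficiency rate bounces from year to year, and far more so for a 40-student grade than a 400-student one.

```python
import math
import random
import statistics


def cohort_rates(true_rate, cohort_size, years, seed=0):
    """Proficiency rates for successive cohorts drawn from the same 'school'.

    Each student is independently proficient with probability true_rate,
    so the school's underlying performance is identical every year.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_rate for _ in range(cohort_size)) / cohort_size
        for _ in range(years)
    ]


results = {}
for size in (40, 400):
    rates = cohort_rates(0.60, size, years=1000)
    theory = math.sqrt(0.60 * 0.40 / size)  # standard error of a proportion
    results[size] = statistics.stdev(rates)
    print(f"cohort of {size}: simulated SD {results[size]:.3f}, theoretical SE {theory:.3f}")
```

Any year-over-year change smaller than this noise could be produced by the luck of which students happened to enroll, which is exactly why margins of error matter before labeling a school.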

June 27, 2008

The Unintended Consequences of Focusing on Proficiency

I'm totally in awe of the regular commenters here - for me, they are the best part of this site. I had to share this comment from Rachel about the post below:

One of my worries about the emphasis on "proficiency" -- and the lack of emphasis on anything above proficiency -- is the unintended consequence of creating a two-tier, mostly segregated, educational system. Public schools teach poor kids basic skills, and parents who want more than basic skills try to figure out how to get their kids into private schools -- or, if they can, move to affluent suburbs.

Now, public schools that teach poor kids basic skills are better than public schools that don't teach poor kids basic skills. But in my district -- which has an interesting demographic mix -- there's a clear tension between the "let's make sure everyone's proficient before we think about anything else" point of view, and the "we need to make sure each kid makes a year's progress every year" point of view.

And it's pretty clear that if parents get the idea that no one at a school is interested in much besides proficiency, you start losing the proficient kids to private schools and charter schools -- which then exacerbates the social inequality that "closing the achievement gap" is supposed to end.

Hat tip to Scott McLeod for making the commenter graphic.

June 26, 2008

New York's Lake Wobegon Effect

Sol Stern nails it in his article on test score inflation:

The premise of NCLB, as of so many current education reform efforts, is that schools must serve the interests of children, not the interests of the adults who work in the system. But in a classic case of unintended consequences, the widespread test inflation produced by NCLB is serving only the interests of the adults. New York education officials like Mills, New York City mayor Michael Bloomberg, and his schools chancellor, Joel Klein—along with teachers’ union leaders like Randi Weingarten—advance their varied agendas in the glow of inflated test scores. But the children are the big losers. Sometime in the next decade, the white children of Lake George and the black children of New York City will come face to face with reality. On a high school math Regents test—or on an SAT test, or in a college remediation course—they will discover that they are not quite as proficient as New York State once assured them.

June 17, 2008

High Achieving Students in the Era of No Child Left Behind

Fordham's new study on how high achievers have fared under No Child Left Behind is out. (See NYT coverage here.) Here's the main story:

* While the nation's lowest-achieving youngsters made rapid gains [on NAEP] from 2000 to 2007, the performance of top students was languid. Children at the 10th percentile of achievement (the bottom 10 percent of students) have shown solid progress in fourth grade reading and math and in eighth grade math since 2000, but those at the 90th percentile have made minimal gains.

* This pattern - big gains for low achievers and lesser ones for high achievers - is associated with the introduction of accountability systems in general, not just NCLB. An analysis of state data from the 1990s shows that states that adopted testing and accountability regimes before NCLB saw similar patterns before NCLB: stronger progress for low achievers than for high achievers.

All of this, of course, should have been expected in a system focused on proficiency rather than growth. And contrary to popular belief, NCLB's growth model pilot doesn't allow true value-added models, but is instead based on a "projection model." Michael Weiss has a great commentary in Ed Week this week on this issue:

In practice, projection models are extremely similar to NCLB’s original status measure. In schools where students enter with high initial achievement levels, the learning gains required to get students on track to become proficient are quite small, while in schools where students enter with low initial achievement levels, the required learning gains to get students on track to become proficient may be unrealistically large. Consequently, under the federal growth-model program, schools are still held to different standards­—some must produce large gains while others need only to produce small gains. Both status and projection models require all students to reach a fixed proficiency target regardless of their initial achievement levels. It is because No Child Left Behind’s status model and the growth-model pilot program’s projection models are so similar that very few new schools are making AYP because of “growth” alone.
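Weiss's point about projection models can be sketched with simple arithmetic. This assumes a straight-line projection to a fixed proficiency cut score on a hypothetical scale (cut score 650, three years to get on track); real state projection models are more elaborate, but they share the feature that the required gain depends on where a student starts.

```python
def required_annual_gain(current_score, target_score, years_left):
    """Scale-score points per year needed to reach the fixed target on time."""
    return max(0.0, (target_score - current_score) / years_left)


# Hypothetical scale: proficiency cut at 650, three years to get on track.
for start in (640, 600, 550):
    print(start, round(required_annual_gain(start, 650, 3), 1))
```

A student starting 10 points below the bar needs only a few points a year, while one starting 100 points below needs ten times as much, so schools serving low-scoring entrants are held to a much steeper standard.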

If Tom Toch's post over at The Quick and the Ed is any indication, it looks like many factors are coming together to shift the winds on NCLB - both from proficiency to value-added models, and from ignoring the role of out-of-school factors to acknowledging that it is unfair to hold schools solely accountable for them. Said Toch:

What we need to do is find ways to give schools credit for successfully improving the educational performance of the kids they have, by using so-called value-added measures of student performance, and by capturing more than just how well schools teach basic reading and math skills....Yes, we need to hold schools and teachers accountable for their performance....But no, we shouldn’t pretend that poverty has no impact on students. No accountability system can work unless it is credible, and NCLB, as currently crafted, is not.

June 13, 2008

Still a Bobo in Paradise

Meet the Status Quo. It includes the Chairman of the Board of the NAACP (Julian Bond), the former president of the Urban League (Hugh Price), a Nobel Prize-winning economist and expert on early childhood interventions (Jim Heckman), some of the country's most distinguished experts on urban poverty (William Julius Wilson, Christopher Jencks) and educational accountability (Helen Ladd), a well-known professor of pediatrics at Harvard Medical School (T. Berry Brazelton), two former Surgeons General (Jocelyn Elders and Richard Carmona), Ernie Cortes (of the Industrial Areas Foundation), school practitioners like Debbie Meier, Ted Sizer, and Jim Comer who have spent their careers challenging the status quo, and too many other people to list here who have dedicated their lives to improving the lives of poor and minority children. And yes, David, they accept your apology.

I really do hate my permanent residence in the reality-based community, but at least half of the achievement gap that exists between black and white students - the fact that the average black 12th grader performs at about the 16th percentile of the white distribution (a gap of about 1 standard deviation) - cannot possibly be attributed to the K-12 schools. Why? The average black student enters kindergarten testing at about the 25th percentile of the white distribution in math (a gap of .663 standard deviations), and the 35th percentile of the white distribution in reading (a gap of .4 standard deviations). "Squeezing teachers," "dealing with teachers who don't teach," or "holding teachers' feet to the fire," I'm sorry to say, are not going to address that gap. And between kindergarten and 12th grade, kids are only in school 22% of their waking hours. It turns out that poor students' slower rate of learning in the summer plays a significant role in increasing existing gaps.
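The percentile figures above follow from the standard-deviation gaps if we assume scores are normally distributed: a student scoring d standard deviations below the white mean sits at the percentile given by the standard normal CDF at -d. A quick check with Python's standard library:

```python
from statistics import NormalDist


def percentile_for_gap(gap_sd):
    """Percentile of the white distribution reached by a student gap_sd SDs below
    the white mean, assuming normally distributed scores."""
    return 100 * NormalDist().cdf(-gap_sd)


for label, gap in [("12th grade", 1.0), ("kindergarten math", 0.663), ("kindergarten reading", 0.4)]:
    print(label, round(percentile_for_gap(gap), 1))
```

This gives roughly the 16th, 25th, and 34th-35th percentiles, matching the figures cited.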

Of course schools play a role in exacerbating these problems - no one said they don't - in particular because of the unequal distribution of teachers across schools. We can all acknowledge that this distribution of teachers is a partial legacy of contract rules - still in place in many districts - that gave preference to senior teachers. Both coalitions are concerned with attracting and retaining good teachers in hard-to-staff schools, and perhaps they can find some common ground there.

But it would be great if we grounded this discussion in some basic facts - facts that might include the current distribution of school effects, and how much of the achievement gap we could expect to see narrowed if we move a student from a below to an above average school (critical for the school choice question); how modest the effects of accountability have historically been on gaps (very little action at all on the black-white gap - Texas also comes to mind), and how more "vigorous accountability" will differ in ways that produce different outcomes; how much of the gap is a function of school-year versus summer learning; and how much of the gap is there when kids start school.

June 11, 2008

Why We Should Care About Test Score Inflation

Kevin Carey’s dismissal of “test score inflation” provides an ideal opportunity to talk about the book I finished this weekend, Measuring Up: What Educational Testing Really Tells Us, by Dan Koretz, a psychometrician at the Harvard Grad School of Education – hardly an opponent of testing.

Koretz calls “test score inflation,” in which gains on tests used for accountability dramatically outpace gains on low-stakes tests, the “dirty secret of high-stakes testing.” If you compare NAEP trends and state score trends, you’ll see that state scores have increased significantly more than NAEP scores since NCLB was adopted.

To understand why test score inflation is a serious problem, you have to understand the sampling principle of testing. Koretz provides the following example: Suppose we want to evaluate students’ vocabulary. A typical high school student knows 11,000 root words, but a test can only include a sample of these words – maybe 40. If we design our test well, we can still learn something about the breadth of each student’s vocabulary. But we don’t really care if the student knows the 40 words on the test; rather, we care about the larger domain from which these words are sampled.

Now imagine that for weeks before our test, I drilled students incessantly on those 40 words. Voila! They perform exceptionally on the test. Yes, their vocabularies have increased by 40 words. Maybe these are 40 really important words - the so-called "test worth teaching to." But proficiency in the domain that my test is intended to measure has not expanded by the same amount. I’ve seen this over and over again; administrators and teachers figure out which concepts are consistently on the test, and which aren’t, and they alter their instruction accordingly. The trouble is that if we administer a slightly different test, drawing on a broader range of concepts from the domain we care about, kids haven't mastered them.
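Koretz's sampling principle and the drilling scenario can be mimicked with a toy simulation. Assumptions, all hypothetical apart from the 11,000-word and 40-word figures from the example: the domain holds 20,000 root words, the student knows a random 11,000 of them, and each test is a random 40-word sample from that domain.

```python
import random

rng = random.Random(42)

DOMAIN_SIZE = 20_000  # hypothetical number of root words in the domain
KNOWN = 11_000        # root words a typical high school student knows (Koretz's example)
TEST_LENGTH = 40      # words that fit on one test


def score(known_words, test_words):
    """Share of a test's words that the student knows."""
    return sum(w in known_words for w in test_words) / len(test_words)


known = set(rng.sample(range(DOMAIN_SIZE), KNOWN))
state_test = rng.sample(range(DOMAIN_SIZE), TEST_LENGTH)  # the test we drill on
alt_test = rng.sample(range(DOMAIN_SIZE), TEST_LENGTH)    # another sample, same domain

before = (score(known, state_test), score(known, alt_test), len(known) / DOMAIN_SIZE)
known |= set(state_test)  # weeks of drilling on exactly the tested words
after = (score(known, state_test), score(known, alt_test), len(known) / DOMAIN_SIZE)

print("state test, alternative test, share of domain known")
print("before drilling:", before)
print("after drilling: ", after)
```

Drilling pushes the state-test score to 100%, while the alternative test and the share of the domain the student actually knows barely move: the score gain is inflated relative to the skill it is supposed to measure.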

Carey argues that this is just a standards mismatch problem, i.e., the standards behind state tests are not the same as those used on national tests. Koretz takes Carey’s critique head on in this passage:

"Alignment is a lynchpin of policy in this era of standards-based testing. Tests should be aligned with standards, and instruction should be aligned with both....And alignment is seen by many as insurance against score inflation. For example, a principal of a local school that is well known for the high scores achieved by its largely poor and minority students gave a presentation to the Harvard Graduate School of Education a few years ago. At one point, she angrily denounced critics who worry about 'teaching to the test.' We had no reason to be concerned about teaching to the test in her school, she asserted, because the state’s test measures important knowledge and skills. Therefore, if her faculty teaches to the test, students will learn important things.

This is nonsense, and I have a hunch about what I would find if I were allowed to administer an alternative test to her students. Alignment is just reallocation by another name. Certainly it is better to focus instruction on material that someone deems valuable, rather than frittering time away on unimportant things. But that is not enough. Whether alignment inflates scores depends also on the importance of the material that is deemphasized. And research has shown that standards-based tests are not immune to this problem. These tests too are limited samples from larger domains, and therefore focusing too narrowly on the content of the specific test can inflate scores." (pp. 253-254)

We only care about test scores if they translate into general improvements in children’s academic skills that generate meaningful improvements in their life chances. If these gains don’t carry over to other tests that measure similar skills (basic reading and math competencies), what are the chances that they will help children succeed in the workplace or in college? And that is a very good reason to worry about test score inflation.

Spoiler alert: NY state test scores are out next week, if not sooner. What should we make of NYC's flat NAEP scores alongside state test improvements so large they're unbelievable? Kind of makes you wonder.

June 10, 2008

Big Props for a "Broader, Bolder Approach to Education"

The potential effectiveness of NCLB has been seriously undermined, however, by its acceptance of the popular assumptions that bad schools are the major reason for low achievement, and that an academic program revolving around standards, testing, teacher training, and accountability can, in and of itself, offset the full impact of low socioeconomic status on achievement.
-The Broader, Bolder Approach to Education Task Force Report

This morning, more than 60 heavy hitters kicked off a campaign calling for a "broader, bolder approach to education policy." (You may have already seen the print ads in the Washington Post and NY Times.) Co-chaired by Sunny Ladd, a Duke University economist; Pedro Noguera, a sociologist at NYU; and Tom Payzant, the former Boston schools superintendent and U.S. assistant secretary of education, the task force calls for a more expansive view of education policy that treats schools as one component of a comprehensive youth development strategy. Here are their four recommendations:

1. Continued school improvement efforts. To close achievement gaps, we need to reduce class sizes in early grades for disadvantaged children; attract high-quality teachers to hard-to-staff schools; improve teacher and school leadership training; make a college preparatory curriculum accessible to all; and pay special attention to recent immigrants.

2. Developmentally appropriate and high-quality early childhood, pre-school and kindergarten care and education. These programs must not only help low-income students academically, but provide support in developing appropriate social, economic and behavioral skills.

3. Routine pediatric, dental, hearing and vision care for all infants, toddlers and schoolchildren. In particular, full-service school clinics can fill the health gaps created by the absence of primary care physicians in low-income areas, and poor parents’ inability to miss work for children’s routine health services.

4. Improving the quality of students’ out-of-school time. Low-income students learn rapidly in school, but often lose ground after school and during summers. Policymakers should increase investments in areas such as longer school days, after-school and summer programs, and school-to-work programs with demonstrated track records.

eduwonk suggests that the acknowledgment that schools can't do it alone is just another tired opinion: "The explicit rejection that perhaps schools are even a substantial part of the educational problem is unsettling." Recall that many of these signers have spent years studying school effects - the effect of going to one school versus another, all else equal - on test scores. This is a conclusion derived from years of confronting that distribution of school effects over and over again.

Particularly notable in this regard is the leadership of Sunny Ladd, who spent the last 10 years investigating the effects of accountability on North Carolina schools. She's an economist - hardly someone against the use of incentives - but she's seen the meager effects of accountability alone on the reduction of achievement gaps. And many early supporters of NCLB-style arrangements are represented here as well - Susan Neuman, Bob Schwartz (the President of Achieve from 1997 to 2002), and Milt Goldberg (of the A Nation at Risk commission).

No one is saying that schools aren't important. No one is saying that we should abandon efforts to improve schools. And no one is saying that we should "let schools off the hook." What they are saying is that the effects of schools are not large enough to wipe out the gaps that are created by students' out-of-school environments.

You can - and I hope you will - become a co-signer on the statement here.

June 9, 2008

NCLB This Week: The Trailer

In this Time article, Susan Neuman, who served as Assistant Secretary for Elementary and Secondary Education during George W. Bush's first term, lets us in on her doubts about NCLB and the administration's missteps. Buried at the bottom of the article is a good reason to keep your eyes on the papers tomorrow:

Neuman still supports school accountability and the much-maligned annual tests mandated by the law. But she now believes that the nation has to look beyond the schoolroom, if it wishes to leave no child behind. Along with 59 other top educators, policymakers and health officials, she's put her name to a nonpartisan document to be released on Tuesday by the Economic Policy Institute, a Washington think tank. Titled "A Broader, Bolder Approach to Education," it lays out an expansive vision for leveling the playing field for low-income kids, one that looks toward new policies on child health and support for parents and communities. Neuman says that money she's seen wasted on current programs should be reallocated accordingly. "Pinning all our hopes on schools will never change the odds for kids."

May 13, 2008

Roberta Flack, Vietnam, and NCLB - All in One Op-Ed

It's a deadly slow week in education policy, so I'll pass along this op-ed in the School Library Journal (Killing Me Softly: No Child Left Behind) on a teacher's decision to leave teaching because of the No Child Left Behind Act. Minus 5 points for the melodramatic beginning (I feel like the last marine who got out before the siege of Khe Sanh. I feel like the one Titanic band member who overslept, missed the voyage, and lived. In my darkest moments, I feel like a traitor.), but you can't hold that against a guy who writes young adult fiction. Here's an excerpt:

If you’re a teacher, thanks for being braver than I am. Thanks for riding it out when I’m just, well, riding out. And if you’re a parent, please fight for your child. Ask to see your school’s test-materials budget and its library budget. Ask to visit the classroom on a random day, unannounced. Ask whether your kid is getting more or less art than she would have had five years ago. Ask why band practice is at 7 a.m. when it used to be part of the school day. And while you’re mourning the loss of art, music, language, or history, ask the one most damning question of all: What took its place? If you get really riled up by the answer, please consider running for a spot on the school board.

As for me, I’m out. And I’m sorry.


Are teachers leaving because of NCLB? Does anyone have stories or data?

April 4, 2008

TGI - NCLB

1) Grad Rate Questions: Sherman Dorn frames 12 questions about the forthcoming grad rate measure. If the 2014 proficiency target provides any indication, the answer to this question, "If there are such required benchmarks, is there any supporting research to suggest that the status or improvement benchmarks are realistic?", will be a resounding no.

2) Swifty Statistics: I'm a sucker for a good Harper's Index, so head over to Charlie Barone's brief on the new school choice and tutoring report. His take: "The take-home message is that more and more students are exercising their options to transfer to another school or to enroll in after-school tutoring. The number who chose to transfer more than doubled. The number enrolled in after-school programs increased over 500%."

Charlie's index gives us raw numbers, but the percentages of eligible students participating in choice and SES are 1% and 17%, respectively. On the choice tip, everyone likes to blame districts for not notifying parents (70% of districts required to offer choice to elementary students notified parents; 20% did at the middle school level, and 17% did at the high school level). However, the report notes that:

Most districts that did not offer the school choice option said it was because all schools at that grade level were identified for improvement. Districts typically have fewer total schools available at the middle and high school levels: 77 percent of districts with high schools have only one high school and 67 percent of districts with middle schools have only one middle school, while 53 percent of districts with elementary schools have only one elementary school.

Even if we consider the group of parents who were notified and had options in their districts, very few chose to leave their schools. The literature I've seen on choice suggests at least four reasons for low choice participation: 1) revealed preferences (parents are actually pretty happy with their schools, and have better information than NCLB does about these schools), 2) preferences for closeness to home and other non-academic features of schooling, 3) a lack of information about school options, and 4) structural barriers to choice (attending a non-neighborhood school imposes costs on the choosers). Whatever the reasons, we need to ask whether school choice is a better NCLB policy option than more targeted interventions that could be delivered to struggling students in the schools they currently attend.

Obam-Arts

Via Mike Klonsky, it looks like Obama has been reading the ed blogs on curriculum narrowing. He said:

Part of the reason you’re seeing schools eliminate art and music – or at least diminish them – is because of No Child Left Behind, a law that was intended to raise standards in local schools but what happened was because it relied just on a single standardized test, school districts felt pressured to just teach to the test….in a lot of school districts, they just had to make choices, and they decided, you know what, if we’re going to bring our kids up to test level, all they can do is just study math and reading every day all day long. They’ve eliminated recess. They’ve eliminated art and music. So part of the solution then is changing NCLB so that the assessment is one that takes into account all of the factors that go into a good education, and is developed with educators, not to punish schools, but to improve schools.

Surely some blogger will hit Obama back with the "only 16% of districts reduced art and music" finding from the CEP report. It's worth noting that many schools that struggle with NCLB's targets had already eliminated or scaled back art and music before NCLB because of budget cuts. What we need to know is what proportion of the schools that had art and music programming before NCLB have since reduced it.

Image Credit: TARARTRAT

March 22, 2008

Madame Secretary Demands Triage, Randy Reback Delivers

"We need triage," Madame Secretary explained last week. This morning, Randy Reback delivered it to my inbox via the Journal of Public Economics' new issue, which includes his paper, "Teaching to the Rating: School Accountability and the Distribution of Student Achievement." Reback analyzed data from Texas, the birthplace of NCLB-style accountability, and here's what he found:

* Schools respond to math performance incentives both by targeting math resources towards specific students and by making broad changes which also help very low achieving students. These responses tend to sacrifice the targeted students’ reading performance and to sacrifice relatively high achieving students’ performance in both math and reading.

* Schools respond to reading performance incentives by targeting resources towards the reading performance of particular students, sacrificing these students’ math performance and sacrificing all other students’ performance in reading.

* Finally, schools devote fewer resources towards students in the terminal grades during years when short-run incentives are low than during years when incentives are high.

Reback concluded:

Whether the finding of non-trivial distributional effects is a positive or negative outcome of this public policy is entirely subjective. If one of the primary goals is to create a sort of educational triage, in which students below minimum grade-level skills are pushed up, then the No Child Left Behind type of accountability system appears to be fairly effective. Furthermore, the results say nothing about the overall impact of this system on performance: it may be a rising tide that lifts all boats (and lifting some more than others), or it may be a falling tide sinking all boats (and sinking some less than others).

The important lesson here is that schools respond to the specific instructional incentives created by the accountability system. Schools' responses include targeting specific students, targeting specific subjects, and making broad changes which affect all students. An accountability system should only create disproportionate incentives concerning student achievement gains if the intention is to help some students more than others and to boost performance in some subjects by more than others. Otherwise, the optimal accountability system requires a more evenhanded approach.

March 17, 2008

Charlie Barone and I Agree!

An event so rare that it deserves its own blog post: Charlie points to a Washington Post article on NCLB and students with disabilities. The article argues that NCLB has forced schools to focus on disabled students because their scores are reported separately and only a small fraction of students can be exempted. Before NCLB, too many state accountability systems had gaping loopholes that allowed these students to be ignored (for more, see here).

Of course, this brings us back to the NCLB incentives debate. If we credit the structure of the law when students with disabilities receive more attention, shouldn't we look at the structure of the law when schools emphasize tested subjects? These are questions better answered by someone with a completed AERA paper...

February 24, 2008

Richard Rothstein and the Cream Puff Caper

In a talk last Thursday at Teachers College, Richard Rothstein proposed a "Report Card on Comprehensive Equity" that would broaden the set of measures we use to assess the achievement gap. Rothstein argued that accountability systems that focus only on basic academic skills distort the educational process, because schools concentrate on the skills for which they're held accountable. Because we want more out of schools than math and reading scores, Rothstein proposed extending the data we collect to include domains such as critical thinking and problem solving, social skills and work ethic, readiness for citizenship and community responsibility, foundation for lifelong physical health, foundation for lifelong emotional health, appreciation of the arts and literature, and preparation for skilled work.

How could we collect these data on a nationwide scale? Rothstein explained that NAEP was originally designed to collect data on a wider range of skills, including civic engagement and students' ability to work in a group. Rothstein and his co-authors, Rebecca Jacobsen and Tamara Wilder, plan to propose an expansion of NAEP's data collection activities to the National Assessment Governing Board. Rothstein provides a clear picture of what these measures could look like here.

What of the cream puff caper, you ask? After some discussion of public education's goal of promoting physical health, attendees were greeted with plates full of cookies and cream puffs. Cream puffs that, while delicious, had the unfortunate side effect of food poisoning. Hopefully only a handful of people learned of the perils of eating dessert the hard way.

What do you think about Rothstein's proposal? I think it's an important first step in accounting for the many goals of public education that we care about. It was a formidable task to pull these data together - kudos to Rothstein, Jacobsen, and Wilder.

February 12, 2008

Panic at the Disco! Take the Eduwonk Challenge

The latest NCLB splosion already has the blogosphere assigning battle names, e.g. Trail of Tears, Wonk Wars, or the I-95 Knockdown. I prefer Panic at the Disco, and you can watch eduwonk and me get down courtesy of David Bellel.

In his most recent post, eduwonk asks me to bring it:

My challenge for Eduwonkette is to offer up what sort of requirements for school accountability she'd support. How many kids, or what percent, should a school have to teach reading and math to, well, in order to make "adequate yearly progress?" What's the bar below which no school should be allowed to fall? Should less be expected in terms of performance from schools serving a lot of poor or minority kids because they are more challenging populations based on the data? How many other subjects should we test students in if we don't want to just focus on reading and math?

To which I say: Game on, week of February 24th. Until then, this site is running eduwonkette lite as I dropkick some deadlines.

In the meantime, readers and fellow bloggers, take the eduwonk challenge: What kind of requirements for accountability would you support?

February 8, 2008

There Won't Be Blood

A wise woman once advised that name-calling is a poor substitute for a good argument. In my view, it is the feeble tool of last resort for desperate men who cannot win arguments on their own merits. It has no rightful place in policy debates.

Let me wrap up this debate over NCLB's unintended consequences by recapping my central argument:

1) By mandating an escalating series of sanctions for schools that fail to demonstrate adequate yearly progress in reading and mathematics, NCLB has created incentives for schools to focus on reading and math, rather than other subjects. As our fearless leader once noted, “What gets measured, gets done.”

2) NCLB does not mandate that educators focus on reading and math to the detriment of other subjects. But NCLB is a policy predicated on the idea that incentives can fundamentally change behavior. We should *expect* teachers to respond to NCLB's powerful incentives.

3) It therefore is not surprising that there is a growing body of evidence, both systematic and anecdotal, that many schools are devoting more instructional time to reading and math and less to other school subjects, such as social studies, science, and the arts. This is particularly evident in schools most at risk of missing AYP.

4) If our national goals for public schools are to prepare young people to be competent, well-rounded and productive adults, we must assess how effective public policies such as NCLB are in achieving these goals.

There are a variety of revisions to NCLB that might be considered to enhance its ability to meet a broad set of goals for public education. Robert Pondiscio put it nicely when he wrote, "If the cure is worse than the disease, then find a better cure." We could, for example, create incentives for teaching additional subjects. Or we could seek to build the capacity of schools to teach subjects such as social studies and science more effectively alongside reading and math. But NCLB does neither of these.

Where do we go from here? We can continue to stand on the mountain and hand down outraged edicts to educators. But sternly lecturing our nation’s teachers will do little to change their behavior. If our goal is to ensure that children in all schools have access to a broad and deep education, we fail them by adopting this approach.

Bottom line: it's reckless public policy to ignore the evidence that NCLB’s incentives have resulted in more attention to reading and math, and less attention to other school subjects.

February 4, 2008

Social Studies, Science, and the No Child Left Behind Act

One of the most stable findings in the management literature is that measuring a narrow subset of organizational goals results in employees ignoring non-measured tasks that are no less critical to the overall mission of the organization. When lawyers are rewarded for billable hours, they focus on increasing hours rather than quality. When case workers are measured by the number of job placements, they push job seekers into positions that are poorly suited for them. Management wonks call this "goal distortion" (see Richard Rothstein here; see also Timely Tidbits on Unintended Consequences). The take-home point is that the facile use of quantitative indicators can cause as many problems as it solves.

Let's revisit a very old debate on NCLB's effect on science and social studies teaching in the schools most likely to struggle with AYP. My Valentine Charlie Barone followed up on my original post by asking:

Why did 56% of all districts not narrow down their curricula?

What we're quibbling about is who's responsible for cutting social studies and science: NCLB or teachers and schools. According to Barone, it's the schools, stupid. Because not all schools narrowed their curriculum post-NCLB, the argument goes, NCLB does not provide incentives to narrow the curriculum.

How would we know if NCLB creates an "incentive problem?" Let's consider a non-education example. Suppose we attempted to get drivers to slow down by tripling the price of speeding tickets. Drivers now have a much stronger financial incentive to ease up on the pedal. How do we evaluate this policy? We want to know if a driver living in the "crazy expensive ticket world" is more likely to slow down than he would be if he inhabited the old world. Imagine we observe that after this policy change takes effect, 50% of drivers slow down. By any standard, getting half of all drivers to slow down would be considered a huge policy effect. While we might be interested in learning more about the other 50% of drivers, our ticket increase had a powerful effect on driver behavior.

Now, back to education: are more schools cutting social studies and science in an NCLB world than would be in a non-NCLB world? I think so. One could argue, as many accountability proponents certainly do, that reading and math are more important than science and social studies, or that the test score "gains" that accountability policies yield make these other losses acceptable. I don't agree with those arguments, but at least they acknowledge what is happening on the ground.

eduwonk also argued that the problem is schools, not NCLB:

I don't buy the argument that cutting other subjects, especially social studies, is an incentive problem here. Rather, it's a capacity problem. Too few schools are able to deliver a really powerful instructional program today and in the absence of that they do a lot of counterproductive things.

Low capacity schools may be more likely to cut social studies and science than high capacity schools, but NCLB, not low capacity, is the cause of the cuts. After all, low capacity schools taught more social studies and science pre-NCLB than they do now. Even higher capacity schools are affected by incentives - to use the driving parallel, wealthy drivers can handle a more expensive ticket, but many will slow down anyway.

My proposal for a reauthorization bumper sticker? "NCLB doesn't narrow curriculum. Schools narrow curriculum."

January 23, 2008

Exceptional Ed Week Commentary on Testing and Accountability

Helen Ladd, an economist at Duke, has turned in an exceptional commentary about rethinking the ways we hold schools accountable. Ladd has spent a decade studying the effects of North Carolina's accountability system. Here's an excerpt:

The bottom line is clear: Test-based accountability has not generated the significant gains in student achievement that proponents — however they perceived the problem to be solved—intended. Nor is the country on track to meet either the high proficiency standards required under the No Child Left Behind law or the equity goals suggested by its name.

As a reform strategy, test-based accountability falls short in at least three ways. First, it pays too little attention to the social factors that affect student achievement....Second, the approach pays too little attention to the broader system within which individual schools operate. Where is the accountability for state, county, or district officials who fail to provide the resources and support services needed to make the schools function better?....Third, test-based accountability tends to be punitive and pays too little attention to promoting effective process and practice within schools.


Ladd goes on to outline an alternative system - one that maintains realistic test score goals but incorporates inspection-based reviews of schools. This approach deserves serious consideration.

January 18, 2008

No Child Left Behind Not the Silent Killer, But...

Let me pile on to the eduwonk-Barone-Pondiscio debate. I'm no fan of the "NCLB: The Silent Killer" melodrama that blames the No Child Left Behind Act for all of our schools' problems, and there's obviously plenty of it to go around. This is what Charlie Barone and eduwonk reacted to yesterday when they pointed to a NYT article about college prep to argue that NCLB is not forcing schools to become drill-and-kill test-prep factories. (See eduwonk's post here.) Robert Pondiscio responded at Core Knowledge by providing an insider's view of curriculum narrowing and test prep. He concluded, "Dismiss it at your own peril."

I'm with Robert on this one. In my view, NCLB is creating very real problems by leading some schools to focus primarily on reading and math and to zero in on a small set of tested skills in these subjects at the expense of the full range of skills we want kids to have. I also think this response is too pervasive to ignore. While we can argue whether it "works" or not, it's happening.

The much-blogged-about Center on Education Policy report (available here) released last summer found that 44% of districts had reduced time spent on social studies, science, arts and music, lunch, and recess to fit in more time for reading and math. Comparing districts that had at least one school not making AYP with those that had none reveals starker contrasts: 51% of districts with at least one identified school decreased time in social studies, compared with 31% of districts with no identified schools. (See Table 4 in the CEP report for more.) You can look at these numbers in a glass-half-full way - it's not all schools, after all. To me, it's enough schools to cause concern. (It's also worth noting that district-based surveys probably understate how much narrowing there is in schools struggling with AYP.)

eduwonk and Barone are arguing that not all schools have responded to NCLB's incentives this way, so the problem isn't with NCLB. The underlying assumption is that good educators can resist these pressures. But eduwonk and Barone both support NCLB, I think, because they believe schools need incentives to improve. If you believe that incentives can have strong impacts on behavior, it doesn't make sense to argue that schools can (and should) just turn their backs on these incentives. Schools get no credit for teaching science and social studies, and schools that cut back on untested subjects and do lots of test prep are playing by NCLB's implicit rules.

There are a number of ways to address this issue that would seem acceptable to NCLB proponents - e.g. "right-sizing" the school day, as Paul Reville suggested, or testing all subjects, as the Center on Education Policy advised - but supporters of NCLB would do well to acknowledge and address the problem.

(Image credit: nataliedee.com)

January 15, 2008

Paradise by the Dashboard Light

Last week, Madame Secretary unveiled a shiny new toy called the "National Dashboard." I pooh-poohed it, saying that most of these data were already available in the National Center for Education Statistics' Common Core of Data or elsewhere. After checking it out (and seeing how pretty it is!), I like it. If you need a tidbit of data quickly, this is helpful, and most data consumers aren't going to take the time to navigate the Common Core. Score one for the Madame.

What does the dashboard include? Demographics, percentage of schools by state making AYP and in restructuring, NCLB funding, percentage of teachers that are highly qualified in low and high poverty schools, percentage of students proficient by subgroup on the state test and NAEP, graduation rates, percentage of students taking AP exams, and the percentage of students using tutoring and choice options. Check out your state at the link above.

January 10, 2008

Ladies Who Lunch

Madame Secretary's rolling up at the National Press Club for lunch today, vowing to take matters into her own hands on NCLB (USA Today article here). She's expected to chat about expanding growth models, differentiating sanctions, and requiring states to adopt a uniform definition of high school graduation. More details here.

I'm all for growth models. But growth models that don't ditch the 100% proficiency fantasy are not much of an improvement. Stay tuned for the GWG's talk.

January 8, 2008

Birthday Presents for NCLB: Some Thoughts on School vs. Teacher Effects

Today is NCLB’s 6th birthday. NCLB is, at its core, a policy predicated on the idea that schools vary widely in their ability to improve students’ test scores. By holding schools accountable, the hope is that “bad” schools will become more like “good” ones. (Note - this is a post about NCLB on NCLB's terms, so I'm going to focus on test scores. For more posts on NCLB, take a look here.)

However, as I wrote yesterday, once we take into account students’ background characteristics, school effects on standardized test scores are pretty small. The good news is that teacher effects on test scores are quite large (you can find more posts on teacher effectiveness here). In short, the differences between teachers in improving test scores are much larger than the differences between schools. This finding has significant implications for the potential success of school-based efforts to improve test scores, as Barbara Nye, Spyros Konstantopoulos, and Larry Hedges wrote in their paper, “How Large Are Teacher Effects?”:

Many policies attempt to improve achievement by substituting one school for another (e.g. school choice) or changing the schools themselves (e.g. whole school reform). The rationale for these policies is based on the fact that there is variation in school effects. If teacher effects are larger than school effects, then policies focusing on teacher effects as a larger source of variation in achievement may be more promising than policies focusing on school effects.

(Image courtesy of the Halloween Edu-Parade, in which Rod Paige is Armstrong Williams.)

Naked Hat Tip to NCLB

On NCLB's birthday, Diane Ravitch suggests that we're prancing around in our birthday suits (Grading Schools):

I find myself (once again) in the uncomfortable position of seeing ideas that I have supported as part of a broader set of reforms turn into unhealthy obsessions. I feel like someone who said that people should wear hats and then turned around to discover that people were talking about nothing else but their hats and walking around naked.

Deb, I think that one of the things that has occasionally drawn us together is that we both have a vision about education, what it might be, even when we disagree about this or that detail. Now I find that no one seems to talk about education anymore, just testing and accountability.


More on NCLB in a bit.

The opinions expressed in eduwonkette are strictly those of the author and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.
