Wish #2: The End of Proficiency-Only Accountability Systems

The No Child Left Behind Act may represent the largest threshold-based government accountability system in the country. Schools are evaluated not by how much progress students make, but by their success in pushing students over the proficiency bar. By now, you're probably familiar with the discontents of this system: states can game the system by setting that proficiency bar low; some schools have triaged their students, essentially reallocating resources to the kids most likely to become proficient in the very short-term; and policymakers can misleadingly make claims about declining racial achievement gaps based on proficiency rates, even as these gaps are unchanged or growing.

Proficiency-based accountability systems leave us in a terrible spot. On the one hand, we want to push kids and raise the bar for proficiency. But on the other hand, we want to make sure that the lowest performing students aren't kicked to the curb. The higher you raise that bar, the more likely you are to have a significant proportion of students in any given school below proficiency. And those are precisely the conditions under which it makes sense for educators to allocate their time and attention strategically.

All of this, of course, should have been expected in a system focused on proficiency rather than growth. And contrary to popular belief, NCLB's growth model pilot doesn't allow true value-added models, but is instead based on a "projection model" which requires all students to reach a fixed proficiency target regardless of their initial achievement levels.

What am I suggesting? The new Department of Education would do well to let states experiment with a few different accountability systems: 1) dump proficiency altogether and identify schools as in need of improvement based on whether they are making less growth than expected. In other words, drop NCLB's arbitrary targets and evaluate schools based on how they are doing compared to the schools we already have, or 2) keep proficiency around, but focus improvement efforts on schools that are both low-growth and low-proficiency - not relative to an arbitrary standard, but perhaps those in the bottom 15% of both categories. (That number should be set based on the number of schools to which states can provide targeted support.)
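To make option 2 concrete, here is a minimal sketch of how a state might flag schools that land in the bottom 15% on both proficiency and growth. Everything in it is an assumption for illustration: the `School` fields, the `flag_for_support` helper, and the made-up data for 20 schools are hypothetical, and a real system would need a defensible growth measure behind the `mean_growth` number.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    proficiency_rate: float  # share of students at or above the proficiency bar
    mean_growth: float       # hypothetical growth score (higher is better)


def bottom_share(schools, key, share):
    """Return the names of the schools in the lowest `share` on the given measure."""
    n = max(1, round(len(schools) * share))
    ranked = sorted(schools, key=key)
    return {s.name for s in ranked[:n]}


def flag_for_support(schools, share=0.15):
    """Flag schools that are in the bottom `share` on BOTH proficiency and growth."""
    low_proficiency = bottom_share(schools, lambda s: s.proficiency_rate, share)
    low_growth = bottom_share(schools, lambda s: s.mean_growth, share)
    return sorted(low_proficiency & low_growth)


# Made-up data: 20 schools with rising proficiency and scrambled growth scores.
schools = [
    School(f"S{i:02d}", 0.30 + 0.03 * i, ((i * 7) % 20) - 10)
    for i in range(20)
]

print(flag_for_support(schools))
```

In this toy data only one school is low on both measures, which is the point of the rule: a school with low proficiency but strong growth is not flagged. The `share` parameter is where a state would plug in the number of schools it can actually support.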

Either of those options would require significant new investments in better tests that are designed to measure growth, and careful attention to building a value-added model that is both valid and reliable. New Yorkers know well that a poorly designed value-added model at the center of the Progress Reports wreaks more havoc than no value-added model at all.

My recommendations will surely fail to impress the "no excuses" crowd (or more aptly, the "nuke the system" crowd--my belated entry into Elizabeth Green's name-the-reformer contest) who see anything short of "100% proficiency" as not radical enough. "No excuses" is great rhetoric, but in the end it's just that. So my wish #2 is that we move past this bravado in the next four years and develop a more reasonable and effective way of identifying and supporting low-performing schools in getting better.

PS: Check out Richard Rothstein's related op-ed, Getting Accountability Right, which speaks back to Wish #4 (integrating a broad set of goals of public schooling into accountability systems).

Look at what Ofsted does for all providers of service to children, including all schools.


"We inspect and regulate to achieve excellence in the care of children and young people, and in education and skills for learners of all ages.

The new Ofsted brings together the wide experience of four inspectorates to make a greater difference for every child, and for all young people and adult learners, in England. Their educational, economic and social well-being will promote our success as a country."

Complete audits by teams of trained inspectors, sensitive to numerous current issues relevant to operations, (yes, skoolboy, including those behaviours and postures likely related to future success, no matter how poorly estimated, with or without bootstrapping for better confidence intervals.)
Reports are posted within weeks of inspections; the inspection schedule and format is revised semi-annually to improve it; and all audits are available online, crisply formatted with component grades, well-written, and providing an historic record for schools which do not get it right.
Inspectors have available a set of expectations of student performance based on far more data than available in the US, and the individual subject tests for older students, which Ofsted has no direct role in developing or administering.
Go ahead, find the PDFs just a few clicks away, on schools ranging in size from 150 to 1500.

To find, in comparison, the barber's approach to surgery, the cosmetologist's approach to surveillance for skin cancer, or the DC Public Charter School Board's guide version of education audits of its schools, go to
and read some of the two page reports on Public Charter Schools in DC, which include some of the KIPP schools currently enrolling 17,000 students nationwide.

Remind me never to hire you to build a bridge.

Undoubtedly you would be pleased to make better and better progress with each passing day.

What bothers me is that you wouldn't seem to mind if the bridge ended four feet short of the other shore.

When NCLB was imposed, I was on the executive committee of a bipartisan reform coalition and that earned me a seat in the discussions at the central office. The district, the state, and the CEOs had an incredible staff of statisticians who dutifully expressed their worries about NCLB-type accountability. I was like everyone else, not fully understanding the implications of changing from NRTs to CRTs, and frustrated that they were opposing a "done deal." We had no choice but to make NCLB work, and Oklahoma City (propelled by soul-searching in the aftermath of the Murrah) had plenty of people willing to bury ideology to save our collapsing schools. If we failed, the Republicans had the votes to shut the district down.

But I literally spent hours in the parking lots after the meetings listening to the researchers' arcane arguments. They knew that the integrity of their profession would be threatened by testing to proficiency. Their loyalty was to the honest use of data, and they knew precisely how education would be corrupted.

As a former teacher and a researcher, I hear your arguments as to how NCLB affects the lower performing students. However, my concern was split between that group of students and the higher performing students. Just because the latter group is already "proficient" does not mean that they should be thrown by the wayside to make room (and save expenses) for those who are on the verge of proficiency. A lot of schools have stopped challenging kids who deserve to be challenged in an effort to drag the just-below-the-bar kids up to proficiency. Sure, the effect on the kids who are nowhere near proficiency is terrible. But I would say that it hurts our society more to quit challenging and strengthening those kids who were proficient all along.

I'm not sure why this needs to be an either/or debate. We need to measure both growth and proficiency. Neither is sufficient by itself.

Eduwonkette, which "no excuses" ed reformers are you talking about who think that anything less than 100% is insufficient? I haven't seen that argument from anyone other than the authors of NCLB - even the No Excuses crowd, in my experience, thinks that mandate is absurd. Can you provide quotes?


I don't know CodyPT, but he answers your question and he's not alone when he writes:

"Remind me never to hire you to build a bridge.

Undoubtedly you would be pleased to make better and better progress with each passing day.

What bothers me is that you wouldn't seem to mind if the bridge ended four feet short of the other shore."

Then, you have a situation like we have at my school, where nearly 90 percent of students are at or above grade level, but our "school report card" grade tanks because we are not making progress.

What this grade fails to consider is that if a student is at a level 3 (meets expectations for proficiency) for, say, 7th grade, and then is a level 3 for 8th grade, too, that student HAS made progress: one year of progress. S/he was meeting the standards for 7th grade and is now meeting standards for 8th grade. A 3 to a 3 is progress.

The reason for the 100% mandate is purely statistical artifact. The primary purpose is to eliminate the achievement gap. It's right there in the statute. The problem is that when you have one group performing about 3/4 of a standard deviation below the other group, the achievement gap doesn't begin to close significantly until we get into a high 90s pass rate (or a low single digit pass rate) and it doesn't fully close until the pass rate is 100%. That's why the law is written the way it is, absurdities notwithstanding.
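The arithmetic behind this comment can be sketched in a few lines. The sketch below assumes, purely for illustration, that both groups' scores are normally distributed with the same spread and that the lower group's mean sits 0.75 standard deviations below the higher group's; none of that is stated in the comment itself, and `pass_rates` is a made-up helper. It prints each group's pass rate, and the gap between them, as the proficiency cutoff is lowered.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def pass_rates(cutoff, gap_sd=0.75):
    """Pass rates for two groups with SD 1 (an assumption for illustration).

    The higher group has mean 0; the lower group has mean -gap_sd.
    `cutoff` is the proficiency bar, in SD units of the higher group.
    """
    higher = 1.0 - normal_cdf(cutoff)
    lower = 1.0 - normal_cdf(cutoff + gap_sd)
    return higher, lower


# Lower the bar step by step and watch the gap in pass rates.
for cutoff in (1.0, 0.0, -1.0, -2.0, -3.0):
    higher, lower = pass_rates(cutoff)
    print(f"cutoff {cutoff:+.1f} SD: higher {higher:6.1%}, "
          f"lower {lower:6.1%}, gap {higher - lower:5.1%}")
```

Under these assumptions the gap in pass rates is widest when the bar sits between the two group means and shrinks toward zero only as the bar moves far into either tail. The underlying gap in scores stays at 0.75 standard deviations the whole time, so a narrowing proficiency gap can reflect nothing more than where the bar is set.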

It's no more absurd to think that there's some educational magic we can employ that is only going to improve the relative achievement of the lower group without affecting the higher group.

Unfortunately what CodyPT fails to see is that the bridge he's contracting to build is a Bridge to Nowhere.

Rome is burning and we keep fiddling. One obvious and tragic result of NCLB has been many states' willingness to lower their definition of proficient to absurdly low levels in order to keep getting federal funds. World standards are already established for math and science. Algebra is taught in sixth grade in much of the world. With no measurement of proficiency based on a national standard we will continue to dance, sing and flounder. Measurable progress sounds nice but is harder to measure than an absolute standard and will encourage the “bigotry of low expectations”. We need a national high school exit exam that is at the world standard so we no longer continue to issue the counterfeit currency of a high school diploma based on any state's politically motivated decisions.

Great article "Getting Accountability Right" highlighting the early NAEP assessment protocols. They were so much more holistic and humanistic than the current multiple-choice testing.

I think reviews or audits by human beings who are experienced educators is the only good way to assess whether a school is doing a good job.

I find it puzzling why so many people regard school testing as valid. Surely many, if not most, people must be aware of what is happening in schools across the nation. I'll just describe what I witnessed at my own school:

Teachers would drill students on the test for the entire year. If the test was the same, they'd drill the students on the exact items. If the test was different, the principal would encourage the teachers "to look it over so you will be familiar with the instructions." Tests would then go into teachers' rooms for several days. This gave them enough time to look at the test items and drill the kids on them. After the tests were collected, they would go into the principal's office, where they might remain for a day or two. Often, the principal would stay at school late during this time so he could "get the tests ready to send to district office." This person would be alone in his office for hours into the night. Just him and the tests.

Once I complained about this situation to a state senator: "Everyone is cheating on these tests."

"Oh, we know that," was her response.

Admittedly security became a little stricter after 2001, but there are still a million ways to game the system. Just imagine what would happen if school personnel administered the Scholastic Aptitude Test. Of course, the College Board would never allow this, and for good reason.

Perhaps scores have improved, but as Linda Perlstein points out in her book "Tested," learning (probably) has not.

I'd like to see a lot more security in testing so we can be more certain of the results. If we can't afford to ensure the validity of a test, then we can't afford to give it.

If we are truly interested in assessing student progress, we have to abide by strict procedures for accurate testing and evaluation.

Regarding the bridge analogy... not very useful. A bridge is a single concrete object, fixed in time and space, a project that has a pretty clear metric for success, a project that eventually ends. If the bridge were four feet short, it would get fixed. But schools do not end, and we can't agree on what our goals look like the way we can all agree the bridge should hit the other side. We never will entirely agree, unless we adopt some totalitarian groupthink, (though I think we can build some general consensus around a range of goals). If there's a bridge to be built, let's imagine the schools sending out some of their students each year, to take on a whole range of bridge-building work, to be further trained and supervised, while we have others still developing those skills, and novices coming in to the school every year as well.

I agree that NCLB has a lot of flaws, but there are also some positives. I think that schools are finally taking testing seriously because the tests are now high stakes. I do not agree with NCLB, but it is not all negatives.

Charlie Barone has already dealt pretty thoroughly with the issue of why 100% proficiency is not really 100% proficiency and how schools and districts can go on forever without hitting 100% and still not fail to achieve AYP, using Safe Harbor provisions.

My own state has adopted a "growth model" for both the state accountability (which actually is measuring the amount of learning for each student each year), as well as being included in the federal model for AYP--which allows some schools to show that even though they didn't make it this year, if they maintain their current rate they will get there on time. I put in a fair amount of time understanding these things and the nuances of all the different analyses of the same basic measure that are now put on our annual report cards. But that's the kind of geek I am. Most people I run into (both parents and teachers) are just puzzled by the proliferation of numbers--except that anyone is willing to hang on to a number that says that there is some way to look at things that says that for us it ain't so bad.

But there remain a certain number of schools that are not making AYP--for any group of the overall population, don't get the green light for growth, have a performance index that's barely moving and hangs below any semblance of acceptability. The graduation rate (already a jumble of built-in excuses for lots of kids who just don't count) is an embarrassment. Frankly, I would love to see schools able to implement some other kinds of measures--performance-based assessments, for instance (not that anyone is currently stopping them from local implementation). But I don't hold out much hope that these different kinds of measures are going to uncover some new success rate that was always there but just couldn't be perceived by the existing measures. My interest would be in providing teachers with some guidance and with information at a fine enough grain size that light bulbs could go off for them (and their students, and their students' parents) about where students aren't getting it and what to do next. I believe this is possible. I don't believe we have anything near sufficient capacity in schools to understand or implement anything like this.

I would love to see school climate as an accountability indicator. But I know that if we stick to something easy to count (like suspensions and expulsions) we are just asking for schools to relegislate who receives discipline and how it is counted. There are more sensitive instruments (more appropriate to diagnostics and less appropriate for accountability)--but unless we can convince schools of their utility, they become more "meaningless paper work."

I dream of more careful skepticism in leaping onto bandwagons. The multiple measures bandwagon is not necessarily a bad one. But if we're getting on because we believe that we are dealing with a proficiency-only system, maybe we need to re-examine. If we believe that there are measures that will prove once and for all that the tests don't really measure real learning (and something else will), maybe we are chasing a pipe dream (BTW, Linda--the things that you describe are absolutely illegal in my state. Teachers and principals can and have lost licenses. If the power of this deterrent is not adequate to counter the presumed teacher shuffle that may occur five years out if scores don't improve, maybe we need to hire outside examiners--but the cost would be enormous).

What I described (compromising of tests by school personnel) is illegal in my state too and some people have lost their jobs and credentials because of it.

Twenty years ago when my children were little, teachers from public and parochial schools would send home xeroxed copies of the tests as "preparation" for them. Some teachers would send the whole test home while others would send a part each night for homework (I kid you not.) Not surprisingly both sons always scored in the 99th percentile along with most of their classmates (as in Lake Wobegon). In those days no one cared that much although I do remember being somewhat shocked by it. As a teacher, I thought it very important to give the test as directed but suspected that some of the other teachers did not. When one of them became a principal, she gave me her box of "stuff." At the bottom was a xeroxed copy of the standardized test! "So that's why her test scores were so much better than mine!" I thought.

Now a teacher or principal would get fired for doing the above. Now the misdeeds take place in the teacher's mind and there are no witnesses. For example, she might look at the child's booklet as she walks around the room. She notices that the next section is on spelling and makes a mental note of the words. After lunch she mixes those words up with some others and drills the kids on them. The principal takes the booklets into his office at night. He knows he doesn't dare make too many erasures so instead he looks for bubbles that were not filled in and answers them. He is careful to do this for only one or two items per test. This kind of outright cheating is (hopefully) rare but I am sure it is happening.

There are just so many ways to invalidate the test. When a neighboring district won a prestigious award, I asked a teacher how they did it. She responded, "We seated five low kids around one high-achiever and pushed the desks close together." Was that a joke? I'm not sure.

I was always told by merchants that teachers "do not write bad checks." We are usually honest people, but placing too much pressure on educators has caused them to resort to dishonest practices. This isn't right but it is happening.

We will lose many teachers within the next five years, but it won't be due to test scores. No judge will ever allow that to happen with the testing situation as it is now. In order to use them to evaluate teachers they will have to be reliable and valid. One way to do this without spending much more money would be to have teachers at one school administer the tests to students at another school. The tests would have to be different each year (duh!) and they would be handled by one person. There would be several forms of the same test. And yes, they'd have to measure growth from September to June!

