This Is Not Good Education


Dear Deb,

There are times when I feel that we are on the same wavelength, and times when I know we are not. Right now, my frustration is multiplied because in the course of your last mini-essay, I found myself alternately agreeing and disagreeing with your assertions.

I said that many people who have spoken out about the recent round of NAEP scores seem not to have read the report in which the scores were embedded. I expressed the wish that the commentators would take the trouble to read the report before characterizing what they read in the newspapers, which is third-hand at best. This observation sent you into musing about how the original sources themselves are “an interpretation of data,” and how we all rely on the writers that we trust—or happen to agree with.

But that was not my point. The NAEP data are an original source for those who wish to discuss the latest round of national tests. They are not an “interpretation of data.” They are the data. I assume that you mean to say that you are unimpressed by NAEP, that you do not like the content of the NAEP frameworks or the methodology of the NAEP assessments. That is fair enough. But that is a different discussion from the one I raised.

Policymakers in Washington and the state capitols are influenced by the every-other-year reports from NAEP about state and national progress. It is your right to dismiss NAEP out of hand, but the people making important decisions about education policy are on a different trajectory. They look at the numbers and they see a reality that you dismiss as trivial and unimportant. Maybe you are right and they are wrong.

My point is that if public policy is going to be affected by NAEP—and I believe it is (and should be)—then at least the people who write about the NAEP scores should read the data and not rely on second-hand or third-hand accounts. Like the tests or hate them, they are the best measure we have right now. As the recent report from the Thomas B. Fordham Institute (“The Proficiency Illusion”) showed, the state tests vary widely and randomly in terms of their expectations and standards.

As I said in my last post, the progress on NAEP in most areas has been slight or insignificant from 2003 to 2007. I take this to mean that NCLB has had trivial effects on student achievement in reading and math, the subjects tested every other year. Now that the president and the U.S. Department of Education have made it their business to show that federal legislation can and will raise test scores, every release of NAEP data is accompanied by a press statement from the U.S. Secretary of Education that magnifies slight gains as huge achievements.

This is troublesome. It is troublesome because the federal government’s role as the honest, impartial collector and distributor of information gets corrupted when it acts as a cheerleader. And it is troublesome because it is unrealistic to expect test scores to make major leaps in a few years. When they do, one should suspect chicanery of some kind.

NAEP shines a light on state testing practices, as the Fordham report shows. Many states are reporting unrealistic leaps in achievement and high levels of proficiency to satisfy the absurd demand of NCLB for a trajectory that will bring every child to "proficiency" by the year 2014. NAEP shows how unlikely it is that any state will meet that goal and how inflated most of the states' claims of achievement are.

You make a transition from national testing to the dangers of a national curriculum. We have discussed this often. Like you, I would like to see schools where children have time to build, to create, to explore, to experiment, to play. I would like to see kids in the primary grades building castles and fortresses and stores with blocks. But unlike you, I don’t think this kind of playful learning is at odds with a national curriculum.

What is really frightening today—due in large measure to NCLB—is that we have a national testing mania without any curriculum at all. So now our schools are obsessed with preparing to take tests, getting good scores on tests, and then starting the test prep all over again. Out the window goes any thoughtful or playful engagement with history, literature, or the arts, as well as time for physical education (in many New York City schools, children are lucky to have one period a week for physical education). This is outrageous. This is not good education.

So here is where we find our differences and we find our agreements. Unlike you, I am not frightened by a national curriculum and national testing; I believe we already have both, supplied by commercial publishers of textbooks and tests. And what we have is low-level and antithetical to good education. Where we agree is that we have a vision of what good education is and should be. Even if we don’t agree on every detail, we do agree that what we have now is far from good education.




Our school system does not invite improvements.

While I like the idea of standards and fully support an articulated curriculum and learning expectations, national standards would have as little effect on student learning as NCLB has had. Increasing the role of the federal government by establishing national standards would be ineffective. Our federal government has neither the infrastructure nor the credibility to design or implement quality assessments or learning environments.

California has outstanding school standards. And yet, their translation into the classroom has yielded only moderate success, primarily because the lack of standards was not the prime reason that our schools are doing poorly. Our schools are doing poorly because our school system is highly fragmented and has no ability to improve.

Before quality standards, curriculum or assessments could have any effect on student learning, we would need to reexamine how we do “school.”

From international analyses, the evidence is compelling that external assessments, in lieu of grades, contribute greatly to student learning, with a one-year advantage at 8th grade. The teachers within those systems insist that the learning advantage (with external assessments) is due to the collaborative relationship between teacher and student. With external assessments, the students see their teachers as allies and advocates, not as judges. This is not a particularly expensive improvement, but it would require a dramatic change in our idea about what schooling is.

With NCLB we currently look at whether a school is being effective. But a school does not learn; children do. We have no way (nor the will) to track what each individual child is learning (or not). We constantly assess our children but do not use that information to effectively change our teaching, curriculum or tests. Without a quality feedback system, how could our teachers learn to teach better? How would quality standards ever find their way into practice?

Inertia drives our school system. Without school structural changes, quality standards and curricula will not improve student learning.

Erin Johnson

Diane: You know better than I do the arguments against testing in general and NAEP in particular. There is stereotype threat, built-in cultural bias in the questions, and so on. These factors pollute the data before any interpretation takes place. Policymakers rely on NAEP and other test scores, but this is not to be accepted as a fait accompli. Even in heavily bureaucratized establishments like ours, there is reason to believe that policy can be changed by those dissatisfied with the impurity of the data and the methods of analyzing it. The problem, as I see it, lies in the very practice of "measuring" student learning by current instruments. Let's try assessing what our kids can do without seeking uniformity or scientific precision in our means of assessment.

Yes, but at least parents should have the right to know where their children stand and where their school and district stand. Without information, nothing will move the parents to ask for change. And I fully believe it is the parents, not the teachers, not the politicians, and not the administrators, who will ask for change in the right direction.

Saying tests have biases is a big wimp-out. You might as well say teachers cannot teach poor children because of the problems some of those children may have at home (lack of food, lack of care, abuse). If that were true, some seriously poor urban schools would not get the results they get. Teachers may not be able to change the world but they can certainly do their best to affect each student, one at a time.

So while people are bickering that national curriculum and testing aren't fair, take away teacher creativity, won't work, whatever, we parents and future citizens get crap.

Yay, everybody wins, huh?

Some of the people who are the most supportive of testing are the people who least understand the results.

Diane, you say that policymakers should know better than to try to exaggerate minor differences in mean NAEP test scores for political purposes. I agree. What I find disconcerting is that NAEP researchers have specifically said that minor differences are not reliable -- no matter what asterisk (*) appears next to a mean scaled score.

Look at the mean scaled scores for 2003, 2005, and 2007:

READ Grade 4: 218, 219, 221
READ Grade 8: 263, 262, 263
MATH Grade 4: 235, 238, 240
MATH Grade 8: 278, 279, 281

Numerous factors influence how those numbers are derived, but consider only one: test equating. In trying to build test forms of similar overall difficulty, test developers can come close, but they are never perfect. In a 2003 study by Hedges and Vevea ("NAEP Validity Studies: A Study of Equating in NAEP," Working Paper #2003-13, April 2003), the authors estimated that random variation in the difficulty of forms could account for a difference of about 0.5 scaled score points at the mean for Reading and about 1.0 scaled score points at the mean for Math. On page 23, they explain that such errors can actually add up over time, with a 0.5 drift turning into 1.0 to 1.5 and so on. The error was much more extreme at the top and bottom of the test scale. The authors state: "It might be advisable to consider equating as introducing as much as 0.5 to 1 points of bias in trend comparisons. Thus, a viable procedure might be to test for differences between assessment waves by testing whether the difference is greater than 1.0 scale units...." (p. 22).

To put it in language anyone can understand: NAEP differences near 1.0 are probably the psychometric equivalent of pure dumb luck (good or bad).

How many of the differences above are close to 1.0? Notice that there are greater differences for Math than Reading, just as the researchers noted would be true due to equating alone. Hmmm...
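As a rough illustration of the check quoted from Hedges and Vevea, the short Python sketch below computes the change between consecutive assessment waves from the means listed above and flags whether each change is greater than 1.0 scale units. The names (`SCORES`, `wave_differences`, `exceeds_threshold`) are hypothetical and exist only for this example. The sketch applies the 1.0 threshold alone; it ignores sampling error and the cumulative drift the authors mention, so it should not be read as NAEP's significance procedure.

```python
# Mean NAEP scaled scores as listed in the comment above (2003, 2005, 2007).
YEARS = (2003, 2005, 2007)
SCORES = {
    "READ Grade 4": (218, 219, 221),
    "READ Grade 8": (263, 262, 263),
    "MATH Grade 4": (235, 238, 240),
    "MATH Grade 8": (278, 279, 281),
}

# Threshold taken from the quoted suggestion: count a change between waves
# only if it is greater than 1.0 scale units. Using it as the sole test is
# an assumption made for this illustration.
EQUATING_THRESHOLD = 1.0


def wave_differences(scores, years=YEARS):
    """Return (label, start_year, end_year, diff) for each pair of consecutive waves."""
    rows = []
    for label, values in scores.items():
        for i in range(len(values) - 1):
            rows.append((label, years[i], years[i + 1], values[i + 1] - values[i]))
    return rows


def exceeds_threshold(diff, threshold=EQUATING_THRESHOLD):
    """True if the absolute change is strictly greater than the threshold."""
    return abs(diff) > threshold


if __name__ == "__main__":
    for label, start, end, diff in wave_differences(SCORES):
        verdict = "exceeds 1.0" if exceeds_threshold(diff) else "within 1.0"
        print(f"{label} {start}-{end}: {diff:+d} ({verdict})")
```

On these figures, four of the eight wave-to-wave changes are 1 point or less in absolute value, and the largest changes are in Grade 4 Math.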

Given that the statisticians who understand test equating reported that differences at or near 1.0 scaled score points could be due solely to test equating imperfections, why is that not considered in more current reporting of these data? Instead, NAEP reports differences as small as 1 scaled score point with an * to indicate statistical significance -- ignoring what their own research team has concluded. And the politicians take it from there...

What is the public to think?

Understand that many more statistical decisions are made about how to calculate the means you see before anyone ever attempts a political spin. There are decisions about how to weight different samples (especially with better economic data on students), how to cut the data (nonpublic schools reported in national means, but not state means), and whether to allow accommodations for students. All of these affect the means reported. These are not political decisions, but they greatly affect the results.

It would be interesting to see a study of the 2007 data that shows, for every decision that had to be made (or was made in the past), how that affected the mean scaled scores. What would the data be without non-publics in the national data? What would the data be if accommodations were not allowed? If twice as many Hispanics or Asians or whoever had been included in the sample, would that have affected the mean? What would the data be if only entities with a 95% (not 85%) participation rate had been included? Since NAEP is a model for statistical excellence in testing, it would be very illuminating for the public to see how such decisions affect the results. Given that education is really about the decisions that adults make, it could be a real teachable moment.

Why do we overemphasize these very small increases in the NAEP? Are we so desperate to believe that the tremendous amount of money and effort that we have spent on school reform has been worthwhile?

If we were at the top of international studies, then small incremental improvements might in fact mean something. But the gap between our students' performance and that of their counterparts in other parts of the world is excessively large.

The exaggerated emphasis on small differences on the NAEP brings to mind an excessive interest in the rearrangement of the deck chairs on the Titanic.

Erin Johnson

This story hot off the wire. Teachers as well as parents, it seems, can have an impact on bad policy:

Wis. teacher protests No Child law
By RYAN J. FOLEY, Associated Press Writer
Thu Nov 1, 5:30 AM ET

MADISON, Wis. - A middle school teacher is protesting the federal No Child Left Behind law by refusing to administer a standardized test to his eighth-grade students.

David Wasserman, a middle school teacher in Madison, began his protest Tuesday. Instead of giving students the Wisconsin Knowledge and Concepts Exam, he sat in the teacher's lounge, leaving his colleagues to oversee the test.

He said he has moral objections to the federal law, President Bush's signature education policy. The state test is used to measure whether schools are meeting annual benchmarks under the law. Schools that do not meet goals can face sanctions.

Like many teachers, Wasserman said he believes the test is a poor way to measure student progress, takes up too much class time and is used unfairly to punish schools. So after years of growing frustration, he said he decided to be a "conscientious objector" this year.

Wasserman said he originally planned to resume his protest on Thursday, the second day of testing, and through four more days of testing next week. But he said Wednesday he would likely back off and give the test after Superintendent Art Rainwater told a teacher's union official that Wasserman could be fired if the protest continued.

"I can't jeopardize health insurance for my family," said Wasserman, 36. "I want to still hold by my morals, which I feel very strongly about. But I have a family to think about."

In a statement released to The Associated Press on Wednesday evening, Rainwater noted the district was required by state law to fulfill the federal requirement.

"It is part of every teacher's duty to administer the test," he said. "Any failure to fulfill this required duty would be considered insubordination and subject to disciplinary action, up to and including termination."

Robert Schaeffer, a spokesman for FairTest, a national group that opposes the overuse of standardized tests, said he was unaware of any other teachers who have refused to administer tests to protest No Child Left Behind. Other teachers have boycotted high-stakes state tests used for graduation or promotion, he said.

"It is an act of moral courage, and it certainly helps call attention to the widespread misuse of standardized testing," he said. "The natural bureaucratic reaction is always to threaten people with severe sanctions. That's why people have to have the moral fiber to put themselves at risk."

Wasserman, who has taught in the district for six years, said he is being treated unfairly because his colleagues at Sennett Middle School could administer the test without him.

