Honest Information Is Essential

Dear Diane,

I don't want to spend too much time on testing. But a few words! One: I'm arguing that tests are a poor way to assess schooling. I'm not arguing, as Paul Hoss suggested in a recent response, that given other concerns we can't focus as much on cognitive aims with poor kids. But I do agree with him that we need to provide a lot of support to poor families above and beyond schools so that we are not "distracted" by other issues.

Diane, I don't view the word "politically" negatively. I like and believe in politics. And anytime I turn over something to "experts" I take for granted that I'm going to get a "viewpoint," which I may agree or disagree with. There may be a "majority" opinion and several "minority" ones, but even where "votes" (politics) decide, that doesn't prove who is right. Power and money have influence; so do good arguments. Fortunately, in politics decisions are never final; there's always tomorrow.

The "expertise" involved in deciding what kids "should" be reading completely baffles me. Teachers from time immemorial have thought that whoever taught the kids before they got to them wasn't doing a good job: "they oughta already know x and y." As Richard Rothstein reminds us, so it has been and always will be. So the fact that NAEP is designed by experts doesn't cut it with me. The advantage of the other kind of "normed" test is that we find out how kids actually handle various material given to them. We can then have our wish list. That reporters (and educators, too) have misreported scores is, of course, a problem! Just as they now misreport so-called criterion test scores, too. If psychometricians (test-making experts) had principles or guts they'd discipline their own field. I presume there are state-of-the-art standards regarding the proper size of samples, reliability measures, and what tests can claim to predict. Bah! Wrong assumption. Standards are just for kids.

I'd support a normed national test (NAEP-like) in literacy and math given on a sampled basis every few years to assess school districts, for informational purposes. To keep systems honest. States might require individual scores for students, but the federal government should neither mandate it nor reward them for doing so. Schools might also choose to do so. Hopefully all of these would be informational, not high-stakes, in nature.

When it comes to other subject disciplines let them set standards and try to persuade us to follow them. In fact, Diane, academic disciplines have disputes about what body of knowledge is central and these change over time. Let lay citizens through their own methods decide whether these fit their school and approach. We barely scratch the surface of what is possible today; let us encourage experimentation.

What worries us both now are two central concerns—and having just come back from visiting teachers in Indiana, it isn't a NYC problem!

(1) Test scores are now our definition of "achievement"! Reforms are being driven by finding ways to avoid schools being declared failures. There need to be ways to "red flag" based on data without presuming the school is failing; and solutions should address broader forms of learning than test improvement.

(2) Teachers, parents, students and local communities need to be reconnected to their schools in powerful ways. For the sake of learning and democracy.

The latter is what scared me in Indiana—the extent to which teachers have been cowed into thinking they are not experts. As a result they fall back on priding themselves on dedication rather than expertise. It took a MacArthur designation as a genius before I was acknowledged as an "expert". Otherwise, I was often invited to speak in order to get a "feel" for what life was like on the ground. The Generals gave the marching orders. Of course! But when it comes to schools (at least), as with families, there is no way to second guess from afar. No two kids or communities are quite alike—even though they have much to say to each other. Creating the conditions for "trust"—with skepticism—is the best we can do, plus plenty of resources.

But honest information is essential: how else can those in the field and those far from it have real conversation and dialogue, and persuade each other of anything, if there is no commonly trusted data? I honestly don't know the answer to this conundrum. I'm hoping that if we can institutionalize the practice of getting multiple forms of evidence from multiple sources we'll be in better shape. In short, your tendency is to look to more centralized federal expertise and mine is to look to the least centralized local ones. That says something about our histories, which suggests that probably we need a balance of both.

Meanwhile, I'm pondering what we need to do to keep teachers in the field—in the here and now—as a loud, noisy obstreperous voice to speak back to the power of money, corporations, and misinformed media—and reach kids at the same time.

Deborah

P.S. It was a combination of laymen, teachers and psychometricians who convinced NY State that the data collected locally by a group of high schools was more compelling than test scores. Thus a few dozen schools got waivers from having to focus on and meet all of NY State's Regents exams. That's a good example for us to look into, and a lesson for the future.

9 Comments

What worries me is that so many schools and districts view proficiency on state tests as the CEILING, rather than the floor--and that they go to such lengths as cancelling science and social studies classes in order to drill students for reading and math tests that barely touch grade-level proficiency. Whatever use these tests may have, they should be seen as the absolute MINIMUM that a school should be aiming for. When schools insist that they have to stop everything in order to get kids to pass these tests, it's a damning indictment of what the schools have been doing more than what the tests are now demanding. In most cases across the country, the tests are not asking for anything very rigorous. If our schools were doing what they should be doing (and I do NOT mean teaching to the test, or drilling with multiple choice questions)--and, equally importantly, if our parents were sending children to school ready to learn--most of these tests wouldn't be seen as such a big deal.

Deborah,

You are completely correct about the effects that our current tests have on our teachers and students. Because there is no consensus about what teachers should be teaching in a particular grade, there is much confusion about what should be taught. So teachers resort to “test taking strategies,” which by themselves make for a rather thin education.

What national standards would do is clarify what should be taught in each grade. Now you may be skeptical about outside “experts” deciding for a school, but national standards do not have to develop this way. As an example, E.D. Hirsch developed the Core Knowledge sequence with the extensive input of experienced teachers. The content sequence completely clarifies what topics should be covered in each grade, while allowing each individual teacher to determine the best way to teach the topics.

Teaching is difficult enough as it is. Explicitly outlining what should be covered allows the teacher to focus on teaching (that is, helping children understand what they need to know).

What concerns would you have if we as a nation adopted the content topics in the Core Knowledge sequence (or any other similarly structured content) and developed tests that reflected that sequence?

Erin Johnson

Deb - your exchange with Diane is breathing fresh air and good thinking into the current scene. I agree with both of you most of the time (a dilemma) and am constantly applying your thoughts, opinions and ideas to the current state of arts education...as depressed as I am about the present, "listening" to you two is reassuring because as John Goodlad said to me (and other thinkers/doers) often, "I don't mind being lonely if you are lonely, too." I've been in education and the arts for over forty years, and this is the most depressing time I can remember...Specifically, Deb: Standards in the arts, for example, were fought for, successfully, on the national, and frequently state level. I assure you there has been little if any impact on teaching and learning, locally, despite some folks' habits of reeling off the item #'s to legitimate their lesson and unit planning...I would like to throw my vote in for starting a national conversation about schooling in a democracy, respect for professional educators, and the recognition that "reforms" have failed, one after the other, especially those that try to scale up and "transfer" or generalize from one unlike situation to another. Good teaching and learning must grow and develop for every child, in every classroom, responsive to the culture, needs and idiosyncrasies of different towns, cities and states. Oh how I wish the MacArthur Foundation would confer "expertise" on my 40 plus years working and writing in the field...you are fortunate with that designation.
Regards, Jane Remer

Honestly? The whole truth and nothing but the truth?

What if we reported proficiency using the actual judgments of teachers from standard setting? Call it the Proficiency Confidence Scale (PCS). Here is how it could work.

In a typical standard setting, teachers judge where proficiency is on a test. There is usually significant disagreement among teachers, even after multiple rounds of judgments. To make matters worse, there may be as few as 20 teachers involved in making those judgments. But the small sample and large differences are masked by taking the median judgment. What if we reported using all of the available data?

Imagine a 60-point test with a cut somewhere in the middle as determined by 20 teachers. Here are the results from the final round of standard setting.

Score   Theta   TCH   TOT_TCH
25      -.202   1     1
26      -.186   2     3
27      -.163   1     4
28      -.120   5     9
29      -.070   3     12
30      0.00    4     16
31      0.06    2     18
32      0.24    0     18
33      0.35    1     19
34      0.46    1     20

Score is the raw point total, Theta is the IRT difficulty, TCH is the number of teachers who put proficiency at that level of difficulty, and TOT_TCH is the cumulative total of teachers who called that score or below proficient.

Instead of telling parents that Junior is "proficient" or "on grade level," you could report the data using the PCS: Junior earned a 29 and 12 out of 20 teachers would call that proficient.

On a high-stakes test, you could reassure students who were close to passing by telling them that they earned a 28 -- close, but only 9 out of 20 teachers thought that was good enough to be proficient.

NAEP could be reported with observations like, "This year, 43% of students were what 12 out of 20 teachers thought was proficient" or “basic” or whatever they choose to call it. "12% of students were what 100% of teachers thought was proficient."
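
For anyone who wants to try this with a real table, here is a minimal sketch of the PCS arithmetic in Python. It uses only the ten rows from the table above; the names TEACHER_CUTS, pcs, median_cut and pcs_statement are made up for illustration and do not come from any testing program.

```python
import statistics

# Final-round judgments from the table above:
# raw score -> number of teachers (TCH) who put the proficiency cut there.
TEACHER_CUTS = {25: 1, 26: 2, 27: 1, 28: 5, 29: 3,
                30: 4, 31: 2, 32: 0, 33: 1, 34: 1}


def pcs(score, cuts=TEACHER_CUTS):
    """Return (teachers whose cut is at or below `score`, total teachers).

    This is the TOT_TCH column: a teacher who set the cut at 28 would
    also call a 29 proficient.
    """
    total = sum(cuts.values())
    agreeing = sum(n for cut, n in cuts.items() if cut <= score)
    return agreeing, total


def median_cut(cuts=TEACHER_CUTS):
    """The single number a conventional report keeps: the median judgment."""
    judgments = [cut for cut, n in cuts.items() for _ in range(n)]
    return statistics.median(judgments)


def pcs_statement(name, score):
    """Build the sentence a PCS-style score report might print."""
    agreeing, total = pcs(score)
    return (f"{name} earned a {score} and {agreeing} out of {total} "
            f"teachers would call that proficient.")


print(pcs_statement("Junior", 29))
print("Median cut:", median_cut())
```

With this table the median judgment is 29, which is exactly the score where only 12 of the 20 teachers agree. The median hides that split; the PCS sentence reports it.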

Of course, people might get the idea that 20 teachers is not enough to judge such an important thing. In some cases, the number can skyrocket up to 30. What if those teachers do not represent all teachers in the state? Don't ask. People might also get the idea that proficiency is not quite as certain as they had imagined -- no more certain even if you had only one national test to determine it.

In many states, the data necessary to create a valid PCS already exist in public records. Anyone want to start the ball rolling and provide a real table or two? If the statisticians don't do it, perhaps the lawyers will.

I couldn't agree with Deborah more than in this entry, and especially her "No. 2" about connecting parents, communities, and teachers more actively to schools. A lot of schools talk about the importance of "parent involvement" and some even require parents to attend parent teacher conferences. But involvement must also include decision making about what happens in schools. It's a lot easier to do with smaller schools, of course. But it seems what Deborah is arguing for is what Michael Apple calls "thick" democracy--where lots of sources are used for information and assessment, including standardized tests, and standards (even by distant "experts"), but local actors are the primary decision makers. The "politicking" that Deborah likes so much can and must happen, at least partly, verbally--in the same room--what better way is there to teach democracy to students than to practice it ourselves?

Schooling seems to be getting increasingly centralized, and less democratic. Henig & Rich have an excellent book out called "Mayors in the Middle" about the increasing power of mayors in big cities and loss of influence of school boards, for example (although I wonder whether even school boards are a "democratic" process that never seemed to really connect communities with schools in powerful ways).

I would like to know what you think, Deborah and Diane, about how parents, community members, administrators, and teachers can be more meaningfully connected to their local schools. What are models that might be worth talking more about?

What about the model of town-hall meetings in certain New England towns, for example? In many small NE towns all members of the community are expected to come together periodically to consider the assessments of their local school (looking at test score data, longitudinal data, etc.) and decide what kinds of actions are necessary--arguing various points of view on key issues.

That sounds like "thick" democracy to me.

At the Mission Hill School, an urban school in Boston, community members, teachers, administrators, parents, and students sit on a school board that is empowered to make key decisions about the school--including hiring and firing the principal.

High stakes standardized testing leads to "thin" democracy where local actors are taken out of the decision making equation, forcing teachers to make detrimental compromises in their teaching that lead to many unfortunate results, many of which have already been mentioned here.

I know many educators and citizens have much less trust of schools "over there" or nationwide compared to where they send their own child. The Kappan/Gallup poll has shown this year after year. Are these fears of what "those other schools are doing" well founded? Diane early in this blog mentioned a fear of racist local decision making. Is that the primary fear? But I wonder, are high stakes standardized tests anti-racist? Evidence suggests otherwise. I would like to see any evidence, or cogent argument, that local (school or small town-based) decision making leads to lower school achievement, measured in any number of ways. Given the resources of using a variety of assessments, professional teaching staffs, adequate facilities, etc. (which may come from local, state or federal sources), I am sure local decision makers can realize "high standards" for their own children and community.

What if we reported individual student scores with standard errors represented in a way human beings without Ph.D.s in statistics could understand?

Pretend that the proficiency cut score is a single, unassailable truth. At the end of the day, you have a particular test administration, a cut score, an IRT difficulty, a standard error around each score, and the number of students who earned each score point. Why don't we report the data in a way a student or parent could understand? Here is a data set with that information.

Score   Status   Theta   SE   Students
25      BASIC    -.4     .2   2,000
26      BASIC    -.3     .2   2,000
27      BASIC    -.2     .2   3,000
28      BASIC    -.1     .2   4,000
29      PROF     0.0     .2   5,000
30      PROF     0.1     .2   6,000
31      PROF     0.2     .2   6,000
32      PROF     0.3     .2   7,000
33      PROF     0.4     .2   5,000
34      PROF     0.5     .2   6,000

Technically, if the standard error at the proficient cut is .2 (see score 29), the student's true score could very easily fall anywhere within one standard error of the observed score. No statistician would dare say that a student with a score of exactly 29 is exactly a 29. At the very least, the statistician would qualify it with one standard error around that score: somewhere in the range of 27 to 31 (Theta of 0.0 - .2 to Theta of 0.0 + .2). But we rarely report that in a way anyone can understand.

Why don't we write on a score report something like this: “John earned a 29, Proficient. Of the 24,000 students with scores like John's, 71% were called Proficient; about 29% were called Below Proficient.”

Or, “7 out of 10 students like John were Proficient. The other 3 students were Below Proficient.”

Modern computers could easily automatically generate such statements on a student’s score report.

For Mary, who earned a 28, it might be nice to know: "Mary's score is 28, Below Proficient. Of the students with scores like hers, 45% were identified as Below Proficient; 55% were Proficient, but Mary was not one of them."

Or, we could say, "About half of the students with scores like Mary's passed and the other half failed."
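
As a rough illustration of how such statements could be generated, here is a short Python sketch. It assumes, as the examples above do, that "scores like John's" means every score within two raw points of his (the 27 to 31 range given for a 29). The names ROWS, band_summary and report, and the half_width parameter, are hypothetical.

```python
# (score, status, students) rows from the data set above.
ROWS = [
    (25, "BASIC", 2000), (26, "BASIC", 2000), (27, "BASIC", 3000),
    (28, "BASIC", 4000), (29, "PROF", 5000), (30, "PROF", 6000),
    (31, "PROF", 6000), (32, "PROF", 7000), (33, "PROF", 5000),
    (34, "PROF", 6000),
]


def band_summary(score, half_width=2, rows=ROWS):
    """Summarize students whose score is within `half_width` points of `score`.

    Returns (students in the band, percent of them labeled proficient).
    half_width=2 is an assumption matching the 27-to-31 band in the text.
    """
    band = [(status, n) for s, status, n in rows
            if abs(s - score) <= half_width]
    total = sum(n for _, n in band)
    proficient = sum(n for status, n in band if status == "PROF")
    return total, round(100 * proficient / total)


def report(name, score):
    """Build a plain-language sentence for a student's score report."""
    total, pct_prof = band_summary(score)
    return (f"{name} earned a {score}. Of the {total:,} students with "
            f"scores like {name}'s, {pct_prof}% were called Proficient; "
            f"about {100 - pct_prof}% were called Below Proficient.")


print(report("John", 29))
print(report("Mary", 28))
```

A real report would take the width of the band from the test's published standard error rather than a fixed two points, and would have to handle scores near the top and bottom of the scale, where part of the band falls off the table.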

This may raise some serious concerns from those who receive the test results. In many states, the confidence bands will be very, very wide, even with only one standard error. But the results and interpretation would be much closer to the whole truth.

Many otherwise thoughtful people regard all of the statistical details as irrelevant to developing strong accountability standards. They notice how one test result doesn't match another and imagine the cure is to give only one test. They see their children's own schools mislabeled as too good or too bad using proficiency-based accountability and are told that a proficiency-based growth model will fix the problem. Some people even claim that it all depends on the state or the testing contractor who builds the test; the proficiency standards are right because a group of smart people at Company X did the work -- as if Company X has a secret formula that eliminates error variance. Norm-referenced testing by a company out-of-state would surely reveal the truth, they imagine. A really good national test built by the smartest minds to be the very best test ever made… Until the average person and the policy-makers who have a sincere interest in improving public education have a better understanding of the strengths and weaknesses of testing and proficiency testing, in particular, we will not be able to build a sound accountability system.

People who understand tests and statistics should work harder to develop reporting that tells the whole truth. It is never too late to start.

hey, im fr Phil, may i join?

A few responses.
Andrew: It reminds me of an old "joke"--if my grandma had wheels, she'd be a trolley. Yes indeed, if everyone was only doing x, y and z there'd be no need for schools! It sounds--perhaps unintentionally--as though teachers and parents are the major culprits. If only they'd... So, why don't they? If it's as simple as all that. That's what has fascinated me.

Jane. California once had art standards--starting in Kindergarten. They were very scary! I go back and forth--if we don't have tests for art, we won't teach it; and if we do, what we teach won't be art! I'm with you--a MacArthur would help us all.

I like Gutless and NCLB's ideas for accountability. If we could make it complex enough we'd all sit back and laugh.

Thanks, Matthew. Yes, democracy was indeed one of the best forms of accountability invented--despite its many many faults. Mission Hill's board meetings are a fine example, and what is much needed is some deep thinking and experimenting with school boards and their work. I was glad to read in Ed Week that there's new interest in deeper research into how school boards fit into reform.

And Bob of Phil--ys, yr wlcm 2 jn.

Deb
