Through the lens of social science, eduwonkette takes a serious, if sometimes irreverent, look at some of the most contentious education policy debates. (Find eduwonkette's complete archives prior to Jan. 6, 2008 here.)

July 4, 2008

Happy Independence Day!

Happy Independence Day! Today is an opportunity to reflect on the ideals and principles that founded this great country, and to renew our commitment to uphold and support them when we see signs of erosion and compromise.

What does it mean to be a citizen in the modern world? In the coming year, the International Association for the Evaluation of Educational Achievement (IEA) will be conducting the International Civic and Citizenship Education Study (ICCS), a study of eighth-graders’ knowledge about and attitudes towards civics and citizenship in 39 countries. Conspicuously missing from the list is the U.S.A. It’s disappointing that the National Center for Education Statistics is not supporting U.S. participation in the study.

The U.S. did participate in the IEA’s 1999 study of civic education among ninth-graders in 28 countries. Students were asked about fundamental concepts of democracy and citizenship that were not specific to the workings of particular governments, and about their own civic attitudes and actions. An example of a content item was a multiple-choice item with the stem “In democratic countries what is the function of having more than one political party?” An example of a skills item was a multiple-choice item presenting a brief political advertisement and asking which group mentioned in the ad had probably issued it.

The U.S. did better than the international average on a test of civic knowledge (which combined civic content and civic skills), and led the world on civic skills. But before we pat ourselves on the back too much, the data also showed that civic knowledge, content and skills were distributed unequally across U.S. ninth-graders, with much higher levels among white and Asian youth than Black and Hispanic youth, and higher levels among ninth-graders with highly-educated parents than among students whose parents did not go very far through school. Black youth scored .85 to .90 standard deviations lower, and Hispanic youth about .70 standard deviations lower, than whites on civic knowledge and its components. Students with at least one parent who had only completed high school scored about .80 standard deviations lower on civic knowledge than students with at least one parent who had completed a bachelor’s degree.

It’s tempting to look at these gaps and infer that they simply reflect the large average differences in academic performance among racial/ethnic and social class groups observed among American youth more generally. But I don’t think that we can count on No Child Left Behind to increase the civic knowledge of our most disadvantaged youth. There’s something very pernicious about a system that fails to educate its most vulnerable members about the very institutions of democracy that were designed to enable them to become productive citizens.

eduwonkette will be back next week. Thanks for the opportunity to post, e.

July 2, 2008

Cool People You Should Know: Mike Rose

We’ve spent a lot of time here lately talking about tests and test scores. You can’t ignore ’em – they’re a ubiquitous part of the educational landscape in the U.S., and their salience has only increased in the NCLB era. To the extent that they are able to tell us about students’ mastery of core academic skills, they can be a useful tool to guide education policy and practice.

But some of the importance of testing comes from the way we use tests for sorting, selecting and certifying individuals, and not from the intrinsic qualities that the tests are seeking to measure. I would never say that literacy and numeracy skills are unimportant; but there’s a lot more to being a competent adult and citizen than high test scores.

This point is driven home by a cool person you should know: Mike Rose, Professor of Social Research Methodology at UCLA. The son of working-class Italian immigrants, Mike was classified as a remedial student, until some perceptive high school teachers figured out he had the potential to go to college. He spent much of his early career teaching literacy skills to students at various levels of schooling who had not been well-prepared. His autobiographical book Lives on the Boundary is an inspiring account of the power of good teaching to engage struggling students in the study of written English.

In an article entitled “In the Basement of the Ivory Tower” published in the June, 2008 issue of Atlantic Monthly, “Professor X,” an adjunct English teacher at a private college and a community college, turned a lot of heads with his palpable resignation at teaching students who he believes don’t belong in college and are destined to fail. Rose, in his recent foray into blogging, considers how he might teach James Joyce’s short story “Araby,” which “Professor X” views as outside the ken of his students, to a group of underprepared students. I’ve never taught English, but it’s a tour de force.

Rose’s most recent monograph is entitled The Mind at Work: Valuing the Intelligence of the American Worker. Through portraits of blue-collar workers such as carpenters, waitresses and hair stylists, he persuades us that there is a tremendous amount of mental work involved in manual labor. People don’t live their lives taking tests; they live them engaging with tools, symbols, and, most importantly, with other people. Mike Rose calls for a conception of intelligence that acknowledges school, to be sure, but also the workplace and the public sphere of our democracy.

On his blog, Rose writes, “If I had to sum up the philosophical thread that runs through my work, it would be this: A deep belief in the ability of the common person, a commitment to educational, occupational, and cultural opportunity to develop that ability, and an affirmation of public institutions and the public sphere as vehicles for nurturing and expressing that ability.” As we approach the 4th of July holiday, it’s hard to imagine a philosophy more consistent with the founding ideals of this country.

Educational Testing: A Brief Glossary

While you’re waiting for Dan Koretz’ book on testing to arrive – I think eduwonkette and I should get some kind of consideration for shilling for this book so often here – here’s a brief skoolboy’s-eye view on testing. Actual psychometricians are welcome to correct what I have to say.

Tests are typically designed to compare the performance of students (whether as individuals, or as members of a group) either to an external standard for performance or to one another. Tests that compare students to an external standard are called criterion-referenced tests; those that compare students to one another are called norm-referenced tests. Even though criterion-referenced tests are intended to hold students’ performance up to an external standard, there is often a strong temptation to compare the performance of individual students and groups of students on such tests, as if they were norm-referenced.

A typical standardized test of academic performance will have a series of items to which students respond, generally in either a multiple-choice or a constructed-response format; in the latter, students write out their own answer rather than choosing from a list of options. There’s usually only one right answer to a multiple-choice item, whereas constructed-response items may be scored so that students get partial credit if they demonstrate partial mastery of the skill or competency that the item is intended to represent. For any test-taker, we can add up the number of right answers, plus the scores on the constructed-response items, to derive the student’s raw score on the test. A test with 45 multiple-choice items would have raw scores ranging from 0 to 45.

For individual test items, we can look at the proportion of test-takers who answered the item correctly, which is referred to as the item difficulty or p-value. This p-value has nothing to do with the p-values used in tests of statistical significance; it is simply the proportion (p) of examinees who got the item right. Some test items are more difficult than others, and hence items will have varying p-values.
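To make raw scores and item p-values concrete, here is a minimal Python sketch. It assumes a made-up response matrix for four students on five multiple-choice items (1 = right, 0 = wrong); the data and the names `responses`, `raw_scores`, and `item_p_values` are invented for illustration and do not come from any real test.

```python
# Hypothetical data: each row is one student, each column is one multiple-choice item.
# 1 means the student answered the item correctly, 0 means incorrectly.
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
]


def raw_scores(matrix):
    """Raw score per student: the number of items answered correctly."""
    return [sum(row) for row in matrix]


def item_p_values(matrix):
    """Item difficulty (p-value) per item: the proportion of students who got it right."""
    n_students = len(matrix)
    # zip(*matrix) turns the rows (students) into columns (items).
    return [sum(column) / n_students for column in zip(*matrix)]


print(raw_scores(responses))     # one raw score per student
print(item_p_values(responses))  # one p-value per item
```

Note the slightly confusing direction of the scale: an item that everyone gets right has a p-value of 1.0, so a higher item "difficulty" actually means an easier item.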

Raw scores are rarely interpretable, in part because they are a function of the difficulty of the items. For this reason, they are typically transformed into scale scores, which are designed to generate a score that will mean the same thing from one version of a test to the next, or from one year to the next. The scale for scale scores is arbitrary; the SAT is reported on a scale ranging from 200 to 800, whereas the NAEP scale ranges from 0 to 500.

The process of transforming raw scores into scale scores is computationally intensive, generally using a technique known as Item Response Theory (IRT), which simultaneously estimates the difficulty of an item, how well the item discriminates between high and low performers, and the performance of the examinee. An examinee who successfully answers highly difficult items that discriminate between high and low performers will be judged to have more ability, and hence a higher scale score, than an examinee who gets the difficult items wrong.
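A rough Python sketch may help show how difficulty, discrimination, and ability fit together. It assumes the two-parameter logistic (2PL) model, one common IRT model; the post does not say which model any particular test uses. It also simplifies heavily by treating the item parameters as already known and estimating only the examinee's ability with a crude grid search, whereas operational scaling estimates everything together. All item parameters and function names here are hypothetical.

```python
import math


def prob_correct(ability, difficulty, discrimination):
    """2PL item response function: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def log_likelihood(ability, items, answers):
    """Log-likelihood of a pattern of right (1) and wrong (0) answers at a given ability."""
    total = 0.0
    for (difficulty, discrimination), answer in zip(items, answers):
        p = prob_correct(ability, difficulty, discrimination)
        total += math.log(p) if answer == 1 else math.log(1.0 - p)
    return total


def estimate_ability(items, answers):
    """Pick the ability on a grid from -4.0 to 4.0 (step 0.1) that best fits the answers."""
    grid = [step / 10.0 for step in range(-40, 41)]
    return max(grid, key=lambda theta: log_likelihood(theta, items, answers))


# Hypothetical items as (difficulty, discrimination), ordered from easy to hard.
items = [(-1.0, 1.0), (0.0, 1.5), (1.0, 1.5), (2.0, 2.0)]

# Two made-up examinees: the first also gets the third (harder) item right.
stronger = estimate_ability(items, [1, 1, 1, 0])
weaker = estimate_ability(items, [1, 1, 0, 0])
print(stronger, weaker)
```

Under these assumptions, the examinee who also answers the harder, more discriminating item correctly ends up with the higher ability estimate, which is the intuition described above.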

There’s no one right way to transform raw scores into scale scores, and it’s always a process of estimation, which is sometimes obscured by the fact that scores are reported as definite quantities. (A little skoolboy editorializing here…) The expansion of testing hastened by NCLB has placed a lot of pressure on states, and their testing contractors, to construct scale scores for a test that represent the same level of performance from one year to the next (a process known as test equating). Much of this is done under great time pressure, and shielded from public view. The process is complicated by the fact that states typically don’t want to release the actual test items they use, because then they can’t use them in subsequent assessments as anchor items that are common across different forms of a test, since students’ performance on such items could change due to practice. Some tests are vertically equated, which means that a given score on the fourth-grade version of a test represents the same level of performance as that same score on the fifth-grade version of the test. In a vertically equated test, if the average scale score is the same for fourth-graders as it is for fifth-graders, we’d infer that the fifth-graders haven’t learned anything during fifth grade.

Proficiency scores represent expert judgments about what level of scale score performance should describe a student as proficient or not proficient at the underlying skill or competency that the test is measuring. For example, NAEP defines three levels of proficiency for each subject at each of the grades tested (4th, 8th and 12th): basic, proficient, and advanced. Cut scores divide the scale scores into categories that represent these proficiency levels, with students classified as below basic, basic, proficient, or advanced. These proficiency scores do not distinguish variations in students’ performance within the category; one student could be really, really advanced and another just advanced, and whereas a scale score would record that difference, a proficiency score would simply classify both students as advanced. The fact that proficiency levels are determined by expert judgment, and not by the properties of the test itself, means that they are arbitrary; the level of performance designated as proficient on NAEP may not correspond to the level of performance designated as proficient on an NCLB-mandated state test. Many researchers (including Dan Koretz, eduwonkette, and me) are concerned that the focus on proficiency demanded by NCLB accountability policies has the unintended consequence of concentrating the attention of school leaders and practitioners on a narrow range of the test-score distribution, right around the cut score for the category of “proficient,” to the detriment of students who are either well below or well above that threshold. Such a focus is a political judgment, not a psychometric one, and there are arguments both for and against it.
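The way cut scores collapse scale scores into categories can be sketched in a few lines of Python. The cut scores below are hypothetical values on a 0 to 500 scale chosen only for illustration; they are not actual NAEP cut scores, and the names `CUT_SCORES`, `LEVELS`, and `proficiency_level` are invented here.

```python
from bisect import bisect_right

# Hypothetical cut scores on a 0-500 scale (NOT real NAEP values).
# A score at or above a cut score falls into the next category up.
CUT_SCORES = [200, 250, 300]
LEVELS = ["below basic", "basic", "proficient", "advanced"]


def proficiency_level(scale_score):
    """Map a scale score to a proficiency category using the cut scores."""
    return LEVELS[bisect_right(CUT_SCORES, scale_score)]


for score in (180, 249, 250, 305, 480):
    print(score, proficiency_level(score))
```

In this sketch a student scoring 305 and a student scoring 480 both come out as advanced, while a one-point difference between 249 and 250 moves a student from basic to proficient. That is why attention tends to cluster around the cut score.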

I'll update this as more knowledgeable readers weigh in. If experts in measurement were to judge proficiency thresholds for knowledge about testing, I'd probably be classified as basic; Dan Koretz is definitely advanced. For a lively and readable treatment of these kinds of issues, get his book!

July 1, 2008

An Immodest Proposal

This year’s statewide fourth-grade math exam administered in New York State -- the one with the remarkably high gains -- contained the following item:

“Janice bought a notebook for $3.75 and a pencil for $0.47. She gave the cashier $5.00. How much money did Janice receive in change?”

The item might have looked a little familiar to fourth-grade teachers. In 2007, a similar item appeared:

“Tony bought art supplies that cost $19.31. He gave $20.00 to the cashier. How much money did Tony receive in change?”

And in 2006, an item read:

“Mr. Marvin spent $54.10 on pants and shirts. He gave the cashier $60.00. How much money should Mr. Marvin receive in change?”

Other similarities abound. In 2008, an item read:

During the year, one thousand eight hundred four books were checked out of the school library. What is another way to write this number?

A. 184
B. 1,084
C. 1,804
D. 1,840

There was an uncanny resemblance to an item on the 2007 test:

The number of people who live in Goodwin Falls is three thousand nine hundred eight. What is another way to write the same number?

A. 398
B. 3,098
C. 3,908
D. 3,980

To be sure, the test-takers in 2008 still had to answer these questions correctly to get credit for them. But the similarity in item formats across the years gives some credence to concerns that scores are inflated.

Dan Koretz discusses the problem of score inflation in his excellent new book, Measuring Up: What Educational Testing Really Tells Us. One source of the problem, he explains, is that all tests sample the subject-matter domains that they are supposed to tap. If the same kind of item shows up repeatedly on the test from one year to the next, teachers and administrators can focus on this restricted set of test item types, and neglect other item types that are still part of the domain that the test is intended to represent.

The National Assessment of Educational Progress (NAEP) is sometimes referred to as the “gold standard” for standardized tests, and claims about test score inflation in a test, such as an NCLB-mandated state test, are often grounded in a discrepancy between NAEP and the other test in either the level of performance or the trend in performance. The characterization of NAEP as the “gold standard” reflects the fact that it is designed to measure a much larger sample of student performance in a domain than is the typical state test. No individual child takes all of the items in the NAEP item pool; instead, students complete test booklets with blocks of items. In the 2000 12th-grade mathematics NAEP, for example, students completed one of 26 different test booklets, each containing three 15-minute blocks out of a total of 13 different blocks of mathematics items. Each student was asked to complete about 40 items across the domains of number sense, properties, and operations; measurement; geometry and spatial sense; data analysis, statistics and probability; and algebra and functions.

Overall, enough students respond to all of the items in the NAEP item pool to be able to measure how well the population of students in a state (or large urban district) is doing. But NAEP is not designed to yield scores for individual students, because no student responds to enough items to yield a reasonably precise measure of performance.

With tongue firmly in cheek, skoolboy offers the following solution to test score inflation: more testing. Imagine if students completed the entire pool of NAEP items (or some other broad pool of items assessing performance in a domain), instead of the relatively restricted sample of items used in most state-level testing programs. If students were assessed on a broad array of items tapping subject matter competence, teachers and administrators would not be able to concentrate their attention on a subset of item types, and hence would not be able to artificially raise students’ scores relative to their true learning of the subject. Sure, the burden of testing would increase; we'd need to invest in better and more expensive tests; and increased testing wouldn't solve the incentive problems that high stakes create.

More testing. An idea whose time has come?

Nah.

June 30, 2008

Inspiration and Perspiration

Graduations are sacred events in American society. They mark an important transition, and graduates and their loved ones are justifiably proud of their accomplishments. For this reason, it’s a very tricky thing to comment on news stories connected to graduations. One doesn’t want to appear to be denigrating the achievements of the graduating students, many of whom have overcome substantial odds to obtain a diploma.

Over the past week, Joel Klein, Chancellor of the New York City Public Schools, has been making the rounds at the graduation ceremonies of some of the small high schools in NYC. Regular readers of this blog know that eduwonkette has been sharply critical of some of the “turnaround” myths constructed about these small schools, pointing out that they enrolled students who were better off academically than the students in the large high schools they replaced. At my urging, she held off on posting about the Chancellor’s e-mail to teachers about the graduation ceremonies at Bronx Lab School, one of the small schools which replaced the larger Evander Childs high school, about which she has posted repeatedly.

Jenny Medina files a story in today’s New York Times on the graduation at the Urban Assembly School for Law and Justice in Brooklyn. Much of the piece describes the extraordinary time and effort put in by the staff in order to achieve a graduation rate of 93% among the senior class. The principal, who is leaving for another position, describes herself as “exhausted,” and expresses concern that her staff cannot maintain the intensity required to do their jobs well.

“You are taking a bunch of hyper, type A perfectionist people and giving them a herculean task,” she said. “People have to work much too hard to do what we are doing. People cannot work at this level all their lives and nobody is prepared to do something at a level of mediocrity.”

Ms. Medina writes that the Chancellor “seemed unconcerned that so many of the teachers at small schools were working such long hours.”

“‘When people are part of the world of changing things for children, they don’t view it as work,’ he said, pointing to members of his own staff who log 14-hour days.”

An uncharitable critic (that would be me) might note that one of the reasons that the Chancellor’s staff must work 14-hour days is to clean up after his many missteps and mistakes. Such a critic might also point out that the average salary of the members of the Chancellor’s staff is $113,000, whereas the average salary among the teachers at the Urban Assembly school for FY 07 was $49,000.

But let’s take the Chancellor at his word. If you’re changing the world for kids, why would only 14 hours a day be enough? Why not 19 hours a day? Don’t the Chancellor and his staff really care about changing things for children?

We need to disrupt this ridiculous myth that expects superhuman effort from educators in order to achieve success for kids. Almost all of the teachers I know work very hard, and struggle to maintain a balance between their professional responsibilities to the children they teach and building and maintaining a life outside of their work. We don’t need cartoon-like superhero educators; we need a system that supports teachers to work hard and honestly at their craft, without the risk of burnout after a couple of years.

How Much Math Does a Teacher Need to Know to Teach Math?

I once asked a colleague if he’d read a particular book. “Read it?” he replied incredulously. “I haven’t even taught it!” A former college English professor, he came by the joke honestly. The first time I taught a course that I had never taken myself, I acknowledged the absurdity, at least to myself. I stayed about a week ahead of my students. Out-of-field teaching? Not exactly. I was teaching a course that was in my field, but outside of my immediate area of expertise. The teaching assignment was justified on the grounds that, as a Ph.D.-holder, I was deeply grounded in the core theoretical perspectives and research traditions in my discipline, and that I could therefore pick up the literature in a subfield quickly and accurately, and teach that literature competently. (At the time, no one was concerned with pedagogical content knowledge, the idea that there is practical knowledge of how to teach a subject that differs from mastery of the subject itself.)

Last week, the National Council on Teacher Quality released a report on the mathematics preparation of elementary school teachers who teach mathematics. The report indicts education schools for failing to select and prepare elementary teachers who have an adequate mastery of mathematics. Singling out algebra as a topic that is shortchanged in preparation programs, the authors offer a number of sensible recommendations for states, education schools, textbook publishers, and institutions of higher education.

The Teacher Education and Development Study in Mathematics (TEDS-M), a comparative study of how 18 countries, including the U.S., prepare mathematics teachers at the primary and lower secondary grades, is currently underway under the auspices of the International Association for the Evaluation of Educational Achievement. We’ll learn a great deal from this study that will complement the NCTQ recommendations.

It seems obvious that teachers must have knowledge of the subject matter they will actually teach. But how much more knowledge should a teacher have than what she or he is seeking to assist students in learning? The case of secondary school mathematics is instructive. Is it enough for a high school trigonometry teacher to know trigonometry cold – but not, say, real analysis, or ordinary differential equations?

In the U.S., many states have content specialty tests that prospective teachers must pass prior to assuming full-time teaching positions; presumably these tests tell us something about the mathematical content that states think is important for teachers to master. The four-hour Massachusetts test covers number sense and operations; patterns, relations, and algebra; geometry and measurement; data analysis, statistics, and probability; trigonometry, calculus, and discrete mathematics; and integration of knowledge and understanding. Approximately 23% of the test is devoted to patterns, relations, and algebra, and there are 100 multiple-choice items and two constructed-response items. From tests such as these, we can infer that some states do not demand that high school math teachers have an extensive understanding of the discipline of mathematics.

One of the reasons I was unhappy with much of the press reporting on the Urban Institute’s study of Teach for America teachers’ effects on end-of-course tests in Algebra I, Algebra II, and Geometry (among other subjects) in North Carolina is that it shifted the locus of policy discussion to whether to expand alternate routes to teacher certification, without addressing the more challenging questions about what knowledge about subject matter and about how to teach it is optimal for student learning in particular subjects in high school. The reality is that even if we could count on the incremental achievement observed in the Urban Institute study, lots of other countries would still be kicking our butts in international assessments of mathematics and other subjects. I think we’d be better off examining how these countries prepare secondary math teachers – and teachers in other subjects – to see if there are approaches that we can adapt to the U.S. context. One thing that we might learn is that other countries demand much higher levels of subject matter competence from their elementary and secondary school teachers than we do.

June 29, 2008

"Independence" Day

I’ll try to stay reasonably serious this week, but some things are just too ridiculous to pass up. On Friday, the New York City Department of Education (DOE) announced that it had selected the NYC Leadership Academy to provide principal training and development services. The press release proclaimed that the Leadership Academy was “chosen from among multiple bidders in a competitive procurement process.” The DOE is negotiating a five-year contract for a total of $50 million, beginning Tuesday, July 1.

Long-time followers of New York City public schooling are aware that the NYC Leadership Academy was created by the DOE in 2003, and Chancellor Joel Klein serves as a Director of the organization. (At least according to the organization’s IRS filings – its website doesn’t list him as a director.) The Leadership Academy website describes the Leadership Academy as “the centerpiece of the NYC Department of Education’s transformational strategy,” a phrase that also appears in DOE press releases, and the staff have e-mail addresses provided to employees of the DOE. The April press release announcing this extraordinary competitive procurement spent more time crowing about the Leadership Academy’s accomplishments than describing the request for proposals.

So: The DOE had a competitive bidding process to award a contract to an organization that Mayor Mike Bloomberg and Chancellor Joel Klein had created and publicly supported over the past five years. Remarkably, the report of the award indicated that there were three other bidders. I can only imagine who would seriously think they had a shot at this.

Probably the same people who think they have a shot at this. In related news, skoolboy, who has been happily married for many years, is announcing a competitive procurement for spousal services. The successful bidder will have experience attending to the needs of a partner like skoolboy. Prior joint ownership of property with skoolboy and collaborative experience raising a family a plus. The date of the bidder’s conference will be announced later.

skoolboy returns!

I'm taking a break this week, so skoolboy is taking the wheel. If you have compliments, thoughts, news, or tips, you can reach him at skoolboy2 (at) gmail (dot) com. An early Happy 4th to everyone!

Demographer Takes On New York City's Gifted and Talented Admissions

Andrew Beveridge, the New York Times' demographer, turns his attention to New York City's gifted program in this Gotham Gazette column. Based on his estimates, here's the bottom line on the change in gifted and talented admissions in NYC:

Non-Hispanic whites and Asians almost triple their percentage, while the percent non-Hispanic black and Hispanic plunges. In short, students accepted in the Gifted and Talented program are not all representative of the students in New York City, and are less so this year than last year.

June 28, 2008

Guest Blogger Sarah Reckhow: Easy to Blame

Sarah Reckhow taught at Frederick Douglass High School in Baltimore from 2002 to 2004 and was a Teach for America corps member. Currently, she is a Ph.D. candidate in political science at UC Berkeley. Her dissertation explores the role of national philanthropies and community organizers in urban education policymaking.

Liam Julian’s review of “Hard Times at Douglass High” boils down a complicated stew of frustration, hope, and absurdity to a singular and simplistic point—many of the teachers are “just plain bad at their jobs.” Julian does begin with a fair remark—this documentary is not a systematic assessment of No Child Left Behind. Nonetheless, the film offers a vivid portrait of common NCLB observations and enough contextual information to make Julian’s reductive reaction dubious.

NCLB is most present in the film as a looming threat with vague and rarely applied consequences, including state takeover. The filmmakers bring us in on test day—students listlessly staring at test booklets, falling asleep, staring off into space. Many students did not take the tests seriously, assuming that the tests had no consequences or feeling too indifferent to try. We also hear from faculty commenting that they are forced to find ways to accommodate failing seniors at the end of the year in order to artificially raise the graduation rate.

We meet a state observer walking the halls with the academic dean. The state observer rattles off the various actions that may be taken if Douglass does not improve. At the end of the film, we learn that the state board of education finally tried to take over Douglass during 2005-2006, but the move was blocked by the state legislature. An impending gubernatorial election between Baltimore Mayor O’Malley and Governor Ehrlich added a heavy dose of partisan politics to that debate. The film implies that Ms. Grant, the principal in the film, was removed due to the school’s low performance. In fact, she was removed due to a school athletics scandal. Nonetheless, the school was “restructured” by the district in 2006, and the administration was replaced. The NCLB accountability system, as practiced at urban schools like Douglass, tends to operate like a merry-go-round; principal turnover rates in Baltimore are very high. School leaders get on board, ride until they get dizzy and stumble off, and then new leaders come aboard.

The bulk of Julian’s column focuses on Douglass’ teachers and seems oddly divorced from policy considerations. Drawing on clips from the film, he offers armchair criticism of discipline and teaching methods, arguing that “the staff members at Douglass aren’t cutting it.” Even if this were true, Julian draws no clear policy lessons from his conclusion. It seems unlikely that Douglass hired only ineffective teachers from an otherwise talented pool of applicants.

Though there are great teachers at Douglass like Ms. Ray (she is featured in the film, but we never go in her classroom), it is also true that there are not enough. The film offers pieces to form an explanation: vacancies that go unfilled, long-term substitute teachers, and a shortage of experienced teachers. The film features a 9th grade English class; the teacher makes a difficult choice to resign midway through the year. Substitutes come in, and the class flounders. The school has also hired a number of Teach for America corps members; some continue to teach there, but many have not stayed beyond the two-year commitment, including me. All of these point to a clear problem of supply: Douglass cannot hire and keep enough good teachers to meet its needs. Teachers like Ms. Ray have heart and commitment that few of us can muster for even a few years, let alone decades.

The film does not provide new criticisms of NCLB, nor will it surprise anyone that the school struggles with teacher recruitment and retention. Viewers might be more startled by taking the longer view of Frederick Douglass High School: the school was founded in 1883 and has illustrious graduates including Thurgood Marshall; more than a century later, it is segregated, marginalized, and struggling.

Yet grumbling about the teachers who work in this difficult environment is not the answer. In fact, the film offers some illuminating scenes of teaching and learning at its best, only they don’t take place in a “typical” classroom setting. These include the school’s debate team, choir, band, and music production class. The students involved in these activities display precisely the attitudes we want schools to instill—pride, enthusiasm, and curiosity. Furthermore, the students are expected to perform well and rise to the occasion. Much of the commentary on this film has focused on Douglass at its worst, but much can be learned from Douglass at its best.

The opinions expressed in eduwonkette are strictly those of the author and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.
