Through the lens of social science, eduwonkette takes a serious, if sometimes irreverent, look at some of the most contentious education policy debates. (Find eduwonkette's complete archives prior to Jan. 6, 2008 here.)

January 22, 2009

Wish #3: Asking More "Why?" Questions

Earlier this month, a team of researchers at MIT and Harvard released a report contrasting the impact of charter schools, “pilot” schools, and traditional public schools on student achievement. The finding of positive charter school effects on achievement, based on a random assignment research design, fueled the rhetoric of charter school advocates, some of whom saw the findings as a license for unlimited expansion of charter schools.

The researchers themselves were more cautious. They acknowledged that the study was not designed to discern why the effects were found. In fact, if the study had found that students in charter schools had shown less growth in achievement than students in traditional public schools, they wouldn’t have known why either.

Good public policy depends on compelling answers to “why” questions about both the observed effects and non-effects of policies and programs. And these “why” questions pertain both to the inner workings of policies and programs and to the context in which the policies and programs are situated. Borrowing policies that have been found to be effective in one setting and expecting the same results in another setting makes sense only if we know why the policies were effective in that first setting. A research study showing that a policy or program “worked” in a particular setting doesn’t tell us that.

Our wish, then, is for asking “why?” more loudly, and earlier in the lifecycle of a policy or program. Why might achievement be higher in charter schools? Why do children learn more in smaller classes? Why are some teachers more successful in teaching low-achieving students than high-achieving students? Why don’t school expenditures have a stronger association with student outcomes? In skoolboy’s view, the real leverage in education policy comes from good answers to the “why?” questions. To paraphrase Jim March, research that addresses “why?” questions is more useful than research that addresses “what works?” questions because it has so many more applications.

One challenge posed by our wish is that the researchers who are skilled at addressing “what works?” questions are not necessarily the ones who are good at addressing “why?” questions. Even in large federal evaluations, there typically is a division of labor in which the study of implementation and context is segregated from the study of program impacts, and different research organizations or researchers are responsible for differing parts of the overall enterprise. Asking “why?” more often will require some hard thinking about research training and the infrastructure for education research in the U.S.

January 21, 2009

Wish #4: Better Alignment of Accountability Systems to School Outcomes

Here’s a little thought experiment: Suppose that, in addition to adequate yearly progress in literacy and mathematics, high schools had to demonstrate progress in students’ ethical behavior. Would the graduates of Far Rockaway High School in Queens in New York City be as proficient in their treatment of others as they are in math and literacy?

Victims of Bernard Madoff’s $50 billion Ponzi scheme might wish that Far Rockaway had spent as much time on the development of its students’ non-cognitive skills as on their ability to read, write and figure. Of course, we cannot tell what led Madoff astray, and his experience at Far Rockaway probably had little to do with it. But the thought experiment opens the door to a wish for accountability systems in education that are better aligned with the diverse school outcomes we think are important.

What skills do employers value in their workers? A 2008 survey of members of the Society for Human Resource Management found that human resources professionals considered some skills and practices more important for experienced workers in 2008 than two years before. More than a third of the respondents reported that adaptability/flexibility; critical thinking/problem-solving; leadership; professionalism/work ethic; teamwork/collaboration; and information technology application had increased in importance in the recent past.

The story is not that different for the general public. Asked to allocate a total of 100 points across eight goals of public education, a sample of adults divided them up relatively evenly: basic academic skills (19%); critical thinking (15%); social skills and work ethic (14%); physical health (12%); preparation for skilled work (11%); emotional health (11%); citizenship (10%); and the arts and literature (8%).

Why, if the public and employers think that these are the most important goals of public education, have we constructed accountability systems that focus on a narrow subset of these goals – basic proficiency in literacy and mathematics? Part of the answer is that we had an existing technology for measuring literacy and mathematics proficiency – standardized tests of academic performance.

Richard Rothstein, Rebecca Jacobsen, and Tamara Wilder, in Grading Education: Getting Accountability Right, argue that if these broad goals are important – and skoolboy thinks they are – then we should develop measures of these goals, and incorporate them into accountability systems. One of the things we’ve learned about education accountability systems that rely on rewards and punishments is that educators respond to incentives, doing what they can to avoid punishments and to achieve rewards associated with a particular pattern of outcomes. Particularly when the inducements are high-stakes, we are liable to get precisely the outcomes that are to be rewarded and punished – no more, and no less.

Literacy and mathematical proficiency are extremely important skills for schools to cultivate, and it’s appropriate that accountability systems monitor students’ literacy and math performance and provide incentives for educators to help students achieve challenging performance standards. But it’s also critically important for U.S. children and youth to prepare to assume the responsibilities of citizenship in a democracy that depends on a tacit social contract which binds us together, and we count on schools to do this and much more. Our wish is for accountability systems in education that are designed to measure and promote genuine growth and development in children and youth.

January 16, 2009

The State of "State of the City" Speeches

While we await President-Elect Obama's Inauguration speech, here's a look at the rhetoric in ten mayors' "State of the City" speeches over the past year. Can you match the mayor with the quote?

1. We're going to demonstrate how a school community comes together when you give teachers, parents and principals real authority to make decisions in the classroom. We’re going to show how the atmosphere transforms with uniforms and parent contracts, when you instill a culture of discipline and respect. We’re going to show what happens when we set tall goals and raise expectations, when we publish clear benchmarks and hold ourselves accountable. We are going show what is possible when we make our children believe they can do anything.

2. In our schools, we have decreased the achievement gap and increased learning.

3. And on this 80th anniversary of Dr. King's birth and on the eve of the inauguration of our first African-American president, we can all be proud that African American and Latino students are leading the way in the rate of improvement.

4. No one wants to see their school closed, and there was controversy, but the leaders of our schools held their ground because they knew the change would help our kids, especially those with the greatest learning gap. And a year later we are beginning to see the positive results.

5. As adults, we have a responsibility to create hope in the lives of our children.

6. We’re continuing to work with nonprofits like the Bill and Melinda Gates Foundation to double the number of Class of 2010 students graduating from college -- and triple the number for the Class of 2013.

7. Our graduation rate remains dangerously low -- and while they talk about a world class education system, our Legislature slashes nearly a billion dollars in funding for our children. We refuse to accept the growing technology gap between children who will compete in the global economy, and those who, not by their choosing, will watch the world pass them by.

8. In 2007, 55 percent of seniors graduated -- the highest percentage since 1995. This year, we are working towards a goal of 60 percent. Each year, we will be working to increase graduation rates.

9. Anyone who believes in libraries also knows their importance to a major city -- not as monuments to civic pride, but as doors to education and opportunity.

10. All over our City we are seeing educational excellence in public and private schools -- beacons of light, illuminating the way forward.

a. R.T. Rybak, Minneapolis
b. Jerry Sanders, San Diego
c. Thomas Menino, Boston
d. Tom Barrett, Milwaukee
e. Manuel Diaz, Miami
f. Antonio Villaraigosa, Los Angeles
g. Adrian Fenty, Washington, D.C.
h. Cory Booker, Newark
i. Frank Jackson, Cleveland
j. Michael Bloomberg, New York


(Answers in the comments at the end of the day.)

January 13, 2009

Lies, Damned Lies, and Bush Administration Accomplishments

Yesterday, President George W. Bush, as part of his swan song, released a compendium entitled “Policies of the Bush Administration 2001-2009.” Not surprisingly, No Child Left Behind is the centerpiece of the administration’s accomplishments in K-12 education, and the fact sheets detail the administration’s claims about progress.

Skoolboy’s favorite section is the one on Reading First. You remember Reading First, don't you? The program whose interim impact evaluation, sponsored by the Institute of Education Sciences, found no evidence of effects on reading comprehension test scores in grades 1 to 3? The text reads:

NCLB established the principle that Federal funding should be invested in programs that have rigorous research demonstrating their effectiveness. Reading First has provided more than $6 billion to fund scientifically-based instructional programs, valid and reliable diagnostic assessments, and professional development for teachers. State data shows that Reading First students from nearly every grade and subgroup have made impressive gains in reading proficiency. For first grade, 44 of 50 States reported increases in the percentage of students proficient in reading comprehension; for second grade, 39 of 52 States reported improvement; and for third grade, 27 of 35 States reported improvement.

“52 States”? Maybe we should have invested a bit more in Math First.

Okay, cheap shot: there are 54 state education agencies (SEAs) that received funds under Reading First, including American Samoa, the Bureau of Indian Education, the District of Columbia, and the Virgin Islands.

But seriously: How did “44 of 50 States” report increases in the percentage of first grade students proficient in reading comprehension when, according to the American Institutes for Research compilation of Reading First Annual Performance Reports from 2003-2007, only 40 of the 54 SEAs even reported reading comprehension proficiency for first grade students for two or more years?

January 8, 2009

The Skillful Publicist

When a school district makes a big to-do about the use of "evidence to make decisions about how to help students learn, where to put our resources and how to manage our staff," is it fair to criticize it for implementing unproven and experimental programs? skoolboy supports modest experimental innovations, as long as they are evaluated carefully before expansion to a scale that would encompass an entire population. After all, students and teachers aren’t guinea pigs. The fact that schools are failing is not a justification to do any old thing, on the assumption that any innovation will be better than the status quo.

Speaking of any old thing … The Washington Post reported earlier this week that the Washington, DC Public Schools are abandoning support for National Board certification as a means of teacher professional development, shifting instead to, among other things, the Skillful Teacher program marketed by Research for Better Teaching, Inc. (RBT), founded by Jon Saphier in 1979. The program consists of a series of six one-day workshops; you can buy the book, which approaches 600 pages, for $70 at Amazon.

You might think that an organization that’s been peddling professional development for 30 years, with a book in its sixth edition, would have some compelling evidence of the effects of the jewel in its crown on teaching practice and student learning. If professional development doesn’t result in improvements in teaching and learning, what’s the point? But the RBT website doesn’t point to much evidence, emphasizing testimonials and brief "stories." skoolboy’s favorite is the account of Fairfax County, VA’s implementation, "Making Teacher Evaluation Substantive and Growth-Oriented." "In the first year of implementation, 162 teachers were dismissed or resigned compared to single digits the previous year," the website crows. Now, is that growth, or is it development? Sometimes skoolboy gets confused by the difference.

The blurb for Montgomery County, MD, on which the DC plan is based, touts an independent evaluation by Dr. Julia Koppich of the program’s effects on teachers and administrators, and claims that "in 2001 grade 2 students scored in the 68th percentile in math computation. In 2003 scores were in the 83rd percentile." The inference is that the Skillful Teacher program produced this change, but any reader of this blog knows that demonstrating program impact requires a careful design to rule out alternative explanations of changes over time in outcomes. (It also helps to have a good theory of how a program might plausibly produce particular changes.)

Montgomery County’s own internal evaluations of Studying Skillful Teaching aren’t as positive. Although 3rd grade teachers and Algebra I teachers who took the course are "more likely to teach mastery lessons and less likely to miss opportunities to positively impact student learning" than comparable teachers who did not (Merchlinsky, 2006, 2007), there were no effects on elementary reading and math test scores or algebra performance.

And lest anyone think that people who live in glass houses shouldn’t throw stones, The Skillful Teacher isn’t the only professional development initiative that skoolboy, who teaches at the home of the Teachers College Reading and Writing Project, thinks could benefit from more rigorous evaluations before scaling up.


January 6, 2009

LDH, IES and the Reign of Frogs

Okay, barring the bad karma that seems to hang over the state of Illinois, Arne Duncan is now firmly ensconced as President Barack Obama’s nominee as Secretary of Education, thereby forestalling the Apocalypse predicted by the detractors of Linda Darling-Hammond. But eternal vigilance is the price of freedom, or something like that, and the fears have shifted to the future of the Institute of Education Sciences (IES), the federal government’s arm for education research and evaluation. Founding Director Grover “Russ” Whitehurst has moved on from his six-year term, and there are researchers lying awake at night in fear that President Obama might choose LDH, a one-time colleague of skoolboy at Teachers College, Columbia University, as his successor. I’m not entirely sure what they are afraid of, but clearly the dismantling of the research infrastructure built up over the past six years is near the top of the list.

If Linda Darling-Hammond is to serve in the Obama administration, skoolboy is not sure that the post of Director of IES would be the best use of her talents. But even if she were to be appointed to this post, I don’t think it would start raining frogs. Checks and balances on the actions of the IES Director abound, including the National Board for Education Sciences, which has the responsibility of approving the research priorities proposed by IES. Members of the National Board for Education Sciences are nominated by the President and confirmed by the Senate. The current chair of this board is Eric Hanushek, Senior Fellow at the Hoover Institution at Stanford University, and the vice-chair is Jon Baron of the Coalition for Evidence-Based Policy. And let’s not understate the difficulty of rapid change in a complex bureaucracy where most of the work is done by career civil servants, not by political appointees. (At least, that was my experience many years ago in a prior incarnation of IES. Working under Checker Finn, by the by.)

But the real problem is not with the organization of IES, which has made great strides in setting out criteria for fundable research, and implementing a rigorous peer review system that pushes ideological predispositions to the sidelines. (I serve on one of the IES standing review panels, along with a couple of hundred other researchers.) Rather, the problem is the failure of the education research community to develop a stockpile of effective educational interventions shown to work in multiple contexts. A quick overview of the kind of research that IES has been funding over the past four years tells the tale.

IES has adopted a progressive strategy, modeled on clinical trials in medical research, for funding education research to improve academic outcomes. Identification studies are intended to use existing data to identify existing programs and practices that are associated with better academic outcomes. A successful identification study will lead to a development project, in which a new education intervention (e.g., a new curriculum, a new instructional approach or program) is developed, and some preliminary data on effectiveness are gathered. Promising interventions are then evaluated in local settings using rigorous experimental and quasi-experimental methods designed to discern the efficacy of the intervention in those settings. Finally, those interventions demonstrating practically and statistically significant impacts on participants are “scaled up” – implemented more broadly in multiple settings, with multiple groups of participants, and without the direct involvement of the intervention developers in the replication sites.

Of the 275 regular education and special education research grants awarded by IES in research competitions between 2004 and 2008, only 7 – less than 3% – were scale-up projects, and an additional 26% were efficacy projects. The vast majority were development and identification projects that lacked prior evidence of program impact. Such projects are the raw material that might eventually lead to interventions that work at scale; but many will not, and even for the promising interventions, it may be many years before we are confident that they work as intended. That will be true no matter who inherits the IES Directorship.
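For readers who want to check the arithmetic, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (275 grants, 7 scale-up projects, 26% efficacy projects). It also assumes, as the paragraph implies but does not state outright, that every remaining grant was a development or identification project. The variable names are illustrative only.

```python
# Figures quoted in the post; the partition into three categories is an assumption.
total_grants = 275
scale_up_grants = 7
efficacy_share = 0.26  # "an additional 26%" of the 275 grants

# Share of all grants that were scale-up projects.
scale_up_share = scale_up_grants / total_grants

# Whatever is left is assumed to be development and identification work.
early_stage_share = 1.0 - scale_up_share - efficacy_share

print(f"scale-up share: {scale_up_share:.1%}")
print(f"efficacy share: {efficacy_share:.0%}")
print(f"development + identification share: {early_stage_share:.1%}")
```

On these numbers the scale-up share comes to about 2.5%, consistent with "less than 3%," and development plus identification projects account for a bit more than 70% of the grants.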

December 24, 2008

Survivor: The TFA Edition, II

Yesterday, I wrote about Morgaen Donaldson’s research on the survival rates of three cohorts of Teach for America teachers in their initial placement schools and in teaching overall. Today, I’ll describe one of her analyses of why TFA teachers leave their schools, focusing on the complexity of the teaching assignment and the corps member’s academic preparation for the subject(s) that she or he taught.

For this analysis, a complex teaching assignment for an elementary school teacher is one in which the teacher teaches more than one grade in a given year. Similarly, a complex teaching assignment for a secondary teacher is one in which the teacher was assigned to teach more than one subject in a given year. Many TFA recruits had complex teaching assignments during the years of observation. Between 16% and 20% of elementary TFA teachers were assigned to teach more than one grade in a given year, and 35% to 50% of secondary TFA teachers were assigned to teach more than one subject in a given year. (Note that this is different from teaching one grade in 2003 and a different grade in 2004, or one subject in 2004 and a different subject in 2005. These too might make teaching more complicated, but that complexity arises across years rather than within them.)

In the 2000, 2001 and 2002 TFA cohorts, the vast majority of corps members majored in the social sciences or humanities in college. 52% were social science majors, 20% were English majors, 3% majored in the arts, and 6% majored in a foreign language. In contrast, about 5% were math majors, and 15% majored in science, computer science or engineering. Just 2% were education majors, and 4% majored in other subjects. (I think the numbers can exceed 100% due to double majors, but the text isn’t entirely clear on this.)
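A quick tally of the shares quoted above shows how far past 100% they run. This is a minimal Python sketch; the dictionary keys are shorthand labels for this example, not categories taken from Donaldson's text.

```python
# Percentages of corps members by college major, as quoted in the post.
major_shares = {
    "social science": 52,
    "English": 20,
    "arts": 3,
    "foreign language": 6,
    "math": 5,
    "science / computer science / engineering": 15,
    "education": 2,
    "other": 4,
}

total = sum(major_shares.values())
excess = total - 100  # points above 100%

print(f"sum of shares: {total}%")
print(f"excess over 100%: {excess} points")
```

The shares add up to 107%. The 7-point excess fits the double-major explanation, but it could also partly reflect rounding, so the tally alone does not settle the question.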

But their teaching assignments often differed dramatically from their formal academic preparation. One-half to three-quarters of the TFA recruits teaching secondary math were not math majors, and 38% to 50% of science teachers lacked a science major. Even in social studies, 16% to 31% of the TFA teachers were teaching out of their major field. Donaldson reports that out-of-field teaching diminished the longer a TFA teacher stayed in teaching.

Among elementary TFA teachers, a multiple-grade assignment increased the odds of leaving the initial school during or at the end of the first year of teaching by a factor of 3.29 (a probability of 19.1% for multiple-grade teachers, and 6.7% for single-grade teachers). For the most part, this type of complexity did not influence retention in subsequent years, with the singular finding that in year 4, multi-grade teachers were significantly less likely to leave their initial schools than single-grade teachers. Many of the multi-grade teachers leaving their initial placement school in the first year transferred to another school, but multi-grade teachers also were more likely to leave teaching altogether in the first year than single-grade teachers.
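A factor of 3.29 in the odds is not the same as a 3.29-fold difference in probability, and the two are easy to confuse. The Python sketch below recomputes both from the two probabilities quoted above. The function names are made up for this example. The inputs are the rounded probabilities from the post, so this is a consistency check and not a reproduction of Donaldson's fitted model.

```python
def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Ratio of the odds of an event in group A to the odds in group B."""
    return odds(p_a) / odds(p_b)


# First-year probabilities of leaving the initial school, as quoted in the post.
p_multi_grade = 0.191
p_single_grade = 0.067

ratio_of_odds = odds_ratio(p_multi_grade, p_single_grade)
ratio_of_probabilities = p_multi_grade / p_single_grade

print(f"odds ratio: {ratio_of_odds:.2f}")
print(f"ratio of probabilities: {ratio_of_probabilities:.2f}")
```

The odds ratio rounds to 3.29, matching the reported factor. The ratio of the probabilities themselves is smaller, about 2.85, so multi-grade teachers were a little under three times as likely to leave.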

At the secondary level, TFA recruits teaching multiple subjects were more likely than single-subject teachers to leave their initial placement schools and the field of teaching altogether in their first year of teaching. Beyond this first year, however, there were no significant differences in the likelihood of leaving the initial placement school, but multiple-subject teachers had a greater chance of leaving teaching altogether.

The out-of-field teaching story is complicated, with TFA math teachers teaching out of field more likely to leave their initial placement schools and the occupation of teaching altogether, and social studies teachers teaching out of field more likely to leave teaching. Oddly, science teachers lacking a science major were less likely to leave teaching than science teachers with a science major.

These patterns suggest that, at least in the years 2000, 2001 and 2002, TFA teachers often faced very complex teaching assignments for which they were not well-prepared academically, and the complexity of these assignments heightened the risk of leaving the initial placement school or of leaving teaching altogether. As I noted yesterday, there’s no comparison group, so we don’t know if novice teachers in these schools arriving via the traditional route had similarly complex assignments. Nor do we know if this pattern holds for more recent cohorts of TFA recruits, as there have been six cohorts since the three that Morgaen Donaldson studied.

One thing seems clear, however. If we want novice teachers to stay in their initial schools and to stay in teaching, they need adequate support as they learn their craft in the first years of teaching. Asking teachers to teach multiple grades, multiple subjects and/or subjects out of their college major fields is a peculiar way of supporting them.

A program note: We're going to take a break here for the next 10 days or so. eduwonkette and I wish you happy holidays!

December 23, 2008

Survivor: The TFA Edition

skoolboy remains fascinated by the way in which Teach for America, a program serving perhaps 3% of the students in the districts in which it operates, can seem like the tail wagging the dog. Like eduwonkette, I see many virtues to the program, but do not view it as a solution to the nation's challenge of developing a corps of skilled career teachers to serve our children and youth.

TFA recruits make a two-year commitment to teaching in a high-needs school, and the limited nature of this commitment is a recurring source of concern. If TFA recruits stay just two years and then leave, then the schools they serve face a revolving door of teachers shuffling in and out. TFA, for its part, cites recent evidence that TFA recruits are at least as effective in the classroom as other novice teachers. Moreover, TFA champions the enduring value of having its recruits see the challenges facing high-needs schools, if only for a few years, and claims that many recruits stay in the field of education beyond the two-year commitment.

There’s some new evidence on this latter point, emerging in the doctoral dissertation research of Morgaen Donaldson, formerly with Harvard’s Project on the Next Generation of Teachers, and now an Assistant Professor in the School of Education at the University of Connecticut. Donaldson surveyed the 2000, 2001 and 2002 cohorts of TFA recruits, obtaining 2029 responses, for a 62% response rate. Focusing on voluntary departures (approximately 16% in the sample were involuntary), she modeled the likelihood of staying in the initial placement school over time, as well as the likelihood of transferring to another school or leaving teaching altogether.

The charts below are from fitted hazard models that describe the cumulative probability of "survival" in the initial placement school across years, as well as the probability of voluntarily resigning from teaching for the first time. The first chart shows that about 90% of TFA recruits (voluntarily) remain in the initial placement school for a second year, and about 44% stay for a third year. These figures decline steadily over time, with about 22% staying in the initial placement school for a fourth year, 15% for a fifth year, and 9% for a sixth year.

TFA-initial.JPG

The probability of voluntarily staying in the teaching profession over time is higher than the likelihood of staying in the initial placement school, since some TFA recruits, like teachers in general, transfer to other schools. The fitted models suggest that about 94% of TFA recruits remain in teaching for a second year, and 60% teach for a third year. 44% remain in teaching for a fourth year, 35% for a fifth year, and 29% for a sixth year.

TFA-total.JPG
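The cumulative figures above can be converted into year-to-year exit rates: of the recruits still present at the start of a given year, what share does not return for the next? The Python sketch below does this with the approximate percentages quoted in this post. The function name is hypothetical, the inputs are rounded, and the results are rough illustrations, not the hazard estimates Donaldson reports.

```python
def conditional_exit_rates(survival):
    """Given cumulative survival shares for consecutive years (starting at 1.0),
    return the share of those present in each year who are gone by the next."""
    rates = []
    for previous, current in zip(survival, survival[1:]):
        rates.append(1.0 - current / previous)
    return rates


# Approximate cumulative shares for years 1 through 6, as quoted in the post.
in_initial_school = [1.00, 0.90, 0.44, 0.22, 0.15, 0.09]
in_teaching = [1.00, 0.94, 0.60, 0.44, 0.35, 0.29]

school_exits = conditional_exit_rates(in_initial_school)
teaching_exits = conditional_exit_rates(in_teaching)

for year, (s, t) in enumerate(zip(school_exits, teaching_exits), start=1):
    print(f"after year {year}: {s:.0%} leave the school, {t:.0%} leave teaching")
```

On these rounded inputs, the largest drop comes after the second year, when the two-year commitment ends: roughly half of those still in their initial school leave it, and a bit over a third of those still teaching leave the profession.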

It’s difficult to know whether these rates of persisting in the initial school placement or in teaching at large are high or low. As usual, the question is, compared to what? TFA recruits are placed in schools that are claimed to be "hard to staff," and they may be challenging places to work, regardless of the route that brought the teachers to such schools. If the attrition rates for other novice teachers in these schools are just as high as those observed for TFA recruits, it’s harder to argue that TFA is exacerbating the problem of building a stable, high-quality teaching force in high-needs schools. Donaldson’s study doesn’t shed any light on this issue.

I’ll have a bit more to say about Morgaen Donaldson’s research on how working conditions affect the persistence of TFA recruits in their initial schools tomorrow.

December 22, 2008

Slow News Day

skoolboy still has nothing substantive to say about Arne Duncan. But he's pleased to note that Duncan's a member of the tribe: his B.A. from Harvard is in sociology. Duncan took a year off from school to write a senior thesis on life in Kenwood, the south side Chicago neighborhood in which his mother Sue had founded an after-school program in 1961. Duncan's 123-page thesis, entitled "The values, aspirations and opportunities of the urban underclass," was read and praised by William Julius Wilson, among the most eminent urban sociologists of our time.

Duncan's appointment will vault him into the list of prominent Americans who majored in sociology. It's not a long list! Ronald Reagan double-majored in economics and sociology, and Rev. Martin Luther King, Jr. was a sociology major. So too Rev. Jesse Jackson, novelist Saul Bellow, and a number of other prominent civil rights leaders and members of Congress. And Michelle Obama majored in sociology at Princeton.

But before too long, we're into B-list celebrities and leaders: Dr. Ruth, Regis Philbin, a smattering of NFL and NBA stars, Robin Williams, and a couple of Canadians: Dan Aykroyd and Late Show bandleader Paul Shaffer.

Oh well. At least there aren't any well-known crooked sociologists. Rod Blagojevich was a history major at Northwestern, and as for Bernie Madoff? Beats me. Nobody goes into sociology for the money.

December 17, 2008

NYC's Trojan Horse

skoolboy has absolutely nothing of substance to say about Education Secretary nominee Arne Duncan, whom he has met exactly once. But he continues to mouth off about New York City's Teacher Data Reports, the NYC Department of Education's version of value-added assessment. Which are not to be used to evaluate teacher performance. But rather for instructional improvement. Excuse me, skoolboy has something in his eye.

It's hard not to view these Teacher Data Reports as a Trojan Horse. Just how is a tool that is designed for capacity-sorting supposed to function for capacity-building? After all, a teacher value-added measure might tell us something useful about which teachers are more or less successful in raising their students' test scores, but it tells us nothing about the specific instructional practices that account for their relative success.

How are Teacher Data Reports supposed to improve instruction? In her videotaped comments to teachers, Amy McIntosh, the Chief Talent Officer at NYC's Department of Education, says, "These reports will provide information that will help teachers and school leaders gain insights about important aspects of a teacher's practice ... Whether individual teachers have a greater influence on the learning of some groups of students than on others ... Finally, we can see what teachers might benefit from development focused on, say, the needs of English language learners, and which teachers might be best positioned to lead that kind of professional development ... We also think they will ... help you think about how you can share the techniques you use with your colleagues in your school or across the city."

Hmm. So the specific strategies for improving teaching practice are what, exactly? Having more successful teachers lead the professional development of less successful teachers? Expert practitioners don't always make expert coaches. Hall-of-Fame pro basketball player Isiah Thomas--unquestioned as one of the best point guards of all time--was a mediocre coach for the Indiana Pacers and New York Knicks.

Here's why. Teaching is an extraordinarily complex activity, with teachers making thousands of decisions in the course of their work. Successful teachers make many good decisions and some bad decisions, whereas less successful teachers make many bad decisions and some good decisions. But the capacity to reflect on one's practice and figure out which of those decisions are good and which are bad is exceedingly rare, as is the capacity to share this knowledge with others. In the absence of this reflective capacity, we're all prone to attribute our successes and failures to our pet theories, which may or may not be correct. A Teacher Data Report that provides reassurance that a teacher is successful will only solidify and reinforce a personal folk theory about the reasons for that success.

Yet the Teacher Data Report provides no evidence whatsoever about why a teacher is successful--the many daily practices that promote student learning. And if a teacher's personal theory is inaccurate, then sharing it with others will improve neither instruction nor student achievement. It could even make things worse, focusing attention on ineffective practices. A tool like the Teacher Data Report that claims to be useful for increasing teachers' capacity to teach students effectively, but instead is only useful for ranking teachers on their effectiveness, is a modern-day Trojan Horse.

December 15, 2008

Don't Think about Elephants

elephant-klein.jpg

"Don’t think about elephants," skoolboy’s father used to joke, long before George Lakoff’s manifesto with a similar name. The joke, of course, is that by trying not to think about elephants, all that you can think about is elephants. The harder I tried not to think about elephants, the more I thought about them.

The New York City Department of Education has its own variation. This month, the DOE is sending Teacher Data Reports, which purport to estimate the effect of individual teachers in grades 4-8 on students’ test scores, to school principals, who will then distribute the reports to their teachers after the principals have been trained. "The Teacher Data Reports are not to be used for evaluation purposes," wrote Chancellor Joel Klein and UFT President Randi Weingarten in an October letter to teachers. "That is, they won’t be used in tenure determinations or the annual rating process. Administrators will be specifically directed accordingly." Similarly, the Frequently Asked Questions section of the DOE’s Teacher Data Tool Kit website poses the question "How can you be sure that principals won’t use the Teacher Data Reports to evaluate teachers?" The response: "Principals have been and will continue to be explicitly instructed not to use Teacher Data Reports to evaluate their teachers. The DOE has standard processes in schools for teachers to raise issues or concerns."

And yet. From the Frequently Asked Questions on the DOE’s Teacher Data Toolkit website: "By isolating individual teachers’ contributions to student progress, the Teacher Data Reports provide valuable information to school leaders and teachers about where to focus instructional improvement efforts. …Teacher Data Reports provide information about how individual teachers’ efforts influence student learning … A sophisticated multivariate regression analysis based on NYC data from 1999-2008 determined how much to weigh each factor [to calculate students’ predicted gains] … A panel of technical experts has approved the DOE’s value-added methodology. The DOE’s model has met recognized standards for demonstrating validity and reliability. Teachers’ value-added scores from the model are positively correlated with both School Progress Report scores and principals’ perceptions of teachers’ effectiveness, as measured by a research study conducted during the pilot of this initiative."

In other words: The Teacher Data Reports rely on sophisticated statistical techniques that are valid, reliable and approved by experts, and they isolate an individual teacher’s contributions to student learning. But, you principals who are under tremendous pressure to increase test scores or face losing your jobs, don’t you dare think about using these Teacher Data Reports to evaluate teachers.

Don’t think about elephants.

December 5, 2008

Early Warning Systems for School Dropouts

siren.jpg

The recent flurry of attention to high school completion rates has revived interest in early warning systems designed to identify students at risk of dropping out of high school. The idea behind these early warning systems is that, through the analysis of administrative data, schools and school districts can develop models of risk factors which predict a high probability of dropping out of high school. If the models successfully distinguish probable dropouts from probable graduates, students at high risk of dropping out can be identified, and support resources can be focused on them.

A good early warning system will have high sensitivity and high specificity. High sensitivity means that the early warning indicators will identify a very high percentage of those youth who will eventually drop out (i.e., a high percentage of "true positives"). High specificity means that the indicators will not identify many youth who are not destined to drop out (i.e., a low percentage of "false positives"). Phil Gleason and Mark Dynarski of Mathematica Policy Research showed in the federally-funded School Dropout Demonstration Assistance Program evaluation that most dropout prevention programs had disappointingly low sensitivity and specificity: they failed to serve youth who would eventually drop out, and they frequently served youth who would likely have graduated in the absence of the program.
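To make the two terms concrete, here is a minimal sketch in Python. The cohort counts are entirely hypothetical, invented for illustration; they are not drawn from the Gleason and Dynarski evaluation or from any real district.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of eventual dropouts that the indicator flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of eventual graduates that the indicator left unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical cohort of 1,000 students, 200 of whom eventually drop out.
flagged_dropouts = 120      # true positives
missed_dropouts = 80        # false negatives
flagged_graduates = 160     # false positives
unflagged_graduates = 640   # true negatives

sens = sensitivity(flagged_dropouts, missed_dropouts)
spec = specificity(unflagged_graduates, flagged_graduates)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this made-up cohort the indicator flags 280 students, but only 120 of them are eventual dropouts, so more than half of the support resources would go to students who would have graduated anyway.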

Early warning indicators have been developed in Chicago, by Elaine Allensworth and John Easton, and in Philadelphia, by Robert Balfanz and Ruth Curran Neild, as well as in other cities. The Chicago indicator is an indicator of being "on-track" for high school graduation; a student is "on track" if he or she earns at least five full-year course credits and no more than one F in one semester in a core course during the first full year of high school. The Philadelphia measure relies on sixth-grade measures of academic performance and behavior. A student with at least one of the following four characteristics had at least a 75% chance of dropping out of high school: (a) a final grade of F in math; (b) a final grade of F in English; (c) attendance below 80% for the year; and (d) a final behavior mark of "unsatisfactory" in at least one class.

It’s not exactly rocket science to show that students who fail courses and have low attendance have an elevated risk of dropping out of high school, but the architects of these systems argue that the specific indicators that students manifest warrant different responses. Low attendance may stem from a different set of sources than poor behavior, for example, and a key feature of these indicator systems is that they frequently rely on administrative representations of students’ behavior in the school and classroom (especially the incidence of failing a core course), rather than more distal status measures that are less amenable to a programmatic response. Finding that low-SES youth are more likely to drop out, for example, would not give a school or district much to work with.

One issue to consider is the way in which early warning indicators are used in medicine. They’ve become controversial in instances in which the indicators don’t prescribe a reliably successful course of treatment. In the absence of an effective treatment plan, critics argue, indicators of the heightened risk of conditions such as prostate cancer or breast cancer may simply upset patients and not improve outcomes. In contrast, cholesterol tests are much more valuable as early warning indicators for heart disease because the use of statins to reduce cholesterol levels is recognized as an effective treatment that improves cardiovascular outcomes.

The question we might ask about dropout prevention is: If we knew that particular students had an elevated risk of dropping out of high school, what would we do differently? The problem here is that we do not have a dropout prevention wonder drug that has been shown to be reliable in lowering dropout rates in multiple contexts. The history of dropout prevention research is littered with poorly-designed, small-scale research studies that have failed to identify a set of program elements that consistently work. Moreover, the best-designed of such studies have found only modest program effects on the probability of dropping out.

None of this is to say that local efforts to reduce dropping out are ineffective. Many talented and motivated people lead and staff such programs, and they may in fact reduce the risk of dropping out for some groups of youth. The problem is that we don’t really know if they work or not. And in the absence of such knowledge, skoolboy is just not sure that early warning systems to identify potential dropouts are all that useful.

December 1, 2008

Micromanaging the Micromanager

blackberry-8820-smartphone-att.jpg

DC Schools Chancellor Michelle Rhee is on the cover of this week's Time magazine. The accompanying article features a striking statistic: according to her office, she answered 95,000 e-mails last year. Allow skoolboy to speculate about this figure.

Let's suppose that Chancellor Rhee responds to e-mail seven days a week, and that she worked 50 weeks last year. (skoolboy would hope that she worked less, because that's a grueling pace.) 95,000/350 is about 270 e-mails per day to which she responded. Suppose further that it takes one minute to read and respond to an e-mail. (Some will take more; few, I imagine, could take less.) That's a minimum of 270 minutes per day, or 4 1/2 hours per day of e-mail. Every day. Seven days a week, 50 weeks a year.
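The back-of-the-envelope arithmetic can be checked in a few lines of Python. The one-minute-per-e-mail figure and the 350-day work year are the speculative assumptions stated above, not reported facts.

```python
emails_answered = 95_000        # figure attributed to Rhee's office
days_worked = 7 * 50            # assumption: seven days a week, 50 weeks
minutes_per_email = 1           # assumption: one minute to read and respond

emails_per_day = emails_answered / days_worked
hours_per_day = emails_per_day * minutes_per_email / 60

print(f"{emails_per_day:.0f} e-mails per day, {hours_per_day:.1f} hours per day")
```

Doubling the time per e-mail to two minutes would double the daily total, to roughly nine hours.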

Amanda Ripley, the author of the article, describes spending a day with the Chancellor in August as she made unscheduled visits to DC public schools:

She emerged from her chauffeured black SUV with two BlackBerrys and a cell phone and began walking--fast--toward the front door of the first school... When we got inside, she walked into the first classroom she could find and stood to the side, frowning like a specter. When a teacher stopped lecturing to greet her, she motioned for the teacher to continue. Rhee smiled only when students smiled at her first. Within two minutes, she had seen enough, and she stalked out to the next classroom.

Later, Ripley writes, "She reads her BlackBerry when people talk to her. I have seen her walk out of small meetings held for her benefit without a word of explanation. She says things most superintendents would not. 'The thing that kills me about education is that it's so touchy-feely,' she tells me one afternoon in her office."

skoolboy finds all of this fascinating, and appalling. He's seen parallels in New York, with everyone from the Chancellor on down furiously thumbing their BlackBerries in meetings with real, live people who are trying to talk to them about issues they care about. Has technology fundamentally transformed the nature of leadership in educational organizations, reducing the need for sustained engagement with interested stakeholders around social, cultural and political issues? Can a big-city school superintendent really manage by e-mail?

There's always a danger of overinterpreting a journalistic account, and more data on the linkage between technology and theories of school leadership would provide valuable context. In the meantime, when it comes to Chancellor Rhee and her peers' preference for BlackBerries to people, maybe the medium is the message.

November 12, 2008

School Progress Grade Effects on NYC Achievement: Tame, Fierce, or a Hot Mess?

winters_photo.jpg

skoolboy ventured into the rarified air of NYC’s Harvard Club yesterday to hear Marcus Winters present his new Manhattan Institute research on the effects of the 2006-07 New York City School Progress Reports on students’ 2008 performance on state math and English tests in grades four through eight. The analysis uses a regression-discontinuity design, capitalizing on the fact that schools received a continuous total score summarizing their performance on school environment (15%), student performance (30%) and student growth (55%), but there are firm cut-offs that distinguish schools receiving an F from those receiving a D, those receiving a D from those receiving a C, etc. This means that there might be schools that are very similar in their total scores, and presumably on other school characteristics, on either side of a given cut-off, allowing researchers to study the test-score consequences of obtaining a specific letter grade.
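The logic of comparing schools on either side of a cut-off can be sketched in Python. This is a deliberately naive illustration, not Winters's actual model: the component weights come from the description above, but the cut-off values, the bandwidth, and the school records are all hypothetical, and a real regression-discontinuity analysis would fit regressions on each side of the cut-off rather than compare raw means.

```python
# Component weights as described in the post.
WEIGHTS = {"environment": 0.15, "performance": 0.30, "growth": 0.55}

# Hypothetical grade floors on a 0-100 scale; the real cut-offs are not given here.
CUTOFFS = [("A", 65.0), ("B", 50.0), ("C", 40.0), ("D", 30.0)]


def total_score(components):
    """Weighted sum of the three component scores."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a continuous total score to a letter using the hypothetical floors."""
    for grade, floor in CUTOFFS:
        if score >= floor:
            return grade
    return "F"


def near_cutoff(schools, cutoff, bandwidth):
    """Split schools within `bandwidth` points of `cutoff` into (below, above)."""
    below = [s for s in schools if cutoff - bandwidth <= s["score"] < cutoff]
    above = [s for s in schools if cutoff <= s["score"] < cutoff + bandwidth]
    return below, above


def mean(values):
    return sum(values) / len(values)


# Made-up schools: total score and the following year's test-score gain.
schools = [
    {"score": 28.5, "gain": 3.1},
    {"score": 29.4, "gain": 2.7},
    {"score": 30.2, "gain": 1.9},
    {"score": 31.0, "gain": 2.2},
    {"score": 45.0, "gain": 2.0},  # far from the D/F cut-off, so excluded
]

below, above = near_cutoff(schools, cutoff=30.0, bandwidth=2.0)
gap = mean([s["gain"] for s in below]) - mean([s["gain"] for s in above])
print(f"F-side minus D-side mean gain near the cut-off: {gap:.2f}")
```

The point of the design is that a school scoring 29.4 and one scoring 30.2 are nearly identical in measured performance, yet one is labeled an F and the other a D, so a difference in later outcomes can more plausibly be attributed to the label itself.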

The two tables below summarize the impact of the Progress Report grades on student math and English proficiency, respectively. Both tables contrast the consequences of getting an A, B, D or F with a reference category, a C grade. A green up-arrow indicates that students in a school that received a particular Progress Report Grade did better than students in C schools, whereas a red down-arrow indicates that students did worse than students in C schools. An X indicates that student performance did not differ significantly from that of students in C schools at the p<.05 level.

Winters-Math.jpg


Winters-ELA.jpg

There’s a lot of X’s. In math, students in F schools did better than students in schools receiving higher grades, although this seems to be primarily due to an effect in grade 5. Students in D schools also did better than those in schools receiving higher grades, also due to their advantages in grade 5, apparently. In English, the letter grade a school received did not have any consequences for student performance.

Although both Winters and discussant Jonah Rockoff were careful to note limits both to the analyses and what they can tell us about the incentive effects of accountability systems, both characterized the results as pretty clear evidence that schools reacted to receiving an F or a D in ways that boosted student achievement. This was particularly noteworthy, they argued, because so little time had elapsed between when a school learned that it had received a D or F and when students were tested: January for English, and March for mathematics.

Well, yeah, the short time between receiving the grade and the testing is certainly an issue, and surfaced as the likely explanation for why no effects of the School Progress Report grades were found in English. But skoolboy is still worried about math. There were no statistically reliable consequences for getting a D or an F in grades 4, 6, 7 and 8; only in grade 5 is there a test-score boost. How are we to make sense of this? If the letter grades are such a powerful incentive, wouldn’t they affect the performance of students in all of the grades in a school, not just fifth-graders?

Cool person Amy Ellen Schwartz posed a very smart question from the audience. "What about those A and B schools doing worse than the C schools in 5th grade math? What does that mean?" she asked. The panelists didn’t want to address that head-on, in skoolboy’s view, but he will: Looking at 5th grade mathematics, there’s as much evidence of the receipt of an A or a B causing a school to coast as there is evidence of the receipt of a D or an F causing a school to be more productive. Probably not a popular interpretation among the true believers in the power of incentives in the room.

But the bigger story is one of what Winters called "tame" effects. No effects of the School Progress Report grades in English, and limited evidence of effects in Math. A short time-horizon between the “treatment” of receiving the grades and student testing. Ambiguous incentives, both positive and negative, associated with the grades. A very weak theory of how the grades would be expected to increase student performance. It’s a wonder that Winters found anything at all.

A last point: Winters suggested that there were dire predictions that schools would "give up" if they got low Progress Report grades, and his findings, he said, did not show that. Although there were editorials at the time of the initial release of the Progress Reports last fall expressing concern that schools might be stigmatized by getting a C, D or F when students were performing at generally high levels, I question whether anyone thought that schools, and the educators who work in them, would "give up." The more predictable reaction, which I think was borne out, was that principals, teachers and parents would simply not believe the Progress Report grades accurately characterized what they saw on a day-to-day basis. A lot of stakeholders don’t believe that the Progress Report grades are reliable measures of school performance, and given what eduwonkette and I have shown about the instability in the student progress measures at the heart of the system, those beliefs are well-founded.

A brief version of the research can be found here. The technical version is now available at the same location.

November 11, 2008

Bill Gates, U.S. Superintendent of Schools

Bill_Gates_718639.jpg

Few things cause skoolboy to laugh out loud uncontrollably, but this line from a story filed by Elizabeth Green at GothamSchools hit the spot:

As part of its new approach, the Gates Foundation will advocate for the politically thorny goal of national standards — and will aim to write its own standards and its own national test.

Read it again, slowly: The Gates Foundation will develop its own national standards and its own national test.

Does anybody else think this is a really, really bad idea? I'm delighted that the Gates Foundation has realized that throwing money at small schools didn't work, but I'm not prepared to turn over the public's interest in what is to be taught and learned to a private philanthropy, no matter how civic-minded it may be.

Update: Bad form on my part not to acknowledge that the title of this entry comes from an LA Times op-ed written by Diane Ravitch, available here. She deserves all of the credit for coining the phrase. Sorry, Diane!

November 7, 2008

Where Will Malia Ann and Sasha Obama Go to School?

bus.jpg

Why is there so much interest in where Barack and Michelle Obama plan to enroll their daughters, Malia Ann and Sasha, in Washington, DC schools? Probably because most observers think that the choice of a school will reveal something meaningful about President-Elect Obama’s views about schooling in the U.S. Is that so? Heck if I know. Up till now, the Obama girls have been attending the University of Chicago Lab School, a private PK-12 school associated with the University of Chicago with annual tuition and fees ranging from $18K-$21K for students in grades 1-12. (Full-time U of C staff are eligible for a 50% tuition remission.) Michelle Obama serves on the Board of Directors of the Lab School, and a couple of skoolboy’s friends, whose children attend the Lab School, say that both Obamas have been visibly involved in the life of the school.

Odds are that the Obamas will send their daughters to a private school in DC. Like most parents, they will likely want to ensure that their children get the best schooling they can. Few parents would be willing to risk sacrificing their children’s futures to make a point about the value of public schooling. We live in an era in which schooling is seen primarily as a vehicle either to move up the social ladder or to maintain the social standing that a family has achieved. As skoolboy’s long-time friend and colleague David Labaree argued in his book How to Succeed in School Without Really Learning, two once-prominent goals of American schooling—producing citizens prepared for life in a democracy and efficiently allocating individuals to work roles, both of which view schooling as a public good—have been overtaken by the objective of schooling as a means for vaulting over others, which construes schooling as a private good. This privatization of the purpose of schooling, Labaree argues, has resulted in a commodification of schooling, and a decoupling of genuine learning from the credentials that so many individuals chase after.

skoolboy invited some of his students to envision strategies to strike a new balance among the schooling goals of democratic citizenship, social efficiency and social mobility. One provocative idea was to eliminate private schooling altogether. Doing so, a student argued, would reduce both the temptation and the capacity for members of privileged groups to use their resources to maintain their advantages. Provocative, but not feasible, I thought. Eliminating private schooling would run headlong into other firmly-held American values, such as freedom of religious expression, the separation of church and state, and the importance of choice as a political value. One can, I believe, support public education and also envision a role for private schooling in the U.S.

And yet … skoolboy finds it troubling that in so many communities in the U.S., the most advantaged groups choose to opt out of the public schooling system, turning instead to private schools. I analyzed the association between median family income and the percentage of students enrolled in private schools for the 179 census tracts in Washington, DC that had non-zero family incomes in the 2000 Census. At the census tract level, weighted by the total number of students in grades 1-12 in each tract, the correlation between median family income and percentage of students enrolled in private schools was .90. What this means is that in Washington, our Nation’s capital, lower-income families send their children to public schools, and higher-income families send their children to private schools.
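For readers curious about what a tract-level correlation "weighted by the total number of students" involves, here is a minimal Python sketch of a weighted Pearson correlation. The four tracts below are invented for illustration and are not the 2000 Census data; only the method is meant to carry over.

```python
from math import sqrt


def weighted_corr(x, y, w):
    """Pearson correlation of x and y, with each observation weighted by w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / sqrt(var_x * var_y)


# Hypothetical tracts: median family income, % in private school, students in grades 1-12.
income = [25_000, 38_000, 60_000, 110_000]
pct_private = [4.0, 10.0, 30.0, 60.0]
students = [1200, 900, 700, 400]

r = weighted_corr(income, pct_private, students)
print(f"weighted correlation: {r:.2f}")
```

Weighting by enrollment means that a tract with 1,200 students counts three times as much as a tract with 400, so the correlation describes the experience of the typical student rather than the typical tract.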

The chart below shows this association graphically. DC Census tracts are divided into four quartiles, defined by their median family incomes. In the lowest quartile, median family income is less than about $30K per year; in the second quartile, median family income is roughly between $30K and $43K per year; in the third quartile, it’s between $43K and about $74K per year; and in the top quartile, the median family income is higher than $74K per year. In the lowest quartile, 5% of the children attend private schools, whereas in the top quartile, 55% of the children attend private schools.

DC_private.jpg

President Obama’s salary of $400,000 per year will place the Obama family unambiguously in the top income quartile in the District. I think the only question here is which private school Malia Ann and Sasha will attend.

November 6, 2008

Obama Wins! Have We Overcome the Scourge of Race?

ward_headshot_lowres.jpg

Why, we sure have, according to Ward Connerly, former University of California Regent and longtime opponent of affirmative action. Connerly quotes one of his college professors as saying that we’ll know that we’ve overcome the scourge of race when (a) white men no longer object to their daughters marrying a black man; (b) a white person can honestly say that s/he would be willing to walk in the shoes of any black person; and (c) Americans are willing to elect a black person to the presidency.

We’ve now learned that the third condition has been met, which is a wondrous and historic event. But what about the second condition? Connerly’s evidence is the following:

It is not hard to imagine a considerable number of whites who would not mind trading places with Obama, Halle Berry, Oprah Winfrey, Condoleezza Rice, Colin Powell, Tiger Woods and an endless list of other individuals identified as or perceived to be "black" - or partially so. In the case of Oprah Winfrey, $1.5 billion is enough to cause one to be willing to endure a whole lot of prejudice. Little boys wearing their "I want to be like Mike" tee shirts as a tribute to Michael Jordan is another vivid example of the waning influence of race in our nation.

Too polite to point out how these highly successful African-Americans aren’t just “any black person,” skoolboy turns to a social-scientific criterion for the willingness of a white person to walk in the shoes of any black person: equal odds of educational attainment. Drawing on the U.S. Bureau of the Census’ March, 2007 Current Population Survey, I calculate the relative odds that black and white 20- to 24-year-olds have graduated from high school and the relative odds that black and white 25- to 29-year-old high school graduates have obtained a bachelor's degree or higher.

% of white 20- to 24-year-olds reporting completion of high school: 94.9%
% of black 20- to 24-year-olds reporting completion of high school: 90.1%
Relative odds of completing high school for whites vs. blacks: 2.03

% of white 25- to 29-year-old high school graduates reporting a bachelor’s degree or higher: 38.0%
% of black 25- to 29-year-old high school graduates reporting a bachelor’s degree or higher: 22.2%
Relative odds of a bachelor’s degree, conditional on completing high school, for whites vs. blacks: 2.14

So the odds of completing high school are twice as high for whites as for blacks, and the odds of obtaining a bachelor’s degree conditional on graduating from high school are also twice as high for whites as for blacks. Given what we know about the importance of high school and college degrees for adult socioeconomic success in the U.S., my guess is that most whites would not honestly say that they’d be willing to walk in the shoes of a black person, if that black person has a much lower likelihood of obtaining high school and college diplomas.
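The relative odds above are odds ratios: the odds of an outcome, p / (1 - p), for one group divided by the same odds for the other. A short Python sketch, using only the rounded percentages reported above, shows the calculation. Recomputing from rounded inputs gives values close to, though not identical with, the 2.03 and 2.14 reported; the small gap presumably reflects rounding in the published percentages.

```python
def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1 - p)


def odds_ratio(p_group_a: float, p_group_b: float) -> float:
    """Odds for group A relative to odds for group B."""
    return odds(p_group_a) / odds(p_group_b)


# Rounded percentages from the March 2007 CPS figures quoted above.
hs_ratio = odds_ratio(0.949, 0.901)   # high school completion, whites vs. blacks
ba_ratio = odds_ratio(0.380, 0.222)   # bachelor's degree among high school graduates

print(f"high school: {hs_ratio:.2f}, bachelor's: {ba_ratio:.2f}")
```

Note that an odds ratio of about 2 does not mean whites are twice as likely to finish high school; the completion rates themselves differ by fewer than five percentage points. The odds ratio is large because so few whites fail to complete.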

The election of Barack Obama to the Presidency is a signal event, and the consequences of his breaking the color barrier will reverberate for many years to come. Although our President-Elect is a singular, charismatic individual who is the right man at the right time, the social, economic and political forces that shape the educational opportunities of African-Americans in U.S. society are deeply entrenched. Sadly, the scourge of race will not be easy to overcome.

October 31, 2008

Halloween Edu-Parade, 2008!

It’s Halloween, and time for skoolboy to present eduwonkette’s second annual Edu-Parade. Here are some of the few costumes you won’t see at the Greenwich Village Halloween parade in New York City:

First up are Philissa Cramer, Kelly Vaughan, and Elizabeth Green as the Gossip Girls. (That’s Elizabeth as vulnerable part-good, part-evil Blair Waldorf.) Over at GothamSchools, Philissa, Kelly and Elizabeth spill all the gossip on what’s happening in New York City schools and beyond. XOXO, ladies!

gotham-girls.jpg

Here comes Jim Liebman, Director of the Office of Accountability and Assessment in New York City, as ARIS, the $80M information system that the New York City Department of Education purchased from IBM. Like ARIS, Jim produces unreliable data and is inaccessible to teachers and parents.

liebman-aris.jpg

Next, there’s Lisa Graham Keegan and Linda Darling-Hammond as a couple of fuzzy dice, reflecting their candidates’ fuzzy and dicey education platforms. One of them could wind up as Secretary of Education!

dice-keegan-ldh.jpg

Here’s Margaret Spellings as a Texas Longhorn, as she prepares to move back to the Lone Star State. (Given skoolboy’s feelings about Madame Secretary’s role in promoting No Child Left Behind, I considered several kinds of horns before settling on the bovine variety.)

maggie-longhorn.jpg

Next up, blogger/journalist Alexander Russo shows his devilish side. He’s got horns too, but Ed Week is a family newspaper. (Sort of.)

Baby-devil-russo.jpg

Here are Checker Finn and Mike Petrilli of the Fordham Foundation as a pushmi-pullyu, the two-headed character that Dr. Dolittle found on his voyage. When it tries to move, both heads try to go in different directions.

pushmepullme-fordham.jpg

Following the Fordham boys, DC Schools Chancellor Michelle Rhee is the Dark Knight, an obsessive superhero who relies on her strength and intelligence. There’s good, there’s evil … and then there’s Michelle Rhee.

michelle-batrhee.jpg

Right behind Rhee is the education Brat-Pack, straight out of 1985 (about when most of them were born, it seems): clockwise, it’s David Levin (co-founder of KIPP), Jason Kamras (TFA alum and 2005 national Teacher of the Year), and the redoubtable Wendy Kopp, founder of Teach For America. The tagline for the Breakfast Club was, “They only met once, but it changed their lives forever.” These Brat-Packers meet all the time, and they’re changing lots of other people’s lives…let’s hope for the better.

brat-pack-ed.jpg

AFTer the Brat Pack comes Randi Weingarten as an astronaut, seeking to broadly and boldly go where no man or woman has gone before: a president of a national teacher union with such complex views that she’s equally hated by the left and the right. (Okay, maybe Al Shanker planted that flag first.)

broad-and-bold-randi-in-spa.jpg

And finally, eduwonkette (not pictured) as Bill Henrickson, the lead role in HBO’s series “Big Love,” about a modern-day polygamist, surrounded by spouses and acolytes Jay Greene, Kevin Carey, Andy Rotherham and Mike Bloomberg. This costume simultaneously fulfills her fantasies and theirs! (Of course, she’s already been married to Mayor Mike…)

eduwonkette-fan-club.jpg

Happy Halloween, everybody!

October 29, 2008

Where Do Teachers Come From? (Other than the Stork)

stork.bmp

Recently, skoolboy’s students had a spirited discussion of Subtractive Schooling (SUNY Press, 1999), Angela Valenzuela’s wonderful book chronicling the social relations between teachers and students in Seguin, a Houston high school serving a high concentration of Mexican immigrant and Mexican-American youth. A central theme of the book is that teachers and students often fail to understand one another’s orientations and values, resulting in a kind of mutual alienation. Valenzuela, now on the faculty of the University of Texas-Austin (and founder of a blog on educational equity in Texas) demonstrated that students often felt that their teachers didn’t care about their family, community and national histories as Mexican immigrants with a strong attachment to Spanish language. In turn, many Seguin teachers felt that their students didn’t care about doing well in school. Both groups calibrated their effort and engagement with the other based on these perceptions. A particularly vivid quote from a teacher is: “’As if teaching were not enough to preoccupy myself with’ she sighed, and then continued in a more defensive tone, ‘It’s overwhelming to think that this is the level we’re dealing at, and frankly, neither was I trained nor am I paid to be a social worker.’”

In Seguin, it seemed that the teachers were often of a different social class and cultural background than their students. The process that yielded this outcome, and what might be done about it, were of great interest to my students. Students were particularly intrigued by the finding of the Pathways Project researchers, reported in the Journal of Policy Analysis and Management in 2005, that most new public school teachers take their first teaching job near where they grew up or went to college. In New York State, at least, 61% of the teachers starting their careers between 1999 and 2002 taught within 15 miles of their hometown (defined as where they attended high school or their home address from their college application). Eighty-five percent of new teachers in New York State began teaching within 40 miles of their hometown.

Alternate-route programs such as Teach for America pose an intriguing contrast, and the recruits to such programs probably weren’t very prominent in the Pathways data. TFA is highly selective, recruiting bright, energetic, and committed young people to teach for two years in a high-needs area, often some distance from where they grew up or went to college. (This should not be surprising, since many TFA recruits are graduates of elite institutions, and the high-needs communities that TFA serves send relatively few students to such colleges and universities.) But TFA’s practices create an interesting tradeoff: the recruitment process may select novice teachers who are predisposed to engage in the kind of caring teaching practice that Angela Valenzuela champions, while simultaneously parachuting these teachers into settings where they have little understanding of the cultural practices and values of the local community. The bounded commitment of TFA may be particularly problematic in such instances, as there seems to be little hope of cultivating an experienced corps of teachers with a deep knowledge of the local community if most of the novice teachers leave after their two-year commitments expire. Some TFA recruits do continue, but I haven’t seen data documenting how many stay in the same schools in which they began, building up the local knowledge that might enable them to sustain mutual caring.

October 28, 2008

Is the Term "Lame Duck" Offensive?

Maybe to a duck. The term originated in the world of finance, accompanying bulls and bears, and gravitated from referring to businessmen who couldn’t pay their debts to describing politicians who lose political power in anticipation of their scheduled loss of office. It’s hard not to view President George W. Bush and the men and women who surround him as lame ducks. But lame duck status generally doesn’t serve as a muzzle.

On Friday, White House Domestic Policy Director Karl Zinsmeister published a letter to the Editor of the New York Times responding to Sam Dillon's front-page article entitled “Under ‘No Child’ Law, Even Solid Schools Falter.” Zinsmeister argued that the testing and accountability at the heart of NCLB accounted for demonstrable educational progress in the U.S. “Over the last five years, 9-year-olds in the United States have made more progress in reading than in the previous three decades combined,” he wrote. “Achievement gaps between white and black students in reading and math are now the narrowest they have ever been. That’s the reality behind your June 24 New York Times on the Web headline ‘Reading and Math Scores Rise Sharply Across N.Y.’”

Oy.

Longitudinal data on 9-year-olds are not a good indicator of recent changes in achievement in the U.S. The long-term trend NAEP has sampled 9-, 13- and 17-year-olds since the early 1970’s, and the most recent data were collected in the spring of 2008—and are not scheduled to be released until next year. So either Zinsmeister is discussing old data from 2004, or he’s drawing on data that have not yet been released and subjected to public scrutiny. The main NAEP assessment has sampled 4th-, 8th- and 12th-graders since about 1990. We’ve got more recent data from the main NAEP available to judge trends over time than from the long-term trend NAEP. Moreover, the main NAEP is a more accurate measure of how students are performing in relation to current curriculum frameworks, as the content in the long-term trend NAEP hasn’t changed since its inception, whereas the main NAEP periodically revises the content covered to correspond to new curricular frameworks defined by the National Assessment Governing Board.

Data from the main NAEP do not show substantial gains associated with the implementation of NCLB. In the main NAEP, at the fourth-grade level, reading scores rose an average of two points from 2002 to 2007, the same gain as observed from 1992 to 2002. (Beginning in 1998, NAEP allowed testing accommodations.) At the eighth-grade level, reading scores fell an average of one point from 2002 to 2007, whereas they rose four points from 1992 to 2002. This is progress? In math, fourth-grade scores rose five points from 2003 to 2007, which is encouraging, and continues a long-term trend that began much earlier, as the average gain from 1990 to 2000 was 13 points. (There also was a gain of 9 points between 2000 and 2003.) The math story is much the same at the eighth-grade level. Scores rose an average of three points between 2003 and 2007, continuing a trend that began in 1990. Average 8th-grade math scores increased by 10 points between 1990 and 2000, and five points between 2000 and 2003.

As for New York scores? Sure, scores on the state-administered assessments rose substantially this past year, but on NAEP, 4th- and 8th-grade reading scores were essentially flat from 2003 to 2007, as were 8th-grade math scores. 4th-grade math scores did increase significantly from 2003 to 2007, by an average of 7 points. eduwonkette has written extensively about the reasons why high-stakes accountability test data may be inflated relative to tests with no stakes, such as NAEP, and why this might lead to distortions in how much the black-white achievement gap has declined over time.

It’s hard to isolate the impact of NCLB on NAEP scores, and there may be other student outcomes that at least suggest the possibility that NCLB has had some beneficial effects. But hanging the argument for the reauthorization of NCLB on rising NAEP scores and New York State test scores is downright foolish.

Mr. Zinsmeister, your analysis is lame. And now I’m going to duck.

October 25, 2008

2nd Annual Halloween Edu-Parade!

eduwonkette's Halloween parade last year was such a big hit, it's time for a reprise. Because she's on the road, skoolboy (who lacks her Photoshop chops) is at the wheel. Please post suggestions for parade participants and their costumes below, or e-mail them to me at skoolboy2 (at) gmail (dot) com. This year, participants will be on floats. There could be a big-city mayor float, with mayor-for-life Mike Bloomberg and Adrian Fenty; big-city superintendents, such as the ever-popular Joel Klein, Michelle Rhee, Arne Duncan, and Paul Vallas; high-flying free spenders, such as Eli Broad and Bill Gates; policy entrepreneurs, such as Wendy Kopp, Steve Barr, and Roland Fryer; and the bloggers you love to love or hate, such as Andy Rotherham, Checker Finn and the Flygirls/Flyboys, Kevin Carey, Deborah Meier & Diane Ravitch, and Alexander Russo. And how can we leave out last year's parade marshal, Margaret Spellings, or Randi Weingarten, or even Bill Ayers?

Please submit entries by noon on Wednesday, October 29th. The floor is open!

October 17, 2008

Brownsville's Station

On Tuesday, the Broad Foundation awarded the 2008 Broad Prize for Urban Education to the Brownsville, Texas School District. skoolboy has a soft spot in his heart for Brownsville: skoolboy’s spouse (who has made it clear that he is a dead man if he refers to her as Mrs. skoolboy here) is a product of the Brownsville schools, and she briefly taught English as a Second Language at the middle-school level there a long time ago. I don’t know enough about the Broad Foundation process or what the administrators and teachers in Brownsville have done to warrant this recognition to comment. But I imagine that very few readers here know much about Brownsville, so I wanted to tell you a little bit about the city on the border by the sea.

Brownsville is at the southernmost tip of Texas, separated from Mexico by the Rio Grande River, and most demographers would probably define the metropolitan area as Brownsville-Matamoros, the larger city on the Mexican side. Historically this has been a porous border, with foot traffic across the International Bridge, and thousands of vehicles traveling in both directions daily. In the 2000 Census, more than 90% of Brownsville residents described themselves as Hispanic or Latino, 31% were foreign-born, and 88% of adults aged 18 to 64 reported speaking Spanish at home.

In 1980, only 43% of adults 25 or older in Brownsville had completed high school, and 11% had completed four or more years of college. By the year 2000, about 52% of adults 25 or older were high school graduates, and 13% had a bachelor’s degree or higher. This significant progress in educational attainment, however, was not matched by improvements in the economic standing of Brownsville’s residents. In 1980, 44% of those under 18 years of age lived below the federal poverty line, and in 2000 the figure was 45%. It’s difficult to judge why adult education levels increased while child poverty held steady, but whatever the explanation, this kind of concentrated poverty is found in few places in the U.S. The most striking figure I’ve seen is that 10% of Brownsville residents with a bachelor’s degree or higher live below the poverty line. (The figure is about 3.5% for the nation as a whole.) It’s hard not to see the local economy as the culprit when so many college graduates are living in poverty.

The best representation of Brownsville I’ve seen is in the writings of Oscar Casares, a Brownsville native who teaches writing at the University of Texas-Austin. Casares’ short-story collection Brownsville: Stories captures the culture expertly. His first novel, Amigoland, which will be published next year by Little, Brown, is also set in Brownsville. “Amigoland” may seem like a contrived title for a book, but it’s actually the name of a now-defunct shopping mall in Brownsville. When I first visited Brownsville two decades ago, there were two malls, Amigoland and Sunrise, and I was struck by the fact that neither one had the kind of retail bookstore—e.g., a Waldenbooks or B. Dalton Bookseller—that I thought was a staple in such malls. How could a town’s major mall have a Chia pet kiosk but no bookstore, I wondered. But that was the reality of Brownsville just 20 years ago.

Amigoland was in the news recently, but not in its original guise. When the mall folded, it was bought by the University of Texas at Brownsville/Texas Southmost College, a unique institution merging a two-year community college with a new regional branch of the University of Texas, founded in 1991 when a Texas district court jury was persuaded that the state’s failure to fund a public four-year institution to serve the residents of the lower Rio Grande Valley in Texas was kind of a problem. UTB/TSC sits close to the Mexican border, and Amigoland, which is now a vocational-technical center for the school, is even closer. Over the past year, the Department of Homeland Security has been rolling out a plan for an 18-foot high fence along the Mexican border to deter illegal border-crossing. To the amazement of many, the DHS plan had the fence running through the UTB/TSC campus, with the former Amigoland facility on the Mexican side of the fence—even though it’s entirely on U.S. soil. In July, the university reached an agreement that scaled back the DHS plans, and ground was broken last week on fencing, reaching eight to 10 feet in places, that will supplement the existing fences. skoolboy sides with Doug Massey and many other immigration experts in arguing that, if the problem is the flow of Mexican migrants to the U.S., a fence is not a very thoughtful policy solution.

October 1, 2008

Why skoolboy Is Uncertain about the NYC School Progress Reports

It’s election season, which means that we’re being inundated with polls. The reporting of poll results drives statisticians nuts, because the press often reports the percentage of those surveyed who favor one candidate or another, without taking into account the poll’s margin of error. The margin of error is a way of quantifying the uncertainty in the poll numbers, because even a well-designed poll that surveys a random and representative sample of the population is going to generate an estimate of the true proportion of those in the population who favor a particular candidate. The general rule of thumb is, the more information available in a sample, the less uncertainty in the estimate. A smaller batch of information will yield a more uncertain, or imprecise, estimate than a larger batch of information. This is as true for estimates of the relative performance of schools and teachers—whether in the form of a complex value-added assessment model or a simple percentage—as it is for political polls.
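For readers who like to see the rule of thumb in numbers, here is a minimal sketch. It assumes a simple random sample and the usual normal approximation for a proportion, with a 95% multiplier of 1.96; the function name and the sample sizes are illustrative and do not come from any particular poll.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical polls in which half of respondents favor a candidate.
small_poll = margin_of_error(0.5, 400)    # roughly 0.049, i.e. about 5 points
large_poll = margin_of_error(0.5, 1600)   # roughly 0.0245, i.e. about 2.5 points

print(f"n=400:  +/- {small_poll:.4f}")
print(f"n=1600: +/- {large_poll:.4f}")
```

Quadrupling the sample size only halves the margin of error, which is why small samples (or small schools) produce such imprecise estimates.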

With apologies to anyone who’s had an introductory statistics course, suppose that we were trying to estimate the average age of the teachers in a very small school—one with only four teachers—but we can only draw a sample of three of the teachers to estimate that average. The four teachers are 25, 30, 30, and 55 years old, and the true average age is (25+30+30+55)/4=35. If our sample was the teachers who are 25, 30 and 30, our estimate of the average age of teachers in the school would be (25+30+30)/3=28.33. If our sample was the teachers who are 30, 30 and 55, our estimate of the average would be (30+30+55)/3=38.33. It’s a simple example, but it shows that different samples drawn from a given population can produce quite different estimates, which can be some distance away from the true population value. You wouldn’t want to place too much confidence in a particular estimate if you knew that another, equally valid sample of the same size could generate an estimate that was quite different.
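The four-teacher example is small enough to enumerate completely. The sketch below lists every possible sample of three teachers and the average age each one yields; the variable names are just for illustration.

```python
from itertools import combinations

ages = [25, 30, 30, 55]
true_mean = sum(ages) / len(ages)  # 35.0

# Every possible sample of three teachers. The two 30-year-olds are
# distinct people, so two of the four samples contain the same ages.
samples = list(combinations(ages, 3))
sample_means = [sum(s) / len(s) for s in samples]

for s, m in zip(samples, sample_means):
    print(s, round(m, 2))

print("lowest estimate: ", round(min(sample_means), 2))
print("highest estimate:", round(max(sample_means), 2))
print("true average:    ", true_mean)
```

The four estimates average out to the true value of 35, but no single sample hits it: each one lands anywhere from about 28 to about 38.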

That same logic applies to estimates of school and teacher performance, such as the New York City School Progress Reports. Most of the elements of the Progress Reports are estimates (for an explanation why, see here), but the calculation of the overall letter grades, which receive so much attention, does not take the uncertainty in these estimates into account. Today, I’ll show that using the 2008 School Progress Reports.

One of the indicators of student progress on the School Progress Reports is the percentage of students who made a year’s worth of progress in English (ELA) and in math from 2007 to 2008. In a given school, each child who was tested in both years can be classified as having made a year’s worth of progress or not, and by totaling up those students who made a year’s worth of progress and dividing by the number of students who were tested in both years, a percentage can be calculated. (There’s an additional wrinkle for students who transferred from one school to another, but it doesn’t affect the logic I’m writing about.)

Each school is compared to a group of 40 peer schools that are judged to be similar based on their demographic and other characteristics. A school’s percentage of children making a year’s progress in ELA is compared to the highest and lowest values in its peer group, and the school gets a peer horizon score that represents its location between the high and low peer group values. For example, if a school had 55% of its students make a year’s progress in ELA, and the percentage for the lowest school in its peer group was 47%, and the percentage for the highest school in its peer group was 71%, the school was located one-third of the way between the lowest and highest schools (8 percentage points above the minimum, out of a possible 24 percentage points above the minimum in the peer group). That peer horizon score of .33 would be multiplied by the 5.625 points that this component is counted in the calculation of the overall letter grade of the school, yielding a net contribution of 1.875 to the school’s overall score.
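The arithmetic in that example can be written out as a short sketch. The function and constant names are made up for illustration and are not the DOE's; the calculation simply follows the description above, using the exact one-third rather than the rounded .33.

```python
def peer_horizon_score(school_pct, peer_min_pct, peer_max_pct):
    """Location of a school between the lowest (0.0) and highest (1.0)
    values in its peer group."""
    return (school_pct - peer_min_pct) / (peer_max_pct - peer_min_pct)

# Points this component carries toward the overall score, per the text above.
COMPONENT_WEIGHT = 5.625

score = peer_horizon_score(55, 47, 71)   # 8 / 24, one-third of the way up
points = score * COMPONENT_WEIGHT        # about 1.875

print(round(score, 3), round(points, 3))
```

Note that the score depends only on three point estimates: the school's own percentage and the two extremes of its peer group. Nothing in it reflects how precisely any of those percentages is measured.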

The problem is that this calculation doesn’t take into account the fact that all of these percentages are estimates. The chart below looks at one elementary school in particular—Senator John Calandra School (08X014)—and compares it to its peer group of 40 schools. At Calandra, 58.3% of the students made a year’s worth of progress in English in 2008. But the standard error of that percentage is 3.5%, which means that, allowing two standard errors on either side of the estimate, Calandra's true percentage could plausibly be anywhere from 51.3% to 65.3%, a wide range. (This range is shown in the “error bars” above and below the estimated percentage for each school.) The same is true for most of the other schools in the peer group. In fact, only two of the 40 schools in the peer group (the ones with the blue markers in the chart) have a percentage that we are confident is higher than Calandra’s percentage. For the other 38 schools in the peer group, we can’t rule out the possibility that Calandra’s percentage is equal to the estimated percentage in those schools. There’s a tremendous amount of overlap among these schools.
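Calandra's range of 51.3% to 65.3% is the estimate plus or minus two standard errors, and a rough sketch of that calculation follows. The two peer schools and their standard errors are hypothetical, and checking whether two intervals overlap is only one simple, conservative way to compare schools; it is an assumption here, not necessarily the test behind the chart.

```python
def approx_interval(estimate, standard_error, multiplier=2.0):
    """Estimate plus or minus `multiplier` standard errors."""
    half_width = multiplier * standard_error
    return (estimate - half_width, estimate + half_width)

def intervals_overlap(a, b):
    """True if the two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

calandra = approx_interval(58.3, 3.5)
print(tuple(round(x, 1) for x in calandra))   # the 51.3 to 65.3 range from the text

# Hypothetical peer schools (values invented for illustration).
peer_a = approx_interval(66.0, 3.5)   # 59.0 to 73.0
peer_b = approx_interval(75.0, 3.5)   # 68.0 to 82.0

print(intervals_overlap(calandra, peer_a))   # ranges overlap: cannot separate the schools
print(intervals_overlap(calandra, peer_b))   # no overlap: peer_b looks reliably higher
```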

08X014.JPG

And yet Calandra received a peer horizon score of .463, and other schools in the peer group whose percentages of students making a year’s worth of progress in English did not differ statistically from Calandra received peer horizon scores ranging from .169 to .903. Calandra’s peer horizon score of .463 counted for 2.6 out of a possible 5.625 points toward the overall score on the School Progress Report. Other peer schools whose percentages did not differ significantly from Calandra’s received from 1.0 to 5.1 points out of a possible 5.625 points on this component of the overall score. Differences of this magnitude could easily make the difference between an overall grade of A and of B, or of B and of C—just due to chance. An accountability system such as the New York City School Progress Reports that doesn’t acknowledge the importance of chance and uncertainty is fundamentally misleading the public about its ability to distinguish the relative performance of schools. Some schools are likely doing significantly better than other schools; the problem is that the School Progress Reports don't provide enough information to judge which ones.

September 24, 2008

Could a Monkey Do a Better Job of Predicting Which Schools Show Student Progress in English Skills than the New York City Department of Education?

eduwonkette and I have been blogging about the School Progress Reports released last week by the New York City Department of Education. We’ve shown that, although the performance and environment scores of schools were pretty consistent from last year to this year, the student progress scores were virtually unrelated—knowing a school’s progress score from last year didn’t predict which schools would demonstrate a lot of progress this year. This, we argued, demonstrated that the progress part of the School Progress Report—representing 60% of the letter grade each school received—wasn’t really telling us which schools consistently are promoting student progress, but rather was mostly random error.

The problem was particularly acute in the domain of English Language Arts (ELA). The stability in the student progress scores from 2007 to 2008 was so low that it led skoolboy to wonder if a monkey could actually do a better job predicting which schools show progress in students’ ELA performance in 2008 than relying on the DOE’s 2007 student progress score. The particular measure I examined was the percentage of students in the school making at least one year of progress on the ELA test from last year to this year. (As we've noted in earlier posts, the calculation of this measure changed slightly from 2007 to 2008.)

In the interest of full disclosure, skoolboy didn’t actually rent a monkey to pick the schools. Animals scare him, and he wouldn’t have been able to record the picks while hiding under his bed. What I did instead was use a random number generator to assign each school to the top or bottom half of the distribution of schools on last year’s peer and citywide measures of the percentage of students making a year of progress in English Language Arts.

The DOE got credit for a correct prediction if it correctly predicted that a school would be in the top half of this year’s schools, based on the school being in the top half on the DOE’s 2007 measure, or correctly predicted that a school would be in the bottom half of this year’s schools, based on the school being in the bottom half last year. The monkey got credit for a correct prediction if the randomly-selected location of a school as being in the top half of the 2007 distribution correctly predicted that a school would be in the top half of this year’s schools, or the random pick of being in the bottom half of last year’s distribution correctly predicted that a school would be in the bottom half of this year’s schools. These predictions were done separately for the 570 elementary schools, 128 K-8 schools, and 289 middle schools which received overall letter grades last year and this year.
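The scoring rule can be sketched in a few lines. Everything below is illustrative: the school scores are randomly generated rather than real Progress Report data, the "monkey" is assumed to be an independent coin flip per school, and the function names are invented for this example.

```python
import random
from statistics import median

def top_half(scores):
    """True for each school whose score is above the median."""
    cutoff = median(scores)
    return [s > cutoff for s in scores]

def hit_rate(predicted_top, actual_top):
    """Share of schools whose predicted half matches their actual half."""
    hits = sum(p == a for p, a in zip(predicted_top, actual_top))
    return hits / len(actual_top)

def monkey_picks(n_schools, rng):
    """Assign each school to the top half with a coin flip (an assumption)."""
    return [rng.random() < 0.5 for _ in range(n_schools)]

rng = random.Random(2008)
n = 570  # number of elementary schools mentioned in the post

# Made-up scores: this year's values are unrelated to last year's by construction.
last_year = [rng.gauss(50, 10) for _ in range(n)]
this_year = [rng.gauss(50, 10) for _ in range(n)]

doe_rate = hit_rate(top_half(last_year), top_half(this_year))
monkey_rate = hit_rate(monkey_picks(n, rng), top_half(this_year))
print(f"last year's score: {doe_rate:.0%}   coin flip: {monkey_rate:.0%}")
```

When last year's score carries no information about this year's, both predictors hover around 50%, and which one comes out ahead in any given round is itself a matter of chance.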

Round 1. We begin with the peer horizon score for the 570 elementary schools. The DOE’s peer horizon progress score from last year correctly predicted the progress status of 46% of the elementary schools this year. The monkey correctly predicted the status of 51% of this year’s schools.

Score: Monkey 1, DOE 0.

Round 2. We next turn to the citywide horizon score for the 570 elementary schools. The DOE’s citywide horizon progress score from last year correctly predicted the progress status of 47% of the elementary schools this year. The monkey correctly predicted the status of 52% of this year’s schools.

Score: Monkey 2, DOE 0.

Round 3. In this round, we examine the peer horizon scores for the 128 K-8 schools. The DOE’s peer horizon progress score from last year correctly predicted the progress status of 45% of the K-8 schools this year. The monkey correctly predicted the status of 55% of this year’s schools.

Score: Monkey 3, DOE 0.

Round 4. Next, we look at the citywide horizon progress scores for the 128 K-8 schools. The DOE’s citywide horizon progress score from last year correctly predicted the progress status of 43% of the K-8 schools this year. The monkey correctly predicted the status of 47% of this year’s schools.

Score: Monkey 4, DOE 0.

Round 5. The final stage of the competition examines the 289 middle schools. The DOE’s peer horizon progress score from last year correctly predicted the progress status of 40% of the middle schools this year. The monkey correctly predicted the status of 50% of this year’s middle schools.

Score: Monkey 5, DOE 0.

Round 6. The last round looks at the citywide horizon progress scores for the middle schools. The DOE’s citywide horizon progress scores from last year correctly predicted the progress status of 45% of this year’s middle schools. The monkey correctly predicted the status of 49% of this year’s middle schools.

Score: Monkey 6, DOE 0.

skoolboy will forgo the cheap jokes about how a monkey could do a better job of managing New York City’s accountability system than the people currently in charge. On the whole, they’re smart, hard-working people, and ridiculing them is not likely to persuade them to change their behavior (as satisfying as it may be at particular moments). But the system that they have designed and implemented is profoundly flawed, as this comical example illustrates, and it needs to change. eduwonkette and I are going to keep hammering on this point, because it has such important consequences for students and for schools.

And besides: I bet the DOE would beat the monkey in predicting school progress scores in math. (But it wouldn’t be a rout.)

September 23, 2008

Happy Anniversary!

Today marks the one-year anniversary of eduwonkette's bold entry into blogging about education. A lot has happened here over the past year, across 487 different posts, and thousands and thousands of comments. (Heck, back then, eduwonk and eduwonkette were BFF.)

eduwonkette has tackled a remarkably diverse set of education policy issues: teacher quality, No Child Left Behind, gender differences in academic performance, myths about small schools, New York City's School Progress Reports, the "it's being done/no excuses" argument, the achievement gap and "acting white", value-added assessment, choice, incentives, unions ... the list goes on and on. And she's done it all with great style and wit, first with and now without the mask.

Today is an opportunity to revisit the principles that brought her to the blogging world:

Are you tired of listening to the usual suspects on education policy? So am I. Education policy debates are dominated by a small number of very loud voices. In these debates, ideological claims, rather than research, data, the experience of educators, and common sense, are wielded as weapons. What are some of the problems I see with these debates?


A selective reading of educational research: The loudest outlets pick and choose which studies are relevant, often leading to a skewed view of what we know and don’t know about how to improve schools.

An inattention to the costs and benefits of policies: Policy solutions are endorsed as if they have no downside. But we know that all actions have positive and negative consequences. The education policy debate would benefit from such an acknowledgement.

A fundamental disrespect for the knowledge of teachers and principals who work in public schools: Too often, teachers and administrators are dismissed as “self-interested” or “protecting the status quo” when they question what policymakers wreak on their classrooms and schools. In no other profession are we willing to discount the opinions of those closest to the work at hand. Education should be no different.

Rather than stepping into this ideological boxing ring, this blog takes a different approach.

And so she has. Happy anniversary, eduwonkette!

September 17, 2008

Between a Political Rock and a Statistical Hard Place

Some days, skoolboy feels bad for the hard-working folks in the New York City Department of Education. They’re caught between a political rock and a statistical hard place. The political rock is the New York State accountability system, which complies with No Child Left Behind’s requirements to test students annually in grades 3-8 in Mathematics and English Language Arts, and to classify students, based on their test scores, as either Not Meeting Learning Standards (Level I), Partially Meeting Learning Standards (Level II), Meeting Learning Standards (Level III), or Meeting Learning Standards with Distinction (Level IV), and then aggregate the performance of students, and subgroups of students, to assess the school’s progress toward the goal of 100% proficiency for all students by the year 2014. The mechanism for this is a series of grade-specific exams, with a broad (but arbitrary, as Dan Koretz explains in Measuring Up) standard-setting process that defines the scores on the exam that correspond to the four proficiency levels. Whatever a student’s scale score on the exam, he or she is classified into a particular proficiency level.

The statistical hard place is that the proficiency levels are only part of the story. The NYC DOE has found that the scale scores matter, such that a student whose scale score is halfway between the cutoffs for Level II and Level III, and therefore whose proficiency level is Level II, has a higher probability of graduating from high school on time than a student whose scale score is right at the cutoff for Level II. The scale scores have predictive validity—that is, they predict educational outcomes that we think of as important—but they don’t have the political currency of the proficiency levels specified by the state and the federal government.

There’s no evidence, to skoolboy’s knowledge, that achieving a proficiency level on NCLB-style exams has any predictive validity over and above the scale scores on which they are based. (Another regression discontinuity design study waiting to happen.) But I’ll wager that they don’t.

Whether or not the state/NCLB proficiency levels matter, the NYC DOE is stuck. They have to pay homage to the state standards, even though their internal evidence shows that partial progress—“learning quite a bit,” in skoolboy’s terms—really does matter for students’ futures, and therefore is something that schools should be held accountable for.

And I don’t disagree. I would be comfortable (though not ecstatic) with school progress reports that used changes in scale scores to quantify how much students had learned from one year to the next, under two conditions: (a) if the exams were vertically linked, and (b) if the uncertainty in the estimates of school-level effects on the average change were taken into account. Neither of these conditions is met in the current New York City School Progress Reports.

Navigating the political rock and the statistical hard place is definitely a challenge, both rhetorically and in the construction of the School Progress Reports. Rhetorically, the DOE is obliged to argue that a student who is Level III in fourth grade and Level II in fifth grade has lost ground—that student has fallen off of the sharp Level III cliff—because the state and federal accountability metrics treat this as a sharp discontinuity. But as a practical matter, the student may not have fallen off a cliff; rather, she may be just a little bit lower on a gradual hill in fifth grade than we’d like, but still higher on the hill than she was in fourth grade--and the DOE’s internal analyses document that anyone who is higher on the hill is better off than someone lower.

What’s the DOE to do? Well, it could continue to escalate the rhetoric directed toward its critics. (I note with alarm that the DOE went from calling me by my blogging name “skoolboy” on Monday to calling me “Professor Pallas of Teachers College” on Wednesday—whose proclivity to giving A’s to all of his students will come as a surprise to many of them—what’s next? Examining my teeth?) Or it could speak honestly and openly about the challenge of incorporating political and technical realities into the School Progress Reports. I think readers know which path skoolboy recommends.

September 14, 2008

Let the Spin Begin

Suppose that your fourth-grader takes a state test that shows that she understands the associative property of multiplication, can multiply two-digit numbers by two-digit numbers, and can find the perimeter of a polygon by adding up the length of the sides. A year later, as a fifth-grader, she takes a test that shows that she can compare fractions and decimals using <, > or =; identify the factors of a given number; simplify fractions to their lowest terms; and knows that the sum of the interior angles of a quadrilateral is 360 degrees—but she cannot yet create algebraic or geometric patterns using concrete objects or visual drawings (e.g., rotate and shade geometric shapes). Would you say that your child had lost ground in proficiency, or actually gone backward?

Jim Liebman would. Liebman, the Columbia University law professor on leave as Chief Accountability Officer at the New York City Department of Education, is quoted and paraphrased in an article by Jim Dwyer in Saturday’s New York Times on the F grade that P.S. 8 in Brooklyn Heights will receive in this year’s School Progress Reports—a grade that many are finding hard to believe, given that 80% of the students tested in the school are judged proficient in math, and two-thirds are judged proficient in English Language Arts. Doubly embarrassing, in that Chancellor Joel Klein and Mayor Mike Bloomberg have publicly declared the school to be successful and worthy of emulation.

So the spinmeisters are out, and the spin here is justifying the grade of F by arguing that the children in P.S. 8 are going backward. “You drop them off at the beginning of the year, and on average, by the end of the year, your child lost ground in proficiency,” Dwyer quotes Liebman as saying. “Where was the child last year, and where is the child this year?” Liebman asked. “You’re comparing them to themselves.”

A gentle reminder to Mr. Liebman, who was hired in January, 2006: the state math and ELA tests which children take, and which are the primary basis for assigning these lovely letter grades, are not vertically equated. (See skoolboy's testing primer here.) This means that there is no basis for comparing performance on the fourth-grade test with performance on the fifth-grade test. For each test, there is a subjective judgment about what level of performance constitutes proficiency, but the tests are independent. There is no basis for claiming that children are going backward; there’s no justification for claiming that a child “lost ground in proficiency,” since proficiency doesn’t exist in the abstract, but rather in grade-specific skills; and the children are not being compared to themselves, but rather their location in the distribution of children’s performance in one year is being compared to their location in the distribution of children’s performance the following year.

Perhaps Jim Liebman simply misspoke, as perhaps did Chancellor Joel Klein when he referred to statistical significance as “playing something of a game.” Such missteps might arise from the tremendous pressure to justify a particular high-stakes evaluation of a school when there are multiple sources of information about school performance that point in different directions—NCLB status, achievement levels, gains, school quality reviews, not to mention the public pronouncements of Liebman’s boss, and his boss’s boss.

There’s nothing wrong, in skoolboy’s view, in looking at students’ achievement growth as one of several criteria for judging how well a school is doing in relation to other schools. But I would never think of using year-to-year changes in proficiency levels on just two tests as the primary basis for evaluating a school’s performance. And neither would most people who study testing and assessment for a living.

September 12, 2008

Schools Restructuring under NCLB: Blow ‘em up Good?

95129c.jpg

This morning, the Center on Education Policy in Washington, DC is issuing the latest in a series of state-level reports on the fate of schools restructuring under NCLB policy. Today’s report, authored by Brenda Neuman-Sheldon (a one-time student of skoolboy’s, but I hear that she’s back on solid food), examines restructuring schools in Maryland. In 2007-08, Maryland had 38 schools in restructuring planning, a huge increase over the four schools the preceding year, and 64 schools in restructuring implementation, a 7% decline from the preceding school year. The restructuring schools are concentrated in a small number of Maryland’s 24 school districts, with 61% of the restructuring schools in Baltimore City, and an additional 30% in Prince George’s County, which adjoins Washington, DC. This concentration has stretched the capacity of the state and these districts to support restructuring planning and implementation. Prince George’s County, for example, soared from one school in restructuring planning in 2006-07 to 21 in 2007-08.

Neuman-Sheldon identifies a major shift in the form that restructuring schools in Maryland is taking. Whereas 58% of the schools in restructuring implementation in 2007-08 relied primarily on the appointment of a school “turnaround specialist” as the engine of restructuring (already a decline from the 73% using this option in 2005-06), all of the schools in restructuring planning that had submitted a plan at the time the report was written were proposing some form of “zero-based staffing”—i.e., replacing most or all of the staff in the school or asking all staff to reapply for their positions. It’s the neutron bomb theory of school reform!

But is it a good theory? That remains to be seen. What mechanism will bring highly-qualified teachers to these failing schools? Where will the tenured teachers who leave the schools go? In schools that replace only some of their staff, how will decisions about who stays and who leaves be made?

Beyond these logistical questions, though, lies another fundamental challenge: will changing the staffing—including the principals, who, Neuman-Sheldon reports, are often surprised to learn that when they select zero-based staffing as an option, they’re placing their own jobs on the line—fundamentally alter the context for teaching and learning in the school, when other powerful forces shaping teaching and learning aren’t changing at all?

September 10, 2008

Obama-Biden on the New Report Cards

parent%20report%20card.bmp

skoolboy doesn’t fancy himself a particularly political creature, although some readers would likely argue that I’m kidding myself, in that blogging is an inherently political activity. In any event, I haven’t chosen to do a close analysis of the positions or proposed policies of the finalists in our Presidential derby. I’ll make a brief exception today, not to make political hay, but rather to try to illuminate an enduring sociological challenge.

Yesterday, Barack Obama issued a new plan for school reform, emphasizing choice and innovation, investments in technology, enhanced college readiness, incentives for improved classroom teaching, and heightened responsibility from parents and from the federal government. The last piece of this agenda calls for the creation of quarterly parent report cards to support individual learning plans. Press reports of this component of the Obama agenda conveyed the impression that such report cards would simply be a fancy repackaging of the periodic report cards that parents already receive itemizing how their children are doing in school. But the Obama plan has something more ambitious in mind, including “the concrete information [that parents] need to help improve their child’s performance each year and plan for post-high school education”:

  • Where their child is expected to perform at their grade level to be ready for high school graduation and post-high school education

  • Information about local afterschool, summer learning, tutoring, and/or mentoring programs that might provide additional assistance to students who have fallen behind and provide additional hands-on learning opportunities for students who excel in certain subject areas

  • Information about alternative public schooling options in the area that the student may be able to attend, and how those schools’ students are performing

  • Expected amount of savings a family should have for future college tuition and information about eligibility for federal and state tax credits, grants, and other financial assistance

Is more information inherently better than less information? No, skoolboy thinks, not if more information is overwhelming. This is a remarkably diverse set of objectives, and each of them would require at least a term paper’s worth of material to convey what’s important. Providing parents with the information necessary to enable them to choose between their child’s current school and alternatives? What’s the right metric here? Value-added models of school effects? I've seen highly-educated professionals struggle to understand them. Concrete information on how a child is expected to perform at the child’s grade level? You can find this on most state department of education websites, but it’s not something that can be summarized in a page or two.

The more serious problem, though, is the assumption that providing information in and of itself creates a logic for action. The available evidence calls into question both the inclination and the ability of parents to use information to make decisions regarding their children’s schooling. Moreover, these orientations and predispositions are linked to social class. skoolboy’s long-time colleague Annette Lareau, noted here as a cool person you should know, has written extensively about the differing childrearing and schooling practices of middle-class and working-class parents. Her analyses show that middle-class parents are predisposed to see family and school as connected, and to be proactive in seeking out and evaluating educational opportunities for their children. Working-class parents care just as much about their children’s education, but they see family and school as separate, and are less likely to intervene in what they view as the responsibility of the school.

Provision of this information, therefore, could have the unintended consequence of exacerbating social class differences in schooling. Middle-class parents may be better able to make sense of the information, and will be more prepared to act on it. Working-class parents may be overwhelmed by it, and will not necessarily know how to translate the information into concrete action steps. It wouldn’t be the first policy initiative to founder by assuming that everyone behaves like the middle class.

And finally: “quarterly”? Maybe that’s just rushed copyediting…

September 9, 2008

Grading skoolboy

spiffboy2.jpg
What bloggers need, Michael Bloomberg prophesied last year, is a "wake-up call." Joel Klein agreed: "If you're not making progress, if your [posts] are not moving forward, then I don't think the [blog] is doing well." Jim Liebman couldn't have agreed more: "When you say, we’re going to hold you to the best that other [blogs] like you can do, all of a sudden, [there are] no more excuses."

skoolboy, as you all already know, is that pesky curvebreaker in your calculus class. An A+ for you, skoolboy, and a hearty thanks for relieving me from blogging for my conference/vacation.

And I'm a huge fan of skoolboy's report card contest - don't forget to enter! If you are lazy, you can just hit the diagonals (i.e. what percent of schools that received As will still receive As, what percent of schools that received Bs will still receive Bs, etc). As for prizes, I am still working on it. A pony? The right to choose Joel Klein's costume in this year's Halloween Parade?

Got ideas? Let me know.

September 7, 2008

Predicting the Near Future*

question_marks.jpg

Sometime soon, with great fanfare, the New York City Department of Education will release this year’s School Progress Reports. (Word on the street is that schools already know their grades.) The School Progress Reports, for better or worse, are the centerpiece of the NYC accountability system. (skoolboy thinks for worse, but more on that later.)

The DOE has made a number of changes to the Progress Reports for this second iteration, and I think that eduwonkette had something to do with that (as did other critics and analysts outside of the Tweed inner circle). We can expect to see separate letter grades for the three major dimensions on which the Progress Reports are based: school environment (including attendance, and parent, teacher and student surveys), student performance, and student progress. But the overall format appears to be unchanged: most of the grade is based on student progress on test scores, and such gains are not very reliable from one year to the next. There is, in skoolboy’s opinion, a false sense of precision conveyed by these letter grades, as they are based on components that are measured with error, but that measurement error is not reflected in how the grades are calculated. And I’m particularly annoyed at the misuse of social surveys for accountability purposes.

Nevertheless, the DOE is marching onward, and we’ll have this year’s grades to pore over in the near future. (And you can bet that eduwonkette will put on the green eyeshade for this, even though it clashes with her cape and mask.) How many schools will improve their grade from last year to this year? How many will fall? It’s time to make some predictions. What do you think, readers?

Here's a five-by-five table designed to show how this year’s grades are associated with last year’s grade. Each column represents last year’s grade, and each row represents a possible outcome for this year. The column percentages will add up to 100%. Try to fill in the blanks: What percentage of the schools that received A’s last year will receive an A this year? What percentage of A’s will decline to B’s? What fraction will fall further to C’s, D’s, and F’s? At the other end of the spectrum, what percentage of last year’s F’s will remain F’s? What percentage will climb out of the cellar to obtain a D? Will any make the leap from F to A?

crosstab.JPG

As a reminder, last year, about 23% of schools received an A; 38% received a B; 26% received a C; 8% received a D; and 4% (i.e., 53 schools) received an F.
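
For readers who want to check their entries before submitting, here is a minimal Python sketch of the table's arithmetic. Only last year's grade shares come from the post; the transition percentages are entirely hypothetical placeholders (not predictions, and not DOE data), and the function names are made up for illustration. The sketch checks that every column sums to 100%, then works out what share of schools would get each grade this year and what share would keep the same grade.

```python
GRADES = ["A", "B", "C", "D", "F"]

# Last year's share of schools by grade, from the post.
# These sum to 99 rather than 100 because the published figures are rounded.
last_year = {"A": 23, "B": 38, "C": 26, "D": 8, "F": 4}

# HYPOTHETICAL column percentages, for illustration only.
# transition[last][this] = % of schools graded `last` a year ago
# that receive `this` now. Each column must sum to 100.
transition = {
    "A": {"A": 60, "B": 30, "C": 8, "D": 2, "F": 0},
    "B": {"A": 25, "B": 50, "C": 20, "D": 4, "F": 1},
    "C": {"A": 10, "B": 35, "C": 40, "D": 10, "F": 5},
    "D": {"A": 5, "B": 25, "C": 40, "D": 20, "F": 10},
    "F": {"A": 5, "B": 20, "C": 35, "D": 25, "F": 15},
}


def check_columns(table):
    """Raise ValueError if any column of percentages does not sum to 100."""
    for last, column in table.items():
        total = sum(column.values())
        if total != 100:
            raise ValueError(f"Column {last} sums to {total}, not 100")


def implied_distribution(shares, table):
    """Share of all schools receiving each grade this year."""
    return {
        this: sum(shares[last] * table[last][this] for last in GRADES) / 100
        for this in GRADES
    }


def unchanged_share(shares, table):
    """Share of all schools on the diagonal (same grade both years)."""
    return sum(shares[g] * table[g][g] for g in GRADES) / 100


check_columns(transition)
print(implied_distribution(last_year, transition))
print(unchanged_share(last_year, transition))
```

Swapping in your own 25 guesses for the placeholder values gives a quick sanity check that the columns add up and that the implied distribution of this year's grades looks plausible.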

A caveat: The DOE knows that the legitimacy of the School Progress Reports depends on the grades not being too volatile from year to year. If 75% of last year’s A’s became F’s this year, no one would take this scheme seriously. (And if schools that everyone views as exemplary or high-performing got middling grades, this too would call the scheme’s legitimacy into question. So don't expect Stuyvesant High School to get a C.) There may not be very much fluctuation from last year to this. You can be sure that the DOE has constructed this year’s scores so that there’s not too much instability from last year to this year.

But since we believe in incentives on this blog, the reader who comes closest to the actual association between last year and this year shall receive a prize to be selected by eduwonkette—and we know how creative she can be. Be sure to fill in all 25 blanks.

*Employees of Tweed Courthouse, KPMG Consulting, and the Parthenon Group are ineligible for this contest.

September 5, 2008

COWAbunga! Post-Convention Edition

cowabunga-award.jpg

No, there's no convention commentary here (or else skoolboy would have to shoot himself). This week’s “Comment of the Week Award,” also known as the COWAbunga Award, goes to NYC Educator, for a comment on yesterday’s Coffee Talk question about which big-city school district is the worst-managed. NYC Educator wrote:

I see the system in which I work on a daily basis, and I don't always see its reality reflected in the press--although they've made great strides over the last few years.
Really, when you're a teacher and you find blatantly preposterous statements in the NY Times, you have to wonder about the reporting from other cities. Who knows whether or not they're telling the truth, or whether they've sent anyone to find out what was really happening. Certainly it's easier to just ask City Hall what's going on and write whatever they tell you.

Big-city school districts are notorious for turning inward—transparency has never been their strong suit. A vigorous press is one of the ways that those in charge of these districts can be held to account for their responsibilities as public servants. This is one of the reasons why yesterday’s announcement that the New York Sun may be folding at the end of the month was so disappointing. skoolboy didn’t often agree with the editorial pages of the Sun, but I always felt better knowing that there was a venue for opinions different from mine to be aired and debated.

Even more importantly, though, the shutdown of the Sun would mean less daily beat reporting on New York City schools. eduwonkette has said repeatedly, and I agree wholeheartedly, that Sun reporter Elizabeth Green has been breaking important stories since she arrived on the scene last year, and it would be a shame if those of us with a stake in New York City schools were to be deprived of her investigative skills. (And yes, she wrote a feature on eduwonkette, and I’ve assisted her in a story or two, but the quality of her work speaks for itself.) Alexander Russo over at This Week in Education has also lamented the recent transitions of a number of well-regarded education writers to new positions that remove them from day-to-day beat reporting. Really, is it possible to have too much high-quality reporting on public education? Maybe … but we have a long way to go before that’s a serious question to consider.

In the meantime, the gap between the person-power devoted by school systems to transmitting messages about public schools to the public and the person-power available in an independent press to interpret these messages in a critical and thoughtful way for the public continues to widen. This, in skoolboy's view, does not serve the public interest.

skoolboy Throws Down the Class Size Gauntlet

moneymouth.JPG

Long-time followers of skoolboy (hi, Mom!) know that his first posts on eduwonkette’s blog were about class size. I argued for championing class size reduction as the right thing to do for children and for teachers—an argument grounded in the moral content of public schooling more so than in the technical consequences of class size reduction for standardized test scores.

Over the past year, I’ve observed a number of trends in the operation of big-city school districts. I’ll use New York City as my key example, because it’s my hometown, but the issues are sufficiently general to warrant posting here.

First, large districts are increasingly trying out innovative policies and practices for which there is little or no pre-existing research support. In New York City, the issuing of school report cards and conduct of school quality reviews are high-stakes evaluative practices for which there’s no prior evidence showing beneficial outcomes. In Washington, DC and New York City, school officials are offering incentives in the form of cash and cellphones to students in exchange for meeting academic performance targets. Some of these innovations have evaluations built into their design, whereas others do not.

Second, the arguments in support of these innovations often rely on claims that other innovations have not been successful. The best example is the juxtaposition of teacher quality and class size reduction. All kinds of policies regarding teachers—value-added assessment, merit pay, new recruitment strategies—are being justified on the grounds that teacher quality has much larger consequences for student achievement (read: test scores) than other policy choices, such as class size reduction.

Third, a lot of the claims about these effects take the form of “Research shows…”, which eduwonkette has derided as glib and poorly documented. There are, of course, important studies of both teacher quality effects and class size effects on student outcomes, but different studies yield different estimates of the magnitude of these effects. In part, this is because the impact of a particular innovative policy or practice is contingent on how the policy or practice is implemented and the features of the local organizational and institutional context for the new intervention. (We might expect, for example, that class size reduction would have different effects in classrooms with novice teachers than in classrooms with experienced teachers, or in classes that differ in the amount of prior student misbehavior.)

So when a policymaker confidently says that we should prefer innovations designed to influence teacher quality rather than class size reduction in a particular local setting—say, New York City—what’s the evidence for such a claim? Specifically, what does research tell us about the consequences of a well-designed class size reduction intervention in New York City?

The answer is, we don’t know—because there has never been a carefully-controlled study of class size reduction in New York City.

So at this point, skoolboy throws down the gauntlet: If we’re serious about data-driven decision-making, we should put our money where our mouth is, and demonstrate the relative effectiveness of class-size reduction and other policy initiatives. I call on the New York City Department of Education to carry out a well-designed study—ideally, a randomized experiment—of class size reduction in New York City public schools. View it as a small-scale pilot, as is true for some of the other initiatives, such as the student incentive plans, and look for some private funding (if it’s not feasible to draw on the operating budget). It will not be hard to pull together some of the leading researchers on class size to inform the design (and it wouldn’t kill anybody to have a couple of knowledgeable parents and teachers at the table too.) There's nearly a full year to get this off the ground for the start of the 2009-10 school year.

skoolboy is willing to live with the findings of a well-designed and well-implemented study of class size reduction in New York City, whether they support or refute claims about the efficacy of class size reduction. What I cannot support are claims that “research shows” that teacher quality is more important than class size reduction for student outcomes in New York City—or any other local education setting—in the absence of research that actually does show this.

September 4, 2008

Talk amongst Yourselves

Linda_Richman.jpg

skoolboy was having a spirited discussion with some of his students the other night, who have taught in school systems such as New York City, Detroit, LA, New Orleans, Washington, DC, Newark, Oakland, and elsewhere. The topic of the day: what's the worst-managed big-city school system--and why? Readers, what do you think? Discuss.

September 3, 2008

COWabungle

cowabunga-award.jpg

skoolboy has been worrying about how he was going to make this week's COWabunga award. There haven't been any comments to his posts! Hard to believe that such witty and incisive remarks would draw nary a "well done!" or "you're full of it, skoolboy!" Turns out that the website woes that Ed Week has endured the past few days include a disabling of the comment features here. The good people at Ed Week are now aware of this, and I look forward to hearing what readers have to say when the problem is resolved. It's not the first time that technology has kicked skoolboy in the butt, and I'm sure it won't be the last.

If you can't wait for the site to get fixed to get something off your chest, feel free to e-mail me at skoolboy2 (at) gmail.com.

The Chicago Boycott: Publicity Stunt or Principled Protest?

Yesterday, State Senator Rev. James Meeks engineered a boycott of the Chicago Public Schools, urging CPS students to travel with him to high-spending districts in Chicago’s suburban North Shore to try to register for school. The objective of the protest was to draw attention to inequalities in school funding in Illinois. Rev. Meeks sought to contrast the Chicago Public Schools, which annually spends a bit over $10,000 per student, with New Trier High School, which spends in the neighborhood of $18,000 per student. Publicity stunt, or principled protest?

Probably a bit of both, in skoolboy’s view. Illinois still relies heavily on local property taxes to fund its schools, and the variability in income and wealth across school districts means that different districts have differing capacities to raise money to support the schooling of the children who reside in them. State and federal funds are supposed to compensate for these inequalities, and they do, but not completely. The available evidence suggests that total per-pupil spending on students in the wealthiest 20% of school districts in Illinois is considerably higher than total per-pupil spending on students in the poorest 20% of school districts—a difference on the order of $2,500 per pupil per year.

The chart below shows these dynamics. skoolboy divided Illinois’ school districts into national deciles based on the median family income of the district in 2000. Districts with a median family income of $30,000 or lower were in the lowest decile, whereas those with a median family income of $66,000 or higher were in the highest decile. I looked at three different revenue streams: per-pupil local revenues; per-pupil state revenues; and per-pupil federal revenues. The sum of these three is reported as total per-pupil revenues. (I use revenues because they’re reported by source in the federal data, and expenditures are not. The data are also weighted by the number of students enrolled in each district, so smaller districts count less than larger ones. I also excluded districts in which the total per-pupil revenues exceeded $40,000 per year. The story is pretty much the same whether one looks at median family income or the percentage of children living in poverty within a district.)

Illinois.JPG


You can see just how strongly district median income and local per-pupil revenues are correlated in Illinois (r=.68). It’s also clear that state and federal funds flow disproportionately to lower-income districts. But when the three funding streams are added together, there is a moderate positive correlation (r=.38) between a district’s median family income and its total per-pupil revenue. Although federal and state revenues do help to close the gap between wealthy and poor districts in Illinois, the remaining inequalities in spending are not trivial.
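
For the curious, here is a rough Python sketch of what an enrollment-weighted correlation looks like. It assumes the weighting was a standard weighted Pearson correlation, which the post does not spell out, and the five districts below are invented for illustration; they are not the federal data and will not reproduce the r values reported above.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y, with each observation weighted by w."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    cov = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y)) / total_w
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total_w
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total_w
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL districts, for illustration only:
# (median family income, total per-pupil revenue, enrollment)
districts = [
    (30_000, 9_500, 400_000),
    (45_000, 9_000, 12_000),
    (60_000, 10_500, 8_000),
    (90_000, 14_000, 3_000),
    (120_000, 21_000, 4_000),
]

income = [d[0] for d in districts]
revenue = [d[1] for d in districts]
enrollment = [d[2] for d in districts]

# Weighting by enrollment makes large districts count for more than small ones.
print(weighted_pearson(income, revenue, enrollment))
# Equal weights give the ordinary, unweighted correlation for comparison.
print(weighted_pearson(income, revenue, [1] * len(districts)))
```

The point of the weighting is that the result describes the experience of the typical student rather than the typical district, so one very large district can move the estimate a good deal.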

Having said that, a comparison between the Chicago Public Schools and New Trier is fundamentally misleading. By skoolboy’s calculations, the average total per-pupil revenue in New Trier in 2006 was nearly $22,000, which is way, way above the average total per-pupil revenues for the 113 Illinois districts in the top national income decile ($11,400). Moreover, CPS is in the 5th median income decile, not one of the lowest, and its total per-pupil revenues are a tad above the average for the 87 Illinois districts in that decile.

Not all states show this pattern; some have been more successful in reducing the association between a school district’s total per-pupil spending and the characteristics of the students in that district. (For example, like Ken DeRosa, I also find no correlation between per-pupil revenues and the percentage of children in poverty among Pennsylvania school districts. However, in Pennsylvania, as in Illinois, districts with higher median family incomes do spend more than those with lower median family incomes.) How schools are funded has a lot to do with the inequalities across districts, but funding formulae don’t change easily. Don't expect high-spending districts to be happy with policies that ask them either to spend less or to subsidize the spending on children in other districts.

September 2, 2008

A Brief Word on Nomenclature

spiffboy2-thumb.jpg

Even though eduwonkette and skoolboy have been unmasked, skoolboy plans to continue to refer to himself in the third person. Why? If I did it at school, my students would laugh me out of the classroom. If I did it at home, my wife would kick my butt. So let me (er, skoolboy) have some fun, OK?

And for the record: both skoolboy and eduwonkette are lower case. Only proper nouns warrant capitalization, and it should be clear by now that skoolboy isn't very proper.

Back to (Home) School

hs4.jpg

It’s back to school! Today, more than one million schoolchildren will get up from the breakfast table, strap on a backpack, and trundle off to … the living room. Home schooling has been expanding rapidly over the course of this decade, according to data from the National Center for Education Statistics; homeschooled children made up approximately 2.2% of the student population in 2003. (The NCES definition of home schooling is children who are schooled at home instead of in a public or private school for at least part of their education, and whose part-time enrollment in public or private schools does not exceed 25 hours per week.) skoolboy hoped to be able to report some new evidence from the Parent and Family Involvement (PFI) module of the National Household Education Survey’s 2007 sample, but those data have not yet been released. Unfortunately, that means that the best available information is from 2003, the prior wave gathering information on the incidence of home schooling. Moreover, only 239 homeschooled children were included in the PFI module of the 2003 NHES, and thus our knowledge about their characteristics isn’t very precise.
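
To give a rough sense of how imprecise a sample of 239 is, here is a small Python sketch of the textbook 95% margin of error for a proportion. It assumes simple random sampling, which is a simplification: the NHES uses a complex sample design, so the true uncertainty is likely somewhat larger. The function name and the example proportions are illustrative only.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from
    n cases, ASSUMING simple random sampling (a simplification here)."""
    return z * math.sqrt(p * (1 - p) / n)


n_homeschooled = 239  # homeschooled children in the 2003 PFI module, per the post

# Illustrative proportions only; these are not NHES estimates.
for p in (0.1, 0.25, 0.5):
    moe = margin_of_error(p, n_homeschooled)
    print(f"p = {p:.2f}: plus or minus {100 * moe:.1f} percentage points")
```

Under these assumptions, an estimate near 50% carries a margin of error of roughly six percentage points either way, which is wide enough to blur many comparisons between homeschooled children and everyone else.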

There are a lot of misconceptions about home schooling, such as the beliefs that homeschooled children lack normal social graces due to isolation from peers, or that they’re all very well-prepared for college. skoolboy has seen no persuasive evidence of any problems of social adjustment among homeschooled children. The reality is that most homeschooled children and youth are not isolated from others; they often participate in homeschooling networks, may participate in extracurricular activities sponsored by public and private schools, and, for a significant fraction, are part of religious communities that provide opportunities for interaction with peers and adults. Homeschooled children and youth probably have fewer opportunities to interact with other youth with differing social characteristics than do students who attend public school; but you don't need to be a homeschooler to select yourself into settings where you engage almost exclusively with other people who are like you.

It’s challenging to assess the impact of home schooling on children who are home schooled, because families self-select into home schooling, and the kinds of families that choose to home school differ, on average, from those who do not. (And don’t hold your breath waiting for the definitive randomized experiment!) Homeschooled children are more likely to be white than Black or Hispanic; to be in a household with three or more children than one with fewer children; to live in a two-parent household with one parent in the labor force than in another configuration; and to have college-educated parents.

One of the most interesting features of home schooling, from skoolboy’s view, is its implications for defining teaching as a profession. For the most part, parents who home school their children are subject to very little oversight by the state. Contrast this with the rules for licensing teachers who teach in the public schools. Although eduwonkette pooh-poohs my “1950’s” thinking about what defines a profession in the sociological sense, I think she would agree that the fact that the state will allow parents with no formal training, and who are not accountable to other teachers for what they do, to teach weakens the case for teaching as a profession.

In February, a California appeals court held that parents can be prosecuted for failing to ensure that either (a) their children attend a full-time public or private day school, or (b) their children are instructed by a tutor who holds a state credential for the child’s grade level. The case alarmed home schoolers and their supporters across the country. On appeal, that same court ruled last month that “(1) California statutes permit home schooling as a species of private school education; and (2) the statutory permission to home school may constitutionally be overridden in order to protect the safety of a child who has been declared dependent.” The court made clear that it was not taking a stand on whether or not home schooling should be allowed, and blamed the California legislature for a lack of clear legislation on the issue. What counts as a threat to the safety of a dependent child is not inscribed in the law, but physical and sexual abuse (which were alleged in the California case) surely count; skoolboy’s guess is that mediocre instruction would not. (If it did, there’d be an awful lot of usual suspects to round up!)

August 28, 2008

skoolboy Peeks out of the Closet

00380018cropped.JPG

spiffboy2-thumb.jpg

Now that eduwonkette has revealed herself as Columbia doctoral student Jennifer Jennings, skoolboy is gingerly sticking his head out of the closet and looking around. (If I see my shadow, I may go back inside for another six weeks.) skoolboy is Aaron Pallas, a Professor of Sociology and Education at Teachers College, Columbia University. I study inequalities that are created and perpetuated by the ways schools sort and select children and youth, and the role that education plays in individuals’ adult lives. Recently, I went on the record in the New York Sun on a topic near and dear to eduwonkette’s heart: the failure of New York City to make substantial progress in reducing the achievement gap among different racial and ethnic groups.

What’s my relationship to eduwonkette? She took a couple of courses with me, and I’m on her dissertation committee. (Her dissertation contrasts the consequences of accountability systems in education and medicine. A provocative entry into the topic is her Ed Week commentary, under her own name, here.) More importantly, though, we’ve been collaborating on a series of studies that look at the mechanisms by which some New York City schools garner more resources than others. All of the qualities that make her blog compelling and so much fun are just as evident in her approach to academic research.

eduwonkette said the other day that she stands behind everything she wrote under the pseudonym. I do too, on substance, but I’m not as sure about tone. I think the conventions of blogging, especially anonymously, allow for shooting from the hip quite easily, and my usual writing is more painstaking. (More long-winded than my interminable dreary posts? Yep.) Also, I think that sometimes I try to emulate eduwonkette’s style, which is appealing—and she is expert at it—but I’m not skilled enough to pull it off. So if I’ve offended anyone through my tone, either in the past or in the future, my apologies.

Finally, unlike eduwonkette, I did become an academic to talk to five guys in a room with transparencies. Only now, we use PowerPoint.

And I’m so glad it’s no longer just guys.

Apologies to Ed Week: earlier, I said that only subscribers could get to eduwonkette's Ed Week article on accountability and risk adjustment in education and medicine. You can get to it through the link above. eduwonkette's better at the technology than I am, too. And while I have the floor: Thanks, Ed Week, for giving eduwonkette the space to create such an interesting forum for discussions of education research and policy.

August 26, 2008

Cool People You Should Know: Amy Ellen Schwartz

Yesterday, in eduwonkette’s bombshell revelation that she is Jennifer Jennings, a Columbia doctoral student in sociology, she explained that the timing was influenced by the fact that there was potentially damaging misinformation about her identity swirling in the blogosphere and beyond. Many people thought that eduwonkette was Amy Ellen Schwartz. Who is this Amy Ellen Schwartz? Why, she’s a cool person you should know.

Amy is the Director of the Institute for Education and Social Policy at NYU, and a Professor of Public Policy and of Education and Economics appointed both in NYU’s Wagner Graduate School of Public Service and the Steinhardt School of Culture, Education and Human Development. She’s also the President of the American Educational Finance Association, which makes her a wonk among wonks. Amy’s a New Yorker through-and-through, and through her analyses of administrative data gathered by the New York City Department of Education, she's made important contributions to our understanding of how New York City schools serve immigrant children; strategies for measuring school performance and efficiency; and racial/ethnic differences in students’ test scores. And that’s just a sampling of her work in education; she also writes on public finance and housing.

About two weeks ago, the Census Bureau reported that the U.S. is projected to become a “majority-minority” country by 2042. New York City passed that threshold a long time ago, and few people are aware of the actual racial/ethnic make-up of the New York City public schools. About 40% of the children in the system are Hispanic; 30% are Black; 15% are Asian; and just 15% are white. At the elementary and middle-school level, one in six children was born in another country; and in a city as large and diverse as New York City, these children hail from more than 180 countries. High schools for newcomers can serve students from as many as 50 different countries.

Over the past several years, Schwartz, her long-time collaborator Leanna Stiefel (who is also cool, but two people wouldn’t fit on the card), and their colleagues have sought to understand the experience of immigrant students in New York City elementary and middle schools. Two pieces of good news are that immigrant students in New York City are not, for the most part, isolated from native-born students, and that immigrant students typically attend schools that receive their fair share of school resources, largely because immigrant children are more likely to be English language learners and to live in poverty than their native-born peers. Moreover, their analyses suggest that foreign-born students perform better than similar native-born students on reading and math tests, have better attendance, and are less likely to participate in part-time special education.

There is not, of course, just one immigrant experience in New York; the resources that families bring with them, and the contexts of reception they encounter when they arrive, differ across regions and countries. Moreover, what Amy and her colleagues have learned about immigrant elementary and middle school students may not apply to the experiences of immigrant high school students, and extending their analyses in this direction is definitely on their agenda.

The ways that Amy Ellen Schwartz and her colleagues have used administrative data to address fundamental questions about the performance of the New York City public schools have been a model for our masked marvel eduwonkette, and for education researchers across the country. And get this, David Cantor: an eduwonkette post on New York City that isn’t discouraging!

August 22, 2008

skoolboy Goes to the Olympics, IV: Differences across Schools

skoolboy’s jaunt to the Olympics concludes today with an examination of how much going to one school versus another matters for students’ achievement in different countries. The basic approach is to look at the average achievement in a sample of schools within a country, and to see how much those averages differ from one another. If students were randomly distributed across schools in a country, and each school had similar resources, we might expect to see relatively similar average achievement across schools, and we might conclude that which school a student attends in that country doesn’t matter that much. On the other hand, if some schools in a country enroll poor students and others enroll wealthy students, and the schools serving poor students have fewer social, cultural, and economic resources available to support student achievement than the schools serving wealthy students, we might expect to see large differences in achievement across schools, suggesting that which school a student goes to in such a country matters a lot.

Data such as these don’t tell us about school effects, because they confound two different processes: selection into a school, and what happens to students after they enter the school. The latter is what we usually think of as a school effect. School-to-school differences in achievement could represent either selection or impact; they could occur because some schools raise students’ achievement more than others, or because schools enroll students who are already achieving at very different levels, or because of some combination of the two. In contrast, school-to-school differences in the social and economic composition of who is enrolled are best interpreted as evidence of selection, because going to one school or another doesn’t typically affect a student’s family background.

Once again, I’m using data from the PISA 2006 assessments of science, reading and math, a sample of about 30 OECD countries and an additional 25 partner countries or economies. (For those playing along at home, the data are from Chapter 4 of the report PISA 2006: Science Competencies for Tomorrow’s World.)

The first figure below shows the proportion of the variation in individual student achievement in a country that is between schools; put differently, how much the average achievement in a school differs from one school to the next within a country. I’ve averaged the proportions for reading, math and science for each country (they’re very highly correlated with one another). This proportion can vary from 0% to 100%. Zero percent of the variance in achievement between schools would be observed if every school in the country had exactly the same average achievement, with some students in each school doing very well, and others doing poorly. It’s hard to picture what 100% of the variance between schools would look like, but imagine a ladder with many, many rungs that are pretty far apart, and each rung represents a particular school’s average achievement, with everybody in that school scoring right at the level of the rung. Some schools would have very high average achievement, and some would have very low average achievement, and there’d be no overlap among the schools: if you knew which school a student attended, you could predict that student’s performance perfectly.
[Figure: proportion of the variance in student achievement that is between schools, by country, PISA 2006]
Not surprisingly, the reality lies somewhere in between, and the figure shows that countries differ substantially from one another in how spread out achievement is across different schools. Fifteen countries, headed by Hungary, Slovenia, and Germany, have systems in which more than 50% of the variance in student achievement lies between schools. Conversely, Scandinavian countries have the most even distribution of student achievement across schools, headed by Finland, Iceland and Norway. In the U.S., 25% of the variance in the achievement of 15-year-olds is between schools, which is significantly lower than the proportion in 37 countries, and significantly higher than the proportion in a dozen countries.
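
For readers who want to see the arithmetic, the between-school share of variance described above can be sketched in a few lines of Python. This is a simplified sum-of-squares decomposition on made-up scores, not the weighted estimate that PISA reports; the function name `between_school_share` and the toy schools are assumptions for illustration only.

```python
from statistics import mean


def between_school_share(scores_by_school):
    """Return the share (0.0 to 1.0) of total score variance lying between schools.

    scores_by_school maps a school label to a list of student scores.
    """
    all_scores = [s for scores in scores_by_school.values() for s in scores]
    grand_mean = mean(all_scores)

    # Total variation: every student's squared distance from the overall mean.
    total_ss = sum((s - grand_mean) ** 2 for s in all_scores)
    if total_ss == 0:
        raise ValueError("all scores are identical; the share is undefined")

    # Between-school variation: each school's mean, weighted by its enrollment.
    between_ss = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in scores_by_school.values()
    )
    return between_ss / total_ss


# Hypothetical data: both schools have the same average, so the share is 0.
same_means = {"A": [400, 600], "B": [450, 550]}

# Hypothetical "ladder": everyone scores exactly at the school's rung, so the share is 1.
ladder = {"A": [400, 400], "B": [600, 600]}

# Hypothetical in-between case: half of the variation is between schools.
mixed = {"A": [400, 500], "B": [500, 600]}

for label, data in [("same means", same_means), ("ladder", ladder), ("mixed", mixed)]:
    print(label, between_school_share(data))
```

The two extreme cases correspond to the 0% and 100% scenarios in the text; real countries fall somewhere between them.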

The second figure shows the proportion of the variation in individual students’ socioeconomic background that is between schools—how much the school average socioeconomic status differs from one school to the next within a country, using the PISA index of economic, social and cultural status I described last week. If none of the variance in students’ socioeconomic status were between schools, we could say that students are randomly distributed across schools according to their socioeconomic backgrounds. If a great deal of the variance in students’ socioeconomic status is between schools, schools in that country are socially segregated from one another.
[Figure: proportion of the variance in students’ socioeconomic status that is between schools, by country, PISA 2006]
The U.S. is pretty much in the middle of the distribution of countries in terms of how spread out schools are from one another in their socioeconomic composition. In the U.S., 26% of the variance in individual student socioeconomic status is between schools, which is significantly lower than the proportion in 18 countries, and significantly higher than the proportion in 16 countries. The countries that have the most socially segregated schools are headed by Chile, Bulgaria, Thailand, and Hungary; those that have the least socially segregated schools are the Scandinavian countries of Finland, Norway, Sweden, Denmark and Iceland.

It’s likely no surprise to thoughtful readers that schools differ substantially in their achievement levels and social compositions in most countries, but what is intriguing is that this happens in spite of the fact that there are substantial differences across countries in how education systems are organized, with some systems centralized, and others decentralized; variability in the extent to which schools are run by the state or by private entities such as religious institutions; and differences in the extent to which the secondary schools in a country prepare students for particular vocational or postsecondary destinations. The U.S. is recognized as a large, decentralized system of schools that are mostly local. Residential segregation by race, ethnicity and economic status leads to neighborhood schools that are similarly segregated, as poor people live in different places than rich folks, and therefore generally attend different schools. Increasingly, we see in the U.S. more explicit processes by which students and schools mutually select one another, on the basis of economic status (in the case of private schools charging tuition, or high-spending suburban districts with high property taxes), or on the basis of prior academic achievement (in the case of schools with entrance exams or, as eduwonkette has shown repeatedly in New York City, in the ways that new small high schools enroll higher-performing students than the large, comprehensive high schools they’ve replaced). It is important to recognize that when a school is selecting on achievement, it’s also selecting on social class background, and vice versa, because achievement and family background are correlated.

A final caveat: The PISA data I’ve reported are at the country level, but this may not be the most meaningful geographic unit when it comes to the distribution of students across schools by socioeconomic background and achievement. What we see at the national level might not apply to geographic subunits such as states, counties, or large school districts.

August 12, 2008

skoolboy Goes to the Olympics, III: Socioeconomic Status

skoolboy doesn’t know who was the first to say that the true measure of a society is how it treats its weakest members, but it’s an appealing proposition. All societies have children and adults who vary in their economic, social and cultural status within the society. In virtually every modern society, the more advantaged, as a group, do better than those with lower status, although individuals can rise or fall in relation to their peers. Today’s visit to the Olympics looks at the relationship between a child’s socioeconomic status and proficiency in math and science across countries.

PISA 2006 created an index of Economic, Social and Cultural Status (ESCS), which is based on a parent’s occupational status (using a standard international scale); the highest level of a parent’s education, in years of schooling completed; an index of family wealth (e.g., number of computers, automobiles, and televisions; whether the child has own room); an index of home educational resources (e.g., a dictionary, a calculator); and an index of possessions in the home representing “classical” culture (e.g., classical literature, works of art). The index was standardized to have a mean of 0 and a standard deviation of 1 for OECD countries. Keep in mind, though, that PISA sampled youth currently enrolled in school as 15-year-olds in the participating countries, and in some countries (e.g., Mexico, Turkey) fewer than 60% of youth at this age are still in school. (In most OECD countries, more than 90% of this age cohort is still enrolled.) Out-of-school youth are, on average, of substantially lower socioeconomic status than youth still in school at age 15.
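
The standardization step mentioned above can be illustrated with a short Python sketch. The raw index values are invented and the helper name `standardize` is hypothetical; the real ESCS index is built from several components and survey weights that this sketch ignores. The point is only that scores are expressed relative to the mean and standard deviation of a reference group (here standing in for OECD students).

```python
from statistics import mean, pstdev


def standardize(raw_values, reference_values):
    """Rescale raw_values so the reference group has mean 0 and standard deviation 1."""
    ref_mean = mean(reference_values)
    ref_sd = pstdev(reference_values)  # population SD of the reference group
    return [(v - ref_mean) / ref_sd for v in raw_values]


# Hypothetical raw index values for a reference group of students.
reference_raw = [2.0, 4.0, 6.0, 8.0]

# Standardizing the reference group against itself gives mean 0 and SD 1.
reference_z = standardize(reference_raw, reference_raw)

# A hypothetical student from another country is placed on the same scale.
other_z = standardize([5.0], reference_raw)

print(reference_z)
print(other_z)
```

Because every country is placed on the reference group's scale, a value below zero means "below the reference-group average," whatever the country's own average happens to be.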

The figure below shows the percentage of 15-year-olds in each of the PISA countries and economies whose ESCS is in the bottom 15% of ESCS among students in OECD countries. Countries are arrayed from highest to lowest, and columns in red represent significantly higher percentages than the U.S. percentage of 11%; columns in blue represent significantly lower percentages than the U.S.; and grey columns are statistically indistinguishable from the U.S. Ten countries, headed by Thailand, Indonesia, Turkey and Tunisia, have more than 40% of the PISA participants in this low-ESCS category. Three countries—Norway, Iceland and Canada—have fewer than 5% in the low-ESCS category. Based on the ESCS scale, which is intended to be standardized across countries, there are many countries with higher concentrations of low-SES students than the U.S.

[Figure: percentage of 15-year-olds in the bottom 15% of the OECD ESCS distribution, by country, PISA 2006]

The next figure shows the correlation between ESCS and mathematics proficiency for each PISA country and economy. The correlation can range from -1.0 to +1.0, with 0 representing no correlation. A positive correlation indicates that students with higher ESCS score higher, on average, in math proficiency than students with lower ESCS. The presence of such a correlation is almost universal—only in Azerbaijan is there a realistic possibility of no correlation. Columns in red represent countries with a significantly higher correlation between ESCS and mathematics proficiency than the U.S. correlation, which is .42. Blue columns represent countries that have significantly lower correlations than the U.S., and grey columns are countries that are statistically indistinguishable from the U.S.

[Figure: correlation between ESCS and mathematics proficiency, by country, PISA 2006]

Only Chile and Hungary have significantly higher correlations than the U.S., whereas 28 countries have significantly lower correlations than the U.S. does. Along with Azerbaijan, Macao-China, Hong Kong-China, Canada, Iceland, Norway, Montenegro, and the Russian Federation all have a correlation between ESCS and mathematics proficiency that is less than .30.
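
As a rough illustration of the statistic being compared here, the following Python sketch computes a Pearson correlation between a socioeconomic index and math scores. The five students and their values are made up, and `pearson_r` is a hypothetical helper name; the published PISA correlations also use sampling weights and plausible values, which are left out.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Variation of each variable around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical students: a standardized socioeconomic index and a math score.
escs = [-1.0, -0.5, 0.0, 0.5, 1.0]
math_scores = [420, 480, 450, 520, 500]

r = pearson_r(escs, math_scores)
print(round(r, 2))
```

A value near 0 would mean that socioeconomic status tells us little about a student's math score; values closer to 1 mean the two move together more tightly.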

The “sweet spot” for schools, districts, and countries is a configuration in which average achievement is high, and the achievement gap between the more and less advantaged is low—a configuration that some would describe as both excellent and equitable. skoolboy’s summary based on the PISA data: the U.S. isn’t very sweet.

A topic for another day: What should the correlation between a student's socioeconomic status and his or her school achievement be? Is it possible that some degree of correlation between socioeconomic status and school achievement is appropriate? Or should we not rest until we've driven the correlation to zero?

August 11, 2008

skoolboy Goes to the Olympics, II: Gender

On Friday, eduwonkette wondered about how gender figured into my Olympics-inspired international comparison of high student literacy in math and science. Ask and you shall receive, e. Today I’m reporting data on the percentage of males and females in different countries and economies that are high achievers, and within-country differences in these percentages. On Friday, I was looking at the top 5% of students in each country. Today, I’m using the percentage of students in each country scoring at the highest level on the 2006 PISA science and math literacy scales. (Yeah, proficiency scores, but what can you do.) In science, there are six levels of proficiency, with 1.3% of students across the OECD countries scoring at Level 6. This is more selective than the top 5% in each country. But I should point out that PISA assesses the real-world application of math and science skills, and is not a narrowly-tailored test of particular math and science disciplines. Such tests might well yield different country rankings and gender differences.

Only five countries have a statistically significant difference in the percentage of males and females achieving the top level in science: Austria, Japan, the United Kingdom, Hong Kong-China, and Israel. In 37 other countries, including the U.S., the percentages of males and females at the top level in science are statistically indistinguishable. Among males, seven countries have a significantly higher percentage at the top level than the U.S. does, and 19 countries have a significantly lower percentage. 17 countries have a percentage of males at the top level that is indistinguishable from the U.S. percentage. (Finland and New Zealand are at the top of the international heap, with 4.6% and 4.4%, respectively, whereas 1.6% of U.S. males are at the top level.) Among females, only two countries (also Finland and New Zealand) have a significantly higher percentage scoring at the top level in science than does the U.S., whereas 19 countries have a percentage that is significantly lower. 20 countries are statistically indistinguishable from the U.S.’s percentage of 1.5% of females at the top level.

In math, about 3% of the students in OECD countries score at Level 6, the top level of mathematics proficiency. In 24 countries, the percentage of males at Level 6 is reliably higher than the percentage of females, and in no country does the percentage favor females. (In 22 countries, including the U.S., the percentages of males and females at Level 6 do not differ statistically.) 26 countries have a statistically greater percentage of males at Level 6 than the 1.5% of U.S. males who achieve this level, and only 9 countries are lower than the U.S., with 15 countries at about the same percentage. U.S. females don’t fare much better against their international peers. In 15 countries, the percentage of young women scoring at Level 6 exceeds the U.S. percentage of 1.0%, and in six countries the percentage at Level 6 is significantly lower than the U.S. percentage.

Most striking to skoolboy was a comparison of the math performance of females in other countries to that of U.S. males. In 15 countries, the percentage of females achieving Level 6 on the PISA mathematics assessment exceeds the percentage of U.S. males at Level 6. Chinese Taipei (which is kicking everybody’s butts), Hong Kong-China, Liechtenstein, and Korea all have at least four times as many females at Level 6 in math, proportionally, as the U.S. has males at Level 6.

What about reading? About 8% of students in OECD countries scored at Level 5, the top proficiency level, in 2006. In 35 countries, the percentage of females at Level 5 exceeded the percentage of males by a statistically significant amount. In 18 countries, the percentages for males and females were indistinguishable. Unfortunately, the U.S. is not included in this comparison, because we dropped the baton in the relay: a mistake in printing the reading test booklets invalidated the scores.

August 8, 2008

skoolboy Goes to the Olympics

skoolboy has always found Olympic medal counts by country to be silly. Sure, it's fine to take pride in the accomplishments of one's countrymen and countrywomen. But the Olympics for me are about appreciating excellence, regardless of the flag (or swoosh) on the uniform.

Ah, but student achievement! That's a horse race of a different color. We have a venerable tradition dating back at least to A Nation at Risk of comparing the academic achievement of U.S. schoolchildren to the performance of kids in other countries. The Olympics serves as a quadrennial site for seeing how we measure up to other countries.

Yesterday, eduwonkette decried former West Virginia governor Bob Wise's comparison of the relative performance of elite U.S. athletes against the world in the Olympics with the relative performance of average U.S. students against the world in high school graduation rates. Aren't our elite students doing just as well as those in other countries? she asked.

skoolboy doesn't have the performance of elite students cued up for comparison, but here are some data from the 2006 Programme for International Student Assessment (PISA), an international survey of 15-year-olds in 57 countries. The figures below show the performance achieved by students at the 95th percentile--that is, the top 5%--in each country. The countries are arrayed from lowest achievement to highest on the PISA assessments, with each column representing a country. Dark blue columns are countries scoring significantly higher than the U.S. Grey columns are statistically indistinguishable from U.S. performance, and bright red columns are countries doing worse than the U.S. The length of the column represents how far away a country is from the U.S. based on the standard deviation of individual scores around the world.

In mathematics, the performance of top U.S. students is dismal. In 28 countries, students at the 95th percentile score significantly higher than students at the 95th percentile in the U.S., and the gaps are surprisingly large. Students in Chinese Taipei, Korea, Hong Kong, Switzerland, Finland, Belgium, the Czech Republic and Liechtenstein all score at least .5 standard deviations above the U.S. in this comparison.
[Figure: mathematics performance at the 95th percentile, by country, relative to the U.S., PISA 2006]
Things look a little bit brighter in science achievement. Ten countries have students at the 95th percentile scoring higher than the U.S., and 35 countries have students at this level scoring significantly worse than U.S. students at the 95th percentile. Eleven countries are statistically indistinguishable from the U.S. Still, the best that we can claim is that the U.S. is tied for 11th internationally, although the magnitude of the gap between U.S. elite students and elite students in the top-ranked countries (e.g., Finland, New Zealand, the United Kingdom, Australia and Japan) is smaller in science than it is in math.
[Figure: science performance at the 95th percentile, by country, relative to the U.S., PISA 2006]

These comparisons don't address the performance of students entering the most selective of U.S. colleges and universities--the MITs, Caltechs, Harvards, Princetons and Yales. In a national cohort of 3 million 15-year-olds, the top 5% is 150,000 students, and the vast majority of these are not entering the most selective colleges. Still, the fact that the top 5% of U.S. students are getting their butts kicked in math and science is alarming to those who tie U.S. global competitiveness to the academic performance of American youth. Just as in sport, there are no quick fixes: a well-planned training regimen (including plenty of time in the academic weight room) is the key to success.

August 6, 2008

Cool People You Should Know: Suet-Ling Pong

Regular readers know that eduwonkette was an early endorser of the Broader, Bolder Approach to Education policy statement crafted by Sunny Ladd, Pedro Noguera, and Tom Payzant, and co-signed by some of skoolboy’s favorite scholars, policymakers and activists. The fundamental premise of the policy agenda is that efforts to advance students’ learning and development need to combine policies intended to improve schools with policies designed to transform the social and economic contexts in which children and youth develop. The approach is described as broader and bolder because it postulates that school improvement, which includes holding schools accountable for students’ learning and development, can’t do it alone. Rather, investments in communities, families and other social institutions that shape children’s lives outside of formal schooling are critical to moderating the powerful linkage between socioeconomic advantage and children’s learning and development.

The potential of this approach is illustrated through the research of Suet-Ling Pong, a cool person you should know. Dr. Pong is Professor of Education, Sociology and Demography at Penn State, where she serves as the Professor-in-Charge of the Educational Theory and Policy Program. (Some colleges and universities have program heads, or chairs, or coordinators. At Penn State, apparently, someone is actually in charge!) Over the past 15 years, she has pursued a program of research that has illuminated the mechanisms by which families, neighborhoods, and labor markets – important out-of-school contexts – shape students’ achievement in school.

Dr. Pong’s research strongly suggests that policies can weaken the links between a child’s social and economic background and her achievement. A key example is in the arena of family structure and family policy. In the U.S., we are accustomed to thinking of single-parent families, typically headed by women, as inherently disadvantaged. Female-headed families without another adult in the household struggle economically, and these mothers find it difficult to balance long hours at work and the time they spend with their children at home. As David Ellwood and Christopher Jencks point out, single-parent families are defined as an economic problem, a child development problem, and a moral problem; and the moral overtones have shaped American family policy.

Suet-Ling Pong and her colleagues have shown that there is nothing deterministic about the correlation between growing up in a single-parent family and children’s school achievement. Instead, she finds, the association between single-parenthood and children’s academic outcomes varies across countries. In the U.S., children growing up in single-parent families are comparatively worse off in their math and science achievement, relative to similar children in two-parent families, than is true in other countries, and some European countries have much smaller achievement gaps between single-parent and two-parent families than do others.

A country’s family policy environment is what makes the difference. Family policy takes many forms, including maternity and parental leave, child-care programming and subsidies, public after-school programs, and housing subsidies, to name a few. Countries which Pong and her colleagues describe as having strong family and welfare policies have smaller achievement gaps in math and science between children in single-parent and two-parent families than are found in other countries.

There’s no guarantee, of course, that policies that have helped to close gaps in other countries will have the same effect in the U.S. Policy-borrowing is a very delicate matter, and the successful enactment of a policy depends on many factors beyond the substance of the policy itself. Nor can we conclude that family and welfare policies are likely to eliminate the many disparities in academic outcomes observed in the U.S. Schools can’t do it alone—and neither can families and communities. But policies that unite these social institutions in a concerted effort have more potential to create progress than those that treat them in isolation from one another.

July 28, 2008

Cool People You Should Know: Jim Spillane

The current policy discourse about teachers and teaching in the U.S. emphasizes the recruitment and retention of “high-quality” teachers, defined either by the teachers’ credentials, or their value-added influence on students’ achievement, or both. It has not, in skoolboy’s view, paid sufficient attention to the ways in which the school serves as a context for teachers’ work, shaping the conditions under which a teacher might be more or less successful in advancing students’ learning. Teachers don’t teach in a vacuum; the ability of the leaders in a school to set a direction, secure resources, facilitate professional development, and build a culture for teachers to work in concert has a lot to do with whether a teacher can be successful. One of the implications of this perspective is that a teacher’s effectiveness may be contingent on the school context, which eduwonkette has pointed to as an issue that needs further research before we embrace value-added assessment as the last word on teacher effectiveness.

Jim Spillane, who studies school leadership, is a cool person you should know. He’s the Spencer T. and Ann W. Olin Professor of Learning and Organizational Change in the School of Education and Social Policy at Northwestern University. Over the past half-dozen years or so, he has led a series of research projects on distributed leadership and instructional improvement. A key principle of distributed leadership is the distinction between leadership roles and leadership practices. The “great man” theory of leadership is only reinforced by calls from business leaders, politicians and high-level school administrators for “strong” principal leadership. (“Strong” is always cast as better than “weak.”) Leadership, Spillane explains, is not limited to people who are formally designated as leaders. Rather, there are times when people other than the school principal perform key organizational functions, and the principal works with these others, who may include curriculum specialists and coaches, assistant principals, and of course teachers.

Spillane also emphasizes the importance of organizational routines to the practice of leadership. All organizations have a set of routines and rituals that guide the day-to-day work and interactions of teachers, students and administrators in schools. Leaders can purposively design organizational routines that might contribute to improved teaching and learning. A distributed leadership perspective is no panacea, he warns; but it can be a useful lens for making sense of the practice of leadership, and how schools can create organizational routines that allow a broader array of educators in schools to take on leadership responsibilities and develop as leaders.

The goals of schooling are too complex, and the technology for achieving those goals is currently too weak, to rely on a single person—no matter how talented—to be defined as the sole leader of a contemporary U.S. school.

July 9, 2008

The Rhetoric of Reform: Does Research Count?

“Better schools. Higher scores. And satisfied parents. That's the record of the D.C. Opportunity Scholarship Program.”

Thus begins Secretary of Education Margaret Spellings’ column in yesterday’s Washington Post. In this piece, she seeks to rally public support to renew the DC Opportunity Scholarship Program (OSP), which provides scholarships of up to $7,500 to use towards the costs of a participating private school, including tuition, fees, and transportation. The authorizing legislation stipulated that priority for scholarships was to be given first to students attending schools that were judged in need of improvement (SINI) under NCLB standards.

Last month, the Institute of Education Sciences, the research arm of the U.S. Department of Education (which Spellings heads), released the results of the congressionally mandated evaluation of the OSP, which reports impacts after two years. As the first federally-funded private school voucher program in the U.S., the OSP is a political football, and this evaluation report and its predecessors have been pored over by policy wonks across the land. The statute that authorized the OSP mandated that it be evaluated in terms of its impact on student test scores and school safety, as well as a more ambiguous criterion of “success,” which was operationalized in the study as parents’ and students’ satisfaction with their schools. The evaluation used a randomized controlled trial (RCT) to assess the impact of the OSP.

The executive summary of the report tells the tale, in unambiguous terms. (a) After two years, there was no effect of the OSP on reading or math test scores either for students who were offered a scholarship or those who actually used a scholarship. (b) If we look at 10 different subgroups of students—girls or boys, students attending SINI or non-SINI schools at the time of application, elementary or high school students, those from application cohort 1 or cohort 2, or students performing relatively higher or lower at the start of the study—there were no statistically significant effects of participating in the OSP on math for any subgroup, and for reading, three subgroups (students attending non-SINI schools at the time of application, relatively high-performing students, and students from cohort 1) might have done better than their nonparticipating peers. But even here, the evaluators caution that the statistical significance of these effects did not hold up when conventional adjustments for multiple comparisons were made. In other words, these subgroup effects might be due to chance, given how many comparisons were being made at the same time. Notably, the subgroup specifically identified in the legislation—students who had attended a SINI public school under NCLB—did not do better either in reading or math.
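The arithmetic behind that caution can be sketched in a few lines of Python. The p-values below are hypothetical (invented for illustration, not taken from the evaluation report), and the Bonferroni rule is used only as one familiar adjustment for multiple comparisons; the evaluators' own adjustment method may differ.

```python
# Hypothetical p-values for 10 subgroup comparisons (NOT from the OSP report).
ALPHA = 0.05
p_values = [0.02, 0.03, 0.04, 0.20, 0.35, 0.41, 0.55, 0.62, 0.78, 0.90]


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_significant(p_vals, alpha):
    """Flag each p-value against the Bonferroni threshold alpha / n."""
    threshold = alpha / len(p_vals)
    return [p < threshold for p in p_vals]


# Count subgroups that look significant before and after the adjustment.
unadjusted = sum(p < ALPHA for p in p_values)
adjusted = sum(bonferroni_significant(p_values, ALPHA))
fwer = familywise_error_rate(ALPHA, len(p_values))

print(f"Nominally significant subgroups: {unadjusted}")
print(f"Significant after Bonferroni:    {adjusted}")
print(f"Chance of at least one false positive in 10 tests: {fwer:.2f}")
```

With ten comparisons and no true effects at all, the chance of at least one "significant" result at the .05 level is roughly 40 percent if the tests are independent, which is why a handful of nominally significant subgroups is weak evidence.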

skoolboy isn’t crazy about using public funds to support private schools, but he’s a big supporter of using public funds to support the education of children in D.C., who historically have been among the lowest performers in the nation. Congress authorized this program, it’s survived legal scrutiny, and it’s deserving of a fair shake. But distorting the results of an evaluation doesn’t serve the public good. If Ms. Spellings wants to argue that the program should be renewed by Congress because parents are more satisfied with their child’s school, or because they are less likely to report serious concerns about school danger, she’s welcome to make that argument. Those are good outcomes, and some might argue that they’re ample justification for renewing the program. (Others might point out that students who received scholarships did not report higher levels of satisfaction with their school, or better school safety.) Or, alternatively, one could argue that the program needs more time to mature in order to be successful. But let’s not kid ourselves, Madam Secretary: the evidence on the academic success of the D.C. Opportunity Scholarship program, measured on your preferred metric (scores on standardized reading and math tests), is far too weak to make a persuasive case. Misrepresenting the evidence does honor to neither education research nor education policy.

July 4, 2008

Happy Independence Day!

Happy Independence Day! Today is an opportunity to reflect on the ideals and principles that founded this great country, and to renew our commitment to uphold and support them when we see signs of erosion and compromise.

What does it mean to be a citizen in the modern world? In the coming year, the International Association for the Evaluation of Educational Achievement (IEA) will be conducting the International Civic and Citizenship Education Study (ICCS), a study of eighth-graders’ knowledge about and attitudes towards civics and citizenship in 39 countries. Conspicuously missing from the list is the U.S.A. It’s disappointing that the National Center for Education Statistics is not supporting U.S. participation in the study.

The U.S. did participate in the IEA’s 1999 study of civic education among ninth-graders in 28 countries. Students were asked about fundamental concepts of democracy and citizenship that were not specific to the workings of particular governments, especially their attitudes and actions. An example of a content item was a multiple-choice item with the stem “In democratic countries what is the function of having more than one political party?” An example of a skills item was a multiple-choice item presenting a brief political advertisement and asking which group mentioned in the ad had probably issued it.

The U.S. did better than the international average on a test of civic knowledge (which combined civic content and civic skills), and led the world on civic skills. But before we pat ourselves on the back too much, the data also showed that civic knowledge, content and skills were distributed unequally across U.S. ninth-graders, with much higher levels among white and Asian youth than Black and Hispanic youth, and higher levels among ninth-graders with highly-educated parents than among students whose parents did not go very far through school. Black youth scored .85 to .90 standard deviations lower, and Hispanic youth about .70 standard deviations lower, than whites on civic knowledge and its components. Students with at least one parent who had only completed high school scored about .80 standard deviations lower on civic knowledge than students with at least one parent who had completed a bachelor’s degree.

It’s tempting to look at these gaps and infer that they simply reflect the large average differences in academic performance among racial/ethnic and social class groups observed among American youth more generally. But I don’t think that we can count on No Child Left Behind to increase the civic knowledge of our most disadvantaged youth. There’s something very pernicious about a system that fails to educate its most vulnerable members about the very institutions of democracy that were designed to enable them to become productive citizens.

eduwonkette will be back next week. Thanks for the opportunity to post, e.

July 2, 2008

Educational Testing: A Brief Glossary

While you’re waiting for Dan Koretz’ book on testing to arrive – I think eduwonkette and I should get some kind of consideration for shilling for this book so often here – here’s a brief skoolboy’s-eye view on testing. Actual psychometricians are welcome to correct what I have to say.

Tests are typically designed to compare the performance of students (whether as individuals, or as members of a group) either to an external standard for performance or to one another. Tests that compare students to an external standard are called criterion-referenced tests; those that compare students to one another are called norm-referenced tests. Even though criterion-referenced tests are intended to hold students’ performance up to an external standard, there is often a strong temptation to compare the performance of individual students and groups of students on such tests, as if they were norm-referenced.

A typical standardized test of academic performance will have a series of items to which students respond, generally either in a multiple-choice or constructed response format, which means that students are constructing a response to the item. There’s usually only one right answer to a multiple-choice item, whereas constructed-response items may be scored so that students get partial credit if they demonstrate partial mastery of the skill or competency that the item is intended to represent. For any test-taker, we can add up the number of right answers, plus the scores on the constructed-response items, to derive the student’s raw score on the test. A test with 45 multiple-choice items would have raw scores ranging from 0 to 45.

For individual test items, we can look at the proportion of test-takers who answered the item correctly, which is referred to as the item difficulty or p-value. This p-value has nothing to do with the p-values used in tests of statistical significance; it is simply the proportion (p) of examinees who got the item right. Some test items are more difficult than others, and hence items will have varying p-values.
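Here is a minimal sketch of raw scores and item p-values, using a tiny made-up set of scored responses (five students, four multiple-choice items). The data and function names are hypothetical, chosen only to make the two definitions concrete.

```python
# Hypothetical scored responses: one row per student, one column per
# multiple-choice item; 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]


def raw_scores(matrix):
    """Raw score for each student: the number of items answered correctly."""
    return [sum(row) for row in matrix]


def item_p_values(matrix):
    """Item p-value: the proportion of students who got each item right."""
    n_students = len(matrix)
    return [sum(column) / n_students for column in zip(*matrix)]


scores = raw_scores(responses)
p_vals = item_p_values(responses)

print("Raw scores by student:", scores)
print("p-value by item:      ", p_vals)
```

Note that a higher p-value means an easier item: in this toy data the third item, which only one student in five answered correctly, is the hardest.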

Raw scores are rarely interpretable, in part because they are a function of the difficulty of the items. For this reason, they are typically transformed into scale scores, which are designed to generate a score that will mean the same thing from one version of a test to the next, or from one year to the next. The scale for scale scores is arbitrary; the SAT is reported on a scale ranging from 200 to 800, whereas the NAEP scale ranges from 0 to 500.

The process of transforming raw scores into scale scores is computationally intensive, generally using a technique known as Item Response Theory (IRT), which simultaneously estimates the difficulty of an item, how well the item discriminates between high and low performers, and the performance of the examinee. An examinee who successfully answers highly difficult items that discriminate between high and low performers will be judged to have more ability, and hence a higher scale score, than an examinee who gets the difficult items wrong.
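To give a flavor of the idea, here is a small sketch of the item response function from the two-parameter logistic (2PL) model, one common IRT model. The choice of the 2PL model and all of the parameter values are assumptions made for illustration; real scaling estimates these parameters from response data, which this sketch does not attempt.

```python
import math


def prob_correct(ability, difficulty, discrimination):
    """2PL item response function: chance that an examinee of the given
    ability answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical items: an easy, weakly discriminating item and a hard,
# sharply discriminating one.
easy_item = {"difficulty": -1.0, "discrimination": 1.0}
hard_item = {"difficulty": 1.5, "discrimination": 2.0}

for ability in (-1.0, 0.0, 2.0):
    p_easy = prob_correct(ability, **easy_item)
    p_hard = prob_correct(ability, **hard_item)
    print(f"ability {ability:+.1f}: easy item {p_easy:.2f}, hard item {p_hard:.2f}")
```

When ability equals an item's difficulty, the chance of a correct answer is exactly one half, and a larger discrimination value makes the probability climb more steeply around that point.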

There’s no one right way to transform raw scores into scale scores, and it’s always a process of estimation, which is sometimes obscured by the fact that scores are reported as definite quantities. (A little skoolboy editorializing here…) The expansion of testing hastened by NCLB has placed a lot of pressure on states, and their testing contractors, to construct scale scores for a test that represent the same level of performance from one year to the next (a process known as test equating). Much of this is done under great time pressure, and shielded from public view. The process is complicated by the fact that states typically don’t want to release the actual test items they use, because then they can’t use them in subsequent assessments as anchor items that are common across different forms of a test, since students’ performance on such items could change due to practice. Some tests are vertically equated, which means that a given score on the fourth-grade version of a test represents the same level of performance as that same score on the fifth-grade version of the test. In a vertically-equated test, if the average scale score is the same for fourth-graders as it is for fifth-graders, we’d infer that the fifth-graders haven’t learned anything during fifth-grade.

Proficiency scores represent expert judgments about what level of scale score performance should describe a student as proficient or not proficient at the underlying skill or competency that the test is measuring. For example, NAEP defines three levels of proficiency for each subject at each of the grades tested (4th, 8th and 12th): basic, proficient, and advanced. Cut scores divide the scale scores into categories that represent these proficiency levels, with students classified as below basic, basic, proficient, or advanced. These proficiency scores do not distinguish variations in students’ performance within the category; one student could be really, really advanced and another just advanced, and whereas a scale score would record that difference, a proficiency score would simply classify both students as advanced. The fact that proficiency levels are determined by expert judgment, and not by the properties of the test itself, means that they are arbitrary; the level of performance designated as proficient on NAEP may not correspond to the level of performance designated as proficient on an NCLB-mandated state test. Many researchers (including Dan Koretz, eduwonkette, and me) are concerned that the focus on proficiency demanded by NCLB accountability policies has the unintended consequence of concentrating the attention of school leaders and practitioners on a narrow range of the test-score distribution, right around the cut score for the category of “proficient,” to the detriment of students who are either well below or well above that threshold. Such a focus is a political judgment, not a psychometric one, and there are arguments both for and against it.
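The way cut scores collapse a scale into categories can be sketched as follows. The cut scores here are hypothetical numbers on a 0-to-500 scale, not NAEP's actual cut points, and treating a score exactly at a cut as reaching the higher level is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical cut scores on a 0-500 scale (illustrative only; not NAEP's).
# Each value is the lowest scale score that earns the next level.
CUT_SCORES = [210, 250, 290]
LEVELS = ["below basic", "basic", "proficient", "advanced"]


def proficiency_level(scale_score):
    """Map a scale score to a proficiency category using the cut scores."""
    return LEVELS[bisect_right(CUT_SCORES, scale_score)]


for score in (205, 249, 250, 295, 340):
    print(score, "->", proficiency_level(score))
```

Students scoring 295 and 340 both come out as advanced, while students at 249 and 250 land in different categories despite a one-point difference. That is the sense in which proficiency scores discard information that scale scores retain.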

I'll update this as more knowledgeable readers weigh in. If experts in measurement were to judge proficiency thresholds for knowledge about testing, I'd probably be classified as basic; Dan Koretz is definitely advanced. For a lively and readable treatment of these kinds of issues, get his book!

July 1, 2008

An Immodest Proposal

This year’s statewide fourth-grade math exam administered in New York State -- the one with the remarkably high gains -- contained the following item:

“Janice bought a notebook for $3.75 and a pencil for $0.47. She gave the cashier $5.00. How much money did Janice receive in change?”

The item might have looked a little familiar to fourth-grade teachers. In 2007, a similar item appeared:

“Tony bought art supplies that cost $19.31. He gave $20.00 to the cashier. How much money did Tony receive in change?”

And in 2006, an item read:

“Mr. Marvin spent $54.10 on pants and shirts. He gave the cashier $60.00. How much money should Mr. Marvin receive in change?”

Other similarities abound. In 2008, an item read:

During the year, one thousand eight hundred four books were checked out of the school library. What is another way to write this number?

A. 184
B. 1,084
C. 1,804
D. 1,840

There was an uncanny resemblance to an item on the 2007 test:

The number of people who live in Goodwin Falls is three thousand nine hundred eight. What is another way to write the same number?

A. 398
B. 3,098
C. 3,908
D. 3,980

To be sure, the test-takers in 2008 still had to answer these questions correctly to get credit for them. But the similarity in item formats across the years gives some credence to concerns that scores are inflated.

Dan Koretz discusses the problem of score inflation in his excellent new book, Measuring Up: What Educational Testing Really Tells Us. One source of the problem, he explains, is that all tests sample the subject-matter domains that they are supposed to tap. If the same kind of item shows up repeatedly on the test from one year to the next, teachers and administrators can focus on this restricted set of test item types, and neglect other item types that are still part of the domain that the test is intended to represent.

The National Assessment of Educational Progress (NAEP) is sometimes referred to as the “gold standard” for standardized tests, and claims about test score inflation in a test, such as an NCLB-mandated state test, are often grounded in a discrepancy between NAEP and the other test in either the level of or the trend in performance. The characterization of NAEP as the “gold standard” reflects the fact that it is designed to measure a much larger sample of student performance in a domain than is the typical state test. No individual child takes all of the items in the NAEP item pool; instead, students complete test booklets with blocks of items. In the 2000 12th-grade mathematics NAEP, for example, students completed one of 26 different test booklets, each containing three 15-minute blocks out of a total of 13 different blocks of mathematics items. Each student was asked to complete about 40 items across the domains of number sense, properties, and operations; measurement; geometry and spatial sense; data analysis, statistics and probability; and algebra and functions.
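For the curious, here is a sketch of one way 13 blocks could be spread across 26 three-block booklets so that every block, and every pair of blocks, gets equal exposure. The construction (two starting triples shifted cyclically) is an illustrative assumption and is not claimed to be NAEP's actual booklet map.

```python
from collections import Counter
from itertools import combinations

N_BLOCKS = 13
# Illustrative starting triples of block numbers; not NAEP's actual layout.
BASE_TRIPLES = [(0, 1, 4), (0, 2, 7)]


def build_booklets():
    """Shift each starting triple through all 13 positions (mod 13)."""
    booklets = []
    for base in BASE_TRIPLES:
        for shift in range(N_BLOCKS):
            booklet = tuple(sorted((block + shift) % N_BLOCKS for block in base))
            booklets.append(booklet)
    return booklets


booklets = build_booklets()

# How often each block, and each pair of blocks, appears across booklets.
block_use = Counter(block for booklet in booklets for block in booklet)
pair_use = Counter(
    pair for booklet in booklets for pair in combinations(booklet, 2)
)

print("Number of booklets:", len(booklets))
print("Booklets per block:", sorted(set(block_use.values())))
print("Booklets per pair of blocks:", sorted(set(pair_use.values())))
```

In this layout each block lands in six booklets and each pair of blocks shares exactly one booklet, so no student sees more than three blocks, yet the full pool is covered evenly across the population.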

Overall, enough students respond to all of the items in the NAEP item pool to be able to measure how well the population of students in a state (or large urban district) is doing. But NAEP is not designed to yield scores for individual students, because no student responds to enough items to yield a reasonably precise measure of performance.

With tongue firmly in cheek, skoolboy offers the following solution to test score inflation: more testing. Imagine if students completed the entire pool of NAEP items (or some other broad pool of items assessing performance in a domain), instead of the relatively restricted sample of items used in most state-level testing programs. If students were assessed on a broad array of items tapping subject matter competence, teachers and administrators would not be able to concentrate their attentions on a subset of item types, and hence would not be able to artificially raise students’ scores relative to their true learning of the subject. Sure, the burden of testing would increase; we'd need to invest in better and more expensive tests; and increased testing wouldn't solve the incentive problems that high stakes create.

More testing. An idea whose time has come?

Nah.

June 30, 2008

Inspiration and Perspiration

Graduations are sacred events in American society. They mark an important transition, and graduates and their loved ones are justifiably proud of their accomplishments. For this reason, it’s a very tricky thing to comment on news stories connected to graduations. One doesn’t want to appear to be denigrating the achievements of the graduating students, many of whom have overcome substantial odds to obtain a diploma.

Over the past week, Joel Klein, Chancellor of the New York City Public Schools, has been making the rounds at the graduation ceremonies of some of the small high schools in NYC. Regular readers of this blog know that eduwonkette has been sharply critical of some of the “turnaround” myths constructed about these small schools, pointing out that they enrolled students who were better off academically than the students in the large high schools they replaced. At my urging, she held off on posting about the Chancellor’s e-mail to teachers about the graduation ceremonies at Bronx Lab School, one of the small schools which replaced the larger Evander Childs high school, about which she has posted repeatedly.

Jenny Medina files a story in today’s New York Times on the graduation at the Urban Assembly School for Law and Justice in Brooklyn. Much of the piece describes the extraordinary time and effort put in by the staff in order to achieve a graduation rate of 93% among the senior class. The principal, who is leaving for another position, describes herself as “exhausted,” and expresses concern that her staff cannot maintain the intensity required to do their jobs well.

“You are taking a bunch of hyper, type A perfectionist people and giving them a herculean task,” she said. “People have to work much too hard to do what we are doing. People cannot work at this level all their lives and nobody is prepared to do something at a level of mediocrity.”

Ms. Medina writes that the Chancellor “seemed unconcerned that so many of the teachers at small schools were working such long hours.”

“‘When people are part of the world of changing things for children, they don’t view it as work,’ he said, pointing to members of his own staff who log 14-hour days.”

An uncharitable critic (that would be me) might note that one of the reasons that the Chancellor’s staff must work 14-hour days is to clean up after his many missteps and mistakes. Such a critic might also point out that the average salary of the members of the Chancellor’s staff is $113,000, whereas the average salary among the teachers at the Urban Assembly school for FY 07 was $49,000.

But let’s take the Chancellor at his word. If you’re changing the world for kids, why would only 14 hours a day be enough? Why not 19 hours a day? Don’t the Chancellor and his staff really care about changing things for children?

We need to disrupt this ridiculous myth that expects superhuman effort from educators in order to achieve success for kids. Almost all of the teachers I know work very hard, and struggle to maintain a balance between their professional responsibilities to the children they teach and building and maintaining a life outside of their work. We don’t need cartoon-like superhero educators; we need a system that supports teachers to work hard and honestly at their craft, without the risk of burnout after a couple of years.

How Much Math Does a Teacher Need to Know to Teach Math?

I once asked a colleague if he’d read a particular book. “Read it?” he replied incredulously. “I haven’t even taught it!” A former college English professor, he came by the joke honestly. The first time I taught a course that I had never taken myself, I acknowledged the absurdity, at least to myself. I stayed about a week ahead of my students. Out-of-field teaching? Not exactly. I was teaching a course that was in my field, but outside of my immediate area of expertise. The teaching assignment was justified on the grounds that, as a Ph.D.-holder, I was deeply grounded in the core theoretical perspectives and research traditions in my discipline, and that I could therefore pick up the literature in a subfield quickly and accurately, and teach that literature competently. (At the time, no one was concerned with pedagogical content knowledge, the idea that there is practical knowledge of how to teach a subject that differs from mastery of the subject itself.)

Last week, the National Council on Teacher Quality released a report on the mathematics preparation of elementary school teachers who teach mathematics. The report indicts education schools for failing to select and prepare elementary teachers who have an adequate mastery of mathematics. Singling out algebra as a topic that is shortchanged in preparation programs, the authors offer a number of sensible recommendations for states, education schools, textbook publishers, and institutions of higher education.

The Teacher Education and Development Study in Mathematics (TEDS-M), a comparative study of how 18 countries, including the U.S., prepare mathematics teachers at the primary and lower secondary grades, is currently underway under the auspices of the International Association for the Evaluation of Educational Achievement. We’ll learn a great deal from this study that will complement the NCTQ recommendations.

It seems obvious that teachers must have knowledge of the subject matter they will actually teach. But how much more knowledge should a teacher have than what she or he is seeking to assist students in learning? The case of secondary school mathematics is instructive. Is it enough for a high school trigonometry teacher to know trigonometry cold – but not, say, real analysis, or ordinary differential equations?

In the US, many states have content specialty tests that prospective teachers must pass prior to assuming full-time teaching positions; presumably these tests tell us something about the mathematical content that states think is important for teachers to master. The four-hour Massachusetts test covers number sense and operations; patterns, relations, and algebra; geometry and measurement; data analysis, statistics, and probability; trigonometry, calculus, and discrete mathematics; and integration of knowledge and understanding. Approximately 23% of the test is devoted to patterns, relations, and algebra, and there are 100 multiple-choice items and two constructed-response items. From tests such as these, we can infer that some states do not demand that high school math teachers have an extensive understanding of the discipline of mathematics.

One of the reasons I was unhappy with much of the press reporting on the Urban Institute’s study of Teach for America teachers’ effects on end-of-course tests in Algebra I, Algebra II, and Geometry (among other subjects) in North Carolina is that it shifted the locus of policy discussion to whether to expand alternate routes to teacher certification, without addressing the more challenging questions about what knowledge about subject matter and about how to teach it is optimal for student learning in particular subjects in high school. The reality is that even if we could count on the incremental achievement observed in the Urban Institute study, lots of other countries would still be kicking our butts in international assessments of mathematics and other subjects. I think we’d be better off examining how these countries prepare secondary math teachers – and teachers in other subjects – to see if there are approaches that we can adapt to the U.S. context. One thing that we might learn is that other countries demand much higher levels of subject matter competence from their elementary and secondary school teachers than we do.

June 29, 2008

"Independence" Day

I’ll try to stay reasonably serious this week, but some things are just too ridiculous to pass up. On Friday, the New York City Department of Education (DOE) announced that it had selected the NYC Leadership Academy to provide principal training and development services. The press release proclaimed that the Leadership Academy was “chosen from among multiple bidders in a competitive procurement process.” The DOE is negotiating a five-year contract for a total of $50 million, beginning Tuesday, July 1.

Long-time followers of New York City public schooling are aware that the NYC Leadership Academy was created by the DOE in 2003, and Chancellor Joel Klein serves as a Director of the organization. (At least according to the organization’s IRS filings – its website doesn’t list him as a director.) The Leadership Academy website describes the Leadership Academy as “the centerpiece of the NYC Department of Education’s transformational strategy,” a phrase that also appears in DOE press releases, and the staff have e-mail addresses provided to employees of the DOE. The April press release announcing this extraordinary competitive procurement spent more time crowing about the Leadership Academy’s accomplishments than describing the request for proposals.

So: The DOE had a competitive bidding process to award a contract to an organization that Mayor Mike Bloomberg and Chancellor Joel Klein had created and publicly supported over the past five years. Remarkably, the report of the award indicated that there were three other bidders. I can only imagine who would seriously think they had a shot at this.

Probably the same people who think they have a shot at this. In related news, skoolboy, who has been happily married for many years, is announcing a competitive procurement for spousal services. The successful bidder will have experience attending to the needs of a partner like skoolboy. Prior joint ownership of property with skoolboy and collaborative experience raising a family a plus. The date of the bidder’s conference will be announced later.

May 23, 2008

skoolboy wonders: Could a Parrot Pass the New York State ELA Exam?

A few days ago, A Voice in the Wilderness broke the story that the retest for the New York State English Language Arts exam had a task that required students to write a position paper arguing that inexperienced people can provide leadership, after listening to a speech by Wendy Kopp, founder of Teach For America. Some were appalled by the one-sided nature of the task, likening it to propaganda. eduwonkette’s take was that the task would be more defensible if students were given information on both sides and then asked to choose a side to argue.

The scoring guide for the task is now available online, and it leads me in a different direction. I’m not close enough to high school English classrooms to know what a realistic level of competency is.

Here’s the task. Students were told that they would listen to a speech about young people who have become leaders in their communities. They were provided with the following situation:

Your leadership group has been debating whether leaders should have experience in their chosen fields. As part of this debate, you have decided to write a position paper in which you argue that inexperienced people can provide leadership. In preparation for your paper, listen to a speech by Wendy Kopp. Then use relevant information from the speech to write your position paper.

Students were instructed to be sure to: Tell your audience what they need to know about why inexperienced people can provide leadership; Use specific, accurate, and relevant information from the speech to support your argument; Use a tone and level of language appropriate for a position paper for members of your leadership group; Organize your ideas in a logical and coherent manner; Indicate any words taken directly from the speech by using quotation marks or by referring to the speaker; and Follow the conventions of standard written English.

The passage, reproduced below, is from Wendy Kopp’s commencement speech at the University of North Carolina in 2006.

Thinking back to my own senior year in college, I wasn’t intending to start something like Teach for America—or to start anything at all for that matter. As a college senior I was applying to two-year corporate training programs, seeking out political internships, and generally struggling in my search for something that I really wanted to do. My generation was dubbed the “Me Generation.” People thought all we wanted to do was focus on ourselves and make a lot of money. But that didn’t strike me as right. I felt as if thousands of us talented, driven graduating seniors were searching for a way to make a social impact but simply couldn’t find the opportunity to do so.

Well, during my senior fall, I helped organize a conference about education reform, where one of the topics was the shortage of qualified teachers in urban and rural communities. It was at that conference that I thought of an idea: Why doesn’t our country have a national teacher corps that recruits us to teach in low-income communities the same way we’re being recruited to work on Wall Street?

From that moment, I was possessed by this idea—I thought it would make a huge difference in kids’ lives, and that ultimately it could change the very consciousness of our country, by influencing the thinking and career paths of a generation of leaders.

So I did the obvious thing. I wrote a long and very passionate letter to the President of the United States suggesting he start this corps. That didn’t get very far—I received a job rejection letter in response. So in my undergraduate senior thesis, I declared that I would try to create such a corps myself, as a non-profit organization. When my thesis advisor looked at my budget, which showed that to recruit 500 new teachers into this corps during the first year would cost two-and-a-half million dollars, he asked me if I knew how hard it was to raise $2,500, let alone two-and-a-half million dollars. Aided by my inexperience, I was unphased by his question. When school district officials and potential funders laughed at the notion that the Me Generation would jump at the chance to teach in urban and rural communities, their concerns, too, went unheard.

That year 2,500 graduating seniors competed to enter Teach For America, in response to a grassroots recruitment campaign—flyers under doors since there was no email back then! And one year after I graduated, with two-and-a-half million dollars in hand from the corporate and foundation community, I was looking out on an auditorium full of 489 recent college graduates who had joined Teach For America’s first corps.

My very greatest asset in reaching this point was that I simply did not understand what was impossible. I would soon learn the value of experience, but Teach For America would not exist today were it not for my naivete.

I see this same phenomenon every day as I watch 23-year-olds walking into classrooms and setting goals for themselves and their students that most people believe to be entirely unrealistic. The conventional wisdom is that there is only so much schools can do to overcome the challenges of poverty and the lack of student motivation and parental involvement that is perceived to come with it. But then there’s Liam Honigsberg, a Teach For America corps member in Phoenix whom I met a couple of weeks ago. His school’s vice principal saw that he had a degree in cognitive neuroscience and, naturally, called him the day before school started to ask him to teach a math class wholly comprised of seniors who were in danger of not graduating because they had not been able to pass the math portion of the state’s exit exam. It was a daunting task. Liam’s students seemed to be entirely uninterested in math. Their performance levels ranged from not having passed algebra to not having passed geometry. But Liam determined that they could and would gain the skills to graduate. The Arizona Republic estimated last year that 5,000 students didn’t graduate in Arizona because they didn’t pass that exit exam, and yet thanks to Liam’s idealism, all of his students will walk across the stage this spring.

Just over 100 miles from here, Tammi Sutton and Caleb Dolan were teaching middle schoolers in Gaston County. Tammi and Caleb were just 25 years old when they decided that to truly ensure their students had the opportunities they deserved, they would have to actually go out and start a new school in their community—a school that would set their students up to go to college. This was a pretty radical idea in Gaston County and there were many skeptics. In spite of the many who said it could not be done, Tammi and Caleb designed a program with rigorous expectations that would run from 7:30 in the morning until 5 at night, on two Saturdays a month and three weeks during the summer.

There were many who said this could not be done. Yet now their 8th graders—students who came to them in 5th grade performing anywhere between the 1st to the 4th grade levels—are performing at a level that places their school among the state’s top 15 schools in reading, writing, and math.

Teach For America’s story, and Liam, Tammi and Caleb, show us that your inexperience is a real asset. I hope you will put it to good use.

Here’s the anchor paper, scored 4, the top score (on a scale from 1 to 4). Text verbatim.

As shown in Wendy Kopp’s speech, experience is not required to be a leader. I believe leaders can be anyone who has the drive and motivation to be seccessful in the task that is at hand. Experience is aquired through years of doing the same thing over and over again, leadership does not require that.

Wendy Kopp, the woman who stated Teach for America, was inexperienced when she started the program, yet she was very seccessful. She had the drive and motavation necessary to be a leader and never gave up. Many people believed her program would never be a success because her generation was dubbed the “me” generation. The “me” generation is a generation in which money and themselves are all that matter. However, peoples thoughts about how her program would never be a sucess did not stop her. Wendy Kopp started out by writing a letter to the president, this was unseccessful. She decided to write her undergraduate thesis on her idea for Teach for America and the teacher told her it was not possible, it required too much money. Wendy was still determined, so she went to buisnesses to asked for donations and she got laughed at. They believed she could not do it. She believed her generation had people who wanted to make a social impact. Urban and rural areas needed experienced teachers and her program was designed to help. Once she finally got the money, her program was a success, about 489 recent graduates joined her program.

Liam is a part of Teach for America. He was determined to make every senior in his class graduate, although, he did not have much support because many people thought they were hopless cases. Liam taught in Arizona, in a class of seniors who needed to pass a math exam to graduate. In Arizona about five thousand students did not graduate last year. Liam’s did.

Then there was Tammy and Caleb. They started a new school in Gaston County to teach children that were considered hopeless. Tammy and Caleb took 5th graders who were considered at the 1st to 4th grade level and made them model students by 8th grade. Thier school is now a top school.

Experience is not needed to be a effective leader. Motivation and determination is all that is necessary. Wendy Kopp is the proof of that.

The scoring commentary states the following:

Meaning: The response reveals an in-depth analysis of the text making clear and explicit connection between information and ideas in the text and the assigned task.

Development: The response develops ideas clearly and fully, making effective use of relevant and specific details from the text to argue that inexperienced, but determined, people can provide leadership.

Organization: The response maintains a clear and appropriate focus on how motivation and determination, rather than experience, are necessary for leadership. The response exhibits a logical and coherent structure through use of appropriate transitions.

Language Use: The response uses appropriate language, with some awareness of audience and purpose. The response occasionally makes effective use of sentence structure or length.

Conventions: The response demonstrates partial control of conventions, exhibiting occasional errors in spelling, punctuation, capitalization, and grammar that may hinder comprehension.

So readers, what do you think? Is the problem here the task, or what’s scored as an excellent response to it, or both?

May 21, 2008

skoolboy's Platinum Law of Educational Research

eduwonkette's "Iron Law of Qualitative Research in Education" is that the number of participants in the study should exceed the number of authors on the paper. Ha-ha, very funny, but the subtext is that (a) we cannot learn anything of value from studies that have small sample sizes; (b) qualitative research often has small samples; (c) therefore, we can't learn very much from qualitative research. Eduwonkette would protest that that's not what she's saying at all—"qualitative research is critical to educational research and policy," and I know that she does believe this. But poking fun at a paper reporting qualitative data without explaining why does her readers, and those who believe that qualitative research can be of great value, a disservice. I'd like to upgrade eduwonkette's Iron Law to skoolboy's Platinum Law of Educational Research: Poorly designed and conceived research is poorly designed and conceived research, regardless of the sample size.

I'll leave a defense of research using small samples for another day, and focus on why I think that the paper eduwonkette drew to our attention is poorly designed and conceived. I don't want to go on too long about this—there's a lot more to say than will hold the attention of casual readers—but here's the gist. The authors claim that teaching for social justice evokes a range of emotions in novice teachers, and they seek to understand the strategies that teachers use to navigate their emotional responses, and the implications of those strategies for their self-understandings and practices. I found the concept of socially just teaching confusing, but I'll accept the possibility that there are teacher education programs and novice teachers that are committed to the idea of teaching in ways that promote the life chances of members of marginalized groups in society, such as the poor and racial/ethnic minorities. In this paper, teaching for social justice is taken for granted as a good thing, which I know vexes some readers here, and the study seeks to build on previous work on emotions and emotional navigation in teaching. It's not news that teachers often express ambivalence about their work, and that they might struggle with how to respond to feelings of ambivalence.

The authors introduce the term critical emotional praxis to characterize the role of emotions in socially just teaching. This is not an analytic term emerging from their analysis of data on how teachers manage emotions in their work; rather it is a normative term—that is, a term that describes what the authors think the role of emotions in teaching for social justice should be. In their view, critical emotional praxis involves understanding the role of emotions in engaging with unequal power relations in classrooms and society; acknowledges the interplay between a teacher's local context and her emotional responses; and moves from a theoretical understanding of emotion to a practical set of relationships and teaching practices that promote teaching for social justice. I find this concept to be of minimal value for research purposes, since it has no apparent relationship with observations of teachers' practices and emotional states.

The purpose of the study is to describe how a novice teacher seeking to teach for social justice navigates her ambivalent emotions. The authors don't offer an explanation of why a case study of a single teacher is appropriate to address the questions they pose about emotional navigation in teaching. In this particular study, one of the authors observed the teacher for 80 minutes per day during the final 9-week period of her first year of teaching, and interviewed the teacher six times for two to three hours at a clip. The teacher's department chair and 10 students were interviewed as well. A year later, an author interviewed the teacher once for three hours, and did two more 80-minute classroom observations. Although the authors acknowledge some of the problems associated with the fact that the subject of the study was a former student of one of the authors, a teacher educator who taught her about socially just teaching, these problems are not adequately addressed in the research design.

What are some of the key findings of the research? One pertains to the teacher's mode of response to her emotions. The teacher, Sara, began seeing a professional counselor in December of her second year of teaching. She also enrolled in a course on nonviolent communication, and began sponsoring her school's forensics team. These three concrete modes of response, the authors contend, gave her insight into her self and emotions, and provided concrete strategies for relaxing, having fun, and balancing her feelings of sadness stemming from her observations of social injustice. With what consequences? She quit teaching, leaving her school and volunteering at an orphanage and school in a developing country.

What's wrong with this picture? I think the authors lacked a theory of when novice teachers might develop feelings of ambivalence and seek out strategies for coping with them. In this study, most of the action took place in the teacher's second year of teaching, and the primary source of data on these strategies is a retrospective interview conducted at the end of the second year. Therefore, the authors missed most of the action, and can only provide a bare-bones understanding of even this one case. Moreover, the fact that this teacher left the field of teaching raises serious questions about whether this case can inform teacher education in the ways that the authors hope. One reading of the results is that the teacher's leaving of the field is prima facie evidence that her strategies for coping with the feelings of ambivalence associated with seeking to teach for social justice didn't work; and although we can certainly learn from strategies that don't work, a study that shows strategies that do work would likely be more valuable.

The problem with this paper is that the intellectual payoff is nowhere near commensurate with the amount of space it took up in a major journal—45 journal pages, from start to finish. I agree with eduwonkette that it doesn't reflect well on the field of education research to have papers which make marginal contributions taking up so much airtime, and the time I spent reading this paper is lost forever—time that I could have spent in other, more valuable ways, like updating my Facebook page or grading papers.

But: the take-away message here is not that a study with a small sample—even an N of 1!—cannot contribute new knowledge to the field of educational research. It's that a badly designed and executed study won't contribute much. And bad design and execution have to do with a lot more than sample size.

April 28, 2008

skoolboy says: Some of My Best Friends are Psychometricians, But...

Deborah Meier added a comment to the end of the value-added thread from last week. (Thanks for stopping by eduwonkette's blog, Deb!) Her point is too important to overlook. She writes that standardized tests of reading proficiency are only loosely correlated with good reading habits—i.e., that a student can score well on a test of reading proficiency without demonstrating the habits of mind that could enable him or her to engage in a critical discussion of a text. Meier also writes that we do not have tests that measure "the more significant intellectually sound habits of heart and mind fundamental to being a well-educated member of society. The capacity to confront a phenomenon of interest in ways that help one best understand it, and then to make use of the knowledge acquired, is surely more important than being able to guess the one out of four 'best answer.'"

She's absolutely right, in my view. Preparing children and youth to be citizens in a democracy is a critical purpose of schooling. eduwonkette has written that there's a lot to schooling that can't possibly be measured by standardized tests – I think my favorite line is from the title of a post in January riffing on New York City's "Thank a Teacher" nomination process, "They Never Say, 'Thanks for Improving My Test Scores!'" – but it's easy to fall into the trap of treating the current testing regime as the natural order of things.

We need to be mindful that public schooling is now what institutional analysts such as Pat Burch call an organizational field, with lots of actors influencing our definitions of schooling and its outcomes, including textbook publishers, testing firms, test-prep firms, and a variety of other commercial entities. Lots of commercial enterprises and non-profits owe their livelihood to public education, and are engaged in an ongoing project to shape our definitions of "real school."

Testing is big business in the U.S. Non-profits such as the Educational Testing Service and ACT have annual gross revenues approaching $900 million and $400 million, respectively. ETS's K-12 testing operation had gross revenues of $172 million in their 2006 IRS filings. On the for-profit side, Pearson Education had gross revenues worldwide of $4.6 billion in 2006, with $600 million in adjusted operating profit. Their annual report crowed of a "healthy outlook in school testing underpinned by 2005 contract wins with a lifetime value of $700m (including Texas, Virginia, Michigan and Minnesota)." McGraw-Hill Education had revenues of $2.7 billion in 2007, with operating profit of $400 million.

With this much money, and more, at stake, you can bet that there are ongoing projects to define tests and testing as the appropriate way of defining what counts as good education. They tap into a logic that defines the modern world as increasingly rational, and society as a collection of individuals with increasingly differentiated roles, identities and personal preferences.

I'm not sure what the right approach is to counter all of this. At one point, I thought that giving politicians, educators and parents vivid representations of good teaching and good learning (e.g., videos or portfolios) would be sufficient to persuade them that test scores don't come close to capturing what we aspire to in public education. But I haven't seen that strategy succeed. Preaching to the choir isn't going to do it – we need to find a way to put people in the pews. Readers, do you have ideas?

April 20, 2008

skoolboy on: The Status of the Status Quo in Education Policy

Over at The Quick and the Ed, one of the many house organs of Education Sector, Kevin Carey is conducting a serial monologue belittling eduwonkette as an “alleged social scientist.” “Alleged”? Yeah, I’ll allege it – eduwonkette is a social scientist. It’s not an epithet, as much as Carey might believe; to some of us, it’s a way of life.

What’s the latest bee in Carey’s bonnet? It’s eduwonkette’s contention that particular value-added assessment systems for evaluating teacher performance are not ready for prime time. Carey views the claim that a particular policy alternative has identifiable flaws as tantamount to embracing a status quo that is demonstrably flawed. Public education clearly isn’t working, he argues. Therefore, any policy alternative to the status quo is to be preferred. Anyone who raises caveats about any kind of change is just an apologist for the status quo, a weasel, and probably a bed-wetter too.

The problem, Carey opines, is that social scientists such as eduwonkette – wait a minute, is she a social scientist or not? – have unrealistic standards for evaluating policy alternatives. “The standard in public policy isn't 95%,” he writes, “it's whatever is most likely to be best: 51%.” I’m not sure what the 95% refers to here, but most policy analysts I know are in the business of trying to recommend a policy alternative based on multiple criteria: the likely consequences of the alternative for various desirable outcomes; its cost; its feasibility and sustainability; its consistency with public values; and the likelihood of successful implementation. The hard reality is that there often isn’t a very strong evidence base for making these judgments, and policy analysts have to consider a range of possible outcomes along these criteria (a confidence interval that expresses the uncertainty about what might happen), and confront the tradeoffs, because invariably no single policy alternative looks best across all of these criteria. Simply having a good big idea (choice, accountability, charters, vouchers, whatever) isn’t enough to carry the day, because the devil of public policy is in the details. The world of policy analysis is littered with examples of good ideas that were implemented poorly, and thus did not have the desired effects, even though they were very costly initiatives.

For this reason, scholars of policy analysis (e.g., Eugene Bardach of the Goldman School of Public Policy at UC-Berkeley) almost always recommend considering allowing present trends to continue undisturbed as one of a set of policy alternatives intended to address a problem condition. Enacting an alternative that costs more than the current approach and doesn’t work is arguably worse than the status quo.

As for value-added assessment systems for evaluating teacher performance, we need to consider particular policy alternatives to the status quo in particular settings, not the big idea of value-added assessment for evaluating teacher performance (which both eduwonkette and I agree is promising.) If I can return to the New York City case which eduwonkette has discussed at length, the one new issue with regard to policy analysis that I’d like to introduce is feasibility and sustainability. It’s my opinion—and I’m not a lawyer, just an alleged social scientist—that the New York City approach, which defines 50% of a particular teacher’s effectiveness on the basis of how that teacher’s students do in other teachers’ classes over which the teacher has no control, would not survive a legal challenge. Other policy analysts might disagree, and might therefore be more favorably disposed towards this particular alternative. Either way, though, good policy analysis considers feasibility and sustainability as important criteria in evaluating policy alternatives.

March 28, 2008

AERA filing: Good Teachers: Who Are They? Where Are They? When Do They Stay and Move?

skoolboy went to a session Thursday that was billed as about all things teachers -- mobility, retention, etc. But the session was a bait-and-switch: three CALDER (National Center for Analysis of Longitudinal Data in Education Research) papers, only two of which were about teachers. Tim Sass led off with a paper on charter high school effects on high school graduation and college attendance in Chicago and the state of Florida. Using eighth grade test scores and demographic variables as controls, and studying students who attended charter middle schools to control for selection bias, Sass and his colleagues found that students who attended charter high schools were 11 to 14 percentage points more likely to graduate from high school, and 10 to 13 percentage points more likely to attend college, than similar students who did not attend charter high schools. He concluded that expanding school choice at the high school level may be part of an effective policy to reduce high school dropout rates and promote college attendance.

Sunny Ladd presented a paper coauthored by Charlie Clotfelter and Jake Vigdor on high school teacher credentials and student achievement. Examining North Carolina end-of-course tests in English I, Algebra I, biology, geometry, and ELP (social studies), Ladd modeled achievement as a function of teacher credentials and characteristics, classroom characteristics, and student fixed effects. Students of teachers who entered via lateral entry rather than a regular license had lower test scores, whereas students with more experienced teachers and National Board certified teachers had higher test scores. Certification in the subject taught enhanced test scores by .08 standard deviations -- a sizeable amount, given that low SES black students scored .12 standard deviations below other students. Ladd found that teacher credentials explain 1/5 to 1/3 of the overall variation in teacher quality, and that teacher credentials are distributed unevenly across schools, with black students and students in high-poverty schools less likely to have highly-qualified teachers. Thus, racial differences in access to teacher credentials contribute to the black-white achievement gap.

Jane Hannaway reported on a study of Teach For America effects on high school math and science outcomes in North Carolina. (Basically the same data that Ladd used.) Estimating a cross-subject student fixed effects model, Hannaway found that students of TFA teachers performed better than students of several different comparison groups of teachers. At least in high school, she concluded, there is a greater payoff to teacher selection than to teacher retention.

Dan Goldhaber, discussing the papers, raised questions about the generalizability of the findings, and argued that the question that policymakers are likely to ask -- "What kind of a bet am I making?" in picking a policy alternative -- would best be addressed by a distribution of likely outcomes, not a point estimate of the average effect. He offered a number of other thoughtful comments as well.

These are all skilled researchers, who analyzed their data with great care. And yet I came away disappointed in two respects. First, these presentations were largely atheoretical. They answered a set of "what works?" questions, but didn't yield much in the way of insights about mechanisms. Second, the two North Carolina papers relied on end-of-course test scores, but I was dismayed that Ladd and Hannaway didn't really know very much about the tests. One of the challenges in large-scale longitudinal data analysis is that just getting the data in shape to analyze is a big deal. But tests have psychometric properties, and no one in the room knew very much about them -- or about the history and details of teacher certification requirements in North Carolina. Since these were central concerns in the North Carolina papers, I left uneasy.

March 23, 2008

Cheap Eats for AERA

There's no reason to eat overpriced Midtown food at this week's meeting. Thankfully, skoolboy pulled together a list of good (and affordable) restaurants near AERA. My vote goes to Wondee Siam II - hands down, my favorite Thai restaurant in the city. Let us know about other finds - and check out program recommendations here.

Angelo’s Pizza, 117 W. 57th (6th & 7th Aves.), thin-crust coal oven pizza

Azuri Café, 465 W. 51st (9th & 10th), Israeli falafel/shawarma

Ise, 58 W. 56th (5th & 6th Aves.), sushi

Island Burgers & Shakes, 766 9th Ave. (51st St.)

Lenny’s, 60 W. 48th (5th & 6th), sandwiches

Menchanko-Tei, 43 W. 55th (5th & 6th Aves.), Japanese noodle shop

Roberto Passon, 741 9th Ave. (50th St.), Italian (Venetian)

Sarabeth’s Central Park South, 40 Central Park South (5th & 6th), brunch

Topaz Thai, 127 W. 56th (6th & 7th), Thai

Wondee Siam II, 813 9th Ave. (53rd & 54th), Thai

Wu Liang Ye, 36 W. 48th (5th & 6th Aves.), Szechuan

February 8, 2008

Do Quality Reviews Lead to Increased Student Achievement?

skoolboy wraps up his posts on Quality Reviews. His first two posts can be found here and here.

Do quality reviews lead to increased student achievement? There’s been surprisingly little research that addresses this question. Most research on quality reviews has examined the school inspection process in Great Britain managed by the Office for Standards in Education (Ofsted), a national agency which reports to the Parliament. Since school inspections for primary and secondary schools were instituted in 1993, there have been several iterations in the school inspection process. But I haven’t found any persuasive evidence that inspections improve student achievement. Some teachers and administrators report that they intend to change their practices in response to the inspection report, but I’ve not seen studies which examine whether those intentions translate into improved practice.

You might get the impression from my postings this week that I think that quality reviews are a bad idea. Not necessarily! But there are some things that I think are essential for quality reviews to be a good idea. Here’s a brief list:

The purpose of the review must be clear. Sociologist Gary Natriello has written about four potential purposes for evaluations in schools: motivation, direction, certification and selection. The first two can contribute to school improvement, whereas the latter two are more concerned with regulation, accountability, and control; and it’s desirable to confront the tensions between improvement and control directly. If the purpose of a quality review is to improve how schools work, then all phases of the review process need to be oriented towards this purpose.

Definitions of quality must be clear and transparent. If there are clear criteria and standards for what constitutes school quality, then both educators and inspectors can orient their activities towards these criteria and standards. Unclear standards and definitions undermine the legitimacy of the quality review process. My impression is that the Ofsted criteria are a lot clearer than those that I’ve seen stateside. Quality teaching is a particularly challenging phenomenon to articulate; but if the goal is to improve teaching, we’ve got to be able to do it.

The quality review process must be designed to collect a sufficient amount of data on quality. If, for example, the purpose of the quality review is to improve teaching, then presumably there should be sustained collection of data on teaching quality, primarily through direct observation, but perhaps in other ways as well. Ms. Frizzle recently commented that in her New York City school, the quality reviewer was planning to observe 9 different classrooms in 30 minutes. Not much data on teaching quality will come from such a process. The intensity of data collection is a recurring challenge in evaluation research that involves site visits, because they are labor-intensive. “Drive-by” site-visits just aren’t very useful, even if conducted by well-trained observers, because they don’t gather enough data on the things that matter.

The frequency of quality reviews should be synchronized with a theory of how fast school quality is changing. This is Social Research 101: phenomena that change more quickly need to be measured more frequently to detect such changes, and phenomena that change more slowly don’t need to be measured as often. How frequently should we assess school quality? The school year is an arbitrary metric, and it may be wasteful and counterproductive to conduct school quality reviews on an annual basis. (In Great Britain, Ofsted inspects primary schools every three years.) Given a choice, I’d rather have less frequent, but more intensive, quality reviews.

February 7, 2008

Quality Reviews and the Fetishization of Data: A Fantasy

skoolboy returns for part II on Quality Reviews. You can find his first post here.

The year is 1975. Coach John Wooden of UCLA has just won his 10th NCAA men’s basketball championship in 12 years, a record that will likely never be matched in collegiate sports. Cambridge Associates sends Clive Wingtip to conduct a Quality Review of the UCLA program. Over the course of a day and a half, Wingtip talks with Coach Wooden, his assistant coaches, the players, and other staff, and observes the team practices. He also observes a collaborative activity: a meeting between Coach Wooden and his assistant coaches. The program is evaluated on five quality statements, each scored as either underdeveloped; underdeveloped with proficient features; proficient; well developed; or outstanding. Here’s a summary of the report he filed:

Quality statement 1: “The coach and staff consistently gather and generate data and use it to understand what each player knows and is able to do, and to monitor the player’s progress over time.”

There is evidence that Coach Wooden studies each individual, and his strengths and weaknesses, very carefully. But he does not rely on statistics collected during practice sessions or games to inform his judgment, and his observations are subjective, not objective. He does not measure performance and progress based on comparisons to similar schools. Overall Score: Underdeveloped

Quality statement 2: “The coach and staff consistently use data to understand each player’s next learning steps and to set suitably high goals for accelerating each student’s learning.”

The Coach and staff convey consistently high expectations to the players, and set specific goals for the team and for individuals. But Coach Wooden regularly played only 7 of the team’s 12 players in games, and these same 7 practiced as a unit, suggesting that the reserves were not as important as the regulars. Coach Wooden occasionally displays the soft bigotry of low expectations. He states, “There is nothing wrong with the other fellow being better than you are, as long as you did everything you possibly could to prepare yourself for the competition. That is all you have control over. It may be that the other fellow’s level of competency is simply higher than yours. That doesn’t make you a loser.” Overall Score: Underdeveloped with Proficient Features

Quality statement 3: “The program aligns its work, strategic decisions and resources, and effectively engages players, around its goals and plans for accelerating players’ learning.”

Although there was evidence of individualized instruction, the Coach and staff did not use objective team and individual data to plan for and provide this instruction. Team members trusted and respected Coach Wooden. Overall Score: Proficient

Quality statement 4: “The program has structures for monitoring and evaluating each player’s progress throughout the year and for flexibly adapting plans and practices to meet its goals for accelerating learning.”

Coach Wooden relies heavily on repetition during the season. One player said, “He never talks about strategy, statistics, or plays but rather about people and character.” The program did not rely on objective measures to assess progress towards goals, such as the final score of games or written tests, and did not have a playbook. Coach Wooden frequently adjusted the plans for each practice, and made notes after each practice about adjustments; but these notes were based on subjective judgments, not hard data on performance. Overall Score: Underdeveloped with Proficient Features

What’s wrong with this picture? How does the man voted Coach of the Century by ESPN receive a quality review rating of “underdeveloped”? It’s all about the quality criteria, which privilege using data to make decisions about how to help student-athletes learn and develop over simply making good decisions about teaching and learning. I refer to this as the “fetishization of data.” In the Quality Review game, quantitative performance data have become an end in themselves, rather than a means to an end. One of the most cutting insults that one social scientist can hurl at another is to label the other’s research a “data dump.” In some school districts which have embraced external quality reviews, compiling notebooks full of undigested data has become a substitute for thoughtful analysis of a reasonably small number of important themes and problems.

What’s the solution? I’d start by broadening the definition of data beyond quantitative performance measures. There’s no doubt in my mind that Coach John Wooden relied heavily on data to inform the design of his practices, and his approach to cultivating the talents of his players and his team. But those data took the form of the systematic observations and judgments of an expert practitioner. I’d also seek to evaluate schools on the basis of the quality of the teaching within them, not whether the educators in a school arrived at their teaching practices via the analysis of quantitative performance data.

“Don’t mistake activity for achievement…If you spend too much time learning the tricks of the trade, you may not learn the trade.” (John Wooden)

January 24, 2008

Data-Driven Decision Making Gone Wild: How Do We Know What Data to Trust to Inform Decision-Making?

skoolboy returns to weigh in on data-driven decision making:

I’m as much a fan of data as the next guy. But I worry that proponents of data-driven decision-making are understating just how hard it is to use data thoughtfully.

I’d like to describe the strategy championed by the New York City Department of Education, and point out the difficulties involved. The logic that the DOE is promoting is (a) use data to identify an area where a school is lagging, either in relation to some absolute standard or to other similar schools; (b) use the available data systems to identify similar schools that are doing better in this area; (c) ask these more effective schools what they are doing that accounts for their success; and (d) adapt their suggestions for use in the school.

It’s not as easy as it looks to determine which schools are doing better than others. Two different criteria are relevant. The first is whether the difference in performance between two schools is large enough to matter, which is sometimes termed educational significance or practical significance. The second is whether the difference is real, or could just be due to chance, which is typically described as statistical significance. Ideally, we are interested in differences that are both practically and statistically significant. But a difference could be large yet not statistically significant (which is often the case when we have a small sample of information about performance), or statistically significant yet very small (in which case we are pretty sure that the difference is real, but it’s just not very important). (Yes, statistical significance does matter!)
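To make the two criteria concrete, here is a minimal Python sketch built on a standard two-proportion z-test. The survey figures in it are hypothetical, invented purely for illustration (they do not come from the Survey Access tool), and the 10-point threshold for a gap that is "large enough to matter" is an assumption, not an established standard.

```python
import math


def compare_proportions(p1, n1, p2, n2, z_crit=1.96, min_gap=0.10):
    """Compare two sample proportions.

    Returns (gap, z, statistically_significant, practically_significant).
    min_gap is an assumed, illustrative threshold for practical significance.
    """
    gap = p1 - p2
    # Pooled proportion under the hypothesis that the two schools do not differ
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = gap / se
    return gap, z, abs(z) > z_crit, abs(gap) >= min_gap


# Hypothetical case 1: a 20-point gap, but only 20 respondents per school
big_but_noisy = compare_proportions(0.60, 20, 0.40, 20)

# Hypothetical case 2: a 2-point gap, but 10,000 respondents per group
tiny_but_real = compare_proportions(0.52, 10_000, 0.50, 10_000)

for label, result in [("large gap, small sample", big_but_noisy),
                      ("small gap, huge sample", tiny_but_real)]:
    gap, z, stat_sig, prac_sig = result
    print(f"{label}: gap={gap:.2f}, z={z:.2f}, "
          f"statistically significant={stat_sig}, "
          f"practically significant={prac_sig}")
```

In the first hypothetical case the gap is large but could easily be chance; in the second it is almost certainly real but too small to matter much.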

This is kind of abstract, so here’s an example, drawn from the NYC Department of Education’s Survey Access tool, which reports the results of the system’s first round of Learning Environment Surveys in the spring of 2007. The Department’s spiffy PowerPoint presentation imagines the principal and a group of teachers in (mythical) IS 402 identifying teacher engagement as an issue. In particular, teachers in this school generally disagreed that “Obtaining information from parents about student learning needs is a priority at my school.” Using the Survey Access tool, it’s possible to identify 12 similar NYC schools (i.e., middle schools with an enrollment over 700 and at least 25% ELL students), seven of which have more positive scores on this question. In the top school, the Eleanor Roosevelt School, 71% of the teachers strongly agreed or agreed with the statement, whereas in the bottom school, 13% of the teachers strongly agreed or agreed. (In mythical IS 402, 36% of the 31 teachers who responded to the survey strongly agreed or agreed.)

So why not just look at the seven schools above IS 402? Because the percentage of teachers strongly agreeing or agreeing in each school is an estimate of the true percentage that would be observed if all teachers in the school responded to the survey. (In these 12 schools the teacher response rate ranged from 26% to 53%; in mythical IS 402, 40% of the teachers responded.) Our interest is in the population of teachers in the school, not just the sample that chose to respond. And there’s a degree of uncertainty in these estimates. If a different group of 31 teachers in IS 402 had responded, just by chance, we might not have obtained an estimate of 36% strongly agreeing or agreeing. In fact, with a sample of 31 teachers responding and a sample estimate of 36%, the percentage of all of the teachers in IS 402 agreeing or strongly agreeing could plausibly range from 23% to 49%. (There’s a finite population correction in there, for those who care about such things.) That’s a pretty big range, and the range of possible values is pretty large for the other dozen schools as well.
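For those who do care about such things, the range quoted above can be reproduced with a short Python sketch. It assumes a normal-approximation 95% interval with a finite population correction, and a staff of roughly 78 teachers in IS 402 (31 respondents at a 40% response rate). The post states neither the exact method nor the staff size, so both are a reconstruction rather than the documented procedure.

```python
import math


def proportion_interval(p_hat, n, population, z=1.96):
    """Normal-approximation interval for a proportion, with a finite
    population correction for sampling a large share of a small population."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    margin = z * se * fpc
    return p_hat - margin, p_hat + margin


respondents = 31   # teachers who answered the survey (from the post)
staff_size = 78    # assumed: 31 respondents / 40% response rate, rounded
low, high = proportion_interval(0.36, respondents, staff_size)
print(f"{low:.0%} to {high:.0%}")
```

Under those assumptions the interval rounds to the 23% and 49% given above. Without the correction the range would be wider still, because the formula would treat the 31 respondents as a sliver of a very large staff rather than about 40% of a small one.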

Of the seven schools above IS 402, just one of them, the Eleanor Roosevelt School, is really head-and-shoulders above it in a statistical sense. The other six are statistically indistinguishable, because there’s so much overlap in the intervals in which the true percentage of all of the teachers strongly agreeing or agreeing in each school lies.

Would the principal and teachers in IS 402 learn something from asking the staff in these seven other schools how they do things? Sure! It doesn’t hurt to think about new ways of doing business. Will doing so raise performance in IS 402? Probably not, because an assessment of statistical significance suggests that, with the exception of Eleanor Roosevelt, these other schools really aren’t doing better, and therefore there’s no reason to think that adopting their practices will yield genuine improvements.

Data-driven decision makers, beware of spurious comparisons.

January 11, 2008

Jay Greene and the Magic Abacus

From time to time, my colleague skoolboy will pop in and say a word. (You can check out his holiday posts about class size here):

Jay Greene and Catherine Shock, writing in the Winter issue of City Journal, contend that ed schools care more about the political and social ends of education than basic academic skills. In a survey of U.S. News and World Report’s top 50 ed schools and 21 other flagship state universities, they examined course titles and descriptions in order to calculate a “multiculturalism-to-math ratio”—the ratio of courses that emphasize multiculturalism to those that focus on math. At the average education school, they contend, the multiculturalism-to-math ratio is 1.82, but at some schools, the ratios are much higher. At UCLA, for instance, 47 courses include the words “multiculturalism” or “diversity,” whereas only three contain the word “math,” for a ratio of almost 16 to 1.

skoolboy likes this kind of research, because he doesn’t have to leave the comfort of his office to figure out what students are expected to know before they are admitted to their degree programs; what courses they are required to take for their degrees; and what they actually take. It’s so much more convenient to look at course catalogs. I decided to do the same kind of analysis for Harvard Medical School, looking at the course offerings for the 2007-08 academic year. Did you know that there’s not a single course that mentions the word math?! But there are two that mention either diversity or social justice. Why, that’s a ratio of … hmm, I’m in an ed school, I guess I’m not sure. But I think it’s outrageous that the faculty of Harvard Medical School don’t care if their students know anything about math.

I decided to take a closer look at UCLA, which offers a Mathematics for Teaching B.S. degree. The preparation for the major requires seven courses in mathematics, and courses in physics, computing, and chemistry or biochemistry. The major itself requires 13 mathematics courses. (UCLA operates on a quarter system.) None of these courses is offered in UCLA’s Graduate School of Education and Information Sciences, but I don’t think you can say that the school doesn’t care about the mathematical preparation of its prospective math teachers.

The villains of Greene and Shock’s story are familiar: ed school professors accountable to no one but themselves, and blindly allegiant to multiculturalism and diversity; students shying away from math because it’s hard; and spineless accreditation bodies such as NCATE that care more about multiculturalism and diversity than subject matter teaching. Little wonder we’re getting our butts kicked by Slovakia in international assessments!

Too bad that the story is so distorted. One of NCATE’s constituent organizations is the National Council of Teachers of Mathematics, which has explicit standards for prospective math teachers’ content knowledge, field experiences, and mathematics teaching processes. Most states now regulate teacher preparation programs in ways that are intended to ensure that teachers have adequate subject matter knowledge. And a little-known piece of legislation called No Child Left Behind has sought to promote this as well. We’re still some distance from agreement on how to discern teachers who know their subjects and who know how to enable students to master them; but no one’s proposing using course catalogs for this purpose.
The opinions expressed in eduwonkette are strictly those of the author and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.
