Through the lens of social science, eduwonkette takes a serious, if sometimes irreverent, look at some of the most contentious education policy debates. (Find eduwonkette's complete archives prior to Jan. 6, 2008 here.)

August 20, 2008

Educational Research Cherry Pickers Need a Union

Over at EdWize, Leo Casey has offered to help Educational Research Cherry Pickers form their own union - they have been working too hard. Will the next cover of Education Next proclaim, "Hasta la victoria siempre?"

July 14, 2008

Lessons for Education Policy Research from the Market for Lemons

Does the market for research in education policymaking work pretty well? For once, eduwonk, Dean Millot, and I all agree - it doesn't. The “market for lemons,” which Jay Greene makes reference to in his most recent post, gives us insight into why.

A common rationale given by economists for intervention in selected markets – for example, insurance markets - is the problem of asymmetric information, a gap in information available to buyers and sellers in a market. Using the example of used car markets, Nobel prize winning economist George Akerlof lays out this dilemma in his famous paper, “The Market for Lemons: Quality Uncertainty and the Market Mechanism.”

Imagine you’re selling a used car. You know the problems with your car, but your potential buyers don’t. You may be trying to swindle unsuspecting buyers because you know it has major defects. But your potential buyers aren’t stupid, and they know that they can’t trust you to provide an honest appraisal of your car’s problems.

If buyers don’t decide to avoid this market altogether, they end up betting on averages. They’ll only pay a price that reflects the average frequency of lemons in the used car market. That’s a price that’s too high for a lemon, but too low for a car of good quality. If you’ve got a good car, you know you’re going to get too low a price in the used car market, so you’re likely not to sell there.

When sellers of good cars refuse to sell, lemons increase in frequency in the market. As a result, the people selling good cars are really in trouble, because they will end up getting an even lower payout for a good car. Now they are even less likely to sell them there, and the frequency of lemons continues to rise.

Left unchecked, the end result is market failure. What this means is that there are people who want to buy good cars and people who have them to sell, but buyers' fear of getting stuck with a lemon keeps that trade from happening.
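For readers who want to see the unraveling in numbers, here is a minimal sketch in Python. It is not taken from Akerlof's paper: the even spread of car quality between 0 and 1, the assumption that each seller values a car at exactly its quality, and the 1.5 premium that buyers place on any given car are all illustrative assumptions, and `unravel` is a made-up name.

```python
def unravel(qualities, buyer_premium=1.5, rounds=50):
    """Track the price buyers offer as owners of good cars leave the market.

    Assumptions (illustrative only): a seller keeps the car if the offered
    price is below its quality, and buyers offer buyer_premium times the
    average quality of the cars still for sale.
    """
    on_market = sorted(qualities)
    history = []  # (price offered, number of cars for sale) per round
    for _ in range(rounds):
        if not on_market:
            break
        avg_quality = sum(on_market) / len(on_market)
        price = buyer_premium * avg_quality
        history.append((price, len(on_market)))
        # Owners whose cars are worth more than the price withdraw.
        remaining = [q for q in on_market if q <= price]
        if len(remaining) == len(on_market):
            break  # nobody else leaves; the market has settled
        on_market = remaining
    return history


# 1,000 hypothetical cars with quality spread evenly from 0.001 to 1.0.
qualities = [i / 1000 for i in range(1, 1001)]
history = unravel(qualities)

for price, cars in history[:5]:
    print(f"price offered: {price:.3f}, cars for sale: {cars}")
print(f"cars still for sale at the end: {history[-1][1]}")
```

Even though every buyer in this toy market values every car more than its owner does, each round of withdrawals lowers the average quality, which lowers the price, which drives out the next tier of good cars, until only a handful of the worst cars remain.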

The situation in education policy is analogous, but a little different. Sellers in the research market know what they are selling, but buyers like policymakers, journalists, and superintendents don’t have the expertise to evaluate what they are buying. They don’t differentiate between a paper in the American Economic Review – the best peer-reviewed journal in economics – and a report issued by a pro-vouchers thinktank. Unlike the used car market, the buyers aren’t always suspicious enough, in some cases because the buyers are constantly changing and don’t have the time to build up knowledge about reputations, which help to regulate markets with asymmetric information. Journalists get moved around from beat to beat, and policymakers come and go.

For some parties, there’s no incentive to be suspicious. Stories need to be written, laws need to be pushed through, and it’s not the editor or reporter or legislator who gets stuck on the side of the road when the car sputters out. It’s the public that gets left holding an empty bag when we rely on potentially flawed research to shape public policy.

Anyone have ideas on how this market could operate better? Or do ideologically driven policymakers, who can find “research” to support just about anything, simply prefer the status quo?

July 11, 2008

The Influence Spectrum: From Blogging to Academic Research

Let me use the occasion of Jay Greene's response to my earlier post to explain the differences between blogging and research, as I see it. Greene makes no distinction between the two activities, and is, as a result, skeptical about my anonymity. As he explained to me off-blog, "The same basic principles apply. They are both part of the spectrum of how people communicate ideas that may be related to policy decisions."

Blogs provide opinions, commentary, and analysis. Blogs are a place to discuss ideas, consider other points of view, and hear what a community of readers has to say. Blogging is great for testing out ideas, reflecting on the news of the day, and discussing and disseminating existing research. But bloggers don't do academic research. Academic research, in contrast, is subject to norms about method. The central norm in academic research is subjecting your work to the scrutiny of a critical community of scholars.

Undoubtedly, blogs, thinktank research, and academic research are "part of the spectrum of how people communicate ideas that may be related to policy decisions." But different levels of confidence should be assigned to different parts of that spectrum in educational policymaking. Below, from least to most credible:

1) Blogs: Blogging is free-form exchange, and the blogger is judged by the quality of his or her arguments and content by readers who seek out the blogger. Blogs are grassroots online communities where everyone, irrespective of their identity, is entitled to an opinion.

2) Thinktank research: Thinktank research is generally released without external review. The questions that are asked and the policy recommendations that are put forth are usually – but not always – tied to the stated objectives of the organization, which are sometimes ideological in nature. Thinktanks are well-funded and endowed with PR departments that publicize studies to policymakers and the media. As a result, thinktank research on education receives more attention than blogs and academic research in the media.

It is important to note that thinktanks do vary significantly in the extent to which they internally and externally review work before releasing it. They also vary in the extent to which they make their methods transparent enough that their analyses can be evaluated and replicated. Some thinktanks are more judicious than others about describing the implications of their work for policy and in spinning their findings. And some thinktanks don’t appear to sanction researchers when their studies are consistently discovered to be wrong. I imagine that other thinktanks would treat such a violation differently, because at the end of the day, these mistakes reflect poorly on the institution.

3) Academic research: Academic research is intended to contribute to a body of scholarly knowledge, and is subject to thorough peer review and to norms of scholarly inquiry. Though it is often policy relevant, the primary audience for this research is a community of scholars, who judge the research not for its policy contributions but its innovativeness, rigor, and contributions to a body of literature.

But peer reviewers are human, too, and they come with their own set of biases; the idea of a search for truth immune to ideology is a fantasy. Academic research that is imperfect does get published. And people do make mistakes in their papers, both innocent and intentional. That's why one of the norms of scholarly inquiry is to replicate studies and to take caution in declaring that the case is closed on any issue. This can be thorny, because academic research communities are small and dense. Everyone knows everyone else, and the scholars that take on prominent colleagues, even when they are clearly wrong, can pay a handsome price. People also have personal relationships with mentors and colleagues, and sometimes we don’t challenge each other as much as we should.

For all of these reasons, peer review is double-blind. In practice, papers are submitted to conferences before they are submitted to journals. On more than one occasion, I have reviewed papers of scholars who have sat on the same conference panels that I have. But the academics whose work is under review do not know the identity of their reviewers (except when reviewers cry foul that their work wasn’t cited, and suggest references that give away their identity!), and this provides a countervailing force against the social dynamics that sometimes cloud our judgment. And with academic research, no study is taken as a "killer study," and Jeff Henig has advocated for the same in the policymaking arena. Rather, individual studies are put in context of a larger evidence base.

To be sure, I, and some other bloggers, will occasionally present and analyze data in our postings, with the goal of persuading readers of a point of view. When I do so, I provide links to the data, which are generally in the public domain. When these data are not publicly available, I have always extended an offer to my readers to request data from me, which they have often done. When these posts involve more than making figures using publicly available numbers, I also provide detail about what I've done, which is simple descriptive analysis that a competent Excel user can replicate.

But there's no pretense that this is peer-reviewed academic work. And let's be realistic: an anonymous blogger isn't shaping public policy. In equating the two, Greene either overstates the influence of this blog on education policy, or diminishes the contributions of his own work. Of course, if my postings lead readers to think differently about research and policy matters, then those readers may have an influence. I see this as a very different dynamic than with thinktank research, where, because the objective is to influence public policy directly through research, the researchers have a greater obligation to their audience to vet what they've done before taking it public.

Finally, to Greene's point that my anonymity makes it impossible to "consider the source" - is it likely that Education Week would host an anonymous blog by someone working for or funded by "special interests" in education? Or that they would allow me to critique policymakers with whom I have some conflict of interest? The editors at Education Week know who I am, and decided to host this blog with full knowledge of my professional biography. I'm quite proud of - and grateful for - the community that we've built here, which has challenged and refined my own thinking on a wide variety of topics. At the end of the day, potential readers can decide for themselves whether this blog is worth reading, can tell me when they think I'm wrong (and you often do), and can expect me to listen, and even modify my positions in response.

Update: Be sure to check out Dean Millot's exceptional post related to this issue, The Letter From: "In short, I see no problem with research becoming public with little or no review" (I), as well as Sherman Dorn's from earlier in the week, Can reporters raise their game in writing about education research? eduwonk also weighs in here: Politics of Information.

July 8, 2008

The Trouble with the Education Policy Advocacy Industry: "Building on the Basics"

Today, Marcus Winters, Jay Greene, and Julie Trivitt are releasing a study called, "Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects."

It may be an elegantly executed study, or it may be a terrible study. The trouble is that based on the embargoed version released to the press, on which many a news article will appear today, it's impossible to tell. There is a technical appendix, but that wasn't provided up front to the press with the glossy embargoed study. Though the embargo has been lifted now and the report is publicly available, the technical appendix is not.

By the time the study's main findings already have been widely disseminated, some sucker with expertise in regression discontinuity may find a mistake while combing through that appendix, one that could alter the results of the study. But the news cycle will have moved on by then. Good luck interesting a reporter in that story. And even when researchers working in the policy advocacy industry make sloppy, indefensible errors - for example, when Greene and Winters used data that the Bureau of Labor Statistics warned against using to show that teachers are overpaid - they're not approached with caution by the press when the next report rolls around.

So as much as I like to kvetch about peer review and the pain and suffering it inflicts, it makes educational research better. It catches many problems and errors before studies go prime time, even if it doesn't always work perfectly.

As for the Winters, Greene, and Trivitt study, the jury is still out - as it should be until we have more information. I'll get back to you once I've read the technical appendix.

Update:

Let me first apologize to Jay Greene and my readers for shooting off a short post before teasing out all of the complexities around thinktanks, research, and the reporting of research in the popular media. I used Greene's paper as a vehicle for doing so, and that may have made it appear that I was criticizing the quality of that study when I was not in a position to do so. I shouldn’t have raised questions, even hypothetical ones, about the methods in that paper until the technical report was available for review, and you should definitely read Greene's response here.

This issue, however, is much larger than this particular Manhattan Institute report, and I want to use Greene's critique - that I have posted on working papers from the National Bureau of Economic Research (NBER) - to point out some important differences between papers issued by outlets like NBER and thinktank reports:

1) With NBER papers, everything is on the table upfront. They are scholarly papers that include extensive methods sections and robustness checks in every paper. Greene writes that, “If [reporters] requested the technical report, they could get that.” But the press release makes no mention of a technical report at all. The key difference is that there’s an extra step in the process to get to the detailed methods, which reporters writing articles could ostensibly circulate to other scholars for comment before writing an article.

2) There is no PR machine behind NBER papers. It’s one thing for me to write about a study on my blog. It’s entirely another to send press releases to reporters at newspapers and other media outlets, who in turn – and this is their fault, not Greene’s – cover his report like it’s a final product. The more complex the methods are, the more there is a need for peer review because it becomes more difficult to eyeball the problems from the sidelines – and Greene and his colleagues are using sophisticated methods in this report.

3) NBER papers generally aren’t trying to persuade anybody in particular of anything. They are not intended to sway public policy. Contrast this with the press release approach of policy advocacy thinktanks. For example, the press release for this study said, "In this report, Winters, Greene, and Trivitt dispel the myth that high stakes testing in reading and math will harm student proficiency in low-stakes subjects. The data from Florida provides further evidence for policy makers considering the renewal of No Child Left Behind, showing that national testing incentives improve overall educational achievement levels.”

4) NBER has implicit quality controls. It is a community of scholars to which one must be invited, one that has strong norms about how research is conducted and reported. The quality of the average NBER working paper is extremely high. There is much less variation in the quality of NBER papers than there is in thinktank reports. On some level, this is an issue of the trustworthiness of institutions; for example, I trust a report coming out of RAND or Mathematica more than I do one coming out of the Heritage Foundation, because neither RAND nor Mathematica has a stated ideological agenda.

For the best treatment of the thinktank issue I’ve ever seen, see this post by Dean Millot, and his preceding posts here and here.

May 30, 2008

Culture, Gender, and Math

Larry Summers' fatal gaffe, in which he suggested that innate differences between men and women may explain why fewer women succeed in math and science careers, set off the latest round of the gender math wars. Though many are in a tizzy over a "boy crisis" in education, as early as the fall of kindergarten, boys outperform girls in math at the top of the distribution (i.e. if we compare girls at the 95th percentile with boys at the 95th percentile). By the end of third grade, boys outperform girls in math not just at the top, but throughout the entire distribution. These early differences persist through high school.

To pull apart culture and biology, authors of a study in this week's Science analyzed data from PISA, an international assessment which tested students in 40 countries (see USA Today coverage here). The authors linked PISA data to survey data on gender attitudes (questions like "Should women work outside the home?" and "Is it more important for a man to get a college education than a woman?"), rates of women's political and economic participation, and the World Economic Forum's gender gap index.

The gender gap in math varies substantially in size across countries, which suggests that innate factors alone cannot explain this gap. In more gender-neutral societies, girls do as well as boys in math. In Iceland, Sweden, and Norway, which are characterized by more gender equity, girls do as well or better in math. The largest gap was in Turkey. The authors also found that the reading gap favoring girls was widest in countries with more gender equity.

Here's the kicker - in a finding likely to incite those who believe girls' gains have come at the expense of boys, the authors found that overall scores in math and reading were highest in countries offering more advantages to women, and lowest in those with more gender inequities. Said study author Paola Sapienza, "This is important because it shows that advances for girls do not come at the expense of boys."

Gender equitable countries are also different in many other ways, so it's possible that other factors explain these findings. Nonetheless, these findings serve as a potent reminder that the gender gap in math achievement is not driven by nature alone.

May 29, 2008

Educating a New Majority: The Condition of Education 2008

The National Center for Education Statistics released the 2008 Condition of Education report this morning. If you need any basic stats on education – early childhood through post-secondary – this 300+ page report is for you.

In this year's report, the NCES drew attention to the changing demography of American schoolchildren. Minority students make up 43 percent of American public school enrollment, and higher proportions in the South (48%) and West (55%). One in five children speak a language other than English at home. The graph below shows demographic enrollment trends from 1986-2006 by region.

[Figure: minority enrollment trends by region, 1986-2006]

Also striking is the extreme racial segregation of our schools. No, it’s not new news – but these figures never fail to astound me. 31% of African-American students attend schools that are 75% or more African-American, while 64% of white public school students attend schools that are 75% or more white. The graph below shows a slightly different cut of the segregation data – the proportion of students in each racial/ethnic group that attend schools with various concentrations of minority students.

[Figure: proportion of students in each racial/ethnic group attending schools with various concentrations of minority students]

May 22, 2008

Sol Stern and the SUTVA Shenanigans

Sol Stern's new article on the Reading First study shenanigans offers a window into the central challenge of randomized experiments in education. That challenge is the violation of the Stable Unit Treatment Value Assumption (SUTVA) required for clean causal inference in randomized experiments. As articulated by ninja statistician Don Rubin, the most common violation of SUTVA involves "interference between units."

What does interference mean? The idea is that Serena's outcome should not be affected by whether her peers Blair and Vanessa were assigned to the treatment or control condition. In other words, one subject’s outcome should depend only on the treatment to which that subject is assigned, not on the treatment assignments of other subjects. But in many social settings, there are peer effects and social spillovers and, as a result, others' treatment assignments likely do affect one's own outcomes. Vaccinations provide another clear example - my risk of contracting a disease is dependent on my exposure, which is in turn affected by whether others have been vaccinated. In the case of Reading First, one school's treatment likely affected what other schools in the district did, as Stern details in his article.
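To make the interference problem concrete, here is a toy sketch in Python. Nothing in it comes from the Reading First study or from Stern's article: the four-school district, the 10-point direct gain, and the 4-point spillover to untreated schools are invented numbers, and `gain` is a hypothetical function used only for illustration.

```python
def gain(own_treated, others_treated, direct=10.0, spillover=4.0):
    """Hypothetical score gain for one school (all numbers are made up).

    A treated school gets the full direct gain. An untreated school still
    picks up a spillover gain if any other school in its district is
    treated, which is exactly the interference that SUTVA rules out.
    """
    if own_treated:
        return direct
    return spillover if any(others_treated) else 0.0


def mean(values):
    return sum(values) / len(values)


# One district, four schools, two assigned to treatment.
assignment = [True, True, False, False]
gains = [
    gain(treated, assignment[:i] + assignment[i + 1:])
    for i, treated in enumerate(assignment)
]

treated_mean = mean([g for g, t in zip(gains, assignment) if t])
control_mean = mean([g for g, t in zip(gains, assignment) if not t])

# What the experiment reports: treated schools minus control schools.
naive_estimate = treated_mean - control_mean

# What we wanted to know: a treated school versus a school in a district
# where nobody is treated.
effect_without_interference = gain(True, [False] * 3) - gain(False, [False] * 3)

print(f"treated minus control: {naive_estimate}")
print(f"treated minus untouched school: {effect_without_interference}")
```

Because the control schools in this sketch absorb part of the treatment, the treated-versus-control comparison understates the gain relative to a world with no program at all. With a different sign on the spillover, it could just as easily overstate it.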

The lesson here is not just about the Reading First study. Experiments in social science are fundamentally different than experiments in medicine, and it turns out the gold standard is often more silver or bronze than we would have hoped. Don't get me wrong - I still dig experiments - but I'm not counting on them to solve all of the problems that vex our schools.

May 21, 2008

skoolboy's Platinum Law of Educational Research

eduwonkette's "Iron Law of Qualitative Research in Education" is that the number of participants in the study should exceed the number of authors on the paper. Ha-ha, very funny, but the subtext is that (a) we cannot learn anything of value from studies that have small sample sizes; (b) qualitative research often has small samples; (c) therefore, we can't learn very much from qualitative research. Eduwonkette would protest that that's not what she's saying at all—"qualitative research is critical to educational research and policy," and I know that she does believe this. But poking fun at a paper reporting qualitative data without explaining why does her readers, and those who believe that qualitative research can be of great value, a disservice. I'd like to upgrade eduwonkette's Iron Law to skoolboy's Platinum Law of Educational Research: Poorly designed and conceived research is poorly designed and conceived research, regardless of the sample size.

I'll leave a defense of research using small samples for another day, and focus on why I think that the paper eduwonkette drew to our attention is poorly designed and conceived. I don't want to go on too long about this—there's a lot more to say than will hold the attention of casual readers—but here's the gist. The authors claim that teaching for social justice evokes a range of emotions in novice teachers, and they seek to understand the strategies that teachers use to navigate their emotional responses, and the implications of those strategies for their self-understandings and practices. I found the concept of socially just teaching confusing, but I'll accept the possibility that there are teacher education programs and novice teachers that are committed to the idea of teaching in ways that promote the life chances of members of marginalized groups in society, such as the poor and racial/ethnic minorities. In this paper, teaching for social justice is taken for granted as a good thing, which I know vexes some readers here, and the study seeks to build on previous work on emotions and emotional navigation in teaching. It's not news that teachers often express ambivalence about their work, and that they might struggle with how to respond to feelings of ambivalence.

The authors introduce the term critical emotional praxis to characterize the role of emotions in socially just teaching. This is not an analytic term emerging from their analysis of data on how teachers manage emotions in their work; rather it is a normative term—that is, a term that describes what the authors think the role of emotions in teaching for social justice should be. In their view, critical emotional praxis involves understanding the role of emotions in engaging with unequal power relations in classrooms and society; acknowledges the interplay between a teacher's local context and her emotional responses; and moves from a theoretical understanding of emotion to a practical set of relationships and teaching practices that promote teaching for social justice. I find this concept to be of minimal value for research purposes, since it has no apparent relationship with observations of teachers' practices and emotional states.

The purpose of the study is to describe how a novice teacher seeking to teach for social justice navigates her ambivalent emotions. The authors don't offer an explanation of why a case study of a single teacher is appropriate to address the questions they pose about emotional navigation in teaching. In this particular study, one of the authors observed the teacher for 80 minutes per day during the final 9-week period of her first year of teaching, and interviewed the teacher six times for two to three hours at a clip. The teacher's department chair and 10 students were interviewed as well. A year later, an author interviewed the teacher once for three hours, and did two more 80-minute classroom observations. Although the authors acknowledge some of the problems associated with the fact that the subject of the study was a former student of one of the authors, a teacher educator who taught her about socially just teaching, these problems are not adequately addressed in the research design.

What are some of the key findings of the research? One pertains to the teacher's mode of response to her emotions. The teacher, Sara, began seeing a professional counselor in December of her second year of teaching. She also enrolled in a course on nonviolent communication, and began sponsoring her school's forensics team. These three concrete modes of response, the authors contend, gave her insight into her self and emotions, and provided concrete strategies for relaxing, having fun, and balancing her feelings of sadness stemming from her observations of social injustice. With what consequences? She quit teaching, leaving her school and volunteering at an orphanage and school in a developing country.

What's wrong with this picture? I think the authors lacked a theory of when novice teachers might develop feelings of ambivalence and seek out strategies for coping with them. In this study, most of the action took place in the teacher's second year of teaching, and the primary source of data on these strategies is a retrospective interview conducted at the end of the second year. Therefore, the authors missed most of the action, and can only provide a bare-bones understanding of even this one case. Moreover, the fact that this teacher left the field of teaching raises serious questions about whether this case can inform teacher education in the ways that the authors hope. One reading of the results is that the teacher's leaving of the field is prima facie evidence that her strategies for coping with the feelings of ambivalence associated with seeking to teach for social justice didn't work; and although we can certainly learn from strategies that don't work, a study that shows strategies that do work would likely be more valuable.

The problem with this paper is that the intellectual payoff is nowhere near commensurate with the amount of space it took up in a major journal—45 journal pages, from start to finish. I agree with eduwonkette that it doesn't reflect well on the field of education research to have papers which make marginal contributions taking up so much airtime, and the time I spent reading this paper is lost forever—time that I could have spent in other, more valuable ways, like updating my Facebook page or grading papers.

But: the take-away message here is not that a study with a small sample—even an N of 1!—cannot contribute new knowledge to the field of educational research. It's that a badly designed and executed study won't contribute much. And bad design and execution have to do with a lot more than sample size.

May 20, 2008

Violation of the Iron Law of Qualitative Research in Education, #1,321

The Iron Law: The number of participants in the study should exceed the number of authors on the paper.

Yet I opened up the latest issue of the American Educational Research Journal to discover a violation of said rule in the article, "The Emotional Ambivalence of Socially Just Teaching: A Case Study of A Novice Urban Schoolteacher," which has two authors. Got to love the "convenience sample" - the novice teacher is a former student of one of the authors. Jay Greene, I am totally going to dominate your bingo game with one article only.

No disrespect to qualitative research intended here - in fact, I believe qualitative research is critical to educational research and policy. But qualitative research comes in many flavors (as does quantitative research), and studies like this one do not help the cause.

May 16, 2008

Brain-Based Education: Don't Get Snookered!

"Brain-based education" is K-12's latest fad. Dan Willingham, a professor of psychology at the University of Virginia, has put together a 10-minute video about what we know - and what we need to know - about brain-based education. If you know nothing about cognitive psychology (like me) but want to size up this trend, this video is a helpful introduction. Kudos to Dan Willingham for putting this resource together.

May 15, 2008

Dean Millot's Comment on the Ayers AERA Affair

Regarding the Ayers affair, Dean Millot posted the following comment below and over at Flypaper, but it is worth reprinting in full:

I'm a lawyer now involved in k-12 education with a long background in national security.

Putting on my lawyer hat - Ayers was a fugitive from justice, but all charges against him were dropped in light of prosecutorial misconduct.

Putting on my national security hat - to describe him and the Weather Underground as terrorists is a bit of hyperbole. As a tactic of political struggle, terrorism refers to the indiscriminate use of force against innocents. The Weather Underground targeted government and military facilities - and warned potential victims prior to their actions. Their actions were criminal, but they were not Al Qaeda, the IRA, Baader-Meinhof, or the Red Army Faction. It devalues the serious nature of terrorism to slap the label on every misguided or even deranged person with a bomb.

Putting on my k-12 hat, the man may have radical views, but presumably members of AERA haven't found them to be a bar to his role in an organization focused on research. If AERA is too radical for some, they might form a separate group.

As a citizen of this free society, I also have something to say. To call someone who has never been found guilty of a violent crime, let alone terrorism - the highly charged word "terrorist," is to take political debate back to the atmosphere of McCarthyism. "If you don't agree with me, you must be a Communist - or in this case a terrorist (and I, by implication, must be a patriot)."

I don't agree with Mr. Ayers' politics or many of his views, but I'll be damned if I'm not going to protest actions and tactics that can only drag political discourse into the mud.

To paraphrase one historic response to Senator McCarthy - "Have you no shame?"

May 14, 2008

Mike Petrilli and the Meese Police

Earlier in the week, Mike "Milli" Petrilli asked if I "favor electing former terrorists to key positions of authority within the education research community." Here's the backstory: In his Memo to the AERA, Petrilli suggested that the AERA council should unseat Bill Ayers as Vice-President Elect of Curriculum Studies. I disagree. While I do not condone his actions, Bill Ayers was democratically elected, and the right of professional associations to self-govern should be respected.

Mike believes that Ayers' presence reflects badly on the whole association, but guilt by association is a shaky principle. I don't judge Mike Petrilli, whose colleagues at the Hoover Institution include upstanding guys like Ed Meese and Donald Rumsfeld, based on his association with them, nor do I believe that AERA is tainted by having Ayers among its leadership. Mike might argue that Meese and Rumsfeld have records of accomplishment that justify their affiliation with Hoover. The same is true regarding Ayers and AERA.

All that said, Mike deserves props for his memo to ED in '08's Roy Romer.

April 20, 2008

skoolboy on: The Status of the Status Quo in Education Policy

spiffboy2.jpg
Over at The Quick and the Ed, one of the many house organs of Education Sector, Kevin Carey is conducting a serial monologue belittling eduwonkette as an “alleged social scientist.” “Alleged”? Yeah, I’ll allege it – eduwonkette is a social scientist. It’s not an epithet, as much as Carey might believe; to some of us, it’s a way of life.

What’s the latest bee in Carey’s bonnet? It’s eduwonkette’s contention that particular value-added assessment systems for evaluating teacher performance are not ready for prime time. Carey views the claim that a particular policy alternative has identifiable flaws as tantamount to embracing a status quo that is demonstrably flawed. Public education clearly isn’t working, he argues. Therefore, any policy alternative to the status quo is to be preferred. Anyone who raises caveats about any kind of change is just an apologist for the status quo, a weasel, and probably a bed-wetter too.

The problem, Carey opines, is that social scientists such as eduwonkette – wait a minute, is she a social scientist or not? – have unrealistic standards for evaluating policy alternatives. “The standard in public policy isn't 95%,” he writes, “it's whatever is most likely to be best: 51%.” I’m not sure what the 95% refers to here, but most policy analysts I know are in the business of trying to recommend a policy alternative based on multiple criteria: the likely consequences of the alternative for various desirable outcomes; its cost; its feasibility and sustainability; its consistency with public values; and the likelihood of successful implementation. The hard reality is that there often isn’t a very strong evidence base for making these judgments, and policy analysts have to consider a range of possible outcomes along these criteria (a confidence interval that expresses the uncertainty about what might happen), and confront the tradeoffs, because invariably no single policy alternative looks best across all of these criteria. Simply having a good big idea—choice, accountability, charters, vouchers, whatever—isn’t enough to carry the day, because the devil of public policy is in the details. The world of policy analysis is littered with examples of good ideas that were implemented poorly, and thus did not have the desired effects—even though they were very costly initiatives.

For this reason, scholars of policy analysis (e.g., Eugene Bardach of the Goldman School of Public Policy at UC-Berkeley) almost always recommend considering allowing present trends to continue undisturbed as one of a set of policy alternatives intended to address a problem condition. Enacting an alternative that costs more than the current approach and doesn’t work is arguably worse than the status quo.

As for value-added assessment systems for evaluating teacher performance, we need to consider particular policy alternatives to the status quo in particular settings, not the big idea of value-added assessment for evaluating teacher performance (which both eduwonkette and I agree is promising). If I can return to the New York City case which eduwonkette has discussed at length, the one new issue with regard to policy analysis that I’d like to introduce is feasibility and sustainability. It’s my opinion—and I’m not a lawyer, just an alleged social scientist—that the New York City approach, which defines 50% of a particular teacher’s effectiveness on the basis of how that teacher’s students do in other teachers’ classes over which the teacher has no control, would not survive a legal challenge. Other policy analysts might disagree, and might therefore be more favorably disposed towards this particular alternative. Either way, though, good policy analysis considers feasibility and sustainability as important criteria in evaluating policy alternatives.

April 8, 2008

Why Do Journalists Love Shaky Science on Race?

BadScience128.gif
Let me preface this post by saying that I am predisposed to believe that peer effects influence students' success. But I am consistently frustrated that journalists pick up, run with, and extrapolate from poorly executed studies on the topic of "acting white" or "acting Black." Let's walk through two examples from the last month:

1) I've now seen two articles on this mess of a study published in Professional School Counseling. The articles feature headlines like, "Having a best friend of a different race can make a big difference in the academic achievement of black and Hispanic high school students, according to a University of Arkansas study." The study compared the achievement of students with same-race and different race best friends, and found that kids with different race best friends do better in school.

Never mind the well-known finding in the friendship selection literature that birds of a feather flock together - that is, kids self-select into friendships. Is it any wonder that kids choose friends of similar achievement levels, and that given current distributions of achievement and patterns of tracking and school segregation, higher achieving African-American and Hispanic kids are probabilistically more likely to choose a different race best friend if they are selecting friends with similar achievement levels?

Yet the authors appear totally oblivious to the causal inference problems raised by their study, and are ready to design interventions around these findings: "The researchers suggested that school counselors 'could create opportunities for students to interact with other students from different racial backgrounds in the hopes that they might develop friendships over time.' Peer mentoring programs could be one way to introduce struggling students from various racial groups to academically successful students of other racial groups." I'm all for creating spaces to nurture interracial friendships (though this is hard to do when kids attend racially isolated schools?!), but I wouldn't hold my breath on their achievement effects.

2) Consider the Ed Week article, Gifted Black Pupils Found Pressured to Underperform, which leads with, "Gifted black students who underperform in school may do so because of peer pressure to 'act black,' according to new research published this month in the journal Urban Education." (HT: Robert Pondiscio) The study, based on surveys of 166 gifted black students, asked students whether "they have ever heard the phrases 'acting white' or 'acting Black,'" among other questions.

Given how widespread pop conversations about these terms have been for the last 25 years, the authors unsurprisingly found that students associate the phrase "acting White" with school achievement, intelligence, and positive school behaviors and attitudes; most attribute acting Black to negative school achievement, low intelligence, and poor behaviors and attitudes. Furthermore, based on questions about being teased, the authors contend that gifted black students face peer pressure to perform poorly. The study did not link students' attitudes to student achievement, and did not compare these gifted students' experiences with high-achieving white students' experiences (who also report high rates of teasing - see here and here). Furthermore, the authors did not ask these students about their own racial identities, which are more likely to be associated with their own achievement. Yet the authors conclude with confidence, "this can and does contribute to the achievement gap." But the authors conducted no analyses linking achievement to students' attitudes about acting white or black!
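
To make the selection problem in the first example concrete, here is a minimal simulation sketch. It is not the study's data or method. Everything in it is an assumption for illustration: the two abstract groups "A" and "B", their sizes, the 0.8 standard deviation gap between their score distributions, the pool of 30 candidate friends, and the function name simulate_friend_selection. Students simply pick the candidate whose score is closest to their own, and friendship has no effect on anyone's score.

```python
import random
import statistics


def simulate_friend_selection(n_a=600, n_b=400, gap=0.8, pool=30, seed=0):
    """Hypothetical sketch: friends are chosen by similar achievement only.

    Returns the mean score of group-B students whose best friend is also
    in group B, and the mean score of those whose best friend is in group A.
    Friendship never changes a score, so any difference is pure selection.
    """
    rng = random.Random(seed)
    # Assumed score distributions: group B is shifted lower by `gap` SDs.
    students = [("A", rng.gauss(0.0, 1.0)) for _ in range(n_a)]
    students += [("B", rng.gauss(-gap, 1.0)) for _ in range(n_b)]

    same_group_scores, diff_group_scores = [], []
    for i, (group, score) in enumerate(students):
        if group != "B":
            continue
        # Each student meets a random pool of schoolmates...
        others = [j for j in range(len(students)) if j != i]
        candidates = rng.sample(others, pool)
        # ...and befriends the one with the most similar score.
        friend = min(candidates, key=lambda j: abs(students[j][1] - score))
        if students[friend][0] == "B":
            same_group_scores.append(score)
        else:
            diff_group_scores.append(score)

    return statistics.mean(same_group_scores), statistics.mean(diff_group_scores)


if __name__ == "__main__":
    same_mean, diff_mean = simulate_friend_selection()
    print("mean score, same-group best friend:     ", round(same_mean, 2))
    print("mean score, different-group best friend:", round(diff_mean, 2))
```

Under these assumptions, group B students with a different-group best friend come out with higher average scores even though friendship does nothing in the simulation. The gap is produced entirely by who selects whom, which is exactly why the study's comparison can't support a causal claim.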

We could invoke the standard explanation that journalists don't understand research, but there is plenty of research (bad and good) on structural causes of achievement gaps (e.g., boring stuff like prenatal care) that receives much less coverage. Journalists need a story that gels with the commonly accepted narrative about inequality, which focuses on individual responsibility for success and failure (see Americans' Attitudes on Inequality). Culture is much easier to write about than structure - the reasons why black kids show up to kindergarten 0.4 to 0.6 standard deviations behind white kids don't translate into a chatty crowd-pleasing story about why school isn't cool (HT: Joanne Jacobs).

What do you think? Are "acting white/acting black" stories over-reported?

April 2, 2008

AERA continued: The Teachings of Russ Whitehurst

In a talk last Thursday entitled, “Seven Things I've Learned About Education Research and Policy, Plus or Minus Two,” Russ Whitehurst, the Director of the Institute of Education Sciences, summarized what he’s learned about education policymaking during his seven years at IES.

1) The research community is oriented towards understanding, while the policy community is oriented towards action.

Researchers are often upset that their work is not defined as “policy relevant” (and thus not included in IES’s funding priorities). But they usually haven’t thought about what’s actionable in their own research. Whitehurst gave the example of a study in which the researcher coded classroom interactions between teachers and children of different ability levels, and found meaningful differences. When asked what the policy implications were, the researcher looked at him like a deer in headlights.

Said Whitehurst, “I’m not suggesting there’s anything wrong with that kind of scholarship,” but this quest for understanding rather than prescription is perplexing to policymakers. Researchers often defend research oriented towards understanding by appealing to the long arc of science. In response, Whitehurst argued that education is not a discipline in the sense that neuroscience is a discipline. Rather, education is more like transportation, in that it presents a series of problems that need to be solved.

2) Researchers operate under the logic of disconfirmation, while policymakers operate under the logic of confirmation, especially once they’ve committed to a course of action.

Once policymakers have signed on to an initiative, they are not looking for evidence that they’ve committed to the wrong program. Using NCLB as an example, Whitehurst explained that once NCLB was passed, the Department of Education necessarily had to shift from being a buyer to a seller of education policy. A series of complex policy decisions had to be made – e.g., establishing subgroup size and the percentage of students who could sit for alternate assessments. Whitehurst’s point was that the sweet spot of policymaking is where people are uncertain and uncommitted. Policymakers like to have the weight of research behind them, and it’s most effective to offer advice before they’ve publicly commented about the issue.

3) Much research that’s relevant to policymakers shouldn’t be because it’s too methodologically weak to be taken seriously.

Whitehurst lamented that he’s had to provide assessments of research to major newspapers, though the research “was so weakly done that it’s a shame that anyone had to spend time thinking about it.” Unfortunately, Whitehurst said, a report put out by a think tank is given the same weight as an article published in Science. Whitehurst argued that until policymakers no longer have to worry that what they’re reading is a political document rather than a research document, the relationship between educational research and educational policymaking will be troubled.

4) Demonstrating that popular programs don’t work is risky business.

No good evaluation goes unpunished, Whitehurst quipped. He provided the example of the Upward Bound evaluation, which found no effects of Upward Bound on college going, and discussed the subsequent shutdown of a randomized trial to further evaluate Upward Bound.

5) The combination of high stakes for policymakers and high uncertainty about what they can do generates unreasonable expectations for educational research.

While medical research has invested millions of dollars in the search for an AIDS vaccine, it has been unsuccessful. The medical community is willing to accept that research and progress take time, while there’s no understanding that identifying solutions takes time in educational research, too.

This was probably the best session I attended at AERA. His points weren’t particularly novel, but Whitehurst pulled them together coherently, if sometimes naively. For example, he worried that policymakers see research articles as political documents rather than research documents – but isn’t the choice of research questions and outcome variables political to begin with? I also expected more fireworks from the audience about funding priorities – only a few years ago, many researchers were stomping mad that their research was ineligible for funding.

March 28, 2008

Skoolboy Strikes Again: Research on Schools, Neighborhoods and Communities (& Value-Added Bonus!)

spiffboy2.jpg
AERA President-Elect Carol Lee moderated a Division G Vice-Presidential session Thursday entitled Research on Schools, Neighborhoods and Communities: Implications for Research Methods on Social Contexts. The participants were Shirley Brice Heath, Kris Gutierrez, Margaret Beale Spencer, and Steve Raudenbush. Heath and Gutierrez emphasized the cultural features of contexts in their remarks. Heath argued that a central task of educational research is studying the co-occurrence of contexts with specific behaviors. She made a case for quantitative data records that allow for comparisons across contexts and time periods, using a study of the role of language in the context of young children coming to think of themselves as scientists as an example.

Kris Gutierrez argued for the importance of studying the resources and constraints of ecologies that constitute families’ everyday lives, especially in nondominant communities. A key example she drew on was the difficulty of understanding behaviors without a deep understanding of the setting. For example, in one study, there was evidence that Latino children spent more time watching TV than did children in other groups. A conventional interpretation of this pattern might be that Latino parents are lax in not clamping down on this unproductive activity. But a deeper look might reveal that keeping children inside watching TV is an adaptive response to parents’ perceptions that their neighborhood is unsafe. Gutierrez suggested that a cultural view of human learning requires attention to the mechanisms that account for regularity, variation and change.

Margaret Beale Spencer and Steve Raudenbush focused on neighborhoods. Spencer noted the importance of cross-classifying the presence or absence of risks and protective factors; each of these four configurations represents a different context for children’s development. In a study she carried out in 41 Philadelphia-area high schools, she found that neighborhood characteristics affected the behaviors and perceptions of high school students. Neighborhood quality was associated with the fear of neighborhood risk. Moreover, youth from higher-quality neighborhoods perceive that teachers have higher opinions of them than do youth from lower-quality neighborhoods, and these perceptions may influence their school engagement and performance.

Raudenbush discussed a study he carried out with Rob Sampson on the effects of neighborhood disadvantage on the verbal skills of Black children. A major methodological problem is that individual risk factors are correlated with neighborhood risk factors, and Raudenbush skimmed over some fancy statistical footwork to make an argument for large neighborhood effects on cognitive achievement. Neighborhood poverty doesn’t tell the whole story: we can classify neighborhoods (i.e., census tracts) according to the percentage of the residents who are on welfare, who are poor, who are unemployed, and who are single parents, as well as the percentage in the neighborhood who are children under the age of 18. In Chicago, 24% of Black children live in the highest quartile of concentrated disadvantage. Shockingly, not a single white or Hispanic child lived in the highest quartile. Raudenbush linked his argument to William Julius Wilson’s book The Truly Disadvantaged, and suggested that one mechanism by which neighborhood disadvantage might stunt cognitive development is isolation from the academic English needed to succeed in school.

Spencer and Raudenbush’s presentations led me to think about the difficulty of constructing defensible value-added models of school and teacher effects on student learning and development. Both of them have documented that neighborhoods matter in ways that go beyond the simple demographic characteristics of students and the schools they attend, which are the customary inputs (along with prior achievement) in value-added models. I think we need to think about neighborhoods as contexts that represent affordances or constraints for student learning, and to control for these contexts in value-added models, because neighborhood characteristics are largely beyond the control of schools and teachers. Shirley Brice Heath’s comment on Raudenbush’s argument was that we need to build out-of-school time into the model, since kids spend a lot of time out of school in their families and communities, again in ways that may not be under the control of schools and teachers. I’d add that many kids are out of school during the summers, and yet value-added models generally rely on annual testing that is not synchronized with exposure to schools and teachers.
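
Here is a minimal sketch of why leaving neighborhood context out of a value-added model matters. It is a toy simulation, not any district's actual model. The names and numbers are all assumptions for illustration: simulate_value_added, a single "disadvantage" index per class, its effect size beta, and the spread of true teacher effects. It also makes two generous assumptions: teachers are assigned independently of neighborhood, and disadvantage is measured without error.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den


def simulate_value_added(n_teachers=200, class_size=25, beta=-0.5, seed=0):
    """Hypothetical sketch: student gain = teacher effect + neighborhood + noise.

    Returns how well a naive estimate (class mean gain) and an adjusted
    estimate (class mean gain with neighborhood disadvantage regressed out)
    each correlate with the true teacher effects.
    """
    rng = random.Random(seed)
    true_effect, disadvantage, class_gain = [], [], []
    for _ in range(n_teachers):
        effect = rng.gauss(0.0, 0.2)   # assumed spread of true teacher effects
        d = rng.gauss(0.0, 1.0)        # assumed neighborhood disadvantage index
        gains = [effect + beta * d + rng.gauss(0.0, 1.0) for _ in range(class_size)]
        true_effect.append(effect)
        disadvantage.append(d)
        class_gain.append(statistics.mean(gains))

    # Naive value-added: class mean gain relative to the overall mean.
    grand_mean = statistics.mean(class_gain)
    naive = [g - grand_mean for g in class_gain]

    # Adjusted value-added: residual from a one-variable least-squares fit
    # of class mean gain on neighborhood disadvantage.
    d_mean = statistics.mean(disadvantage)
    slope = sum((d - d_mean) * (g - grand_mean)
                for d, g in zip(disadvantage, class_gain))
    slope /= sum((d - d_mean) ** 2 for d in disadvantage)
    adjusted = [g - grand_mean - slope * (d - d_mean)
                for d, g in zip(disadvantage, class_gain)]

    return pearson(naive, true_effect), pearson(adjusted, true_effect)


if __name__ == "__main__":
    naive_r, adjusted_r = simulate_value_added()
    print("correlation with true teacher effect, naive:   ", round(naive_r, 2))
    print("correlation with true teacher effect, adjusted:", round(adjusted_r, 2))
```

In this toy setup the adjusted estimates track the true teacher effects more closely than the naive ones, because the naive estimates partly reward or penalize teachers for where their students live. Real data are messier: if teachers sort across neighborhoods, or if the neighborhood measure is noisy, a simple adjustment like this one would not cleanly separate the two.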

AERA Quote of the Day Finale

cigars.jpg
Overheard at AERA:

"I'm sorry [that I'm leaving], but I've got to get to Rick Hess's bachelor party."

Ed Research Angst: An AERA Challenger?

munch01.jpg
Researchers spend a lot of time at AERA bemoaning the heterogeneous quality of the work presented. After a few glasses of wine, someone will suggest that the dissatisfied band together and start an organization to compete with AERA. Few realize that this has already happened, albeit quietly, with the founding of the Society for Research on Educational Effectiveness with support from the Institute of Education Sciences. Here's more detail:

The Society for Research on Educational Effectiveness (SREE) was formed to provide an organizational infrastructure that supports and promotes research focused on cause-and-effect relations important for education. The field of education research has always worked to construct a foundation of knowledge upon which educational practices may be reliably based. For nearly a century now, the American Educational Research Association has been the main professional organization that has supported and disseminated the work of education researchers. While recognizing the great contribution that AERA has made and will continue to make to education, many in the field of education have expressed the need for a more narrowly focused research organization.

The advisory board is stacked with heavy hitters, and folks have big aspirations for turning its flagship journal, the Journal of Research on Educational Effectiveness, into the educational equivalent of the Journal of the American Medical Association. To be sure, educational research should not be limited to the study of the causal effects of interventions. But AERA has not exercised the quality control that it should and, quite frankly, I'm frustrated. For the disenchanted, SREE now offers a promising complement - or alternative - to AERA.

One more session, and I'm done for this year. Stay tuned for summaries of the Dropout Factories session, the Russ Whitehurst talk, a session on K-2 literacy coaches, yesterday morning's vice-presidential session on families and neighborhoods, and this afternoon's session on charters and choice.

Bonus AERA Quotes of the Day

rotherham-russo-peas.gif
Norm Scott sends along some bonus quotes from this morning's ed blogs session. Alexander Russo sums it up here and here. From these snippets and Alexander's summaries, it appears that our bloggy boys threw some barbs above the panel table.

"I urge all of you [researchers] to get in the fight." (Russo - on the need for more researchers to engage with journalists and the blogosphere)

"Be humble." (Andy Rotherham, on relaying what your study does and does not say to reporters)

"Sometimes [the blogs] seem like an echo chamber. We don't want to say it's what everyone is talking about it if [the education blogs] are just six people talking to each other." (Jenny Medina, New York Times)

"Alexander offers a remarkably unsophisticated view of social science." (Rotherham)

March 27, 2008

AERA Quote of the Day: Thursday

whitehurst2.jpg
"There may be a nirvana 100 years from now where we can slap policymakers into jail if they don't have enough research to support what they are doing."

-Russ Whitehurst (Director, Institute of Education Sciences)

AERA Round-Up: Wednesday