
The NYC Teacher Experiment Revisited

Over at the Ed Sector, there's some confusion about my concern with the ethics of the NYC teacher experiment (see here). To be clear, my problem is not that NYC is collecting value-added data. As I have written before, standardized tests have a role to play in teacher assessment alongside holistic evaluation of teachers' effectiveness. But as eduwonk himself noted, the methodological issues are hairy and as yet unresolved.

The concern expressed in my earlier post was that this experiment was conducted in secret and, in my opinion, in violation of generally accepted human subjects policies. The entire enterprise of social science relies on potential study participants trusting researchers to minimize risks and fully disclose the purpose of their study. Every time a gaffe like this happens, it undermines researchers' ability to build trust with study participants in the future. Let's review the chronology:

1) In September, an academic experiment headed by two very talented researchers, Jonah Rockoff (Columbia Business School) and Tom Kane (Harvard Grad School of Ed), was announced. It was presented as an experiment intended to generate academic knowledge, not to inform human resources decisions in real time. (You can watch a video of a study recruitment session here.)

2) Academic research is bound not only by common sense research ethics, but by the conventions of university Institutional Review Boards. What this means is that when academic researchers conduct research intended to produce generalizable knowledge - i.e. if researchers want to publish off of these data - the experiment has to proceed within generally accepted research ethics and a university IRB has to approve it. (Even if this was not an academic research project, the DOE should have notified teachers of an intervention of potential consequence for them. After all, the data are not just being collected, but distributed to principals in the experiment's treatment group.)

IRBs are primarily concerned with the harm that researchers could do to subjects by intervening in their lives, and applicants to IRBs must demonstrate that their project poses minimal risks, that participants have been notified of these risks, and that participants have consented to the research. Teachers did not need to consent in this case, as they are government employees and their employers can collect whatever data they want.

However, it is difficult for me to understand how one could justify not notifying the teachers in the study. After all, the information given to their principals - which, given the ongoing methodological problems with value-added, may or may not be accurate - has the potential to permanently change their principals' perceptions of them and their future employment prospects. Moreover, this treatment is not being applied universally to NYC teachers. By simply having the bad luck to be selected into the study's treatment condition, some teachers are affected and others are not.

It is important to note that a "live experimental" study like this one is different from the secondary data analysis studies that eduwonk cites. He wrote:

By that logic, all these various studies with panel data, choice studies using lotteries, etc...all constitute human experimentation and are wrong.

Studies based on secondary data analysis are fundamentally different - and are treated differently by IRBs - because researchers are analyzing "dead" data that have no effect on real people's lives. Ongoing research projects in which interventions are made in real people's lives are held to a different standard. And should be.

3) According to Edwize and the NYT article, teachers were not notified of the study. What went wrong is that at some point this went from an academic study to a human resources project that Chris Cerf wants to take prime time. Perhaps he misspoke, or the NYT article had this wrong, but it appears that these data, collected under the auspices of an academic research study, may be used as early as June. As eduwonk noted, simply gathering the data is not a problem. The problem is that under the cover of "academic research," data are being given to principals in ways that affect teachers' future employment without teachers' knowledge.

The irony, of course, is that none of this would be a big deal if the project had been announced to teachers. When I watched the recruitment session video back in September, it didn't seem like a big deal at all. I made a note that this was an interesting experiment conducted by two researchers whose work is first rate, and assumed that the experiment would proceed under normal conditions (i.e. full disclosure of the study). For reasons I don't fully understand, it didn't. And here we are.

There's much more to say about the methodological and broader philosophical issues with value-added measures. I'll follow up with a post on these issues later.

Update: eduwonk and I continue our bridging differences exercise. He wrote:

Her position here would be a lot more compelling if (a) this were an actual experiment in the way she and other anti-Klein partisans are seeking to describe it rather than what it is. In addition --and again-- the fact is that we don't know what they are doing with the data so at this point all these leaps to various consequences are unfounded.

But we do know what they are doing with the information, at least in the context of this experiment (and, as I have explained above, it is an experiment). Principals in the treatment group are given value-added data reports on each of their teachers. These principals' perceptions of teachers' academic effectiveness are thus affected - correctly or incorrectly - by this information. Saying "principals can't use it" is like trying to strike evidence from the record in a courtroom. Jurors' perceptions are already influenced, and the damage is done.

The rhetorical strategy of Rotherham & Carey is classic. (a) eduwonkette compared the NYC experiment to Tuskegee; (b) Tuskegee was awful, and the comparison is inappropriate and preposterous; (c) therefore, whatever concerns eduwonkette might have are also inappropriate and preposterous. Of course eduwonkette never did actually say that Tuskegee was comparable to NYC, and in fact went out of her way to say that she wasn't equating the two; but somehow saying that was interpreted as exactly the opposite.

Another study that offers an interesting parallel with regard to human subjects concerns is the old Rosenthal and Jacobson (1968) study, Pygmalion in the Classroom. In Pygmalion, teachers were told that a handful of students in their classes had been identified as late bloomers based on a standardized test, and students' performance was tracked over time to see whether the students who were (randomly) identified as late bloomers did better academically than those who were not. The experimental manipulation, then, was presumably of teachers' expectations for students' performance (although there are questions about whether the experiment actually did manipulate teachers' expectations in the way that was claimed). Would an IRB allow such a study now? I doubt it, largely due to concerns about the impact on students. And yet the students in Pygmalion aren't the research participants; the teachers are.

This looks a lot like the Rockoff-Kane scenario to me, where principals are getting information that is intended to manipulate their expectations for, and evaluations of, teachers' performance. The presumed difference is that the data being given to principals are in fact accurate and objective, whereas the data provided to the teachers in Pygmalion were fictitious. But the assumption that the data provided to principals truly are an accurate indicator of teachers' contribution to student learning is highly questionable, both in the abstract, given the methodological challenges in value-added assessment, and in how the data are likely being represented to principals in the reports they are receiving.


But the thing that confuses me is that for the Pygmalion experiment to be interesting, the data had to be fictitious. You had to know that any observed correlation was the result of the information given to teachers, and not the abilities of the kids themselves.

But the NYC experiment gives principals potentially meaningful data, and then looks for what???

The reporting on this has been kind of spotty, but here's my guess. Both treatment and control groups of principals will be asked to rate teachers, and within the treatment group, the researchers can see if there is a correlation between the value-added measure and principals' ratings. (If so, that's evidence that the principals are incorporating the value-added information into their ratings of teachers' "skill" or "quality".) Then, next year, it will be possible to see if the ratings of the principals in the treatment group are a better predictor of next year's value-added score than the ratings of the principals in the control group. If so, then this year's value-added information has increased the ability of a principal to judge a teacher's future performance - at least in terms of that teacher's contribution to the standardized test score on which the value-added info is based.
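For concreteness, the guessed-at two-step analysis can be sketched as a small simulation. Everything here is hypothetical: the teacher effects, the noise levels, and the assumption that treatment principals blend the value-added report into their ratings are invented for illustration, not drawn from the actual study design.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)

# Hypothetical setup: each teacher has a "true" effect on test scores;
# this year's and next year's value-added estimates are noisy measures of it.
n = 200
true_effect = [random.gauss(0, 1) for _ in range(n)]
va_this_year = [t + random.gauss(0, 1) for t in true_effect]
va_next_year = [t + random.gauss(0, 1) for t in true_effect]

# Control principals rate teachers from classroom observation alone (a noisy
# signal of the true effect); treatment principals - by assumption - blend
# their observation with the value-added report they were given.
obs = [t + random.gauss(0, 1.5) for t in true_effect]
rating_control = obs
rating_treatment = [0.5 * o + 0.5 * v for o, v in zip(obs, va_this_year)]

# Step 1: within the treatment group, do ratings track the value-added reports?
print("treatment rating vs. this year's VA:",
      pearson(rating_treatment, va_this_year))

# Step 2: do treatment ratings predict next year's value-added better than
# control ratings do?
print("treatment rating vs. next year's VA:",
      pearson(rating_treatment, va_next_year))
print("control rating vs. next year's VA:  ",
      pearson(rating_control, va_next_year))
```

Under these made-up parameters the treatment ratings correlate more strongly with next year's value-added scores than the control ratings do, which is exactly the comparison the second step of the guessed design would test. Whether that holds in the real data depends entirely on how reliable the value-added measure actually is.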
