Who Slipped a Mickey in John Merrow's Kool-Aid?
You can't swing a fish anymore without hitting a glitch in value-added models. Try some of the papers from the recent Wisconsin Value-Added Conference on the complexities of measurement error (value-added models must contend with measurement error in both last year's and this year's scores), interval scaling (few tests are scaled so that one unit of growth at the bottom of the scale means the same thing as one unit of growth at the top), and non-random assignment (see Jesse Rothstein's new paper on just how large these biases can be). Or you can refer to these earlier posts:
* skoolboy on: The Status of the Status Quo in Education Policy
* More Signs of the Apocalypse! (More on NY's Teacher Tenure Law)
* After NY's Teacher Tenure Law, Blogosphere Plays Union Pinata
* My Value-Added Bucket List
* Do Value-Added Models Add Value? A New Paper Says Not Yet
* The Oops Factor in Measuring Teacher Effectiveness
* Ignoring the Great Sorting Machine
* No Teacher is an Island
* What Does It Mean for a Teacher to Be Good?
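To see why measurement error in both years' scores matters, here is a minimal simulation sketch (the error SD and true growth numbers are hypothetical, chosen only for illustration): because an observed gain is the difference of two noisy test scores, the error variance of the gain is roughly the *sum* of the two tests' error variances.

```python
import random

# Illustrative sketch (hypothetical numbers): observed score = true score +
# measurement error, so a one-year gain accumulates error from BOTH tests:
# Var(gain error) = Var(e_last) + Var(e_this).
random.seed(0)

n = 100_000
err_sd = 5.0       # hypothetical measurement-error SD of a single test
true_gain = 10.0   # hypothetical true one-year growth

gains = []
for _ in range(n):
    e_last = random.gauss(0, err_sd)   # error on last year's test
    e_this = random.gauss(0, err_sd)   # error on this year's test
    gains.append((true_gain + e_this) - e_last)  # observed gain

mean_gain = sum(gains) / n
var_gain = sum((g - mean_gain) ** 2 for g in gains) / n

# The error variance of the gain comes out near 2 * err_sd**2 = 50,
# i.e. double the error variance of either test alone.
print(round(mean_gain, 2))
print(round(var_gain, 1))
```

In other words, differencing two tests to measure growth makes the noise problem worse, not better, which is part of what the conference papers wrestle with.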
Alexander Russo, also commenting on Merrow, makes the mistake of equating teachers' evaluation of students via tests and quizzes with the evaluation of teachers via students' test scores. It's just a bad comparison. Teachers give tests, assignments, reports, homework, and so on in order to evaluate students and to see what they've learned. These measures are part of an extended, interactive process through which a teacher hopes to move students forward; the purpose is not simply to label a student "good" or "bad" based on one assessment. But when we evaluate teachers based on students' scores, the teacher is judged on a narrower set of skills than the students are, and high stakes are attached to a single test. The intent of the process is different, too: few value-added plans are designed to help teachers improve; they focus instead on assigning rewards and sanctions.
The measurement issues are also different. Over an elementary school year, a teacher probably collects 900 data points on each student's performance (say, 5 a day over 180 school days); with teacher value-added, we end up with 20-25 data points per teacher per year. Teacher value-added is, in short, a low-precision enterprise. Readers, what do you think of Alexander's comparison?
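The precision gap above can be put on a back-of-the-envelope footing: the standard error of a mean shrinks with the square root of the number of data points. A minimal sketch, assuming a common unit spread in whatever is being measured (the 900 and ~22 figures are the post's own illustration; everything else here is an assumption for scale):

```python
import math

# Back-of-the-envelope sketch: SE of a mean = sd / sqrt(n).
sd = 1.0  # assume unit spread in the underlying measure

se_student = sd / math.sqrt(900)  # ~900 classroom data points per student
se_teacher = sd / math.sqrt(22)   # ~22 student scores per teacher

print(round(se_student, 3))  # 0.033
print(round(se_teacher, 3))  # 0.213

ratio = se_teacher / se_student
print(round(ratio, 1))  # the teacher-level estimate is ~6x noisier
```

On these toy numbers, the teacher-level average is several times noisier than the student-level one, which is the "low-precision enterprise" point in a single line of arithmetic.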
Happy weekend, everyone!