Recently in Evaluation Category

October 30, 2009

Duncan Calls for Multiple Measures in Evaluation

Education Secretary Arne Duncan made a particular point yesterday of underscoring that teacher evaluations should be based on "multiple measures" that would include student achievement alongside other factors, such as peer evaluations.

He was speaking at a conference here in Washington for state officials hosted by the National Comprehensive Center on Teacher Quality.

Frankly, the multiple-measures comment shouldn't come as a big surprise if you've been paying attention. A number of other ED officials have made the same point in other forums. But a lot of the state officials told me they were nevertheless glad to hear the message. They noted that the Race to the Top proposed criteria make a big deal about incorporating student achievement, but are silent on other teacher-evaluation criteria.

Perhaps all of those comments worrying about whether value-added is ready for prime time hit home at the Education Department. If I were a betting man, I'd wager that the final Race to the Top guidelines will retain the requirement that test scores be factored into evaluations, but also acknowledge that they shouldn't be the sole measure for rating a teacher.

Duncan did stress, though, that the student-achievement element is the one that's missing from most evaluation systems. "We don't look at student work at all, we're a zero there," he said.

He demurred when asked whether he could point to a model evaluation system: "I'm hesitant to call one out because people think that that's it."

American Federation of Teachers President Randi Weingarten must be doing a victory lap around her office about now. She's been expounding upon the multiple-measures theme for months.

(Weingarten's stance on this issue has developed, too. Last year around this time she was adamant that "we have a moral, statistical, and educational reason" not to use test scores in evaluations. Now, her union is helping to fund projects to explore how it might be done fairly.)

October 05, 2009

D.C. Unveils Complex Evaluation System

I've finally had a chance to take a look at Washington, D.C.'s new teacher-evaluation system, known as IMPACT, which generated a lot of buzz for being among the first in the nation to incorporate student test scores as part of the teacher rating. (Race to the Top, anyone?)

To be fair, IMPACT is not all about test scores: the evaluation system also includes other pieces, such as scores on a "Teaching and Learning Framework," an extensive set of observational measures similar to Charlotte Danielson's Framework for Teaching, or the rubrics used by the New Teacher Center or the Teacher Advancement Program. Teachers in the district will be observed five times before a final rating is generated, three times by a building administrator and twice by an outside "master evaluator" who is a subject-matter expert and does not report to the building administrator.

There is even a "core professionalism" component to measure whether teachers show up on time and to ensure they don't go missing without an excuse, no doubt to counteract problems with chronically absent teachers.

The Washington Post has a pretty good write-up on IMPACT here, but one thing the story doesn't really convey is that IMPACT really is more a composite of 20 different evaluation systems. There are standards for teachers who teach in tested subjects, and those who do not, standards for counselors, for instructional paraprofessionals, for non-instructional paraprofessionals, even for custodians.

So, if you're teaching in grades 4-8 in reading or math, 50 percent of your rating is based on an "individual" value-added score and 40 percent on observational ratings aligned to the Teaching and Learning Framework. But if you're in a subject without an accountability test in place, then 80 percent of your rating is based on the TLF and 10 percent is based on a non-value-added assessment chosen by the teacher, such as a unit test from an approved textbook. Special-ed teachers are rated in part according to their ability to turn out well-crafted individualized education plans. You get the idea.

Almost all of the teachers will have at least 5 percent of their evaluation based on schoolwide (not individual) growth.
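
For readers who want to see how such a weighting scheme plays out, here is a minimal sketch in Python. It is an illustration only, not the district's actual formula. The component names, the 1-to-4 score scale, and the choice to divide by the sum of the listed weights are all my assumptions. The percentages are the ones described above, which add up to 95 in both cases, so the remaining 5 percent is left unassigned here.

```python
# Weights (in percent) as described in the post. The component names are
# illustrative labels, not official IMPACT terminology.
WEIGHTS = {
    "tested": {
        "individual_value_added": 50,
        "tlf_observation": 40,
        "schoolwide_growth": 5,
    },
    "untested": {
        "tlf_observation": 80,
        "teacher_chosen_assessment": 10,
        "schoolwide_growth": 5,
    },
}


def unlisted_share(group):
    """Percent of the rating that the weights listed above do not account for."""
    return 100 - sum(WEIGHTS[group].values())


def weighted_rating(group, scores):
    """Weighted average of component scores that share one scale.

    Assumption: the result is normalized by the sum of the listed weights,
    so the unlisted remainder is simply ignored.
    """
    weights = WEIGHTS[group]
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Hypothetical teacher in a tested grade, scored on an assumed 1-to-4 scale.
example = {
    "individual_value_added": 3.0,
    "tlf_observation": 2.0,
    "schoolwide_growth": 4.0,
}
print(round(weighted_rating("tested", example), 2))
```

Swapping the value-added and observation scores in the example changes the result only modestly, which shows how much the outcome depends on the exact split between the two largest components.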

The American Federation of Teachers and the Washington Teachers' Union are already on the record as not liking this new system. It's probably worth noting that the district was not obligated to consult with the WTU in crafting it. But the AFT has expressed discomfort with using test scores beyond the building level, and research is certainly not unequivocally supportive of individual value-added measures.

Though IMPACT was not collectively bargained, Rhee did meet with a bunch of focus groups of teachers while she was developing it. In the preamble to the IMPACT guidelines, she says that the system is first and foremost supposed to provide a pathway to teacher-effectiveness growth, and not just serve as an accountability measure.

But with hundreds of layoffs going on right now (many of which the union says could have been avoided had the district not hired so many new teachers this summer), I wonder how many teachers are going to believe her.

September 14, 2009

Ed. Dept. Official Says Teacher Evaluations Shouldn't Rest on Test Scores Alone

Vis-a-vis this recent blog item, Education Department official Judy Wurtzel apparently won plaudits today from educators for reiterating that teacher evaluations should be based on several different measures of performance, not on test scores alone. (She was speaking at the Association for Supervision and Curriculum Development's legislative conference, which I've been following on Twitter.)

Wurtzel, the deputy assistant secretary for planning, evaluation, and policy development at ED, added, however, that such data should not be excluded from the evaluation process.

Now that that's cleared up, the question becomes to what extent test scores (or other indicators of student growth) should be weighted in making determinations of teacher effectiveness. I went back through the draft Race to the Top guidelines to find out, and unfortunately the language here is fairly vague. Student-growth data should be "a significant factor" in such decisions, the proposed criteria state.

Now just what does that mean? Some teachers and administrators, no doubt, would think that basing 10 percent of an evaluation on test scores would be significant. But that's a far cry from specifying that the student data should make up the majority of the data sources weighed in an evaluation or be made the preponderant criterion. (Those terms leave no doubt that such data would make up at least 50 percent of the rating.)

Does ED want to leave this decision up to states and districts to determine? That's a possibility, but seems a bit odd given that the RTTT application is so detailed in all other respects.

If you support the use of these measures, how much weight do you think they should be given?

August 20, 2009

On Student Achievement and Teacher Evaluations

We're evidently headed to a lot of wrangling on this topic, given the focus on student-teacher data in the Race to the Top proposed criteria. So, once again Teacher Beat provides you with a cheat sheet to help you make sense of it.

First off, we must start by assuming, as the federal government does, that it is appropriate to consider student achievement at least to some degree in evaluating teachers. (I fully realize there are people and groups out there who vociferously disagree. If you are one of them, I invite you to leave a comment below to tell us all why, but this would be a short blog item if we didn't start from that assumption.)

Next, how do we define student achievement? This is the place where things really start to get dicey, because most of the annual testing is done in math and language arts. But only perhaps a third of teachers explicitly teach those subjects. So how do we get estimates about student performance in non-tested grades and subjects?

The National Council on Teacher Quality, in this report on Colorado's bid for the Race to the Top funding, elaborates on a few interesting alternatives. It suggests randomly sampling student work, as long as these samples are reviewed independently and audited centrally to ensure consistency.

As for test scores, probably the most promising option is to use "value-added" models that track growth over time rather than absolute proficiency levels, so that teachers aren't penalized off the bat for having poor-performing students.

Now, we've all heard that value-added estimates of teacher performance are problematic. The estimates of a teacher's effectiveness can vary from one year to the next. Sometimes tests aren't appropriately scaled to give good estimates; and the models are typically better at identifying outliers (very good or very weak teachers) than making finely-graded distinctions in the middle.

Still, there is a possibility of reducing error here by focusing only on the top and bottom teachers and comparing results over time (i.e., if you are a bottom-quartile teacher for three consecutive years, something's wrong).
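
To make the "three consecutive years in the bottom quartile" idea concrete, here is a rough sketch in Python. It is not drawn from any district's actual system. The function names and the data layout are hypothetical, and I assume every teacher has a score in every year and that the bottom quartile is simply the lowest quarter of teachers, rounded down.

```python
def bottom_quartile(scores_by_teacher):
    """Return the teachers whose scores fall in the lowest quarter for one year."""
    ranked = sorted(scores_by_teacher, key=scores_by_teacher.get)
    cutoff = len(ranked) // 4  # assumption: round the quartile size down
    return set(ranked[:cutoff])


def persistently_low(yearly_scores, years_required=3):
    """Flag teachers who land in the bottom quartile several years in a row.

    yearly_scores is a list of {teacher: score} dicts in chronological order.
    Assumption: every teacher appears in every year.
    """
    streak = {}
    flagged = set()
    for year in yearly_scores:
        low = bottom_quartile(year)
        for teacher in year:
            # Extend the streak for a low year; reset it otherwise.
            streak[teacher] = streak.get(teacher, 0) + 1 if teacher in low else 0
            if streak[teacher] >= years_required:
                flagged.add(teacher)
    return flagged
```

Requiring a run of low years, rather than a single one, is what dampens the year-to-year noise: a teacher who dips once and recovers is never flagged.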

Additionally, such scores could be compared to scores on measures conducted by trained observers (principals and/or peer teachers) that describe, for instance, whether a teacher effectively engages students in content, makes the purpose of the lesson clear, and engages in formative assessment to ensure students have mastered concepts.

Finally, we have this important question: Just how reliable should we expect teacher-evaluation systems to be? What margin of error are we willing to accept? Right now, districts lean toward one end, rating nearly all teachers as proficient, even those who are very poor. Clearly we don't want to go the other way, either, and misidentify scores of good teachers.

But if we expect a system to be infallible we're probably going to be disappointed. As any good scientist will remind you, measurement comes with error. Are stakeholders, especially teachers and teachers' unions, willing to accept a system that is highly reliable but not perfect? (If 95 percent of judgments are accurate, is that high enough? What if 90 percent are accurate?)
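
To put those percentages in head-count terms, here is a back-of-the-envelope calculation in Python. The district size of 4,000 teachers is a made-up figure for illustration, and the function name is mine.

```python
def expected_misjudgments(num_teachers, percent_accurate):
    """Number of ratings expected to be wrong at a given accuracy rate."""
    return round(num_teachers * (100 - percent_accurate) / 100)


# Hypothetical district of 4,000 teachers.
for pct in (95, 90):
    print(pct, expected_misjudgments(4000, pct))
```

In that hypothetical district, 95 percent accuracy still means 200 teachers rated incorrectly, and 90 percent means 400.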

Now that I've put all that out there, let's hear your thoughts. Is this doable, or should we all give up and go home?

August 06, 2009

Peer-Assistance and -Review: The Toledo Numbers

There was a bit of a mini-controversy in June when the New Teacher Project released its Widget Effect report.

But it wasn't the report's overall thrust that did it. Pretty much everyone agreed that our current systems for evaluating and offering assistance to struggling teachers are crummy.

The controversy was about the data on dismissals in one particular district: Toledo, Ohio. According to the district's personnel records, Toledo dismissed one tenured veteran and did not renew five novice teachers' contracts, the NTP reported.

A number of sources wrote me afterward to ask: What about the district's much-heralded Peer-Assistance and -Review model? Aren't the dismissal numbers much higher than that? Most people were merely confused, but some accused the NTP of willfully skewing the data.

And the American Federation of Teachers put out this press release:


"While the overarching conclusions of the report are sound, we have concerns about the report's data, particularly with respect to teacher evaluations in Toledo, Ohio. Toledo has a highly regarded teacher evaluation system ... that produces much better results than those described in this report."

So what gives? Well, I've been talking to folks on both ends and I've started to do some digging around the numbers. To put it kindly, the data-gathering is really a mess. What it boils down to is that the district and the Toledo Federation of Teachers had entirely different ways of "coding" dismissals through the PAR system.

For instance, the district doesn't appear to record teachers who resigned after being in the program as having been dismissed, but the union counts them as such. Also, it appears that the PAR program applied to some long-term substitute teachers. These teachers wouldn't have necessarily shown up in personnel records, but the union may have counted them within the overall PAR figures. And finally, the union used the term "terminated" teachers to refer both to nontenured teachers who were so poor that they were let go before their contracts were up, and to tenured veterans who were put into PAR and ultimately dismissed. The district, by contrast, separated out tenured from non-tenured teachers.

There's a lesson here for any district or union that wants to try peer review: Agree on common definitions and change your data systems accordingly, or it will be really hard to justify your numbers.

OK, I know you want the actual figures. Well, I'm not going to publish what I've got for a couple of reasons. First, my information isn't complete, and second, it's my understanding that officials from TFT and NTP are working together on trying to audit the numbers and come up with figures they can agree on. I don't want to insert myself in that process.

But count on the fact that we'll be bringing you updates once we know what's what.

Does the Toledo PAR really produce "much better results" than other evaluation systems, as AFT has asserted? We'll see.

July 02, 2009

Duncan's NEA Speech Mirrors Stance Taken in Stimulus

To answer the question I'm sure you all have: Yes. Teachers booed and hissed during some of the performance-pay portions of Secretary of Education Arne Duncan's speech. And they weren't overwhelmingly happy with the talk of reform to seniority and tenure systems, either.

But some of the stories I've seen around the Web on the speech are billing this as "tough love" for the teachers' unions. There was some of that, sure, but President Barack Obama and Duncan clearly telegraphed their intentions to push hard on these issues in the stimulus legislation, and that passed months ago.

So there was an element to this whole proceeding that came off as a little bit rehearsed to me. I wonder if Duncan had prepared his seemingly ad-libbed line for when the booing started: "You can boo; just don't throw any shoes, please." And I'm pretty sure most of the delegates had gotten their vocal cords ready, too.

To me, the biggest news out of the speech is that the administration is increasingly emphasizing student achievement as one measure of teacher pay or evaluation, although not the only measure. That is a big issue, and it's one that helped sink congressional attempts to renew the No Child Left Behind Act in 2007.

Also, large parts of the speech seemed to key directly off of the stimulus legislation. When Duncan talked about seniority putting some teachers in schools and classrooms they're not prepared for, well, that gets to the equitable-distribution-of-teachers language in the stimulus. When he talked about the poor state of evaluations, well, that lines up with the language that will require states and districts to report the number and percentage of teachers scoring at each performance level on local evaluation instruments.

Check back at edweek.org soon for a full story.

June 29, 2009

Washington Post Article on Peer Review

As a reporter, it's always irritating to discover that another paper has beaten you to a story you've had in mind, in this case following a teacher through the peer-assistance and -review process. Nevertheless, this Washington Post article is a pretty thorough look at things in Montgomery County, Md., and includes a glimpse at the PAR panel that makes the call on whether to renew teachers or proceed with dismissal.

June 09, 2009

Duncan: Test Scores and Evaluations Not Mutually Exclusive

Not long ago, I did a story pointing out that some states have passed laws that basically prohibit the linking of student- and teacher-data systems. New York and California are the high-profile examples.

Presumably, these data could inform a variety of different initiatives, both low- and high-stakes: performance-based pay, teacher evaluations, tenure decisions, professional development, and the determination of which teacher colleges produce the strongest graduates.

Now, it looks as though dismantling these firewalls might be a prerequisite for qualifying for "Race to the Top" discretionary funds, reports my colleague Michele McNeil over at Campaign K-12.

Education Secretary Arne Duncan admonished states that have prevented student-achievement data from being linked to individual teachers, apparently throwing Wisconsin's name into the mix, at a recent address at an education research conference here in Washington.

June 08, 2009

Using Teacher-Evaluation Data

The New Teacher Project had a really interesting study out not long ago on teacher evaluation that found that pretty much all teachers get high ratings on local evaluation instruments. This is something of a portent of things to come, since one of the stimulus assurances will probably deal with this piece of data.

See my write-up of the TNTP study for additional details and some feedback from teachers, union officials, and so forth.

One interesting element in the report that I didn't include in my story has to do with where these records are kept. Of the 12 districts TNTP examined, only five of them (Denver; Chicago; Elgin, Ill.; Rockford, Ill.; and Cincinnati) keep electronic evaluation data. For most other districts, the results of evaluations basically sit in folders in the district HR office, never to be looked at again by anyone.

Imagine the possibilities if these evaluations actually meant something and districts could sort through the results of such evaluations electronically and analyze them in various ways. Not for punitive purposes, mind you, but to do a better job of offering professional development at the district level to supplement what teachers, coaches, and administrators are doing within schools.

If a lot of teachers are struggling with how to model math problems, for instance, that could be the beginning of a district PD program or even a new math initiative.

If your district does something like this, send me an e-mail! I'd love to hear from you.

April 08, 2009

Common Ground on D.C. Evaluations?

The Washington Post has this story up about the new teacher-evaluation system that D.C. Chancellor Michelle Rhee and her team are devising.

The story does a good job talking about the benefits and perils of a "value added" system that uses test-score growth to estimate teacher effectiveness, a model I've written about before. But it doesn't elaborate on one of the most interesting pieces Rhee has proposed: to use a system of "impartial master teachers" to observe and evaluate teachers' practices, rather than a principal.

At a recent research conference in Washington sponsored by the National Council on Teacher Quality, Rhee gave a few more details about how this system of teacher observations might work. The master teachers wouldn't have ties to particular schools and would be grade-level and subject-matter experts, she said, so that teachers are evaluated by someone who knows the content area and the grade-level expectations of that teacher, rather than by an administrator who might not have experience in the teacher's area.

The WaPo story notes that the city pretty much holds the reins on the teacher-evaluation system, which means that the Washington Teachers' Union/American Federation of Teachers can't do much to protest it apart from filing an unfair labor-practice complaint. AFT President Randi Weingarten is said to want to include the teacher-evaluation system under the scope of bargaining. Although Rhee has gathered input from teachers, that's still a far cry from a collectively bargained evaluation system. I can definitely imagine Weingarten resisting a system that isn't negotiated as part of the contract.

On the other hand, Weingarten recently told me that neither test scores nor principal observations should be the only factor for determining teacher performance. Instead, evaluations should be based on multiple measures. And since Rhee's system incorporates both growth-based test scores and the aforementioned master-teacher observations, perhaps there's some common ground here.

Let's hope we'll have more details on this soon. At the NCTQ conference, Rhee said she and the AFT were hammering out dates to return to the bargaining table. Don't touch that dial...
