
Through the lens of social science, eduwonkette takes a serious, if sometimes irreverent, look at some of the most contentious education policy debates. (Find eduwonkette's complete archives prior to Jan. 6, 2008 here.)


December 17, 2008

NYC's Trojan Horse

trojan%20horse.jpg
skoolboy has absolutely nothing of substance to say about Education Secretary nominee Arne Duncan, whom he has met exactly once. But he continues to mouth off about New York City's Teacher Data Reports, the NYC Department of Education's version of value-added assessment. Which are not to be used to evaluate teacher performance. But rather for instructional improvement. Excuse me, skoolboy has something in his eye.

It's hard not to view these Teacher Data Reports as a Trojan Horse. Just how is a tool that is designed for capacity-sorting supposed to function for capacity-building? After all, a teacher value-added measure might tell us something useful about which teachers are more or less successful in raising their students' test scores, but it tells us nothing about the specific instructional practices that account for their relative success.

How are Teacher Data Reports supposed to improve instruction? In her videotaped comments to teachers, Amy McIntosh, the Chief Talent Officer at NYC's Department of Education, says, "These reports will provide information that will help teachers and school leaders gain insights about important aspects of a teacher's practice ... Whether individual teachers have a greater influence on the learning of some groups of students than on others ... Finally, we can see what teachers might benefit from development focused on, say, the needs of English language learners, and which teachers might be best positioned to lead that kind of professional development ... We also think they will ... help you think about how you can share the techniques you use with your colleagues in your school or across the city."

Hmm. So the specific strategies for improving teaching practice are what, exactly? Having more successful teachers lead the professional development of less successful teachers? Expert practitioners don't always make expert coaches. Hall-of-Fame pro basketball player Isiah Thomas--unquestioned as one of the best point guards of all time--was a mediocre coach for the Indiana Pacers and New York Knicks.

Here's why. Teaching is an extraordinarily complex activity, with teachers making thousands of decisions in the course of their work. Successful teachers make many good decisions and some bad decisions, whereas less successful teachers make many bad decisions and some good decisions. But the capacity to reflect on one's practice and figure out which of those decisions are good and which are bad is exceedingly rare, as is the capacity to share this knowledge with others. In the absence of this reflective capacity, we're all prone to attribute our successes and failures to our pet theories, which may or may not be correct. A Teacher Data Report that provides reassurance that a teacher is successful will only solidify and reinforce a personal folk theory about the reasons for that success.

Yet the Teacher Data Report provides no evidence whatsoever about why a teacher is successful--the many daily practices that promote student learning. And if a teacher's personal theory is inaccurate, then sharing it with others will not improve instruction, nor student achievement. It could even make things worse, focusing attention on ineffective practices. A tool like the Teacher Data Report that claims to be useful for increasing teachers' capacity to teach students effectively, but instead is only useful for ranking teachers on their effectiveness, is a modern-day Trojan Horse.

December 15, 2008

Don't Think about Elephants

elephant-klein.jpg

"Don’t think about elephants," skoolboy’s father used to joke, long before George Lakoff’s manifesto with a similar name. The joke, of course, is that by trying not to think about elephants, all that you can think about is elephants. The harder I tried not to think about elephants, the more I thought about them.

The New York City Department of Education has its own variation. This month, the DOE is sending Teacher Data Reports, which purport to estimate the effect of individual teachers in grades 4-8 on students’ test scores, to school principals, who will then distribute the reports to their teachers after the principals have been trained. "The Teacher Data Reports are not to be used for evaluation purposes," wrote Chancellor Joel Klein and UFT President Randi Weingarten in an October letter to teachers. "That is, they won’t be used in tenure determinations or the annual rating process. Administrators will be specifically directed accordingly." Similarly, the Frequently Asked Questions section of the DOE’s Teacher Data Tool Kit website poses the question "How can you be sure that principals won’t use the Teacher Data Reports to evaluate teachers?" The response: "Principals have been and will continue to be explicitly instructed not to use Teacher Data Reports to evaluate their teachers. The DOE has standard processes in schools for teachers to raise issues or concerns."

And yet. From the Frequently Asked Questions on the DOE’s Teacher Data Toolkit website: "By isolating individual teachers’ contributions to student progress, the Teacher Data Reports provide valuable information to school leaders and teachers about where to focus instructional improvement efforts. …Teacher Data Reports provide information about how individual teachers’ efforts influence student learning … A sophisticated multivariate regression analysis based on NYC data from 1999-2008 determined how much to weigh each factor [to calculate students’ predicted gains] … A panel of technical experts has approved the DOE’s value-added methodology. The DOE’s model has met recognized standards for demonstrating validity and reliability. Teachers’ value-added scores from the model are positively correlated with both School Progress Report scores and principals’ perceptions of teachers’ effectiveness, as measured by a research study conducted during the pilot of this initiative."

In other words: The Teacher Data Reports rely on sophisticated statistical techniques that are valid, reliable and approved by experts, and they isolate an individual teacher’s contributions to student learning. But, you principals who are under tremendous pressure to increase test scores or face losing your jobs, don’t you dare think about using these Teacher Data Reports to evaluate teachers.

Don’t think about elephants.

November 19, 2008

The ATR Deal: An Acknowledgement that Teacher Price Incentives Aren't All They're Cracked Up to Be?

hand-shake.jpg
Following up on a long discussion last spring about teachers displaced from their schools and not rehired - teachers who are part of New York City's "Absent Teacher Reserve" - the city and the union have reached a deal. Principals will not have to pay more for hiring more experienced teachers (for eight years), and will also receive a cash incentive, equal to half of a starting teacher's salary, for hiring a teacher from the ATR.

In effect, this deal undoes a central - and in my opinion, unfortunate - component of weighted student funding, through which it costs principals more to hire an experienced teacher than an inexperienced one. Without a doubt, experienced teachers have been remaining in the pool longer than their less experienced counterparts. What the union and the DOE have debated is whether this is a cost issue or a quality issue. Some figures from the original New Teacher Project report: Because of seniority rules, 44% of teachers excessed in 2006 had 0-3 years of experience, while 22% of teachers in this pool had 13+ years of experience. Of the 235 teachers who remained unplaced as of December 2007, only 25% of these teachers had 0-3 years of experience, while 42% had 13+ years of experience. (See graph below.)

Now we'll have a strong test of the claim that these are "bad teachers" that no principal wants to hire. But beyond the ATR issue, it's worth thinking about how the deal - and principals' reactions to it - may affect the future of teacher price incentives in New York City and beyond. Sure, experienced teachers should be more evenly distributed across schools, but I've never seen any evidence that making principals pay more for them is going to achieve that outcome. In the worst case scenario, we end up with a tragedy of the commons dilemma in which individual principals, each acting in their own short-term interest, end up turning experienced teachers away from their schools, and the collective impact is to push them out of the district altogether.

NTP%20graph.jpg

November 13, 2008

The NYC High School Progress Reports Meet Credit Recovery

Yesterday, the NYC Department of Education released its high school progress reports - 83% of high schools received A or B grades. Like the K-8 reports, 60% of the grade is based on "student progress," which in the case of high school includes credit accumulation.

We know that many students fall off the wagon, so to speak, early in high school and fail enough courses that it is hard for them to catch up. So tracking students' credit accumulation closely - and intervening when students fall behind - makes a lot of sense.

But holding schools accountable for credit accumulation creates a number of perverse incentives, and readers have provided a number of examples of how this is unfolding in their schools. The central issue is how "credit recovery" is being used - and in some cases, abused. For the uninitiated, credit recovery involves "letting those who lack credits make them up by means other than retaking a class or attending traditional summer school." (See this NYT article on credit recovery.)

Teachers have complained that they've been pressured to change grades (more than they have in the past) because of these credit accumulation measures. Other teachers have reported that students who fail a course in the first term are allowed to sign "contracts" that promise that their grade will be changed to a passing grade if they attend tutoring two hours a week, but tutoring attendance is never monitored. Still other teachers have reported that students who've failed their courses are given simple tasks - e.g., a packet of math problems - that they can complete to get credit.

Readers know well that I generally come down on the side of keeping kids in school (see this exchange about the dropout age, for example). But some of these wild year-to-year jumps in the fraction of students earning 10+ credits do make me wonder what's happening with credit recovery on the ground. For example:

* At the Secondary School for Journalism, 6% of first year students earned 10+ credits last year; this year, 60% did.

* At the Rachel Carson High School for Coastal Studies, 11% of first year students earned 10+ credits last year; this year, 68% of students did.

* At Canarsie High School, 10% of first year students earned 10+ credits last year; this year, 41% did.

* At the Law, Government and Community Service High School, 17% of first year students earned 10+ credits last year; this year, 44% did.

* At the Cobble Hill School of American Studies, 29% of first year students earned 10+ credits last year; this year, 57% of students did.

It is, of course, possible that changes in the student population from year to year may explain some of these jumps, or that schools made substantial changes that led to real increases in course passing rates. It could also be the case that students are better off in a world where educators are paying close attention to credit accumulation, even if it does lead to some practices that many educators would frown upon. Alternatively, we could end up making schools look better than they really are, and students could find themselves in the lurch when they come face-to-face with kids who actually did master these courses.

At the very least, let's hope that reporters make use of the new data available and try to find out how these increases are being produced. Readers, where do you draw the line on credit recovery? Got insights on how credit recovery is being used in NYC schools or elsewhere? Leave a comment below.

November 12, 2008

School Progress Grade Effects on NYC Achievement: Tame, Fierce, or a Hot Mess?

winters_photo.jpg

skoolboy ventured into the rarified air of NYC’s Harvard Club yesterday to hear Marcus Winters present his new Manhattan Institute research on the effects of the 2006-07 New York City School Progress Reports on students’ 2008 performance on state math and English tests in grades four through eight. The analysis uses a regression-discontinuity design, capitalizing on the fact that schools received a continuous total score summarizing their performance on school environment (15%), student performance (30%) and student growth (55%), but there are firm cut-offs that distinguish schools receiving an F from those receiving a D, those receiving a D from those receiving a C, etc. This means that there might be schools that are very similar in their total scores, and presumably on other school characteristics, on either side of a given cut-off, allowing researchers to study the test-score consequences of obtaining a specific letter grade.
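The logic of comparing schools on either side of a cut-off can be sketched in a few lines. This is a deliberately simplified illustration, not Winters' actual analysis: the scores, test-score gains, cut-off and bandwidth below are all made-up values, and a real regression-discontinuity study would fit regression lines on each side of the cut-off rather than compare simple averages.

```python
from statistics import mean

# Hypothetical (total_score, later_test_gain) pairs for eight schools.
# These numbers are invented for illustration; they are NOT Progress Report data.
schools = [
    (28.0, 3.1), (29.5, 2.8), (30.4, 3.4),   # just below the cut-off
    (31.2, 2.0), (32.0, 2.3), (33.5, 1.9),   # just above the cut-off
    (15.0, 4.0), (60.0, 1.0),                # far from the cut-off
]

CUTOFF = 31.0     # assumed boundary between two letter grades
BANDWIDTH = 3.0   # assumed: how close to the cut-off a school must be to count


def rd_estimate(data, cutoff, bandwidth):
    """Difference in mean outcome between schools just below and
    just above the cut-off (lower-grade mean minus higher-grade mean)."""
    below = [y for x, y in data if cutoff - bandwidth <= x < cutoff]
    above = [y for x, y in data if cutoff <= x <= cutoff + bandwidth]
    return mean(below) - mean(above)


effect = rd_estimate(schools, CUTOFF, BANDWIDTH)
print(f"estimated effect of landing just below the cut-off: {effect:.2f}")
```

Schools far from the cut-off are ignored on purpose: the design only trusts comparisons between schools whose total scores are nearly identical.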

The two tables below summarize the impact of the Progress Report grades on student math and English proficiency, respectively. Both tables contrast the consequences of getting an A, B, D or F with a reference category, a C grade. A green up-arrow indicates that students in a school that received a particular Progress Report Grade did better than students in C schools, whereas a red down-arrow indicates that students did worse than students in C schools. An X indicates that student performance did not differ significantly from that of students in C schools at the p<.05 level.

Winters-Math.jpg


Winters-ELA.jpg

There’s a lot of X’s. In math, students in F schools did better than students in schools receiving higher grades, although this seems to be primarily due to an effect in grade 5. Students in D schools also did better than those in schools receiving higher grades, also due to their advantages in grade 5, apparently. In English, the letter grade a school received did not have any consequences for student performance.

Although both Winters and discussant Jonah Rockoff were careful to note limits both to the analyses and to what they can tell us about the incentive effects of accountability systems, both characterized the results as pretty clear evidence that schools reacted to receiving an F or a D in ways that boosted student achievement. This was particularly noteworthy, they argued, because so little time had elapsed between when a school learned that it had received a D or F and when students were tested (January for English, and March for mathematics).

Well, yeah, the short time between receiving the grade and the testing is certainly an issue, and surfaced as the likely explanation for why no effects of the School Progress Report grades were found in English. But skoolboy is still worried about math. There were no statistically reliable consequences for getting a D or an F in grades 4, 6, 7 and 8; only in grade 5 is there a test-score boost. How are we to make sense of this? If the letter grades are such a powerful incentive, wouldn’t they affect the performance of students in all of the grades in a school, not just fifth-graders?

Cool person Amy Ellen Schwartz posed a very smart question from the audience. "What about those A and B schools doing worse than the C schools in 5th grade math? What does that mean?" she asked. The panelists didn’t want to address that head-on, in skoolboy’s view, but he will: Looking at 5th grade mathematics, there’s as much evidence of the receipt of an A or a B causing a school to coast as there is evidence of the receipt of a D or an F causing a school to be more productive. Probably not a popular interpretation among the true believers in the power of incentives in the room.

But the bigger story is one of what Winters called "tame" effects. No effects of the School Progress Report grades in English, and limited evidence of effects in Math. A short time-horizon between the “treatment” of receiving the grades and student testing. Ambiguous incentives, both positive and negative, associated with the grades. A very weak theory of how the grades would be expected to increase student performance. It’s a wonder that Winters found anything at all.

A last point: Winters suggested that there were dire predictions that schools would "give up" if they got low Progress Report grades, and his findings, he said, did not show that. Although there were editorials at the time of the initial release of the Progress Reports last fall expressing concern that schools might be stigmatized by getting a C, D or F when students were performing at generally high levels, I question whether anyone thought that schools, and the educators who work in them, would "give up." The more predictable reaction, which I think was borne out, was that principals, teachers and parents would simply not believe the Progress Report grades accurately characterized what they saw on a day-to-day basis. A lot of stakeholders don’t believe that the Progress Report grades are reliable measures of school performance, and given what eduwonkette and I have shown about the instability in the student progress measures at the heart of the system, those beliefs are well-founded.

A brief version of the research can be found here. The technical version is now available at the same location.

November 10, 2008

Race, Ethnicity, and the Gifted and Talented Pipeline in New York City

“I’m convinced that there are gifted and talented children in all communities, and that we need to make sure that they avail themselves of the opportunities.”
-Joel Klein, October 30, 2007


In the last two weeks, the New York Times reported on two striking trends in gifted and talented enrollments in New York City. Not only have the new admissions requirements to G&T programs reduced the representation of poor, African-American, and Hispanic students in these programs at the elementary level, but we see growing racial disparities in the composition of specialized high schools like Bronx Science, Stuyvesant, and Brooklyn Tech, schools that require a competitive entrance exam.

To be sure, the trend in specialized high school composition sits uncomfortably with the DOE's claim that they are closing the achievement gap separating black and Hispanic students from their white and Asian peers, "in some cases by half." As many others have found, it appears that the achievement gap at the top of the distribution - i.e. the gap separating high-achieving white and high-achieving black students - is even larger than the average gap between the two groups.

Let's take a closer look at enrollment trends in three of the selective schools. Starting with Bronx Science, we see that the school has a shrinking proportion of African-American students, dipping from 9% in 1999 to 4% this year. The most notable trend here is the increase in the percentage of students who are Asian from 46% in 1999 to 61% in 2008.

bronx%20science.jpg

On to Stuyvesant: Even in 1999, only a small fraction of Stuyvesant students were African-American (3.7%) and Hispanic (3.9%). This year, 2% of Stuy students were African-American and 3% were Hispanic. Again, the most striking trend is the increasing fraction of the student body that is Asian - from 47.8% in 1999 to 68% in 2008.

stuy.jpg

Finally, we see that the proportion of students that are African-American and Hispanic has decreased at Brooklyn Tech. In 1999, 24% of Brooklyn Tech students were African-American; today, 13% of Brooklyn Tech students are. In 1999, 13% of students were Hispanic; today, 8% are.

brooklyn%20tech.jpg

These figures do raise some questions about pipeline issues. The Times article makes a lot of the Specialized High School Institute, which starts in 6th grade and helps to prepare students for the test. But only 12% of African-American students and 14% of Hispanic students that attended this institute were ultimately offered admission to a specialized school, while 52% of Asian and 42% of white students were.

Readers, a number of thoughts to kick around: Might 6th grade be too late to start leveling the playing field? What role might the changes in the gifted and talented admissions policy at the elementary level play in shaping admissions to the specialized high schools in the future? Or does the composition of the selective schools in NYC matter at all?

October 23, 2008

Educational Malpractice? Why NYC School Progress Reports Deserve an F

Apologies for being AWOL, folks - both skoolboy and I are on the road. Below, you can find the beginning of an op-ed about the NYC Progress Reports that the two of us wrote for our local West Side Spirit - the link to the full text is below.
Each fall, many New Yorkers head to family physicians for an annual physical. Doctors record some standard measures—body temperature and blood pressure, for example—and perhaps draw some blood to send to the lab. Doctors will also ask about changes in health over the last year. Only after considering all of this information will they make a holistic assessment and recommend an appropriate treatment plan.

This fall, the New York City Department of Education is releasing its version of the annual check-up for schools: the School Progress Reports. September brought the reports for approximately 1,000 elementary, K-8 and middle schools, with the high schools coming shortly. The progress reports assigned these schools a letter grade ranging from A to F, based mostly (60 percent) on their contribution to students’ test scores from last year to this year. The progress report letter grades drive the “treatment plan” for the schools: schools which receive an A or a B are eligible for cash rewards, whereas those receiving a D or F face eventual restructuring or closure.

In doctors’ offices, we count on lab tests, X-rays and other reliable measures of our health. Can we count on the progress reports in the same way?

Our analyses of last year’s and this year’s progress reports suggest that we cannot.
Click here to read the rest.

October 10, 2008

Sol-lywood! Stern Hits the Big Screen on Mayoral Control

In this interview with Simon Doolittle at AfterEd, Sol Stern explains where mayoral control went awry, tries to sell you some stock in Lehman Brothers, and gives Mayor Bloomberg a gentleman's C. Watch a clip below, or check out the whole 15 minute interview here.

October 9, 2008

Driving Michelle Rhee, Plus: Joel Klein Needs Your Help!

ask%20about%20ach%20gap.jpg
If you just got laid off from Lehman, do I have an edu-job for you!

Michelle Rhee needs a "safe, prompt, reliable and comfortable driver to assist [the] Chancellor with her daily schedule and a variety of duties. The incumbent’s primary responsibility is the safe operation of DCPS vehicles for the purpose of transporting the Chancellor to and from events in accordance with the daily itinerary of events."

Do all superintendents of big districts get drivers? I had no idea. But just think about the splashy, tell-all book you could write! I'm out of the running as I have no license, but you can find the job posting here. Or if you're in the market for something different, you might apply for the "Critical Response Team."

Speaking of job postings, Juan Gonzalez wrote a column in the Daily News yesterday about the increasing administrative headcount at the NYC Department of Education. Even though there is a hiring freeze, Gonzalez reported that there are 30 new jobs posted at Tweed with precious titles like, "Knowledge Management Domain Leader for Leadership & Organizational Management" ($170,000). It reminds me of college, where there were gazillions of clubs - coalitions supporting homeless guinea pigs and what have you - so everyone could be the president of something.

With an 18% growth in jobs at Tweed over the last three years, I realized that these guys must be running out of catchy job titles fast, and could probably use our help. Plus, Michelle Rhee will eventually need job titles that sparkle, too.

Help these kids out, folks, and submit a job title below. For the first time ever, there is a real, live prize involved (I've been reading too much Roland Fryer, obviously) - the achievement gap tee shirt pictured above. Boys, don't fret - it comes in other colors. Get your entries in by Monday, October 13th at 5pm. Here are some ideas to get you going:

* Senior Blackberry Correspondent
* Director of Achievement Gap Termination
* Truth Squad Captain
* Senior Finder of Efficiencies
* Chief Term Limit Obliterator
* Senior Transcriber of Diane Ravitch's Remarks

October 2, 2008

A Course in Statistics at Columbia: $3186. The NYC DOE's Comment on Confidence Intervals: Priceless!

two%20headed.jpg
You have to hand it to the New York City Department of Education's Department of Assessment and Accountability. You really do.

Yesterday morning, the NY Times reported that the DOE will now distribute teacher value-added reports to teachers and principals.* Here's the thing - the value-added reports don't just report that a teacher performs at the 65th percentile, the 25th percentile, etc. Instead - as they should - the DOE reports a confidence interval around each teacher's value-added to represent the uncertainty of the estimate. And unsurprisingly, these confidence intervals are quite wide. A 65th percentile teacher in this example has a confidence interval ranging from the 46th to the 84th percentile. In providing this range, the DOE is formally acknowledging that we do not know if this is a below average, average, or above average teacher.

So imagine my surprise when DOE Accountability Czar Jim Liebman popped over last night to criticize skoolboy's post showing that the report cards are flawed because they don't address the uncertainty of the estimates of school progress. To Liebman, this is so much sound and fury.

Morning Jim Liebman, it's my pleasure to introduce you to Evening Jim Liebman. You two are going to be BFF. Or frenemies. It's hard to tell. Though you are the same person, you have remarkably conflicting views about what role uncertainty should play in accountability reports. Maybe it's the coffee.

What beats me is why the DOE reports confidence intervals in their teacher reports if uncertainty is just a concern of pesky bloggers like eduwonkette and skoolboy. By providing confidence intervals on the new teacher reports, the DOE basically concedes that their school grading system is bunk. Oops!

* Sidenote: I question the wisdom of giving teachers and principals information that is likely to be inaccurate, based on all of the reasons articulated here (in short, New York's testing schedule is problematic), but that is for another post.

October 1, 2008

Why skoolboy Is Uncertain about the NYC School Progress Reports

It’s election season, which means that we’re being inundated with polls. The reporting of poll results drives statisticians nuts, because the press often reports the percentage of those surveyed who favor one candidate or another, without taking into account the poll’s margin of error. The margin of error is a way of quantifying the uncertainty in the poll numbers, because even a well-designed poll that surveys a random and representative sample of the population is going to generate an estimate of the true proportion of those in the population who favor a particular candidate. The general rule of thumb is, the more information available in a sample, the less uncertainty in the estimate. A smaller batch of information will yield a more uncertain, or imprecise, estimate than a larger batch of information. This is as true for estimates of the relative performance of schools and teachers—whether in the form of a complex value-added assessment model or a simple percentage—as it is for political polls.
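The rule of thumb about sample size can be made concrete with the standard textbook formula for the margin of error of a proportion from a simple random sample. This is only a sketch: the function name, the 52% support figure and the sample sizes are made up for the example, and real polls adjust for weighting and design in ways this ignores.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 52% favor a candidate, at three sample sizes.
for n in (100, 400, 1600):
    moe = margin_of_error(0.52, n)
    print(f"n={n:5d}  margin of error = +/-{100 * moe:.1f} points")
```

Because the sample size sits under a square root, quadrupling the number of people surveyed only halves the margin of error.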

With apologies to anyone who’s had an introductory statistics course, suppose that we were trying to estimate the average age of the teachers in a very small school, one with only four teachers, but we can only draw a sample of three of the teachers to estimate that average. The four teachers are 25, 30, 30, and 55 years old, and the true average age is (25+30+30+55)/4=35. If our sample was the teachers who are 25, 30 and 30, our estimate of the average age of teachers in the school would be (25+30+30)/3=28.33. If our sample was the teachers who are 30, 30 and 55, our estimate of the average would be (30+30+55)/3=38.33. It’s a simple example, but it shows that different samples drawn from a given population can produce quite different estimates, which can be some distance away from the true population value. You wouldn’t want to place too much confidence in a particular estimate if you knew that another, equally valid sample of the same size could generate an estimate that was quite different.
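For readers who would rather check the arithmetic by machine, here is a minimal sketch that lists every possible sample of three teachers from the four-teacher school and the average age each one produces. The variable names are mine; the ages are the ones in the example.

```python
from itertools import combinations
from statistics import mean

ages = [25, 30, 30, 55]      # the four teachers in the example
true_mean = mean(ages)

# Every possible sample of three teachers. Samples are chosen by position,
# so the two 30-year-olds count as different people.
sample_means = [mean(sample) for sample in combinations(ages, 3)]

print("true average:", true_mean)
for m in sorted(sample_means):
    print(f"sample estimate: {m:.2f}")
```

None of the four possible samples recovers the true average of 35; the estimates land on both sides of it.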

That same logic applies to estimates of school and teacher performance, such as the New York City School Progress Reports. Most of the elements of the Progress Reports are estimates (for an explanation why, see here), but the calculation of the overall letter grades, which receive so much attention, does not take the uncertainty in these estimates into account. Today, I’ll show that using the 2008 School Progress Reports.

One of the indicators of student progress on the School Progress Reports is the percentage of students who made a year’s worth of progress in English (ELA) and in math from 2007 to 2008. In a given school, each child who was tested in both years can be classified as having made a year’s worth of progress or not, and by totaling up those students who made a year’s worth of progress and dividing by the number of students who were tested in both years, a percentage can be calculated. (There’s an additional wrinkle for students who transferred from one school to another, but it doesn’t affect the logic I’m writing about.)

Each school is compared to a group of 40 peer schools that are judged to be similar based on their demographic and other characteristics. A school’s percentage of children making a year’s progress in ELA is compared to the highest and lowest values in its peer group, and the school gets a peer horizon score that represents its location between the high and low peer group values. For example, if a school had 55% of its students make a year’s progress in ELA, and the percentage for the lowest school in its peer group was 47%, and the percentage for the highest school in its peer group was 71%, the school was located one-third of the way between the lowest and highest schools (8 percentage points above the minimum, out of a possible 24 percentage points above the minimum in the peer group). That peer horizon score of .33 would be multiplied by the 5.625 points that this component is worth in the calculation of the overall letter grade of the school, yielding a net contribution of 1.875 to the school’s overall score.
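The worked example can be written out as a short calculation. This sketch follows only the arithmetic described in this post; the function and constant names are mine, and the DOE's actual procedure may include steps not covered here.

```python
def peer_horizon(school_pct, peer_min, peer_max):
    """Position of a school between the lowest and highest values
    in its peer group, on a 0-to-1 scale."""
    return (school_pct - peer_min) / (peer_max - peer_min)


COMPONENT_WEIGHT = 5.625  # points this component carries, per the post

score = peer_horizon(55, 47, 71)      # 8 points above the minimum, out of 24
points = score * COMPONENT_WEIGHT

print(f"peer horizon score: {score:.3f}")
print(f"points toward overall score: {points:.3f}")
```

Note that the result depends entirely on three numbers (the school's percentage and the peer group's minimum and maximum), each of which is treated as if it were known exactly.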

The problem is that this calculation doesn’t take into account the fact that all of these percentages are estimates. The chart below looks at one elementary school in particular, Senator John Calandra School (08X014), and compares it to its peer group of 40 schools. At Calandra, 58.3% of the students made a year’s worth of progress in English in 2008. But the standard error of that percentage is 3.5 percentage points, which means that it’s possible that Calandra's true percentage could be anywhere from 51.3% to 65.3% (two standard errors on either side of the estimate), a wide range. (This range is shown in the “error bars” above and below the estimated percentage for each school.) The same is true for most of the other schools in the peer group. In fact, only two of the 40 schools in the peer group (the ones with the blue markers in the chart) have a percentage that we are confident is higher than Calandra’s percentage. For the other 38 schools in the peer group, we can’t rule out the possibility that Calandra’s percentage is equal to the estimated percentage in those schools. There’s a tremendous amount of overlap among these schools.
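Here is a rough sketch of the error-bar comparison. Calandra's percentage and standard error come from the post; the two peer schools are hypothetical, and I am assuming for simplicity that they share Calandra's standard error. Checking whether error bars overlap is only an approximate criterion, and a formal test of the difference between two percentages could give a somewhat different answer.

```python
def interval(pct, se, k=2.0):
    """Range of k standard errors on either side of an estimated percentage."""
    return (pct - k * se, pct + k * se)


def overlaps(a, b):
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


calandra = interval(58.3, 3.5)
print(f"Calandra: {calandra[0]:.1f}% to {calandra[1]:.1f}%")

# Hypothetical peer schools (not from the DOE data), same standard error assumed.
for name, pct in [("Peer A", 66.0), ("Peer B", 74.0)]:
    peer = interval(pct, 3.5)
    verdict = "cannot be distinguished from" if overlaps(calandra, peer) else "is distinct from"
    print(f"{name} ({pct}%) {verdict} Calandra")
```

A peer school nearly eight points ahead of Calandra still has error bars that overlap Calandra's, which is why so few schools in a peer group can be confidently ranked against one another.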

08X014.JPG
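For readers who want to reproduce the error bars, here is a minimal sketch. It assumes the bars span two standard errors on either side of the estimate, which matches the 51.3% to 65.3% range quoted for Calandra. The two peer schools are hypothetical, and the overlap check is just one simple way to ask whether two schools can be told apart; it is not necessarily the test behind the chart.

```python
def plausible_range(pct, std_err, n_se=2.0):
    """Estimate plus or minus n_se standard errors, in percentage points."""
    return (pct - n_se * std_err, pct + n_se * std_err)


def ranges_overlap(a, b):
    """True if two (low, high) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Calandra's figures from the post: 58.3% with a standard error of 3.5 points.
calandra = plausible_range(58.3, 3.5)

# Two hypothetical peer schools (not real data), assumed to have the same standard error.
peer_close = plausible_range(66.0, 3.5)
peer_far = plausible_range(74.0, 3.5)

print(calandra)
print(ranges_overlap(calandra, peer_close))  # ranges overlap: hard to tell apart
print(ranges_overlap(calandra, peer_far))    # no overlap: plausibly a real difference
```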

And yet Calandra received a peer horizon score of .463, and other schools in the peer group whose percentages of students making a year’s worth of progress in English did not differ statistically from Calandra received peer horizon scores ranging from .169 to .903. Calandra’s peer horizon score of .463 counted for 2.6 out of a possible 5.625 points toward the overall score on the School Progress Report. Other peer schools whose percentages did not differ significantly from Calandra’s received from 1.0 to 5.1 points out of a possible 5.625 points on this component of the overall score. Differences of this magnitude could easily make the difference between an overall grade of A and of B, or of B and of C—just due to chance. An accountability system such as the New York City School Progress Reports that doesn’t acknowledge the importance of chance and uncertainty is fundamentally misleading the public about its ability to distinguish the relative performance of schools. Some schools are likely doing significantly better than other schools; the problem is that the School Progress Reports don't provide enough information to judge which ones.

September 30, 2008

Vanity Fair

Rest assured that this blog will not run out of troubling things to write about anytime soon.

NYC-New-Clothes.jpg

September 29, 2008

Guest blogger Betsy Gotbaum on: The Future of Mayoral Control

Betsy Gotbaum is the Public Advocate for the City of New York. The Public Advocate is an independently elected citywide official who serves as a public ombudswoman.

Six years ago, Mayor Michael Bloomberg accomplished what those before him could not: he gained control of New York City public schools, a fragmented, famously troubled bureaucracy that now has about 1.1 million students, 80,000 teachers, 1,450 schools and a budget that, at more than $21 billion, is larger than that of several states. When the New York State Legislature authorized mayoral control in 2002, it added a sunset provision, which takes effect next June. At that time, the Legislature will decide whether mayoral control should continue.

I believe that it should.

I also believe that the law should be amended in certain important ways. Last year, Catherine Nolan, chair of the education committee in the New York State Assembly, asked me to appoint a School Governance Commission to assess mayoral control. Over the course of a year, this independent Commission heard testimony from more than 100 individuals representing broad and diverse constituencies, hosted parent forums, and held public hearings. It also commissioned eight academic papers from experts on mayoral control of schools, which in turn shed light on how the process has worked in other cities. (These papers are to be published as an edited volume, When Mayors Take Charge: School Governance in the City, by Brookings Institution Press.)

In its final report, the Commission recommended that mayoral control be maintained. It also recommended changes to ensure greater public accountability as well as meaningful input from parents and the community.

For some time, it's been clear that we need better oversight of city Department of Education (DOE) finances. We also need better oversight of certain DOE-produced data. I enthusiastically endorse the idea that the city’s Independent Budget Office serve as an outside evaluator to monitor and assess such DOE data as test scores and graduation rates. And, since the DOE spends billions of tax dollars, it must follow the same procurement procedures as other city agencies, including bidding protocols created and monitored by the city comptroller. Since 2003, we have seen the DOE give away more than $300 million by skirting the competitive bidding process. The era of no-bid contracts must end.

The DOE has ignored parents, community leaders and others who have a valid stake in the ways and means of educating New York kids. Virtually shut out of the decision-making process, these stakeholders have been unable to provide meaningful input about issues that directly affect their children’s education.

This regrettable DOE attitude must change. While overall mayoral control should continue, it should be flexible enough to include a certain amount of decentralized authority. This is needed to address such local problems as enrollment, school transfers and school bus routes. Also, given the immense size of the school system and the DOE bureaucracy, parents desperately need someone who's knowledgeable, effective, and locally based to consult when problems or questions arise. Toward this end, the local geographic school districts that were created decades ago should be re-established. They should include superintendents with adequate staff and explicit oversight over principals in a given district.

The Commission recommended that Community District Education Councils (CDECs) should continue as well, though a process should be developed to give them meaningful input into decisions about budgets, general education practices and the opening and closing of schools. I support this recommendation, which provides a valuable tool for local involvement. I also believe that eligibility criteria for CDEC membership should be expanded.

To maintain mayoral control, the mayor must continue to appoint the majority of members of the Panel for Education Policy, a 13-member group that, among other things, reviews education policies proposed by the Chancellor. However, members should be appointed for fixed terms, which would ensure their independence. As it is now, they can be fired at will, and they have been, for disagreeing with the Mayor and Chancellor. I also believe that Panel members should select their own chairperson; currently the Chancellor serves as chair of the PEP. Further, the Panel should be composed of members with relevant backgrounds and a stake in the education system.

While the Commission sets down the groundwork for stronger community and parent participation in school governance, much more needs to be done. It's difficult to legislate greater opportunities for community input, but the state's Contracts for Excellence model for parental involvement, cited in the report, is a good start. It's also an approach with which the State Senate has experience.

I’m opposed to one Commission recommendation, that the Panel for Education Policy be involved in collective bargaining agreements. Third-party approval would undermine collective bargaining by empowering an entity that is not involved in the process.

Some may believe that the Commission's final report does not adequately assess the current governance arrangement under Mayor Bloomberg and Chancellor Joel Klein. I understand this criticism. The purpose of this effort, however, was to develop a map for the future, regardless of who is mayor or chancellor. I think the Commission has done that.

Some may disagree with the findings, but there's no question that, by and large, they reflect the views that stakeholders expressed throughout this process. Passions and tensions run high when debating this issue, but the debate must take place. The Commission has established a framework in which this debate can and should continue. I look forward to the discussion that lies ahead, and I'm confident that, through an open and deliberative process, school governance in New York City can be improved.

I encourage you to read the Commission report.

September 24, 2008

Could a Monkey Do a Better Job of Predicting Which Schools Show Student Progress in English Skills than the New York City Department of Education?

monkey4.JPG

eduwonkette and I have been blogging about the School Progress Reports released last week by the New York City Department of Education. We’ve shown that, although the performance and environment scores of schools were pretty consistent from last year to this year, the student progress scores were virtually unrelated—knowing a school’s progress score from last year didn’t predict which schools would demonstrate a lot of progress this year. This, we argued, demonstrated that the progress part of the School Progress Report—representing 60% of the letter grade each school received—wasn’t really telling us which schools consistently are promoting student progress, but rather was mostly random error.

The problem was particularly acute in the domain of English Language Arts (ELA). The stability in the student progress scores from 2007 to 2008 was so low that it led skoolboy to wonder if a monkey could actually do a better job predicting which schools show progress in students’ ELA performance in 2008 than relying on the DOE’s 2007 student progress score. The particular measure I examined was the percentage of students in the school making at least one year of progress on the ELA test from last year to this year. (As we've noted in earlier posts, the calculation of this measure changed slightly from 2007 to 2008.)

In the interest of full disclosure, skoolboy didn’t actually rent a monkey to pick the schools. Animals scare him, and he wouldn’t have been able to record the picks while hiding under his bed. What I did instead was use a random number generator to assign each school to the top or bottom half of the distribution of schools on last year’s peer and citywide measures of the percentage of students making a year of progress in English Language Arts.

The DOE got credit for a correct prediction if it correctly predicted that a school would be in the top half of this year’s schools, based on the school being in the top half on the DOE’s 2007 measure, or correctly predicted that a school would be in the bottom half of this year’s schools, based on the school being in the bottom half last year. The monkey got credit for a correct prediction if the randomly-selected location of a school as being in the top half of the 2007 distribution correctly predicted that a school would be in the top half of this year’s schools, or the random pick of being in the bottom half of last year’s distribution correctly predicted that a school would be in the bottom half of this year’s schools. These predictions were done separately for the 570 elementary schools, 128 K-8 schools, and 289 middle schools which received overall letter grades last year and this year.
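The scoring rule is easy to sketch in code. The version below uses made-up, unrelated scores rather than the DOE data, so it shows the mechanics of the contest and not the results reported in the rounds that follow. It also assumes the "monkey" is an independent coin flip for each school and that "top half" means strictly above the median; both are simplifications chosen for this illustration.

```python
import random
from statistics import median


def top_half_flags(scores):
    """True for each school whose score is strictly above the median."""
    cutoff = median(scores)
    return [s > cutoff for s in scores]


def hit_rate(predicted_top, actual_top):
    """Share of schools whose predicted half matches their actual half."""
    matches = sum(p == a for p, a in zip(predicted_top, actual_top))
    return matches / len(actual_top)


rng = random.Random(2008)
n_schools = 570  # number of elementary schools cited in the post

# Synthetic stand-ins for the two years of progress measures (NOT DOE data).
# They are drawn independently, i.e. last year carries no information about this year.
scores_2007 = [rng.gauss(0, 1) for _ in range(n_schools)]
scores_2008 = [rng.gauss(0, 1) for _ in range(n_schools)]

doe_pick = top_half_flags(scores_2007)                        # "same half as last year"
monkey_pick = [rng.random() < 0.5 for _ in range(n_schools)]  # coin flip per school
actual = top_half_flags(scores_2008)

print("DOE-style prediction:", hit_rate(doe_pick, actual))
print("Monkey prediction:   ", hit_rate(monkey_pick, actual))
```

When last year's measure carries no information about this year's, both predictors should land near 50%, and which one comes out ahead in any single run is a matter of luck.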

Round 1. We begin with the peer horizon score for the 570 elementary schools. The DOE’s peer horizon progress score from last year correctly predicted the progress status of 46% of the elementary schools this year. The monkey correctly predicted the status of 51% of this year’s schools.

Score: Monkey 1, DOE 0.

Round 2. We next turn to the citywide horizon score for the 570 elementary schools. The DOE’s citywide horizon progress score from last year correctly predicted the progress status of 47% of the elementary schools this year. The monkey correctly predicted the status of 52% of this year’s schools.

Score: Monkey 2, DOE 0.

Round 3. In this round, we examine the peer horizon scores for the 128 K-8 schools. The DOE’s peer horizon progress score from last year correctly predicted the progress status of 45% of the K-8 schools this year. The monkey correctly predicted the status of 55% of this year’s schools.

Score: Monkey 3, DOE 0.

Round 4. Next, we look at the citywide horizon progress scores for the 128 K-8 schools. The DOE’s citywide horizon progress score from last year correctly predicted the progress status of 43% of the K-8 schools this year. The monkey correctly predicted the status of 47% of this year’s schools.

Score: Monkey 4, DOE 0.

Round 5. The final stage of the competition examines the 289 middle schools. The DOE’s peer horizon progress score from last year correctly predicted the progress status of 40% of the middle schools this year. The monkey correctly predicted the status of 50% of this year’s middle schools.

Score: Monkey 5, DOE 0.

Round 6. The last round looks at the citywide horizon progress scores for the middle schools. The DOE’s citywide horizon progress scores from last year correctly predicted the progress status of 45% of this year’s middle schools. The monkey correctly predicted the status of 49% of this year’s middle schools.

Score: Monkey 6, DOE 0.

skoolboy will forgo the cheap jokes about how a monkey could do a better job of managing New York City’s accountability system than the people currently in charge. On the whole, they’re smart, hard-working people, and ridiculing them is not likely to persuade them to change their behavior (as satisfying as it may be at particular moments). But the system that they have designed and implemented is profoundly flawed, as this comical example illustrates, and it needs to change. eduwonkette and I are going to keep hammering on this point, because it has such important consequences for students and for schools.

And besides: I bet the DOE would beat the monkey in predicting school progress scores in math. (But it wouldn’t be a rout.)

September 22, 2008

Come on Feel the Noise!

Last week, New Yorkers scratched their heads and tried to make sense of the Progress Report results. What does it mean, for example, when 77% of schools that received an F last year jump to an A or a B? Michael Bloomberg has a resolute answer to this question: “Not a single school failed again.... The fact of the matter is it’s working.”

Last week, skoolboy and I took to our computers with the newly released data. Of particular concern is the progress measure, which makes up 60% of a school’s grade. Both skoolboy and Dan Koretz have already identified serious flaws in DOE’s test progress model. Even in the absence of these problems, we know that all models of year-to-year growth must contend with measurement error present in two different tests.

What the heck is measurement error? Bear with us for two paragraphs, because this is critical to understanding the central problem with the Progress Reports. A test score is just a proxy for students' underlying skills and competencies. If you give a student a test, the test score represents the combination of her "true" level of skills plus measurement error. This error may be a function of idiosyncratic factors like not eating breakfast (which might hurt your score), having the good fortune of having studied the material that happens to be on the test (which would increase your score over your true level of skill), or a dog barking during the test (which might decrease the scores of all students in a classroom). A "gain score" represents the difference between two test scores, both of which are measured with error, so it provides a noisy estimate of how much the student actually learned.

If measurement error were constant, then it would just cancel out when we difference the two scores. But we know that measurement error is likely to be random – the two errors do not just cancel out. Another kind of error stems from sampling variation, which I have discussed here before. In short, the more measurement error (or “noise”) in the results, the harder it is to detect the “signal” that represents a school’s actual contribution to growth in student learning.
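A small simulation makes the point concrete. Everything in it is hypothetical: the true scores, the size of the error, and the assumption that the two years' errors are independent and normally distributed. It is a sketch of the general idea, not a model of the New York State tests.

```python
import random
from statistics import pvariance

TRUE_Y1, TRUE_Y2 = 650.0, 660.0  # hypothetical "true" scores; the true gain is 10
ERROR_SD = 10.0                  # hypothetical standard deviation of measurement error

# Case 1: the same constant error in both years cancels in the difference.
constant_error = 5.0
gain_with_constant_error = (TRUE_Y2 + constant_error) - (TRUE_Y1 + constant_error)


def simulate_gain_errors(error_sd, n_trials=20000, seed=0):
    """Error in the observed gain when each year's score has its own random error."""
    rng = random.Random(seed)
    true_gain = TRUE_Y2 - TRUE_Y1
    errors = []
    for _ in range(n_trials):
        observed_y1 = TRUE_Y1 + rng.gauss(0, error_sd)
        observed_y2 = TRUE_Y2 + rng.gauss(0, error_sd)
        errors.append((observed_y2 - observed_y1) - true_gain)
    return errors


# Case 2: independent random errors do not cancel; their variances add.
gain_error_variance = pvariance(simulate_gain_errors(ERROR_SD))

print("Gain with constant error:", gain_with_constant_error)
print("Error variance of one score:", ERROR_SD ** 2)
print("Error variance of the gain: ", round(gain_error_variance, 1))
```

Under these assumptions the error variance of the gain comes out close to twice the error variance of a single score, so the gain is noisier than either test on its own.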

In what follows, we demonstrate that there is almost no relationship between NYC schools' progress scores in 2007 and 2008. The progress measure, it appears, is a fruitless exercise in measuring error rather than the value that schools themselves add to students. If we believe that the Progress Reports are in the business of cleanly identifying schools that consistently produce more or less progress, this finding is rather troublesome.

First, some sunnier results: Below, we provide scatterplots of the relationship between the overall environment and performance-level scores in 2007 and 2008 for the 566 elementary schools that received overall grades in both years. In both cases, last year’s score is a strong predictor of this year’s score. To quantify the extent to which two variables move together, we can make use of a measure called a correlation coefficient. A correlation of 0 implies that the variables have no linear relationship, while a correlation of 1 represents a perfect positive relationship. We find that the correlation is .82 for the performance score and .75 for the environment score. This is exactly what we would expect – schools’ performance or climates do not wildly change from year to year.

Environment%20and%20Performance%20Plots.jpg

But the relationship between the 2007 and 2008 progress scores is quite different – the correlation is -.02. In other words, there is almost no relationship! This is precisely what we would expect to see if the growth measures were primarily capturing measurement error. (These correlations are still low, but slightly larger, for K-8 and middle schools - the correlations were .11 and .15, respectively.)

Progress%20Plot.jpg
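To see why a near-zero correlation is what measurement error would produce, consider this sketch. It computes a Pearson correlation by hand and applies it to two synthetic measures: one where a stable school-level signal dominates the noise, and one where noise dominates. The signal and noise sizes are invented for illustration and are not estimated from the Progress Report data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def two_years(signal_sd, noise_sd, n_schools=566, seed=1):
    """Two yearly measurements of the same stable school signal, each with fresh noise."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, signal_sd) for _ in range(n_schools)]
    year1 = [s + rng.gauss(0, noise_sd) for s in signal]
    year2 = [s + rng.gauss(0, noise_sd) for s in signal]
    return year1, year2


# Mostly signal: resembles the performance and environment scores.
r_stable = pearson_r(*two_years(signal_sd=1.0, noise_sd=0.4))

# Mostly noise: what a progress measure dominated by error would look like.
r_noisy = pearson_r(*two_years(signal_sd=0.2, noise_sd=1.0))

print("Mostly signal:", round(r_stable, 2))
print("Mostly noise: ", round(r_noisy, 2))
```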

We are left with three possible explanations:

1) The poorly constructed progress measure is simply measuring noise.

2) The DOE somewhat tweaked the progress measure for this year, so the results are not comparable.

3) The receipt of and publicity around last year’s progress measures fundamentally changed how New York City’s elementary schools do business, so that schools that were more successful in raising student achievement in 2007 suddenly became less so, and schools that were less successful in raising student achievement in 2007 suddenly became more so.

New Yorkers are left with three courses of action:

* If explanation 1 is correct, we should ignore these report cards altogether because they are primarily (60%) measuring error.

* If explanation 2 is correct, we should not compare schools' grades in 2007 with their grades in 2008, because they are measuring fundamentally different dimensions of school performance. In this case, the collective hysteria that ensued in NYC schools last week about why grades are up or down is all for naught.

* And if explanation 3 is correct, eduwonkette and skoolboy should shut up and get out of the way of the silent revolution that has transformed public schooling in New York City.

Thanks to skoolboy’s masterful analysis of the data, we present evidence below the fold to suggest that the likely culprit is measurement error. The evidence is not conclusive, because every single element of the progress measure (and there are 16 of them in this year’s student progress measure) changed slightly from last year to this year. The strategy that we pursue below is to compare those elements of the progress measure that were used in both years - for example, the percentage of students making at least one year of progress, or the average change in proficiency scores. Again, we stress that these measures were not identical across years, but one would expect them to be moderately related. Needless to say, that is not what we found. We think it extremely unlikely, given the analyses described in detail below, that this is simply due to a tweaking of the progress report measures.

And what of the third explanation—a fundamental overhaul in the effectiveness of New York City’s elementary and middle schools over the past year that reshuffled the effective and ineffective schools? Magical transformations that shift schools from low to high-progress, or vice versa, are the fabled stuff of Hollywood movies, not reality. Real school change, unfortunately, is not an overnight affair.

Where does this leave NYC parents, teachers, and principals, all of whom are trying to make sense of what these measures mean? Bottom line: It's impossible to know what your A or your F means, because these grades are dominated by random error. Let's hope that the DOE heads back to the drawing board rather than continuing to defend the indefensible.

Continue reading "Come on Feel the Noise!" »

September 18, 2008

GothamSchools Geeks Out on Sampling Error!

Philissa Cramer totally geeks out over at GothamSchools, and posts a great figure showing that smaller schools were more likely to experience wild swings in their school grades. Head over and check it out.
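The link between school size and wild swings follows from the textbook formula for the standard error of a proportion, sqrt(p(1 - p) / n). The sketch below applies it to a few hypothetical numbers of tested students. The sizes and the 58% figure are illustrative assumptions, not data from GothamSchools or the DOE.

```python
import math


def se_of_percentage(pct, n_students):
    """Standard error, in percentage points, of a percentage based on n_students."""
    p = pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_students)


# Hypothetical schools where 58% of tested students made a year's progress.
for n in (50, 200, 800):
    print(n, "students tested -> standard error of about",
          round(se_of_percentage(58.0, n), 1), "points")
```

Quadrupling the number of tested students only halves the standard error, which is why small schools bounce around so much more than large ones.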

September 17, 2008

Between a Political Rock and a Statistical Hard Place

Some days, skoolboy feels bad for the hard-working folks in the New York City Department of Education. They’re caught between a political rock and a statistical hard place. The political rock is the New York State accountability system, which complies with No Child Left Behind’s requirements to test students annually in grades 3-8 in Mathematics and English Language Arts, and to classify students, based on their test scores, as either Not Meeting Learning Standards (Level I), Partially Meeting Learning Standards (Level II), Meeting Learning Standards (Level III), or Meeting Learning Standards with Distinction (Level IV), and then aggregate the performance of students, and subgroups of students, to assess the school’s progress toward the goal of 100% proficiency for all students by the year 2014. The mechanism for this is a series of grade-specific exams, with a broad (but arbitrary, as Dan Koretz explains in Measuring Up) standard-setting process that defines the scores on the exam that correspond to the four proficiency levels. Whatever a student’s scale score on the exam, he or she is classified into a particular proficiency level.

The statistical hard place is that the proficiency levels are only part of the story. The NYC DOE has found that the scale scores matter, such that a student whose scale score is halfway between the cutoffs for Level II and Level III, and therefore whose proficiency level is Level II, has a higher probability of graduating from high school on time than a student whose scale score is right at the cutoff for Level II. The scale scores have predictive validity—that is, they predict educational outcomes that we think of as important—but they don’t have the political currency of the proficiency levels specified by the state and the federal government.

There’s no evidence, to skoolboy’s knowledge, that achieving a proficiency level on NCLB-style exams has any predictive validity over and above the scale scores on which they are based. (Another regression discontinuity design study waiting to happen.) But I’ll wager that they don’t.

Whether or not the state/NCLB proficiency levels matter, the NYC DOE is stuck. They have to pay homage to the state standards, even though their internal evidence shows that partial progress—“learning quite a bit,” in skoolboy’s terms—really does matter for students’ futures, and therefore is something that schools should be held accountable for.

And I don’t disagree. I would be comfortable (though not ecstatic) with school progress reports that used changes in scale scores to quantify how much students had learned from one year to the next, under two conditions: (a) if the exams were vertically linked, and (b) if the uncertainty in the estimates of school-level effects on the average change were taken into account. Neither of these conditions is met in the current New York City School Progress Reports.

Navigating the political rock and the statistical hard place is definitely a challenge, both rhetorically and in the construction of the School Progress Reports. Rhetorically, the DOE is obliged to argue that a student who is Level III in fourth grade and Level II in fifth grade has lost ground—that student has fallen off of the sharp Level III cliff—because the state and federal accountability metrics treat this as a sharp discontinuity. But as a practical matter, the student may not have fallen off a cliff; rather, she may be just a little bit lower on a gradual hill in fifth grade than we’d like, but still higher on the hill than she was in fourth grade--and the DOE’s internal analyses document that anyone who is higher on the hill is better off than someone lower.

What’s the DOE to do? Well, it could continue to escalate the rhetoric directed toward its critics. (I note with alarm that the DOE went from calling me by my blogging name “skoolboy” on Monday to calling me “Professor Pallas of Teachers College” on Wednesday—whose proclivity to giving A’s to all of his students will come as a surprise to many of them—what’s next? Examining my teeth?) Or it could speak honestly and openly about the challenge of incorporating political and technical realities into the School Progress Reports. I think readers know which path skoolboy recommends.

Guest Blogger Daniel Koretz on New York City's Progress Reports

Koretz.jpg
Daniel Koretz is a professor who teaches educational measurement at the Harvard Graduate School of Education. He is the author of Measuring Up: What Educational Testing Really Tells Us. Below, he weighs in on the NYC Progress Reports that were released yesterday.

eduwonkette: One of the key points of your book is that test scores alone are insufficient to evaluate a teacher, a school, or an educational program. Yesterday, the New York City Department of Education released its Progress Reports, which grade each school on an A-F scale. 60 percent of the grade is based on year-to-year growth and 25 percent is based on proficiency, so 85 percent of the grade is based on test scores. Do you have any advice to New Yorkers about how to use - or not to use - this information to make sense of how their schools are doing?

Koretz: This is a more complicated question in New York City than in many places because of the complexity of the Progress Reports. So let’s break this into two parts: first, what should people make of scores, including the scores New York released a few weeks ago, and second, what else should New Yorkers keep in mind in interpreting the Progress Reports?

In the ideal world, where tests are used appropriately, I give parents and others the same warning that people in the testing field have been offering (to little avail) for more than half a century: test scores give you a valuable but limited picture of how kids in a school perform. There are many important aspects of schooling that we do not measure with achievement tests, and even for the domains we do measure—say, mathematics—we test only part of what matters. And test scores only describe performance; they don’t explain it. Decades of research has repeatedly confirmed that many factors other than school quality, such as parental education, affect achievement and test scores. Therefore, schools can be either considerably better or considerably worse than their scores, taken alone, would suggest.

However, there is another complication: when educators are under intense pressure to raise scores, high scores and big increases in scores become suspect. Scores can become seriously inflated—that is, they can increase substantially more than actual student learning. This remains controversial in the education policy world, but it should not be, because the evidence is clear, and similar corruption of accountability measures has been found in a wide variety of different economic and policy areas (so widely that it goes by the name of “Campbell’s Law”). High scores or big gains can indicate either good news or inflation, and in the absence of other data, it is often not possible to distinguish one from the other. As you know, this was a big issue in New York City this year, in part because some of the gains, such as the increase in the proportion at Levels 3-4 in 8th grade math, were remarkably large.

New York City is a special case. It is always necessary to reduce the array of data from a test to some sort of indicators, and NYC has developed its own, called the Progress Reports, which assign schools one of five grades, A through F. My advice to New Yorkers is to pay attention to the information that goes into creating the Progress Reports but to ignore the letter grades and to push for improvements to the evaluation system.

The method for creating Progress Reports is baroque, and it is hard to pick which issues to highlight in a short space. The biggest problems, in my opinion, lie in the estimation of student progress, which constitutes 60% of the grade. The basic idea is that a student’s performance on this year’s test is compared to her performance in the previous grade, and the school gets credit for the change. It sounds simple and logical, but the devil is in the details. (For a non-technical overview of the issues in using value-added models to evaluate teachers and schools, see “A Measured Approach”.)

To keep this reasonably brief, I’ll focus on three problems. First, the tests are not appropriate for this purpose. skoolboy made reference to part of this problem in a posting on your blog. To be used this way, tests in adjacent grades should be constructed in specific ways, and the results have to be placed on a single scale (a process called vertical linking). Otherwise, one has no way of knowing whether, for example, a student who gets the same score in grades 4 and 5 improved, lost ground, or treaded water. The tests used in New York were not constructed for this purpose, and the scale that NYC has layered on top of the system for this purpose is not up to the task.

And that points to the second problem, which again skoolboy noted: the entire system hinges on the assumption that one unit of progress by student A means the same amount of improvement in learning as one unit by student B. This is what is called technically an interval scale, meaning that a given interval or difference means the same thing at any level. Temperature is an interval scale: the change from 40 to 50 degrees signifies the same increase in energy as the change from 150 to 160. There is no reason to believe that the scale used in the Progress Reports is even a reasonable approximation to an interval scale. It starts with the performance standards, which are themselves arbitrary divisions and cannot be assumed to be equal distances apart. The NYC system assigns to these standards new scores that nonetheless assume that the standards are equidistant—so, for example, a school gets the same credit for moving a student from Level 1 to Level 2 as for moving a student from Level 2 to Level 3. Moreover, the NYC system assumes that a student who maintains the same level on this scale has made “a year’s worth of progress.” That assumption is also unwarranted, because standards are set separately by grade, and there is no reason to believe that a given standard, say, Level 3, means a comparable level of performance in adjacent grades. (There is in fact some evidence to the contrary.)

The result is that there is no reason at all to trust that two equally effective schools, one serving higher achieving students than another, will get similar Progress Report grades. Moreover, even within a school, two students who are in fact making identical progress may seem quite different by the city’s measure. There may be reasons for policymakers to give more credit for progress with some students than for progress with others, but if one does that, one no longer has a straightforward, comparable measure of student progress.

And finally, there is the problem of error. People working on value-added models have warned for years that the results from a single year are highly error-prone, particularly for small groups. That seems to be exactly what the NYC results show: far more instability from one year to the next than could credibly reflect true changes in performance. Mayor Bloomberg was quoted in the New York Times on September 17 as saying, “Not a single school failed again. That’s exactly the reason to have grades…It’s working.” This optimistic interpretation does not seem warranted to me. The graph below shows the 2008 letter grades of all schools that received a grade of F in 2007. It strains credulity to believe that if these schools were really “failing” last year, three-fourths of them improved so markedly in a mere 12 months that they deserve grades of A or B. (The proportion of 2007 A schools that remained As was much higher, about 57 percent, but that was partly because grades overall increased sharply.) This instability is sampling error and measurement error at work. It does not make sense for parents to choose schools, or for policymakers to praise or berate schools, for a rating that is so strongly influenced by error.

We should give NYC its due. The Progress Reports are commendable in two respects: considering non-test measures of school climate, and trying to focus on growth. Unfortunately, the former get very little weight, and the growth measures are not yet ready for prime time.

2008 Letter Grades of Schools that Received an F Grade in 2007

NYC%20F%20schools.png

NYC Progress Report Chutes and Ladders!

A week ago, skoolboy encouraged readers to predict schools' upward and downward grade mobility. Here's how that shook out. When 26% of elementary and middle schools that received Fs last year - 9 schools - climb from an F to an A, it does make you wonder what exactly it is that we are measuring. Likewise, 26 schools cascaded from As or Bs to Ds or Fs. Readers, stare into the table and tell me what you see...

grade%202007%20and%202007.jpg

September 16, 2008

In NYC, More F Schools than A Schools in Good Standing with NCLB

Some of you have asked what fraction of NYC schools receiving each Progress Report grade are in good standing with NCLB. As a refresher, NCLB labels schools in need of improvement based on overall proficiency. NYC's system is based 60% on year-to-year growth, 25% on proficiency, 5% on attendance, and 10% on surveys.

Given these differences, perhaps you won't be surprised to find that a higher fraction of F schools are in good NCLB standing than are A schools:

* 74% of A schools are in good standing with NCLB

* 67% of B schools are in good standing with NCLB

* 69% of C schools are in good standing with NCLB

* 48% of D schools are in good standing with NCLB

* 89% of F schools are in good standing with NCLB

What if we just look at the "performance grade", aka the proficiency grade, that each school received, and see how that maps on to NCLB good standing? Recall that this year, schools also were given separate grades for the performance, progress, and environment categories. I guess the peculiar results below are a function of the fact that schools are being compared to peer groups, but here's what I've got:

* 86% of A schools (based on proficiency) are in good standing with NCLB

* 60% of B schools are in good standing with NCLB

* 60% of C schools are in good standing with NCLB

* 51% of D schools are in good standing with NCLB

* 75% of F schools are in good standing with NCLB

Irreconcilable Differences: Why NYC’s Surveys Provide a Misleading Portrait of School Quality

eduwonkette-NYC.jpg
My heart went out to Charlie Gibson last week, as he stared into those doe eyes that will not blink and realized that he could not wrangle a single straight answer out of Miss Wasilla.

So I can only imagine how the NYC Department of Education analysts felt when they sat down to analyze the data from student, parent, and teacher surveys this year. It turns out that you get as much valid and reliable information out of these surveys as Gibson managed to pull out of Sarah Palin.

The problem is a very simple – and very predictable – one. Survey responses constitute 10% of the Progress Report Grades, and schools face very real consequences if they receive a poor grade. Faced with such pressures, we expect that the adults who fully understand these consequences – parents and teachers – will provide a rosier picture of the school than truly exists.

If all schools did this equally, the inflation of survey responses would not be a problem; we could still rank schools by their perceptions of safety, engagement, or what have you. We would not have a clean measure of how safe a school is overall, but we would know how safe it was relative to other schools – a central objective of the grading system.

Alas, schools face different incentives to inflate their survey responses. If you’re a teacher filling out a survey in an F school, you know that your school could very well be closed if its grade doesn’t improve. Compared to a teacher filling out a survey in an A school, you’re more likely to put on a happy face.

One way to get at this problem is to compare changes in the teacher responses to the survey with changes in the student responses. We know that students and teachers don’t see eye-to-eye about school conditions, so we don’t expect them to provide comparable assessments of the school in any given year. But if teachers report improvement at a rate that far outpaces the improvement reported by the students, and this happens more in D and F schools than A and B schools, we have pretty good evidence that teachers have inflated their responses.

To get a handle on survey inflation, I did a basic calculation for each of the 4 survey domains: safety, communication, academic expectations, and engagement. Using the example of safety, I calculated:

(2008 Teacher Survey Score for Safety – 2007 Teacher Survey Score for Safety) –
(2008 Student Survey Score for Safety – 2007 Student Survey Score for Safety)


At schools that have positive scores on this measure, teachers report a pace of improvement that outpaces the improvement that students report. Kids are often the best check on us wily adults, and it turns out that they function as a first-rate BS detector in this case. I should also note that students may be pressured to inflate their scores, so if anything, the difference between the teacher and student changes is a lower bound measure of survey inflation.
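For anyone who wants to replicate the calculation above, here is a minimal sketch. The function names, the dictionary keys, and the example scores are all made up for illustration; they are not the DOE's file layout or real survey results. The second function averages the teacher-minus-student difference across schools sharing a letter grade.

```python
from collections import defaultdict
from statistics import mean


def inflation_gap(teacher_2007, teacher_2008, student_2007, student_2008):
    """Teacher-reported change minus student-reported change for one domain."""
    return (teacher_2008 - teacher_2007) - (student_2008 - student_2007)


def mean_gap_by_grade(schools):
    """Average the gap over schools with the same Progress Report grade.

    `schools` is a list of dicts with hypothetical keys:
    grade, t07, t08 (teacher scores), s07, s08 (student scores).
    """
    by_grade = defaultdict(list)
    for s in schools:
        gap = inflation_gap(s["t07"], s["t08"], s["s07"], s["s08"])
        by_grade[s["grade"]].append(gap)
    return {grade: mean(gaps) for grade, gaps in by_grade.items()}


# Invented safety scores for three imaginary schools.
example = [
    {"grade": "A", "t07": 7.0, "t08": 7.5, "s07": 6.0, "s08": 6.5},
    {"grade": "F", "t07": 5.0, "t08": 7.0, "s07": 5.0, "s08": 5.5},
    {"grade": "F", "t07": 6.0, "t08": 7.0, "s07": 6.0, "s08": 6.0},
]

if __name__ == "__main__":
    print(mean_gap_by_grade(example))
```

A value near zero means teachers and students report the same pace of change; a positive value means teachers report more improvement than students do.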

The first graph below reports the average of these differences for the safety measure for high schools receiving A to F grades. At A schools, students and teachers saw improvement happening equally – there is almost no difference between the change in teacher scores and the change in student scores. At F schools, there are tremendous differences between the rate of improvement reported by teachers and students.

hs%20safety.jpg

The teacher-student discrepancy exists for every measure on the survey. Next, let’s look at the engagement measure for high schools.

hs%20engagement.jpg

Bottom line: survey inflation exists across the board, but is worst at D and F schools. If you’d like figures for the other domains or school levels, feel free to email me. The irony, of course, is that instead of having better information about how things are going in NYC schools, incorporating the surveys in the grading scheme has fundamentally corrupted this measure.

September 14, 2008

Let the Spin Begin

top.gif

Suppose that your fourth-grader takes a state test that shows that she understands the associative property of multiplication, can multiply two-digit numbers by two-digit numbers, and can find the perimeter of a polygon by adding up the length of the sides. A year later, as a fifth-grader, she takes a test that shows that she can compare fractions and decimals using <, > or =; identify the factors of a given number; simplify fractions to their lowest terms; and knows that the sum of the interior angles of a quadrilateral is 360 degrees—but she cannot yet create algebraic or geometric patterns using concrete objects or visual drawings (e.g., rotate and shade geometric shapes). Would you say that your child had lost ground in proficiency, or actually gone backward?

Jim Liebman would. Liebman, the Columbia University law professor on leave as Chief Accountability Officer at the New York City Department of Education, is quoted and paraphrased in an article by Jim Dwyer in Saturday’s New York Times on the F grade that P.S. 8 in Brooklyn Heights will receive in this year’s School Progress Reports—a grade that many are finding hard to believe, given that 80% of the students tested in the school are judged proficient in math, and two-thirds are judged proficient in English Language Arts. Doubly embarrassing, in that Chancellor Joel Klein and Mayor Mike Bloomberg have publicly declared the school to be successful and worthy of emulation.

So the spinmeisters are out, and the spin here is justifying the grade of F by arguing that the children in P.S. 8 are going backward. “You drop them off at the beginning of the year, and on average, by the end of the year, your child lost ground in proficiency,” Dwyer quotes Liebman as saying. “Where was the child last year, and where is the child this year?” Liebman asked. “You’re comparing them to themselves.”

A gentle reminder to Mr. Liebman, who was hired in January, 2006: the state math and ELA tests that children take, and that are the primary basis for assigning these lovely letter grades, are not vertically equated. (See skoolboy's testing primer here.) This means that there is no basis for comparing performance on the fourth-grade test with performance on the fifth-grade test. For each test, there is a subjective judgment about what level of performance constitutes proficiency, but the tests are independent. There is no basis for claiming that children are going backward; there’s no justification for claiming that a child “lost ground in proficiency,” since proficiency doesn’t exist in the abstract, but rather in grade-specific skills; and the children are not being compared to themselves, but rather their location in the distribution of children’s performance in one year is being compared to their location in the distribution of children’s performance the following year.

Perhaps Jim Liebman simply misspoke, as perhaps did Chancellor Joel Klein when he referred to statistical significance as “playing something of a game.” Such missteps might arise from the tremendous pressure to justify a particular high-stakes evaluation of a school when there are multiple sources of information about school performance that point in different directions—NCLB status, achievement levels, gains, school quality reviews, not to mention the public pronouncements of Liebman’s boss, and his boss’s boss.

There’s nothing wrong, in skoolboy’s view, in looking at students’ achievement growth as one of several criteria for judging how well a school is doing in relation to other schools. But I would never think of using year-to-year changes in proficiency levels on just two tests as the primary basis for evaluating a school’s performance. And neither would most people who study testing and assessment for a living.

September 7, 2008

Predicting the Near Future*

question_marks.jpg

Sometime soon, with great fanfare, the New York City Department of Education will release this year’s School Progress Reports. (Word on the street is that schools already know their grades.) The School Progress Reports, for better or worse, are the centerpiece of the NYC accountability system. (skoolboy thinks for worse, but more on that later.)

The DOE has made a number of changes to the Progress Reports for this second iteration, and I think that eduwonkette had something to do with that (as did other critics and analysts outside of the Tweed inner circle). We can expect to see separate letter grades for the three major dimensions on which the Progress Reports are based: school environment (including attendance, and parent, teacher and student surveys), student performance, and student progress. But the overall format appears to be unchanged: most of the grade is based on student progress on test scores, and such gains are not very reliable from one year to the next. There is, in skoolboy’s opinion, a false sense of precision conveyed by these letter grades, as they are based on components that are measured with error, but that measurement error is not reflected in how the grades are calculated. And I’m particularly annoyed at the misuse of social surveys for accountability purposes.

Nevertheless, the DOE is marching onward, and we’ll have this year’s grades to pore over in the near future. (And you can bet that eduwonkette will put on the green eyeshade for this, even though it clashes with her cape and mask.) How many schools will improve their grade from last year to this year? How many will fall? It’s time to make some predictions. What do you think, readers?

Here's a five-by-five table designed to show how this year’s grades are associated with last year’s grade. Each column represents last year’s grade, and each row represents a possible outcome for this year. The column percentages will add up to 100%. Try to fill in the blanks: What percentage of the schools that received A’s last year will receive an A this year? What percentage of A’s will decline to B’s? What fraction will fall further to C’s, D’s, and F’s? At the other end of the spectrum, what percentage of last year’s F’s will remain F’s? What percentage will climb out of the cellar to obtain a D? Will any make the leap from F to A?

crosstab.JPG
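For readers who would rather compute the table than eyeball it, here is a minimal sketch of a five-by-five crosstab with column percentages. The function name and the sample grade pairs are hypothetical; the actual grades come from the DOE.

```python
from collections import Counter

GRADES = ["A", "B", "C", "D", "F"]


def column_percentages(pairs):
    """Build table[this_year][last_year] as a percent of last year's column.

    `pairs` holds one (last_year_grade, this_year_grade) tuple per school.
    Every non-empty column sums to 100.
    """
    pairs = list(pairs)
    counts = Counter(pairs)
    column_totals = Counter(last for last, _ in pairs)
    table = {}
    for this in GRADES:
        table[this] = {}
        for last in GRADES:
            total = column_totals[last]
            table[this][last] = 100.0 * counts[(last, this)] / total if total else 0.0
    return table


# Invented outcomes for six imaginary schools.
example_pairs = [
    ("F", "A"), ("F", "A"), ("F", "B"), ("F", "F"),
    ("A", "A"), ("A", "B"),
]

if __name__ == "__main__":
    table = column_percentages(example_pairs)
    for this in GRADES:
        print(this, [round(table[this][last], 1) for last in GRADES])
```

Reading down a column answers questions of the form "of last year's F schools, what share got each grade this year?"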

As a reminder, last year, about 23% of schools received an A; 38% received a B; 26% received a C; 8% received a D; and 4% (i.e., 53 schools) received an F.

A caveat: The DOE knows that the legitimacy of the School Progress Reports depends on the grades not being too volatile from year to year. If 75% of last year’s A’s became F’s this year, no one would take this scheme seriously. (And if schools that everyone views as exemplary or high-performing got middling grades, this too would call the scheme’s legitimacy into question. So don't expect Stuyvesant High School to get a C.) There may not be very much fluctuation from last year to this. You can be sure that the DOE has constructed this year’s scores so that there’s not too much instability from last year to this year.

But since we believe in incentives on this blog, the reader who comes closest to the actual association between last year and this year shall receive a prize to be selected by eduwonkette—and we know how creative she can be. Be sure to fill in all 25 blanks.

*Employees of Tweed Courthouse, KPMG Consulting, and the Parthenon Group are ineligible for this contest.

August 27, 2008

NYC Links: Klein Petrilli Barcelona

Javier-B.JPG

1) Klein Petrilli Barcelona: Mike Petrilli has a stalker, he says, and it's not the sizzling Javier Bardem. Nonetheless, the NY Times blog chronicles it all here.

2) Welcome Meredith Kolodner!: I'm a little late, but the Daily News has a new education beat reporter who, from this article on NYC's SAT scores, seems to like digging into the numbers. Though the DOE stressed that the number of students scoring at 600 or above went up 3.6 percent, Kolodner recognized that if the average is falling, there must be more low scoring students as well. As she wrote: At the same time, the number of students scoring below average also increased by 3.2%, reflecting a polarization in student results. This is neither here nor there in terms of policy - simply put, more students are taking the SAT in NYC than in the past - but I'm impressed with her smarts already.

3) A Tale of Two Schools: NYC Parents posts a letter from a Jamaica High School teacher about the resource disparities separating his school from the new small school in the building.

4) Grad Rate Round-Up: Philissa Cramer at GothamSchools has the skinny on NYC grad rates.

5) Mimi is the Best: The author of It's Not All Flowers and Sausages is possibly NYC's funniest teacher: Rumor has it that we are going to be getting new seat sacks emblazoned with the school's name and mascot. However, we will NOT be receiving any paper what-so-ever.

6) Core Knowledge x NYC: Dan Willingham weighs in on the role of content knowledge in reading - relevant to NYC's pilot of the Core Knowledge reading program - over at their site.

August 18, 2008

Graduation Rates in NYC: The Long View

Last Thursday the NY Sun gave the Times editorial board a well-deserved spanking for ignoring its own backyard. Buried in the piece is a description of Bloomberg's latest temper tantrum, this time over the gall of a reporter for - gasp! - asking questions about the graduation rate:
Perhaps in their coverage of the No Child Left Behind law the mandarins of Eighth Avenue have fallen victim to the law of Not In My Backyard. They'd certainly be in good company. Announcing the latest graduation rate results, Mayor Bloomberg could not for his life fathom why our reporter Elizabeth Green might inquire as to his opinion on the charge that graduation rates are inflated by schools trying to put on a good face.

"I'm sort of speechless," the mayor said. "Is there anything good enough to just write the story?"
Using enrollment data from the DOE Statistical Summaries, the graph below plots the proportion of 9th graders still enrolled in 12th grade 3 years later beginning with the cohort that entered 9th grade in 1995. Thus, we can follow the 4-year attrition patterns of every 9th grade cohort beginning high school between 1995 and 2004. Though looking at "promoting power" this way is not the best way to look at overall graduation rate levels (there are both upward and downward biases and it's difficult to figure out how they shake out), it does provide a better way to look at long-term trends than any other data available.
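The "promoting power" ratio described above is simple enough to sketch. The function name and the enrollment counts below are invented for illustration; they are not figures from the DOE Statistical Summaries. Each 9th grade cohort is matched to the 12th grade enrollment three years later.

```python
def promoting_power(grade9, grade12):
    """Return {cohort_year: 12th grade enrollment 3 years later / 9th grade enrollment}.

    `grade9` and `grade12` map a fall school year to an enrollment count.
    Cohorts without a matching 12th grade count are skipped.
    """
    ratios = {}
    for year, ninth in grade9.items():
        twelfth = grade12.get(year + 3)
        if twelfth is not None and ninth > 0:
            ratios[year] = twelfth / ninth
    return ratios


# Hypothetical enrollment counts, not real NYC data.
grade9 = {1995: 1000, 1996: 1200, 1997: 1100}
grade12 = {1998: 400, 1999: 600}

if __name__ == "__main__":
    print(promoting_power(grade9, grade12))
```

Because transfers in and out and students repeating 9th grade all move the ratio, it is better read as a trend indicator than as a graduation rate level.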

The graph below suggests that graduation rates in New York City did indeed increase for the cohort that entered school in 2000 and again for the cohort that entered in 2001, which four years later would have been in 12th grade in 2004 and 2005, respectively. The graduation rate has largely been flat for the last four years, which would represent the classes that entered high school from 2002 onwards.

Grad%20Rate%2099_08.jpg

In his weekly radio address yesterday, Bloomberg argued that mayoral control is the primary driver behind increasing graduation rates. Hmmm. The graduating class of 2004 had finished its first 2.5 years of high school before the Children First reforms were even announced in January 2003, and the graduating class of 2005 had already made it through the first 1.5 years of high school. Since the entering 9th grade class of 2002, these 4-year figures have largely been flat.

I'm happy to cheer for increasing graduation rates for New York City kids - though I wish the proportion of classes passed through credit recovery was also publicly reported - but the time ordering here makes it impossible to attribute them to mayoral control.

August 14, 2008

eduwonkette flies over to GothamSchools: NYC Graduation Rates

eduwonkette-NYC.jpg
NYC Readers - Wondering what's going on with the graduation rates that were released this week? Head on over to GothamSchools, where I will be posting occasionally on NYC education issues, and check out a map of 4-year cohort graduation rates across the city.

August 10, 2008

A New Slogan for New York City: "Reach Out and Test Someone"

A week ago, you submitted 47 slogans for the New York City Department of Education, and I picked one to illustrate. The winning slogan comes from Gary Babad, Chief Satirical Officer at the NYC Public School Parents blog. Congrats Gary, and thank you to everyone who contributed a slogan!

reach-out-and-test-someone.jpg

August 5, 2008

An Unchanged NYC Achievement Gap Hits the Papers (Plus, Joel Klein's Postmodernist Turn!)

science%20who%20needs%20it.jpg
With her article on New York City's lack of progress in closing the achievement gap, Elizabeth Green demonstrates once again that she's the sharpest and most inquisitive education reporter in New York City. I'm pretty sure she's the second coming of Josh Benton, formerly of the Dallas Morning News, who wowed us all with his analyses of original data.

Bottom line: Three NYC professors – Bob Tobias (NYU prof who ran the NYC testing department for 13 years), Howard Everson (Fordham prof and advisor to New York State Ed.), and Aaron Pallas (TC prof) – all agree that there’s not much action on the achievement gap in New York City.

The most priceless parts of the article involve Onion-worthy quotes from newly minted postmodernist Joel Klein. Apparently, the achievement gap is really just a matter of opinion!

Silly me – I thought New York City was data-driven. Never mind.

Note to self: burn my stats books and sprinkle their ashes over Tweed, deinstall Stata, and buy Foucault, Derrida, and Baudrillard. I’m tired of those damn social scientists getting in my way. Science is dead! Let’s nuke positivism! Readers, are you up for it?

Here are some delicious snippets and my commentary in brackets:

1) “In an interview at Tweed Courthouse, the schools chancellor, Joel Klein, said the achievement gap is ‘an issue,’ but he said it should not obscure the significant gains black and Hispanic students have made under his watch.” [Hey, wait! What about that “Educational Equality Project” that was founded specifically around closing the achievement gap? Now it’s not important? Huh? And PS - your own "Chief Equality Officer" Roland Fryer has written two important articles about the achievement gap focused on gaps in scale scores, not proficiency!]

2) “Mr. Klein criticized the National Center on Education Statistics analysis. ‘Those are just confidence levels. Nobody is saying this is a science,’ Mr. Klein said. He added: ‘If three points is flat, and four points is statistically significant, then what you're doing is, you're playing something of a game.’” [A piece of free advice for the Tweed PR Department: You guys need to get someone else out front when there are numbers involved. Your fearless leader’s statistical prowess is quickly becoming the best evidence of high variance in male math achievement, such that men are overrepresented at the bottom of the distribution.]

The Department of Education’s shameless attempt at big lie propaganda can be found here, as can the New York Sun’s experts’ analysis of the data.

Update: Kelly Vaughan at Gotham Schools is all over this, too.

August 1, 2008

New York City Achievement Gap Round Robin

bird.jpg
Check out these links on the NYC achievement gap dust-up:

1) All Tricks, No Treats: Head over to the National Review Online, where dataman Robert VerBruggen takes a stab at the NYC state achievement gap data. In Has NYC Discovered the Trick for Closing the Achievement Gap?, he writes:
That question has important ramifications for college admissions and affirmative-action policies. The schools claim the answer is yes....as yours truly will further argue in the ridiculously long post after the jump, that doesn't appear to be the case.
2) Madonna Revenge: Achievement gap virgin Mike "Milli" Petrilli argues over at Flypaper that proficiency is what's important, not the continuous achievement gap. I've planned a longer post on why the achievement gap matters, but for now, a few words from "When Measuring Achievement Gaps, Beware the Proficiency Trap:"
The proficiency view, to my mind, is certainly important to consider when we are thinking about building stocks of human capital. But if we are concerned about inequality and social stratification - ensuring that, on average, every demographic and socioeconomic group is equally prepared to compete in higher education and the workplace - relative achievement measured on a continuous scale is what matters, not proficiency rates.
3) More Sorry than Eliot Spitzer: Matthew Tabor won a no-bid contract with the NYC DOE to write David Cantor's letter of apology for denying the public access to data that are rightfully public. Read the whole thing, but here's a taste:
Dear New Yorkers,

This last Sunday I denied a public information request inappropriately. When one is overcome with a bitter, “them vs. us” attitude on top of a penchant for political game-playing and a disinterest in public communication, surely you understand how these things happen.

July 31, 2008

Last Chance to Submit a Slogan!

gangstagreen.gif
Last call! Submit your slogan by the end of the day. I'll put a few together as logos and we'll vote next week. My new submission is "The NYC DOE: Truthiness in Education." Some of yesterday's submissions included:

The NYC DOE: Redefining success since 1876 (MD)

The NYC DOE: Do as I say not as I do (cha424)

The NYC DOE: Proficiency is forever. (Doug)

The NYC DOE: Making words and data mean whatever we want them to mean. (BHR)


Here's the original challenge:

Every legit corporation has a catchy slogan. Nike rocks "Just Do It!" GE "brings good things to life." But what about the NYC Department of Education?

Until the end of the week, you can submit your slogan for the NYC Department of Education as a comment. Surely we can come up with something splashy.

Enter early, enter often. Based on the Halloween Edu-Parade, NYC School Report Card Haiku Contest, and the Valentine's Day Edu-Poetry contests, this should be fun.

In light of the New York City Comptroller's report issued this morning, which found that the Department of Education ignored bidding rules and overspent on travel, food, and conference expenses in the middle of a budget crisis, my entry is:

The NYC DOE: Damn, It Feels Good to Be a Gangsta!

If you don't get the "Office Space" reference, you can watch this video.

Introducing Gotham Schools: A New York City Schools Blog!

gotham1_lg.jpg
If you follow NYC schools, here's a new must-read blog for you - Gotham Schools. When the Open Planning Project lined up two of NYC's most talented education bloggers - Philissa Cramer (formerly of Inside Schools) and Kelly Vaughan (a NYC teacher for the last eight years) - I knew we could expect big things from this site. Here's a description:
GothamSchools is a news source and online community for teachers, parents, policy makers, and journalists interested in learning about what works and what doesn’t in the nation’s largest school district. We seek to provide a clearinghouse for New York City school news and commentary, connect teachers and parents with resources, highlight effective practices in policy and pedagogy, and build a participatory knowledge base about education in New York City. By offering a critical eye on education research and reporting, and by creating a forum for conversation, GothamSchools is helping New Yorkers create better schools.

July 30, 2008

On New York State Tests, A Growing Achievement Gap Between White/Asian and Black/Hispanic New York City Students

Looks like the tea party is finally over. As we all expected, the New York City Department of Education had questionable motives for stalling the release of the New York State scale score data.

Here's why: The achievement gap in New York City has increased in the last five years, and the decreases in the achievement gap in grade 8 ELA have come at the expense of white and Asian students. Coupled with my analyses of NAEP achievement gaps - which also showed no progress and in some cases growing gaps - these findings are quite troubling.

Over the period 2003 to 2008, New York state tested only 4th and 8th graders continuously. Here's what I found:

* In 4th grade ELA, the black-white, Hispanic-white, and black-Asian gaps have all grown. The black-white gap has increased by 13%, the Hispanic-white gap by 6%, and the black-Asian gap by 2%. The Hispanic-Asian gap has narrowed by a measly 4%.

* In 8th grade ELA, gaps have decreased - but only because the average scale scores of Asian and white students fell between 2003 and 2008. In contrast to Bloomberg's claims of a 50% reduction in some cases, the black-white gap has only decreased by 12%, the Hispanic-white gap by 3%, the black-Asian gap by 21%, and the Hispanic-Asian gap by 13%.

* In 4th grade math, black-white, Hispanic-white, black-Asian, and Hispanic-Asian gaps have all grown. The black-white gap has increased by 7%, the Hispanic-white gap by 5%, the black-Asian gap by 15%, and the Hispanic-Asian gap by 13%.

* In 8th grade math, the black-white, black-Asian, and Hispanic-Asian gaps have all grown. The black-white gap has increased by 4%, the black-Asian gap by 28%, and the Hispanic-Asian gap by 21%. Only the Hispanic-white gap has decreased, by 6%.

It's hard to imagine how the NYC Department of Education will finesse these results. Rest assured that they will try. So sit back, grab some popcorn, and get ready for some serious spin. As I suggested below, an apt slogan might be, "The New York City Department of Education: It depends on what the meaning of the word 'is' is."

You can find the scale scores, and the race-specific standard deviations, here. Analysis-wise, here's what I did: I calculated the point in the white distribution at which the average black student scored, i.e. (white average score for 2003 - black average score for 2003)/white standard deviation. I performed the same calculation for all of the gaps above for both 2003 and 2008. If you have questions, as always, email me at eduwonkette (at) gmail (dot) com.
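Here is a minimal sketch of that gap calculation. The scores and standard deviations below are invented, not the New York State figures. Two pieces are my own assumptions rather than anything stated above: the percent change is taken relative to the 2003 gap, and the percentile conversion assumes the reference group's scores are roughly normal.

```python
from statistics import NormalDist


def standardized_gap(ref_mean, group_mean, ref_sd):
    """Gap in reference-group standard deviation units."""
    return (ref_mean - group_mean) / ref_sd


def percent_change(gap_start, gap_end):
    """Change in the gap as a percent of the starting gap (assumed definition)."""
    return 100.0 * (gap_end - gap_start) / gap_start


def percentile_in_reference(gap):
    """Where the group average falls in the reference distribution.

    Assumes the reference distribution is approximately normal.
    """
    return 100.0 * NormalDist().cdf(-gap)


# Hypothetical scale scores for illustration only.
gap_2003 = standardized_gap(ref_mean=660.0, group_mean=640.0, ref_sd=40.0)
gap_2008 = standardized_gap(ref_mean=680.0, group_mean=658.0, ref_sd=40.0)

if __name__ == "__main__":
    print(gap_2003, gap_2008)
    print(percent_change(gap_2003, gap_2008))
    print(percentile_in_reference(gap_2003))
```

In this made-up case both groups' averages rise, yet the gap in standard deviation units widens, which is exactly the pattern a proficiency-rate comparison can hide.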

July 29, 2008

The Persistent Achievement Gap in New York City: A Summary

Some readers asked me to put together a summary about the achievement gap in New York City:

1) Proficiency rates, or the percentage of students passing a test, are often used to measure achievement gaps. For example, if 90% of white students passed a test and 65% of black students did, some observers will say that the achievement gap is "25 points."

2) Proficiency is a misleading and inaccurate way to measure achievement gaps. Primarily, the problem is that we cannot differentiate between students who just made it over the proficiency bar and those who scored well above it. Proficiency rates can increase substantially by moving a small number of kids up a few points - just enough to clear the cut score. But black and Hispanic students may still lag far behind their peers even as their proficiency rates increase.

3) The most valid way to measure gaps between groups is to compare the test score distributions of the groups. What this means is that we compare average scale scores as well as differences between low-scoring white/Asian and Hispanic/black students (i.e. students scoring at the 10th percentile of their respective groups) and differences between high-scoring students (i.e. students scoring at the 90th percentile of their respective groups). In my posts last week, I focused on average scale scores - next, I'll take a look at the whole distribution.

4) When we compare average NAEP scale scores over time, there is no change in the white-black and white-Hispanic achievement gap in NYC for any subject or grade level. (See reading and math.)

5) In grade 8, the black-Asian and Hispanic-Asian gaps have grown: The black-Asian and Hispanic-Asian gaps in reading, and the Hispanic-Asian gap in math, have grown substantially and these differences are statistically significant. The black-Asian gap in math has also grown, but the differences aren't statistically significant.

6) The end result is that the average black and Hispanic student in New York City is as far behind - and in some cases, further behind - their white and Asian peers as they were five years ago.
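To see why points 1 through 3 matter, consider a toy example. The scores and the cut score below are invented and the groups are tiny, so this only illustrates the mechanism; it says nothing about actual NYC results. Two students are nudged a few points over the cut, and the proficiency gap appears to collapse while the gap in average scores barely moves.

```python
from statistics import mean


def proficiency_rate(scores, cut):
    """Percent of students scoring at or above the cut score."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


CUT = 650  # hypothetical proficiency cut score

reference = [640, 660, 680, 700, 720]      # comparison group, unchanged
group_before = [600, 620, 645, 648, 700]
group_after = [600, 620, 651, 652, 700]    # two students moved just past the cut


def gaps(group):
    """Return (gap in proficiency rates, gap in average scores)."""
    rate_gap = proficiency_rate(reference, CUT) - proficiency_rate(group, CUT)
    mean_gap = mean(reference) - mean(group)
    return rate_gap, mean_gap


if __name__ == "__main__":
    print("before:", gaps(group_before))
    print("after: ", gaps(group_after))
```

In this example the proficiency gap falls from 60 points to 20, while the gap in average scores shrinks by only 2 scale score points.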

In sum, we need an Educational Equality Project, indeed - but for black and Hispanic kids' sake, let's hope it's not modeled on New York City's faux progress in closing the achievement gap.

July 28, 2008

No Cape for Cantor

cape.jpg
David Cantor, the New York City Department of Education's press secretary, will not be receiving a free cape. Sadly, this is what it's come to in New York City - the Department of Education is denying all of us access to data that rightfully belong in the public domain.

****

From: eduwonkette
To: Cantor David
Cc: Jacob Andrew
Sent: Sun Jul 27 22:18:22 2008
Subject: Requesting scale scores

Dear David,

I saw over at eduwonk that you are giving out the scale scores by race/ethnicity. Could you please send these scores to me (2003-2008), or provide a statement that I can post on the blog as to why you will not?

Thank you,
eduwonkette

****

From: Cantor David
To: eduwonkette
Date: Sun, Jul 27, 2008 at 11:42 PM
Subject: Re: Requesting scale scores
mailed-by schools.nyc.gov

I've thought about it and decided i don't want to give out information to someone asking anonymously.

I'm sure you'll have no difficulty finding what you need.

Update: Matthew Tabor is my hero. Read his FOIL request here.

July 25, 2008

New York City Achievement Gap Round-Up: Three Cheers for Our Naked Emperor!

NYC-New-Clothes.jpg
A child, however, who had no important job and could only see things as his eyes showed them to him, went up to the carriage. "The Emperor is naked," he said.

"Fool!" his father reprimanded, running after him. "Don't talk nonsense!" He grabbed his child and took him away. But the boy's remark, which had been heard by the bystanders, was repeated over and over again until everyone cried: "The boy is right! The Emperor is naked! It's true!"

The Emperor realized that the people were right but could not admit to that. He thought it better to continue the procession under the illusion that anyone who couldn't see his clothes was either stupid or incompetent.
- Hans Christian Andersen, The Emperor's New Clothes
If you've been reading over the course of this week, perhaps you, too, now realize that New York City's emperor, much as he likes to travel across the land saying that he's significantly narrowed the achievement gap in New York City, is wearing no clothes.

The achievement gap has not narrowed at all in math at any grade level.

Nor has the achievement gap narrowed at all in reading at any grade level.

So perhaps it is time for us to rethink the reform strategy in New York City, and ask some difficult questions about why this strategy has not narrowed the gap between white and Asian students and their African-American and Hispanic peers. K-12 schools are in a tough spot when African-American students show up at kindergarten testing at the 35th percentile of the white distribution in reading and the 25th percentile in math. They are even further disadvantaged by racial inequalities in how students' out of school time is spent. Remember, kids only spend 22% of their waking hours between kindergarten and 12th grade in school.

We can and should push and support schools to do better. But all of the pushing we've seen in New York City has amounted to no change in the size of the achievement gap. Why should we expect anything different from the reforms proposed by the Educational Equality Project, which has taken New York City's reforms as something of a blueprint?

Bottom line: if we want to close the achievement gap, it will not happen by schools alone. Let's ask schools to do better, but let's also be realistic about how far those efforts will take us. Unfortunately for New York City's black and Hispanic kids, these efforts have left them as far behind their white and Asian peers as they were 5 years ago.

July 22, 2008

In New York City, A Long Wait Ahead to Close the Math Achievement Gap

MLK%20Birmingham.jpg
Today I will lay out the math achievement disparities separating black and Hispanic New York City students from their white and Asian counterparts on the National Assessment of Educational Progress (NAEP). Needless to say, Mayor Bloomberg and Chancellor Klein forgot to mention these inconvenient facts when they testified before Congress last week:

* In 2007, the average African-American 8th grade student in NYC performed at the 20th percentile of the white distribution in math, and at the 15th percentile of the Asian distribution. Put differently, 80 percent of white students performed above the average African-American math score, and 85 percent of Asian students did.

* In 2007, the average Hispanic 8th grade student in NYC performed at the 24th percentile of the white distribution in math, and at the 17th percentile of the Asian distribution. In other words, 76 percent of white students performed above the average Hispanic math score, and 83 percent of Asian students did.

The size of those gaps is almost identical for 4th grade students.

If there is trouble in Gotham, it can be summarized in a few lines: best case scenario, the black-white achievement gap in 8th grade math achievement won't close for 21 years, and the Hispanic-white achievement gap won't close for 36 years. But here's the catch: these projections only hold if white students make no progress. And indeed, they have made no progress in 8th grade math in the last four years: The average scale score for white students was the same in 2003 as it was in 2007 (289 points). If white students also improve, these gaps will take even longer to close if New York City continues at the current pace.
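The logic behind a projection like this can be sketched in a few lines of Python. This is a straight-line extrapolation under stated assumptions, not a reconstruction of the calculation behind the 21- and 36-year figures: the function name `years_to_close` and every input below (a 24-point gap, a 1.5-point annual gain) are hypothetical placeholders, not NAEP values.

```python
import math


def years_to_close(gap_points, lower_group_gain, upper_group_gain=0.0):
    """Years until a gap closes under a straight-line projection.

    gap_points: current gap in scale-score points.
    lower_group_gain / upper_group_gain: average points gained per year
    by the lower- and higher-scoring group, respectively.
    """
    net_narrowing = lower_group_gain - upper_group_gain
    if net_narrowing <= 0:
        # The gap is flat or growing, so it never closes at this pace.
        return math.inf
    return gap_points / net_narrowing


# Hypothetical inputs, for illustration only.
gap = 24.0   # current gap in scale-score points
gain = 1.5   # lower-scoring group's gain per year

best_case = years_to_close(gap, gain)                         # higher group flat
slower = years_to_close(gap, gain, upper_group_gain=0.5)      # higher group gains a little
never = years_to_close(gap, gain, upper_group_gain=1.5)       # both gain equally

print(best_case, slower, never)
```

The second and third calls show the catch: once the higher-scoring group also gains, the same pace takes longer, and if both groups gain equally the gap never closes.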

What's more, the gaps separating black and Hispanic students from their Asian peers appear to be growing, at least in the 8th grade. Though only the growth in the Asian-Hispanic achievement gap is statistically significant, the growth in this gap from 2003 to 2007 is suggestive of a troubling trend:

* Between 2003 and 2007, the average black 8th grader in NYC has fallen from the 19th to the 15th percentile of the Asian distribution. (Note that this change falls short of statistical significance, however.)

* Between 2003 and 2007, the average Hispanic 8th grader has fallen from the 24th to the 17th percentile of the Asian distribution.

Of late, Joel Klein has taken to invoking Martin Luther King, a habit that I find quite infuriating given the sizable and persistent achievement gaps in New York City - notwithstanding his PR campaign clucking about his successes on this front. I don't think Dr. King would have looked kindly on such subterfuge. So I leave you with this quote from his Letter from Birmingham Jail, which provides a lens through which to view these gaps and New Yorkers' growing impatience with the Department of Education's unwillingness to acknowledge that gaps have not closed under their watch:

For years now I have heard the word "Wait!"....This "Wait" has almost always meant "Never." ....There comes a time when the cup of endurance runs over, and men are no longer willing to be plunged into the abyss of despair. I hope, sirs, you can understand our legitimate and unavoidable impatience.

Tomorrow: The Reading Achievement Gap in New York City.

July 20, 2008

Trouble in Gotham

eduwonkette-batsignal_web.jpg

"Over the past six years, we’ve done everything possible to narrow the achievement gap – and we have. In some cases, we’ve reduced it by half."
-Mayor Michael Bloomberg, testifying before the House Committee on Education and Labor, July 17, 2008

"Our African-American and Latino students have gained on their white and Asian peers....What does this show? Achievement for high-needs students is not a dream. It’s happening."
-NYC Chancellor Joel Klein, testifying in the same House Committee hearing
New York City Mayor Michael Bloomberg and Chancellor Joel Klein seem intent on taking the "New York City miracle" national. However, a close review of racial and ethnic achievement gaps in New York City over their tenure suggests that Bloomberg and Klein would do well to get their own house in order first.

Analyzing data from the NAEP Trial Urban District Assessment from 2003-2007, I will show over the course of this week that in every grade and subject, achievement gaps separating New York City's white and Asian students and their African-American and Hispanic counterparts are unchanged. What's more, there's some suggestive evidence that Asian-black and Asian-Hispanic gaps are actually growing.

You ask, "But Bloomberg and Klein say the achievement gap is closing. Are they lying?" It's not lying as much as it's finessing with the intention of providing a rosier picture of progress than is warranted. But the hard facts - for example, that in 2007, the average black 8th grader in New York City performed at the 20th percentile of the white distribution in reading and in math, and there has been no statistically significant change in the size of that gap since 2003 - make Bloomberg's victory march, as he likes to say, unconscionable.

Why? First, any respectable psychometrician would advise against measuring achievement gaps by using proficiency rates. I have explained why comparing group differences in proficiency rates provides a misleading measure of between-group inequality here: When Measuring Achievement Gaps, Beware the Proficiency Gap and Scale Score Magic! Why We Shouldn't Rely on Proficiency Rates to Measure Academic Achievement. The central problem is that the black-white achievement gap can be increasing even as the difference between the proficiency of black and white students is closing. (If you are asking yourself, "What the heck are scale scores?", see skoolboy's excellent testing primer.)

Second, the Department of Education has refused to release average scale scores for racial and ethnic groups on the New York state tests. As a result, it's tough to tell what's really going on on the state test. It is possible that gaps in average scale scores are unchanged on the NAEP, but closing on the state test. If this is the case, one possible explanation is that students from historically disadvantaged groups are increasingly drilled in skills that pump up their scores on the state test, but do not generalize to other measures of achievement. (See Why We Should Care About Test Score Inflation.)

Everyone's favorite "but they're different tests" argument has some validity when we are looking at overall levels of achievement. We should not expect state and NAEP tests to track each other perfectly. But that argument is highly suspect in explaining why the size of gaps varies significantly between tests. It's also possible that gaps as measured by the group differences in average scale scores on the state test are unchanged, but small gains by black and Hispanic "bubble" students across the proficiency cut score have led to large increases in proficiency.

To adjudicate between those two explanations, I requested scale score data from Truth Squad captain David Cantor almost a week ago, and was told I would receive these data by last Thursday. I guess the data got lost in the Internets. Unfortunately, denying data access appears to be a growing Department of Education strategy - in this case, the DOE failed to release data, and in other cases, they have released data only in PDF formats that no one can analyze. There is something deeply troubling about an administration that bows down at the altar of "data-driven decision making" but refuses the public access to data that rightfully should be available in a spreadsheet on their website.

Luckily for us, data from the NAEP Trial Urban District Assessment are publicly available, so the Department of Education doesn't get the last word on the size of achievement gaps in New York City. Nor do you have to take my word for it. The National Center for Education Statistics analyzed the NAEP TUDA scale scores, and found that in every subject and grade level, racial and ethnic achievement gaps did not close in NYC between 2003 and 2007. You can find all of these data, as well as NCES's own conclusions about changes in NYC achievement gaps, here. (Click on Advanced --> select a grade level and subject --> under "Jurisdiction," select "Urban District" and "New York City" --> under "Variables," select "Gaps and Changes in Gaps" for "all assessments.")

I hope you will join me this week in trying to make sense of the persistence of achievement gaps in New York City schools and the Department of Education's unflinching willingness to cast facts aside and tell the public otherwise.

July 13, 2008

Cerf-ing the Web

Ministry-of-Truth.jpg
Over at eduwonk, the New York City Department of Education is putting its best foot forward by displaying its two strongest (and most becoming!) skills: a remarkable willingness to spin the naked facts and to personally attack anyone who questions their miracle. But Chris Cerf can't manage to slip past Sol Stern's first-rate BS detector, which is on full display in his original post and his drop-kick comment on Cerf's post, which are both must-reads.

Here's what I don't get. If you're a believer in Truth, why spin checkable facts when you're no doubt going to get busted? It's just not good government. But it doesn't strike me as a smart PR strategy either, because it gives us good reason to wonder what else is going on behind that curtain.

My concern is not so much with any individual assertion of Cerf's. A much larger problem is the New York City Department of Education's willingness to swap facts in and out as they see convenient.

Here's one example: Cerf insists on taking credit for the gains in the 2002-2003 school year, though Klein's Children First reforms were announced for the first time in January 2003 at the same time that students were taking state tests. Did his words fall upon NYC kids' brains like pixie dust, and so inspire them that they produced huge gains in reading and math? The timing just doesn't add up. Yet Cerf wrote, "You frequently argue that the Mayor and Chancellor should not be given credit for the growth in achievement in their first year. To the contrary, they instituted important changes during that year. Obviously what happened in the past affected the results, just as our work will affect the results of the next chancellor, but that first year was on our watch."

But back in 2003, Joel Klein didn't want to draw attention to or take credit for the large gains that were posted that year. Klein was attempting to overhaul the entire system, and when the ELA and Math results were released in both May and September, his reaction was described as "muted." In fact, he even threw a few sentences in questioning the validity of the math scores because "it is hard to tell the true significance of any one set of results in isolation." Here's a clip from the NY Times article on the reading scores that year:
The city's positive results come at a time when Mayor Michael R. Bloomberg and his schools chancellor, Joel I. Klein, are trying to overhaul the public school system and impose a uniform reading and math curriculum at all but the highest performing schools.

City officials, who might otherwise have been jubilant about yesterday's results, offered a muted reaction, saying that the gains were not broad enough and that the school system as a whole was still failing at least half the city's children.
Klein's reaction in this NY Times article on the 2003 math scores was also less than ecstatic:
But not everyone greeted the news so enthusiastically.

The suggestion that city schools are on the upswing put Chancellor Joel I. Klein, who is overhauling them, in a tricky position. While the chancellor's critics pounced upon the higher scores as evidence that the school system did not need such an overhaul, some of his allies acknowledged that he would now be under even more pressure to show gains next spring.

Mr. Klein's reaction to the good news was muted, as it was to news of higher reading scores in the spring.

''While I am gratified by the test results released today for fourth and eighth graders in New York City, I must emphasize that it is hard to tell the true significance of any one set of results in isolation,'' the chancellor said in a statement. ''We must always look at results in comparison over a number of years. Only through comparison can we truly measure the progress we're making.''
Truth Squad! (Press officer Andy Jacob is eduwonkette's designated truth guru): Of course I've got all this wrong, so do enlighten us wayward philistines.

PS - Have you guys considered tee-shirts? I won't demand any royalties from the image above, which should obviously go on the front (with your last names emblazoned on the back). You can thank me later.

July 10, 2008

Gimme Some Truth

lennon.jpg
If you think "Education Department Employs Squadron in Search for Truth" is a spoof article from The Onion, guess again. You'd think that PR flaks would know better than to name an otherwise mundane 21st century version of letters to the editor the "Truth Squad," and in doing so, make it worth reporting on. Said New York City Deputy Chancellor Chris Cerf, who came up with the Truth Squad concept:

"We try to keep track of what people are saying about us, and we respond periodically. Because we believe in the truth."

Is that truth with a big T, truth with a little t, or truth with an asterisk?

I'll let you be the judge.

Here's what John Lennon would have said:

I'm sick and tired of hearing things
From uptight, short-sighted, narrow-minded hypocritics
All I want is the truth
Just gimme some truth
I've had enough of reading things
By neurotic, psychotic, pig-headed politicians
All I want is the truth
Just gimme some truth


Update: Check out Alexander Russo and Norm Scott's pithy posts on the same.

June 29, 2008

"Independence" Day

spiffboy2-thumb.jpg

I’ll try to stay reasonably serious this week, but some things are just too ridiculous to pass up. On Friday, the New York City Department of Education (DOE) announced that it had selected the NYC Leadership Academy to provide principal training and development services. The press release proclaimed that the Leadership Academy was “chosen from among multiple bidders in a competitive procurement process.” The DOE is negotiating a five-year contract for a total of $50 million, beginning Tuesday, July 1.

Long-time followers of New York City public schooling are aware that the NYC Leadership Academy was created by the DOE in 2003, and Chancellor Joel Klein serves as a Director of the organization. (At least according to the organization’s IRS filings – its website doesn’t list him as a director.) The Leadership Academy website describes the Leadership Academy as “the centerpiece of the NYC Department of Education’s transformational strategy,” a phrase that also appears in DOE press releases, and the staff have e-mail addresses provided to employees of the DOE. The April press release announcing this extraordinary competitive procurement spent more time crowing about the Leadership Academy’s accomplishments than describing the request for proposals.

So: The DOE had a competitive bidding process to award a contract to an organization that Mayor Mike Bloomberg and Chancellor Joel Klein had created and publicly supported over the past five years. Remarkably, the report of the award indicated that there were three other bidders. I can only imagine who would seriously think they had a shot at this.

Probably the same people who think they have a shot at this. In related news, skoolboy, who has been happily married for many years, is announcing a competitive procurement for spousal services. The successful bidder will have experience attending to the needs of a partner like skoolboy. Prior joint ownership of property with skoolboy and collaborative experience raising a family a plus. The date of the bidder’s conference will be announced later.

Demographer Takes On New York City's Gifted and Talented Admissions

Andrew Beveridge, the New York Times' demographer, turns his attention to New York City's gifted program in this Gotham Gazette column. Based on his estimates, here's the bottom line on the change in gifted and talented admissions in NYC:

Non-Hispanic whites and Asians almost triple their percentage, while the percent non-Hispanic black and Hispanic plunges. In short, students accepted in the Gifted and Talented program are not all representative of the students in New York City, and are less so this year than last year.

June 26, 2008

New York's Lake Wobegon Effect

woebegon.jpg
Sol Stern nails it in his article on test score inflation:

The premise of NCLB, as of so many current education reform efforts, is that schools must serve the interests of children, not the interests of the adults who work in the system. But in a classic case of unintended consequences, the widespread test inflation produced by NCLB is serving only the interests of the adults. New York education officials like Mills, New York City mayor Michael Bloomberg, and his schools chancellor, Joel Klein—along with teachers’ union leaders like Randi Weingarten—advance their varied agendas in the glow of inflated test scores. But the children are the big losers. Sometime in the next decade, the white children of Lake George and the black children of New York City will come face to face with reality. On a high school math Regents test—or on an SAT test, or in a college remediation course—they will discover that they are not quite as proficient as New York State once assured them.

When Measuring Achievement Gaps, Beware the Proficiency Trap

gap.jpg
Though we can thank the No Child Left Behind Act for drawing our attention to the "achievement gap" - which is now loosely deployed to reference gaps between African-American and white/Asian, poor and advantaged, suburban and urban, or even male and female kids - it's also done us a great disservice by distorting the way that we measure, and think about, differences between groups.

There are at least two ways of thinking about the relationship between achievement and kids' life chances. The first is to consider, in absolute terms, the set of skills that students have. The second views achievement as relative. Most coveted opportunities - jobs, college admission, a good grade in a college course, or positive evaluations in the workplace - are not divvied up based on students crossing an arbitrary line of proficiency or competence. We don't give everyone a job who's passed a basic reading test, nor do we admit everyone to UC-Berkeley who's received more than a 700 on the verbal SAT. Every student in a college course at NYU can't get an A, and faculty measure students' performance against others to assign grades. In short, all of these decisions are made by comparing the performance of those in a pool, and choosing those who come out near the top.

The proficiency view, to my mind, is certainly important to consider when we are thinking about building stocks of human capital. But if we are concerned about inequality and social stratification - ensuring that, on average, every demographic and socioeconomic group is equally prepared to compete in higher education and the workplace - relative achievement measured on a continuous scale is what matters, not proficiency rates.

Which brings us to how we currently measure "achievement gaps" between social groups, and why this method is tragically flawed. For example, if you look at the NYC press release from this year's test scores, you'll see that gaps are defined as the difference between the percentage of students that are proficient in each group. If the gap in proficiency between black and white students was 29 percentage points last year in 4th grade ELA, and now is 26 percentage points, we hear that the gap has narrowed by 3 percentage points. But it's possible that the gap in the achievement that matters - the continuous measure of achievement - has actually grown.

Let me give a brief example to illustrate. If we use the proficiency logic, the achievement gap that separates the Bronx and the affluent suburbs of Westchester is closing. And indeed on Monday, Mayor Bloomberg crowed that NYC is catching up to the suburbs. If we take a look at 7th grade math, we see that there was a 30 percentage point Westchester/Bronx gap in proficiency in 2007 (73% versus 43%), but this year, there is only a 25 percentage point gap (83% versus 58%). If we use a proficiency measure, the achievement gap has closed by 5 percentage points.

Not so fast. The achievement gap, if we measure the differences in the average student scores in Westchester and the Bronx, has actually increased in 7th grade math. The scale score gap was 28 points last year. Put differently, the average Bronx 7th grader scored at the 23rd percentile of the Westchester distribution in 2007. This year, the gap was 30 points. Now the average Bronx 7th grader has dropped to the 21st percentile of the Westchester distribution, even though the achievement gap, as measured by proficiency, is closing.
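To see how both statements can hold at once, here is a minimal Python sketch. The proficiency rates and scale score gaps are the ones quoted above. The standard deviation of 38 scale score points for Westchester is an assumption chosen for illustration (it is not a published figure), as is the simplification that Westchester scores are roughly normally distributed; the names `proficiency_gap` and `percentile_of_average` are made up for this example.

```python
from statistics import NormalDist

# Figures quoted in the post: percent proficient in each place, and the
# gap between average scale scores, for 7th grade math.
YEARS = {
    2007: {"westchester_prof": 73, "bronx_prof": 43, "scale_gap": 28},
    2008: {"westchester_prof": 83, "bronx_prof": 58, "scale_gap": 30},
}

# ASSUMPTION: spread of Westchester scores, in scale score points.
ASSUMED_SD = 38.0


def proficiency_gap(year):
    """Gap in percent proficient, in percentage points."""
    d = YEARS[year]
    return d["westchester_prof"] - d["bronx_prof"]


def percentile_of_average(scale_gap, sd=ASSUMED_SD):
    """Percentile of the higher-scoring distribution at which the
    lower-scoring group's average falls, assuming a normal distribution."""
    return 100 * NormalDist().cdf(-scale_gap / sd)


for year in sorted(YEARS):
    pct = percentile_of_average(YEARS[year]["scale_gap"])
    print(year, proficiency_gap(year), round(pct, 1))
```

Under these assumptions the proficiency gap shrinks from 30 to 25 percentage points, while the average Bronx 7th grader slips from roughly the 23rd percentile of the Westchester distribution to somewhere between the 21st and 22nd.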

Take-home point: when you hear about achievement gaps closing based on proficiency scores, beware of what you're being sold.

June 24, 2008

Are New York City Schools Shortchanging High Achieving Students? The View from 2003-2008

MyShip.jpg
Savvy New York City parents have long suspected that high achieving kids are losing out in the push to boost the achievement of the lowest performing students. But those suspicions are often cast aside by public officials as helicopter parent whining or muted class warfare.

But a review of 4th grade test score data from 2003-2008 suggests that these parents have been on to something. Between 2003 and 2008, the fraction of students scoring in the highest achievement level on the 4th grade NY state ELA test has plummeted.

In 2003, 15.6% of 4th graders scored at Level 4. By 2008, only 5.8% did. In other words, the fraction of students scoring at Level 4 in 2003 was about 2.7 times as high as it was this year. At the same time, the percentage of students scoring at proficiency has increased about 9 percentage points, from 52.4% to 61.3%.

Put bluntly, it appears that schools are focusing on pushing lower performing students over the passing mark, and shortchanging high-achieving students in the process. In Bloomberg's New York, as it turns out, a rising tide does not lift all boats.

2003_8%204th%20Grade%20ELA.jpg

You can find the data from 1999-2005 here, and the data from 2006-2008 here. I analyzed 4th grade scores because tests weren't given in grades 3-5 throughout the entire time period. If anyone knows where to find average scale scores at different parts of the distribution over time (i.e. 10th/90th percentile) - I would have preferred to work with these data for all of the reasons suggested below - please let me know.

In NYC Middle Grades, Fewer High Achieving ELA Students, Even As Passing Rates Increase

In grades 5-7, grades that have seen sharp increases in ELA passing rates over the past two years, the percentage of New York City students scoring in the highest performance category has decreased substantially. You can find those results here. Interestingly, this is only true for ELA, not math.

* In 2006, 8.7% of 5th graders scored at Level 4 on the ELA. This year, only 4.3% did.

* In 2006, 7.1% of 6th graders scored at Level 4. This year, only 2.2% did.

* In 2006, 4.7% of 7th graders scored at Level 4. This year, only 1.6% did.

2006_8%20Level%204%20ELA.jpg

Anyone have ideas about what's going on here? Fordham's report on high achieving students in an NCLB era provides some insight, I think.

Scale Score Magic! Why We Shouldn't Rely on Passing Rates to Measure Academic Achievement

rabbit%20hat.jpg
Consider this puzzle: in 2007, the average scale score on the New York State ELA Test was 661. In 2008, it is also 661. Yet the overall level of proficiency has increased by 3 percentage points, from 68% to 71%. How is this possible?

When we measure student achievement solely based on the proportion of students who have jumped over a bar, we can end up with a pretty misleading picture of student performance.

Take a look at grades 3, 5, and 8 in the graph below, which shows the change in ELA average scale scores and passing rates for New York state. In each case, the average scale score increased by 2 points, or about .05 standard deviations. But the increases in the percentage of students who were proficient varied widely across those grades. In 3rd grade, there was an increase of 3 percentage points. In 5th grade, there was a much larger increase - 9.5 percentage points. And in 8th grade, though the average scale score increased, the percentage of students who were proficient actually decreased .9 percentage points.

Should we conclude that our 5th graders are much better off than they've been in the past, and 8th graders are falling behind? Definitely not - 5th grade just happened to hit the sweet spot of the distribution - but that's what you'd get if you relied only on passing rates.

2008%20ELA%20Graph%282%29.jpg

In short, know what you're buying when you're looking at passing rates. They can increase substantially by moving a small number of kids up a few points - just enough to clear the cut score. In some of the grade levels above, there are good reasons to suspect that these small moves may partially explain large jumps in proficiency on the New York State ELA test.
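A toy example makes the puzzle concrete. The sketch below is purely illustrative: the ten scores in each list and the cut score of 650 are hypothetical, picked only so that both years average 661 (the state average quoted above). It shows how nudging a couple of "bubble" students over the cut can raise the passing rate while the average does not move.

```python
from statistics import mean

CUT_SCORE = 650  # hypothetical proficiency cut score

# Hypothetical scale scores for ten students; both lists average 661.
scores_2007 = [610, 630, 646, 648, 655, 660, 670, 680, 700, 711]
# Two students just below the cut gain 3-4 points and clear it,
# while the top scorer drops 7 points.
scores_2008 = [610, 630, 650, 651, 655, 660, 670, 680, 700, 704]


def passing_rate(scores, cut=CUT_SCORE):
    """Percentage of students scoring at or above the cut score."""
    return 100 * sum(s >= cut for s in scores) / len(scores)


for label, scores in (("2007", scores_2007), ("2008", scores_2008)):
    print(label, mean(scores), passing_rate(scores))
```

In this made-up class the average stays at 661 in both years, yet the passing rate jumps from 60% to 80%, and the gain comes entirely from students who moved a few points across the line.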

June 22, 2008

Our Very Own Disney Movie! The New York State 2008 ELA and Math Results

magic%20kingdom.jpg
I really appreciate the opportunity to join all of you here at Disney World. I can't wait to get over to the Magic Kingdom. I just love cartoon characters; outlandish fairy tales; and wild, stomach-churning roller coaster rides.
-Mayor Bloomberg, Excellence in Action Summit

If you like fairy tales, today is your day. Overnight, the majority of kids in New York City have become proficient readers (up 7 percentage points to 58%) and mathematicians (up 9 percentage points to 74%). Apparently, scores are up even more in Buffalo, Yonkers, and Rochester. Here's Elizabeth Green's article in the NY Sun, Mayor Sees a Test Score Triumph: Or Is It a Case of Inflation of Results? When test scores rise dramatically on one test and are largely flat on the NAEP, we have good reasons to worry that something besides real learning is happening. In this case, it appears that the NY ELA and Math tests were just easier, which drove up scores across the state.

Alas, at the Magic Kingdom, outlandish fairy tales always win the day. Bloomberg is holding a press soiree at P.S. 175 in Harlem this afternoon, and the state is holding its press conference at 11:45. More details to follow...

June 19, 2008

With New Rules for Gifted Programs, NYC's Poor and Minority Students Lose Out

nytmap.jpg
If you'd ever bumped your head up against test score distributions for entering kindergarteners, you already knew that NYC's shift to a uniform cutoff for gifted admissions - the 90th percentile - could only hurt poor and minority kids' access to gifted programs. So many of you were unsurprised in April when I analyzed the new gifted and talented data, and found that poor and minority kids' access to gifted and talented programs had been seriously diminished. (See maps here.)

Kudos to Elissa Gootman and Robert Gebeloff at the New York Times, who pushed the G&T issue out onto center stage this morning (Gifted Programs in the City are Less Diverse):

An analysis by The New York Times shows that under the new policy, children from the city’s poorest districts were offered a smaller percentage than last year of the entry-grade gifted slots in elementary schools. Children in the city’s wealthiest districts captured a greater share of the slots.

Considered alongside Fordham's report on high achieving students and Stanford prof Sean Reardon's finding that the black-white gap grows faster among the highest achieving students, these losses in G&T seats should not be taken lightly. Because of NYC's stark residential segregation, high achieving minority students are more likely to attend schools populated by low-achieving students than are high achieving white students. Robert Pondiscio has done a great job educating us about how this unfolds in New York City classrooms, "The 'not your problem' kids walk in smart and walk out smart, largely by accident of birth. While they’re in school, they are nearly completely neglected, and as a result achieve not nearly as much as they would have (while still testing at or above grade level on dumbed-down state tests) had they not been starved for oxygen in an underperforming school, where they were constantly praised for being bright, but had few demands placed upon them, and where opportunities for enrichment, in or out of school, were non-existent."

Let's hope that those concerned with "educational equity" revise the admissions policy for next year. Here's what I'd like to see: If we want to increase access to advanced instruction for disadvantaged kids who are more advanced than their peers, we might consider offering gifted slots to the top 5% of students in each community school district, while also guaranteeing a seat for any student who scores in the 90th percentile or above of the national distribution. This is analogous to states' top 4% (California) or top 10% (Texas) plans for college admissions, which guarantee college admission to students who have excelled in their own high schools. Thoughts?

June 9, 2008

ATRs Continued: The UFT's Policy Recommendations

At the end of last week, the UFT responded to the New Teacher Project report on ATRs in NYC. (If you missed the backstory, see Why You Should Read the Fine Print in the New Teacher Project Report, Why Buy the Teacher When You Can Have the Teaching for Free?, Tim Daly on the New Teacher Project report, and Joel Klein Blames Teachers for $4 Gas, Subprime Crisis).

Though the NYT article was pretty vague, the UFT actually made six policy recommendations:

1. The DOE should take a more pro-active role in placing ATRs, as the contract requires, by sending ATRs for the first interviews for open positions, before other candidates—new hires or transfers—are considered. Successfully placing more ATRs would avoid the unnecessary costs of hiring and mentoring more new teachers and maintaining a large ATR pool when the talent already exists in the system to staff vacancies.

2. Make teacher hiring selections financially neutral. The FSF budget replaced a longstanding system in which schools were fully funded for their teachers. Schools considered only an educator’s qualifications and “fit” for a position at the school, with no incentive to hire the cheapest candidate. Such a neutral system is fairer all around.

3. As an incentive, DOE could, for a specified period of time, cover the cost of ATRs who are permanently hired in a school.

4. Implement the contract provision that permits the union and DOE to negotiate a buyout to any remaining excessed teachers. Any additional cost would be offset by savings for the school administration.

5. Let the experience and expertise of ATRs be known to principals rather than maligning them, thus encouraging their hiring.

6. Offer a coaching and skills training program to ATRs who wish to enhance their marketability.

These recommendations sound pretty reasonable to me, and I see no retreat on mutual consent here. I can't say enough times that creating an incentive to hire the cheapest candidates was one of the poorest policy choices the NYC DOE has made. For similar reasons (the problem of creating different price incentives across candidates), I'm not crazy about #3 - but the real action above is in reforming "Fair Student Funding" and negotiating a buyout.

And to eduwonk's point about the dispute over how many ATRs are performing the duties of full-time classroom teachers: student schedules and report cards/transcripts are a good place to start looking. If you're responsible for evaluating students for more than a marking period, you are their regular teacher.

Should Kids Protest? The Case of New York City's Budget Cuts

kpp2.jpg
No one expected that Graeme Frost, a 12-year-old who suffered brain stem injuries in a car accident, would become a political target after he delivered a late September radio address in support of the State Children's Health Insurance Program. Commentators demurred that if a political party "send[s] a boy to do a man’s job, then the boy is fair game." The episode raised difficult questions over the role of children in political debate. Are they mini-protesters, learning the ropes of democracy, or simply political pawns?

New York City is likely to encounter these thorny questions this week, as multiple flights of public schoolchildren are slated to protest at Tweed Courthouse under the auspices of the Kids Protest Project. Truth be told, my own view on kids' budget protests is strongly shaped by my own participation in such protests as a kid. When we were in elementary school, we wrote letters. When we were a little older, a few of us piped up at budget hearings. And when we were in high school, we organized a hundred teenagers to fill rooms at Board meetings, and scared the bejesus out of the Board in the process.

We didn't understand the larger issues, but we were advocating for our short-term interests. Wouldn't each of us be marginally better off if our school had more dough? All of these experiences were formative in my attitudes towards political engagement, and I look back on them fondly - which is all a long way of acknowledging that I'm the wrong person to offer nuanced analysis on kids protesting budget cuts.

So I'll leave it to you to tease this one out - isn't this just like the Regents using a one-sided prompt on Teach for America? Or is it different because students' short-term interests are served by garnering more funds? You can read letters to Chancellor Klein from kids at PS 87 below.

Dear Chancellor Klein,

My name is Danny, and I am a student at P.S. 87. My brother is coming into this school next year. It will be my 5th year next year in this school.

The purpose of this letter is to stop you from taking money from the schools. Would you like it if you were in third grade and the chancellor was going to take money from your school? I hope this letter will change your mind.

Sincerely,
Danny

***

Dear Chancellor Klein,

Hello. My name is James. I go to P.S. 87. I just heard that you're cutting the school's budget. I don't want to be mean, but next year I'm going to be in the 4th grade, and I want it to be even better than last year, but it won't be if you lower my and other schools' budgets. There won't be enough money for books (and I'm crazy over books) and chairs for sitting on, and pencils to last us through September to June, and a lot more reasons that I don't want to talk about! So, please, stop cutting my and other schools' budgets, so that I will have a wonderful 4th grade.

Sincerely,
James

***

Dear Chancellor Klein,

Hi, my name is Nicole. I'm a student in P.S. 87. It is a public school. Please do not cut anyone from the school, and please don't take money from the school.

Love,
Nicole

June 6, 2008

What Do Public Servants Owe the Public When They Make Mistakes?

Imagine that you are a public servant. This year, you've left families in the lurch by centralizing an enrollment system that you lacked the organizational capacity to run effectively. It is June, and kids and families are still in the dark about their middle and pre-school placements for September. How should you react?

a) You should issue a heartfelt apology, explaining that you've made a serious mistake, that you take full responsibility for the mistake, and that you understand how terribly you've inconvenienced the families you serve. In addition, you should explain how you will be sure this doesn't happen again.

b) A press spokesperson for the organization should say, "It's simply not correct to say that we're running way behind."

c) The person running the office that made the mistake should say, "I know that there are parents who are upset that they haven’t gotten a letter yet. Rest assured they will by the end of the week, and we have committed to parents we will work to get this done earlier next year.”

d) Both B and C

If you answered d), you should look into a position in the press office at the New York City Department of Education. It's a growth industry, and I heard the pay's alright.

Isn't It Ironic? Bonuses for NYC Administrators at 4 "F" and 5 "D" Schools

chimp_scratching_head.jpg
Administrators at four New York City schools that received F’s on their Progress Reports, and five that earned D’s, are eligible for bonuses, which range from $5,500 to $15,000 for principals and from $2,750 to $7,500 for assistant principals. One of my favorite haiku pretty much sums up this story:

Amateur Night's spozed
to be at the Apollo
not at Tweed courthouse

-Anonymous 7:50 AM

June 5, 2008

Move Over Grey's Anatomy! Thursday Night TV With Joel Klein

McDreamy? McSteamy? You decide.

June 4, 2008

Why Has the Education Press Missed the Boat? The Case of Small Schools

missed-the-boat.jpg
With the release of Scott McClellan's tell-all, everyone's been asking whether the press did its due diligence on the Iraq war. Closer to home, last week's Newsweek article provides similar occasion for us to reflect on the press coverage of small schools over the last six years.

Let me first throw in my prejudices about small schools - I like them. I followed the first wave of small schools that opened in the 1990s, and was thrilled when the Gates Foundation put up millions of dollars for the second wave. And I am willing to believe that students will be more attached to school in smaller schools.

All that said, what should we make of the endless parade of glowing stories about how much better small schools are doing than their predecessors? If any of these reporters had perused the basic stats, they would have uncovered that these schools are not serving the same population. (Needless to say, in my excitement about the Gates Foundation's grant back in 2003, I did not anticipate that small schools would have the effect of clearing out the old students, replacing them with higher achieving ones, and pushing the leftover students into increasingly crowded large schools.)

Over the course of the year, I've made tables comparing the new and old populations at three different NYC high schools that have been converted into small schools: Evander Childs High School in the Bronx, Bushwick High School in Brooklyn, and now Morris High School in the Bronx, the subject of the Newsweek article. As stunning as the differences between the old large and new small school populations is the fact that few reporters covering small schools (save Sam Freedman, who sadly wrote his last column this morning) have bothered to ask if these populations were different, and if so, why.

Why has the press missed the boat? I'm not sure. Here are some ideas:

* Math is Hard: Reporters are trained to write and report, not to analyze data. It's unsurprising that they've avoided the city's statistical treasure troves. But that answer is unsatisfying to me - these are all bright people.

* Positive Story Starvation: Jay Mathews offers a different answer in reflecting on reporting about KIPP, "I understand why we education reporters try to make KIPP sound like more than it is. We are starved for good news about low-income schools. KIPP is an encouraging story, so we are tempted to gush rather than report. We don't ask all the questions we should." Maybe this explains some of the puff pieces, but still falls short of a full explanation.

* All City Kids are the Same: Perhaps the problem runs deeper than training and optimism. Too many people assume that because the kids in the old school were black and brown and poor, and those in the new school are as well, they must be the same.

* Everyone Loves Individualization: skoolboy weighs in with this thought: "The small school model is so appealing because it taps into a variety of modern narratives. Small schools are personal, provide more customized (i.e., middle-class) educations, and therefore can compensate for the breakdown of families and other social institutions in central cities. In this view, whoever is served by these schools is better off than they were before, and those who were in the schools before just get ignored."

* Power and Money Talk: Small schools are backed by big foundations. Money buys, and helps to influence, evaluations conducted by firms that are contract dependent. Money also buys PR - and a lot of money buys the best PR money can buy.

Any thoughts?

The table below shows the characteristics of the entering 9th graders at Morris High School before small schools started opening there, and the characteristics of 9th graders at the new small schools: the School for Excellence, the High School for Violin and Dance, Bronx Leadership Academy, Bronx International High School and the Morris Academy for Collaborative Studies. Particularly notable are the lower concentrations of full-time special education students, students qualifying for free lunch, students who were below grade level in reading and math, and English Language Learners (with the exception of Bronx International, which is a school specifically for ELL students). If you click over to the links above, you'll see this was also the case with Evander Childs and Bushwick High School.

Characteristics of Entering 9th Graders, Morris High School and New Small Schools

TABLE.JPG

June 3, 2008

How Much Would Paying Kids for Test Scores Cost?

In the midst of this budget debacle, along comes an estimate of the cost of NYC's student incentive program at full scale - i.e. if all students in grades 4-7 were eligible to receive up to $500 per year. Even a 50% success rate would cost a cool $90 million - not far off from the $99 million in budget cuts that will be distributed to New York City schools unless the city ponies up.

tcrecord_table.jpg

June 1, 2008

All Purpose Equity!

Klein-Clean.jpg
Everyone loves equity - the US Department of Education, the New York City Department of Education, insert your hometown Department of Education here. If you've got a shaky initiative in mind, best to back it up with the equity line.

Certainly that's the strategy Joel Klein has used in New York City. Want to change the admissions process for gifted and talented programs? It's about equity! (Even when doing so shuts out poor kids.) Want to close down comprehensive high schools? It's about equity! (Even if the most disadvantaged kids can't access those new small schools.) Want to use dollars that the state legislature specifically earmarked for the most disadvantaged kids to plug holes in a budget you cut yourself? It's about equity! (Even if the number of central employees has increased by 18% since 2004 - a jobs program for the Ivy League.)

When it comes to school funding, what does it mean to treat students "equitably?" Does equity imply treating each student the same by providing each student the same level of funding? Or does equity require a recognition that students bring different levels of disadvantage to school, and as a result, disadvantaged students must be treated differently in order to be treated equitably?

In 2007, when the city was bulldozing through its "Fair Student Funding" program, NYC Chancellor Joel Klein argued that educational equity required differential treatment. Poor students face formidable obstacles to school success, Klein explained, and the allocation of tax-levied funds in New York City should reflect that reality.

It was also in this compensatory spirit that the remedy emerging from New York's adequacy suit - now known as the Contracts for Excellence - was designed. Three rules were applied to these funds. First, these funds must be spent on six program areas, including class size reduction, time on task, teacher and principal quality initiatives, full day pre-kindergarten, middle and high school restructuring, and model programs for English Language Learners. Second, these funds must be spent on those students with the greatest educational needs. Finally - and most relevant to this budget debate - these funds must be used to supplement, not supplant, the city's school funding allocations. The idea is that these dollars represent additional investments in New York City's most disadvantaged children.

This budget cycle, fairness and equity, according to Joel Klein, require universalism - specifically, a universal budget cut - not differential treatment. The city has cut tax-levied funds to all schools, which will be offset by Contracts for Excellence funds for the neediest schools. But the city's more advantaged schools are facing substantial cuts because they won't receive more state money. Joel Klein is now arguing that an "equitable" solution to this budget problem is for the state to release the restrictions on these Contracts for Excellence funds so that all schools will take a 1.4% cut. And he claims, with remarkable chutzpah, that it is the state's fault, not the city's fault for cutting budgets in the face of a projected $4.5 billion budget surplus, that some schools will suffer more than others.

Ultimately, if equity can be called upon to support any action - even those that nakedly reallocate dollars set aside to serve the city's most disadvantaged students - then equity means nothing at all.

May 28, 2008

New Verbs to Describe City Council Hearings: Hissing, Spanking, Chasing

Hissing%20Cobra%20Small%20.MA183.jpg
Here's a round-up of yesterday's budget hearings: Chancellor Talks of Cuts for Schools, Amid Hissing (NYT), City Council Spanks Chancellor Klein Over School Aid Cuts (Daily News), School Budget Showdown (Gotham Gazette), and Rollback Set in Schooling of the Gifted (NY Sun). (Sidenote on City Council hearings: one Columbia Law School reader reports that the footage of The Great Liebman Chase of 2007 made rounds in his Criminal Law course.)

We still have scant details on the "$200 million in central cuts." As of this morning, David Cantor at the NYC Department of Education has not responded to a request for an itemized list of these "central cuts." Here's the best detail I've got (from Klein's powerpoint at yesterday's City Council hearing):

* $21 million from Central. For example,
- Reducing 80 positions (3% of central headcount)
- Reducing NYC Teaching Fellows tuition and stipends
- Reducing IT consultants

* $7 million from Field Offices. For example,
- Reducing 101 positions (4% of field staff)
- Reducing Integrated Service Center staff

* $30 million from Support Services. For example,
- Reducing custodial funding

* $120M from Operational efficiencies and savings. For example,
- Identifying purchasing efficiencies in buying things like trade books
- Reducing amounts of accrual budget at the end of the fiscal year

* $23M from Reduced Program expenditures. For example,
- Reducing Quality Review expenses by transitioning to in-house reviewers and moving to every 3 years for A/WD schools and every 2 years for B/P or higher schools
- Reducing number of periodic assessments in ELA and Math from 5 to 4 a year and shifting to more Web-based professional development
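
As a quick check on how the itemized categories above relate to the headline "$200 million," the figures can be added up. The sketch below is illustrative only: the dictionary and variable names are hypothetical, and the dollar amounts are the ones quoted from the slide.

```python
# Itemized "central cuts" as quoted above, in millions of dollars.
# The dictionary layout and names are illustrative, not from the DOE.
central_cuts = {
    "Central": 21,
    "Field Offices": 7,
    "Support Services": 30,
    "Operational efficiencies and savings": 120,
    "Reduced Program expenditures": 23,
}

# Sum of all five categories.
total = sum(central_cuts.values())

# Each category's share of the itemized total.
shares = {name: amount / total for name, amount in central_cuts.items()}

print(f"Total itemized: ${total} million")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the total")
```

The five categories sum to $201 million, and the least specific one, "Operational efficiencies and savings," accounts for well over half of it.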

Do you have other ideas for central cuts? Could we spare a benchmark test? The Leadership Academy? Return the level of central staffing back to its 2004 level (a layoff of 366 Tweed staff)?

May 27, 2008

In NYC, Tis the Season for Sacrifice

breadline.jpg
A few weeks ago, a solemn President Bush revealed that he honors our soldiers' sacrifice by abstaining from golf. "I think playing golf during a war just sends the wrong signal," he explained.

It was in this spirit that Chancellor Joel Klein appeared before the City Council this morning. Klein dedicated his presentation to the heroic central cuts endured by his bureaucracy. While salty tears welled up in my eyes, I noticed that one slide was missing. Paragons of restraint that they are, the New York City Department of Education has only increased central staffing levels by 18% over the last three years. In October 2004, there were 1984 central staff. By February 2008 there were only 2350.

nyc_growth.jpg

Some administrative divisions of Tweed, though, are hurting more than others. Please stand while I salute these departmental role models:

* In October 2004, the Department of Assessment and Accountability had 19 staff. In February 2008, they had 80 - that's only a 321% increase.

* In October 2004, the Division of Human Resources had only 235 staff. In February 2008, they had 370 - a 57% increase.

* In October 2004, the Office of New Schools had 14 staff. In February 2008, its spawn, the Office of Portfolio Development, had 36 - a 157% increase. (See 2005 and 2008 data for all headcount figures; these are central staff paid for with tax-levied funds.)
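
For readers who want to check the percentages, here is a minimal sketch of the arithmetic. The function name `pct_increase` and the dictionary layout are hypothetical; the headcounts are the ones quoted in this post.

```python
def pct_increase(old, new):
    """Percent change from old to new, relative to old."""
    return (new - old) / old * 100


# Headcounts quoted above: (October 2004, February 2008).
headcounts = {
    "All central staff": (1984, 2350),
    "Assessment and Accountability": (19, 80),
    "Human Resources": (235, 370),
    "New Schools / Portfolio Development": (14, 36),
}

# Percent increase for each division, rounded to the nearest whole percent.
increases = {
    name: round(pct_increase(old, new))
    for name, (old, new) in headcounts.items()
}

for name, pct in increases.items():
    print(f"{name}: {pct}% increase")
```

Rounded to the nearest whole percent, this reproduces the 18%, 321%, 57%, and 157% figures cited above.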

As New York City schools face budget cuts of up to 6%, New York City parents and kids are grateful to the Department of Education for making the sacrifices necessary to send us the right signal.

May 19, 2008

Should State Tests Require Students to Advocate for Specific Education Policies?: NY's ELA Test on Teach for America

kopp.jpg
A Voice Cries Out reports that this year's high school ELA retest required students to complete the following task:

Today’s ‘situation’ told students that they were in a leadership team who has been debating ‘whether leaders should have experience in their chosen fields.’ They were instructed to write ‘a position paper in which you argue that inexperienced people can provide leadership.’

They weren’t even given a choice about which position to take.

They then had to listen to a speech by-you guessed it-Wendy Kopp, about why she started Teach For America. In the speech, Kopp talks about how her lack of experience served to her advantage when creating Teach For America. In the speech she explains that TFA teachers, “challenge the conventional wisdom” that schools are limited in what they can do to ‘overcome the challenges of poverty and the lack of student motivation and parental involvement that is perceived.”


This takes us back to the social justice debate, in which we discussed how schools should and shouldn't deal with contentious issues in the classroom.

So is this fair game for a state test? Those who have pointed out schools' trespasses on social and political issues are generally cool with TFA, but what if the prompt instead instructed students to argue that schools need more funds to be effective, or that unions have a positive impact on public education? I reckon that some ed wonk/wonkettes' heads would explode.

To me, the problem with this question is that it didn't offer a counter-position, nor did it allow students to choose a side to argue.

School Closings and Teacher Salaries in New York City: There's Something For Everyone Here

Last fall, the New York City Department of Education graded each of its schools on an A-F scale. Schools were warned that those with Fs – there were 49 altogether - faced closure. Shortly thereafter, the New York City Department of Education announced its intention to close 14 schools. Somewhat perplexing was that 6 of these schools had earned Ds on their progress reports. Why would the Department of Education, we wondered, close D schools before F schools if it believed in its own Progress Report system?

Theories abounded. A widely circulated explanation reasoned that Klein et al. were hell-bent on rooting out experienced – and thus expensive – teachers. If this were the case, closing schools should have higher average salaries than other D and F schools that are not closing.

The tables below, derived from school-based average teacher salaries, do not suggest that schools with higher average salaries are more likely to be shuttered. Below, I’ve cut the data three ways – first looking at closing and non-closing salaries for both D and F schools, and then breaking these data out separately for D and F schools. One exception is among F high schools, where the school that is closing has the highest average teacher salary. There are 7 other F high schools with salaries ranging from $57,289 to $69,154; Canarsie High School has an average teacher salary of $72,370.

But there’s something in this post for everybody. Many have speculated that the district is closing large high schools and replacing them with smaller ones, in part, to drive out experienced teachers.

What’s clear is that smaller high schools have substantially lower average teacher salaries than the larger high schools that they’re replacing. (See the graph below.) Schools with fewer than 400 students have average salaries of $61,293, while those with more than 3000 students have average salaries of $71,296. While we can’t confirm the district’s intent, the effect of closing down large high schools has been to replace experienced teachers with inexperienced ones.

High%20school%20salary%20by%20size.jpg

Salary.jpg

* Note: The data I presented on Friday come directly from schools' Galaxy budgets; these data are from an aggregate Fair Student Funding file, and thus the salaries reported for individual schools are not identical.

May 16, 2008

Teacher Salaries, ATRs, and Closing Schools: A Preview

I've got NYC's school-level teacher salary data fired up, and will write a few posts using these data next week. Here's a preview. New York City is slated to close 14 schools this year, though many will not close immediately, but will phase out over the coming years. Per the whole "Absent Teacher Reserve" (ATR) debate (here, here, here, and here), how many teachers are employed at these schools, and what are their average salaries?

These schools employ a total of 822 teachers, and a number of these schools have relatively high average salaries. Given current budgeting rules, through which schools are allocated dollars rather than positions, what's the chance a principal will, all else equal, hire an excessed teacher from Franklin K. Lane who makes $80,000 when he can hire a teacher with 3 years of experience for about $46,000? (See the teacher salary scale here.)

If you've got questions that you'd like to see answered using the teacher salary data, please leave me ideas below.

closing-schools-salary.jpg

May 14, 2008

Unsolved Mysteries: The Joel Klein Budget Edition

Holmes-Image-Loupe.jpg
Imagine that you had 8 staff that cost a total of $904,636. Next year, you will also have 8 staff, but they are only budgeted at $1117, for a mean salary of $139.63. (See p. 446.) That's the deal with Joel Klein's staff - his 8 staff stay, but they are working for sweatshop wages.
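
The per-person figures follow from simple division. The sketch below is only an illustration: the helper `mean_salary` is a hypothetical name, and it uses Python's `decimal` module with round-half-up because 1117 / 8 is exactly 139.625, a tie that ordinary float formatting would not round up to the $139.63 quoted above.

```python
from decimal import Decimal, ROUND_HALF_UP


def mean_salary(total_dollars, staff):
    """Mean salary in dollars, rounded half-up to the nearest cent."""
    mean = Decimal(total_dollars) / Decimal(staff)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


STAFF = 8  # headcount quoted above, unchanged across both years

this_year = mean_salary(904_636, STAFF)  # total cost quoted for this year
next_year = mean_salary(1_117, STAFF)    # total budgeted for next year

print(f"This year: ${this_year:,} per person")
print(f"Next year: ${next_year:,} per person")
```

On the quoted totals, the mean drops from about $113,000 per person to $139.63.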

Hmmmm - if I wanted to make the central Department of Education budget appear smaller than it really is, might I make these monies reappear after public scrutiny of the budget subsided? I'm just saying.

If I am missing an alternate explanation (i.e. maybe budgets are meaningless after all?), please let me know.

page446.jpg


Update: Thanks to Dave Bellel for sending along this page. Click to enlarge.

What Can $7,789,623 Buy in New York City?

price%20is%20right.jpg
A) 3,894,812 subway rides
B) 15,579 pairs of Prada heels
C) 1812 hours with the Emperors VIP Club
D) 315 years of education at the Brearley School
E) 18 staff for the New York City Department of Education's Division of Assessment and Accountability

On page 446 of New York City's FY09 budget, we learn that the Division of Assessment and Accountability is budgeted at $8,287,282. $7,789,623 will buy you 18 staff - that's $432,757 per person!
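
The same division applies to every item on the list. As a rough sketch, the snippet below divides the $7,789,623 figure by each quantity quoted above to get the implied cost per unit. The variable names are hypothetical, and the unit prices are derived from the list rather than quoted in the budget.

```python
BUDGET = 7_789_623  # dollar figure quoted above

# Quantities from the list above; unit prices are implied, not quoted.
quantities = {
    "subway rides": 3_894_812,
    "pairs of Prada heels": 15_579,
    "hours with the Emperors VIP Club": 1_812,
    "years at the Brearley School": 315,
    "Assessment and Accountability staff": 18,
}

# Implied cost per unit for each item.
unit_cost = {item: BUDGET / qty for item, qty in quantities.items()}

for item, cost in unit_cost.items():
    print(f"{item}: ${cost:,.2f} each")
```

The last entry works out to roughly $432,757 per staff member, the figure cited above.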

The irony of NYC's selective attention to budgeting issues? Priceless.

Update: NYC Parents dishes the goods on Bloomberg's $4.5 million slush fund, which he used to reward city council members.

Update II: NYC DOE's Press Secretary David Cantor posted the following correction to the NYC education news listserve:

The actual headcount in the DOE's Office of Accountability is 79, not 18. The actual budget for salaries is $6.7 million, not $8.3 million. The correct headcount figure is reflected in the most recent DOE Financial Status Report, located online here on page 13. The correct budget figure will appear in the next iteration of the City budget. Sorry for the confusion.

As skoolboy said, "The fact that the people responsible for assessment can't put together an intelligible budget should give us great confidence about the assessment data they report."

May 12, 2008

Watch Out, Elizabeth Green, Erin Einhorn, & Jenny Medina!!!

NYC education reporters take note. Straying from his Code Blue demeanor, Mayor Mike proves that he will devour you (without checking your calories) if you accuse him of "maintaining" anything - about NYC schools or otherwise. That's a shame, because this has been a blockbuster school year for "maintaining" in NYC. (Greatest hits: here, here, and here.)

In the clip below, a Newsday reporter says, “Mayor, you maintain that..." Bloomberg cuts him off with: “Maintain is a word that I don’t think is appropriate, sir. The next time you have a question and want to insinuate that I lie, just talk to the press secretary. I don’t think we have a question for you.” (Via the Daily Intel.)

May 7, 2008

Joel Klein Blames Idle Teachers for $4 Gas, Subprime Crisis

joel_klein.jpg
Forget Secretary of Education - this guy should be running the Fed. This morning, the Daily News reported that "Schools Chancellor Joel Klein said the teachers union - and policies that keep instructors from their classrooms - bear some of the blame for next school year's budget cuts."

You've got to give the man props for having the cojones to craft a budgeting rule that creates disincentives to hire teachers from closing schools on Monday, spend $80 million on a data warehousing system that doesn't work on Tuesday, hire a legion of PR and executive staff at McKinsey prices on Wednesday, pay for British quality review evaluators to fly across the pond on Thursday, and on Friday, blame the freaking teachers union for his lack of fiscal discipline and America's economic downturn. Those are epic cojones, really.

So if we could get back to the real issues - I'd like to know the answers to these questions:

1) What percentage of ATRs are carrying full loads but haven't been formally hired? Now that the UFT has established that many ATRs are serving as regular teachers, a third party needs to formally study this question. I do wonder why these data weren't collected and analyzed as part of the original report.

2) How do budgeting rules affect experienced teachers' odds of being hired? Yesterday, Daly clarified that some excessed teachers are on local budgets (34% of the 2006), but there are good reasons to believe that it's the younger teachers who are on local budgets. As I understand it, here's the budgeting rule: If the teacher comes from a closing school, the ATR goes on central payroll. If a school is simply deciding that it wants to close down one of its programs, or its student enrollment goes through the ordinary dips, the ATR remains on the school's budget.

In a comment, Daly reported that senior ATRs are more likely to come from closing schools - it follows, then, that experienced teachers are more likely to be on the central budget. If experienced teachers are more likely to be centrally financed, this may explain, in part, why they are more likely to remain in the ATR pool.

If, as the TNTP report said, we need a solution "that recognizes the value, commitment and service of New York City’s teachers," we first need to understand why experienced teachers are more likely to remain in the ATR pool. More hard numbers on these issues would be a good start.

Image credit: Gotham Gazette

May 5, 2008

Guest Blogger Tim Daly on The New Teacher Project's Report

timdaly.jpg
Tim Daly is the President of The New Teacher Project and the lead author of "Mutual Benefits."

Over the past several days, representatives of the United Federation of Teachers (UFT) and others have sought to challenge specific findings of “Mutual Benefits,” our recently released study on New York City’s school staffing policies. We appreciate the UFT’s engagement in this dialogue and welcome their participation.

The New Teacher Project (TNTP) researched and released “Mutual Benefits” with the goal of sparking a substantive, data-driven policy debate from which better policies would emerge. We are glad to see this debate taking shape and remain optimistic that it will lead to reforms that better serve New York City students.

As our paper indicates, the current policy on teachers in the Absent Teacher Reserve (ATRs) is flawed in four fundamental ways:

1. Teachers in the ATR have no incentive to search for positions aggressively and no requirement to apply for positions
2. Teachers have earned and will continue to earn tenure while serving in the ATR
3. There is no limit to the amount of time teachers may serve in the ATR, earning full salary and benefits regardless of their placement status
4. The ATR includes a higher concentration of teachers with documented performance problems than the overall teacher population, and that concentration is growing over time

It is important to note that our assessment of these flaws in the current policy has not, to our knowledge, been rebutted or addressed by any criticism of the paper to date. We stand by these findings and continue to believe that, if unaddressed, the stresses that these flaws put on the school system will inevitably undermine the fair, open and efficient staffing process now in place in New York City.

Though the arguments by the UFT and others against our findings and recommendations have not centered on these core issues to date, many of them mischaracterize our research and threaten to distract everyone involved from the real issues at hand. Below we respond to each of the primary arguments leveled against our report, as discussed primarily in posts on the UFT’s official blog, EdWize.org, and on Eduwonkette.com. We have asked both sites to post this response as part of the larger discussion.

One-third of ATRs are teaching “regular programs” on a full-time basis.

This assertion is inaccurate and misleading for several reasons, including:

1) It wrongly includes guidance counselors

The UFT estimates that 200 or more individuals in the ATR are, “teaching full programs, with regularly scheduled classes, just as they had done when they were regular assigned to schools.” However, the UFT includes not only teachers but also guidance counselors in this figure. Our report does not include data on guidance counselors or address their hiring patterns at any point. Guidance counselors should therefore be excluded from this calculation. Data from New York City’s payroll system appear to indicate that approximately 85 guidance counselors remained in excess as of April 2007.

2) It includes District 79 teachers, whose excessing and hiring processes were anomalous

In his posting on EdWize.org, Leo Casey of the UFT claims that 270 of the 665 teachers in the ATR are from District 79 alternative schools. Neither figure is correct. According to the NYCDOE’s payroll system, 123 teachers from District 79 schools were in the ATR as of December 2007. These teachers were not included in the 665 figure or our study in general because District 79 underwent a substantial and atypical restructuring in 2007 that led to many teachers changing schools. The rules governing the hiring process for these teachers differed from those for other excessed teachers.

For this reason, TNTP did not include 2007 excessed teachers from District 79 schools in its analysis; it would have been misleading to consider them along with other teachers whose excess process was quite different and far more typical of the city’s normal hiring process. If the UFT believes that the restructuring process for alternative schools should have happened differently, that is a worthy debate – but it is quite separate from this one.

Even so, District 79 teachers fared very well in obtaining new placements. Overall, only 24 percent of teachers excessed from District 79 in 2007 still had not found a new position by December—lower than the unselected rate for teachers who were not from District 79 schools.

3) It is based on an unreliable data source

Last, the UFT’s data is of questionable quality and requires more scrutiny and explanation. A teacher’s report of working a full class schedule is not enough to conclude that the teacher is actually filling a full-time, permanent vacancy. Self-reported data is vulnerable to a host of inaccuracies. For example, the teacher could be substituting for a teacher who is on long-term leave but who will return. Verification of the UFT’s claim would require communication with the building principal and an examination of the course allocation for each school. It would require knowing whether the only factor preventing principals from placing ATRs into permanent positions is the budget issue raised by the UFT, or whether they are assigning them to classes merely because they have been instructed to do this as the best way to accommodate ATRs who are housed in their buildings.

It is entirely possible that some teachers in the ATR are effectively teaching on a full-time basis. Indeed, as we have noted before, it is difficult to know exactly how principals are putting these teachers to use. In instances where a reserve pool teacher truly is filling a permanent position, we believe that teacher should be formally appointed to the position. That is a reasonable and fair outcome. Limiting the amount of time a teacher may serve in the reserve pool, as we recommend, may in fact provide an incentive for principals to appoint these teachers to positions formally (or risk losing them).

Continue reading "Guest Blogger Tim Daly on The New Teacher Project's Report" »

Why Buy the Teacher When You Can Have the Teaching for Free?

free_250x251.jpg
New Yorkers love themselves some incentives. We have incentives for students to do well on tests and incentives for parents to take their kids to the doctor. Now that we can't enjoy a meal without contemplating its caloric content, we have guilt-based incentives to eat Pinkberry yogurt instead of Beard Papa's cream puffs. Last week, the New Teacher Project argued that teachers in the "Absent Teacher Reserve" have no incentive to get a job. This morning, it's clear that, in many cases, principals have no incentive to hire them.

On Friday, I showed that experienced teachers are more likely to remain in the Absent Teacher Reserve, and asked what role financial incentives might play in producing this outcome. Of teachers excessed in 2006, only 22% had 13+ years of experience. Of the 2006 teachers who remained unplaced as of December 2007, 42% had 13+ years of experience. Under Fair Student Funding, which allocates dollars rather than positions to principals, a rational principal would choose a $40,000 teacher over a $75,000 one, all else equal. But FSF didn't come online until 2007, and thus can't account for this pattern.

But a more basic incentive problem predates Fair Student Funding - ATR teachers are off-budget. Imagine that you're a principal and through the ATR pool, you've identified a teacher with 20 years of experience that you'd like to have on board. You can give the teacher a full-time class and acquire him at no cost, or you can shell out a pile of money. In the former scenario, the teacher is happy (he's getting paid a full salary and has no reason to leave) and the principal is happy - he's scored a free teacher.

This morning, Elizabeth Green reported that 29% of the absent teacher reserve pool (194 of the 665 teachers) are teaching full courseloads. Edwize provides a list of schools in which teachers have full-time positions and more details. While we're kvetching about aligning incentives, we should get them aligned for principals, too.

May 2, 2008

Why You Should Read the Fine Print in the New Teacher Project Report

fine-print-shadow.jpg
From the coverage of the New Teacher Project's report, "Mutual Benefits: New York City’s Shift to Mutual Consent in Teacher Hiring," you'd think that the 235 teachers excessed in 2006 and remaining in the "absent teacher reserve" in December 2007 are the worst of NYC's worst teachers. Consider the National Council on Teacher Quality's retelling: "They are also a generally substandard bunch, with a higher rate of unsatisfactory ratings on their personnel records than their more successful peers. For those content to do very little in life, why give up the life of an excessed teacher?" Or, as the NTP's press release put it, "By September 2007, unselected excessed teachers from 2006 were six times as likely to have received a prior “Unsatisfactory” rating as other New York City teachers."

So what percentage of these teachers have never received an Unsatisfactory rating? 81 percent. What percentage of these teachers have received an Unsatisfactory rating more than one time in their careers? Only 6 percent - about 14 teachers. I am not denying that these rates are higher than the NYC teacher population as a whole. They are. But the raw numbers provide much needed context, and we shouldn't have to dig deep in the report to find them.
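For anyone who wants to check the arithmetic, here is a minimal Python sketch that turns those percentages into head counts. The 235 total and the 81 and 6 percent figures are the ones quoted above; the helper name `headcount` and the rounding to whole teachers are assumptions made for illustration.

```python
def headcount(total, percent):
    """Convert a percentage of a group into an approximate head count."""
    return round(total * percent / 100)

# Teachers excessed in 2006 and still unplaced in December 2007 (figure from the post)
unplaced = 235

# Never received an Unsatisfactory rating (81 percent)
never_rated_u = headcount(unplaced, 81)

# Received an Unsatisfactory rating more than once (6 percent)
multiple_u = headcount(unplaced, 6)

print(never_rated_u, multiple_u)
```

Because the report publishes rounded percentages, the counts are approximate.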

The issue of age discrimination in teacher hiring also remains unresolved by this report, despite eduwonk's protest on this point. And there are good reasons to keep a close eye on age discrimination in NYC. With the advent of "Fair Student Funding," principals have strong incentives to hire teachers that cost less. And as the age of principals continues to decline, we might expect that young principals will prefer to supervise younger teachers.

To be sure, the NTP report provides evidence that experienced teachers are somewhat less engaged in the job seeking process than inexperienced teachers. Unfortunately, it doesn't provide enough evidence to convince me that previous teacher ratings and job seeking patterns can fully explain the pattern exhibited in the graph below. The blue bars show the experience levels of the pool of teachers excessed in 2006, while the red bars show the experience levels of teachers who remained unplaced in December 2007.

Because of seniority rules, 44% of teachers excessed in 2006 had 0-3 years experience, while 22% of teachers in this pool had 13+ years of experience. Of the 235 teachers who remained unplaced as of December 2007, only 25% of these teachers had 0-3 years of experience, while 42% had 13+ years of experience. (All numbers are taken from the NTP report, though it wisely never put these two sets of numbers in a figure together.)

NTP%20graph.jpg

My point is not that we should preserve the current staffing rule, or that we should turn back the clock - mutual consent is an important principle. The DOE and UFT need to strike a deal, but first we need to understand the nature of the problem. Framing these teachers as a uniform bunch of incompetent louts does little to advance this understanding.

Update: The New Teacher Project's Tim Daly comments below.

April 17, 2008

The Upper West Side Relief Act of 2008 (Or: More on Gifted Admissions in NYC)

Upper West Side kids face obstacles, folks - sometimes there are two Bugaboo strollers blocking their path to the Elephant Playground at 76th and Riverside. Joel Klein recognized their struggle against adversity, and gently tweaked the gifted and talented admissions rules to open the door of opportunity for all (Manhattan) kids.

Make no mistake - NYC's poorer community school districts lost out under the new gifted and talented admissions process. On Monday, I discussed the change in gifted seats by district, but some readers asked for the overall percentage of kids in each district that are classified as gifted.

Let's look at the numbers for this school year first. We see that some districts, like Brooklyn's District 22 or the Upper West Side's District 3, have very high proportions of students in the entry grade classified as gifted (23.8% and 13.8%, respectively). On the other end, East Harlem's District 4 and the South Bronx's District 7 have no students in the entry grade classified as gifted.

Percentage of Students Classified as Gifted and Talented in Entry Grade, 2007

gt2007b.jpg
I then estimated the percentage of students that will be classified as gifted in the entry grade if all students matriculated in gifted programs. These estimates are necessarily imprecise in two ways - first, because not all students will enroll in NYC gifted programs and thus we will overestimate gifted populations in districts with high private school sending rates, and second, because the true cohort size is not available, so the best we can do is use this year's cohort size as the denominator. Caveats aside, these estimates do offer insight into the effects of the new gifted policy.

What we see in the map below is that Districts 2 and 3 in Manhattan have especially large increases in the proportion of students classified as gifted - from 13.8 to 22.3% in District 3 and from 7.1 to 15.2% in District 2. Hence, the Upper West Side/Manhattan Relief Act of 2008. And as expected, the districts with higher proportions of free lunch kids have fewer kids classified as gifted in both 2007 and 2008, but many of these districts fall further back because of the GT policy change. (See Robert Pondiscio's post for implications.)

Percentage of Students Classified as Gifted and Talented in Entry Grade, 2008

gt2008b.jpg


You can find the full figures for 2007 and 2008 below. Overall, the big winner in entry grade seats is Manhattan, and Brooklyn and the Bronx lost the most.

On behalf of all Manhattan residents, I'd like to thank the Department of Education for helping us pull ourselves up by our bootstraps. It's rough out here!

Entry%20seats%20by%20borough.jpg


Percentage of Students Classified as Gifted and Talented, 2007 and 2008

Gifted%20by%20district.jpg


April 14, 2008

More Signs of the Apocalypse!

Apocalypse-Cow_logo.jpg
Here's my take on the New York tenure law discussion going on around the blogs:

1) The backdoor process was unsavory, and now threatens to displace an important discussion about the limits of value-added measures in New York. Sherman Dorn offers some fertile thoughts on the process issue. Also worth noting that last week's outragists were hardly outraged about the secrecy surrounding NYC's teacher experiment.

2) Critics would do well to separate the likely effects of this law from their unhappiness with the process. Consider Robert Gordon's post, which interprets the law's effects as follows:

This means that in deciding whether to give a teacher a presumptive right to teach for 30+ years, a principal may not consider evidence of whether the teacher is helping students learn. The principal can consider whether the teacher maintains neat bulletin boards, whether the teacher attends meetings on how to pay for pencils, and whether the teacher is sufficiently deferential in the hallway. But the principal may not consider, based on achievement data, whether children are learning.

Do classroom observations provide no "evidence of whether the teacher is helping students learn?" Value-added measures, after all, are simply a proxy for student learning, and observations also provide proxy data on student learning. Gordon assumes that principals cannot identify teachers with especially low value-added in the absence of test score data. But if value-added measures mean anything, very low performers should be getting poor subjective evaluations too. It turns out that principals are actually pretty good at identifying teachers with low value-added based on subjective evaluations (see this post). If a teacher is a consistent low performer, the three admissible forms of evidence in tenure decisions - 1) observations, 2) peer review, and 3) an evaluation of how teachers use data to inform instruction - already provide lots of information about how teachers affect student learning.

3) To my knowledge, no one has provided a viable technical solution to the middle of the year testing issue. Given existing problems with value-added and the added complication of midyear testing dates, it would be wildly irresponsible to put these measures into place in NY without further study.

If you want new reasons (not related to testing dates) to sweat about the fallibility of value-added, check out this paper, which was presented last weekend at AEFA by Tim Sass (in collaboration with RAND's J.R. Lockwood and Dan McCaffrey). They looked at the year-to-year stability of value-added estimates in Florida, and found that it's often the case that teachers who are in the bottom 20% of value-added estimates in one year are not in the bottom 20% the next year. In Broward County, only 41.4% of teachers who were in the bottom 20% in one year were in the bottom 20% the next year, too. In Orange County, only 31.7% of the teachers who were in the bottom 20% in one year were also there the next year!
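To make the stability statistic concrete, here is a rough Python sketch of how a "bottom 20% in both years" share can be computed. It is not the paper's method. The function names are hypothetical, and the sketch assumes the same teachers appear in both years, that there are at least five of them, and that ties are broken by input order.

```python
def bottom_quintile(estimates):
    """Return the set of teachers in the lowest 20% of value-added estimates."""
    ranked = sorted(estimates, key=estimates.get)  # lowest estimate first
    cutoff = len(ranked) // 5                      # size of the bottom fifth
    return set(ranked[:cutoff])

def persistence(year1, year2):
    """Share of year-1 bottom-quintile teachers who are bottom-quintile again in year 2."""
    low1 = bottom_quintile(year1)
    low2 = bottom_quintile(year2)
    return len(low1 & low2) / len(low1)
```

Each argument is a dictionary mapping a teacher identifier to that year's estimate. A result of 0.414 would correspond to the Broward County figure above.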

Update: Robert Gordon cherrypicks a finding from the Jacob and Lefgren paper to make his point. Perhaps if he'd read beyond the abstract and looked at the magnitude of the value-added advantage over principal ratings in predicting future student achievement (a whopping .036 SD in reading and .074 SD in math), he would realize that all is not lost. And again, this minuscule value-added advantage is coming from the middle of the distribution, not the top and bottom - and the bottom is the relevant issue in tenure decisions. From the same paper:

While value-added measures of teacher effectiveness generally do a better job at predicting future student achievement than principal ratings, the two measures do about equally well in identifying the best and worst teachers. With regard to parent satisfaction, we find that a principal’s overall rating of a teacher is a substantially better predictor of future parent requests for that teacher than either the teacher’s experience, education and current compensation or the teacher’s value-added achievement measure.

Moreover, what kind of predictive advantage can we expect inaccurate/noisy value-added estimates to have over principals' evaluations?

With New Gifted and Talented Rules, Who Wins and Loses?

"Today, there’s limited access to gifted and talented education in some districts. The opposite is true in other districts. We want to create universal opportunity—and dramatically increase the numbers of students testing for, and hopefully entering, gifted and talented programs."

-Joel Klein, October 29, 2007 Press Release


This fall, New York City adopted a uniform system for gifted and talented admissions. Educational equity, we were told, was the reason for this reform; New York City has long operated a decentralized network of gifted programs, and the conventional wisdom is that more affluent community districts had more than their fair share of these programs. Tapping into this debate, Joel Klein framed his reform as a mechanism to increase access to poor and minority kids.

Last week, the Department of Education released the number of kids qualifying for gifted and talented programs by community school district (those scoring at or above the 90th percentile on the OLSAT and Bracken School Readiness Assessment qualified). The DOE did not release socioeconomic or demographic breakdowns, but one way to get at the equity question is to look at which districts won and lost under the new system.

Did poor kids gain ground? The graph below, which plots the percent change in the number of students offered gifted seats in the entry grades against the percentage of students qualifying for free lunch in the district, suggests that the answer is no. On average, districts with higher proportions of poor kids saw declines in gifted admissions. Districts above the red line gained seats, while those below the red line lost seats.

Percent%20change%20GT%20Seats%20vs.%20Free%20Lunch.png


Here's a closer look: in Washington Heights' District 6, 80 students are currently enrolled in kindergarten G&T classes, but only 50 have been offered seats next year. In Manhattan's more advantaged District 2, 174 students are enrolled in G&T kindergarten this year, but 371 have been offered seats for next year. District 3, which includes the Upper West Side, saw increases from 192 to 310 students. In District 9 in the South Bronx, the number of seats declined from 37 to 11. (Footnote: It's possible that admitted students in more advantaged districts will enroll in private school at higher rates, so the gains may not be as pronounced as they appear here. With available data, we can only compare the number of students admitted for fall 2008 with those enrolled this year. Also, my free lunch numbers are from the 2005 School Report Cards; please point me to more recent data if you know where to find it!)
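The percent changes behind these four examples can be worked out with a few lines of Python. This is a minimal sketch: the seat counts are the ones quoted above, while the function name and the rounding to one decimal place are assumptions for illustration.

```python
def percent_change(before, after):
    """Percent change from this year's enrollment to next year's offers."""
    return (after - before) / before * 100

# (enrolled in kindergarten G&T this year, offered seats for next year)
seats = {
    "District 6": (80, 50),
    "District 2": (174, 371),
    "District 3": (192, 310),
    "District 9": (37, 11),
}

changes = {d: round(percent_change(b, a), 1) for d, (b, a) in seats.items()}

for district, change in changes.items():
    print(f"{district}: {change:+.1f}%")
```

As the footnote says, this compares offers with current enrollment, so the gains in more advantaged districts may be overstated.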

If we cut the data by the percentage of African-American students in the district, we also see that many districts with high proportions of black students lost ground.

Percent%20change%20GT%20Seats%20vs.%20Black.png


Yet the Department of Education continues to swagger about how many more students were tested. Yes, the number of students tested in all districts increased, and we see larger increases in high poverty districts.

Pct%20increase_apps.png


But parents in disadvantaged districts have not been complaining about their kids' lack of opportunity to take an admissions test, but their lack of access to programs for more advanced students. Families in New York City's poorest districts face disadvantages that make them less likely to reach the 90th percentile on a national assessment, but the highest achieving students in these districts could still benefit from enriched instruction.

If we want to increase access to advanced instruction for disadvantaged kids who are more advanced than their peers, we might consider offering gifted slots to the top 5% of students in each community school district, while also guaranteeing a seat for any student who scores in the 90th percentile or above of the national distribution. This is analogous to states' top 4% (California) or top 10% (Texas) plans for college admissions, which guarantee college admission to students who have excelled in their own high schools. What do you think, readers?
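Here is one way the two-part rule could be sketched in Python. It illustrates the proposal only and is not anything the DOE uses. The function name, the input layout (each district maps student identifiers to national percentile scores), rounding the local 5% up to a whole student, and breaking ties by input order are all assumptions.

```python
import math

def gifted_offers(scores_by_district, national_cutoff=90, local_percent=5):
    """Offer seats to the top local_percent of each district plus anyone
    at or above the national cutoff percentile."""
    offers = {}
    for district, scores in scores_by_district.items():
        # Number of locally guaranteed seats, rounded up to a whole student
        n_local = math.ceil(len(scores) * local_percent / 100)
        ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
        top_local = set(ranked[:n_local])
        # Students who qualify on the national distribution
        national = {s for s, p in scores.items() if p >= national_cutoff}
        offers[district] = top_local | national
    return offers
```

Under this sketch, a district where no student reaches the 90th percentile would still fill its local 5%. A district with many high scorers would see no change from the current rule.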

Preview: I've also put together tables on the proportion of students applying to and qualifying for gifted and talented programs in each district, and will post these tables later this week.

April 11, 2008

Finally, Credit Recovery Uncovered by NY Times

underrock.gif
To close observers of the NYC system, the "credit recovery" story is old news. But this burgeoning phenomenon had received scant media attention until Elissa Gootman turned in this important NYT article linking credit recovery to the mounting pressure to increase graduation rates by any means necessary.

For the uninitiated, credit recovery involves "letting those who lack credits make them up by means other than retaking a class or attending traditional summer school." This often involves completing a project which demonstrates "mastery" of the course. I've seen projects ranging from a packet of math problems to a 5-page "term paper," and Gootman also identified similar patterns in NYC high schools:

In interviews, teachers or principals at more than a dozen schools said the programs ranged from five-day crunch sessions over school breaks, to interactive computer programs culminating in an online test, to independent study packets — and varied in quality.

Klein argues there's no evidence that credit recovery has become more prevalent in recent years. But the incentives for schools to push students through (or to transfer them out before they count against the school) have grown with the adoption of NYC's report cards and funder-driven graduation targets for the small schools.

When a simple system tries to regulate an issue as complex as graduation rates, you end up with unintended consequences. Hopefully Madame Secretary will consider NYC's experience with credit recovery as she contemplates graduation rate measures and targets.

April 10, 2008

Quotes of the Day

randi-the-vampire-slayer.jpg
This afternoon, Randi Weingarten was the keynote speaker at the opening session of the annual meetings of the American Education Finance Association. More detail on the talk later, but here are three quotable quotes in the meantime:

"What drives me crazy is that there is so much disinformation and downright hostility towards teachers these days. The fact that they get scapegoated so much is a huge disservice to society."

"The vast accounting trick [of current educational accountability systems] sooner or later will implode and leave the shareholders - the public - holding an empty bag."

"Education used to have the fad of the month - now we have the fad of the Chancellor."

March 24, 2008

Load of Bollocks

big%20ben.jpg
The Daily News reports that Cambridge Education Associates is getting a 9% pay raise, even as NYC schools face budget cuts. The average cost of reviewing a school will jump to $4,856, up from $4,427. NYC taxpayers are dishing out $1.1 million for their travel expenses - looks like you and I are paying for our cross-pond friends to fly business class and eat warm chocolate chip cookies. Meanwhile, 8th graders who face retention have lost out on tutoring opportunities. Awesome!

With $2,375,649 spent on the 30 staff working in NYC Department of Education public relations via the "Communications Office," the "Office of Public and Community Affairs," the "Strategic Response Unit," as well as "Community Education Council" PR, can't these wizards keep pay for the Cambridge Ed punters out of the news? You'd think the folks pulling $175,250, $158,603, and $127,776 (top 3 earners in NYC DOE PR) could bring it. NYC Educator provides a clue - were all hands on deck prepping the Ed Next debutante ball?

March 13, 2008

RFSLIC

willie-wonka-roland-fryer.gif
I am 2BZ4UQT. But a reader sent along his thoughts on how Roland Fryer's plan to text message our way to educational equity could reinvent NYC teens' texting lingo. More likely is that the Department of Ed makes a major gaffe while trying to communicate with the young folks in a language no one older than 22 understands. For original meanings, you can look here.

2BZS2T - too busy studying to talk
MILF - man, I like fractions!
LOL - learning obligatory lessons
RMTVA - raising my teachers' value-added
ROFL - reading only for loot
OTFN - our teachers fired now
WDIGP - when do I get paid?
JK - Joel Klein
POS - poring over schoolwork
LMAO - learning math adds opportunities
MFWIC - math is for wicked intelligent children
RFSLIC - Roland Fryer says learning is cool

February 29, 2008

Nip/Tuck for NYC Progress Reports?

nip-logo.jpg
Yesterday's Principals Weekly (a weekly email sent to New York City principals) foreshadowed some possible changes to the NYC Progress Reports. (You can read earlier posts on progress reports here.) Some proposed changes include:

1) The new system may assign separate grades for each element of the progress report. In other words, schools could get an A for the overall proficiency category, a C based on their students' test score growth, and an F based on the learning environment surveys. This is a very positive step. (Diane Ravitch made a powerful argument for this change in the fall.)

2) The Progress Reports compare each school to a group of similar schools. In the fall, the elementary and K-8 "peer indices" were created using demographics; the new proposal is to use "the average ELA and math proficiency rating of students in the testing grades" instead.

3) To address ceiling effects, the new Progress Reports may count any level 4 student (the highest performance level) who remains at level 4 as making one year of progress.

4) A "progress adjustment" may be made for special education students who take the state ELA and math tests in consecutive years. I am not sure how DOE plans to adjust scores, but this appears to be a response to Leo Casey's special ed post on Edwize.

Read the full Principals Weekly excerpt on Progress Reports below, or see Elizabeth Green for more details.

Continue reading "Nip/Tuck for NYC Progress Reports?" »

February 8, 2008

Do Quality Reviews Lead to Increased Student Achievement?

spiffboy2.jpg
skoolboy wraps up his posts on Quality Reviews. His first two posts can be found here and here.

Do quality reviews lead to increased student achievement? There’s been surprisingly little research that addresses this question. Most research on quality reviews has examined the school inspection process in Great Britain managed by the Office for Standards in Education (Ofsted), a national agency which reports to the Parliament. Since school inspections for primary and secondary schools were instituted in 1993, there have been several iterations in the school inspection process. But I haven’t found any persuasive evidence that inspections improve student achievement. Some teachers and administrators report that they intend to change their practices in response to the inspection report, but I’ve not seen studies which examine whether those intentions translate into improved practice.

You might get the impression from my postings this week that I think that quality reviews are a bad idea. Not necessarily! But there are some things that I think are essential for quality reviews to be a good idea. Here’s a brief list:

The purpose of the review must be clear. Sociologist Gary Natriello has written about four potential purposes for evaluations in schools: motivation, direction, certification and selection. The first two can contribute to school improvement, whereas the latter two are more concerned with regulation, accountability, and control; and it’s desirable to confront the tensions between improvement and control directly. If the purpose of a quality review is to improve how schools work, then all phases of the review process need to be oriented towards this purpose.

Definitions of quality must be clear and transparent. If there are clear criteria and standards for what constitutes school quality, then both educators and inspectors can orient their activities towards these criteria and standards. Unclear standards and definitions undermine the legitimacy of the quality review process. My impression is that the Ofsted criteria are a lot clearer than those that I’ve seen stateside. Quality teaching is a particularly challenging phenomenon to articulate; but if the goal is to improve teaching, we’ve got to be able to do it.

The quality review process must be designed to collect a sufficient amount of data on quality. If, for example, the purpose of the quality review is to improve teaching, then presumably there should be sustained collection of data on teaching quality, primarily through direct observation, but perhaps in other ways as well. Ms. Frizzle recently commented that in her New York City school, the quality reviewer was planning to observe 9 different classrooms in 30 minutes. Not much data on teaching quality will come from such a process. The intensity of data collection is a recurring challenge in evaluation research that involves site visits, because they are labor-intensive. “Drive-by” site-visits just aren’t very useful, even if conducted by well-trained observers, because they don’t gather enough data on the things that matter.

The frequency of quality reviews should be synchronized with a theory of how fast school quality is changing. This is Social Research 101: phenomena that change more quickly need to be measured more frequently to detect such changes, and phenomena that change more slowly don’t need to be measured as often. How frequently should we assess school quality? The school year is an arbitrary metric, and it may be wasteful and counterproductive to conduct school quality reviews on an annual basis. (In Great Britain, Ofsted inspects primary schools every three years.) Given a choice, I’d rather have less frequent, but more intensive, quality reviews.

February 4, 2008

Reviewing External Quality Reviews, or: Consultant Whack-a-Mole!

spiffboy2.jpg
I teach at a college that periodically commissions external reviews of the institution and its academic programs. Sometimes these external institutional reviews are "high stakes," such as regional accreditation reviews (e.g., North Central Association, Middle States, etc.) or professional accreditation reviews (such as the National Council for the Accreditation of Teacher Education). Out of the corner of my eye, I've been seeing an increase in the reliance of large urban school districts, such as New York City and Washington, DC, on external reviews (sometimes labeled "quality reviews"). I'm intrigued by the similarities and differences I'm observing.

Most external reviews begin with a self-study, which typically has three major dimensions: (a) What are your unit's goals? (b) How well are you meeting these goals, and what's the evidence? (c) What are you going to do about it? This is then followed by the proverbial "site visit," in which an individual or team from outside of the institution reviews the self-study, comes to the campus for a day or two, pokes around and asks questions, and retreats to write a report which is shared with the institution and its leaders. Often, the institution then will write a response to the report. Then the report goes on the shelf.

The composition of the site visit team can arouse some passion. In postsecondary institutions, site visitors typically are conceived of as peers of the faculty; but who counts as a peer is a matter of debate. How can someone from Eastern Podunk College ever understand how we at Elite University do business? Is a site visitor who studies 18th-century English literature really a peer of the faculty in an English department that focuses on contemporary American fiction?

I'm intrigued by the fact that in New York City and Washington, DC, the site visitors are external management consultants who are not educators within the system, and in fact may not be teachers or administrators in other systems. Consultants such as these would be laughed out of the room in a review of a college department; but nobody's laughing in large urban districts. I think this is because college faculty are assumed to have stronger claims to disciplinary knowledge and expertise than do K-12 teachers and administrators, and because the shared governance model in colleges and universities gives faculty more control over academic decision-making than K-12 educators are typically granted.

Scholars of organizations make sense of external reviews by drawing on institutional theory. Institutional theory focuses on the relationship between organizations and their external environments, including the ways in which organizations are perceived to be legitimate by their external environments. An organization (e.g., school, district, or college) that is perceived to be high-performing generally doesn't have to worry about its legitimacy. But many educational organizations are not seen as high performers. In this case, they have to rely on some other way to be seen as legitimate than a demonstration of good outcomes. A common strategy is to imitate the practices of other social institutions that are seen as legitimate, in the hopes that the legitimacy will "rub off."

Many cases of education imitating the business world can be explained in this way. (Not that the business world has such a great track record to warrant serving as the ideal standard.) So, for example, because it's seen as rational for organizations to set goals and measure progress towards them, this is an integral part of most external review processes - much more so than direct inspection of what the organization is actually doing to meet those goals. This would account for the use of management consultants as external reviewers in New York City and Washington. In this sense, external reviews are mostly symbolic, rather than substantive.

This is, of course, a highly cynical view of external reviews - perhaps more than is warranted. I'd like to pose a couple of questions to eduwonkette's readers: (1) What are some legitimate purposes of external reviews of K-12 schools? (2) Based on these purposes, what should the composition of an external review team look like? The purpose in asking these questions is not to play whack-a-mole with consultants (although that may be a consequence), but rather to introduce a topic that I hope to post a bit more about over the next couple of days. I'm also curious if readers know of any evidence of external reviews actually improving teaching and learning in K-12 schools. Please feel free to e-mail me at skoolboy2 (at) gmail (dot) com to point me in a fruitful direction.

January 30, 2008

Social Promotion Rap Up

Yo yo yo
Word up to Dan Brown
For showing how to break
A billionaire down

Shameful practice?
DOE, you're just like a cactus
Soaking up data but ya head is all dry
Read the research, yo
I'm telling you why

My boy Brian Jacob and his main man Lars
Wrote a paper saying you're down from Mars
For holding back kids when they're 14 years old
Check their results, J.K., then see if you're sold

Next time someone argues that rapping doesn't require talent and skill, direct them to this post. You'll handily win the argument.

Seriously, the Jacob and Lefgren paper, based on analysis of Chicago's similarly structured 8th grade retention program, found that Chicago's 8th grade retention policy increased the proportion of 8th grade retainees dropping out of high school. (However, Jacob and Lefgren found no effects of the 6th grade retention policy on students' likelihood of dropping out of high school.) It's a very thorough paper - take a look.

January 29, 2008

Guest Post: The Misleading Specter of "Social Promotion"

Let's give it up for guest blogger Dan Brown, the author of the Bronx teacher memoir, “The Great Expectations School: A Rookie Year in the New Blackboard Jungle.” You can email him at danbrownteacher@gmail.com.

It’s a rough time to be a struggling student in New York City.

Mayor Bloomberg has now pledged to end the “shameful practice of social promotion” for eighth-grade students who fail either of their two state tests or any core classes. This means nearly 17,000 more eighth-graders than last year may be retained. For his tough position on boosting standards and student accountability, Bloomberg has received much praise.

But what about those kids who will be left back? Who are these socially promoted hangers-on that each year skate by with sub-par marks, undermining the achievement of a serious educational institution?

The answer is innocent kids who aren’t getting the help they need and don’t know how to demand it.

Students come to school with low academic skills for a variety of reasons. In New York, many are faultless victims of the ever-present crush of poverty and its far-reaching tentacles. The school system’s obsession with high-stakes testing— a game struggling students are poorly equipped to play— exacerbates their frustration. Their self-esteem levels are rock bottom and oppositional behavior often takes root. Can you blame them?

Blindly pushing struggling students forward (social promotion) is not the answer, but neither is holding them back for another lap around a failed track. Retaining low-achieving students does not improve their academic future; in fact it often does quite the opposite.

The struggling student conundrum can’t be solved with false choices like the ones offered in the social promotion political debate, but with serious assessments of the short-term and long-term needs of students.

The short-term answer for failing students is a major investment in remediation and individualized support. Clearly, the traditional classroom set-up isn’t working for these students.

The long-term solutions, ones that deal with the root issues of why kids fall behind early on, are more complicated, and more important, for the future of New York City. Students don't spontaneously combust in middle school. When students fail in eighth grade, something has been wrong for a long time. We need our mayor to address how those students can be rescued before hitting a seemingly irreversible frustration level.

Today's middle school students have lived with high-stakes testing in the No Child Left Behind era for virtually their entire scholastic lives. Mayor Bloomberg and Chancellor Klein have worked unrelentingly to conflate school accountability with test performance, a practice with myriad negative consequences. Rather than making it a priority for school to be a nurturing and personal experience, our system sees many kids denied preschool, packed into overcrowded classrooms, denied support services like fundamental skills tutoring, denied much-needed counseling, and supervised by administrators often more worried about test scores than their real needs. It's no wonder that some students eventually give up.

Many of Bloomberg and Klein’s school reforms are dynamic and exciting, but the ones that they have not yet made are essential. A more substantial up-front investment in supporting all students will pay manifold dividends.

Bloomberg is an expert of the business sphere, but bottom-line-driven business models are an ill fit for the education of young human beings. Focusing on holding struggling students back rather than intensively attending to their academic needs is tantamount to blaming the victims. Many socially promoted students have unwittingly suffered the collateral damage of suffocating poverty at home and a depersonalized, test-obsessed regime at school. It’s time they had some doors opened for them, not slammed in their faces.

January 24, 2008

Data-Driven Decision Making Gone Wild: How Do We Know What Data to Trust to Inform Decision-Making?

skoolboy returns to weigh in on data-driven decision making:

I’m as much a fan of data as the next guy. But I worry that proponents of data-driven decision-making are understating just how hard it is to use data thoughtfully.

I’d like to describe the strategy championed by the New York City Department of Education, and point out the difficulties involved. The logic that the DOE is promoting is (a) use data to identify an area where a school is lagging, either in relation to some absolute standard or to other similar schools; (b) use the available data systems to identify similar schools that are doing better in this area; (c) ask these more effective schools what they are doing that accounts for their success; and (d) adapt their suggestions for use in the school.

It’s not as easy as it looks to determine which schools are doing better than others. Two different criteria are relevant: is the difference in performance between two schools large enough to matter, which is sometimes termed educational significance or practical significance; and is the difference in performance between two schools real, or could it just be due to chance, which is typically described as statistical significance. Ideally, we are interested in differences that are both practically and statistically significant. But a difference can be large yet not statistically significant (often the case when we have only a small sample of information about performance), or statistically significant yet very small (in which case we are pretty sure that the difference is real, but it’s just not very important). (Yes, statistical significance does matter!)

This is kind of abstract, so here’s an example, drawn from the NYC Department of Education’s Survey Access tool, which reports the results of the system’s first round of Learning Environment Surveys in the spring of 2007. The Department’s spiffy PowerPoint presentation imagines the principal and a group of teachers in (mythical) IS 402 identifying teacher engagement as an issue. In particular, teachers in this school generally disagreed that “Obtaining information from parents about student learning needs is a priority at my school.” Using the Survey Access tool, it’s possible to identify 12 similar NYC schools (i.e., middle schools with an enrollment over 700 and at least 25% ELL students), seven of which have more positive scores on this question. In the top school, the Eleanor Roosevelt School, 71% of the teachers strongly agreed or agreed with the statement, whereas in the bottom school, 13% of the teachers strongly agreed or agreed. (In mythical IS 402, 36% of the 31 teachers who responded to the survey strongly agreed or agreed.)

So why not just look at the seven schools above IS 402? Because the percentage of teachers strongly agreeing or agreeing is an estimate of the true percentage that would be observed if all teachers in the school responded to the survey. (In these 12 schools the teacher response rate ranged from 26% to 53%; in mythical IS 402, 40% of the teachers responded.) Our interest is in the population of teachers in the school, not just the sample that chose to respond. And there’s a degree of uncertainty in these estimates. If a different group of 31 teachers in IS 402 responded, just by chance, we might not have obtained an estimate of 36% strongly agreeing or agreeing. In fact, with a sample of 31 teachers responding and a sample estimate of 36%, the percentage of all of the teachers in IS 402 agreeing or strongly agreeing could plausibly range from 23% to 49%. (There’s a finite population correction in there, for those who care about such things.) That’s a pretty big range, and the range of possible values is pretty large for the other dozen schools as well.
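
For those who do care about such things, here is a minimal sketch of how an interval like 23% to 49% can arise. It assumes a 95% normal-approximation interval with a finite population correction, and it assumes IS 402 has roughly 78 teachers (31 respondents at a 40% response rate). Neither assumption comes from the DOE's tool, and the function name is made up for illustration.

```python
import math


def proportion_interval(p_hat, n, population_size, z=1.96):
    """Normal-approximation interval for a proportion, shrunk by a
    finite population correction (FPC) because the sample is a large
    share of a small population."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    margin = z * standard_error * fpc
    return p_hat - margin, p_hat + margin


# Assumption: 31 respondents at a 40% response rate implies about 78 teachers.
low, high = proportion_interval(p_hat=0.36, n=31, population_size=78)
print(f"{low:.0%} to {high:.0%}")  # prints "23% to 49%"
```

Without the correction the interval would be wider still, so the correction is not what makes the range look big; the small sample is.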

Of the seven schools above IS 402, just one of them, the Eleanor Roosevelt School, is really head-and-shoulders above it in a statistical sense. The other six are statistically indistinguishable, because there’s so much overlap in the intervals in which the true percentage of all of the teachers strongly agreeing or agreeing in each school lies.
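
The overlap argument can be sketched the same way. The survey tool example gives only the 71% figure for Eleanor Roosevelt, so the sample size and staff size below are hypothetical, as is the 45% comparison school; the 78-teacher figure for IS 402 is likewise an assumption inferred from the 40% response rate. Checking whether two intervals overlap is a rough, conservative screen, not a formal test of the difference between two proportions.

```python
import math


def proportion_interval(p_hat, n, population_size, z=1.96):
    """95% normal-approximation interval with a finite population correction."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    margin = z * standard_error * fpc
    return p_hat - margin, p_hat + margin


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# IS 402: 36% of 31 respondents; 78 teachers is an assumption.
is_402 = proportion_interval(0.36, n=31, population_size=78)

# HYPOTHETICAL sizes: only the 71% figure is from the post.
top_school = proportion_interval(0.71, n=30, population_size=75)

# HYPOTHETICAL school scoring somewhat above IS 402.
middle_school = proportion_interval(0.45, n=30, population_size=75)

print(intervals_overlap(is_402, top_school))     # False: clearly higher
print(intervals_overlap(is_402, middle_school))  # True: indistinguishable
```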

Would the principal and teachers in IS 402 learn something from asking the staff in these seven other schools how they do things? Sure! It doesn’t hurt to think about new ways of doing business. Will doing so raise performance in IS 402? Probably not. Because an assessment of statistical significance suggests that, with the exception of Eleanor Roosevelt, these other schools really aren’t doing better, and therefore there’s no reason to think that adopting their practices will yield genuine improvements.

Data-driven decision makers, beware of spurious comparisons.

The NYC Teacher Experiment Revisited

Over at the Ed Sector, there's some confusion about my concern with the ethics of the NYC teacher experiment (see here). To be clear, my problem is not that NYC is collecting value-added data. As I have written before, standardized tests have a role to play in teacher assessment alongside holistic evaluation of teachers' effectiveness. But as eduwonk himself noted, the methodological issues are hairy and as of yet unresolved.

The concern expressed in my earlier post was how this experiment was conducted in secret and, in my opinion, in violation of generally accepted human subjects policies. The entire enterprise of social science relies on potential study participants trusting researchers to minimize risks and fully disclose the purpose of their study. Every time a gaffe like this happens, it undermines researchers' ability to build trust with study participants in the future. Let's review the chronology:

1) In September, an academic experiment headed by two very talented researchers, Jonah Rockoff (Columbia Business School) and Tom Kane (Harvard Grad School of Ed), was announced. It was presented as an experiment intended to generate academic knowledge, not to inform human resources decisions in real time. (You can watch a video of a study recruitment session here.)

2) Academic research is bound not only by common sense research ethics, but by the conventions of university Institutional Review Boards. What this means is that when academic researchers conduct research intended to produce generalizable knowledge - i.e. if researchers want to publish off of these data - the experiment has to proceed within generally accepted research ethics and a university IRB has to approve it. (Even if this was not an academic research project, the DOE should have notified teachers of an intervention of potential consequence for them. After all, the data are not just being collected, but distributed to principals in the experiment's treatment group.)

IRBs are primarily concerned with the harm that researchers could do to subjects by intervening in their lives, and applicants to IRBs must demonstrate that their project poses minimal risks, that participants have been notified of these risks, and that participants have consented to the research. Teachers did not need to consent in this case, as they are government employees and their employers can collect whatever data they want.

However, it is difficult for me to understand how one could justify not notifying teachers in the study. After all, the information given to their principal - which, given the ongoing methodological problems with value-added, may or may not be accurate - has the potential to permanently change their principals' perceptions of them and their future employment prospects. Moreover, this treatment is not being applied universally to NYC teachers. By simply having the bad luck to be selected into the study's treatment condition, some teachers are affected and others are not.

It is important to note that a "live experimental" study like this one is different from the secondary data analysis studies that eduwonk cites. He wrote:

By that logic, all these various studies with panel data, choice studies using lotteries, etc...all constitute human experimentation and are wrong.

Studies based on secondary data analysis are fundamentally different - and are treated differently by IRBs - because researchers are analyzing "dead" data that have no effect on real people's lives. Ongoing research projects in which interventions are made in real people's lives are held to a different standard. And should be.

3) According to Edwize and the NYT article, teachers were not notified of the study. What went wrong is that at some point this went from an academic study to a human resources project that Chris Cerf wants to take prime time. Perhaps he misspoke, or the NYT article had this wrong, but it appears that these data, collected under the auspices of an academic research study, may be used as early as June. As eduwonk noted, simply gathering the data is not a problem. The problem is that under the cover of "academic research," data are being given to principals in ways that affect teachers' future employment without teachers' knowledge.

The irony, of course, is that none of this would be a big deal if the project had been announced to teachers. When I watched the recruitment session video back in September, it didn't seem like a big deal at all. I bookmarked that this was an interesting experiment conducted by two researchers whose work is first rate, and assumed that the experiment would proceed under normal conditions (i.e. full disclosure of the study). For reasons I don't fully understand, it didn't. And here we are.

There's much more to say about the methodological and broader philosophical issues with value-added measures. I'll follow up with a post on these issues later.

Update: eduwonk and I continue our bridging differences exercise. He wrote:

Her position here would be a lot more compelling if (a) this were an actual experiment in the way she and other anti-Klein partisans are seeking to describe it rather than what it is. In addition --and again-- the fact is that we don't know what they are doing with the data so at this point all these leaps to various consequences are unfounded.

But we do know what they are doing with the information, at least in the context of this experiment (and, as I have explained above, it is an experiment). Principals in the treatment group are given value-added data reports on each of their teachers. These principals' perceptions of teachers' academic effectiveness are thus affected - correctly or incorrectly - by this information. Saying "principals can't use it" is like trying to strike evidence from the record in a courtroom. Jurors' perceptions are already influenced, and the damage is done.

January 22, 2008

It's Our Secret! The NYC Teacher Experiment

The NY Times reported yesterday on an ongoing experiment on teacher effectiveness in NYC schools. Principals in the treatment group (140 schools) receive extensive value-added information on each teacher, and then are asked to evaluate the teachers. Principals in the control group do not receive these reports but also provide evaluations of their teachers. As far as I can tell, the goal is to determine how principals' evaluations are affected by having access to value-added data. By the summer, the NYC DOE will decide how these data will be used, and Deputy Chancellor Chris Cerf has even suggested releasing individual teachers' effectiveness data publicly. You can watch this video for more information about the experiment.

While much could be said about the challenges of estimating reliable value-added measures for teachers or the move to use test scores as the primary measure of assessing teacher effectiveness, I'll save those for later. (See more posts about measuring teacher effectiveness here.) Instead, I want to talk about the issue of research ethics in scientific experiments. It turns out that many teachers in participating schools have not been notified of the study.

Secret experiments have an odious history in science. The most notable example is the Tuskegee experiment, in which African-American men with syphilis were recruited into a study but not told of the purpose of the study or notified of their diagnosis. Their disease was left untreated so that researchers could track its progression. Once this experiment broke publicly, Congress passed legislation that, many commissions and administrative changes later, ultimately required universities receiving federal grants to form Institutional Review Boards to oversee all research. Human subjects policies require university researchers to receive the consent of all subjects and to make them aware of the potential risks of the study.

My point is not that the NYC experiment's secrecy is the moral equivalent of the Tuskegee Experiments. The Department of Education is not bound by any university's human subjects policy, and it is their right to examine whatever data they please to produce new knowledge. (Note that the university researchers involved are bound by IRB standards if they plan to publish off of these data.) But the Hippocratic Oath of the research community - that subjects should be aware that they are part of a study - has been grossly violated. And it does not help the reputation or future of "scientifically based research" in education when studies are conducted in secret. Even if this was not a research study, a decent boss notifies employees when they change the criteria on which employees are evaluated.

Where is this going next? Notably, Cerf's suggestion that individual teachers' data should be publicly released has precedent in New York. The New York State Department of Health started collecting similar data on doctors' effects on mortality in the early 1990s. In 1991, New York Newsday filed a Freedom of Information request, which forced the Department of Health to publicly release doctor level data. Since then, individual doctors' data have been publicly reported. Assuming the same Freedom of Information statutes apply to education, it may not be long before we can examine the "value-added scores" of NYC teachers while waiting for the C train to show up.

Back to data-driven decision making tomorrow.

January 18, 2008

They Never Say "Thanks for Improving My Test Scores!"

New York City posted the nomination narratives from its "Thank a Teacher" awards program. Here's the first one, about a physics teacher named Sidney Harris:

Mr. Harris’s expertise was in physics but what he taught me went far beyond science. He pushed me. He shaped the way I thought about my future. And he set expectations for me that were, before then, unimaginable.

What was his value-added on this kid's Physics Regents? We'll never know, but Mr. Harris' former student Joel Klein says: "I really believe I am chancellor today in no small measure because of Sidney Harris." Read a handful of these narratives and then ask yourself if we should evaluate teachers primarily based on their students' test scores.

January 17, 2008

In New York City, Math is Hard

Test your skills with this word problem:

A comprehensive high school in New York City has an enrollment of 900 9th graders. The NYC Department of Education decides to close the school and replace it with 5 new small schools, each of which will enroll 108 9th graders. How many 9th graders are left over?

Extra credit, Part I: Imagine that the NYC Dept of Ed closes 2 comprehensive schools in one year with enrollments identical to those above. Now how many 9th graders are left over?

Extra credit, Part II: Where will the displaced kids go to school?

If you've got your noggin on, you know that the answer is 360 kids, and that if we close two schools, we now have 720 displaced kids who need a place to go to 9th grade. This is, in part, the subject of Sam Freedman's NYT column yesterday. His column provides a hint on Extra Credit, Part II:

More broadly, the problem is the outcome of Department of Education decisions to open scores of small, niched schools in the area, close large ones perceived as academic failures and leave the excess students to land in traditional schools like Richmond Hill that, while relatively successful academically, were often overcrowded to begin with. In this version of education reform, it is never hard to tell the winners from the losers.

I know what you're thinking - doesn't anyone have a calculator? The NYC Department of Ed seems to have forgotten that matter is neither created nor destroyed in a chemical reaction - and the kids don't disappear, either.
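
For anyone without a calculator handy, the word problem works out as below. The function and parameter names are made up for illustration.

```python
def displaced_ninth_graders(closed_enrollment, new_schools, seats_per_school, closures=1):
    """9th graders left without a seat when each closed school is
    replaced by a set of smaller schools."""
    seats_created = new_schools * seats_per_school
    return closures * (closed_enrollment - seats_created)


one_closure = displaced_ninth_graders(900, new_schools=5, seats_per_school=108)
two_closures = displaced_ninth_graders(900, new_schools=5, seats_per_school=108, closures=2)

print(one_closure)   # 360
print(two_closures)  # 720
```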

Update: For more on high school reorganization hiccups, see the Gotham Gazette's Wonkster and this article in the Village Voice.

January 15, 2008

American Gladiator: Joel vs. Rudy

Earlier today, Diane Ravitch drew attention to American education's growing faceoff between non-educators and educators. She writes: How did American education fall so effortlessly into the control of Know Nothings from the world of business, law, and politics? Now, John Merrow releases a podcast with NYC's past and present Chancellors (moderated by Jay Mathews) that squarely hits on this philosophical divide. Some highlights:

* Joel sums up his job with a song: "Give a little, take a little, let your poor heart break a little…"

* Joel identifies leadership and attracting new and different teachers into teaching as his top two improvement initiatives. He questions whether the principal should primarily be an instructional leader. Rudy disagrees, saying, "the core of this business is how children learn."

* Joel and Rudy spar on the role of charter schools in urban ed reform.

Definitely worth a listen.

January 14, 2008

Be My Guest

Edwize is pulling in a gaggle of guest bloggers to comment on the NYC Progress Reports - check out Sherman Dorn's post on "Bundling Accountability," Seth Pearce's post on "The Importance of the School Progress Debate," and my post, "The NYC Progress Report Catch-22".

January 9, 2008

If Roland Fryer Was the CEO of Heaven...

We've now entered a P.F. (Post Freakonomics) age, and talk of incentives is everywhere. Education is no exception - there's rising interest in the idea of paying kids for upping their scores (more on this idea here). See the New Yorker's pithy take on incentives and the afterlife here:
Eternity lasts a very long time. Our resources, though “infinite,” are not unlimited....Focus groups have suggested that offering a mere year or two of heavenly bliss, coupled with the threat of a single hour spent bathing in hot pitch and being harassed by demons, would generate ninety-seven per cent of the current program’s salutary effect on mortal behavior. (Interestingly, eternity itself is now perceived as a disincentive by blessed souls with more than two years of college education.) This suggests that severely scaling back Our incentive plan—and its attendant costs—would not lead to a significant diminution in faithfulness, obedience, repentance, or other benefits accruing to Ourself.
The opinions expressed in eduwonkette are strictly those of the author and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.
