Prompted by ongoing stories of pharmaceutical companies that declined to publicize studies suggesting their profitable drug lines might be less than helpful to users, This Week in Education’s uberblogger Alexander Russo asked whether foundations suppressed unfavorable research on the effectiveness of their grantees' educational programs. (See January 18.)
Since then, there have been a few technical comments on methodology. I think Alexander hoped to spark a more expansive discussion.
In that spirit, I offer some observations based on personal experience.
Perhaps the most extensive and expensive review of education programs funded by philanthropy was RAND’s eight-year roughly $10 million evaluation of New American Schools (NAS) and its $100+ million acquaintance with comprehensive school reform design grantees. I experienced the ongoing review as a member of the RAND team, the lead grants officer and later Chief Operating Officer of NAS, and the President of NAS’s lending and equity investment arm, the Education Entrepreneurs Fund. Since that time, I have made a habit of following and summarizing program evaluation and evaluation issues and tracking relevant grant RFPs for subscribers to my firm's web-based publications School Improvement Industry Week and K-12Leads and Youth Service Markets Report.
1. Serious evaluation of products, services and programs to improve student learning is a recent practice and remains unusual - whether developed with grants from philanthropy or private investment. Third-party evaluation has been especially unusual as an activity built into the terms of philanthropic grants. Government grant programs now invariably call for an evaluation plan, but not philanthropy – nor private equity.
2. To the extent program evaluation has been built into grants, it has been focused on initial development and small-scale implementation. The challenge, of course, is taking the program from a field lab experiment to a scalable system or even a replicable model. (As I see it, replication answers the question "can this model be reproduced?" Scale addresses the far more demanding question "can it be replicated in large numbers simultaneously?" See Chapter 16 here, here, and also listen starting here.) So many additional factors come to bear on scale and even replication that the field evaluation is barely relevant to the larger “what works” question.
3. When a foundation is deciding whether to commit itself to a program developer, it should be trying to determine potential efficacy. When it makes a grant, it should commit the grantee to certain benchmarks, including some related to efficacy. I have never heard of the former taking place, and doubt it has ever been seriously attempted. Most philanthropic due diligence is laughable. I have seen the latter only in NAS contracts with its design teams – and they proved very hard to enforce. Most grantees and most foundation program officers know that the name of the game is over-promising and then stretching out the funding. Once the foundation has made the grant, it is not a disinterested third party – it and the program officers who recommended the grant have a vested interest in success, and very little incentive to point out bad decisions. (Moreover, foundation staff are rarely held accountable for their grant decisions or recommendations. Tom Vander Ark's departure from Gates may or may not have been related to the disappointing return on his "investment thesis" - the sometimes popular "small schools" "theory of change.")
4. It is important to understand the market in which nonprofit educational program developers operate. The nonprofit revenue model is based on grants, not fees. The business is not really developing educational programs to take to scale; it is addressing the interests of grant-making customers in philanthropy. So it is not correct to characterize the role of philanthropy as analogous to venture capital - providing seed money that will lead the grantee to a larger market and a new source of revenues. For most nonprofits, philanthropy is the market. As such, the "sale" of grants is not based much on a nonprofit's record of program efficacy or scalability, but on topical trends in education philanthropy, personality and salesmanship of grant seekers, and pack movement in grant making. The most successful educational nonprofits are managed by leaders who know when one education trend is about to play out, what the next trend will be, and how to craft and deliver conceptually compelling proposals.
5. Philanthropies' interests are not pragmatic, except to the extent that they must give 5 percent of their total value away every year - they are ideological. All foundation giving is informed by its social philosophy, policy and politics. This is not an accusation of wrongdoing; it's a recognition of philanthropy's social purpose. So it is no surprise that there is almost certainly more post-hoc evaluation funded for political purposes than prior evaluation for due diligence or contemporaneous evaluation for consumer awareness. The studies vary in quality, but in the end, no reasonable observer can take the findings at face value. All must be deconstructed to identify their bias.
6. Finally, A) developers have a poor understanding of how and why their programs work in one school, let alone do or do not work in hundreds, and B) the state of the evaluation art is still primitive relative to the demands of demonstrating efficacy at scale. As a consequence, the net result of program reviews by prominent evaluators and tiny outfits, biased and unbiased, small and large scale, old and current, amounts to the Scottish legal finding “not proved” and the evaluator's old standby “promising.”
The main reason why even “not proved” is good enough to keep going with program development is that we know the current system isn’t meeting society’s needs. But until something like the requirements implied in the Scientifically Based Research provisions of the No Child Left Behind legislation are given meaning in administrative law and Department of Education regulation and practice - and enforced as law - I don’t expect much to change in philanthropy’s approach to evaluation.
The problem of method. Even high-quality program evaluations are problematic. All studies are constrained by budget, time, and the cooperation of the schools selected for review. Longitudinal studies are a bit like committing to a heading determined by dead reckoning. Once the research plan is set, changing circumstances of budget, school support, and the like are difficult to accommodate. (See, for example, MDRC's recent study of the Institute for Learning's district-wide strategy.) At least as important, the "model" being tested is not a set combination of inputs, the fidelity of implementation matters, and few studies attempt to accommodate the real problem of results attributable to differences in developers' "client selection," teacher buy-in, the varying expertise and experience of the designer's staff, or less well-understood factors like students' learning styles. All of these features are absolutely vital to the question of scale, because the goal of reaching large numbers of students is not synonymous with "one size fits all." (For a recent effort to consider fidelity of implementation see here.)
Moreover, models do not remain fixed. They evolve as experience in client schools identifies problems and opportunities. The model evaluated at the start of the study is not the model implemented at the end of the study, nor any year in between. Indeed, "models" are not really what's being evaluated, unless we incorporate the dynamic of organizational capacity and management strategy into the definition. What's really being tested is the ability of an organization to address all the factors that determine student performance in the schools where it is engaged.
Models are not "pills" taken by schools, they are a set of routines designed to maximize outcomes - from the very first identification of a potential partner school, to the very last moment before the provider interacts with teachers and students about to take the high stakes test. They include a provider's relationships with everyone from the superintendent to parents. Evaluations simply do not account for all these factors and their role in student success. The typical study treats all school engagements as equivalent, when we know that every implementation has its own strengths and weaknesses. Today's evaluators are looking for go/no go answers to the "what works?" question, when what we really need to know are the circumstances in which the program works well or not at all.
Negotiating the final report. Finally, the final report of every third party review involves some degree of negotiation between the sponsor and the evaluator. This negotiation involves many parties in each organization. There are judgment calls on method and analysis, report structure, and even word choice. Some issues are clearly legitimate, some are obviously improper, most lie in a vague ground in between.
My experience suggests a fairly clear negotiation calculus. If the researchers want more work from the client, they tend to go easy - if that's what the sponsor wants. If they don't need the business, they call the analysis as they see it, but the more mature project leaders will accommodate wording of the bottom line to put the program in the best reasonable light given the analysis. (If they are fed up with the sponsor, the developer will tend to lose the benefit of the doubt.) RAND's reputation for objectivity is far more important to the organization's viability than any one client's happiness - even the Defense Department finds it hard to rein them in. This is true of all the major evaluators - Westat, Mathematica, SRI, AIR, etc.
In this respect, consider RAND's studies of New American Schools design teams, Edison, and NCLB's SES program, sponsored by NAS, Edison, and the Department of Education respectively. A careful review of each yields reasons to give potential buyer/users of those programs some pause. Yet each was discussed in the press with a more or less positive initial spin (and to the provider's folks in external relations, the initial spin is the one that counts). Based on the numbers, each report could just as easily have been written in a somewhat disparaging tone and released with a modestly negative spin.
Smaller research organizations, whose financial sustainability depends on going from contract study to contract study, have a much harder time resisting sponsor pressure. They too have to protect some reputation for objectivity, but their narrow margin of viability also requires their managers to be more "flexible." The price of real independence can be very high, and word gets around.
More important, evaluation sponsors understandably tend to pick research organizations they believe they can "work with" up front. Post-hoc debates are actually pretty rare. And to most readers a positive study is a positive study, whether it comes from RAND or a firm consisting of two people with doctorates.
Whatever the reason an evaluator accommodates a sponsor, the "promising" label now means much less than it might as an assessment of efficacy. It is generally possible to discover something positive in any evaluation, find hope therein, and latch onto it as a basis for judgment.
What to rely on. Research and evaluation into program efficacy remains vitally important if we are to move school improvement products, services and programs from the era of snake oil to something closer to medicine. But for the next decade at least, school improvement programs cannot be simply purchased "off the shelf." They need to be fit to the circumstances with some care. Buyers will have to put the burden of proof on sellers in the purchasing process and then do the due diligence required before making a selection.
Given the vast uncertainties of program evaluation, I would rely less on intermittent third party reviews, and look more to the capacity the developer has for data-driven decision making to support program improvement. My advice to investors, philanthropists and educators (start here) has always been to bet on providers who build program evaluation into their product.
Some relevant podcasts: