Yes, Beltway Wonks, Sampling Error Does Matter

It's in vogue these days to declare the building blocks of statistical inference irrelevant to assessing the performance of schools. For example, Joel Klein recently argued that statistical significance is "a game." Yesterday, Kevin Carey argued that accounting for sampling error - the idea that there is statistical uncertainty in measures from a sample rather than the full population - in the context of NCLB is "silly" because "unlike opinion polls, NCLB doesn't test a sample of students. It tests all students. The only way states can even justify using [margins of error] in the first place is with the strange assertion that the entire population of a school is a sample, of some larger universe of imaginary children who could have taken the test, theoretically."

Dan Koretz, Harvard psychologist and author of Measuring Up: What Educational Testing Really Tells Us, provides a very clear explanation of why Carey is wrong:

A few readers might be wondering: if all students in a school (or at least nearly all) are being tested, where does sampling error come into play? After all, in the case of polls, sampling error arises because one has in hand the responses of only a small percentage of the people who will actually vote. This is not the case with most testing programs, which ideally test almost all students in a grade.

This question was a matter of debate among members of the profession only a few years ago, but it is now generally agreed that sampling error is indeed a problem even if every student is tested. The reason is the nature of the inference based on scores. If the inference pertaining to each school...were about the particular students in that school at that time, sampling error would not be an issue, because almost all of them were tested. That is, sampling would not be a concern if people were using scores to reach conclusions such as "the fourth-graders who happened to be in this school in 2000 scored higher than the particular group of students who happened to be enrolled in 1999." In practice, however, users of scores rarely care about this. Rather, they are interested in conclusions about the performance of schools. For the inferences, each successive cohort of students enrolling in the school is just another small sample of the students who might possibly enroll, just as the people interviewed for one poll are a small sample of those who might have been. (p. 170)

Addressing complexities like sampling error is not just exploiting a "loophole" to avoid NCLB sanctions. Rather, it's an assurance that when we label a school as "in need of improvement," we're not wrongly assigning that label. It strikes me as deeply ironic that even as NCLB endorses "scientifically-based" research, many wonks continue to turn their noses up at the central conventions of the science of statistics.
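
To make Koretz's point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not data from any real school: a hypothetical school whose "true" proficiency rate is fixed at 60%, cohorts of 50 tested students, and students treated as independent draws. The function names are invented for this example. The school never changes, yet the observed rate moves from cohort to cohort.

```python
import math
import random
import statistics


def simulate_cohort_rates(true_rate, cohort_size, n_cohorts, seed=0):
    """Return the observed proficiency rate for each simulated cohort."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_cohorts):
        # Each student independently scores proficient with probability true_rate.
        passed = sum(1 for _ in range(cohort_size) if rng.random() < true_rate)
        rates.append(passed / cohort_size)
    return rates


def binomial_standard_error(rate, cohort_size):
    """Standard error of a proportion under simple binomial sampling."""
    return math.sqrt(rate * (1 - rate) / cohort_size)


# Hypothetical school: true rate 60%, 50 tested students per cohort.
rates = simulate_cohort_rates(true_rate=0.60, cohort_size=50, n_cohorts=5000, seed=1)
se = binomial_standard_error(0.60, 50)

print(f"mean observed rate:     {statistics.mean(rates):.3f}")
print(f"spread across cohorts:  {statistics.pstdev(rates):.3f}")
print(f"theoretical std. error: {se:.3f}")
print(f"approx. 95% margin:     +/- {1.96 * se:.3f}")
```

Under these assumptions the standard error is about 0.069, so a 95% margin is roughly 14 percentage points either way, even though every student in each cohort was tested. Real cohorts are not independent random draws, so treat this as a rough picture of the idea rather than a model of any actual accountability system.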

The issue of students who might potentially enroll in a school to which Koretz alludes is especially important in the context of school choice plans, which open school doors to populations that are larger and potentially more diverse than the particular group of students currently attending a school.

It would probably come as a surprise to these folks that cosmologists try to account for the uncertainty in their results that comes from having only one universe to observe.

When the folks on the National Technical Advisory Council (see below) have their first meeting, they may want to start with a review of basic statistics. I wonder who among this brain trust would agree with Carey's interpretation...

PRESS RELEASES August 13, 2008
U.S. Education Secretary Appoints 16-Member Council to Advise on State Standards, Assessments and Accountability Systems U.S. Secretary of Education Margaret Spellings today announced the appointment of 16 members to the National Technical Advisory Council (NTAC), which Spellings announced as part of the proposed regulations to strengthen No Child Left Behind. The Council's purpose is to advise the Department on complex and technical issues regarding the design and implementation of state standards, assessments and accountability systems. The Council will offer expert advice on such things as the use and applicability of minimum subgroup sizes for proficiency calculations, confidence intervals and the principles necessary for ensuring that performance indexes are consistent with the Title I statute and regulations.

"The National Technical Advisory Council will play a vital role in ensuring that we address the technical needs of states and their accountability systems," Spellings said. "Their work will be invaluable as we move forward in strengthening and improving No Child Left Behind."

Tom Fisher, former Florida state director of testing, will chair the Council. Members will serve staggered terms, ranging from one to three years. All members are experts in assessment and accountability, and represent a range of backgrounds, from academicians and researchers to national, state and local policymakers. The Council will meet twice a year and additional meetings may be called at the request of the Secretary. Proceedings from meetings will be made available to the public. The first meeting will be held within the next few months.

Members of the Council are as follows: Tom Fisher, David Abrams, Anthony Alpert, Diane Browder, Wesley Bruce, Wayne Camara, Kevin Carey, Gregory Cizek, Carl Cohn, Denise Collier, Robert Costrell, Harold Doran, Margo Gottlieb, Suzanne Lane, Scott Marion, John Poggio.

I am not a real wonk when it comes to understanding statistics. I get where skoolboy is coming from with regard to the potential population who could attend the school (assuming that students bring some variability to the outcome of education). But in terms of accountability--and the dreaded label of "in need of improvement" (which the press and others insist on calling "failing")--it would seem that there are lots of cushioning factors in place already that would tend to protect against suffering the effects of a false negative (false positive?--being wrongly labelled, in any case). While states differ somewhat, most have some built-in protections, such as safe-harbor provisions (which allow demonstration of a percentage improvement rather than meeting an absolute target), multiple-year averaging of scores, as well as requiring a consistent failure over time (three consistent years) to meet AYP goals in order to acquire a label, AND the famous n-size requirements that ensure that many schools continue to escape accountability for small populations (like students with disabilities).

I also seem to recall, from the last statistics course I took, that there is some concern about the amount of risk attached to Type I or Type II errors (and I forget which is which). In other words: on which side would we prefer to err? Personally, I would prefer to err on the side of continuing to attend to the need for improvement. I don't see any particular advantage to erring on the side of overlooking schools or students who might do better.

Mark Twain said it best: There are lies, damn lies and then there are statistics.

There is another type of uncertainty involved in these tests -- each test is a sample, not a population, of the ideas the teachers presented over the course of the year, and a student's performance represents a sample of his/her total knowledge. Does anybody really need to point out that the tests do an imperfect job of measuring students' knowledge?

@ Margo/Mom -- the situation you describe - rejecting a null hypothesis when it is correct - is a Type I error. (The null hypothesis is the "no change" hypothesis, so the null hypothesis would be that a school is doing fine.) I can think of some good reasons for not leaning too far to the side of this error - otherwise, we should just have the government take over all the schools today just in case any of them are messing up.
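
The trade-off between the two kinds of error can be sketched with a small simulation. This is an illustration under stated assumptions, not any state's actual AYP rule: the 60% target, the cohort size of 50, the two hypothetical schools, and the labeling rule (label the school only if its observed rate plus a margin of z standard errors still falls short of the target) are all invented for the example, as is the function name.

```python
import math
import random


def label_rate(true_rate, target, cohort_size, z, n_cohorts=4000, seed=0):
    """Fraction of simulated cohorts that would trigger the label.

    Hypothetical rule: label the school when the observed rate plus a
    margin of z standard errors still falls short of the target.
    """
    rng = random.Random(seed)
    margin = z * math.sqrt(target * (1 - target) / cohort_size)
    labeled = 0
    for _ in range(n_cohorts):
        # Each student independently scores proficient with probability true_rate.
        passed = sum(1 for _ in range(cohort_size) if rng.random() < true_rate)
        if passed / cohort_size + margin < target:
            labeled += 1
    return labeled / n_cohorts


TARGET = 0.60  # assumed proficiency target
N = 50         # assumed number of tested students

# One hypothetical school truly above the target, one truly below it.
for true_rate in (0.65, 0.55):
    no_margin = label_rate(true_rate, TARGET, N, z=0.0)
    with_margin = label_rate(true_rate, TARGET, N, z=1.645)
    print(f"true rate {true_rate:.2f}: labeled {no_margin:.1%} of the time "
          f"without a margin, {with_margin:.1%} with one")
```

For the school that is truly above the target, every label is a false alarm (the Type I error in the framing above), and the margin makes those rarer. For the school that is truly below the target, every cohort that escapes the label is a miss (a Type II error), and the same margin makes those more common. Which balance is right is the policy question; the margin only moves the errors from one side to the other.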

diarykid--forgive me for being old. I grew up before government was a dirty word. It has been my understanding that the government is in fact responsible for providing public education.
