
Distinguishing Statistical and Substantive Significance in Studies of Online Learning

By Justin Reich — February 26, 2013

Here’s a short version of this post (spoiler! I’m making two intentional rhetorical missteps):

Last week, the Columbia University Community College Research Center (CCRC) released a working paper, “Adaptability to Online Learning: Differences Across Types of Students and Academic Subject Areas.” The paper examines nearly 500,000 courses taken by 40,000 students over a four-year period and offers two pieces of analysis. First, it presents data showing that face-to-face (F2F) courses have a higher persistence rate and a higher average GPA than online courses. Second, it presents data that these differences are significantly greater for certain subgroups: male students, students with weaker academic records, and minority students in online courses have lower completion rates and GPAs than their counterparts.

From these data, the authors argue that online learning may exacerbate gaps of concern that already exist between these subgroups.

As regular readers of this blog know, this is the sort of thing that keeps me up at night: that online courses, MOOCs, and digital learning broadly will produce a few Horatio Alger stories of impoverished children in far-off places earning an MIT degree through a 56k modem, while the vast majority of MOOC completers will be people who already have degrees and the kind of steady income and comfortable lifestyle that would let you pursue college courses forever.

Most research studies, like this one, don’t profoundly change our thinking. The enterprise of research is one where evidence is meticulously gathered in small units and weighed against other research. This particular piece tips the scales slightly in favor of those arguing that online learning can widen rather than narrow opportunity gaps.

But here’s a longer version that highlights some real problems with the summary I just wrote:

In the section above, I’ve made two rhetorical moves that are very common in the press and that shape the way this study is perceived. I think these common rhetorical moves are detrimental to public understanding.

What I’ve done in the above summary is give you a selective perspective on the findings. Among statisticians, it’s common to think of findings as having two qualities: direction and magnitude. That is, when we evaluate the relationship between two things (like gender and GPA), we can talk separately about direction (being female is positively correlated with GPA; being male is negatively correlated with GPA) without talking about magnitude (to what extent is gender correlated with GPA?). If I tell you that being female is correlated with higher GPA, that could mean an average difference of 0.0001 points or of 1.5 points. The first is probably policy-irrelevant, and the second is a major policy concern. The direction of a finding typically means little without an understanding of magnitude, especially when thinking about costs and benefits in policy.
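To make the distinction concrete, here is a minimal back-of-the-envelope sketch (assumed numbers, not the CCRC data): the same direction of relationship between gender and GPA can correspond to wildly different magnitudes, depending on the size of the underlying gap.

```python
# A minimal sketch with assumed numbers (not the CCRC data): the same DIRECTION
# of relationship between gender and GPA corresponds to wildly different
# MAGNITUDES depending on the size of the average gap.
import math

within_sd = 0.5  # assumed spread of GPAs within each gender group

for gap in (0.0001, 1.5):  # assumed average GPA difference (female minus male)
    # Point-biserial correlation implied by a mean gap with equal-sized groups:
    # r = (gap / 2) / sqrt(within_sd^2 + (gap / 2)^2)
    r = (gap / 2) / math.sqrt(within_sd**2 + (gap / 2) ** 2)
    direction = "positive" if r > 0 else "negative"
    print(f"gap = {gap:>6}: direction is {direction}, magnitude r = {r:.4f}")
```

Both gaps point in the same direction; only the second is big enough to matter for policy.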

And one major, major potential point of confusion has to do with the word “significant” as it’s used by statisticians. In everyday life, “significant” means “important.” It conveys a sense of magnitude. A “significant problem” is a big problem.

But, to my eternal chagrin, for statisticians, “significant” refers to direction, not magnitude. A statistically significant difference is not necessarily a substantively significant difference (in this sentence, “statistically significant” refers to direction and “substantively significant” refers to magnitude). I said, for instance, in my above summary, “these differences are significantly greater for certain subgroups: male students, students with weaker academic records, and minority students in online courses have lower completion rates and GPAs than their counterparts.”

It is true that students with weaker academic backgrounds have significantly lower completion rates and GPAs than students with stronger academic backgrounds. Statistically, this is true. Statistical tests show us with reasonable conclusiveness that the direction of the relationship between a weak academic background and online course GPA is negative. But what of the magnitude? Students who have ever taken a remedial course (a proxy for a weak academic background) have, on average, a 0.02-point lower GPA in online courses than students who never took a remedial course. Their persistence, on average, is half of one percentage point lower.
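Part of why such a tiny gap still clears the bar of statistical significance is the sheer size of the sample. Here is a minimal sketch with simulated data (the group sizes, means, and standard deviation below are my assumptions, not the CCRC analysis): a 0.02-point GPA difference yields a vanishingly small p-value even though the standardized effect size is trivial.

```python
# A minimal sketch with simulated data (assumed group sizes, means, and SD;
# not the CCRC analysis): with samples this large, a 0.02-point GPA gap is
# "statistically significant" while remaining substantively tiny.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 250_000  # hypothetical online-course enrollments per group

remedial = rng.normal(loc=2.93, scale=1.0, size=n)      # ever took a remedial course
non_remedial = rng.normal(loc=2.95, scale=1.0, size=n)  # 0.02 points higher on average

t_stat, p_value = stats.ttest_ind(non_remedial, remedial)
pooled_sd = np.sqrt((remedial.var(ddof=1) + non_remedial.var(ddof=1)) / 2)
cohens_d = (non_remedial.mean() - remedial.mean()) / pooled_sd

print(f"p-value:   {p_value:.1e}")   # far below 0.05: statistically significant
print(f"Cohen's d: {cohens_d:.3f}")  # roughly 0.02: a trivially small effect
```

The test says the direction of the difference is almost certainly real; it says nothing about whether a 0.02-point gap is worth worrying about.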

From a policy perspective, are these differences important? They aren’t stirring. These differences are statistically significant; in my view, they are not really substantively significant. Some of the differences among subgroups in the paper are larger, but none are particularly large. It’s actually quite hard to tell how large, because the paper doesn’t tie the findings to real-world outcomes. So students in online courses have slightly lower GPAs... how important are community college GPAs to real-world outcomes? What size gaps are important? The authors don’t place the gaps they find in a substantive context.

Does this one study push the scale in favor of concerns about online learning and equity? Absolutely. How much does it push the scale? In my view, a small amount. And the reporting about the study (like in the Chronicle of Higher Ed or TechCrunch), by focusing on statistical significance rather than substantive significance, and by not digging into the details of the findings, obscures this point. The uncommon way that statisticians use “significant” is at the heart of the problem.

The second problem with my summary is how I’ve subtly shifted the domain of the conversation. The study is about community college students taking traditional distance education courses. I then made a comment about how alarmed I am about the potential problem with MOOCs. This isn’t really a responsible rhetorical move. This study about community college students doesn’t generalize very well to the experience of MOOC takers; the domains are too different to naively apply findings from one to the other. You can see this problem magnified, in a reverse sort of way, in a recent New York Times op-ed that cites the high attrition rates of MOOCs (where completion rates are often 10%) and then switches to raising concerns about the community colleges in this study (where, on average, F2F persistence is 94% and online persistence is only 3 percentage points lower, at 91%). I tip my hat to Russel Poulin at WCET Learn for drawing my attention to this problem with the Times op-ed. When making these kinds of shifts in ground, authors need to be responsible about explaining the intellectual leaps they are making.

I think the CCRC authors have published a great study. I think it’s a nice contribution. I think they insufficiently qualify their findings. I think journalists reporting the study are echoing their insufficiently qualified findings. I think the public is not served well by these patterns, even as I think that the argument that the CCRC authors are making is tremendously important.

Lessons: 1) Understand that statisticians and researchers use the word “significant” to mean “demonstrably different” rather than “substantively, importantly different.” 2) Be careful when people use the findings from one domain to make claims about another.

For regular updates, follow me on Twitter at @bjfr and for my papers, presentations and so forth, visit EdTechResearcher.

The opinions expressed in EdTech Researcher are strictly those of the author(s) and do not reflect the opinions or endorsement of Editorial Projects in Education, or any of its publications.