Big Data MOOC Research Breakthrough: Learning Activities Lead to Achievement
In a shocking series of studies, learning environments that are capable of tracking trillions of learner actions down to the millisecond have led to a breakthrough in education research: being active in a learning environment may be a powerful predictor of doing well in that learning environment. That is to say: students who do stuff also do more stuff, and they do stuff better.
This powerful insight, that doing stuff may very well be one of the reasons why people do more stuff or do good at stuff, is starting to come into focus after a series of studies looking at some of the largest platforms in online learning, such as Khan Academy, Google Course Builder, Udacity, and edX.
One of the promises of big data in education has been that online learning platforms that collect realtime learner actions on thousands or millions of learners will lead to major advances in teaching and learning research. edX president Anant Agarwal has called edX a particle accelerator for learning, so let's see what happens when we start smashing students together at extremely high speeds.
First, let's look at the recent study from Khan Academy's implementation in 20 schools, conducted by SRI International and funded by the Bill and Melinda Gates Foundation. SRI spent two years studying KA in schools from 2011-2013, and their methods were diverse; in their words: "To collect information about how Khan Academy was being used and its potential benefits, SRI researchers visited schools, districts, and CMOs; made classroom observations; interviewed organization and school leaders as well as teachers, parents, and students; conducted teacher and student surveys; and analyzed students' user log files over the school year." Huge study, multiple methods, but let's focus on the big data: the logs.
Khan Academy's log files contain detailed data about student behavioral activity: how long they watched a video, how many times they solved a problem, and on and on. SRI researchers took that vast trove of data, and they summarized it in one statistic: the number of minutes a student logged in Khan Academy.
This is a move we should expect to see quite often in forthcoming research: where researchers take big data and make it small data. It's actually quite complicated to look at all those log files and user activity pathways, so researchers take terabytes of data, squash together all that nuance and complexity, and summarize big data into a single number.
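If you've never seen this move performed, here is a minimal sketch of what big-data-to-small-data looks like in practice. Everything here is invented for illustration: the event tuples, the field names, the durations. No actual Khan Academy log format is implied.

```python
from collections import defaultdict

# Hypothetical clickstream: (student_id, event_type, duration_seconds).
# A real log would have millions of rows; the reduction is the same.
events = [
    ("s1", "watch_video", 300),
    ("s1", "solve_problem", 120),
    ("s2", "watch_video", 60),
]

def minutes_per_student(events):
    """Squash a rich clickstream into one number per student: total minutes."""
    totals = defaultdict(float)
    for student, _event_type, seconds in events:
        totals[student] += seconds / 60.0
    return dict(totals)

print(minutes_per_student(events))  # {'s1': 7.0, 's2': 1.0}
```

Terabytes in, one column out.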
And the big finding here was this: there is a correlation between the number of minutes a student spends on Khan Academy and test scores. Here's the key table, notice that students who perform better than expected spent more minutes on KA:
The SRI study is careful to note that this isn't causal evidence. It could be that kids who spend more time on KA also listen more in class. It could be that kids who do well in math like KA more, and spend more time on it. But here is piece one of our puzzle: kids who do more math stuff on Khan Academy do better on math tests.
Udacity and San Jose State University
Khan Academy sets the stage, but let's move into higher education. In the summer of 2013, San Jose State University partnered with Udacity to offer a series of remedial and introductory courses online. This fall, they presented some preliminary findings in a research report. Once again, Udacity collects detailed, realtime, clickstream data showing every action of every student on the platform. Big data. Huge data. So once again, researchers compressed that rich detailed data into very simple summary statistics, like the number of problems completed and the minutes of video watched. And, once again, findings are provocative. It turns out, students who did stuff were more likely to pass the class. In convoluted, bold-face researcher speak:
The primary conclusion from the model, in terms of importance to passing the course, is that measures of student effort eclipse all other variables examined in the study, including demographic descriptions of the students, course subject matter and student use of support services. Although support services may be important, they are overshadowed in the current models by students' degree of effort devoted to their courses. This overall finding may indicate that accountable activity by students--problem sets for example--may be a key ingredient of student success in this environment.
The researchers here are cautious. Again, this isn't a causal study, like a randomized control trial. But the evidence suggests that accountable activity--e.g. effort, e.g. doing stuff--*may* be a key ingredient of student success, e.g. passing a class.
Here, the data is presented in graphical form. In the first figure we see that the probability of passing a course strongly correlates with doing problems. Look at those lines go up.
In the second figure, we see that hours of watching video is also correlated with passing the course, especially in this one statistics class.
Once again, we see that by taking big data, reducing it to simple summary statistics, and creating basic statistical models, we find evidence that effort predicts achievement, that doing stuff may be correlated with doing good at that same stuff.
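For the curious, the "basic statistical model" step really is this basic. Here's a sketch with a hand-rolled Pearson correlation between an effort summary and a pass flag. The numbers are made up, and no actual Udacity or SJSU data is reproduced.

```python
import math

# Hypothetical per-student summaries: minutes of effort, and did they pass.
minutes = [10, 40, 90, 150, 300]
passed  = [0,  0,  1,  1,   1]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(minutes, passed)
print(round(r, 2))  # positive: effort correlates with passing (in fake data)
```

A positive r, and not a causal claim in sight.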
Google Course Builder
Next we turn to data from Google's Mapping with Google course. Google has one of the largest teams of the best data scientists anywhere in the world, so I think it would be fair for readers to enter this section with some real excitement about what might unfold.
A team of three researchers from Google recently submitted a paper to the Learning@Scale conference, and some of their findings build upon this growing foundation of evidence that doing stuff leads to doing stuff. Here, Google researchers argue that "our research indicates that learners who complete activities are more likely to complete the course than peers who completed no activities" (h/t to Hack Education).
To defend this controversial and still unproven position, the researchers compared students who did and did not do activities during the class with those who completed the final project. Again, Google has realtime analytics on every action taken by every student in the course and one of the world's largest team of analysts, and they've chosen to reduce this massive trove of data into simple summary statistics. In fact, they get special commendation for reducing things to dichotomous variables like whether or not a student did something, rather than more complex variables like how many times they did something or how well they did.
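To appreciate the full force of the dichotomous-variable technique, here's a sketch of the comparison: collapse each student to two booleans, then compare completion rates across groups. The roster below is entirely fabricated; no Google course data is implied.

```python
# Hypothetical roster: did the student do any activity? finish the project?
students = [
    {"did_activities": True,  "did_project": True},
    {"did_activities": True,  "did_project": True},
    {"did_activities": True,  "did_project": False},
    {"did_activities": False, "did_project": True},
    {"did_activities": False, "did_project": False},
    {"did_activities": False, "did_project": False},
]

def completion_rate(students, did_activities):
    """Share of a group (active or inactive) that finished the project."""
    group = [s for s in students if s["did_activities"] == did_activities]
    return sum(s["did_project"] for s in group) / len(group)

print(completion_rate(students, True))   # active students' completion rate
print(completion_rate(students, False))  # inactive students' completion rate
```

Two booleans per student, one comparison, one finding.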
The evidence here, presented in both a table and a bar chart for added emphasis, shows powerful evidence that students who did the activities during class were much more likely to do the final project than those who did not do the activities.
The table shows this. The bar chart shows this. So it might be fair to consider that this study provides twice as much evidence as the previous ones that doing stuff appears to be correlated with doing more stuff.
My Own Contribution to the Do Stuff and Do Stuff Theory
So that's evidence from Khan, Google, and Udacity, but now it's time to step into the particle accelerator itself: edX. I personally have conducted several studies on HarvardX courses, and I'm ready to add my bricks to the pile of evidence thus far presented. By now, the script should be totally familiar. We have gigabytes of log data on student activity within our courses, and with MITx we have a team of three post-docs working on this research, with access to leading computer science and education faculty to support us.
As with the previous cases, in the reports I lead-authored (and I'm going to start using the first-person singular here so as to not implicate my colleagues in this nonsense), my first move was to reduce this incredibly rich and nuanced data into a single measure of student effort. And what better way to summarize clickstream data than to simply count the number of "clicks" or events in the logs for each student. I then don't even bother with models or correlations, I just present histograms of event counts for those who do and don't pass the course. I'm sure by this point, you can predict the findings. In multiple different courses, I found that people who passed a course and earned a certificate, on average, clicked on things more than those who didn't. Here are the figures from Justice and Heroes:
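In the spirit of full methodological transparency, the click-counting move looks roughly like this. The click log below is fake, the student IDs are fake, and the counts are chosen to make the point; no HarvardX data appears here.

```python
from collections import Counter

# Hypothetical click log: one row per event, (student_id, earned_certificate).
clicks = (
    [("a", True)] * 500 + [("b", True)] * 350 +
    [("c", False)] * 40 + [("d", False)] * 10
)

counts = Counter(sid for sid, _ in clicks)          # events per student
cert = {sid: earned for sid, earned in clicks}      # certificate status

passers = [counts[s] for s in counts if cert[s]]
non_passers = [counts[s] for s in counts if not cert[s]]

def mean(xs):
    return sum(xs) / len(xs)

# Certificate earners clicked more, on average. Shocking.
print(mean(passers), mean(non_passers))  # 425.0 25.0
```

From there it's one histogram call away from a publishable figure.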
As my colleagues and I wrote: "In general, however, those who took many actions in the course were more likely to earn a certificate than those who took few actions, a simplistic insight that echoes findings from other early studies of lecture-based online courses."
So Harvard research confirms: people who pass courses do more stuff than people who don't.
Reich's Law of Doing Stuff
At this point, the patterns from so many courses, in so many disciplines--humanities, science, math, philosophy--all pointing in the same direction can lead to only one conclusion. Thus, I'm ready to stake my reputation on proposing a new scientific law for learning research. I propose Reich's Law of Doing Stuff: students who do stuff in a MOOC or other online learning environment will, on average, do more stuff than those who don't do stuff, and students who do stuff will perform better on stuff than those who don't do stuff. The implication for practice couldn't be more clear. When we create online courses, if we want students to learn, we need to motivate students to actually do the stuff in those courses, rather than getting bored and leaving to watch cat videos (unless the MOOC in question is something like, The Semiotics of Viral Cat Videos).
In fact, I'd suggest that this law is so well-established, that perhaps we don't need additional studies published that demonstrate this point. Perhaps this could be a corollary to Reich's law: that if a study of online learning simply shows that activity predicts further activity or effort, maybe we need not spend time writing up that study.
Two questions for further research are these: with all of the extraordinary data resources that are available to MOOC researchers, why are so many folks shrinking big data into small data? And why does it seem that one major strand of research appears to be confirming the obvious? I'll save that discussion for a future post.