What Would You Do With Years of Online Discussion Data?
Back in April, after my AERA talks on "Are Great Wikis Born or Made?" (born) and "Do People Actually Collaborate on Wikis?" (not much), Emily Schneider invited me out to coffee. She's a first-year doctoral student at Stanford (the hope of the profession!) and she has some cool research plans ahead about the role of technology and learning in for-profit colleges.
Emily and I were talking about one project she has coming up. She has access to a huge dataset from a for-profit college which includes student outcomes (graduation rates, annual re-enrollment, course completion), student demographic information, and transcripts from online discussion boards.
So we started brainstorming: what could you do with the online transcripts that could teach you something about improving outcomes? How would you go about identifying practices in online learning environments that predicted better outcomes for students? And if you found those practices, could you understand them with enough granularity to make actionable suggestions for educators?
On the one hand, it seems like such an incredible treasure trove of data. On the other hand, it may be that college persistence has much less to do with what happens online and much more to do with all of the other factors in a young person's life.
But if I had to make a bet on what would make a difference (and force Emily to spend the next three years of her life figuring out if I'm right), I'd bet on "social presence." Garrison and Anderson theorize three important kinds of "presences" in online discussions: cognitive presence, teaching presence, and social presence. My hunch is that if any academic factors predict a young person's persistence in an online degree program, it is the degree to which he or she feels connected to the other learners.
How would one study that? Use the research on social presence to design an observation rubric to measure it in online forums. Take a random sample of discussion boards and rate each one on that rubric's social presence scale. Use that coding to train a natural language processing program (basically, an automated essay score predictor) to code the rest of the discussion boards. Then, set up a longitudinal model, where you track students and see if taking a class with high social presence predicts course completion, annual re-enrollment, and degree completion, controlling for student demographics.
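To make the "hand-code a sample, then let a program code the rest" step a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the four example posts, the two-level "high"/"low" rubric, and the names NaiveBayesCoder and course_presence_score are all made up, and a tiny naive Bayes classifier stands in for whatever natural language processing tool a real study would use. It only covers the coding step; the longitudinal model with demographic controls would come afterward, using scores like these as a predictor.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase a post and split it into words, dropping edge punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayesCoder:
    """Toy multinomial naive Bayes with add-one smoothing (hypothetical helper)."""

    def fit(self, posts, ratings):
        self.word_counts = defaultdict(Counter)
        self.rating_counts = Counter(ratings)
        self.vocab = set()
        for post, rating in zip(posts, ratings):
            tokens = tokenize(post)
            self.word_counts[rating].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, post):
        tokens = tokenize(post)
        total = sum(self.rating_counts.values())
        best_rating, best_score = None, -math.inf
        for rating in sorted(self.rating_counts):
            # Log prior for this rating plus smoothed log likelihood of each word.
            score = math.log(self.rating_counts[rating] / total)
            denom = sum(self.word_counts[rating].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[rating][tok] + 1) / denom)
            if score > best_score:
                best_rating, best_score = rating, score
        return best_rating

def course_presence_score(posts, coder):
    """Share of a course's posts that the coder rates as high social presence."""
    if not posts:
        return 0.0
    high = sum(1 for p in posts if coder.predict(p) == "high")
    return high / len(posts)

# Hypothetical hand-coded sample: posts a human rater scored with the rubric.
coded_posts = [
    "Thanks Maria, great point! I agree with you and had the same experience.",
    "Hi everyone, thanks for sharing, I really appreciate your help.",
    "The answer to question 3 is 42.",
    "Chapter 5 covers supply and demand curves.",
]
coded_ratings = ["high", "high", "low", "low"]

coder = NaiveBayesCoder().fit(coded_posts, coded_ratings)

# Posts from a course nobody has hand-coded yet (also made up).
uncoded_course = ["Thanks for your help, I agree!", "Question 5 covers demand."]
score = course_presence_score(uncoded_course, coder)
```

With real data you would want far more than four coded posts, a check of the program's ratings against a second human rater, and a score per student or per course section that can be merged with the outcome and demographic records.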
If that works, and you can do it cheaply, try Garrison and Anderson's other two presences and see what you find!
What ideas do you have? Leave them in the comments for Emily!