Liberating Big Data

Apparently some certification standards for aircraft maintenance have not been updated in a long time: a few of them still refer to wooden airframes. I learned that at one of the Department of Education's DC to VC events (hosted by University Ventures).

That is one image of government technology: the wood-framed, canvas-covered biplane in the age of--but wait! My analogy is breaking apart and crashing! Mayday! Mayday! Doesn't the government also work with defense contractors to build the world's speediest, most advanced airplanes and spacecraft?

The biggest technological innovations of the twentieth century came out of close collaborations among government agencies, universities, and private enterprise. Together, DARPA (the Defense Advanced Research Projects Agency), universities including UCLA, the University of Michigan, and MIT, and commercial entities like Bell Laboratories, SRI, RAND, and BBN Technologies developed the internet (an early version was called ARPANET).

Another interesting example of the government helping to create a commercially valuable platform is the Global Positioning System, which grew out of work by two Johns Hopkins physicists but was built, and is still run, by the Department of Defense. At the DC to VC event I attended, Todd Park, Chief Technology Officer of the United States, cited an estimate that GPS adds $100 billion a year to our GDP. That figure may not be entirely trustworthy, since it comes from a consultant hired to assess the economic damage that would result if the LightSquared network interfered with GPS. But there is no question that GPS technology is generating all sorts of unexpected benefits.

Todd Park's mission as CTO of the United States is to help create more data platforms that are similarly fruitful. Todd co-founded the healthcare technology innovators athenahealth and Castlight Health, and he spends a lot of time thinking about how to open up the government's trove of healthcare statistics for new uses. But he also wants to do the same for the government's hoards of education data, and that turns out to be trickier than one might think.

Todd and the Office of Educational Technology are quite sophisticated about privacy and about preventing leaks of personally identifiable information. They also understand how to create rich, useful application programming interfaces (APIs). The problem is the underlying data itself. As Daniel Pianko of University Ventures pointed out, there is no consensus even on such basic questions as what counts as retention or degree completion, and there is far more debate over how to measure job placements and educational outcomes. Perhaps opening up more of the government's data will push it to become more specific and more useful. And if enough valuable tools and services are built on it, perhaps the Department of Education and other institutions will be motivated to collect more data points.

One of the most interesting educational data initiatives is MyData, a joint effort of the White House Office of Science and Technology Policy and the Office of Educational Technology. MyData will give college students access to their own data in a machine-readable format, including not only their student loan and financial aid information but also, with at least one vendor, records from a student information system. The idea is that students will own their data in a permanent, portable form. The initiative may or may not gain traction in the next few years, but it is certainly worth trying, and I am grateful for the energy and enthusiasm the government has brought to it.

