The new issue of Wired has an essay I wrote on a topic I've been mulling for a few months: liberating what I call "dark data" in science, the unpublished, inconclusive, or inconvenient results that too many researchers would rather stick in a drawer than put into the light so that others may learn from their work (and perhaps build on it). Here's the gist:
In this data-intensive age, the apparent dead ends could be more important than the breakthroughs. After all, some of today's most compelling research efforts aren't one-off studies that eke out statistically significant results, they're meta-studies - studies of studies - that crunch data from dozens of sources, producing results that are much more likely to be true. What's more, your dead end may be another scientist's missing link, the elusive chunk of data they needed. Freeing up dark data could represent one of the biggest boons to research in decades, fueling advances in genetics, neuroscience, and biotech.
The essay mentions one cool project by Google they're calling (internally) the Palimpsest project, where they offer to store and distribute massive data sets - like in the petabytes - from scientists.
I've been mulling over this "free negative results" theme for a couple years, but this summer I had two separate conversations with Pat Brown and Michael Eisen, two of the three founders of PLoS. Both of them grokked the idea immediately - turns out it's the impulse that led to the creation of PLoS ('open access' being just one step along the way). But since the more ambitious, less practical notion of freeing all dark data hasn't really been stated as an explicit call to arms, I figured I may as well try to give it a shot.