Science as a Search Function

The ideal combination of the internet and science is to put data (lots of data) into an easily searched database. The hypothesis becomes a search: Want to know which people are prone to which disease? Want to know which genetic backgrounds compel certain behaviors? Run a search. That's the ideal - and though this idea has been hashed about for a few years, we're a long way from there now. There is no such database out there: no ongoing research project that combines enough data, enough flexibility in interrogation, and enough variety in populations to be meaningfully searched.

Until, perhaps, today. Or tomorrow.

On Thursday, 23andMe will announce a project with the Michael J. Fox Foundation and the Parkinson's Institute and Clinical Center that will enroll up to 10,000 people with Parkinson's Disease, get them genotyped using the 23andMe platform, and then issue them a survey to fill in the phenotypic information - the physical "end result" that corresponds to our genotypic information - to make for a rich and fruitful database. This database will then be ripe for interrogation, and with members' permission, it will be searched and probed for possible correlations and implications. 10,000 is a big number for Parkinson's Disease, and such a large population should, it's hoped, yield a database rich enough to support meaningful conclusions. (Linda Avey, 23andMe co-founder, describes the effort here.)

The idea is to create a genetic database around one disease - Parkinson's - that is rich and robust and big enough to answer lots of questions that get into minutiae: If this isn't a "genetic disease" (i.e., typically brought on purely by genetic cause), then what fraction of the effect is caused by our genomes? It's a complicated question, but if we're really going to get to an understanding and application of predictive medicine, then we need to understand all the possible inputs that ramify into a multitude of outputs.

This is, in effect, the thing that's been talked about for about five years now: Googling our Genomes. And it's not surprising that behind this landmark endeavor is Google, or at least its co-founder, Sergey Brin. Brin, who has said he has a genetic risk for Parkinson's, has funded the study and promised to give 10,000 people their genomic data for just $25 each (the 23andMe service is regularly about $400). So that's roughly a $4 million commitment at the going rate of genotyping - and that's not counting the backend research that will take place. Brin will also be a participant in the study.

I don't doubt that many folks will be skeptical of this endeavor. (Valleywag and Steve Murphy, start the race to be the first to squat on this with some snide remark, starting... now!) And sure, I acknowledge that 23andMe is good at publicity. But I think there's a fundamental shift in science here that should not go unnoted.

This is an endeavor to create an apparatus for answers: answers not just to the questions we have today, but to the questions science will have 10 years from now. This is, indeed, science by search. And I'm thrilled that somebody's actually trying to make it happen. And here's what's especially cool: 23andMe co-founders Anne Wojcicki and Linda Avey know this is what they're doing. They talk about this as a "research platform," and they have a host of diseases that they want to hit next: autism, Alzheimer's, and so on. Expect more this year.

Again, this isn't a research study - this is a new way to study research. Make the costs of research recruitment (finding populations to study, recruiting individuals, taking samples, and repeating every time you have a new question to ask) a backend function, and prioritize the questions, not the apparatus. It's how to do science in the petabyte age.

I'll add some more tomorrow, including some remarks from Wojcicki and Avey.