Pathogens, Pathogens, and More Pathogens!

In more database news, a neat new project out of the University of Maryland was announced in PLoS Computational Biology today. Called Insignia, it is, in their own words:

a project to create and store information about a broad range of microbial pathogens, focusing primarily on bacteria and viruses. There are two major components to the system:

- A computational pipeline to generate unique DNA signatures for any and all pathogens in our database
- An integrated database containing genome sequences, phenotypic information, and comparative genomic analysis of all pathogens

They've got over 3,000 organisms in the database (though it's not clear whether there are full sequences for all of these). On the face of it, I'm not sure how this differs from the National Microbial Pathogen Data Resource (NMPDR), which NIAID set up in 2004. Both seem to be positioned as potential one-stop shops for the new generation of diagnostics: tools that will be able to quickly scan samples for snippets of sequence and deliver a rapid diagnosis.

I've pinged the lead researcher with a couple of questions, and will follow up when I hear from him.

UPDATE: So here's what Adam Phillippy from the Univ. of Maryland's Center for Bioinformatics & Computational Biology (rolls off the tongue...) adds:

Hi Thomas,

There are many great resources out there for hosting pathogen metadata, but Insignia is the only one I am aware of dedicated to signature design. NMPDR offers a “signature genes” service that appears similar to Insignia, but only identifies unique genes and is not as well suited for assay design. For instance, Insignia can design probes for differentiating between two nearly identical strains of bacteria, whereas the NMPDR system would be unable to do so. Like you mentioned in your blog, we aim to be a one-stop-shop for diagnostic assay design. An assay designer could draw sequences from our database and use them directly in their diagnostic. That is exactly how we designed our Vibrio cholerae assays, and they proved to be quite accurate.

Insignia is definitely a work in progress. We are only a year into what is planned to be a three-year development effort. We do encourage others to contribute sequences to our database, and to the entire project as well. We freely release all of our code and experiments.

Thanks Adam.

As I said, very cool stuff.