Using Twitter to track flu outbreaks
When public health officials track the outbreak of a virus, like H1N1, it takes time to get the story right. They have to collect and assemble data from institutions scattered across the country, a process that can be, well, slow.
For instance, at the CDC’s FluView website, you can see statistics for influenza trends across the country. But today’s “weekly influenza report” was assembled with data from the week ending 7 May 2011. Or put another way, the latest information is already 11 days old.
It seems crazy that sometimes the information we desperately need is the most difficult to get, but it’s all too often true. You can get up-to-the-minute details on the location of your neighborhood’s taco truck, but if you want flu data, you’ll have to wait about 2 weeks.
The difference, of course, is that the food trucks have wholeheartedly embraced social media, which has quickened the pace of information flow. And as more and more people are using services like Twitter – which in 2010 was growing at a rate of 300,000 users each day – a savvy group of researchers from the University of Iowa wondered: if people are using Twitter to catalogue the minutiae of their lives, could the tweets be analyzed to better track outbreaks of the flu?
Starting in April 2009, the research team, led by Philip Polgreen, an assistant professor in the Department of Internal Medicine, began logging tweets from users living in the US and combed through the data, filtering for keywords such as flu, swine, influenza, vaccine, H1N1, and Tamiflu.
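To make the filtering step concrete, here is a minimal sketch in Python. It uses only the keywords named above; the whole-word matching rule, the function names (`is_flu_related`, `flu_share`), and the sample tweets are assumptions made for illustration, not details taken from the study.

```python
import re

# Keywords named in the article. The full list and the exact matching
# rules the researchers used are not given here, so this is an assumption.
FLU_KEYWORDS = ("flu", "swine", "influenza", "vaccine", "h1n1", "tamiflu")

# Match whole words only, so "flu" does not fire on "fluent".
_PATTERN = re.compile(r"\b(" + "|".join(FLU_KEYWORDS) + r")\b", re.IGNORECASE)


def is_flu_related(text):
    """Return True if the tweet text contains at least one keyword."""
    return _PATTERN.search(text) is not None


def flu_share(tweets):
    """Percentage of tweets that mention at least one keyword."""
    if not tweets:
        return 0.0
    hits = sum(1 for tweet in tweets if is_flu_related(tweet))
    return 100.0 * hits / len(tweets)


# Made-up tweets, for illustration only.
sample = [
    "Got my H1N1 vaccine today",
    "Taco truck is on 5th street!",
    "Feeling fluent in Spanish finally",
    "Is swine flu really that bad?",
]
print(flu_share(sample))
```

Running `flu_share` on each week's batch of tweets would give the kind of percentage-over-time curve that can be compared with case counts.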
The first thing they noticed was that the general Twitter chatter about H1N1 peaked before the outbreak surfaced (check out the figure above). The red line represents the percentage of tweets talking about the flu or flu-like illness, while the green line shows the number of confirmed or probable cases. Whether this reflects an ability of social media to "predict" an outbreak remains unclear. But one thing's certain: people were aware of the storm that was brewing.
According to the study, in early May 2009, the CDC released targeted messaging to consumers about the importance of flu prevention. So when the team searched through the Twitter data for specific phrases like "mask" or "hand hygiene" they were able to gauge how prevention strategies were rippling through the virtual community. [Notice the two distinct peaks in Twitter traffic for "mask" (green line) and "hand hygiene" (red line) in the figure above.]
Tracking Twitter chatter about certain keywords, however interesting it may be, doesn't address the larger problem: How many people are infected with the flu at this very point in time? So the team devised a complicated statistical model to estimate the number of people infected with the flu based on Twitter status updates. And surprisingly, when they compared their numbers (red line in figure below) to the count generated by the CDC (green line), they discovered the two were indistinguishable. What's more, the Twitter-based estimates would have been in hand a lot sooner than the CDC's.
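The model itself isn't described here, so the sketch below is only a simplified stand-in, not the researchers' method. It fits a straight line by ordinary least squares, relating the weekly share of flu-related tweets to an officially reported flu rate, then uses that line to estimate a week whose official number hasn't been published yet. The function name `fit_line` and every number in it are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented weekly values, NOT data from the study:
# percentage of tweets mentioning flu keywords, and the reported
# flu rate for the same weeks.
tweet_share = [1.0, 2.0, 3.0, 4.0]
reported_rate = [2.0, 4.0, 5.0, 8.0]

slope, intercept = fit_line(tweet_share, reported_rate)

# Estimate the current week from its tweet share alone,
# before any official figure is available.
current_share = 3.5
estimate = slope * current_share + intercept
print(f"estimated rate this week: {estimate:.2f}")
```

The point of the sketch is the timing: once a relationship has been fitted on past weeks, the tweet share for the current week is available right away, while the official count is not.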
The authors acknowledge that their model needs to be validated by others. So consider this finding exactly what it is, a solid first step in a lengthy journey.
Citation: Signorini A, Segre AM, Polgreen PM (2011) The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic. PLoS ONE 6(5): e19467. doi:10.1371/journal.pone.0019467
Brian Mossop is currently the Community Editor at Wired, where he works across the brand, both magazine and website, to build and maintain strong social communities. Brian received a BS in Electrical Engineering from Lafayette College, and a PhD in Biomedical Engineering from Duke University in 2006. His postdoctoral work was in neuroscience at UCSF and Genentech.
Brian has written about science for Wired, Scientific American, Slate, Scientific American MIND, and elsewhere. He primarily covers topics on neuroscience, development, behavior change, and health.