
What the Internet Can Tell Us About Flu Season

BY Miranda Neubauer | Friday, February 1 2013

In the past few weeks, if you had a stuffy nose, felt a fever coming on or were experiencing a bad headache, it is possible that you took to Google to look up information for "flu like symptoms." In fact, a recent Pew study found that 77 percent of Internet searchers in the U.S. start their online search for health information with a search engine. A review of Google Trends queries in the Health category for the past 90 days shows rises in terms like flu symptoms, pneumonia, bronchitis and RSV (respiratory syncytial virus). But in entering those queries, Internet searchers are doing more than confirming for themselves whether they have the flu. They are also part of a new kind of public health experiment that might become increasingly useful abroad, in countries where access to the Internet is improving but access to health care is slower to arrive.

Google Flu Trends data has provided good estimates in the past about how bad a flu season really was, although its model — which tracks search queries to guess if the user has the flu — was never meant to replace statistics from the Centers for Disease Control, which tracks the number of people who show up at their doctor's office to receive treatment for the flu. As of early January, the CDC was predicting a flu prevalence of just under five percent. Google was predicting a prevalence of 10 percent — sparking that worst-flu-season-ever talk that's got everyone so concerned. Writing for Slate, Will Oremus observes that Google's Flu Trends data isn't just often right — it's right well before the CDC data is ready to share. But through January so far, it looks like this is the year Google's algorithms are going to be a little off — and it's a little too soon to use search data to decide whether it's time to break out the hazmat suit.

Counting people who show up at the doctor because they're sick is pretty simple, even if it takes more time for the numbers to come in. Mining search data, on the other hand, can be complicated. In September 2009, as swine flu fears swept the world, Google happened to be changing the model it used to predict prevalence of flu. Evaluating the new model against the old one and against the CDC's methodology, Google found that the old way it had been doing things would not have correctly predicted the early days of the swine flu outbreak.

A research paper, authored by Google employees and a CDC employee, explained the reasons in detail. Evidence suggested people were seeking out medical care more readily, which may have affected the CDC's numbers. Meanwhile, people had also changed their search behavior. And Google's new flu-prediction model looked for fewer words related to complications from the flu, but more words related to symptoms. While the original model performed well only during the second wave of that outbreak, the new one did well both times.

The paper notes that in the early stages of the swine flu, "the proportion of outpatient visits due to ILI captured in ILINet was slightly elevated (61%) compared with Wave 2 (43%), due to ill persons more readily seeking health care for relatively mild illness during the first weeks of pH1N1." The paper also notes that the researchers excluded a few weeks during that outbreak because of "tremendous media attention."

So "how bad is flu season" has different answers depending on how you count the numbers, and both models leave room for error. This adds uncertainty to the fear Slate's Will Oremus captured when he wrote:

"[T]he really ominous chart is the one that shows the trend line for the nation as a whole. It roughly agrees with the CDC that flu activity in December was about in line with the 'moderately severe' peak in 2007-2008. But if Google is right, the CDC's snapshot came just as the outbreak was gaining steam. Since mid-December, the trend line has rocketed past that of all previous years and now towers over that of the October 2009 H1N1 pinnacle, suggesting a CDC outpatient surveillance figure of an unprecedented 8.9 percent."

All of this has people asking another question: How accurate is Google Flu Trends? On Quora, MIT computer science graduate student Keith Winstein writes that rather than predicting an epidemic the CDC might not have seen coming, it looks more like Google's model is broken again:

At this point, it appears likely that Google Flu Trends has considerably overstated this year's flu activity in the U.S. But we won't be able to draw a firmer conclusion until after the flu season has ended. I don't know why the model broke down this year but am eager to learn, when and if Google comes to a similar conclusion. For now, I suspect this episode may provide a cautionary tale about the limits of inference from "big data" and the perils of overconfidence in a sophisticated and seemingly-omniscient statistical model.

Matt Mohebbi, a Google engineer who works on Google Flu, says that it is much too early to draw any conclusion about the accuracy of Google Flu and how it reflects this year's flu season. He notes that the CDC in general is one or two weeks behind Google with its data. There are additional delays at this time because the early season coincided with the holidays, and some reporting sites have been experiencing longer than normal delays, likely due to the large number of cases. In addition, the CDC's data is often adjusted retroactively.

Mohebbi added, though, that New York City, which he said has one of the best electronic surveillance systems, was reporting through January 15 about a four to five percent increase in cases over the number in September, which he said corresponded with what Google was seeing.

And Google Flu Trends was supposed to be complementary to the CDC's data, Mohebbi said, not a predictive tool to replace it.

But some research indicates that Google Flu Trends could make a significant contribution to epidemic detection. Umair Saif is a Pakistani computer scientist who is an associate professor at the Lahore University of Management Sciences and has also taught at the Computer Science and Artificial Intelligence Laboratory at MIT. Saif leads the Dritte initiative, which is focused on using technology to aid the developing world. Last year, he helped author a research paper on how Google Flu Trends could contribute to early epidemic detection, and his team also built a system to implement that research, called FluBreaks.

"Our analysis showed that adding a layer of computational intelligence to Google Flu Trends data provides the opportunity for a reliable early epidemic detection system that can predict disease outbreaks in advance of the existing systems used by the CDC," he wrote in an e-mail to techPresident. "We present an early investigation of algorithms to translate data from services such as Google Flu Trends into a fully automated system for generating alerts when the likelihood of epidemics is quite high."

FluBreaks translates Google search query volume into epidemic alerts, Saif told techPresident. The result, he says, is "a near real-time alternative to conventional disease surveillance networks."
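To make the idea of turning query volume into alerts concrete, here is a minimal sketch in Python. It is not the FluBreaks algorithm, which the article does not describe; it is a generic baseline-and-threshold rule chosen only for illustration. The function name `epidemic_alerts`, the `window` and `k` parameters, and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def epidemic_alerts(weekly_values, window=4, k=2.0):
    """Return the indices of weeks that look unusually high.

    A week is flagged when its value exceeds the mean of the preceding
    `window` weeks by more than `k` sample standard deviations.
    This is an illustrative rule, not the method used by FluBreaks.
    """
    alerts = []
    for i in range(window, len(weekly_values)):
        baseline = weekly_values[i - window:i]
        threshold = mean(baseline) + k * stdev(baseline)
        if weekly_values[i] > threshold:
            alerts.append(i)
    return alerts


# Made-up weekly flu-activity estimates, NOT real Google Flu Trends data.
sample = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 2.5, 3.0]
print(epidemic_alerts(sample))  # prints [6, 7]
```

A rule this simple also hints at why the raw data needs further processing: a burst of media attention that drives searches would trip the same threshold as a real outbreak.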

But Saif also explained that further work was necessary to fully realize the potential of Google Flu Trends in this area. First, raw search data needs to be put through some algorithmic paces before health professionals might be able to use it in decision-making, he said, but the possibility is there.

"Second, there is also a need to develop a more detailed appreciation of how changes in population size and Internet penetration affect the ability of a system based on Google Flu Trends data to provide accurate and actionable information," he wrote.

And there are other approaches as well. Voice of America recently reported on an effort led by Boston Children's Hospital epidemiologist John Brownstein to show the prevalence of the flu by pinning it on a Flu Near You map. And a new study from Brigham Young University looks at how Twitter could help track the flu.

While no one's saying it's time to stop washing your hands, taking one look at Google and saying it's swine flu all over again is an overstatement.
