
What the Internet Can Tell Us About Flu Season

BY Miranda Neubauer | Friday, February 1 2013

In the past few weeks, if you had a stuffy nose, felt a fever coming on or were experiencing a bad headache, it is possible that you took to Google to look up information on "flu like symptoms." In fact, a recent Pew study found that 77 percent of Internet searchers in the U.S. start their online search for health information with a search engine. A review of Google Trends queries in the Health category for the past 90 days shows rises in terms like flu symptoms, pneumonia, bronchitis and RSV (respiratory syncytial virus). But in entering those queries, Internet searchers are doing more than checking for themselves whether they have the flu. They are also part of a new kind of public health experiment that might become increasingly useful abroad, in countries where access to the Internet is improving but access to health care is slower to arrive.

Google Flu Trends data has provided good estimates in the past about how bad a flu season really was, although its model — which tracks search queries to guess if the user has the flu — was never meant to replace statistics from the Centers for Disease Control, which tracks the number of people who show up at their doctor's office to receive treatment for the flu. As of early January, the CDC was predicting a flu prevalence of just under five percent. Google was predicting a prevalence of 10 percent — sparking that worst-flu-season-ever talk that's got everyone so concerned. Writing for Slate, Will Oremus observes that Google's Flu Trends data isn't just often right — it's right well before the CDC data is ready to share. But through January so far, it looks like this is the year Google's algorithms are going to be a little off — and it's a little too soon to use search data to decide whether it's time to break out the hazmat suit.

Counting people who show up at the doctor because they're sick is pretty simple, even if it takes more time for the numbers to come in. Mining search data, on the other hand, can be complicated. In September 2009, as swine flu fears swept the world, Google happened to be changing the model it used to predict prevalence of flu. Evaluating the new model against the old one and against the CDC's methodology, Google found that the old way it had been doing things would not have correctly predicted the early days of the swine flu outbreak.

A research paper, authored by Google employees and a CDC employee, explained the reasons in detail. Evidence suggested people were seeking out medical care more readily, which may have affected the CDC's numbers. Meanwhile, people had also changed their search behavior. And Google's new flu-prediction model looked for fewer words related to complications from the flu, but more words related to symptoms. While the original model performed well only during the second wave of that outbreak, the new one did well both times.

The paper notes that in the early stages of the swine flu, "the proportion of outpatient visits due to ILI captured in ILINet was slightly elevated (61%) compared with Wave 2 (43%), due to ill persons more readily seeking health care for relatively mild illness during the first weeks of pH1N1." The paper also notes that the researchers excluded a few weeks during that outbreak because of "tremendous media attention."

So "how bad is flu season" has different answers depending on how you count the numbers, and both models leave room for error. This adds uncertainty to the fear Slate's Will Oremus captured when he wrote:

"[T]he really ominous chart is the one that shows the trend line for the nation as a whole. It roughly agrees with the CDC that flu activity in December was about in line with the 'moderately severe' peak in 2007-2008. But if Google is right, the CDC's snapshot came just as the outbreak was gaining steam. Since mid-December, the trend line has rocketed past that of all previous years and now towers over that of the October 2009 H1N1 pinnacle, suggesting a CDC outpatient surveillance figure of an unprecedented 8.9 percent."

All of this has people asking another question: How accurate is Google Flu Trends? On Quora, MIT computer science graduate student Keith Winstein writes that rather than predicting an epidemic the CDC might not have seen coming, it looks more like Google's model is broken again:

At this point, it appears likely that Google Flu Trends has considerably overstated this year's flu activity in the U.S. But we won't be able to draw a firmer conclusion until after the flu season has ended. I don't know why the model broke down this year but am eager to learn, when and if Google comes to a similar conclusion. For now, I suspect this episode may provide a cautionary tale about the limits of inference from "big data" and the perils of overconfidence in a sophisticated and seemingly-omniscient statistical model.

Matt Mohebbi, a Google engineer who works on Google Flu, says that it is much too early to draw any conclusion about the accuracy of Google Flu and how it reflects this year's flu season. He notes that the CDC in general is one or two weeks behind Google with its data, that there are additional delays at this time since the early season coincided with the holidays, and that some reporting sites have been experiencing longer than normal delays, likely due to the large number of cases. The CDC's data is also often adjusted retroactively.

Mohebbi added, though, that data through January 15 from New York City, which he said has one of the best electronic surveillance systems, showed about a four to five percent increase in cases over the number in September, which he said corresponded with what Google was seeing.

And Google Flu Trends was supposed to be complementary to the CDC's data, Mohebbi said, not a predictive tool to replace it.

But some research indicates that Google Flu Trends could make a significant contribution to epidemic detection. Umair Saif is a Pakistani computer scientist who is an associate professor at the Lahore University of Management Sciences and has also taught at the Computer Science and Artificial Intelligence Laboratory at MIT. Saif leads the Dritte initiative, which is focused on using technology to aid the developing world. Last year, he helped author a research paper on how Google Flu Trends could contribute to early epidemic detection, and his team also built a system to implement that research, called FluBreaks.

"Our analysis showed that adding a layer of computational intelligence to Google Flu Trends data provides the opportunity for a reliable early epidemic detection system that can predict disease outbreaks in advance of the existing systems used by the CDC," he wrote in an e-mail to techPresident. "We present an early investigation of algorithms to translate data from services such as Google Flu Trends into a fully automated system for generating alerts when the likelihood of epidemics is quite high."

FluBreaks translates Google search query volume into epidemic alerts, Saif told techPresident. The result, he says, is "a near real-time alternative to conventional disease surveillance networks."
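To make the idea of turning query volume into alerts concrete, here is a minimal sketch of one simple approach: flag any week whose search volume rises well above its recent baseline. This is not FluBreaks' actual algorithm, which the article does not describe. The function name `flag_alerts`, its parameters and the sample numbers are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def flag_alerts(weekly_volume, baseline_weeks=8, threshold=2.0):
    """Return the indices of weeks that look unusually high.

    A week is flagged when its volume exceeds the mean of the preceding
    `baseline_weeks` weeks by more than `threshold` standard deviations.
    Hypothetical helper, not part of Google Flu Trends or FluBreaks.
    """
    alerts = []
    for i in range(baseline_weeks, len(weekly_volume)):
        baseline = weekly_volume[i - baseline_weeks:i]
        mu = mean(baseline)
        sigma = stdev(baseline)  # sample standard deviation
        # A perfectly flat baseline gives sigma == 0; skip it rather
        # than flag every tiny change as an outbreak.
        if sigma > 0 and weekly_volume[i] > mu + threshold * sigma:
            alerts.append(i)
    return alerts


# Made-up weekly search volumes: a quiet stretch followed by a spike.
volumes = [100, 104, 98, 101, 99, 103, 97, 102, 100, 180, 260]
print(flag_alerts(volumes))
```

With these made-up numbers, the sketch flags the last two weeks, when volume jumps far above the quiet baseline. A real system would also have to account for the problems described in this article, such as media attention driving searches by people who are not sick.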

But Saif also explained that further work was necessary to fully realize the potential of Google Flu Trends in this area. First, raw search data needs to be put through some algorithmic paces before health professionals might be able to use it in decision-making, he said, but the possibility is there.

"Second, there is also a need to develop a more detailed appreciation of how changes in population size and Internet penetration affect the ability of a system based on Google Flu Trends data to provide accurate and actionable information," he wrote.

And there are other approaches as well. Voice of America recently reported on an effort led by Boston Children's Hospital epidemiologist John Brownstein to show the prevalence of the flu by pinning it on a Flu Near You map. And a new study from Brigham Young University looks at how Twitter could help track the flu.

While no one's saying it's time to stop washing your hands, taking one look at Google and saying it's swine flu all over again is an overstatement.
