
What the Internet Can Tell Us About Flu Season

BY Miranda Neubauer | Friday, February 1 2013

In the past few weeks, if you had a stuffy nose, felt a fever coming on or were experiencing a bad headache, you may well have taken to Google to look up information on "flu-like symptoms." In fact, a recent Pew study found that 77 percent of Internet searchers in the U.S. start their online search for health information with a search engine. A review of Google Trends queries in the Health category for the past 90 days shows rises in terms like flu symptoms, pneumonia, bronchitis and RSV (respiratory syncytial virus). But in entering those queries, Internet searchers are doing more than checking whether they have the flu. They are also part of a new kind of public health experiment that might become increasingly useful abroad, in countries where access to the Internet is improving but access to health care is slower to arrive.

Google Flu Trends data has provided good estimates in the past about how bad a flu season really was, although its model — which tracks search queries to guess if the user has the flu — was never meant to replace statistics from the Centers for Disease Control, which tracks the number of people who show up at their doctor's office to receive treatment for the flu. As of early January, the CDC was predicting a flu prevalence of just under five percent. Google was predicting a prevalence of 10 percent — sparking that worst-flu-season-ever talk that's got everyone so concerned. Writing for Slate, Will Oremus observes that Google's Flu Trends data isn't just often right — it's right well before the CDC data is ready to share. But through January so far, it looks like this is the year Google's algorithms are going to be a little off — and it's a little too soon to use search data to decide whether it's time to break out the hazmat suit.

Counting people who show up at the doctor because they're sick is pretty simple, even if it takes more time for the numbers to come in. Mining search data, on the other hand, can be complicated. In September 2009, as swine flu fears swept the world, Google happened to be changing the model it used to predict prevalence of flu. Evaluating the new model against the old one and against the CDC's methodology, Google found that the old way it had been doing things would not have correctly predicted the early days of the swine flu outbreak.

A research paper, authored by Google employees and a CDC employee, explained the reasons in detail. Evidence suggested people were seeking out medical care more readily, which may have affected the CDC's numbers. Meanwhile, they had also changed their search behavior. And Google's new flu-prediction model looked for fewer words related to complications from the flu, but more words related to symptoms. While the original model performed well during the second wave of that outbreak, the new one did well both times.

The paper notes that in the early stages of the swine flu, "the proportion of outpatient visits due to ILI captured in ILINet was slightly elevated (61%) compared with Wave 2 (43%), due to ill persons more readily seeking health care for relatively mild illness during the first weeks of pH1N1." The paper also notes that the researchers excluded a few weeks during that outbreak because of "tremendous media attention."

So "how bad is flu season" has different answers depending on how you count the numbers, and both models leave room for error. This adds uncertainty to the fear Slate's Will Oremus captured when he wrote:

"[T]he really ominous chart is the one that shows the trend line for the nation as a whole. It roughly agrees with the CDC that flu activity in December was about in line with the 'moderately severe' peak in 2007-2008. But if Google is right, the CDC's snapshot came just as the outbreak was gaining steam. Since mid-December, the trend line has rocketed past that of all previous years and now towers over that of the October 2009 H1N1 pinnacle, suggesting a CDC outpatient surveillance figure of an unprecedented 8.9 percent."

All of this has people asking another question: How accurate is Google Flu Trends? On Quora, MIT computer science graduate student Keith Winstein writes that rather than predicting an epidemic the CDC might not have seen coming, it looks more like Google's model is broken again:

At this point, it appears likely that Google Flu Trends has considerably overstated this year's flu activity in the U.S. But we won't be able to draw a firmer conclusion until after the flu season has ended. I don't know why the model broke down this year but am eager to learn, when and if Google comes to a similar conclusion. For now, I suspect this episode may provide a cautionary tale about the limits of inference from "big data" and the perils of overconfidence in a sophisticated and seemingly-omniscient statistical model.

Matt Mohebbi, a Google engineer who works on Google Flu, says that it is much too early to draw any conclusion about the accuracy of Google Flu and how it reflects this year's flu season. He notes that the CDC's data generally runs one or two weeks behind Google's, that there are additional delays at this time because the early season coincided with the holidays, and that some reporting sites have been experiencing longer-than-normal delays, likely due to the large number of cases. In addition, the CDC's data is often adjusted retroactively.

Mohebbi added, though, that data through January 15 from New York City, which he said has one of the best electronic surveillance systems, showed about a four to five percent increase in cases over the number in September, which he said corresponded with what Google was seeing.

And Google Flu Trends was supposed to be complementary to the CDC's data, Mohebbi said, not a predictive tool to replace it.

But some research indicates that Google Flu Trends could make a significant contribution to epidemic detection. Umair Saif is a Pakistani computer scientist who is an associate professor at the Lahore University of Management Sciences and has also taught at the Computer Science and Artificial Intelligence Laboratory at MIT. Saif leads the Dritte initiative, which is focused on using technology to aid the developing world. Last year, he helped author a research paper on how Google Flu Trends could contribute to early epidemic detection, and his team also built a system to implement that research, called FluBreaks.

"Our analysis showed that adding a layer of computational intelligence to Google Flu Trends data provides the opportunity for a reliable early epidemic detection system that can predict disease outbreaks in advance of the existing systems used by the CDC," he wrote in an e-mail to techPresident. "We present an early investigation of algorithms to translate data from services such as Google Flu Trends into a fully automated system for generating alerts when the likelihood of epidemics is quite high."

FluBreaks translates Google search query volume into epidemic alerts, Saif told techPresident. The result, he says, is "a near real-time alternative to conventional disease surveillance networks."
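To make the idea of turning query volume into alerts concrete, here is a minimal sketch of one generic approach: flag any week whose search volume rises well above a trailing baseline. This is not the FluBreaks algorithm, which the article does not describe. The function name, the four-week window, the two-standard-deviation threshold and the weekly numbers are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_alerts(weekly_volume, baseline_weeks=4, k=2.0):
    """Return indices of weeks whose volume exceeds the trailing baseline.

    A week is flagged when its value is greater than the mean of the
    preceding `baseline_weeks` values plus `k` sample standard deviations.
    Both parameters are illustrative assumptions, not published settings.
    """
    alerts = []
    for i in range(baseline_weeks, len(weekly_volume)):
        window = weekly_volume[i - baseline_weeks:i]
        threshold = mean(window) + k * stdev(window)
        if weekly_volume[i] > threshold:
            alerts.append(i)
    return alerts


# Made-up weekly query counts: a flat stretch followed by a sharp rise.
volume = [100, 102, 98, 101, 99, 103, 150, 210, 260]
print(flag_alerts(volume))
```

A rule this simple would also fire on a burst of media-driven searches, which is one reason Saif stresses that raw search data needs more algorithmic work before health professionals can rely on it.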

But Saif also explained that further work was necessary to fully realize the potential of Google Flu Trends in this area. First, raw search data needs to be put through some algorithmic paces before health professionals might be able to use it in decision-making, he said, but the possibility is there.

"Second, there is also a need to develop a more detailed appreciation of how changes in population size and Internet penetration affect the ability of a system based on Google Flu Trends data to provide accurate and actionable information," he wrote.

And there are other approaches as well. Voice of America recently reported on an effort led by Boston Children's Hospital epidemiologist John Brownstein to chart the prevalence of the flu on a Flu Near You map. And a new study from Brigham Young University looks at how Twitter could help track the flu.

While no one's saying it's time to stop washing your hands, taking one look at Google and saying it's swine flu all over again is an overstatement.
