Politico-Facebook Sentiment Analysis Will Generate "Bogus" Results, Expert Says

BY Micah L. Sifry | Friday, January 13 2012

Thursday morning, Politico announced that it was joining with Facebook to "measure GOP candidate buzz" and give its readers an "exclusive look at the conversation taking place on the social networking site" ahead of the January 21 South Carolina primary. "Mitt, Paul winning Facebook primary" was the headline on their first story on the project.

"Social media has forever changed the way candidates campaign for the presidency," said John F. Harris, editor-in-chief of Politico, in a press release about the new partnership. He added, "Facebook has been instrumental in expanding the political dialogue among voters and we couldn't be more excited about the opportunity to offer our readers a look inside this very telling conversation."

At the core of the partnership are two projects. One is completely uncontroversial: Politico's editorial team is going to survey voting-age Facebook users on a daily basis and report the results each day. Presumably these will be randomized according to typical polling methodologies. The other one is completely "bogus," as one expert put it to me: Facebook is analyzing its users' status updates, postings and comments that refer to the candidates, and assigning positive and negative values to them, producing a daily track of their supposed ups and downs.

It's called "sentiment analysis." It's the heart of the pretty charts and graphs that the two companies rolled out to tout their partnership. And it's total bunk.

Here's the issue: Counting the number of times a candidate's name is mentioned on social media and noting what words appear alongside those mentions can illuminate broad trends. You can report that "more people talked about Candidate X today" and "Y percent of that group used word ZZZZ in their comment." But you can't make any kind of meaningful judgment about what those people intended by that usage without asking them.
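
To make that distinction concrete, here is a minimal sketch of the kind of counting that can be defended. It is my own illustration, not Facebook's or Politico's code; the function name `mention_stats` and the sample posts are invented for the example.

```python
import re

def mention_stats(posts, candidate, word):
    """Return (posts mentioning the candidate, share of those that also use `word`)."""
    cand = candidate.lower()
    target = word.lower()
    # Tokenize each post into lowercase words.
    tokenized = [re.findall(r"[a-z']+", p.lower()) for p in posts]
    mentioning = [words for words in tokenized if cand in words]
    if not mentioning:
        return 0, 0.0
    with_word = sum(1 for words in mentioning if target in words)
    return len(mentioning), with_word / len(mentioning)

# Hypothetical posts, invented for illustration only.
posts = [
    "Romney heads to South Carolina",
    "Gingrich is attacking Romney again",
    "Ron Paul rally tonight",
    "Attacking Romney over Bain",
]
count, share = mention_stats(posts, "Romney", "attacking")
print(f"{count} posts mention Romney; {share:.0%} of them use 'attacking'")
```

The output is a volume and a percentage, nothing more. It says nothing about whether the people who wrote "attacking" approve or disapprove of the attack.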

Someone who writes "I'm so happy that Newt Gingrich is staying in the race" might be a genuine Gingrich fan, or they might be someone who hates him, but likes that he's staying in the race because he's entertaining, or because they think he's hurting the Republican field. But "sentiment analysis" is still such an embryonic field that serious researchers tend to avoid any hard claims about whether such a statement is positive, negative or neither.

Facebook says the tool it is using, Linguistic Inquiry and Word Count, is a "well-validated software tool used frequently in social psychological research to identify positive and negative emotion in text." It is an often-used tool: It can be useful, researchers said, to find frequent words and how they are associated. But several researchers I spoke to were highly skeptical about making definitive claims about what the word analysis might show.

Marc A. Smith, director of the Social Media Research Foundation, told me, "I share your skepticism re: 'sentiment' analysis. Irony is a tough nut to crack."

"That said, I do put great faith in simple text volume tracking," he added. "When the names of candidates rise and fall in volume that is news. When the words that co-occur with them change, that is news as well."

"So my preferred approach is simple frequency measures on keywords and *word pairs* rather than generating percentage measures of positive or negative sentiment."

Smith sent me a quick analysis that he did of Twitter comments in the last 24 hours from people following Mitt Romney. The top word pairs he found were:
mitt - romney
romney - heads
auto - industry
south - carolina
romney - came
ron - paul
victorious - romney
far - worse
bain - bad
attacking - romney
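
A list like this can be approximated by counting adjacent word pairs. The sketch below is an assumption about the general technique, not Smith's actual procedure: the name `top_word_pairs` and the sample tweets are made up, and it skips any stopword filtering his tools may apply.

```python
import re
from collections import Counter

def top_word_pairs(texts, n=10):
    """Count pairs of words that appear next to each other and return the n most frequent."""
    pairs = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        # zip(words, words[1:]) yields each word together with its right-hand neighbor.
        pairs.update(zip(words, words[1:]))
    return pairs.most_common(n)

# Hypothetical tweets, invented for illustration only.
tweets = [
    "Mitt Romney heads to South Carolina",
    "Mitt Romney on the auto industry",
    "Ron Paul in South Carolina",
]
for (first, second), freq in top_word_pairs(tweets, n=5):
    print(f"{first} - {second}: {freq}")
```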

Here's the full "keyword network graph" that Smith developed (using NodeXL) from the same data set. Words are linked because they occurred next to one another in a tweet, filtered so all words with only one connection are removed.
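
For readers curious how such a graph is assembled, here is a rough sketch in plain Python rather than NodeXL. The function names and sample tweets are hypothetical, and I am assuming the filter is applied once, to the unfiltered graph; NodeXL's own behavior may differ.

```python
import re
from collections import defaultdict

def keyword_network(texts):
    """Link two words whenever they occur next to each other in a text."""
    neighbors = defaultdict(set)
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        for a, b in zip(words, words[1:]):
            if a != b:  # ignore a word repeated back to back
                neighbors[a].add(b)
                neighbors[b].add(a)
    return neighbors

def drop_single_connection_words(neighbors):
    """Remove every word that has only one connection (single pass, an assumption)."""
    keep = {word for word, linked in neighbors.items() if len(linked) > 1}
    return {word: linked & keep for word, linked in neighbors.items() if word in keep}

# Hypothetical tweets, invented for illustration only.
tweets = ["mitt romney heads south", "attacking romney today"]
graph = drop_single_connection_words(keyword_network(tweets))
print(graph)
```

In this toy input only "romney" and "heads" survive the filter, because every other word touches just one neighbor.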

As you can see, this isn't something you can write a headline around. Serious efforts to analyze online conversation en masse produce much less satisfying results than Politico's dubious claim to be tracking who is winning "the Facebook primary." And even Politico's story Thursday on the "Facebook primary" admits that the data they're collecting often don't make sense on their own terms.

As reporter Rachel Van Dongen wrote, "Declaring a winner of the Facebook primary — or the candidate viewed most positively over the past month — is far more difficult than tallying votes. Santorum didn’t see a spike in positive postings on Facebook around his Iowa win, but Gingrich did."

Say what? How could Gingrich's cranky post-caucus speech on live TV have generated positive buzz for him?

Libby Hemphill is another expert who shies away from making claims about analyzing sentiment in social media. An assistant professor of communication and information studies at the Illinois Institute of Technology, she is working on a study of language use among elected officials.

"Analyzing text from social media is especially hard because it has so little context and is so short," she told me. "I think most of those sentiment reports are bogus for the reasons you mention and more.

"We're working on algorithms for automatically detecting content and manner, but we're avoiding sentiment," she continued. "These sentiment reports make the analysis sound so much easier than it actually is, and I worry that they're giving automated detection and classification a bad rep before we even get off the ground."

Jahna Otterbacher, a language expert who works with Hemphill, concurred. "Even in the case of words that, individually, are completely uncontroversial with respect to the sentiment they convey (e.g., good, bad, etc.) in context, things are much trickier, even before we get to talking about irony and sarcasm (e.g., 'It's so good to see him go' is clearly not conveying positive sentiment toward the person in question.)"

And online chatter, she added, makes analysis even harder.

"With the more casual nature of language that's used in social media contexts," Otterbacher said, "things are even more complicated (e.g., 'He's got it bad for her.')"
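
A toy word-counting scorer shows how easily those examples go wrong. This is a deliberately naive sketch, not LIWC's dictionary or algorithm: the two word lists and the function `naive_sentiment` are invented for illustration.

```python
import re

# Tiny hypothetical lexicons; real tools use far larger, validated dictionaries.
POSITIVE = {"good", "happy", "great", "love"}
NEGATIVE = {"bad", "hate", "worse", "awful"}

def naive_sentiment(text):
    """Score = positive-word count minus negative-word count; context is ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

examples = [
    "It's so good to see him go",   # scored positive, though it is a dig
    "He's got it bad for her.",     # scored negative, though it means infatuation
    "I'm so happy that Newt Gingrich is staying in the race",  # positive, intent unknown
]
for text in examples:
    print(naive_sentiment(text), text)
```

Each sentence gets a confident number, and in each case the number misses what the writer actually meant.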

I raised these concerns with spokespeople for Facebook and Politico. A Facebook spokesman did not comment for the record. Politico's press team did not respond to a request for comment.

Remember, there are reasonable ways to explore social media metrics around the presidential campaign. Take for example this post by Gavin Sheridan on Storyful. They collected the 53,000 tweets from within the U.S. mentioning a presidential candidate in the 24 hours before the New Hampshire polls closed, and they also focused on the tweets that were specifically from the state. Lo and behold, they found that the volume of mentions matched the order of the candidates' standing in the vote: Romney, Paul, Huntsman, Santorum, Gingrich and Perry. But did Storyful run a breathless headline claiming that Twitter "predicted" the election (as Mashable did)? No, it just noted that "The New Hampshire data was interesting because it appears to closely match, probably coincidentally, the actual result of the primary."

Or take this post from Google Politics & Elections, which looks at trends in search interest in the different candidates. Santorum, Romney and Paul are all bunched together with a recent uptick in interest shown in South Carolina in their names. Interestingly, Gingrich is trailing those three, and searches for Rick Perry are completely flatlining. There's no rush to claim that this means someone is "winning the Google primary," and instead Google's team just asks readers, "What do you think of these early trends?"

It's not too late for Politico to back off its claim to be mining Facebook chatter for deep meaning, and just stick to simple statements about who's being talked about. But I doubt they'll do that. Whether it's CNN with its fancy and expensive computer graphics (remember the hologram?) or the various debate partnerships between media organizations and tech companies, a big chunk of the political press seems deep in a mindless love affair with technology and social media. After years of scorning the web, some of them are now rushing in the opposite direction, chumming up with companies they should be reporting critically on, and claiming to find meaning in things that are nowhere near as clear as they seem.
