
Wild Horses: Data.gov Proves Good Stats are Hard to Wrangle

BY Nick Judd | Thursday, January 28 2010

Wrangling good data is like wrangling horses: It's hard, and technology can only make it so much easier.

Rollin' rollin' rollin', keep them data rollin': A herd of federal agency data was taken in from the pasture on Jan. 22. // Photo: Bureau of Land Management

Not to knock the plight of the wild North American horse, but it isn't clear to me how population counts of wild burros and mustangs are the most important data the Department of the Interior has to offer for its eager public.

Along with every other federal agency, Interior had until Jan. 22 to respond to a Dec. 8 directive from Office of Management and Budget Director Peter Orszag by posting, on the Obama administration's Data.gov open government data repository, three "high-value data sets." Its response was a list of volunteer opportunities from serve.gov; a list of government recreation facilities; three data sets concerning wildland fires; and an elaboration on the United States' dwindling stock of wild mustangs.

So I asked Interior: What makes the wild American donkey so important?

"One of the mandates under the open government directive was that the data being published was central to the agency mission and that was the case with these projects," Kendra Barkoff, a Department of Interior spokeswoman, wrote in reply.

Well, fair enough. But the Department of the Interior also handles Native American affairs, and Native Americans are a population dealing with high unemployment, poor infrastructure in many places, and little to no attention from the general public. It struck me as odd that mustangs, although Secretary of the Interior Ken Salazar has taken heat on the issue in the past, would be the subject of the data Salazar's department chose to make public via Data.gov.

After poring through the statistics and research on Data.gov relevant to their areas of interest, a handful of researchers I spoke to say that much of it is material they've either seen before or don't find especially useful. The Sunlight Foundation's Bill Allison has already opined on this subject: the OMB directive mandated that federal agencies post only "data sets not previously available online or in a downloadable format."

UPDATE: On Thursday, the Washington Post came out with similar findings.

"I think that in some cases it may be true that a version or some of the information that was submitted as of Friday deadline had existed in one form or another on a government website," said an OMB spokesman, Tom Gavin. "But what we have found is in that many of those cases the data was not available in a machine readable format."

In other cases, it was available but not free, he said. Part of the point is that the data is now all in one place, and the process of aggregating that data is just beginning.

The problem may not be bureaucratic reticence, but simply that the agencies have only so much good data in the first place. Standards are getting better, researchers tell me, but right now, good government data is hard to find because there often isn't a whole lot of it, not because the government is keeping it to itself.

Open Voting

Data.gov allows users to vote on which datasets are the best. Here are some of the top contenders as of Wednesday afternoon, by number of votes:

"High value" itself may seem to be a subjective test, but, in Orszag's Dec. 8 directive, the OMB director offered a definition: Raw data that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation.

More than the "high-value" test, the issue of redundancy seems to be a pressing one. Data.gov seems to be better at fulfilling the need for a central clearing-house of information than at serving as a catalyst for the release of new government data, as Orszag's three-new-datasets proviso implied it was meant to be.

"Certainly I love datasets more than the average person," said Ashley Nellis, a research analyst for the Sentencing Project who has scoped out the criminal justice data available on Data.gov, in a Tuesday phone interview. "But those datasets, there wasn't anything new there that I could see."

Nellis researches racial disparity in the criminal justice system. For instance, she is compiling data that the Sentencing Project is gathering on its own from the state level on the greater propensity for black people to receive life sentences compared with white people. For her, the Bureau of Justice Statistics has such a great website that she has no need of its data in a second place.

A spokeswoman for the Justice Department was not immediately available for comment early Wednesday evening.

Similarly, the raw results of a Federal Voting Assistance Program survey of overseas voters after the 2004 presidential election were among the data released by the Department of Defense. But FVAP released that data online last summer, Claire Smith, research director of the Overseas Vote Foundation, told me Wednesday in a phone interview. (She's eagerly awaiting the results of the 2008 survey, she says, which she expects soon.)

When new appointee Bob Carey took over the Federal Voting Assistance Program in summer 2009, Smith says, she immediately noticed the program become more open. So perhaps it's fair to say that the data did come out as a result of the Obama administration's professed open-government ethos — just not this particular initiative.

Then what is left to disclose? I asked Smith if there was much else the government might have on this topic that researchers would want.

"I don't think there is any," she said, later adding, "we don't even know how many Americans live abroad."

The Census Bureau hasn't even tried to make an estimate since 2000, Smith said. (Although I imagine there are closely held figures on the topic somewhere in the country's intelligence community.)

No one at the Department of Defense press shop was immediately available to comment, but it's important to note that this wasn't the only dataset DoD posted. Its log of received Freedom of Information Act requests, which it also indicated was a "high-value" dataset, has garnered attention, judging by the number of people who cast votes for it on Data.gov as a useful set of information. DoD had posted ten datasets in total as of this writing.

The Department of the Interior's data caught my eye because it is the federal department tasked with keeping track of Native American affairs. If ever there was an underappreciated area of research, this would be it — but when I went looking, I found a census of feral mustangs, and no information on the unemployment rate on reservations.

Similarly, data on prisons in Indian country are available, but as of this writing the most current set on Data.gov is from 2001.

"[There's a] distinct difference between having the data sets available," said Peter Morris, director of strategy and partnership for the National Congress of American Indians, "and having the kind of data you need."

Specifically, he was talking about the Treasury's Recovery Act data, another popular dataset on Data.gov. The data is there on investments made through the Recovery Act, including through the Community Development Financial Institutions Program. That program, Morris says, has facilitated investment in Native American communities, some of which are in grave need of infrastructure like better roads and the jobs that would be filled by people building them. But the data isn't structured such that Morris can sort out investments made in Native American communities from those that aren't, he said.

Morris heaped praise on Interior Secretary Ken Salazar for his willingness to pay attention to Native American affairs, and said he feels that Salazar, and by extension the entire department, understands the need for better data.

It's a refrain Nellis repeated: It's very slowly getting easier to get good data, and this is a focus of the Obama administration. But it's been a process for at least the last eight years.

Interior and the Bureau of Labor Statistics keep separate unemployment data, for example, and Interior uses different standards than Labor. This means comparing unemployment rates of people living on Native American land with those of the surrounding states would be like comparing, well, horses and burros: The two are similar, but just not the same. As a result, says Morris, Alaska, Arizona and Minnesota, all of which have sizeable populations on Native American land with higher unemployment rates than the states themselves, were classified as having an unemployment rate under the threshold required to gain an extension of unemployment benefits that was granted last year.

The Obama administration appears to be putting the emphasis on Data.gov even as it pursues more arduous but arguably more relevant aspects of institutional change behind closed doors, and it is by all accounts moving in that direction. The same memo that established the Data.gov dump deadline also required each agency to designate a senior official responsible for the "quality and objectivity" of records that track federal spending, and, to the delight of data nerds worldwide, established that federal data should be as granular as possible. I heard from several researchers that, while slow, change in this arena was coming.

Gavin, the OMB spokesman, says a list of the agency officials responsible for data quality and a list of people on a related inter-agency working group are both supposed to be released soon.

While Data.gov is flashy, easy to explain and even starting an open-government competition of sorts with the United Kingdom, submitting data to the website is really the least difficult of the commitments the administration is now expected to keep.

The more abstract task of establishing standards for data, and actually collecting and entering it, is like wrangling a mustang: It's quite hard, and technology can only make it so much easier.

I asked Morris, of the National Congress of American Indians: Is Data.gov, at least for now, susceptible to garbage in, garbage out?

"That theme," he said, "comes through in the data you see here."
