
Wild Horses: Data.gov Proves Good Stats are Hard to Wrangle

BY Nick Judd | Thursday, January 28 2010

Wrangling good data is like wrangling horses: It's hard, and technology can only make it so much easier.

Rollin' rollin' rollin', keep them data rollin': A herd of federal agency data was taken in from the pasture on Jan. 22. // Photo: Bureau of Land Management

Not to knock the plight of the wild North American horse, but it isn't clear to me how population counts of wild burros and mustangs are the most important data the Department of the Interior has to offer for its eager public.

Along with every other federal agency, Interior had until Jan. 22 to respond to a Dec. 8 directive from Office of Management and Budget Director Peter Orszag by posting, on the Obama administration's Data.gov open government data repository, three "high-value data sets." Its response was a list of volunteer opportunities from serve.gov; a list of government recreation facilities; three data sets concerning wildland fires; and an elaboration on the United States' dwindling stock of wild mustangs.

So I asked Interior: What makes the wild American donkey so important?

"One of the mandates under the open government directive was that the data being published was central to the agency mission and that was the case with these projects," Kendra Barkoff, a Department of Interior spokeswoman, wrote in reply.

Well, fair enough. But the Department of the Interior also handles the affairs of Native Americans, a population dealing with high unemployment, poor infrastructure in many places, and little to no attention from the general public. It struck me as odd that mustangs, even though Secretary of the Interior Ken Salazar has taken heat on the issue in the past, would be the subject of the data Salazar's department chose to make public via Data.gov.

A small handful of researchers I spoke to, having pored through the statistics and research on Data.gov relevant to their areas of interest, say that much of it is stuff they've either seen before or don't find especially useful. The Sunlight Foundation's Bill Allison has already opined on this subject: the OMB directive mandated that federal agencies post only "data sets not previously available online or in a downloadable format."

UPDATE: On Thursday, the Washington Post came out with similar findings.

"I think that in some cases it may be true that a version or some of the information that was submitted as of Friday deadline had existed in one form or another on a government website," said an OMB spokesman, Tom Gavin. "But what we have found is in that many of those cases the data was not available in a machine readable format."

In other cases, it was available but not free, he said. Part of the point is that the data is now all in one place, and the process of aggregating that data is just beginning.

The problem may not be bureaucratic reticence, but simply that the agencies have only so much good data in the first place. Standards are getting better, researchers tell me, but right now, good government data is hard to find because there often isn't a whole lot of it, not because the government is keeping it to itself.

Open Voting

Data.gov allows users to vote on which datasets are the best. Here are some of the top contenders as of Wednesday afternoon, by number of votes:

"High value" itself may seem to be a subjective test, but, in Orszag's Dec. 8 directive, the OMB director offered a definition: Raw data that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation.

More than the "high-value" test, the issue of redundancy seems to be a pressing one. Data.gov seems to be better at fulfilling the need for a central clearing-house of information than at serving as a catalyst for the release of new government data, as Orszag's three-new-datasets proviso implied it was meant to be.

"Certainly I love datasets more than the average person," said Ashley Nellis, a research analyst for the Sentencing Project who has scoped out the criminal justice data available on Data.gov, in a Tuesday phone interview. "But those datasets, there wasn't anything new there that I could see."

Nellis researches racial disparity in the criminal justice system. She is compiling data that the Sentencing Project gathers from the states on its own, for instance on the greater propensity for black people than white people to be serving life sentences. For her, the Bureau of Justice Statistics already has such a great website that she has no need of its data in a second place.

A spokeswoman for the Justice Department was not immediately available for comment early Wednesday evening.

Similarly, the raw results of a Federal Voting Assistance Program survey of overseas voters after the 2004 presidential election were among the data released by the Department of Defense. But FVAP released that data online last summer, Claire Smith, research director of the Overseas Vote Foundation, told me Wednesday in a phone interview. (She's eagerly awaiting the results of the 2008 survey, she says, which she expects soon.)

When new appointee Bob Carey took over the Federal Voting Assistance Program in summer 2009, Smith says, she immediately noticed the program become more open. So perhaps it's fair to say that the data did come out as a result of the Obama administration's professed open-government ethos — just not this particular initiative.

Then what is left to disclose? I asked Smith if there was much else the government might have on this topic that researchers would want.

"I don't think there is any," she said, later adding, "we don't even know how many Americans live abroad."

The Census Bureau hasn't even tried to make an estimate since 2000, Smith said. (Although I imagine there are closely held figures on the topic somewhere in the country's intelligence community.)

No one at the Department of Defense press shop was immediately available to comment, but it's important to note that this wasn't the only dataset DoD posted. Their log of received Freedom of Information Act requests, which they also indicated was a "high-value" dataset, has garnered attention, judging by the number of people who cast votes on Data.gov for the data as a useful set of information. DoD posted ten datasets in total as of this writing.

The Department of the Interior's data caught my eye because it is the federal department tasked with keeping track of Native American affairs. If ever there was an underappreciated area of research, this would be it — but when I went looking, I found a census of feral mustangs, and no information on the unemployment rate on reservations.

Similarly, data on prisons in Indian country were available, but the most current set on Data.gov as of this writing is from 2001.

"[There's a] distinct difference between having the data sets available," said Peter Morris, director of strategy and partnership for the National Congress of American Indians, "and having the kind of data you need."

Specifically, he was talking about the Treasury's Recovery Act data, another popular dataset on Data.gov. The data is there on investments made through the Recovery Act, including through the Community Development Financial Institutions Program. That program, Morris says, has facilitated investment in Native American communities, some of which are in grave need of infrastructure like better roads and the jobs that would be filled by people building them. But the data isn't structured such that Morris can sort out investments made in Native American communities from those that aren't, he said.

Morris heaped praise on Interior Secretary Ken Salazar for his willingness to pay attention to Native American affairs, and said he feels that Salazar, and by extension the entire department, understands the need for better data.

It's a refrain Nellis repeated: It's very slowly getting easier to get good data, and this is a focus of the Obama administration. But it's been a process for at least the last eight years.

Interior and the Bureau of Labor Statistics keep separate unemployment data, for example, and Interior uses different standards than Labor. This means comparing unemployment rates of people living on Native American land and in the surrounding states would be like comparing, well, horses and burros: The two are similar, but just not the same. As a result, says Morris, Alaska, Arizona and Minnesota, all of which have sizeable populations on Native American land with higher unemployment rates than the states themselves, were classified as having an unemployment rate under the threshold required to gain an extension of unemployment benefits that was granted last year.

The Obama administration appears to be putting its emphasis on Data.gov even as it pursues more arduous but arguably more relevant aspects of institutional change behind closed doors, and it is by all accounts moving in that direction. The same memo that established the Data.gov dump deadline also required each agency to designate a senior official responsible for the "quality and objectivity" of records that track federal spending, and, to the delight of data nerds worldwide, established that federal data should be as granular as possible. I heard from several researchers that, while slow, change in this arena was coming.

Gavin, the OMB spokesman, says a list of the agency officials responsible for data quality and a list of people on a related inter-agency working group are both supposed to be released soon.

While Data.gov is flashy, easy to explain and even starting an open-government competition of sorts with the United Kingdom, submitting data to the website is really the least difficult of the commitments the administration is now expected to keep.

The more abstract task of establishing standards for data, and actually collecting and entering it, is like wrangling a mustang: It's quite hard, and technology can only make it so much easier.

I asked Morris, of the National Congress of American Indians: Is Data.gov, at least for now, susceptible to garbage in, garbage out?

"That theme," he said, "comes through in the data you see here."
