
Developers Are Already Submitting Patches to Obama's New Open Data Policy

BY Nick Judd | Thursday, May 9 2013

Photo: Tom Lohdan / Flickr

The White House on Thursday morning released an executive order from President Barack Obama mandating that data in information systems created by government agencies from now on be available for anyone to access, download, and use.

Administration officials say this will present opportunities for entrepreneurs across the country.

"We sit on a treasure trove of data in government," White House Chief Information Officer Steven VanRoekel said today during a conference call with reporters. "Today most of that data is locked up in paper, proprietary systems and other things. As part of this motion, government agencies are going to, as they create new or modernize their existing systems, by default they will be required to make their data in those systems open and machine readable."

In recent years, VanRoekel said, hundreds of companies have launched with a focus on government data, creating thousands of jobs.

The new directive will also be a boon for transparency, VanRoekel said.

"This, we feel, will create opportunities around transparency and efficiency inside the walls of government as well as fuel economic opportunity on the outside," he told reporters.

The order declares that the "default state" of government information resources published or modernized going forward "shall be open and machine readable," with exceptions for privacy, confidentiality, and national security. Accompanying the order is a memo from the Office of Management and Budget that requires agencies to maintain an "enterprise data inventory" (a list of all their databases) and, based on that inventory, to release a list of datasets available to the public.

The policy is posted to GitHub, a hosting site for open-source projects, and people working on technology in government are already proposing changes through GitHub's standard mechanism for suggesting edits: pull requests.

Citing a new hospital billing data release, VanRoekel said that these datasets will create new opportunities for companies that can turn data into valuable market intelligence for consumers. Home buyers, for instance, could benefit from listing agencies that augment their existing information with data on neighborhood crime statistics, broadband accessibility, average energy consumption and cost, and other federal data, the White House CIO told reporters.

Federal officials were not immediately available to clarify VanRoekel's statement about the "hundreds of companies" or "thousands of jobs" created through open government data, but promised a response. (I'll update this post when it arrives.) Officials hope government data will prove useful across many industries, and the White House is not naming any specific target. The memo specifies that agencies "adopt a presumption in favor of openness to the extent permitted by law and subject to privacy, confidentiality, security, or other valid restrictions."

To assist agencies in meeting the demands of this new initiative, OMB and the Office of Science and Technology Policy have launched "Project Open Data," described in the OMB memo as an "online repository" of information and schema to help agencies get with the program. Already live, it includes several tools, such as software that creates a programming interface for data stored in common databases or in CSV format, a common file type for storing data.

Agencies have six months to revise their policies and create a public data listing of all available datasets. Requirements and specifications for the collection and storage of new data apply only to datasets collected going forward.

Open government and transparency activists are still discussing the memorandum's utility for their own work.

In a blog post, Josh Tauberer, who built GovTrack (for now the best place to go for updated data on the doings of Congress), says he's concerned that the White House is mandating the use of "open licenses." An open license is different from a "public domain" dedication, he says, and that difference means a license could be used to restrict access or use.

"A public domain dedication differs from an open license in that it disclaims copyright and other protections, whereas, again, an open license implies that such a limitation on use is already present," he writes. "The CC0 statement [a pre-built statement about intellectual property rights composed by Creative Commons] was successfully used by the Council of the District of Columbia to disclaim copyright over data files containing the DC Code."

What's more, he adds, citing New York Times developer Derek Willis, the policy obliges agencies to consider the "mosaic effect" of their data: the ability of datasets, when cobbled together, to reveal personally identifiable information or potentially compromise "security." In other words, it's a potentially overbroad exemption that might allow government officials to confuse "national security" with their own "job security," and in so doing block access to valuable insight about what government is doing.

Although Obama made catching up on a backlog of Freedom of Information Act requests a priority in his 2009 Open Government Directive, watchdogs still peg federal responsiveness to FOIA requests at 55 to 60 percent. So it's unclear whether this directive will really break the ice on transparency.

Officials were not immediately available to respond to a follow-up request for comment.

The Sunlight Foundation's* John Wonderlich is thrilled that the White House has adopted a "default to open," but notes some cautions:

To be sure, getting agencies to publicly list all their data that can be open will be a significant challenge, even with a high-profile Executive Order. Concerns like cost, privacy, and security will be used to justify non-disclosure (as they often are), and will be used to try to justify keeping even a description of many datasets private. That's a good struggle to have, though, and one we're looking forward to. Without this Executive Order, too many agencies are managing data holdings that they haven't comprehensively reviewed, without public oversight, while advocates, journalists, and policymakers have an unclear view of what agencies know, and what they could be releasing.

"Agency-wide comprehensive audits of datasets," Wonderlich added in a follow-up email, "is a big and aggressive move."

After all, one of the big complaints from transparency activists is that they can't ask for data until they know what they can get. Soon, that is supposed to change.

* TechPresident publisher Andrew Rasiej and editorial director Micah Sifry are senior advisers to the Sunlight Foundation.

This post has been updated.