Free the Data: The Debate Over APIs and Open Government
BY Alex Howard | Monday, March 17, 2014
The White House's digital government strategy explicitly focuses on APIs, directing agencies to stand up application programming interfaces to enable government staff and the public to dynamically access government data. Vendors like Socrata support APIs as the default method for publishing open data, certainly in part because they can charge for them.
There's no question that they're catching on across the federal government in 2014, from SaferProducts.gov to mortgage data from the Consumer Financial Protection Bureau to, later this year, FDA recalls and adverse reactions data. Adopting APIs represents an important cultural shift for agencies, since doing so explicitly acknowledges the value of publishing data for third-party use.
"There is great value in being able to reach people instantaneously with this type of critical information, but we know that our tools, no matter how proactive and innovative we are, will not reach every U.S. citizen," said Janice Jacobs, U.S. Assistant Secretary of State, at this year's "SafetyDataPalooza." She added:
"We can't expect that every traveler will come to our website or seek out our Twitter feed. People simply don't think about how the US government can help them until help is needed. That's why making our travel safety data more accessible is so exciting.
Our application programming interface lets developers access the most important information we release for the public, our travel alerts and our travel warnings, and present them in an infinite number of ways to serve their users. These alerts and warnings provide critical, time-sensitive information to US citizens abroad about their safety. As I mentioned, not every traveler will consult our website before going abroad. Thanks to this API, our hope is that a traveler purchasing a plane ticket online will automatically see our information about their destination country."
A State Department spokesman said that its API has received more than 8,000 data calls since its soft launch on December 1, 2013.
Open government advocates like the Sunlight Foundation, however, caution that government APIs offer no safety net in the case of a shutdown or outage: unlike bulk data downloads, they fail when the government does.
"APIs can be extremely useful, but they also centralize control and form a single point of failure," argued Eric Mill, a developer at Sunlight Labs, in October 2013, during the federal government shutdown. "Ultimately, APIs are optional — data is a necessity."
What happens when digital government shuts down was made abundantly clear to the nation last fall, although some of the decisions officials made looked bafflingly arbitrary or even politically motivated to some observers.
Not only did federal websites shut down with the government, but government databases went offline as well, a series of data casualties that affected data journalists and left policy wonks and developers alike scrambling for alternatives.
I couldn't help but wonder: if it does, in fact, cost more money to shut down websites and data feeds than to keep them up, why didn't the White House provide clearer guidance to agencies to leave them online? So, I asked.
"Which platforms remain operational during a government shutdown is determined by agencies, in consultation with their General Counsel, on a case by case basis, consistent with applicable law," said U.S. chief information officer Steven VanRoekel, in response to my inquiry. "The determination is based on whether the agency has the authority to continue spending funds on this activity during a lapse in funding."
When I followed up with a question about provisions to keep federal open data feeds up during future shutdowns, perhaps through archive.org, Akamai or Google, however, I didn't receive a clear answer.
"The best way to ensure that open data feeds stay up is for Congress to provide funding in a timely manner so as to avoid shutdowns," said VanRoekel.
While the White House has shared an official API standards guide, the Office of Management and Budget has not issued guidance on service level agreements for APIs, and it's unclear where OMB stands on SLAs, issuing API keys to developers, or the thorny issue of charging for high volumes of API calls.
The Obama administration has now publicly raised the question of whether, in some situations, the U.S. government could or should charge for APIs. Posting in the U.S. government APIs Google Group, Gray Brooks, the senior API strategist at the General Services Administration, asked:
"Can you take a second and respond - either for yourself or any other agencies you know of:
Using a pay model for API services they offer
Offering a service-level agreement (SLA)
In addition to any examples, any thoughts about this? Of course the default is free, but I think there's a place for pragmatic use of either or both of these. It's come up a few times lately and it'd be great for us to know of any precedence."
Brooks' inquiry catalyzed a dozen responses from inside and outside of government, including followup posts on charging for APIs by Govfresh founder Luke Fretwell, Philadelphia chief data officer Mark Headd, and former Presidential Innovation Fellow and current API evangelist Kin Lane. Everyone contributing to the thread on Google Groups or writing elsewhere urged caution and deliberation before adopting any given model, though there was a wide variety of opinions on which models made sense to pursue.
"My default answer to this is no, that we should treat it much like we do other public goods," wrote Fretwell.
"Just like any venture, government agencies need to reconfigure their budgets and IT operations to provide a public API offering. In this day and age, government needs to take into account that data and APIs are a twenty-first century public offering. If agencies are trying to justify data/APIs from a budgetary perspective, the first step would be to reallocate funding priorities and eliminate antiquated services these offerings replace."
"My personal opinion is that all government APIs should require keys, so that agencies can measure and understand API usage," wrote Lane.
…if developers want to get full downloads of data they can, but to use APIs you need to register and key up. Entry level tiers should allow for sensible rate limits, but always cap usage, with the expectation that developers can request rate limit increases, which at some point may warrant the charging of fees. If all agencies employ this method, APIs can still be freely accessed, agencies have the data they need to better understand how APIs are used, and by whom, and heavy or specialized API usage can be dealt with on a case by case basis—leaving open the potential for fees applied to API usage in future…or not.
I still like the city and national park example I pulled from a conversation with Andrew Nicklin (@technickle) in New York a couple years ago. Some public parks are free and open, some have small usage fees, but all commercial activity on public lands requires additional levels of access, fees and even revenue sharing in some situations. We just haven't applied the same thought process to our virtual public lands, and like we had to do in the early days of governing our public lands, we have some experimentation, learning and standardization to do around the management of our virtual public resources.
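Lane's model (free access behind registered keys, tiered rate limits, hard caps, and case-by-case increases) can be sketched as a simple fixed-window limiter. The tier names, limits, and keys below are hypothetical illustrations, not drawn from any agency's actual policy:

```python
import time

# Hypothetical tiers: every key is free but capped; heavier use
# requires requesting a rate-limit increase, possibly for a fee.
TIERS = {
    "entry":   {"limit": 1000,    "window": 3600},  # 1,000 calls per hour
    "partner": {"limit": 100_000, "window": 3600},  # negotiated case by case
}

# Registered API keys and their assigned tiers (illustrative only).
KEYS = {"abc123": "entry", "gov-partner-1": "partner"}

_usage = {}  # api_key -> (window_start_timestamp, call_count)

def check_call(api_key):
    """Return True if this call is allowed, False if unregistered or capped."""
    tier = KEYS.get(api_key)
    if tier is None:
        return False  # no key, no API access (bulk downloads stay open)
    limit = TIERS[tier]["limit"]
    window = TIERS[tier]["window"]
    now = time.time()
    start, count = _usage.get(api_key, (now, 0))
    if now - start >= window:
        start, count = now, 0  # the window has elapsed; reset the counter
    if count >= limit:
        return False  # capped: the developer must request an increase
    _usage[api_key] = (start, count + 1)
    return True
```

Because every call passes through the key check, an agency running something like this also gets the measurement Lane describes: which keys call the API, how often, and which heavy users might warrant case-by-case handling or fees.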
"Metering access to public information has serious problems," wrote Eric Mill:
So many applications of public datasets require first obtaining the entire dataset (not just a way to plug in to an API), and so if bulk data isn't offered, then the API becomes the bulk data source.
A year ago, GPO was nudged by NAPA to consider user fees, and I wrote for Sunlight about the harm this could bring. There's no such thing as a small fee when multiplied by millions of requests. If the Federal Register API charged per-query, it would be expensive even to experiment with the activity of the executive branch.
A more classic example is PACER. Using PACER to find a court docket and download public records -- and watching your bill skyrocket as you navigate search results -- is enough to put anyone in a righteous rage. But the real problem is not just PACER's particularly hostile metering approach - it's that, now that their budget is defined by user fees, they have to vigorously defend their fees to defend their budget. It completely misaligns their incentives to serve the public, and causes them to see their true audience as whatever their current (well-budgeted) customers are.
And that to me is the real problem with charging for public data - even if you make a pricing plan that feels accessible to your current audience, today, you're willingly putting blinders on, and creating an engine whose future evolution is difficult to predict and that will be very hard to turn off.
"In the open data world, especially internationally, it's generally accepted that gov can charge for the marginal cost of reproducing a record," wrote open government developer Josh Tauberer:
For APIs, the marginal cost of an API call is, maybe, the proportion of the server expenses used up by that call. At scale that may not be negligible, and it's probably perfectly fair to start charging users that use significant resources the marginal cost of their API calls, especially if there are the same sorts of fee waivers as with FOIA.
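Tauberer's marginal-cost argument reduces to proportional arithmetic: a caller's share of the server bill scales with their share of total calls. With illustrative figures that are not from any agency budget (say a $500-a-month server handling 10 million calls), the per-call cost is a tiny fraction of a cent and only becomes meaningful at heavy volume:

```python
def marginal_cost_per_call(monthly_server_cost, monthly_total_calls):
    """Naive proportional share of server expense attributable to one call."""
    return monthly_server_cost / monthly_total_calls

# Illustrative numbers only: a $500/month server serving 10 million calls.
cost = marginal_cost_per_call(500.00, 10_000_000)  # $0.00005 per call
# A heavy user making 2 million calls a month would owe about $100
# under strict marginal-cost pricing.
heavy_user_bill = cost * 2_000_000
```

Under this arithmetic, charging casual users is pointless overhead, while billing the handful of very heavy users could recover a real share of operating costs, which is roughly the split Tauberer suggests.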
A related point other people in the thread made, however, is that a paid service may carry with it a heightened level of customer service and technical robustness, both of which sound a lot like the result of service level agreements.
Developer Charles Worthington highlighted at least one federal agency that's already selling graduated access to APIs. The Department of Energy "has a small number of APIs with paid tiers and/or 'usage fees' or 'licensing fees,'" he wrote:
"So far as I know, all of these were built by and are operated by a National Lab or other contractor. One example I can think of is the Home Energy Score Tool API: https://developers.buildingsapi.lbl.gov/hescore/documentation/licensing. In practice I don't think there are many users of this API, but it's at least one example I could find.
I commonly hear program managers worry that by investing in creating a dataset or tool and providing API access to the data, that they are basically committing to some ongoing cost to maintain the service in perpetuity. I don't think this fear is unreasonable. The idea that large users/businesses that directly benefit from these APIs could help defray this cost is appealing to these managers.
I have very mixed opinions on whether this is a good idea. But even if I was fully on board with the idea of charging a usage fee I would strongly advocate for a completely self-provisioned sign up process and a generous free tier (capped by total calls per month or by rate limiting) that makes it totally free to get started with the API in a matter of minutes. That's a best practice in the commercial API space."
Whatever policies officials decide upon -- and there's sure to be a broad variety across local, state, federal and sovereign governments to compare and contrast over the coming years -- building and maintaining APIs will need to be considered in the context of an entire portfolio of IT assets. In other words, legislatures and executive agencies should be strategic in how they invest public revenues in technology, a point that Philadelphia CDO Mark Headd has been making strongly for some time:
Ultimately, if and how a government employs a fee structure for access to APIs should be connected to plans for long-term sustainability of the IT infrastructure supporting the API. As I’ve said before, far too often governments do not budget properly for the long term care of IT assets.
Could charging for access to government APIs generate a reliable revenue stream to help ensure resources for upkeep, maintenance and improvements to the systems behind these APIs?
At a minimum, governments should consider deploying a test environment for their APIs. If governments charge users for access to their APIs, they should allow new users and developers to experiment and build new apps without incurring costs, until they are ready to go to production.
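One common way to offer such a test environment, borrowed from commercial API practice, is a separate sandbox host with free, self-provisioned keys and sample data; the hostnames below are hypothetical, not real government endpoints:

```python
# Hypothetical base URLs for a government API offering a free sandbox tier.
ENVIRONMENTS = {
    "sandbox": "https://sandbox.api.example.gov/v1",  # free keys, sample data
    "production": "https://api.example.gov/v1",       # metered, live data
}

def build_url(environment, resource):
    """Build a request URL for the chosen environment."""
    base = ENVIRONMENTS[environment]
    return f"{base}/{resource}"

# Developers experiment in the sandbox at no cost...
dev_url = build_url("sandbox", "travel-warnings")
# ...and switch one configuration value when ready to go to production.
prod_url = build_url("production", "travel-warnings")
```

Keeping the two environments identical except for the hostname means a developer's code needs only a single configuration change at launch, which lowers the cost of experimenting even when production access is metered.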
Because of the laws governing the sale of government assets, however, it may be some time before there's any government-wide federal policy for APIs, if one ever arrives.
"If you really wanted to look into it you'd want to consult your administrative law specialist (see this statutory provision)," suggested Hyon Kim, the deputy program director for Data.gov. "If you look at the CFR tab, it appears that establishing user fees could also entail rulemaking."
Federal agencies are another matter: vast amounts of data are available from Data.gov or a host of other websites and servers, from GIS data to weather data. It's not clear how much money the U.S. federal government earns, in aggregate, from selling such data to commercial concerns, nor what third parties charge for access to cleaned-up versions of it.
A city hall, state house or government agency charging the press or general public to access or download data that they have already paid for with their tax revenues, however, remains problematic.
It may make more sense for policy makers to pursue a course where they always make bulk government data available for free to the general public and look to third parties to stand up and maintain high quality APIs, based upon those datasets, with service level agreements for uptime for high-volume commercial customers. The determination of whether federal agencies offer SLAs today is "currently made by the agency on a case-by-case basis," said VanRoekel.
The freemium model that ProPublica's data store is exploring to sell custom datasets may be worth considering, at least for non-profits or non-governmental entities. Raw source data is free, while data that ProPublica staff have invested time and resources in cleaning and formatting will cost money to acquire. How much depends upon the entity: journalists might pay $200, while academics might pay $2,000 and commercial companies even more.
That's not to say that governments should get out of the API game. The U.S. Census, for instance, has shown that it's possible to stand up and maintain an API for thousands of developers, but there are development and maintenance costs that any government agency preparing to deploy an API must consider.
"The open data policy emphasizes access in multiple formats and states that ‘Agencies should make data available in multiple formats according to their customer needs,’" said VanRoekel. "We strongly encourage agencies to make information available and open through bulk format as well as APIs."