House Publishes U.S. Code in XML
BY Miranda Neubauer | Tuesday, July 30 2013
The House of Representatives is now making the United States Code available for download in XML format, Speaker John Boehner's office announced today.
Transparency advocates like Joshua Tauberer, creator of Govtrack, welcomed the move, but are still waiting on the publication of legislative data in bulk format.
The Speaker's press release notes that the data is compiled, updated and published by the Office of Law Revision Counsel and is available for download as individual titles or in bulk.
The press release points out that the "House created the Legislative Branch Bulk Data Task Force in 2012 to expedite the process of providing bulk access to legislative information and to increase transparency for the American people."
"[The U.S. Code data] is [a] really good example of this kind of project done right," Tauberer said. "The documentation is very comprehensive and detailed and really one of the best examples of documentation for a government XML standard that I've ever seen. The data is structured in a coherent, natural way."
He said that the new format would make it easier for developers to process the hierarchy of the Code and to access specific sections or elements of it in context, compared with what is currently possible with the HTML version.
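To illustrate what processing the hierarchy and reading a section "in context" might look like, here is a minimal Python sketch. It does not use the House's actual files: the inline sample document, its element names (`title`, `chapter`, `section`, `num`, `heading`) and the `identifier` attribute values are simplified assumptions made for illustration, and the published schema should be checked before adapting it. The helper `find_with_context` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. Element and attribute names are
# assumptions for illustration, not a copy of the published schema.
SAMPLE = """
<title identifier="/us/usc/t17">
  <num>17</num>
  <heading>Copyrights</heading>
  <chapter identifier="/us/usc/t17/ch1">
    <num>1</num>
    <heading>Subject Matter and Scope of Copyright</heading>
    <section identifier="/us/usc/t17/s106">
      <num>106</num>
      <heading>Exclusive rights in copyrighted works</heading>
    </section>
    <section identifier="/us/usc/t17/s107">
      <num>107</num>
      <heading>Limitations on exclusive rights: Fair use</heading>
    </section>
  </chapter>
</title>
""".strip()


def find_with_context(root, identifier):
    """Return [(tag, num, heading), ...] from the root down to the element
    carrying the given identifier, or None if no element matches."""

    def walk(elem, trail):
        here = trail + [(elem.tag, elem.findtext("num"), elem.findtext("heading"))]
        if elem.get("identifier") == identifier:
            return here
        for child in elem:
            # Only descend into structural levels, which carry an identifier.
            if child.get("identifier") is not None:
                found = walk(child, here)
                if found is not None:
                    return found
        return None

    return walk(root, [])


root = ET.fromstring(SAMPLE)
path = find_with_context(root, "/us/usc/t17/s107")
for tag, num, heading in path:
    print(f"{tag} {num}: {heading}")
```

Because each level nests inside its parent, the section's title and chapter fall out of the same traversal, which is the kind of context that is harder to recover from flat HTML.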
The new data will make it possible for Govtrack to offer a service allowing users to track elements of the Code and receive an alert any time a bill mentions a specific section, he said.
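A rough sketch of how such an alert could work, in Python: scan a bill's text for citations to the Code and compare them with the sections a user is tracking. This is not Govtrack's or Scout's implementation. The two citation patterns, the function names `cited_sections` and `alerts`, and the sample bill text are all assumptions for illustration, and real bills cite the Code in many more forms than these.

```python
import re

# Two common citation shapes, assumed for illustration only:
#   "section 107 of title 17, United States Code"
#   "17 U.S.C. 107"
CITATION_PATTERNS = [
    re.compile(
        r"section\s+(?P<section>\d+[A-Za-z]?)\s+of\s+title\s+(?P<title>\d+),"
        r"\s+United\s+States\s+Code",
        re.IGNORECASE,
    ),
    re.compile(r"(?P<title>\d+)\s+U\.S\.C\.\s+(?:§\s*)?(?P<section>\d+[A-Za-z]?)"),
]


def cited_sections(text):
    """Return the set of (title, section) pairs cited in the text."""
    found = set()
    for pattern in CITATION_PATTERNS:
        for match in pattern.finditer(text):
            found.add((match.group("title"), match.group("section")))
    return found


def alerts(bill_text, tracked):
    """Return the tracked (title, section) pairs that the bill mentions."""
    return sorted(cited_sections(bill_text) & set(tracked))


# Hypothetical bill text and watch list.
bill = (
    "Section 107 of title 17, United States Code, is amended by striking "
    "subsection (b). Nothing in this Act affects 35 U.S.C. 101."
)
tracked = [("17", "107"), ("26", "501")]
print(alerts(bill, tracked))
```

Matching on text alone is brittle; the structured identifiers in the XML release would give such a service a stable list of valid sections to check citations against.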
He noted that the Sunlight Foundation's Scout tool functions in a similar way, but that the new data will make such a tool easier to maintain and allow for comprehensive results. Govtrack had a similar function until 2011, he said, when he discontinued it because it was too hard to keep up to date.
But for Tauberer, "the big elephant in the room" is the unavailability of legislative data. Currently, Govtrack and other transparency groups scrape such data from the Library of Congress' Thomas platform, a process that leads to inaccuracies and is hard to maintain. What is currently available in bulk format is the text of bills, he emphasized, but not their legislative status, meaning it isn't easy to create a spreadsheet of all bills passed or find out how many bills were passed by one chamber and not the other.
Tauberer suggested that the new XML release was the result of a larger internal House modernization project, even though it is being billed as a transparency initiative.
Citing previous advocacy efforts and discussions about what the Bulk Data Task Force was focused on, he said that while the work on XML was positive, "that's not the reason the task force was created," and warned against "losing sight" of the legislative data priority.
In September 2012, on the occasion of the launch of Congress.gov, a Library of Congress spokesperson told techPresident that Congress had "not requested that data be provided in that manner."
In August, a report co-authored by Tauberer, the Sunlight Foundation and others welcomed the House Leadership's commitment to bulk data and outlined a path towards implementation.
FierceGovernmentIT reported on July 22 that a December 31 report by the task force was recently made available as part of the Legislative Branch appropriations bill.
"Consistent with the pledge by House Leaders, the Task Force recommends that it be a priority for Legislative Branch agencies to publish legislative information in XML and provide bulk access to that data; that the XML Working Group develop and maintain standards to ensure compatibility and interoperability of all machine-readable data published by the Legislative Branch, and that the Task Force be extended to the 113th Congress to continue to coordinate, initiate and track transparency-related projects," the report's executive summary reads.