Hearing Highlights Successes and Challenges of NYC's Open Data Law
BY Miranda Neubauer | Thursday, November 21 2013
New York City Council members, transparency advocates and other advocacy groups on Wednesday had both praise and criticism for the implementation of the city's open data law, as they called for better agency compliance with the law's requirements, more user-friendly platforms and better responsiveness to public demands, as the city moves forward with its mayoral transition.
The city emphasizes that it has released more data than any other U.S. city and that it has gone beyond what is required by the law. But while activists like Code for America's Noel Hidalgo praised the law for helping to make New York City a center for "civic hacking," civic hackers and other advocates said they are frustrated with the available data.
The open data law, passed last year, required city agencies to make all datasets already released publicly online available through the open data portal in machine-readable format within a year, and to provide reasons for any such datasets that they cannot make available. The law also required the Department of Information Technology and Telecommunications to craft a compliance plan, released in September, identifying which datasets agencies planned to release by 2018.
Wednesday's oversight hearing took place under the direction of the Committee on Technology, which comprises among others City Council member Gale Brewer, who spearheaded the legislation and is the future Manhattan Borough President, and Letitia James, the future City Public Advocate.
A report released by the committee notes that the compliance plan identifies 434 data sets to be made available in the coming years, with the most coming from the Department of Environmental Protection and the Office of the Mayor. "The number of data sets to be released, however, is not necessarily a good reflection of the quality or usefulness of those datasets and is not necessarily a good indication of whether the agency has complied with the spirit of the law," the report notes. The report also notes that of those data sets, 17 percent will be updated daily, while 36 percent will be updated once a year or less. The report includes a chart indicating that "most of the agencies opted for the last possible date of compliance." Another chart highlights that a majority of 2018 data releases are due from the School Construction Authority, the Department of Education and the Department of Finance.
A report from the NYC Transparency Working Group, a coalition of good government and civic technology groups, states that "agencies largely missed the March 2013 deadline for publishing data sets already on their websites in PDF or other closed format," with only around 40 percent of the data sets that should have been posted by the deadline available. "Additionally, many of the data sets scheduled for future release under the Agency Compliance Plans should have been released in March 2013. Essentially, the agencies have given themselves an extension."
Representing the Mayor's Office, Michael Flowers, Chief Analytics and Open Platforms Officer, who has been overseeing the implementation, testified on the progress of the implementation. He noted that there are now over 1,100 unique datasets available, compared with the 350 available at the launch in 2011. He specifically highlighted the releases of PLUTO, a collection of land use and geographic data at the tax lot level, ACRIS property records and the parking ticket data in the last few months. He noted that the city has released a beta platform highlighting the progress made in publishing datasets. His office is currently reviewing lists of datasets that agencies submitted to ensure they include all qualifying data, can get the earliest possible release and that they don't inadvertently include any private data, ahead of a scheduled annual update to the council on July 14.
Flowers pointed out that many "high-value" datasets are part of the work vendors deliver under contracts to the City of New York. "We believe that, to the greatest extent possible, this data should be released as Open Data," he said. "DOITT aggressively negotiates for the intellectual property rights on all data created, generated or maintained by the City's contractors and whenever possible works to provide public access to that data."
In the coming months, he said his office would be focusing on releasing automated feeds of newly available data related to flu vaccination locations, farmers' markets, Office of Emergency Management incidents and notifications, and Office of Management and Budget revenue, expense and capital funds, and on automating existing data feeds from the Department of Housing Preservation and Development, the Department of Transportation, and the Department of Environmental Protection. In addition, he said his office is focused on measuring the economic benefits of open data, such as in the form of agency efficiencies, cost avoidance and the creation of new jobs and businesses.
Flowers also highlighted existing use cases for open data releases. One is a collaboration between the Department of Education and PediaCities, a start-up and previous winner of the Big Apps Competition, to launch a set of public APIs that form the basis for applications to help middle school students access information on the high school search. In another instance, the Taxi & Limousine Commission is directing fleet owners to an automatically refreshed list of licensed taxi drivers to help them verify that licenses are current.
In response to questions from City Council members, Flowers acknowledged that not all agencies had released their data on time. Around 70 percent of the data set to be on the portal in March was available, DOITT Deputy Commissioner Donald Sunderland said. Flowers said he saw the compliance plan target dates "as an organic deadline," but also emphasized that he felt obligated to work on constantly updating and processing newly available datasets to get as close as possible to 100 percent.
"We're very much dependent on the agencies," said DOITT General Counsel Charles Fraser, adding that when members of the public identify a dataset they want to see, the office can follow up with the agencies' general counsels. Members of the public can nominate datasets for release on the portal and comment on the datasets.
Among the challenges agencies faced in making data available were delays caused by Superstorm Sandy, agency capacity, and legacy systems, Flowers and Sunderland said. Flowers stressed that the goal was to establish an "incentive structure" where agencies "want to default to open" since it makes their work more efficient and helps them access data from other agencies.
In their testimony, advocates praised the effects of the Open Data law so far, but said that much work remains to be done, arguing that many large agencies were "dragging their feet."
John Kaehny, executive director of Reinvent Albany and co-director of the Transparency Working Group, called the law "foundational" for open government in New York City in his testimony. "The data law is working and it's working well," he said, noting the release of datasets like PLUTO and ACRIS that advocates had sought for decades and pointing out that news outlets were regularly citing the open data law, especially in connection with online interactives.
"But the implementation does need improvement," he emphasized. "There's a mismatch between what the public most wants and what the city has made available," he said, adding that "big mainline agencies that have a lot of data sets" are refusing requests of DOITT and the Mayor's Office. He singled out the NYPD, the Department of Transportation, the Department of Education and the Department of Environmental Protection. An appendix of the Transparency Working Group's report lists a large number of datasets the advocates would like to see given priority to be available on the portal, including a database of entities and individuals doing business with the City of New York, the list of datasets requested by members of the public, Health Department licenses, information on 8,800 facilities citywide storing hazardous substances, the NYPD's citywide Compstat crime statistics, traffic crash statistics and data on lobbyists registered with the city.
Kaehny urged the city to draw on 311 requests, FOIL requests and website data to inform what is given priority for release to create "an open information ecosystem ... The ability to offload requests from 311 and from city websites, that's what this [is] all about." He suggested strengthening the law with a "One Strike and You're In" measure, under which data released as part of FOIL requests would automatically become part of the portal.
"[Agencies] want control of the release of data," Kaehny said. "We don't have a lot of leverage, we can't sue with no public right of action, so public complaints are important." He also said it was key that "agencies need to understand this law helps them, it reduces their workload." He emphasized that City Council members and staff should regularly use and cite the open data law in committee hearings on all subjects. "Part of this law is empowering the legislative branch with information." In addition to educating new members of the City Council, he suggested that borough presidents should spread awareness of the law to district managers and community board members. Rachel Fauss, research and policy manager at Citizens Union, added that the City Council should lead by example by making its own data on expenditure reports and legislation available, and that there should be a list of agencies subject to the open data law.
Noel Hidalgo, founder and program manager of New York City's Code for America brigade betaNYC, also a member of the Transparency Working Group, testified that the open data law had helped make New York City "one of the premier cities for civic hacking." The civic hacking community has grown from originally around 110 to 1,300 people working on projects on a weekly basis, he said. The group had successfully advocated to shift the focus of the BigApps competition to building communities and companies, he noted, while the release of PLUTO and ACRIS had helped fuel "an explosive demand" to work with property data at the weekly meetings. He also noted that property search company Streeteasy, one of the group's hacknight partners, was recently acquired for $50 million.
But he also outlined several frustrations of the hacker community, especially with regard to NYPD and Citibike traffic and safety data. Often that data is only available with poor data formatting, meaning it has to be scraped, hurting usability and accuracy, he said. Other data is locked in PDFs or spreadsheets, he added. "We want that data disaggregated and frequently updated," he said. At another hearing the NYPD had claimed that civic hackers could already work with the data, he said. "We are the hackers and we are frustrated with the data," Hidalgo said. He suggested expanding the scope of the law to cover items such as property sales and court records, implementing an error reporting policy to make it easier to report inconsistencies and maintain quality control, and embracing common data standards such as the one pioneered for restaurant health inspection scores.
In other testimony, Ellen McDermott, co-director of OpenPlans, another member of the working group, highlighted how the non-profit company had been using open data to work with local community boards to create maps showing requests for capital spending, assisting with the Participatory Budgeting program, and working with a Brooklyn community to gather safety data as a basis for conversations with the local police precinct. She urged the city to improve the interface of the platform through a "useability clinic" and explore ways for the agencies to use community-edited data, such as through community mapping of street trees and by building on a collaboration between DOITT and OpenStreetMap.
Nathan Storey, product manager for PediaCities, encouraged the city to work on stronger partnerships between data consumers and producers, such as by expanding the volunteer Code Corps program beyond disaster relief and embedding civic technologists in community boards, among City Council staff and agency staff. He also suggested a focus on using data to track early indicators for risks of foreclosure, gentrification, disinvestment and climate vulnerability.
Matt Bishop, CEO of iGiveMore.org, described how APIs could help streamline the application process for government programs, for example with a button during the tax filing process showing users what programs they could qualify for, and also emphasized the importance of APIs to allow for collaboration and common authentication between different levels of government.
Many testifying outside the open government and technology space focused on the portal's usability problems. Juan Martinez, general counsel at Transportation Alternatives, noted that earlier legislation in 2011 required the NYPD to publish detailed data on traffic crashes and summonses. But he said the promise of the bill had been frustrated because of how the data was published. "If it weren't for the civic hackers, we wouldn't see any benefit to the legislation that [Council member Lappin] passed," he said. They had been able to extract some data, but the result is not as clean and useable as it could be, and requires a time- and labor-intensive process, he said. Often City Council staff and others "end up coming to us as opposed to being able to find it themselves," he said.
"It's not as if we need the NYPD to do more work. Actually we're asking them to do less work," he said. "They add formatting in a way that introduces errors and can't load on computers ... What we're hoping for is more legislation to convince the NYPD to put less effort into it and make it something we can use." He noted the potential of receiving e-mail reports on traffic hotspots and for City Council members and Community Board staff to be more easily able to explain the need for speed bumps and focus on the most important intersections. OpenPlans and Transportation Alternatives planned to hold an event Thursday highlighting how maps and interactive tools could help reduce street fatalities.
Sara LaPlante, data analyst for the NYCLU, criticized the NYPD for deeming only six data sources eligible for release, including ones that were image- and text-heavy reports. "Submitting entire reports misses the mark of the law," she said. "These reports come pre-packaged from a PR standpoint and require researchers to deconstruct narratives. Even if the reports were released as raw data, the list is far from exhaustive." She emphasized how data on the Stop, Question and Frisk policy, which is published but not available as data on the portal, has informed the debate on the issue.
Lourdes Cintron, founder of the Citywide Mental Health Project, said in her testimony that a search of the database did not bring up any data on mental health issues. "The search for either 'mental health' or 'department of health and mental hygiene' gives you, both of them, 'NYC’s famous Baby names' and 'food vendors without permit'," she said. "Also, a search on '311' shows not a single call requesting information about mental health services or a single incident related to it. Almost all 311 reports since 2010 are related to vermin and rats. A researcher could easily conclude that rat infestation has no impact in the city’s mental health. This could matter for policy and budget purposes."
She also strongly criticized the useability of the platform. "The website is confusing and, in my view, (as it is now) useless for the purpose stated in the law ... It requires high levels of computer and research skills to figure out which [format] to select, and once selected, the format is still confusing. I could not use it, even though I do have computer skills," she said. "As it is now, most of the members in my group do not have the skills to navigate this website’s graphical user interfaces if they needed to access the information supposedly available. This website was designed for researchers, not for the general public."
"What I loved was the really specific suggestions, not pie in the sky," Brewer said after the hearing. "I think what's important for the next administration is to work with the agencies to really get them to put data in open format on the portal, and then it's up to all of us to work with the communities to help them understand and make it useable," she added, noting that Mayor-elect Bill de Blasio used to be a member of the technology committee.
De Blasio on Wednesday announced the 60 members of his transition team to help him shape the make-up of his administration. The members include William Floyd, head of external affairs for Google; Ken Lerer, co-founder of the Huffington Post and managing director of Lerer Ventures; Tim Armstrong, chairman and CEO of AOL; Kevin Ryan, chairman and founder of Gilt; and Jukay Hsu, founder of the Coalition for Queens.
An earlier City Council hearing last week highlighted the importance of open data for the ability to track the expenditure of funds connected with the recovery from Superstorm Sandy. The hearing concerned legislation that would require the city to establish a database to track such funds, similar to one that tracked the dissemination of stimulus funds. City Council members and advocates emphasized that the database could help prevent waste and wage theft that often especially affects immigrant workers.
Thaddeus Hackworth, general counsel for the NYC Mayor's Office of Housing Recovery Operations, said the city was working on establishing a database, possibly to be released before Thanksgiving. But he warned that the bill as written raised some privacy concerns, and he said it might not be feasible to provide detailed data on job creation and workers' borough of residence since employers had no contractual obligation to provide that data. City Council members countered that the city could force contractors to provide that information and that it would be easily accessible through their payroll systems.
Josh Kellerman, an analyst for the Alliance for a Greater New York, emphasized in his testimony that the database should interface with the Open Data portal and provide downloadable and useable data, and that it was important for it not to be an "island of information."