NYC Open Data Advocates Focus on Quality And Value Over Quantity
BY Miranda Neubauer | Thursday, July 24 2014
The New York City Department of Information Technology and Telecommunications plans to publish more than twice as many datasets this year as it published to the portal last year, new Commissioner Anne Roest wrote last week in an annual report mandated by the city's open data law. The report schedules 135 datasets for release this year, with almost 100 more to come in 2015.
But what matters more to New York City open data advocates than the absolute number of datasets is their quality and value: creating a transparent process for releasing the data, making the data machine-readable and prioritizing the release of data sets in high demand. As preparations are underway for City Council hearings on the law, New York City's open data progress and challenges are both a model for and reflective of open data efforts across the country.
The law, spearheaded in 2012 by now-Borough President Gale Brewer and the New York City Transparency Working Group, a coalition of good government groups and technology advocates, requires that public data sets that agencies already make available online must be accessible through New York City's open data portal in machine-readable format, and that agencies must provide an explanation if that is not possible.
Overall, the city says it has published nearly 1,300 agency data sets to the portal, with 345 more currently scheduled to follow between now and 2018.
"I think New York City is doing an amazing job with Open Data. I think that the city is not taking nearly enough credit for a lot of the datasets involved with the Mayor's Management Report," said City Council member Ben Kallos, chair of the Government Operations Committee, referring to datasets related to a mandated annual public report card of city services. "It may appear like it's only one dataset here and there but the underlying data is so rich and contains so many hundreds of other datasets that the administration is releasing so much more information than anyone expected by this point."
Among the 160 new datasets released since the city published its first plan in September are directories of play and park areas, 2040 population projections, emergency response incident data, quality of life conditions reported by city inspectors and agency performance indicators. According to the portal, the most accessed data sets in the last year include taxi medallion drivers, WiFi hotspot locations, 311 service requests from 2010 to the present, subway entrances, restaurant inspection results, the dataset of available datasets and NYPD motor vehicle collision data, which has only been available since May.
Prioritizing Value and Open-Source Platforms
In addition to releasing data, the city is attempting to be transparent as it reviews documents included in the original plan and in some instances determines they will not be released to the portal in their current form.
Of the 27 data sets that have been removed from the plan, several are being merged. "One of them is related to privacy concerns and I will be the strongest advocate to protect people's privacy," said Kallos, referring to that dataset, an aggregate list of visitors to the Department of Correction.
But for Kallos the next key step of the open data process would involve his Open FOIL legislation that "would also include a one strike and you're in feature where items that people are frequently requesting would be prioritized and put online and thereby save money by not requiring as many FOIL requests -- literally demand should drive releases. The only concern is that a lot of the information that's being released is not really tied to the data that people want the most."
While Kallos said that Socrata, which hosts the city's open data platform, has been open and proactive about making the platform more user-friendly, he added that having been "elected as an open source developer" he would always prefer free and open source software licenses. He hopes to suggest that the administration look into saving money by partnering with the federal government to use its open source data.gov platform. He noted that the federal platform would be available to New York City under a GPL version 2 license, meaning New York City could use it for free and make modifications as long as it rereleases any changed code, which would also make it available to other cities.
Kallos, who added that he has also met with a member of the British Parliament's Commission on Digital Democracy, is pursuing a similar goal with the City Council's legislative platform. "We are working with our existing vendor Granicus to have an open API," he said, an issue that also came up at Personal Democracy's Open Gov workshop.
In an effort to coordinate this open government advocacy across cities, he and San Francisco Supervisor Mark Farrell issued a challenge to civic technologists to develop a free and open source digital legislative platform by fall 2015, and have established a Free Law Founders initiative partnered with Seamus Kraft of the Open Gov Foundation and city officials in Chicago, Washington D.C., and Boston. On Monday, Washington D.C. Mayor Vince Gray issued an open data and transparency directive and launched a FOIA portal, though it came under some criticism from open government advocates.
Recently passed New York City Council rules reforms mandate that the city release its legislative and funding data in machine-readable format and that the Speaker draft a council technology plan. On Thursday, the New York City Council was expected to pass two other bills co-sponsored by Kallos and Technology Committee Chair James Vacca. One would require the city to publish its laws in machine-readable format, while the other would require the online publication of the City Record, which publishes city procurement notices, in machine-readable format. Kallos has also introduced legislation to make information on New York City film shoots machine-readable.
Importance of Transparency and Underlying Data
Praise for the updated New York City Open Data plan also came from Dominic Mauro, a staff attorney for Reinvent Albany. "What we like is New York's continued commitment to the process," he said. "This is the first automatic deadline under the new administration and it's good that even 2 1/2 years after the passage of the original bill we're still going through this process." He also highlighted the release of new Mayor's Management Report data. "With 21,000 rows in that one dataset... I skimmed through the first 1,000 rows or so and counted a couple dozen different datasets," he said. "They [even] set up a dataset that was specifically a list of datasets removed from the last plan and their reasoning for removing each of these from the open data plan," he noted. "What's not in the open data plan that we would like to see are data sets that are being frequently FOILed or otherwise are very high priority for agency stakeholders. We'd like to see FOIL being used to power New York City's Open Data movement."
One of the 27 items removed from the open data plan is an NYPD report titled Murder in NYC. Reinvent Albany noted last fall that the report's data should be available in an open format rather than as a PDF. As a narrative report it does not qualify as "data" under the law. The report for 2012, the most recent year for which it was available, has graphs and illustrations showing, for example, that 37 percent of murders occurred between 11 p.m. and 5 a.m., that blacks, who make up 23 percent of the city's population, represent 60 percent of victims, that 81 percent of black male victims aged 16 to 21 were killed with a gun, and that over 90 percent of murder suspects were black or Hispanic.
Kallos has also introduced OpenGIS legislation that would require the city to provide more detailed geographic information on its crime map as well as release the underlying data in machine-readable format.
Also not yet available on the portal is detailed city budget information, which would, for example, make it easier for the public to learn that DoITT received around $500,000 in new funding to add three developer positions, a project manager and a business intelligence analyst to its open data team. Gotham Gazette also recently highlighted how a Socrata Open Budget tool launched by Boston Mayor Martin Walsh could be a model for New York City. Some members of New York City's Code for America brigade BetaNYC have worked on a budget API on the basis of a budget PDF scraper. In a statement, New York City's Office of Management and Budget said it was always looking for ways to use new tools and data to improve transparency.
Open Data Plan Not Set in Stone
Nicholas Sbordone, spokesperson for DoITT, said the release of 67 datasets since September that had not originally been envisioned as part of the open data release plan, such as the vehicle collision data that the NYPD released in connection with Mayor Bill de Blasio's Vision Zero initiative focused on preventing traffic fatalities, illustrated that "the legislation was crafted in a smart enough way to know that stuff would be somewhat fluid." Of the 345 sets currently scheduled for release until 2018, he noted that the number "may go up or down" as agencies continue to identify and evaluate their datasets.
While the initial focus was making agencies aware of the open data law's requirements and relying on agencies to identify data for the portal, "as we've matured, we're giving more specific guidance [on releasing important data]," said Nicholas O'Brien, acting director of the Mayor's Office of Data Analytics. He added that MODA also works to step up the release of data in conjunction with administration priorities, such as Vision Zero and de Blasio's universal pre-K initiative. That includes making available to the public a dataset on the city- and community-based early childhood centers offering seats and their capacity, he noted, but also using data analysis internally to manage permit information, gather information for outreach purposes and monitor enrollment to ensure that available seats are filled.
More Crime Data and Improved Interface to Come
In spite of the removal of the "Murder" report, O'Brien said the city was "absolutely committed to releasing more crime data." He said that MODA is currently working with the NYPD on "the right level of granularity" given restrictions such as victims' rights and other disclosure rules. As the implementation of the open data law has gone forward, "there is greater understanding of the approach of releasing as much raw and as granular data as possible," O'Brien said. When made aware of problems or issues with the data, the team forwards them to the respective agency that has subject matter expertise, he said.
He emphasized that the city is working with Socrata to improve the platform's interface and search function in the near future, and is also evaluating most requested and most viewed data sets as it looks to automate data updates.
Going Beyond the Numbers and the Next Frontier of Open Data
Earlier this year, a Boston City Council member cited the total number of New York City data sets as an example in her effort to improve that city's open data policy. But last August, former DoITT Director of Research and Development Andrew Nicklin outlined in a blog post how he felt that Socrata was not always very clear in defining and counting what constitutes a data set. Sbordone noted that while a release such as the Mayor's Management Report only counts as one data set, "within that set is 10 years of data from 50 or so different city entities....a dataset could be something very small a couple of rows down a couple of columns across...or it could be something huge like this, we're talking about 1277 rows of data and 10 columns across."
Sbordone said he would "absolutely" like to see the place to nominate data be more prominent. "You also want to be able to demonstrate the value of how open data and its power can help folks that might not be super technical," said Sbordone. "I think that's kind of the next frontier of data, how do you personalize it in a way that really offers folks an appreciation of how powerful it is."
Sbordone said he couldn't comment on the details of the Open FOIL legislation. "The idea of having things prioritized based on suggestions that people make is core to not just our approach but core to the bill itself," he said, adding that he thought that if there was an online mechanism for FOIL requests it would make sense to link it to open data and use that to help prioritize releases.
"I don't think we're married to any one way," Sbordone said when asked about Kallos' data.gov proposal, noting that other options could come under consideration when the Socrata contract is up for renewal.
Going Beyond Transparency with BetaNYC and BigApps
BetaNYC is also calling on the city to improve the quality of its data. "While working to have a transparent government is a great goal, our community is concerned with new leadership and continued inconsistency of data fidelity," a BetaNYC statement e-mailed by executive director Noel Hidalgo says. "We need NYCHA to be as transparent as NYC 311. We need the NYPD to improve its crash data fidelity and to improve its crime and violations data. We need this Administration's commitment to move data beyond transparency."
BetaNYC has already started a working group focused on data needs, data accessibility and fidelity, incorporating discussions from BetaNYC's Facebook group, mailing list and weekly meetings. One issue that has recently prompted discussion about the value of open data and privacy is a visualization of a taxi driver's data created by BetaNYC co-captain Chris Whong based on data he received through a FOIL request. Another BetaNYC member, Vijay Pandurangan, had realized that the anonymization of individual taxi medallion and log information in the dataset was easily reversible.
A BetaNYC blog post also notes that this year twenty percent of the entries in New York City's BigApps competition, which encourages developers to draw on city data, came from BetaNYC members, as the competition shifted from being focused on start-ups to solving civic challenges. The full list of finalists will be announced in the coming weeks, but a weekend block party event featuring a BigApps Battlefield pitch and competition among apps with the most votes from the public has already resulted in one finalist with a BetaNYC team member: Heat Seek aims to counter New York City heating code violations by using mesh networks of temperature sensors to report heating violations, aiding the housing court process, notifying tenants, lawyers and landlords, and integrating its reports with 311 data.