With More Than 7 Million Searchable Records, Citizen Audit Makes Nonprofit Transparency Easy
BY Sam Roudman | Thursday, January 9 2014
Nonprofits comprise a boggling array of organizations. They include hospitals and universities, as well as political organizations, every stripe of interest group, and the National Football League.
“An enormous segment of the economy is exempt from taxes, which means they kind of have a burden to have increased transparency,” says Luke Rosiak, an investigative reporter at the Washington Examiner. Despite the clear public interest in making the tax forms of nonprofits readily available, and the fact that the IRS has nonprofits submit their 990 tax forms electronically, which should make them easy to post directly online, Rosiak says the IRS does not release them “in any meaningful way.” Instead, journalists, researchers, and those who work in nonprofits have to bumble through The Foundation Center’s 990 finder for scanned PDFs, or pony up serious money for easier access to the still-hard-to-sift-through PDFs with Guidestar.
To accomplish what Rosiak thinks the IRS should probably be doing already, he started a project called Citizen Audit. The site takes over a decade of nonprofit tax forms and puts them online, and is in the process of running them through computationally intensive optical character recognition software to make them fully searchable.
By making a database of 990s open to a Google-like keyword search, Citizen Audit lets anyone search for not just individual nonprofits, but the individuals working there. It also solves the problem of untangling who is funding a nonprofit.
“A group does not have to report on its 990 form where it receives money from,” says Rosiak. But “now that you can search across this body of millions of 990s, you search an organization’s name, you’re going to see its donors pop up, because their nonprofits are going to mash, they’re going to report giving to that organization.”
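The donor lookup Rosiak describes amounts to a full-text search for an organization's name across every other filer's form. A minimal sketch of that idea in Python follows; the function name, the in-memory dictionary, and the sample filings are all hypothetical and are not taken from Citizen Audit's actual code or data.

```python
from typing import Dict, List


def find_likely_donors(filings: Dict[str, str], recipient: str) -> List[str]:
    """Return filers whose 990 text mentions `recipient`.

    `filings` maps a filer's name to the searchable text of its form.
    The recipient's own filing is skipped, since it naturally names itself.
    """
    needle = recipient.casefold()
    return sorted(
        filer
        for filer, text in filings.items()
        if filer.casefold() != needle and needle in text.casefold()
    )


# Made-up sample data, for illustration only.
filings = {
    "Example Family Foundation": "Grants paid: Riverside Youth Chorus, $25,000",
    "Riverside Youth Chorus": "Program services: choir instruction",
    "Unrelated Trust": "Grants paid: City Food Bank, $10,000",
}

print(find_likely_donors(filings, "Riverside Youth Chorus"))
```

A plain substring match like this would miss abbreviations and OCR misreadings of a name, so a real search over millions of scanned forms would likely need fuzzier matching.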
The project got underway last year when Carl Malamud from Public.Resource.Org started requesting bulk data for every recent year of 990s from the IRS. For nearly $3,000, the IRS would send him a stack of 40 DVDs every month with millions of TIFF files.
“[Rosiak] saw I was posting the 990s and asked if he could put something on top of the bulk data that I was providing,” says Malamud. “I said ‘knock yourself out.’”
The requests garnered over 7 million files, going back over a decade. But they still needed to be processed. That didn’t happen until Rosiak attended a National Day of Civic Hacking event last June.
“That was the day this project got a lot of momentum,” says Rosiak. At this event, Amazon gave every attendee $100 in EC2 cloud processing credit. Rosiak saw that not everyone was going to use their gift. He rounded up $3,000 worth of credit and in a week went on a “cloud processing spree.” Using the optical character recognition software Tesseract, the forms became text-searchable.
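For readers curious what a batch OCR pass of this kind might look like, here is a rough sketch that drives Tesseract's command-line tool from Python. It assumes the `tesseract` binary is installed, that the scans sit in one directory as `.tif` files, and that the forms are in English; the directory layout and function names are illustrative assumptions, not a description of Rosiak's actual pipeline.

```python
import subprocess
from pathlib import Path
from typing import List


def tesseract_command(tiff_path: Path, out_dir: Path) -> List[str]:
    """Build the CLI call for one scan.

    Tesseract takes an input image and an output base name, and appends
    ".txt" to the base name itself.
    """
    out_base = out_dir / tiff_path.stem
    return ["tesseract", str(tiff_path), str(out_base), "-l", "eng"]


def ocr_directory(in_dir: Path, out_dir: Path) -> int:
    """OCR every .tif in `in_dir`, skipping scans already converted.

    Returns the number of files processed in this run.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = 0
    for tiff_path in sorted(in_dir.glob("*.tif")):
        if (out_dir / (tiff_path.stem + ".txt")).exists():
            continue  # Already done, so an interrupted run can resume.
        subprocess.run(tesseract_command(tiff_path, out_dir), check=True)
        processed += 1
    return processed
```

Skipping files that already have a text output lets a long-running job be stopped and restarted, which matters when millions of scans are being worked through over weeks or months.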
“That was sort of the critical mass this project needed to get off the ground,” he says.
The project continues from a computer in Rosiak’s apartment. It's custom built, with a six-core Intel processor, a water cooler, and nine hard drives. “It’s a computer that’s just running 24 hours a day churning through these docs,” he says. Rosiak also received a $5,000 grant from the Sunlight Foundation to help Citizen Audit along.
To date Citizen Audit has over seven million PDFs to search through, 2.5 million of which have been processed through Tesseract. Rosiak says he has received praise about the project from Pulitzer Prize-winning journalists, and both pro- and anti-union forces. But whether Citizen Audit will continue to grow depends on someone forking over cash to the IRS for the most up-to-date 990s.
“I’ve spent so many hours on this project, and the end result is something that is way more valuable than anything that has existed previously,” says Rosiak, “but it’s still far less useful than if the IRS just uploaded a CSV.”