Voter File 2.0: Catalist, Democratic Tool
BY Micah L. Sifry | Friday, May 9 2008
I'm in a breakout session at the New Democratic Network's daylong conference on "New Tools, New Audiences," listening to Vijay Ravindran, the CTO of Catalist, talk about web 2.0 and Catalist's development of an "Enhanced Voter File." As usual, these are my rushed notes, and at best a good paraphrase of what was said, not direct quotation.
The traditional voter file, which is collected by state bodies, contains just name, contact info, party registration and past voting behavior.
The enhanced voter file, something that Democrats, Republicans and sometimes other organizations build and maintain, contains commercial data, census data, historical information about your behavior, and specialized data (like lifestyle choices). (Vijay notes, later in the Q&A, that this kind of data is often a weak indicator of people's actual political preferences, and hypothesizes that someday under an OpenID framework, campaigns or organizations like his might be more interested in highly accurate information that individuals volunteer about themselves.)
Enhanced voter files are used for canvassing, and also for modeling campaigns.
Catalist is building on the lessons of 2004 (when Democrats had a database meltdown) and working to build a 50-state national database:
Catalist's voter file has the names of 180 million registered voters, plus 75 million unregistered people (for use by voter registration groups), enhanced with commercial data and specialty data (like who holds hunting licenses). Catalist has integrated it with the Democrats' VAN application and with a tool for subscribers to mine the data.
Catalist's goal is to be a permanent piece of progressive infrastructure. Vijay talks about Tim O'Reilly's "What is Web 2.0" paper as his "baseline in driving Catalist." So he goes through some of O'Reilly's key points about the development of web 2.0.
1. The web as a platform: Examples are Amazon, Facebook, Google. For Catalist, this means: if our data is inaccessible, it doesn't exist. (This is another reference to the Democratic data disaster of 2004.) A database administrator is a poor excuse for an interface where people can self-administer. From the very beginning, having a back end web interface was essential. We also created a web services API for progressive organizations with technical staff.
2. Harnessing collective intelligence. Examples are Wikipedia, Amazon, Flickr. For Catalist, this means storing, organizing and utilizing in perpetuity the collective personal data of its customers. It means removing the technical limitations around cooperation, building value-added meta-data that no one else can, and relying on its customers to make its data more correct. (This is sort of the open source model for bug fixing, but Catalist isn't an open product. But inside its ecosystem, it sounds like it's applying the same logic.)
3. Data is the next "Intel inside." For Catalist, this means instant access to information about nearly everyone over 18 in the US in a single format, easy upload of proprietary data for integrated data mining, giving back unique identifiers for each data point, and the creation of a proprietary matching system for the data. What this means is that you can combine field canvass lists, fundraising, membership, polling info and online engagement, to get a 360-degree view of people and figure out more ways to engage them (or ask them for money, he notes, if you haven't already).
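To make the matching idea concrete, here is a minimal sketch of how several uploaded lists might be tied to one identifier per person. This is not Catalist's matching system, which is proprietary and was not described in detail. The function names, the field names (`voter_id`, `name`, `zip`) and the naive name-plus-ZIP match key are all assumptions made for illustration.

```python
from collections import defaultdict


def match_key(record):
    """Hypothetical match key: whitespace-normalized lowercase name plus ZIP."""
    name = " ".join(record["name"].lower().split())
    return (name, record["zip"].strip())


def build_profiles(voter_file, **client_lists):
    """Attach each uploaded record to a voter ID and merge lists per person.

    Returns (profiles, unmatched). Real matching would need fuzzy logic,
    addresses, birth dates and so on; this sketch uses exact keys only.
    """
    # Index the voter file by match key so lookups are a dictionary hit.
    index = {match_key(v): v["voter_id"] for v in voter_file}

    profiles = defaultdict(dict)
    unmatched = []
    for source, records in client_lists.items():
        for rec in records:
            voter_id = index.get(match_key(rec))
            if voter_id is None:
                unmatched.append((source, rec))
                continue
            # Keep only the list-specific fields under the source's name.
            extra = {k: v for k, v in rec.items() if k not in ("name", "zip")}
            profiles[voter_id][source] = extra
    return dict(profiles), unmatched


# Made-up sample data, for illustration only.
voter_file = [
    {"voter_id": "V001", "name": "Pat Smith", "zip": "10001"},
    {"voter_id": "V002", "name": "Lee Jones", "zip": "60601"},
]
canvass = [{"name": "pat  smith", "zip": "10001", "support": "leaning yes"}]
donations = [
    {"name": "Pat Smith", "zip": "10001 ", "amount": 50},
    {"name": "Sam Doe", "zip": "94105", "amount": 25},
]

profiles, unmatched = build_profiles(
    voter_file, canvass=canvass, donations=donations
)
```

In this toy run, the canvass and donation records for Pat Smith end up under the same ID, which is the "360-degree view" in miniature, while the donor who is not in the voter file lands in `unmatched` instead of being silently dropped.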
4. End of the software release cycle. Examples are eBay and Netflix, which make their fixes in real time online. The same can be true for politics. Data can be updated in critical election months; no more stale data. And new features and bug fixes need to be deployed rapidly. (No more 2004 horror shows for the Democratic side, in essence. He draws a parallel to Christmas season at Amazon.)
5. Lightweight Programming Models. Examples are Amazon Web Services, YouTube's embed feature. Catalist's approach is to not try to do everything. Their web service is designed to let others' creativity take advantage of it, like MoveOn's "Vote Poke" application. Their formats are usable by microtargeters, and other groups can syndicate their data (like America Votes).
6. Software Above the Level of the Single Device. They're making application configurations for field, analytics, fundraising, strategy, pollsters, etc. Vijay mentions the need to make these more BlackBerry-friendly, given how many political staffers have them.
7. Rich user experiences. Catalist's Q tool has a professional UI design, maps, drag-and-drop crosstabs, inline updated counts and access control for organizations.
Where is this going? With more data and more collaboration, and web services, more innovative applications will get built both by Catalist and others. He sees the semantic web coming, intelligent crowd sourcing, integrated web mining...and ultimately more progressive power.
Question time. I ask about their business model. Subscriptions to Catalist are $25K to $400K per year. Several hundred organizations are clients. About 40 people are on staff. The database is 15 terabytes in size. He analogizes it to an electric company, where no one org would ever have the wherewithal to build one, but it is essential infrastructure.
I also ask: "How do they ensure that they're only selling their services to progressives?" He says they haven't hit the interesting question yet of "what if Joe Lieberman wants our services?" It would be up to the board. AARP would be considered progressive, he thinks.
In terms of how information is shared internally: A lot of nonprofits that use Catalist's data release their own results back for others to use, such as Women's Voices, Women's Vote, one of their clients. By and large, donor data and membership info tend to be kept private by clients.
A very impressive presentation. Ravindran, who left Amazon to lead Catalist's technology team, has clearly brought the wisdom of Silicon Valley to the political infrastructure business. He's definitely someone to watch.