Who Controls 'Twistory?'
BY Nick Judd | Thursday, May 5 2011
University of Massachusetts political science professor Stuart W. Shulman has built software for doing textual analysis on large amounts of data. So when he saw an explosion of Twitter activity around the death of Osama bin Laden, he saw the opportunity to collect a new "first draft of history," and turned his tools, connected to Twitter via API, to scraping the service.
The result is about a gigabyte of data: a large collection of tweets containing the words "osama" or "bin laden." Twitter has demanded that he stop sharing it. Shulman had been giving the data away while offering academics licenses for his textual analysis software to work with it; Twitter contacted him and accused him of violating its terms of service. Shulman has since removed the link to the dataset from his website.
Shulman is frustrated:
Now Twitter telss [sic] us not to share large collections. In my view, this is prime historical data (Twitter-History a.k.a. ‘Twistory’) that yearns to be free.
Now Shulman's scholarly interests appear to be directly opposed to those of Twitter, which has an obligation to protect both its commercial viability (all it has to sell against is the content passing through its platform) and the privacy of its users. Facebook, which restricts the use of its API to store user data for prolonged periods, takes a similar stance.
One of the most interesting things about the digitization of primary source material is the immediacy with which a moment can be analyzed — but if Shulman waits, he should also be able to get that data through the Library of Congress.