The Library of Congress is Going to Archive Twitter. Why?

BY Nancy Scola | Wednesday, April 14 2010

The Library of Congress just tweeted this: "Library to acquire ENTIRE Twitter archive -- ALL public tweets, ever, since March 2006! Details to follow."

Why would the Library of Congress want to become the repository for millions of tweets sent back and forth around the world over the last four years? Well, the LOC has embraced a mission of becoming the repository for the record of American history, and that's a history that has been unfurling increasingly online in the past several years. Still, it does raise questions. (In fact, enough so that when I first saw this tweet, I assumed that it was mistaken. But there it is, on @librarycongress.) Is this going to be a continuously updated tweet archive? Will it be searchable by researchers? Will it track retweets and responses? Why not archive Facebook?

And then there's the fact that this seems to tweak the social contract between Twitter and users a bit. Tweets have at least the feel of being ephemeral, even if we all know that that data is capable of being archived.

With this move, the Library of Congress is taking the vacuum approach. More data is better, and the Library puts itself in a better position to be a resource the more information it contains. But you can look at the legal profession, and how it has struggled somewhat to adapt to electronic discovery, and start to think that something like this isn't without risks. Beyond that, are you at all freaked out that the occasionally ridiculous things you tweet are going to be housed for all time in the nation's library?

But that's all uninformed speculation. This news is all of ten minutes old. Indeed, stay tuned.

UPDATE: The Library of Congress has posted on their blog the promised details on their plan to archive Twitter. Alas, at the moment the site is kaput from the traffic. Guess people are interested.

AND ANOTHER UPDATE: The LOC's blog post is back up, with promises of more detail to follow. The Library's Director of Communications Matt Raymond writes:

We will also be putting out a press release later with even more details and quotes. Expect to see an emphasis on the scholarly and research implications of the acquisition. I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I’m certain we’ll learn things that none of us now can even possibly conceive.


So if you think the Library of Congress is “just books,” think of this: The Library has been collecting materials from the web since it began harvesting congressional and presidential campaign websites in 2000. Today we hold more than 167 terabytes of web-based information, including legal blogs, websites of candidates for national office, and websites of Members of Congress. We also operate the National Digital Information Infrastructure and Preservation Program, which is pursuing a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations.

Twitter's Biz Stone spells out some of the nuances of the arrangement:

It is our pleasure to donate access to the entire archive of public Tweets to the Library of Congress for preservation and research. It's very exciting that tweets are becoming part of history. It should be noted that there are some specifics regarding this arrangement. Only after a six-month delay can the Tweets be used for internal library use, for non-commercial research, public display by the library itself, and preservation.
