
The Library of Congress is Archiving 170 Billion Tweets — on Tape

BY Julia Wetherell | Monday, January 7 2013

The Library of Congress announces an update on the Twitter archive (on Twitter).

When the Library of Congress teamed up with Twitter in 2010 to archive four years’ worth of activity on the microblogging platform, the aim was to preserve a slice of early-millennial life for future researchers. Now the two-hundred-year-old institution is grappling with the resulting 133 terabytes of data, a bundle that includes every 140-character message sent during Twitter’s six-year history, from its inception in spring 2006 to December 2012.

While the initial agreement called for archiving tweets up to 2010, the LOC subsequently decided to extend the project indefinitely, keeping up with the nearly half-billion tweets dispatched every day. There will be a six-month delay before new tweets enter the archive, an interesting statement on our contemporary definition of “history.” The struggle now is for the LOC to create a keyword-searchable catalog for the vast amount of metadata associated with the archive, including the time and location data that tie tweets to certain events, such as election night in 2008. However, as the LOC’s recent report confirms, its agreement with Twitter states that the “Library cannot provide a substantial portion of the collection on its web site in a form that can be easily downloaded.” Therefore, it goes on to say, the archive will exist primarily in the physical realm: on tape.

The technical infrastructure for the Library’s Twitter archive follows the same general practices for monitoring and managing other digital collection data at the Library. Tape archives are the Library’s standard for preservation and long-term storage. Files are copied to two tape archives in geographically different locations as a preservation and security measure.

While this storage method seems oddly similar to creating nuclear-winter-proof seed vaults, it does anticipate the fact that, with all of our cloud-tending tech, we don’t exactly have a permanent record for our online life. For that, future historians may have the Library of Congress to thank.

The headline of this post has been corrected. An earlier version implied that the Library of Congress report said Twitter posts were being stored on an analog medium, which the report does not say.
