
Code Warriors Debate Whitehouse.gov Robot Commands

BY Sarah Granger | Thursday, January 22 2009

As the tech community pored over the new whitehouse.gov site, one of the first under-the-hood changes noted was to a file most people would never notice, called robots.txt. This file serves as a notice to search robots, telling them which files they should or shouldn't crawl. Some observers noticed that the new version of the file had only two lines, whereas the former whitehouse.gov robots.txt had grown to nearly 2,400 exclusion lines by the end of the Bush administration. The contrast sparked excitement and controversy over what the change means in terms of government transparency.

The text from the new robots.txt file:

User-agent: *
Disallow: /includes/

A sampling from near the end of the previous file:

Disallow: /president/text
Disallow: /president/waronterror/iraq200404/text
Disallow: /president/waronterror/photoessay/text
Disallow: /president/winterwonderland/iraq
Disallow: /president/winterwonderland/text
Disallow: /president/world-leaders/iraq
Disallow: /president/world-leaders/text
Disallow: /president/worldunites/iraq
Disallow: /president/worldunites/text
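To make the difference concrete, here is a minimal sketch of how a well-behaved crawler might interpret the two sets of rules, using Python's standard `urllib.robotparser` module. The `User-agent: *` line on the old sample, the crawler name `ExampleBot`, and the specific page paths being tested are assumptions added for illustration; only the `Disallow` lines come from the files quoted above.

```python
from urllib.robotparser import RobotFileParser

# The complete new file, as quoted above.
NEW_RULES = """\
User-agent: *
Disallow: /includes/
"""

# Two lines from the old file's sample. The User-agent line is an
# assumption: the excerpt above does not show which agents it applied to.
OLD_RULES_SAMPLE = """\
User-agent: *
Disallow: /president/text
Disallow: /president/world-leaders/iraq
"""


def allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a crawler honoring `rules` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical paths, chosen only to exercise the rules.
for path in ("/includes/header.html", "/president/text/index.html"):
    print(path,
          "| new file:", allowed(NEW_RULES, path),
          "| old sample:", allowed(OLD_RULES_SAMPLE, path))
```

Rules match by path prefix, so a single `Disallow: /president/text` line covers everything beneath that directory.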

Cory Doctorow, editor of Boing Boing and former outreach director for the Electronic Frontier Foundation, was one of the first to report this finding. He presented just the facts, and a string of commenters followed, asking for explanations.

Proponents of the view that the move to the vastly smaller file was a statement about transparency were ecstatic. According to Patrick Thibodeau of ComputerWorld, New York blogger Jason Kottke "thinks that by eliminating the Bush disallow list on its first day in office, the Obama administration was sending out a symbolic message." Kottke, in his post on Tuesday, alluded to the "huge change in the executive branch of the US government." In an e-mail to Thibodeau, Kottke wrote: "One of Obama's big talking points during the campaign and transition was a desire for a more transparent government, and the spare robots.txt file is a symbol of that desire."

Presenting an alternate view, Declan McCullagh of CNET News pointed out that the Bush whitehouse.gov robots.txt file for the most part followed the letter of coder law in what it disallowed, with the exception of a few incidents that were corrected. McCullagh also raised the idea that the new robots.txt file may actually be too short: "It doesn't currently block search pages, meaning they'll show up on search engines--something that most site operators don't want and which runs afoul of Google's Webmaster guidelines."

Most of the technical experts weighing in expect the robots.txt file to grow, and most of them describe that growth as a normal process a website undergoes over time. Andy John, a search developer for DeepDyve, puts it like this: "robots.txt is just a request. Robots can do whatever they like anyway." He then went further to describe what that means. "For example, there is a program 'wget' (web get). You give it a URL, it downloads it and saves the file... You can tell it to download an entire site. It honors robots.txt by default. But by just adding these parameters you can tell it to ignore robots.txt and get everything:  wget -erobots=off http://www.whitehouse.gov"
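John's point, that compliance is entirely up to the robot, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how wget or any particular crawler is implemented: the function `plan_crawl`, its `respect_robots` flag, the crawler name `ExampleBot`, and the candidate paths are all hypothetical. No network requests are made; the rules are the two lines of the new whitehouse.gov file quoted earlier.

```python
from urllib.robotparser import RobotFileParser

# The two lines of the new whitehouse.gov robots.txt.
RULES = ["User-agent: *", "Disallow: /includes/"]


def plan_crawl(paths, respect_robots=True, agent="ExampleBot"):
    """Return the paths a crawler would fetch.

    `respect_robots` is a hypothetical switch playing the role of the
    wget option John mentions: the rules only matter if the client
    chooses to consult them.
    """
    if not respect_robots:
        # Nothing stops a client from skipping the check entirely.
        return list(paths)
    parser = RobotFileParser()
    parser.parse(RULES)
    return [p for p in paths if parser.can_fetch(agent, p)]


candidates = ["/", "/includes/nav.html"]  # hypothetical paths
print(plan_crawl(candidates))                        # polite crawler
print(plan_crawl(candidates, respect_robots=False))  # ignores the request
```

The server plays no part in either branch, which is why a robots.txt file cannot serve as an access control.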

As to why those who developed the new whitehouse.gov site would want to code it this way, Jaelithe Judy, a search engine optimization specialist and political blogger, says, "Google does generally encourage webmasters to use disallows to keep from having their search pages spidered; this is to help keep a Google search from returning a whole page of search results from other sites' internal search engines, instead of relevant original content. However, in some cases a search result from a site is a meaningful result. For instance, when you are searching for 'DVD recorders' and the Amazon search page for 'DVD recorders on Amazon' pops up, that might actually be useful to most users."

She added that "Google is still trying to work out how to sort annoying search-generated page results from the useful ones. The Whitehouse.gov ones may lean toward being useful. For instance, if you are a middle school student doing a report on the First Ladies, and you get a Whitehouse.gov search page for First Ladies, that has all sorts of different links to different sorts of information, that might actually be useful."

The bottom line about robots.txt? John says, "It's really more of a serving suggestion."
