How Bike Share Data Can Share Your Identity Too
BY Sam Roudman | Thursday, April 17 2014
One of the benefits of opening civic data is that it can provide a detailed picture of who is using what service. This can be a vital tool for planners and bureaucrats allocating ever-scarcer resources, and a boon to entrepreneurs, civic hackers, and anyone waving a flag for more transparent and responsive cities. But if that picture is too detailed, openness gives way to intrusion.
Take London's bike share, for instance. Software engineer James Siddle took the available open data and used it to paint frighteningly accurate maps of the movements of individual customers.
"Someone who has access to the data can extract and analyse the journeys made by individual cyclists within London during that time," writes Siddle in his post, "and with a little effort, it's possible to find the actual people who have made the journeys."
Siddle's maps show the routines of various "Boris Bike" customers (the bikes are named after the city's beloved mayor). They reveal where an individual rider travels, how many trips they take, and when they take them. Siddle created filters for weekday and weekend travel, so you can distinguish between someone's work and leisure routines. One-way trips are shown in orange, round trips in purple.
Although the information is ostensibly anonymous, Siddle points out that it would take just "one more piece of information" to identify a rider, whether a Facebook or Foursquare check-in, a tweet, or a geo-coded Flickr post.
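To make the "one more piece of information" point concrete, here is a minimal Python sketch of that kind of linkage, assuming a hypothetical trip table in which each rider carries a pseudonymous token. The field names, station names, timestamps, and the ten-minute matching window are all invented for illustration and are not taken from the London dataset. If a public check-in narrows the candidates to a single token, every other trip under that token now belongs to a known person.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trip:
    rider_token: str      # hypothetical pseudonymous ID, not a real dataset field
    start_station: str
    start_time: datetime


def link_checkin(trips, station, when, window=timedelta(minutes=10)):
    """Return the tokens of riders whose trips start at `station`
    within `window` of a public check-in posted at time `when`."""
    return {
        t.rider_token
        for t in trips
        if t.start_station == station and abs(t.start_time - when) <= window
    }


def trips_for(trips, token):
    """Every trip recorded under one pseudonymous token."""
    return [t for t in trips if t.rider_token == token]


# Made-up sample data for illustration only.
trips = [
    Trip("r1", "Waterloo Station", datetime(2014, 4, 14, 8, 2)),
    Trip("r2", "Hyde Park Corner", datetime(2014, 4, 14, 8, 4)),
    Trip("r1", "Hyde Park Corner", datetime(2014, 4, 14, 18, 30)),
    Trip("r3", "Waterloo Station", datetime(2014, 4, 14, 9, 30)),
]

# A hypothetical tweet: "Grabbing a bike at Waterloo" posted at 08:05.
matches = link_checkin(trips, "Waterloo Station", datetime(2014, 4, 14, 8, 5))
if len(matches) == 1:
    (token,) = matches
    print(token, trips_for(trips, token))  # the rider's whole history
```

A single match is the worrying case: one public post turns a pseudonym into a name, and the rest of the dataset fills in that person's routine.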
But surely bike share data is more secure in the US? Well, sort of. I asked Siddle to look at bike share data from New York City and the Bay Area, and he said the datasets leave potentially revealing details exposed:
"The New York data includes dates of birth and gender information, which allows journeys to be put into buckets, [but full] disaggregation will be difficult. The San Francisco data includes zip codes for annual subscribers; this could actually be an issue, for example, if there's only one annual subscriber in a particular zip code."
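Siddle's zip code example is a question of bucket size, the idea behind k-anonymity: any group smaller than some threshold k is a privacy risk, and a group of one identifies a person outright. The Python sketch below counts records per bucket and flags the small ones. The subscriber records, their field names, and the choice of k are hypothetical and only illustrate the check; they are not drawn from the New York or San Francisco data.

```python
from collections import Counter


def small_buckets(records, key, k=5):
    """Group records by key(record) and return {bucket: count} for every
    bucket with fewer than k members. A count of 1 means the attribute
    combination singles out exactly one person. k=5 is an arbitrary default."""
    counts = Counter(key(r) for r in records)
    return {bucket: n for bucket, n in counts.items() if n < k}


# Hypothetical annual-subscriber records (made-up values).
subscribers = [
    {"zip": "94107", "birth_year": 1980, "gender": "F"},
    {"zip": "94107", "birth_year": 1985, "gender": "M"},
    {"zip": "94107", "birth_year": 1980, "gender": "F"},
    {"zip": "94110", "birth_year": 1972, "gender": "M"},
]

# Zip code alone: the lone subscriber in 94110 stands out.
print(small_buckets(subscribers, key=lambda r: r["zip"], k=2))

# Combining attributes shrinks the buckets further.
print(small_buckets(
    subscribers,
    key=lambda r: (r["zip"], r["birth_year"], r["gender"]),
    k=2,
))
```

The second call shows why combining fields matters: zip code alone isolates one subscriber, but adding birth year and gender isolates two.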
Wonderful. But how much of a danger is it?
"In my opinion people should be concerned, but it's a grey area," says Siddle.
"By itself, this data may be fairly innocuous, but the issue comes when you combine it with other datasets, datasets that may be either leaked or released intentionally."
Siddle's work shows that open data is not an automatic win for citizens, governments, and businesses. It requires a fine-tuned balance between transparency and privacy. So, riders beware: when you share a bike, you share your information as well.