The fleet of yellow taxicabs ever-present on New York City streets is as quintessentially New York as "I Love NY" T-shirts and Statue of Liberty paperweights. Almost everyone who lives in New York, from regular folks to investment bankers to A-list celebrities, has taken a ride at least once. The cabs are an institution, and as such, they're subject to a lot of rules, regulations and surveillance.
For instance, did you know that every single taxi ride is cataloged by the city's Taxi and Limousine Commission? Each record includes the date, time, GPS location, fare and tip. Because that information is a public record, anyone can ask for it through a freedom-of-information request.
One data analyst wanted to track a year's worth of cab activity and display it as a cool time-lapse map. You can check out his work here - it's fascinating. He had no idea, however, that his work would expose celebrities' private cab rides for the whole Internet to peruse.
Using the public information he requested and posted online, you can figure out exactly when famous people rode cabs, where they were going, how much they paid and how much they tipped. The gossip site Gawker reported that movie stars like Bradley Cooper and Jessica Alba took cab rides and possibly didn't tip the drivers.
Personally, I don't think it's any of our business how much people may or may not have tipped, whether they're famous or not. On top of that, we don't have any way of knowing whether these actors left a cash tip. It's all very gossipy and I don't like it.
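What I am interested in is how this public information was leveraged by hackers to invade these people's privacy. It's all thanks to weak anonymization. When the Taxi and Limousine Commission handed over the data to the analyst, it tried to anonymize cab medallion numbers by scrambling them with an algorithm called MD5. MD5 is a hashing function rather than true encryption, and the same input always produces the same scrambled output. Because there are only so many possible medallion numbers, a hacker can run every one of them through MD5 and match the results against the database. All you need to know is that this makes the scrambling easy to undo.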
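If you're curious what "easy to crack" looks like in practice, here is a minimal Python sketch of the general idea. It is not the code anyone actually used. It assumes, purely for illustration, that a medallion number is one digit, one letter and two more digits (real medallion formats may differ), and the medallion "7B82" is made up.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def md5_hex(text: str) -> str:
    """Return the MD5 hash of a string as lowercase hex."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def build_lookup() -> dict:
    """Hash every possible medallion and remember which hash came from which.

    Assumed format (illustrative only): digit, letter, digit, digit.
    That is 10 * 26 * 10 * 10 = 26,000 candidates, a trivial amount of work.
    """
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        medallion = f"{d1}{letter}{d2}{d3}"
        table[md5_hex(medallion)] = medallion
    return table


lookup = build_lookup()

# Stand-in for a "anonymized" value found in the released data (made up).
hashed_value = md5_hex("7B82")

# Reversing the hash is now just a dictionary lookup.
print(lookup[hashed_value])  # prints 7B82
```

The point of the sketch is that the hash itself never has to be "broken." With so few possible inputs, you can simply try them all.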
Even though no passengers' identities were recorded in the database, it didn't matter. If you had a time-stamped photograph of a person getting into a cab with the medallion number recorded, you could easily track that person's entire ride and learn what they tipped. But how would you be able to find such a photograph? Gawker writes:
[An analyst] had realized that paparazzi photographers in New York City frequently capture celebrities entering or exiting yellow taxi cabs, and that many of their pictures depicted the cab’s unique medallion number. After all, the number is prominently displayed on the car’s exterior: In lit letters on top, in black paint on the side, and on both license plates. You can spot the cab’s medallion number in every photograph in this post.
This analyst cross-referenced the pictures with the database to identify Bradley Cooper and Jessica Alba's tips. It was all part of a larger piece about the dangers data analysis poses to privacy. After all, the identities of taxi passengers are not part of the public record.
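To show how little effort that cross-referencing takes once the medallion numbers are readable, here is a rough Python sketch. Everything in it is hypothetical: the `Trip` fields, the sample rides, the five-minute matching window and the photo details are all invented for illustration and are not taken from the real database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trip:
    medallion: str      # de-anonymized medallion number
    pickup: datetime    # when the ride started
    fare: float
    tip: float


def match_sighting(trips, medallion, seen_at, window=timedelta(minutes=5)):
    """Return trips by this cab that started within `window` of the photo time."""
    return [
        trip for trip in trips
        if trip.medallion == medallion and abs(trip.pickup - seen_at) <= window
    ]


# Made-up sample records standing in for the released trip data.
trips = [
    Trip("7B82", datetime(2013, 7, 8, 19, 30), fare=10.50, tip=0.00),
    Trip("7B82", datetime(2013, 7, 8, 21, 15), fare=22.00, tip=4.40),
    Trip("3C41", datetime(2013, 7, 8, 19, 32), fare=8.00, tip=1.60),
]

# Made-up photo: cab 7B82 photographed at 7:33 p.m. on the same day.
matches = match_sighting(trips, "7B82", datetime(2013, 7, 8, 19, 33))
for trip in matches:
    print(trip.pickup, trip.fare, trip.tip)
```

A medallion number and a rough timestamp are enough to narrow thousands of rides down to one. Keep in mind that the tip field only reflects what was recorded, so a cash tip would not show up.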
Of course, that was all Gawker needed to start identifying celebrities and posting their fares and tips. You probably won't be surprised that most of them are very fair tippers. Again, that's not the point.
As more and more data is collected by computers and parsed for trends by data analysts, we're going to run into privacy issues like this. Should legislation restrict the flow of data under freedom-of-information laws? Or do we just need stronger anonymization?
Imagine if a stalker snapped a photo of you in New York getting into a cab. They wouldn't need to follow you - they could just track you online later. None of these questions has a simple answer. After all, the data we collect in bulk can be very useful and helpful.
What do you think? Do we need more restrictions on what government information is available to the public? Or simply tighter security standards? Let me know in the comments below how you would handle this complex privacy issue.