By Joshua Keating
WASHINGTON - If you've been following the fallout from last week's NSA surveillance revelations, you may have seen repeated references to a certain "recent MIT study."
"Unique in the Crowd: The Privacy Bounds of Human Mobility," published in Nature's Scientific Reports last year, has been cited by multiple media sources, including this one, as evidence for why your metadata matters. Indeed, re-examined in light of the current headlines, the concerns raised by the study seem quite prescient.
The paper's authors, Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen and Vincent D. Blondel of MIT and the Université Catholique de Louvain, examined a dataset of 15 months of anonymous cellphone data from 1.5 million people in a "small European country."
There were no names, addresses or phone numbers in the data, yet the authors argue that "if individual's patterns are unique enough, outside information can be used to link the data back to an individual." Just four points of observation (each one the time of a call and the nearest cellphone tower) were enough to identify 95 percent of people in the database.
In other words, if I make four calls from four different places over the course of a 15-month period, my pattern of movement could be identified out of a population 2 1/2 times the size of Washington, D.C. If you were able to cross-reference that with my Twitter feed, say, you'd be able to build a pretty good picture of who I am. The pattern still worked when the researchers "coarsened" their sample by using less specific time observations and lumping multiple cellphone towers into one. The way we move through the world and share data while we're doing it is pretty distinctive, even at high altitude.
"We use the analogy of the fingerprint," said de Montjoye in a phone interview this week. "In the 1930s, Edmond Locard, one of the first forensic science pioneers, showed that each fingerprint is unique, and you need 12 points to identify it. So here what we did is we took a large-scale database of mobility traces and basically computed the number of points so that 95 percent of people would be unique in the dataset."
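For readers who want to see the idea in miniature, here is a rough sketch of that kind of uniqueness test. It is not the researchers' code: the function names (`unicity`, `coarsen`), the three made-up users and the representation of a trace as a set of (hour, tower) pairs are all assumptions chosen for illustration. It draws a few points from each person's trace, checks whether anyone else's trace contains the same points, and then repeats the check after lumping hours and towers together.

```python
import random

def is_unique(traces, user, points):
    """True if `user` is the only person whose trace contains every given point."""
    matches = [u for u, trace in traces.items() if points <= trace]
    return matches == [user]

def unicity(traces, p, rng):
    """Fraction of users singled out by p points drawn at random from their own trace."""
    eligible = [u for u, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # Sort first so sampling is reproducible for a seeded rng.
        points = set(rng.sample(sorted(traces[user]), p))
        if is_unique(traces, user, points):
            unique += 1
    return unique / len(eligible)

def coarsen(traces, hours_per_bin, towers_per_zone):
    """Blur each (hour, tower) point into a wider time bin and a larger zone."""
    return {
        user: {(hour // hours_per_bin, tower // towers_per_zone) for hour, tower in trace}
        for user, trace in traces.items()
    }

# Invented toy data: each trace is a set of (hour of call, tower id) pairs.
toy_traces = {
    "alice": {(8, 1), (12, 2), (18, 3)},
    "bob": {(8, 1), (12, 2), (19, 4)},
    "carol": {(9, 5), (13, 6), (20, 7)},
}

rng = random.Random(0)
print(unicity(toy_traces, 3, rng))                   # all three points known
print(unicity(coarsen(toy_traces, 24, 10), 1, rng))  # heavily blurred data
```

In this toy example, knowing all three points singles out every user, while blurring the data to one time bin and one zone makes everyone look alike. The study's finding was that real mobility data stays distinctive under far more blurring than one might expect.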
Hidalgo says that because phone companies like Verizon need to keep this kind of data for billing and customer service purposes, it seemed inevitable that it would sooner or later be put to questionable use. "It felt quite natural that something like this was taking place, but the scale was certainly surprising," he said.
At the very least, de Montjoye hopes that this will be a wake-up call for the public to realize just how much information they're sharing. The study "shows that when it comes to rich metadata datasets, there are no clear cuts between anonymous and not anonymous data. Achieving anonymity is really hard and might even be algorithmically impossible."