Journalist Turns Anonymised Data into Profiles

A journalist and a data scientist secured anonymised browsing data for three million users. They created a fake marketing company to get the data and were able to de-anonymise much of it, i.e. they could identify the users.

Anonymised data means the names have been removed along with supposedly anything that makes it possible to identify the individuals.

How is that Possible?

There are various techniques that can be used to identify people in the data, such as:

  1. Anyone who visits their own Twitter analytics page will have a URL in their browsing record which contains their Twitter username. Find that URL, and you’ve linked the anonymous data to an actual person.
  2. A similar trick works for German social networking site Xing.
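The first trick above can be sketched in a few lines of Python. The URL pattern, the usernames and the sample history here are illustrative assumptions, not the researchers' actual code:

```python
import re

def twitter_handle_from_url(url):
    """Return the username embedded in a Twitter analytics URL, or None.

    Assumes analytics pages live under analytics.twitter.com/user/<name>/
    (an illustrative pattern, not guaranteed to match every variant).
    """
    m = re.match(r"https?://analytics\.twitter\.com/user/([^/]+)/", url)
    return m.group(1) if m else None

# A hypothetical anonymised browsing record for one user.
history = [
    "https://www.example-news.com/politics",
    "https://analytics.twitter.com/user/alice_example/home",
    "https://www.example-bank.de/login",
]

# One matching URL is enough to put a name to the whole record.
handles = {twitter_handle_from_url(u) for u in history} - {None}
```

The point is that a single self-identifying URL anywhere in a record links the entire record to a real person.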

For other users, a more statistical approach can be used to de-anonymise the data. Just 10 URLs can be enough to uniquely identify someone: consider how few people share your combination of employer, bank, hobby, preferred newspaper and mobile phone provider. By creating “fingerprints” from the data, it’s possible to compare them to other, more public, records of which URLs people have visited, such as social media accounts or public YouTube playlists.
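The fingerprint idea can be illustrated with a toy sketch in Python. Real matching would need proper statistics over millions of records; the helper name, the URLs and the threshold below are all invented for illustration:

```python
def looks_like_match(anon_history, public_urls, threshold=10):
    """Return (matched, overlap): does the anonymous record share at
    least `threshold` URLs with URLs publicly tied to a known person?
    """
    overlap = set(anon_history) & set(public_urls)
    return len(overlap) >= threshold, overlap

# Hypothetical anonymised record, and URLs publicly linked to one
# person (e.g. links they shared, or a public YouTube playlist).
anon_history = {
    "https://intranet.example-employer.com/",
    "https://www.example-bank.de/login",
    "https://www.example-newspaper.de/",
    "https://www.youtube.com/playlist?list=EXAMPLE123",
}
public_urls = {
    "https://intranet.example-employer.com/",
    "https://www.example-newspaper.de/",
    "https://www.youtube.com/playlist?list=EXAMPLE123",
}

matched, shared = looks_like_match(anon_history, public_urls, threshold=3)
```

With enough distinctive URLs in the overlap, the “anonymous” record is anonymous no longer.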

Journalist Svea Eckert teamed up with data scientist Andreas Dewes to acquire personal user data and see what they could extract from it. They created a fake marketing company, complete with its own website, a LinkedIn page for its chief executive and even a careers site.

The pair presented their findings at the Def Con hacking conference in Las Vegas.

They made the site full of pictures and marketing buzzwords, claiming to have developed a machine-learning algorithm which would be able to market more effectively to people, but only if it was trained with a large amount of data. Then they asked companies for anonymised data to try on their system.

The data they were eventually given came, for free, from a data broker, which was willing to let them test their hypothetical AI advertising platform.

Another discovery through the data collection occurred via Google Translate, which stores the text of every query put through it in the URL. From this, the researchers were able to uncover operational details about a German cybercrime investigation, since the detective involved was translating requests for assistance to foreign police forces.
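Recovering the translated text from such URLs is straightforward. This sketch assumes the query is carried in a `text=` parameter, as in some versions of the Google Translate URL format; the example URL itself is invented:

```python
from urllib.parse import urlparse, parse_qs

def translate_query_text(url):
    """Return the text carried in a URL's 'text=' query parameter, if any."""
    params = parse_qs(urlparse(url).query)
    return params.get("text", [None])[0]

# A hypothetical Google Translate URL captured in browsing data.
url = "https://translate.google.com/?sl=de&tl=en&text=vertrauliche+Anfrage&op=translate"
```

Any URL that embeds user input in this way leaks that input to whoever sees the browsing record.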

Where did all of the data come from? Browser plugins are one major source; Google Translate and many individual websites also collect browsing data.

The data is supposed to be anonymised before it is passed on, so that no-one can identify the individuals, but this research shows that is clearly not the case.

Do leave a comment on this post – click on the post title then scroll down to leave your comment.
