Anonymous data isn't really so anonymous after all, says Microsoft

Even anonymised data can leak information about IP addresses and cookies

Data routinely gathered in Web logs - IP address, cookie ID, operating system, browser type, user-agent strings - can threaten online privacy because they can be used to identify the activity of individual machines, Microsoft researchers say.

At the same time, analysis of such data when anonymised can help detect malicious activity and so improve overall Internet security, they add.

The researchers found that 62 percent of the time, HTTP user-agent information alone can accurately tag a host. Combine that same information with the IP address, and the accuracy jumps to 80.6 percent. If the user-agent information is combined with just the IP prefix, the accuracy is still 79.3 percent, they say.

The highest accuracy came when more than one user ID was linked to a single host, as would be the case in a family that shares a single computer. In such cases, the multiple IDs together accurately represent that one host computer, with an accuracy rate of 92.8 percent.

The analysis of this seemingly benign information was based on a month - August 2010 - of anonymised Hotmail and Bing data on hundreds of millions of users. The researchers say they tried to find out whether a single piece of log data can uniquely reveal a particular host.

They found that even anonymised data can leak information. For example, replacing an IP address with its IP prefix still leaves enough information to be revealing when combined with other commonly logged factors. "[C]oarse grained IP prefixes achieve similar host-tracking accuracy to that of precise IP address information when they are combined with hashed [user-agent] strings," the researchers say.
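To make the idea concrete, here is a minimal sketch of how a coarse IP prefix and a hashed user-agent string could be combined into a single tracking key. It is an illustration only, not the researchers' method: the /24 prefix length, the use of SHA-256 and the function names `ip_prefix`, `hash_user_agent` and `tracking_key` are all assumptions, and the addresses come from ranges reserved for documentation.

```python
import hashlib
import ipaddress


def ip_prefix(ip, prefix_len=24):
    """Return the network prefix containing ip (prefix length is an assumption)."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)


def hash_user_agent(user_agent):
    """Replace the raw user-agent string with a SHA-256 digest (assumed hash)."""
    return hashlib.sha256(user_agent.encode("utf-8")).hexdigest()


def tracking_key(ip, user_agent):
    """Combine the coarse prefix and the hashed user agent into one key."""
    return (ip_prefix(ip), hash_user_agent(user_agent))


# Hypothetical log entries: same user agent, different addresses in one /24.
ua = "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"
key_a = tracking_key("203.0.113.7", ua)
key_b = tracking_key("203.0.113.200", ua)

print(key_a == key_b)  # prints True: both entries collapse to the same key
```

Neither field in the key is the original value, yet the pair can still separate one machine from most others, which is the kind of leak the researchers describe.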

They looked at data gathered from application-layer events directed at Web servers within the Hotmail and Bing networks.

From Hotmail, they gleaned coarse data about the OS and browser types, source IP address, time of login and anonymised user IDs. From Bing, they gathered anonymised HTTP user-agent strings, source IP addresses of queries, times of queries, anonymised cookie IDs issued by Bing and creation dates of the cookies.

The researchers set out to detail how much identifying information gets revealed by common identifiers. They weren't trying to discover specific individuals' activities, but to understand the patterns of aggregated activities and explore their implications.

The researchers say their use of the data falls within Microsoft's privacy policies and that, under those policies, the data can't be made available to outside researchers.

They found that service providers can recognise 88 percent of devices that receive a cookie, clear the cookie, then return to the site, if the providers examine other identifying factors gathered during the initial connection. Even users who switch to private browsing mode, which is designed to protect user identity, can still be identified, the researchers say.
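As a rough illustration of how a cleared cookie might be linked back to an earlier one, the sketch below matches cookie IDs that appear with the same IP prefix and hashed user agent. The article does not describe the researchers' actual technique, so the function `relink_cookies`, the event layout and the sample values are all hypothetical, and this naive matching would wrongly merge distinct hosts that happen to share both attributes.

```python
def relink_cookies(events):
    """Link each new cookie ID to the first cookie seen with the same attributes.

    events: iterable of (cookie_id, ip_prefix, ua_hash) tuples in time order.
    Returns a dict mapping a later cookie ID to the earlier one it matches.
    """
    first_cookie_for_key = {}
    links = {}
    for cookie_id, prefix, ua_hash in events:
        key = (prefix, ua_hash)
        # Remember the first cookie observed for this attribute pair.
        earlier = first_cookie_for_key.setdefault(key, cookie_id)
        if earlier != cookie_id and cookie_id not in links:
            links[cookie_id] = earlier
    return links


# Hypothetical log: cookie c3 appears after c1 was cleared on the same host.
events = [
    ("c1", "203.0.113.0/24", "ua-hash-a"),
    ("c2", "198.51.100.0/24", "ua-hash-b"),
    ("c3", "203.0.113.0/24", "ua-hash-a"),
]
links = relink_cookies(events)
print(links)  # prints {'c3': 'c1'}
```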

"Our analysis suggests that users who do not wish to be tracked should do much more than clear cookies," the researchers say, and note that in some circumstances clearing cookies can help identify a particular host. "Uncommon behaviors such as clearing cookies for each request may instead distinguish a host from others who do not do so."

The researchers did offer some tips for maintaining anonymity:

* Use a browser whose default user agent string is popular, making that string less useful for identifying your machine in particular.

* Even when using anonymous routing like Tor, use tools such as Torbutton to manage identity information.

* Consider using proxies.
