The Anonymity Hoax

One of the phrases you will often hear Phorm repeat is this: "We cannot know who you are - it's impossible" (source).

It's pure nonsense.

Firstly, Phorm shouldn't have access to your private communication traffic in the first place. It's private, and should stay that way.

The internet is a vast database of unstructured text, in many languages, in many cultures, in many character sets. Some of those pages may include personal information about you, for example your blog, social networking sites, forums, customer account pages, or webmail.

The idea that a magic algorithm exists that is capable of fully anonymising that text is laughable. If it did exist, Phorm ought to patent it. It would certainly justify a Nobel prize.

Language is ambiguous. Take this piece of text:

    "Kent is in the south east of England"

Does Kent refer to the County of Kent in the south east of England, or the person who runs an adware company?


To anonymise that block of text, and ensure it was impossible to identify anyone called Kent, Phorm would need a list of every name in the world, including the name Kent. To be on the safe side, they would also have to remove the name England, since it could be someone's surname. Phorm do not have a list of every name in the world.

Or how about this snippet of info:

    "Kent works with Phorm developers"

Is that a typo? Is it form or Phorm? Phorm would need a list of every employer in the world to ensure you couldn't be linked to your workplace.

How about foreign languages? How might their anonymising process cope with this piece of French:

    "Mon nom est Jean. Mon ami s'appelle Rose"

If their code can't handle multiple languages, how will it know that the reference to Jean is not a pair of denim trousers? Does Rose refer to a popular flowering garden plant, or Jean's friend Rose?

Let's try an instant messenger ID.

    "My name is Andy Flying-Pig, and you can reach me on aflyingpig2"

Is the word aflyingpig2 an instant messenger ID? Or just a typo?

And suppose your language isn't even represented by alphabetic characters? Take Chinese, for example:

     

If you know enough Chinese, apparently that would tell you I live in Britain.

The only published information about the anonymisation process is a single paragraph from Richard Clayton's excellent report:

    "48. The page is broken up into individual "words". Words which are solely made up of digits will be ignored, words that contain an @ (assumed to be an email address) will be ignored. There is an attempt to spot names by their context (viz: ignoring material after a "Mr" or "Mrs"). Words that are not very interesting (so called "noise words" like and/but/the/or/a etc) will be discarded"

Paragraph 49 says that Post Codes will be ignored "but this has not yet been implemented".
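
To see how little those rules catch, here is a rough sketch of them in Python. It is not Phorm's code, which has never been published; it only follows the rules quoted above. The function name filter_words, the punctuation stripping, the noise word list limited to the five words the report names, and the choice to skip exactly one word after "Mr" or "Mrs" are all assumptions made for illustration.

```python
# Illustrative sketch only: follows the rules quoted from paragraph 48,
# not Phorm's actual (unpublished) implementation.

# Assumption: only the five noise words the report names explicitly.
NOISE_WORDS = {"and", "but", "the", "or", "a"}
# Assumption: the only titles checked are the two the report mentions.
TITLES = {"mr", "mrs"}


def filter_words(text):
    """Return the lower-cased words that survive the quoted rules."""
    kept = []
    skip_next = False
    for raw in text.split():
        # Assumption: surrounding punctuation is stripped from each word.
        word = raw.strip(".,;:!?\"'()")
        if not word:
            continue
        lowered = word.lower()
        if skip_next:
            # Assumption: exactly one word after a title is ignored.
            skip_next = False
            continue
        if lowered in TITLES:
            skip_next = True
            continue
        if word.isdigit():
            continue  # words made up solely of digits
        if "@" in word:
            continue  # assumed to be an email address
        if lowered in NOISE_WORDS:
            continue  # "not very interesting" words
        kept.append(lowered)
    return kept


print(filter_words("Kent is in the south east of England"))
# ['kent', 'is', 'in', 'south', 'east', 'of', 'england']

print(filter_words("My name is Andy Flying-Pig, and you can reach me on aflyingpig2"))
# ['my', 'name', 'is', 'andy', 'flying-pig', 'you', 'can', 'reach', 'me', 'on', 'aflyingpig2']

print(filter_words("Mr Smith can be reached at smith@example.com or 01234"))
# ['can', 'be', 'reached', 'at']
```

Under these assumptions the name Kent, the surname Flying-Pig and the messenger ID aflyingpig2 all pass straight through. Only the name that happens to follow "Mr", the email address and the all-digit number are dropped.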

Conclusion

Phorm should not have access to your private communication traffic, anonymously or not.

Complete anonymisation of the internet is a pure hoax. Phorm may be able to remove some personally identifiable data, but they certainly will not be able to remove all of it, and enough will leak to allow you to be identified.

You have no visibility of the resulting data they will hold about you. And you are being asked to trust a company with alleged links to malicious software, which tested its software in secret in 2006 and 2007 on hundreds of thousands of unsuspecting BT customers.

 

Why would you?