The tangled web of internet tracking

Advertising can be more intrusive than analytics when it comes to privacy on the internet, and the singularity in which everything online is connected is already a reality, right here, right now.

About a week ago I installed Mozilla’s Lightbeam plug-in for Firefox. Lightbeam visualises the way you are tracked across the internet through a combination of sites, cookies and API calls. After a week of typical day-to-day use, I have taken a long hard look at the reports on my browsing habits and have had quite a few surprising revelations about tracking and privacy.

While the focus on internet privacy seems to be on tracking cookies, they are just one way of tracking behaviour across sites. Calling an API, whether something obvious such as Google Analytics or something less obvious like a web font service, could conceivably allow cross-site tracking. So if website A uses fonts from fonts.googleapis.com, the log files on the font server could conceivably be used to track you when you visit site B, which also uses fonts.googleapis.com. Similarly, many websites embed YouTube videos as a matter of course, which gives YouTube / Google a lot of visibility as to what happens across sites.
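
To make the font server example concrete, here is a minimal sketch in Python of how a shared third-party service could, in principle, link visits to different sites from nothing more than its own request log. The log entries, the IP addresses and the function name sites_per_client are all made up for illustration, and the sketch assumes that the browser sends a Referer header with each request; it does not describe what any real service actually does.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical log entries as a shared third-party server might record them:
# (client identifier such as an IP address, Referer header of the request).
SAMPLE_LOG = [
    ("203.0.113.7", "https://site-a.example/article"),
    ("203.0.113.7", "https://site-b.example/shop"),
    ("198.51.100.2", "https://site-a.example/home"),
]


def sites_per_client(log):
    """Group the referring sites by client, which is all cross-site tracking needs."""
    seen = defaultdict(set)
    for client, referer in log:
        # Keep only the host name of the page that embedded the third-party resource.
        seen[client].add(urlsplit(referer).hostname)
    return {client: sorted(hosts) for client, hosts in seen.items()}


if __name__ == "__main__":
    for client, hosts in sites_per_client(SAMPLE_LOG).items():
        print(client, "was seen on", ", ".join(hosts))
```

No cookie is involved at any point: the same client turning up with two different referring sites is enough to connect the visits.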

It was indeed a big G that was the worst offender in my Lightbeam report, but it was not the G I was expecting. I presumed Google to be the worst offender, but on my list it came in third. Lightbeam visualises sites as circles, third-party sites (referenced but not visited per se) as triangles, and cookies and API calls as lines of different colours. Larger triangles and circles mean more connections.
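
The size of each circle or triangle boils down to a count of connections. As a rough sketch of that idea, and not of Lightbeam’s actual code, the Python below counts connections per site from a list of (visited site, third-party host) pairs. The host names and the function name connection_counts are invented for the example.

```python
from collections import Counter

# Hypothetical observations: (visited site, third-party host it contacted).
EDGES = [
    ("news.example", "ads.example"),
    ("news.example", "analytics.example"),
    ("shop.example", "analytics.example"),
    ("news.example", "ads.example"),  # a repeated request counts only once
]


def connection_counts(edges):
    """Count distinct connections for every site, visited or third-party."""
    counts = Counter()
    for site, third_party in set(edges):  # set() drops duplicate pairs
        counts[site] += 1
        counts[third_party] += 1
    return counts


if __name__ == "__main__":
    for host, n in connection_counts(EDGES).most_common():
        print(host, n)
```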

The largest circle was a tie, with 46 connections each, between that champion of internet privacy, the UK’s Guardian newspaper, and fellow news site Reuters.

Google.com came only third, with 29 connections.

But perhaps most interesting was the way the graph quickly evolved over time into a single, tangled web where almost every site is connected to every other site through a third party. Forget about three degrees of separation: the vast majority of sites were just one site or API or cookie away from another. These links came in the form of known traffic analytics sites such as DoubleClick and Google Analytics (in that order), but a close third was Facebook, even though I do not visit Facebook.com, having deactivated my Facebook account as my new year’s resolution. Yes, those Facebook plug-ins that allow you to comment without logging in allow you to be tracked across sites.
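
The tangled web is what graph theory calls a connected component: a group of sites in which any one can be reached from any other by following links. The Python below is a minimal sketch of how such groups could be found with a breadth-first search. The host names and the function name clusters are hypothetical, and this is not how Lightbeam itself is implemented.

```python
from collections import defaultdict, deque


def clusters(edges, standalone=()):
    """Return groups of hosts that are linked, directly or through third parties."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    for host in standalone:
        graph.setdefault(host, set())  # sites that contacted nobody else

    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from this host.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            host = queue.popleft()
            group.append(host)
            for neighbour in graph[host]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # Hypothetical data: two sites share one analytics host, one site stands alone.
    edges = [
        ("news.example", "analytics.example"),
        ("blog.example", "analytics.example"),
    ]
    print(clusters(edges, standalone=["parcel.example"]))
```

A single shared third party is enough to merge two otherwise unrelated sites into one group, which is why the graph collapses into one mass so quickly.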

Cookies were almost secondary to API calls in terms of links, probably because of the bad publicity they have had over the years. But it is safe to say that lawmakers in the EU have overreacted to the privacy threat from cookies and have overlooked the worse threat from API calls when it comes to citizens’ privacy. Not that they really mind about a citizen’s privacy; they only mind when their politicians are bugged, but that is another matter entirely.

On a side note, while most sites have complied with EU privacy laws and now have a banner asking you to accept the use of cookies in order to use the site, it is interesting to note that most set the cookies anyway, whether or not you click accept.

So while The Guardian and Reuters take the trophy for most Big Brotherly sites, the winner for most private site that I have visited would have to be UPS. Yes, of all the mainstream sites I have used over the past week, only the boys in brown stand apart from the singularity of links, on an island of their own, not linking to anyone else, at least not in an open way that Lightbeam can analyse.

Actually, there were only two sites that I visited in the past week that were not connected to the swarming mass in the middle: UPS and tvnihon.com, a site that does fan subs of Japanese live-action fighting movies (Americanised as Power Rangers).

Honourable mention must go to Amazon. Rather than connections to a zillion other sites, Lightbeam clearly showed Amazon connecting to its own analytics and advertising sites, which were not used by anyone else. This meant that Amazon was in a cluster away from the rest of the internet. Almost, that is, except for two links into the frenzy: DoubleClick and Google syndication.

That said, perhaps with traffic analysis from the Amazon cloud, Amazon would have greater insight into our browsing habits than anyone else, hence they did not need to rely on traditional web technologies in the first place.

All the telco sites I visited were middle of the road, usually with links to the usual Google Analytics, Twitter and Facebook. On the other hand, it was the news sites, not just the Guardian, that were heavily into analytics. In a way it makes perfect sense. The cash-rich telcos are terrified of privacy scandals and act properly to avoid any controversy. Media, on the other hand, are struggling for survival and need analytics, at any cost, to sell advertisements.

A final thought is that Lightbeam shows the potential for tracking as much as it does actual tracking. It is one thing if a lot of sites use Google Analytics, which does allow for easy cross-site tracking, and quite another if many sites use fonts.googleapis.com. The former was designed for tracking; the latter is just a font service, but one that could, conceivably, be used for tracking.

So, dear reader, do take a moment to install Firefox if you have not already done so, install the Lightbeam plug-in, and see for yourself who is the biggest threat to your privacy based on your own surfing habits. I am sure that many surprises await.