For this investigation, the NOS staged multiple first-time visits to ten thousand websites to collect the cookies these websites place on the computers of first-time visitors. We found that hundreds of websites start tracking your online behaviour at the first site visit without your permission, which is against the law.
What makes this project innovative?
For the first time, there is a bird's-eye view of cookie law compliance in the Netherlands, across different markets and ranging from small to large websites.
What was the impact of your project? How did you measure it?
Publication of the investigation's findings caused something of a stir. Some of the companies behind websites found to place tracking cookies on the first visit without consent were not aware of the problem and promised to stop using tracking cookies on the first visit as soon as possible. Other sites, which use tracking cookies to serve personalised advertising, stated that it is an inevitable part of their revenue model; some said they would 'look into the issue'. The investigation was a collaboration between NOS Stories (the team that produces news on social media for 13 to 18 year olds) and the national news desk. For NOS Stories, YouTube was the main platform: the video currently has over 117,000 views and more than 380 comments. The results of the investigation were also published in the TV and radio broadcasts, on the website and in the app, including a push message.
Source and methodology
We used a headless Chrome browser to visit a list of 10,000 URLs. The list contained the most popular websites in the Netherlands, and a selection of websites from sectors such as healthcare providers, companies and news websites. We visited each website 10 times using a script. We gathered all cookies that were placed during each visit. Then, we analysed all unique domains responsible for storing one or more cookies, separating ad networks from other types of cookie providers. We could then determine which websites placed tracking cookies during the first visit (which is not allowed) and which sites did so after the first visit, without consent (which is dubious and probably not allowed either).
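The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the script used in the investigation: the record layout (site, visit number, cookie domain), the function names and the short list of ad-network domains are all assumptions made for the example.

```python
# Illustrative list only; the list of ad networks actually used in the
# investigation is not given in the text.
AD_NETWORK_DOMAINS = {"doubleclick.net", "adnxs.com"}


def is_tracking_domain(cookie_domain, ad_domains=AD_NETWORK_DOMAINS):
    """Return True if the cookie domain is a listed ad network or a subdomain of one."""
    host = cookie_domain.lstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in ad_domains)


def classify_sites(records, ad_domains=AD_NETWORK_DOMAINS):
    """Classify each site by the earliest visit on which a tracking cookie appeared.

    records: iterable of (site, visit_number, cookie_domain) tuples, where
    visit_number starts at 1 (hypothetical layout).
    Returns a dict mapping site to 'first_visit', 'later_visit' or 'none'.
    Sites that set no cookies at all do not appear in records and are
    therefore absent from the result.
    """
    earliest = {}  # site -> lowest visit number with a tracking cookie
    sites = set()
    for site, visit, domain in records:
        sites.add(site)
        if is_tracking_domain(domain, ad_domains):
            earliest[site] = min(visit, earliest.get(site, visit))

    result = {}
    for site in sites:
        if site not in earliest:
            result[site] = "none"
        elif earliest[site] == 1:
            result[site] = "first_visit"
        else:
            result[site] = "later_visit"
    return result
```

Matching on whole domain labels (an exact match, or a suffix that starts with a dot) avoids counting an unrelated domain that merely ends in the same characters as an ad network.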
We used PHP and headless Chrome to crawl all 10,000 URLs, using two VPS instances running Ubuntu 18.04. To mask the fact that we were connecting from a VPS provider, we used a VPN connection provided by Private Internet Access. After each site visit we stored data about the number and type of cookies placed by the website. The resulting files were then combined and analysed using Python, with Pandas and regular expressions among other libraries, in a Jupyter Notebook.
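As a rough idea of what combining the per-visit data might look like, the sketch below merges visit records and counts on how many sites each cookie domain was seen. The team worked with Pandas in a Jupyter Notebook; this example uses only the Python standard library to stay self-contained, and the JSON record format and function names are assumptions, since the actual file layout is not described.

```python
import json
import re
from collections import Counter

# Matches an optional leading dot followed by an optional "www." prefix.
_PREFIX = re.compile(r"^\.?(www\.)?")


def normalise_domain(domain):
    """Lowercase a cookie domain and strip a leading dot and 'www.' prefix."""
    return _PREFIX.sub("", domain.strip().lower())


def combine_visits(json_lines):
    """Combine per-visit records and count sites per cookie domain.

    json_lines: iterable of JSON strings, one per site visit, in the
    hypothetical form
        {"site": "...", "visit": 1, "cookies": [{"domain": "..."}, ...]}
    Returns a Counter mapping each normalised cookie domain to the number
    of distinct sites on which it was seen.
    """
    sites_per_domain = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        for cookie in record.get("cookies", []):
            domain = normalise_domain(cookie["domain"])
            sites_per_domain.setdefault(domain, set()).add(record["site"])
    return Counter({d: len(s) for d, s in sites_per_domain.items()})
```

Calling `combine_visits(lines).most_common(20)` would then list the cookie domains present on the most sites, which is a natural starting point for deciding which of them belong to ad networks.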
Joost Schellevis, Danny Simons, Jorn Kompeer, Anna Pruis, Winny de Jong