{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# WhoTracks.Me May Update\n", "\n", "*This post is one of our regular monthly blogs accompanying an update to the data\n", "displayed on WhoTracks.Me. In these posts we introduce the data that has been added and\n", "point out interesting trends and case studies we found in the last month. Previous\n", "months' posts can be found here: [April 2018](./update_apr_2018.html),\n", "[February 2018](./update_feb_2018.html), [January 2018](./update_jan_2018.html),\n", "[December 2017](./update_dec_2017.html).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "This month we update the site with data from 340 million page loads during April 2018. We expand\n", "the number of trackers shown to 951, and the number of websites to 1330. As April is the last\n", "full month before the [GDPR](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation)\n", "comes into force for European users, it provides a benchmark for assessing whether the regulation\n", "has an observable effect on the tracking ecosystem.\n", "\n", "This month also saw our new paper **\"WhoTracks.Me: Monitoring the online tracking landscape at scale\"**\n", "published on [arXiv](https://arxiv.org/abs/1804.08959). 
This paper covers the methodology behind\n", "the data we collect here, and how we ensure no private information can be leaked during this\n", "process.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from plotly.offline import init_notebook_mode, iplot, offline\n", "\n", "import pandas as pd\n", "import cufflinks as cf\n", "\n", "# Enable plotly rendering inside the notebook\n", "init_notebook_mode()\n", "cf.set_config_file(offline=False, world_readable=True, theme='pearl')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data available for months: ['2017-05', '2017-06', '2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12', '2018-01', '2018-02', '2018-03', '2018-04']\n" ] } ], "source": [ "from whotracksme.data.loader import DataSource\n", "# Load the WhoTracks.Me dataset; this prints the months for which data is available\n", "data = DataSource()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notable Changes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual, the sites with the most notable changes this month are listed below. The\n", "largest increase in the average number of trackers per page load was measured on\n", "[markt.de](https://whotracks.me/websites/markt.de.html), and the largest decrease on\n", "[babbel.com](https://whotracks.me/websites/babbel.com.html)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>change</th><th>trackers</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>babbel.com</th><td>-8.111454</td><td>12.722951</td></tr>\n",
"    <tr><th>bento.de</th><td>-3.611723</td><td>19.215815</td></tr>\n",
"    <tr><th>klingel.de</th><td>-3.492893</td><td>26.706119</td></tr>\n",
"    <tr><th>tvnow.de</th><td>-3.151073</td><td>25.500678</td></tr>\n",
"    <tr><th>sheego.de</th><td>4.633526</td><td>11.616530</td></tr>\n",
"    <tr><th>markt.de</th><td>10.795911</td><td>17.783326</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ], "text/plain": [ " change trackers\n", "babbel.com -8.111454 12.722951\n", "bento.de -3.611723 19.215815\n", "klingel.de -3.492893 26.706119\n", "tvnow.de -3.151073 25.500678\n", "sheego.de 4.633526 11.616530\n", "markt.de 10.795911 17.783326" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Average number of trackers per page load for each site, in April and in March\n", "apr_trackers = data.sites.get_snapshot('2018-04').set_index('site')['trackers']\n", "mar_trackers = data.sites.get_snapshot('2018-03').set_index('site')['trackers']\n", "site_diffs = pd.DataFrame({\n", " 'trackers': mar_trackers, # March baseline\n", " 'change': (apr_trackers - mar_trackers)\n", "})\n", "# Keep only sites whose average changed by more than 3 trackers in either direction\n", "site_diffs[(site_diffs.change > 3) | (site_diffs.change < -3)].sort_values('change')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Facebook's Tough Month\n", "\n", "[Facebook](../trackers/facebook.html) has been in the news a lot over the last month. With\n", "`#deletefacebook` trending, has there been an effect on its operations and bottom\n", "line? We [already reported](https://www.ghostery.com/blog/ghostery-news/report-have-publishers-banned-facebook-trackers-from-their-pages-after-the-cambridge-analytica-scandal/)\n", "that, despite strong criticism of Facebook in the press, those same news sites did not stop using Facebook's\n", "tracking tools. 
The data we release this month shows that this continues to be the case, with no\n", "drop in tracking reach for the [Facebook tracker](../trackers/facebook.html).\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "facebookDf = data.trackers.df[\n", " (data.trackers.df.tracker == \"facebook\") \n", "# & (data.trackers.df.month >= \"2018-01\")\n", "]\n", "facebookDf = facebookDf[['month','reach', 'site_reach']]\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "data": [ { "fill": "tozeroy", "fillcolor": "rgba(255, 153, 51, 0.3)", "line": { "color": "rgba(255, 153, 51, 1.0)", "dash": "solid", "width": 1.3 }, "mode": "lines", "name": "reach", "text": "", "type": "scatter", "x": [ "2017-05-01", "2017-06-01", "2017-07-01", "2017-08-01", "2017-09-01", "2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01" ], "xaxis": "x1", "y": [ 0.31346037651773023, 0.3153770862904671, 0.30586649894717577, 0.30095937714542365, 0.2980006216898452, 0.2862098839766894, 0.28746809741487106, 0.2823794258106336, 0.2841597964515066, 0.2879803049264181, 0.2856411110740228, 0.28704290575311786 ], "yaxis": "y1" }, { "fill": "tozeroy", "fillcolor": "rgba(55, 128, 191, 0.3)", "line": { "color": "rgba(55, 128, 191, 1.0)", "dash": "solid", "width": 1.3 }, "mode": "lines", "name": "site_reach", "text": "", "type": "scatter", "x": [ "2017-05-01", "2017-06-01", "2017-07-01", "2017-08-01", "2017-09-01", "2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01" ], "xaxis": "x1", "y": [ 0.37418853060533264, 0.378439338380774, 0.3779495953148648, 0.3755754129434065, 0.3738063981206384, 0.3719620646574803, 0.3763100284130607, 0.3848586709051076, 0.3896124423089081, 0.40593970425354137, 0.4148203788481679, 0.4060915662353362 ], "yaxis": "y2" } ], "layout": { "legend": { "bgcolor": "#F5F6F9", "font": { 
"color": "#4D5663" } }, "paper_bgcolor": "#F5F6F9", "plot_bgcolor": "#F5F6F9", "shapes": [ { "line": { "color": "#db4052", "dash": "solid", "width": 1 }, "type": "line", "x0": "2018-03", "x1": "2018-03", "xref": "x", "y0": 0, "y1": 1, "yref": "paper" } ], "title": "Reach and Site Reach", "titlefont": { "color": "#4D5663" }, "xaxis1": { "anchor": "y2", "domain": [ 0, 1 ], "gridcolor": "#E1E5ED", "showgrid": true, "tickfont": { "color": "#4D5663" }, "title": "", "titlefont": { "color": "#4D5663" }, "zerolinecolor": "#E1E5ED" }, "yaxis1": { "anchor": "free", "domain": [ 0.575, 1 ], "gridcolor": "#E1E5ED", "position": 0, "showgrid": true, "tickfont": { "color": "#4D5663" }, "title": "", "titlefont": { "color": "#4D5663" }, "zerolinecolor": "#E1E5ED" }, "yaxis2": { "anchor": "x1", "domain": [ 0, 0.425 ] } } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = facebookDf.iplot(\n", " subplots=True,\n", " shape=(2, 1),\n", " x='month',\n", " shared_xaxes=True, \n", " fill=True,\n", " title=\"Reach and Site Reach\",\n", " vline=[\"2018-03\"],\n", " asFigure=True\n", ")\n", "\n", "# fig.iplot()\n", "\n", "# To save the image as svg\n", "offline.iplot(fig, image='svg')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `reach` refers to the percentage of total page loads where the Facebook\n", "tracker was seen to be present, whereas `site reach` refers to the percentage of\n", "domains.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Google and the Countdown to GDPR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With GDPR coming into effect on 25th May, we will soon see if it has an impact on the number of\n", "third-party trackers loaded on web pages. [Recent reports indicate](https://adexchanger.com/online-advertising/googles-gdpr-consent-tool-will-limit-publishers-to-12-ad-tech-vendors/)\n", "that Google will encourage publishers to reduce the number of AdTech vendors they use, in order to\n", "increase the chance of getting consent for tracking from users. If this is the case, we should\n", "expect this change to be visible in the WhoTracks.Me data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sites on which doubleclick has site_proportion > 0.5 (global data, April 2018)\n", "dc_sites = data.sites_trackers.df[\n", " (data.sites_trackers.df.tracker == \"doubleclick\")\n", " & (data.sites_trackers.df.month == \"2018-04\")\n", " & (data.sites_trackers.df.country == \"global\")\n", " & (data.sites_trackers.df.site_proportion > 0.5)\n", "].site\n", "\n", "\n", "dc_sites_df = data.sites.df[\n", " (data.sites.df.site.isin(dc_sites))\n", " & (data.sites.df.month >= \"2018-02\")\n", "]\n", "\n", "\n", "dcsitesDf = pd.DataFrame({\n", " \"apr_trackers\": dc_sites_df[dc_sites_df.month == '2018-04'].trackers,\n", " \"mar_trackers\": dc_sites_df[dc_sites_df.month == '2018-03'].trackers,\n", "})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = dcsitesDf.iplot(\n", " kind=\"histogram\",\n", " histnorm='percent',\n", " title=\"Distribution of the average number of trackers per site\",\n", " opacity=.6,\n", " bins=20,\n", " yTitle=\"Percentage of Sites\",\n", " vline={\n", " \"kind\": \"rect\",\n", " \"x0\": 12,\n", " \"x1\": 38,\n", " \"width\": 2,\n", " \"fillcolor\": \"red\",\n", " \"opacity\": 0.1\n", " },\n", " barmode=\"overlay\",\n", " bargap=0.2,\n", " line_color=\"#00000000\",\n", " width=0,\n", " asFigure=True,\n", ")\n", "\n", "# Render inline and save the image as svg, as in the Facebook cell\n", "offline.iplot(fig, image='svg')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we reported [last month](./update_apr_2018.html), we observe a gradual decline in the average\n", "number of trackers seen on websites. However, looking at sites which use Google's [DoubleClick](../trackers/doubleclick.html)\n", "Ad Network, a large proportion are still well above the proposed 12-tracker limit. With only a few\n", "weeks to go, there is still a significant number of sites over the limit." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we were to consider the most extreme scenario, where Google compels all customers to use its GDPR\n", "consent system for European users, and enforces a 12-vendor limit in the process, this could\n", "have a significant impact on the ecosystem. If we extrapolate from WhoTracks.Me data, capping all\n", "these sites to 12 trackers means that over **1,300 trackers** would disappear from sites. AdTech\n", "companies deeper in the supply chain may be completely cut out unless they have direct publisher\n", "relationships which enable them to make the vendor shortlist.\n", "\n", "Such a sharp change in the ecosystem is unlikely, but it demonstrates the power of Google's market\n", "dominance: Google would be able to unilaterally pull the plug on a lot of its competition. We\n", "will continue to monitor the ecosystem to quantify any changes to tracking, and look forward to\n", "reporting the changes, if any, caused by the new regulation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to delve deeper, the data is open and available on the [WhoTracks.Me GitHub repository](https://github.com/cliqz-oss/whotracks.me/tree/master/whotracksme/data), and as a [pip package](https://pypi.python.org/pypi/whotracksme/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }