{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "4q5ZcP7iXFxZ" }, "source": [ "### ANALYSING LATENCY IN AFRICA\n", "USING RIPE NCC RECURRING *TRACEROUTE* MEASUREMENTS FROM PROBES IN AFRICA TO `eu.thethings.network`\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TABLE OF CONTENTS\n", "\n", "* **[I. Data Loading, preprocessing and cleaning](#data-loading)**\n", " * [I.1 Measurements](#measurements)\n", " * [I.2 Probes description](#probes-description)\n", " * [I.3 Merging measurements and probes](#merge-measurements-probes)\n", " * [I.4 Validation of RTTs vs. speed of light](#rtt-speed-of-light)\n", "* **[II. Preliminary Exploratory Data Analysis (EDA)](#eda)**\n", " * [II.1 Round-Trip-Time (RTT) histograms](#rtts-histogram)\n", " * [II.2 Round-Trip-Time (RTT) boxplots by country](#boxplots-country)\n", " * [II.3 Probes locations](#probes-location)\n", " * [II.4 Packet loss](#packet-loss)\n", "* **[III. Analysing National Research and Education Network (NREN) probes latency](#nren)**\n", " * [III.1 NRENs identification](#nrens-identification)\n", " * [III.2 NREN probes location](#nren-probes-loc)\n", " * [III.3 Comparing RTTs of NREN probes vs. others](#rtt-nren-vs-others) \n", " * [III.4 Comparing RTTs of NREN probes vs. others per country](#rtt-nren-vs-others-country)\n", " * [III.5 Comparing Packet Loss of NREN probes vs. others](#packet-loss-nren-vs-others)\n", " * [III.6 Comparing Packet Loss of NREN probes vs. 
others per country](#packet-loss-nren-vs-others-per-country)\n", " * [III.7 Is there enough evidence at this stage that an NREN makes a difference?](#evidence)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": {}, "colab_type": "code", "id": "sbrEFQ9pXFxb" }, "outputs": [], "source": [ "from datetime import datetime\n", "import os.path\n", "import time\n", "import requests\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import altair as alt\n", "from vega_datasets import data as alt_data\n", "\n", "# For the classic notebook only (not JupyterLab): run this command once per session\n", "alt.renderers.enable('notebook')\n", "alt.data_transformers.disable_max_rows()\n", "\n", "import utilities as utils\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Global variables\n", "WIDTH = 860\n", "AFRICA_TOPOJSON_URL = 'https://raw.githubusercontent.com/deldersveld/topojson/master/continents/africa.json'\n", "SAMPLE_SIZE = 10000" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "iFjjILZHXFxj" }, "source": [ "## I. 
Data Loading, preprocessing and cleaning\n", "" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "gGNhCNuHXFxk" }, "source": [ "### I.1 Measurements\n", "" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fetching data from:\n", "https://atlas.ripe.net/api/v2/measurements/17468431/results/?start=1543618800&stop=1546210800&probe_ids=None\n" ] } ], "source": [ "json_data = utils.get_measurements(measurement_id=17468431, start='01/12/2018', stop='31/12/2018')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "GcZJxaR9XFxo", "outputId": "f14ee5f5-e1b1-47f6-97a7-f201936c486d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "607324 results\n" ] } ], "source": [ "# Number of records\n", "print(str(len(json_data)) + ' results')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "MkCL9-SHXFxw", "outputId": "c40a3e13-55dc-451f-ef04-1039651cc406" }, "outputs": [ { "data": { "text/plain": [ "{'af': 4,\n", " 'dst_addr': '52.169.76.255',\n", " 'dst_name': '52.169.76.255',\n", " 'endtime': 1543618976,\n", " 'from': '169.0.103.214',\n", " 'fw': 4940,\n", " 'group_id': 17468431,\n", " 'lts': 8,\n", " 'msm_id': 17468431,\n", " 'msm_name': 'Traceroute',\n", " 'paris_id': 3,\n", " 'prb_id': 21682,\n", " 'proto': 'TCP',\n", " 'result': [{'hop': 64,\n", " 'result': [{'flags': 'SA',\n", " 'from': '52.169.76.255',\n", " 'hdropts': [{'mss': 1440}],\n", " 'rtt': 176.065,\n", " 'size': 4,\n", " 'ttl': 50},\n", " {'flags': 'SA',\n", " 'from': '52.169.76.255',\n", " 'hdropts': [{'mss': 1440}],\n", " 'rtt': 176.794,\n", " 'size': 4,\n", " 'ttl': 50},\n", " {'flags': 'SA',\n", " 'from': '52.169.76.255',\n", " 'hdropts': [{'mss': 1440}],\n", " 'rtt': 
178.843,\n", " 'size': 4,\n", " 'ttl': 50}]}],\n", " 'size': 0,\n", " 'src_addr': '192.168.3.14',\n", " 'stored_timestamp': 1543619072,\n", " 'timestamp': 1543618975,\n", " 'type': 'traceroute'}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# An example of a result\n", "json_data[0]" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "oiFbXicHXFxz" }, "source": [ "* **Measurement results data structure**\n", "\n", "*Note that the returned fields may differ depending on the probe's firmware version.*\n", "\n", "From: https://atlas.ripe.net/docs/data_struct/\n", "\n", "* `af`: address family, with possible values {4: IPv4, 6: IPv6}\n", "* `dst_addr`: IP address of the destination (string)\n", "* `dst_name`: name of the destination (string)\n", "* `from`: IP address of the probe as known by controller (string)\n", "* `fw`: firmware version of the probe\n", "* `group_id`: if the measurement belongs to a group of measurements, the identifier of the group (int)\n", "* `lts`: last time synchronised. How long ago (in seconds) the clock of the probe was found to be in sync with that of a controller. The value -1 is used to indicate that the probe does not know whether it is in sync (int)\n", "* `msm_id`: measurement identifier (int)\n", "* `msm_name`: measurement type, here \"Traceroute\" (string)\n", "* `paris_id`: variation for the Paris mode of traceroute (int)\n", "* `prb_id`: source probe ID (int)\n", "* `proto`: \"UDP\", \"ICMP\", or \"TCP\" (string)\n", "* `result`: list of hop elements (array of objects). Objects have the following fields:\n", " * `hop`: hop number (int)\n", " * `error`: [optional] when an error occurs trying to send a packet. In that case there will not be a result structure. 
(string)\n", " * `result`: variable content, depending on type of response (array of objects). Objects have the following fields:\n", " * `flags`: [optional] TCP flags in the reply packet, for TCP traceroute, concatenated, in the order 'F' (FIN), 'S' (SYN), 'R' (RST), 'P' (PSH), 'A' (ACK), 'U' (URG) (fw >= 4600) (string)\n", " * `from`: IPv4 or IPv6 source address in reply (string)\n", " * `rtt`: round-trip time of reply in ms; not present when the response is late (float)\n", " * `size`: size of reply (int)\n", " * `ttl`: time-to-live in reply (int)\n", " * `size`: packet size (int)\n", " * `src_addr`: source address used by probe (string)\n", " * `timestamp`: Unix timestamp for start of measurement (int)\n", " * `type`: \"traceroute\" (string) \n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# To check a specific record by attribute\n", "# list(filter(lambda m: m['endtime'] == 1543619009, json_data))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "E9vOL5pzLz9T" }, "source": [ "* **Checking destination IP address and fetching lon, lat for later use**" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 71 }, "colab_type": "code", "id": "5RhGrCKbi7mr", "outputId": "f56ab5cd-951c-4d55-bc5a-b339808d773e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Destination IP address: {'52.169.76.255'}\n", "{'ip': '52.169.76.255', 'city': 'Dublin', 'region': 'Leinster', 'country': 'IE', 'loc': '53.3331,-6.2489', 'org': 'AS8075 Microsoft Corporation', 'postal': 'D02'}\n" ] } ], "source": [ "# Let's quickly check that all probes trace to the same IP address\n", "ip_dest = set(map(lambda x: x['dst_addr'], json_data))\n", "print('Destination IP address: ', ip_dest)\n", "\n", "# Build the ipinfo.io lookup URL for that address (replace 'your-token' with a real token)\n", "url_ipinfo = 
'http://ipinfo.io/{}?token=your-token'.format(list(ip_dest)[0])\n", "\n", "r = requests.get(url_ipinfo)\n", "json_ipinfo = r.json()\n", "\n", "print(json_ipinfo)\n", "# 'loc' is a 'lat,lon' string\n", "lat_dest, lon_dest = map(float, json_ipinfo['loc'].split(','))" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 136 }, "colab_type": "code", "id": "c-anRWUYQz21", "outputId": "4a1f62fe-e498-4d21-8819-0acab921cb70" }, "outputs": [ { "data": { "text/plain": [ "{'city': 'Dublin',\n", " 'country': 'IE',\n", " 'ip': '52.169.76.255',\n", " 'loc': '53.3331,-6.2489',\n", " 'org': 'AS8075 Microsoft Corporation',\n", " 'postal': 'D02',\n", " 'region': 'Leinster'}" ] }, "execution_count": 14, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "json_ipinfo" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "colab": {}, "colab_type": "code", "id": "2oNj5ukHXFx0" }, "outputs": [], "source": [ "# Utility functions and lambdas\n", "# Error codes: https://atlas.ripe.net/docs/data_struct\n", "\n", "# Filtering/mapping predicates\n", "has_result = lambda row: 'error' not in row['result'][0]\n", "is_late = lambda pack: 'late' in pack # 'late': number of packets by which the reply is late; rtt is then absent (int)\n", "is_timed_out = lambda pack: 'x' in pack # {'x': '*'} - Timed out\n", "is_error = lambda pack: 'error' in pack # Network, destination,... unreachable\n", "is_err = lambda pack: 'err' in pack # Network, destination,... 
unreachable\n", "is_empty = lambda pack: not(bool(pack))\n", "\n", "get_result = lambda row: row['result'][0]['result']\n", "get_nb_packets = lambda row: len(get_result(row))\n", "get_max_nb_hops = lambda row: row['result'][0]['hop']\n", "get_rtts = lambda row: list(filter(lambda pack: not(is_late(pack) or\n", " is_timed_out(pack) or\n", " is_error(pack) or \n", " is_err(pack) or\n", " is_empty(pack)), get_result(row)))\n", "\n", "get_nb_late_pack = lambda row: len(list(filter(lambda pack: is_late(pack), get_result(row))))\n", "get_nb_timed_out_pack = lambda row: len(list(filter(lambda pack: is_timed_out(pack), get_result(row))))\n", "get_nb_error_pack = lambda row: len(list(filter(lambda pack: (is_error(pack) or\n", " is_err(pack)), get_result(row))))\n", "get_nb_empty_pack = lambda row: len(list(filter(lambda pack: is_empty(pack), get_result(row))))\n", "\n", "get_nb_rtts = lambda row: len(get_rtts(row))\n", "get_rtts_mean = lambda row: np.mean(list(map(lambda pack: pack['rtt'], get_rtts(row))))\n", "get_ttl = lambda row: get_rtts(row)[0]['ttl']\n", "\n", "to_datetime = lambda x: datetime.utcfromtimestamp(x).strftime('%Y-%m-%d %H:%M:%S')" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [], "source": [ "# Post-process JSON data (np.nan rather than np.NaN, which NumPy 2.0 removed)\n", "measurements = []\n", "for row in json_data:\n", " #is_valid = has_result(row) and get_nb_rtts(row)\n", " is_valid = has_result(row)\n", " measurements.append(\n", " {'prb_id': row['prb_id'],\n", " 'ip_from': row['from'],\n", " 'paris_id': row['paris_id'],\n", " 'datetime': to_datetime(row['timestamp']),\n", " 'start_time': row['timestamp'],\n", " 'end_time': row['endtime'],\n", " 'last_time_sync': row['lts'],\n", " 'firmware_version': row['fw'],\n", " 'nb_packets': get_nb_packets(row) if is_valid else np.nan,\n", " 'nb_rtts': get_nb_rtts(row) if is_valid else np.nan, \n", " 'rtt': get_rtts_mean(row) if is_valid else np.nan,\n", " 'measurement_failed': not(is_valid),\n", " 'nb_late_pack': 
get_nb_late_pack(row) if is_valid else np.nan,\n", " 'nb_timed_out_pack': get_nb_timed_out_pack(row) if is_valid else np.nan,\n", " 'nb_error_pack': get_nb_error_pack(row) if is_valid else np.nan,\n", " 'nb_empty_pack': get_nb_empty_pack(row) if is_valid else np.nan,\n", " #'nb_hops': get_max_nb_hops(row) - get_ttl(row) if is_valid else np.nan\n", " }) " ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "colab": {}, "colab_type": "code", "id": "zxkNP5JjXFx4" }, "outputs": [], "source": [ "# Convert to Pandas dataframe \n", "columns = ['datetime', 'prb_id', 'ip_from', 'paris_id', 'start_time', \n", " 'end_time', 'last_time_sync', 'firmware_version', 'nb_packets', 'nb_rtts',\n", " 'rtt', 'measurement_failed', 'nb_late_pack', 'nb_timed_out_pack', \n", " 'nb_error_pack','nb_empty_pack']\n", "\n", "df = pd.DataFrame(measurements, columns=columns)\n", "df['datetime'] = pd.to_datetime(df['datetime'])\n", "df.set_index('datetime', inplace=True)\n", "df.sort_index(inplace=True)" ] }, { "cell_type": "code", "execution_count": 692, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 235 }, "colab_type": "code", "id": "qHmZbZ98XFx7", "outputId": "20afd14e-45fe-4b5c-e736-7b3e117c22f1" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " prb_id ip_from paris_id start_time end_time \\\n", "datetime \n", "2018-11-30 23:02:49 35067 41.248.208.162 8 1543618969 1543618970 \n", "2018-11-30 23:02:55 21682 169.0.103.214 3 1543618975 1543618976 \n", "2018-11-30 23:03:00 33073 196.15.205.17 3 1543618980 1543618980 \n", "2018-11-30 23:03:02 50355 155.93.192.173 4 1543618982 1543618982 \n", "2018-11-30 23:03:03 11383 196.1.95.103 3 1543618983 1543618984 \n", "\n", " last_time_sync firmware_version nb_packets nb_rtts \\\n", "datetime \n", "2018-11-30 23:02:49 204 4780 3.0 3.0 \n", "2018-11-30 23:02:55 8 4940 3.0 3.0 \n", "2018-11-30 23:03:00 48 4940 3.0 3.0 \n", "2018-11-30 23:03:02 49 4940 3.0 3.0 \n", "2018-11-30 23:03:03 91 4940 3.0 3.0 \n", "\n", " rtt measurement_failed nb_late_pack \\\n", "datetime \n", "2018-11-30 23:02:49 94.480667 False 0.0 \n", "2018-11-30 23:02:55 177.234000 False 0.0 \n", "2018-11-30 23:03:00 172.731000 False 0.0 \n", "2018-11-30 23:03:02 178.559000 False 0.0 \n", "2018-11-30 23:03:03 96.144667 False 0.0 \n", "\n", " nb_timed_out_pack nb_error_pack nb_empty_pack \n", "datetime \n", "2018-11-30 23:02:49 0.0 0.0 0.0 \n", "2018-11-30 23:02:55 0.0 0.0 0.0 \n", "2018-11-30 23:03:00 0.0 0.0 0.0 \n", "2018-11-30 23:03:02 0.0 0.0 0.0 \n", "2018-11-30 23:03:03 0.0 0.0 0.0 " ] }, "execution_count": 692, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "8Z3ATuaAXFx-" }, "source": [ "### I.2 Probes description\n", "
" ] }, { "cell_type": "code", "execution_count": 693, "metadata": { "colab": {}, "colab_type": "code", "id": "CZWd_-u-XFx_" }, "outputs": [], "source": [ "# Get list of probes involved in measurements\n", "probes_id = list(df['prb_id'].unique())" ] }, { "cell_type": "code", "execution_count": 694, "metadata": {}, "outputs": [], "source": [ "json_data_probe = utils.get_probes(probes_id)" ] }, { "cell_type": "code", "execution_count": 695, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 510 }, "colab_type": "code", "id": "oIs3qBS5XFyG", "outputId": "c4c7b11e-7a8c-4525-feab-55456fee71bf" }, "outputs": [ { "data": { "text/plain": [ "{'address_v4': '196.1.95.16',\n", " 'address_v6': '2001:4278:1000:1::16',\n", " 'asn_v4': 8346,\n", " 'asn_v6': 8346,\n", " 'country_code': 'SN',\n", " 'description': 'UCAD Probe',\n", " 'first_connected': 1291147153,\n", " 'geometry': {'type': 'Point', 'coordinates': [-17.4415, 14.6715]},\n", " 'id': 239,\n", " 'is_anchor': False,\n", " 'is_public': True,\n", " 'last_connected': 1568990096,\n", " 'prefix_v4': '196.1.92.0/22',\n", " 'prefix_v6': '2001:4278::/32',\n", " 'status': {'id': 1, 'name': 'Connected', 'since': '2019-09-19T22:17:28Z'},\n", " 'status_since': 1568931448,\n", " 'tags': [{'name': 'Office', 'slug': 'office'},\n", " {'name': 'No NAT', 'slug': 'no-nat'},\n", " {'name': 'system: V1', 'slug': 'system-v1'},\n", " {'name': 'system: Resolves A Correctly',\n", " 'slug': 'system-resolves-a-correctly'},\n", " {'name': 'system: Resolves AAAA Correctly',\n", " 'slug': 'system-resolves-aaaa-correctly'},\n", " {'name': 'system: IPv4 Works', 'slug': 'system-ipv4-works'},\n", " {'name': \"system: IPv6 Doesn't Work\", 'slug': 'system-ipv6-doesnt-work'},\n", " {'name': 'system: IPv4 Capable', 'slug': 'system-ipv4-capable'},\n", " {'name': 'system: IPv6 Capable', 'slug': 'system-ipv6-capable'}],\n", " 'total_uptime': 236903509,\n", " 'type': 'Probe'}" ] }, "execution_count": 695, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "# An example of probe's description.\n", "json_data_probe['results'][0]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": {}, "colab_type": "code", "id": "4HmPuOPkXFyK" }, "outputs": [], "source": [ "# Postprocess json data\n", "probes = []\n", "for i, row in enumerate(json_data_probe['results']):\n", " probes.append(\n", " {'prb_id': row['id'],\n", " 'ip_v4': row['address_v4'],\n", " 'asn': str(row['asn_v4']),\n", " 'country_code': row['country_code'],\n", " 'description': row['description'],\n", " 'lon': row['geometry']['coordinates'][0],\n", " 'lat': row['geometry']['coordinates'][1]})" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": {}, "colab_type": "code", "id": "bF6Y-6swXFyO" }, "outputs": [], "source": [ "# Convert to Pandas dataframe \n", "columns = ['prb_id','ip_v4', 'asn', 'country_code', 'description', 'lon', 'lat']\n", "df_probes = pd.DataFrame(probes, columns=columns)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 85 }, "colab_type": "code", "id": "dxPc2HO2XFyR", "outputId": "b51b0c74-89b0-4356-9348-a3b2f668e601" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of countries: 36\n", "['SN' 'ZA' 'MU' 'TN' 'BW' 'CM' 'GH' 'TZ' 'KE' 'BF' 'UG' 'RW' 'NG' 'CG'\n", " 'BJ' 'MA' 'BI' 'MZ' 'ET' 'ZM' 'SZ' 'AO' 'MW' 'LS' 'NA' 'MG' 'SD' 'SS'\n", " 'SC' 'CD' 'DZ' 'ZW' 'GM' 'TG' 'EG' 'DJ']\n" ] } ], "source": [ "print('Number of countries: ', len(df_probes['country_code'].unique()))\n", "print(df_probes['country_code'].unique())" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 306 }, "colab_type": "code", "id": "ExKTlZZMXFyV", "outputId": "4624180d-922d-445c-b4e9-9957044b4f7a" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name alpha-2 alpha-3 country-code iso_3166-2 region \\\n", "0 Afghanistan AF AFG 4 ISO 3166-2:AF Asia \n", "1 Åland Islands AX ALA 248 ISO 3166-2:AX Europe \n", "2 Albania AL ALB 8 ISO 3166-2:AL Europe \n", "3 Algeria DZ DZA 12 ISO 3166-2:DZ Africa \n", "4 American Samoa AS ASM 16 ISO 3166-2:AS Oceania \n", "\n", " sub-region intermediate-region region-code sub-region-code \\\n", "0 Southern Asia NaN 142.0 34.0 \n", "1 Northern Europe NaN 150.0 154.0 \n", "2 Southern Europe NaN 150.0 39.0 \n", "3 Northern Africa NaN 2.0 15.0 \n", "4 Polynesia NaN 9.0 61.0 \n", "\n", " intermediate-region-code \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Loading country codes\n", "country_codes = pd.read_csv('https://raw.githubusercontent.com/franckalbinet/' + \n", " 'latency-internet-africa/master/data/country_codes.csv')\n", "\n", "country_codes.head()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 204 }, "colab_type": "code", "id": "9yJGliUPXFyX", "outputId": "d1b7a5b3-8bf7-4d27-9cdb-56fbff01d7bf" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " prb_id ip_v4 asn \\\n", "0 239 196.1.95.16 8346 \n", "1 242 None 328453 \n", "2 446 196.192.112.229 37708 \n", "3 473 196.4.161.3 3741 \n", "4 504 193.95.97.132 2609 \n", "\n", " description lon lat \\\n", "0 UCAD Probe -17.4415 14.6715 \n", "1 1 Belvedere 18.4895 -33.9815 \n", "2 AFRINIC Mauritius 57.4995 -20.2395 \n", "3 Internet Solutions [http://www.is.co.za] - Ros... 28.0405 -26.1515 \n", "4 ATI, KASBAH, 1G/s 10.1675 36.7995 \n", "\n", " country_name country_code \n", "0 Senegal SN \n", "1 South Africa ZA \n", "2 Mauritius MU \n", "3 South Africa ZA \n", "4 Tunisia TN " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Joining/merging with country code to get full country name\n", "df_probes = pd.merge(df_probes, country_codes[['name', 'alpha-2']], left_on='country_code', \n", " right_on='alpha-2', how='left')\n", "df_probes = df_probes.drop(['country_code'], axis=1)\n", "df_probes.rename(columns={'alpha-2':'country_code', 'name': 'country_name'}, inplace=True)\n", "df_probes.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "gFzhHrRmXyhL" }, "source": [ "* **It looks like we don't get the IP address from all probes. Let's use the ip address provided in measurement results instead.**" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": {}, "colab_type": "code", "id": "4nTgqXQBXHti" }, "outputs": [], "source": [ "# Note that each probe can have several IP addresses. 
We keep only one.\n", "ip_from_measurement = df[['prb_id','ip_from']].reset_index().drop(columns=['datetime']).drop_duplicates(subset='prb_id')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": {}, "colab_type": "code", "id": "9xo7zIrxaChY" }, "outputs": [], "source": [ "df_probes = pd.merge(df_probes, ip_from_measurement, left_on='prb_id', right_on='prb_id', how='left') " ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 204 }, "colab_type": "code", "id": "PmgzvUdzanl3", "outputId": "6e4c90c8-e415-433a-8128-7e47595ba51c" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " prb_id ip_v4 asn \\\n", "0 239 196.1.95.16 8346 \n", "1 242 None 328453 \n", "2 446 196.192.112.229 37708 \n", "3 473 196.4.161.3 3741 \n", "4 504 193.95.97.132 2609 \n", "\n", " description lon lat \\\n", "0 UCAD Probe -17.4415 14.6715 \n", "1 1 Belvedere 18.4895 -33.9815 \n", "2 AFRINIC Mauritius 57.4995 -20.2395 \n", "3 Internet Solutions [http://www.is.co.za] - Ros... 28.0405 -26.1515 \n", "4 ATI, KASBAH, 1G/s 10.1675 36.7995 \n", "\n", " country_name country_code ip_from \n", "0 Senegal SN 196.1.95.16 \n", "1 South Africa ZA 196.210.11.182 \n", "2 Mauritius MU 196.192.112.229 \n", "3 South Africa ZA 196.4.161.3 \n", "4 Tunisia TN 193.95.97.132 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_probes.head()" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "colab": {}, "colab_type": "code", "id": "FsXi7gJmVolN" }, "outputs": [], "source": [ "ipinfo_lookup = []\n", "for ip in df_probes['ip_from'].values:\n", " r = requests.get('http://ipinfo.io/{}?token=your-token'.format(ip))\n", " ipinfo_lookup.append(r.json())" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 204 }, "colab_type": "code", "id": "tr1uA1s6bZJg", "outputId": "176a49c8-014f-4eee-c6aa-3db474a9bfcb" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " city country hostname ip \\\n", "0 SN dkr-sn.probe.atlas.ucad.sn 196.1.95.16 \n", "1 Cape Town ZA 196-210-11-182.dynamic.isadsl.co.za 196.210.11.182 \n", "2 MU p446.probes.atlas.ripe.net 196.192.112.229 \n", "3 ZA ripe-ncc.is.co.za 196.4.161.3 \n", "4 Tunis TN NaN 193.95.66.40 \n", "\n", " loc org postal \\\n", "0 14.0000,-14.0000 AS8346 SONATEL-AS Autonomous System NaN \n", "1 -33.9258,18.4232 AS3741 Internet Solutions 7945 \n", "2 -20.2833,57.5500 AS37708 African Network Information Center - (... NaN \n", "3 -29.0000,24.0000 AS3741 Internet Solutions NaN \n", "4 36.8190,10.1658 AS2609 Tunisia BackBone AS NaN \n", "\n", " region \n", "0 \n", "1 Western Cape \n", "2 \n", "3 \n", "4 Tūnis " ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_ip_info = pd.DataFrame(ipinfo_lookup)\n", "df_ip_info.head()" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 479 }, "colab_type": "code", "id": "27WGfj2ad0wM", "outputId": "e4e1a3b1-eaeb-4753-d1a7-87f01637b316" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " prb_id ip_v4 asn \\\n", "0 239 196.1.95.16 8346 \n", "1 242 None 328453 \n", "2 446 196.192.112.229 37708 \n", "3 473 196.4.161.3 3741 \n", "4 567 193.95.66.40 2609 \n", "\n", " description lon lat \\\n", "0 UCAD Probe -17.4415 14.6715 \n", "1 1 Belvedere 18.4895 -33.9815 \n", "2 AFRINIC Mauritius 57.4995 -20.2395 \n", "3 Internet Solutions [http://www.is.co.za] - Ros... 28.0405 -26.1515 \n", "4 ATI, BELVEDERE, 1G/s 10.1805 36.8205 \n", "\n", " country_name country_code ip_from city country \\\n", "0 Senegal SN 196.1.95.16 SN \n", "1 South Africa ZA 196.210.11.182 Cape Town ZA \n", "2 Mauritius MU 196.192.112.229 MU \n", "3 South Africa ZA 196.4.161.3 ZA \n", "4 Tunisia TN 193.95.66.40 Tunis TN \n", "\n", " hostname ip loc \\\n", "0 dkr-sn.probe.atlas.ucad.sn 196.1.95.16 14.0000,-14.0000 \n", "1 196-210-11-182.dynamic.isadsl.co.za 196.210.11.182 -33.9258,18.4232 \n", "2 p446.probes.atlas.ripe.net 196.192.112.229 -20.2833,57.5500 \n", "3 ripe-ncc.is.co.za 196.4.161.3 -29.0000,24.0000 \n", "4 NaN 193.95.66.40 36.8190,10.1658 \n", "\n", " org postal region \n", "0 AS8346 SONATEL-AS Autonomous System NaN \n", "1 AS3741 Internet Solutions 7945 Western Cape \n", "2 AS37708 African Network Information Center - (... 
NaN \n", "3 AS3741 Internet Solutions NaN \n", "4 AS2609 Tunisia BackBone AS NaN Tūnis " ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_probes = pd.merge(df_probes, df_ip_info, left_on='ip_from', right_on='ip', how='left') \n", "df_probes.head()" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "colab": {}, "colab_type": "code", "id": "OQTDye1WfqHG" }, "outputs": [], "source": [ "df_probes = df_probes[['prb_id', 'ip', 'asn', 'hostname', 'org', 'description', 'country_name', 'country_code', 'region', 'city', 'lon', 'lat']]" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 272 }, "colab_type": "code", "id": "pb4GGvA3f1Um", "outputId": "8760e242-3c52-4961-bd52-6a62281ab48d" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " prb_id ip asn hostname \\\n", "0 239 196.1.95.16 8346 dkr-sn.probe.atlas.ucad.sn \n", "1 242 196.210.11.182 328453 196-210-11-182.dynamic.isadsl.co.za \n", "2 446 196.192.112.229 37708 p446.probes.atlas.ripe.net \n", "3 473 196.4.161.3 3741 ripe-ncc.is.co.za \n", "4 567 193.95.66.40 2609 NaN \n", "\n", " org \\\n", "0 AS8346 SONATEL-AS Autonomous System \n", "1 AS3741 Internet Solutions \n", "2 AS37708 African Network Information Center - (... \n", "3 AS3741 Internet Solutions \n", "4 AS2609 Tunisia BackBone AS \n", "\n", " description country_name \\\n", "0 UCAD Probe Senegal \n", "1 1 Belvedere South Africa \n", "2 AFRINIC Mauritius Mauritius \n", "3 Internet Solutions [http://www.is.co.za] - Ros... South Africa \n", "4 ATI, BELVEDERE, 1G/s Tunisia \n", "\n", " country_code region city lon lat \n", "0 SN -17.4415 14.6715 \n", "1 ZA Western Cape Cape Town 18.4895 -33.9815 \n", "2 MU 57.4995 -20.2395 \n", "3 ZA 28.0405 -26.1515 \n", "4 TN Tūnis Tunis 10.1805 36.8205 " ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_probes.head()" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [ "# Saving probes description as csv\n", "utils.df_to_csv(df_probes, prefix_name='probes')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "tgix6Tr-XFya" }, "source": [ "### I.3 Merging measurements and probes\n", "
" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "df_probes = pd.read_csv('data/probes-2019-08-23.csv')" ] }, { "cell_type": "code", "execution_count": 697, "metadata": { "colab": {}, "colab_type": "code", "id": "jhdvZlWbXFyb" }, "outputs": [], "source": [ "# Joining/merging with country code to get full country name\n", "data = pd.merge(df.reset_index(), df_probes, left_on='prb_id', right_on='prb_id', how='left')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### I.4 Validation of Rtts vs. speed of light\n", "" ] }, { "cell_type": "code", "execution_count": 700, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Light takes: 12 ms to travel\n" ] } ], "source": [ "SPEED_OF_LIGHT = 299792.458 # km/s\n", "DIST = 3620 # km (more or less shortest African point from Dublin)\n", "print(\"Light takes: \", int(1000*DIST / SPEED_OF_LIGHT), \"ms to travel\") # in ms" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# To be defined\n", "THRESHOLD_RTT = 30 # in ms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Which probes yield rtts below 30 ms??**" ] }, { "cell_type": "code", "execution_count": 702, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([29158, 35742, 32674, 18514, 33985, 29991, 33290, 14867, 23538,\n", " 21610, 14958, 33090, 4274, 596, 15883, 35743, 31505, 13299,\n", " 35074])" ] }, "execution_count": 702, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[data['rtt'] < 30]['prb_id'].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **In which countries??**" ] }, { "cell_type": "code", "execution_count": 703, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Kenya', 'South Africa', 'Togo', 'Gambia', 'Cameroon',\n", " 'Congo (Democratic Republic of the)', 'Sudan',\n", " 'Tanzania, United Republic of', nan, 'Seychelles', 'Ghana',\n", " 
'Botswana', 'Tunisia', 'Egypt'], dtype=object)" ] }, "execution_count": 703, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[data['rtt'] < 30]['country_name'].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___" ] }, { "cell_type": "code", "execution_count": 704, "metadata": {}, "outputs": [], "source": [ "# Drop rows with a missing country name. The names could be retrieved easily, but those countries\n", "# are not relevant to our NRENs vs. others analysis, as no NREN was identified there.\n", "data.dropna(subset=['country_name'], inplace=True)" ] }, { "cell_type": "code", "execution_count": 705, "metadata": { "colab": {}, "colab_type": "code", "id": "1EjO_frdXFyj" }, "outputs": [], "source": [ "# Dumping data for further use\n", "utils.df_to_csv(data, prefix_name='measurements')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "7LugDcssXFyl" }, "source": [ "## II. Preliminary Exploratory Data Analysis (EDA)\n", "" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv('./data/measurements-2019-09-20.csv')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "vkUm2QJ6XFym", "outputId": "180a436f-75ed-4d47-8b90-4c7fdda0dd33" }, "outputs": [ { "data": { "text/plain": [ "(581762, 28)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many measurements and columns\n", "data.shape" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 170 }, "colab_type": "code", "id": "Eg-NjnC6XFyp", "outputId": "a740819a-2538-423a-9535-1739b951f547" }, "outputs": [ { "data": { "text/plain": [ "count 576684.000000\n", "mean 167.272890\n", "std 123.419625\n", "min 0.585333\n", "25% 139.629500\n", "50% 170.072167\n", "75% 183.718333\n", "max 
6292.648333\n", "Name: rtt, dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Quick descriptive statistics\n", "data['rtt'].describe()" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Measurements from: 2018-11-30 23:02:49 to 2018-12-30 22:54:10\n" ] } ], "source": [ "# Time range fetched\n", "print(\"Measurements from: {} to {}\".format(min(data['datetime']),max(data['datetime'])))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dUx26FAkXFys" }, "source": [ "### II.1 Round-Trip-Time histograms \n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **All Rtts Range**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "alt.Chart(data[data['rtt'] > THRESHOLD_RTT].sample(SAMPLE_SIZE))\\\n", " .mark_bar()\\\n", " .encode(\n", " alt.X(\"rtt:Q\", bin=alt.Bin(step=100), title='Round-Trip-Time (ms)'),\n", " y='count()')\\\n", " .properties(\n", " width = WIDTH,\n", " height = 200)\\\n", " .save('./img/all-rtts.png')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/all-rtts.png?modified=496)" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/all-rtts.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Focusing on [0, 600] ms range**" ] }, { "cell_type": "code", "execution_count": 712, "metadata": {}, "outputs": [], "source": [ "alt.Chart(data.sample(SAMPLE_SIZE))\\\n", " .mark_bar()\\\n", " .encode(\n", " alt.X(\"rtt:Q\", bin=alt.Bin(extent=[0, 600], step=20), title='Round-Trip-Time (ms)'),\n", " y='count()')\\\n", " .properties(\n", " width = WIDTH,\n", " height = 200)\\\n", " .save('./img/rtts-focus.png')" ] }, { "cell_type": "code", "execution_count": 10, 
"metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/rtts-focus.png?modified=519)" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/rtts-focus.png')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "4MicxUlWXFyv" }, "source": [ "### II.2 Round-Trip-Time (RTT) boxplots by country\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **With peaks and outliers**" ] }, { "cell_type": "code", "execution_count": 714, "metadata": {}, "outputs": [], "source": [ "alt.Chart(data[data['rtt'] > THRESHOLD_RTT].sample(SAMPLE_SIZE)).mark_boxplot().encode(\n", " alt.X('rtt:Q',title='Round-Trip-Time (ms)'),\n", " alt.Y(field = 'country_name', type = 'nominal', \n", " sort=alt.EncodingSortField(field='rtt', op='max', order='ascending'),\n", " title='Country'))\\\n", ".properties(\n", " width = 0.8*WIDTH,\n", " height = 550).save('./img/rtts-boxplot-by-country.png')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/rtts-boxplot-by-country.png?modified=319)" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/rtts-boxplot-by-country.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Focusing on Interquartile range and median**\n", "\n", " Let's forget outliers and peaks for now and focus on Q1, Q2, Q3." 
] }, { "cell_type": "code", "execution_count": 716, "metadata": { "scrolled": false }, "outputs": [], "source": [ "points = alt.Chart(data[data['rtt'] > THRESHOLD_RTT].sample(SAMPLE_SIZE)).mark_point(\n", " filled=True,\n", " color='steelblue',\n", " size=80\n", ").encode(\n", " x=alt.X('median(rtt)', title='Round-Trip-Time (ms)'),\n", " y=alt.Y(\n", " 'country_name',\n", " sort=alt.EncodingSortField(\n", " field='rtt',\n", " op='median',\n", " order='ascending'\n", " ),\n", " title='Country'\n", " )\n", ").properties(\n", " width=0.8*WIDTH,\n", " height=600\n", ")\n", "\n", "error_bars = points.mark_rule(size=1.5, color='steelblue').encode(\n", " x='q1(rtt)',\n", " x2='q3(rtt)',\n", ")\n", "\n", "(points + error_bars).save('./img/rtts-iqr-by-country.png')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/rtts-iqr-by-country.png?modified=241)" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/rtts-iqr-by-country.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### II.3 Probes locations\n", "" ] }, { "cell_type": "code", "execution_count": 718, "metadata": {}, "outputs": [], "source": [ "african_countries = alt.topo_feature(AFRICA_TOPOJSON_URL, feature='continent_Africa_subunits')\n", "\n", "# African countries background\n", "background = alt.Chart(african_countries).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", ").properties(\n", " width=WIDTH,\n", " height=500\n", ").project('equirectangular')\n", "\n", "# probes positions on background\n", "points = alt.Chart(df_probes).mark_circle(\n", " size=50,\n", " color='steelblue'\n", ").encode(\n", " longitude='lon:Q',\n", " latitude='lat:Q'\n", ")\n", "\n", "(background + points).configure_view(stroke=\"transparent\").save('./img/probes-locations.png')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ 
{ "data": { "text/markdown": [ "![](./img/probes-locations.png?modified=244)" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/probes-locations.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### II.4 Packet loss\n", "\n", "\n", "A packet is counted as lost in any of the following cases: \n", "* no result;\n", "* the packet arrived late;\n", "* the request timed out." ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "# Work on a copy so that adding a column does not write into a slice of `data`\n", "data_valid = data[data['measurement_failed'] == False].copy()\n", "data_valid['packets_lost'] = data_valid['nb_packets'] - data_valid['nb_rtts']" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [], "source": [ "def perc_loss(df):\n", " return 100 * sum(df['packets_lost']) / sum(df['nb_packets'])" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Percentage of packets lost: 1.99\n" ] } ], "source": [ "print(\"Percentage of packets lost: {:.2f}\".format(perc_loss(data_valid)))" ] }, { "cell_type": "code", "execution_count": 723, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % of packet loss\n", "country_name \n", "Congo (Democratic Republic of the) 27.323449\n", "Egypt 12.523072\n", "Kenya 4.948249\n", "Ethiopia 3.988362\n", "Algeria 2.344592\n", "Morocco 1.936620\n", "South Africa 1.932597\n", "Sudan 1.902569\n", "Cameroon 1.687548\n", "Madagascar 1.372378\n", "Senegal 1.276268\n", "Zimbabwe 1.048065\n", "Tunisia 0.942401\n", "Zambia 0.883190\n", "Togo 0.830149\n", "Burundi 0.807505\n", "Benin 0.537149\n", "Uganda 0.508824\n", "Tanzania, United Republic of 0.216664\n", "Rwanda 0.147547\n", "Gambia 0.146935\n", "Congo 0.138889\n", "Ghana 0.077187\n", "Malawi 0.077160\n", "Angola 0.058377\n", "South Sudan 0.058092\n", "Mauritius 0.035770\n", "Nigeria 0.027006\n", "Djibouti 0.025621\n", "Mozambique 
0.025077\n", "Eswatini 0.023148\n", "Botswana 0.011575\n", "Seychelles 0.000000\n", "Lesotho 0.000000\n" ] } ], "source": [ "packet_loss_country = data_valid.groupby(['country_name'])\\\n", " .apply(perc_loss)\\\n", " .to_frame(name='% of packet loss')\\\n", " .sort_values(by='% of packet loss', ascending=False)\n", "print(packet_loss_country)" ] }, { "cell_type": "code", "execution_count": 724, "metadata": {}, "outputs": [], "source": [ "alt.Chart(packet_loss_country.reset_index()).mark_bar().encode(\n", " x=alt.X('% of packet loss', title='Packet loss (%)'),\n", " y=alt.Y(\n", " 'country_name:N',\n", " sort=alt.EncodingSortField(\n", " field='% of packet loss',\n", " order='descending'\n", " ),\n", " title='Country'\n", " )\n", ").properties(\n", " width=0.8*WIDTH,\n", " height=600\n", ").save('./img/packet-loss-per-country.png')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/packet-loss-per-country.png?modified=189)" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/packet-loss-per-country.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## III. 
Analysing the latency of National Research and Education Network (NREN) probes \n", "\n", " \n", "For further information on NRENs: https://en.wikipedia.org/wiki/National_research_and_education_network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### III.1 NRENs identification process\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To identify the African NRENs, the following resources have been used:\n", "* https://en.wikipedia.org/wiki/National_research_and_education_network\n", "* https://www.africaconnect2.net/Partners/African_NRENs/Pages/Home.aspx\n", "* https://bgpview.io\n", "* https://bgpview.io/asn/36944#downstreams-v4 (UbuntuNet NRENs) " ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13 NRENs probes identified.\n" ] } ], "source": [ "nren_probes = [15328, 30177, 13114, 13218, 12344, 14712, 19592, 6369, 33838, 33284, 3461, 13711, 21673]\n", "print(str(len(nren_probes)) + ' NRENs probes identified.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Adding a new column flagging each probe's NREN membership**" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " datetime prb_id ip_from paris_id start_time \\\n", "0 2018-11-30 23:02:49 35067 41.248.208.162 8 1543618969 \n", "1 2018-11-30 23:02:55 21682 169.0.103.214 3 1543618975 \n", "\n", " end_time last_time_sync firmware_version nb_packets nb_rtts ... \\\n", "0 1543618970 204 4780 3.0 3.0 ... \n", "1 1543618976 8 4940 3.0 3.0 ... \n", "\n", " hostname \\\n", "0 NaN \n", "1 169-0-103-214.ip.afrihost.co.za \n", "\n", " org description \\\n", "0 AS36903 Office National des Postes et Telecomm... Rue Bruxelles \n", "1 AS37611 Afrihost (Pty) Ltd Strand \n", "\n", " country_name country_code region city lon \\\n", "0 Morocco MA Rabat-Salé-Kénitra Rabat -6.2395 \n", "1 South Africa ZA Western Cape Cape Town 18.8315 \n", "\n", " lat is_nren \n", "0 33.8205 False \n", "1 -34.1015 False \n", "\n", "[2 rows x 28 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['is_nren'] = data['prb_id'].isin(nren_probes)\n", "data.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### III.2 NREN probes location\n", "" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "df_probes['is_nren'] = df_probes['prb_id'].isin(nren_probes)" ] }, { "cell_type": "code", "execution_count": 731, "metadata": {}, "outputs": [], "source": [ "african_countries = alt.topo_feature(AFRICA_TOPOJSON_URL, feature='continent_Africa_subunits')\n", "\n", "# US states background\n", "background = alt.Chart(african_countries).mark_geoshape(\n", " fill='lightgray',\n", " stroke='white'\n", ").properties(\n", " width=WIDTH,\n", " height=500\n", ").project('equirectangular')\n", "\n", "# probes positions on background\n", "points_not_nren = alt.Chart(df_probes[df_probes['is_nren'] == False]).mark_circle(\n", " size=30, fill='steelblue', stroke='white', strokeWidth=0.5, opacity=0.8\n", ").encode(\n", " longitude='lon:Q',\n", " latitude='lat:Q'\n", ")\n", "\n", "points_nren = 
alt.Chart(df_probes[df_probes['is_nren'] == True]).mark_point(\n", " size=300, fill='#c44e51', opacity=1, shape='triangle-up'\n", ").encode(\n", " longitude='lon:Q',\n", " latitude='lat:Q'\n", ")\n", "\n", "# NREN probes in red and non NREN probes in blue\n", "(background + points_nren + points_not_nren)\\\n", " .configure_view(stroke=\"transparent\")\\\n", " .save('./img/probes-locations-nrens.png')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/probes-locations-nrens.png?modified=22)" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/probes-locations-nrens.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Select only countries where NREN probes have been identified**" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "countries_nren = list(data[data['is_nren'] == True]['country_code'].unique())" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "data_nren = data[data['country_code'].isin(countries_nren)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Nb. 
of records per country and per type of probe (NREN or not)**" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "country_name is_nren\n", "Algeria False 5180\n", " True 5748\n", "Kenya False 25262\n", " True 5756\n", "Madagascar False 5748\n", " True 5571\n", "Morocco False 8540\n", " True 2808\n", "South Africa False 229449\n", " True 4316\n", "Sudan False 4121\n", " True 2713\n", "Tanzania, United Republic of False 11636\n", " True 2837\n", "Zambia False 10790\n", " True 2826\n", "dtype: int64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_nren[data_nren['rtt'] > THRESHOLD_RTT]\\\n", " .groupby(['country_name', 'is_nren'])\\\n", " .size()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### III.3 Comparing RTTs of NREN probes vs. others\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **In countries with NREN probes, what percentage of the measurements comes from NREN probes?**" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Percentage of NRENs measurements: 9.38\n" ] } ], "source": [ "print(\"Percentage of NRENs measurements: {:.2f}\".format(100 * sum(data_nren['is_nren'] == True) / len(data_nren)))" ] }, { "cell_type": "code", "execution_count": 736, "metadata": {}, "outputs": [], "source": [ "points = alt.Chart(data_nren[data_nren['rtt'] > THRESHOLD_RTT].sample(SAMPLE_SIZE)).mark_point(\n", " filled=True,\n", " color='steelblue',\n", " size=80\n", ").encode(\n", " x=alt.X('median(rtt)', title='Round-Trip-Time (ms)'),\n", " y=alt.Y(\n", " 'is_nren',\n", " sort=alt.EncodingSortField(\n", " field='rtt',\n", " op='median',\n", " order='ascending'\n", " ),\n", " title='NREN membership'\n", " )\n", ").properties(\n", " width=0.8*WIDTH,\n", " height=100\n", ")\n", "\n", "error_bars = points.mark_rule(size=1.5, color='steelblue').encode(\n", " x='q1(rtt)',\n", " x2='q3(rtt)',\n", 
")\n", "\n", "(points + error_bars).save('./img/rtt-nrens-vs-others.png')" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/rtt-nrens-vs-others.png?modified=946)" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/rtt-nrens-vs-others.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### III.4 Comparing RTTs of NREN probes vs. others per country\n", "" ] }, { "cell_type": "code", "execution_count": 181, "metadata": {}, "outputs": [], "source": [ "points = alt.Chart(data_nren[data_nren['rtt'] > THRESHOLD_RTT].sample(SAMPLE_SIZE)).mark_point(\n", " filled=True,\n", " color='steelblue',\n", " size=80\n", ").encode(\n", " x=alt.X('median(rtt)', title='Round-Trip-Time (ms)'),\n", " y=alt.Y(\n", " 'is_nren',\n", " sort=alt.EncodingSortField(\n", " field='rtt',\n", " op='median',\n", " order='ascending'\n", " ),\n", " title='',\n", " axis=None\n", " ),\n", " color=alt.Color('is_nren', legend=None, scale=alt.Scale(\n", " domain=['true', 'false'],\n", " range=['#c44e51','steelblue']))\n", " \n", ").properties(\n", " width=0.8*WIDTH,\n", " height=80\n", ")\n", "\n", "error_bars = points.mark_rule(size=1.5, color='steelblue').encode(\n", " x='q1(rtt)',\n", " x2='q3(rtt)',\n", ")\n", "\n", "(points + error_bars)\\\n", " .facet(\n", " row='country_name:N')\\\n", " .configure_axis(\n", " labelFontSize=15,\\\n", " titleFontSize=15,\\\n", " gridDash=[4,4]\\\n", " )\\\n", " .configure_header(\n", " titleFontSize=0,\\\n", " labelFontSize=15,\\\n", " labelLimit=80,\\\n", " title=None\n", " )\\\n", " .save('./img/rtt-nrens-vs-others-by-country.png')" ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/rtt-nrens-vs-others-by-country.png?modified=683)" ], "text/plain": [ "" ] }, "execution_count": 182, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "utils.show_chart('./img/rtt-nrens-vs-others-by-country.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### III.5 Comparing Packet Loss of NREN probes vs. others\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Comparing mean Packet Loss**" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "data_nren_valid = data_nren[data_nren['measurement_failed'] == False]\n", "data_nren_valid['packets_lost'] = data_nren_valid['nb_packets'] - data_nren_valid['nb_rtts']" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % of packet loss\n", "is_nren \n", "False 2.231209\n", "True 0.818880\n" ] } ], "source": [ "packet_loss_nren_country = data_nren_valid.groupby(['is_nren'])\\\n", " .apply(perc_loss)\\\n", " .to_frame(name='% of packet loss')\\\n", " .sort_values(by='% of packet loss', ascending=False)\n", "print(packet_loss_nren_country)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### III.6 Comparing Packet Loss of NREN probes vs. others per country\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Comparing mean Packet Loss per country**" ] }, { "cell_type": "code", "execution_count": 302, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " % of packet loss\n", "country_name is_nren \n", "Algeria False 4.107719\n", " True 0.700231\n", "Kenya False 5.917119\n", " True 0.109954\n", "Madagascar False 0.416667\n", " True 2.341549\n", "Morocco False 2.188217\n", " True 1.163611\n", "South Africa False 1.955328\n", " True 0.690290\n", "Sudan False 2.535381\n", " True 0.613422\n", "Tanzania, United Republic of False 0.259367\n", " True 0.000000\n", "Zambia False 1.030548\n", " True 0.317759" ] }, "execution_count": 302, "metadata": {}, "output_type": "execute_result" } ], "source": [ "packet_loss_per_nren_per_country = data_nren_valid.groupby(['country_name', 'is_nren'])\\\n", " .apply(perc_loss)\\\n", " .to_frame(name='% of packet loss')\\\n", " \n", "packet_loss_per_nren_per_country" ] }, { "cell_type": "code", "execution_count": 305, "metadata": {}, "outputs": [], "source": [ "alt.Chart(packet_loss_per_nren_per_country.reset_index())\\\n", " .mark_bar(height=20)\\\n", " .encode(\n", " x=alt.X('% of packet loss', title='Packet loss (%)'),\n", " y=alt.Y('is_nren:N', title='', axis=None),\n", " color=alt.Color('is_nren', legend=None, scale=alt.Scale(domain=['true', 'false'], range=['#c44e51','steelblue'])),\n", " row=alt.Row('country_name:N', sort=['Kenya', 'Algeria', 'Sudan', 'Morocco', 'South Africa',\\\n", " 'Zambia', 'Madagascar', 'Tanzania, United Republic of']))\\\n", " .properties(\n", " width=0.8*WIDTH,\n", " height=70)\\\n", " .configure_axis(\n", " labelFontSize=15,\\\n", " titleFontSize=15,\\\n", " gridDash=[4,4]\\\n", " )\\\n", " .configure_header(\n", " titleFontSize=0,\\\n", " labelFontSize=15,\\\n", " labelLimit=80,\\\n", " title=None\n", " ).save('./img/packet-loss-nrens-vs-others-by-country.png')" ] }, { "cell_type": "code", "execution_count": 306, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/packet-loss-nrens-vs-others-by-country.png?modified=373)" ], "text/plain": [ "" ] }, "execution_count": 306, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "utils.show_chart('./img/packet-loss-nrens-vs-others-by-country.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### III.7 Is there enough evidence at this stage that a NREN make a difference?\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Hypothesis testing**\n", "\n", "H0: there is no difference of mean of % of packet loss between NRENs vs. non-NRENS measurements." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-1.4123289321193973\n" ] } ], "source": [ "# Observed difference of % of packet loss between NRENs and not-NRENs\n", "observed_diff = packet_loss_nren_country.diff().values[1][0]\n", "print(observed_diff)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first simulate that HO Null hypothesis:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "nren_idx = data_nren_valid[data_nren_valid['is_nren'] == True].index.values\n", "\n", "# Sample non NREN measurements to get the same number of values as NRENs one as 10x more\n", "not_nren_idx = data_nren_valid[data_nren_valid['is_nren'] == False].sample(len(nren_idx)).index.values\n", "\n", "both_idx = np.concatenate((nren_idx, not_nren_idx))" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "nb_replicates = 10000\n", "replicates = np.zeros(nb_replicates)\n", "for i in range(nb_replicates):\n", " both_perm_idx = np.random.permutation(both_idx)\n", " perm_sample_nren_idx = both_perm_idx[:len(nren_idx)]\n", " perm_sample_not_nren_idx = both_perm_idx[len(not_nren_idx):]\n", " \n", " replicates[i] = perc_loss(data_nren_valid.loc[perm_sample_nren_idx,])-\\\n", " perc_loss(data_nren_valid.loc[perm_sample_not_nren_idx,])" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "alt.Chart(pd.DataFrame({'x': 
list(replicates)}))\\\n", " .mark_bar()\\\n", " .encode(\n", " alt.X(\"x:Q\", title='Difference in % of packet loss (NREN minus non-NREN) under simulated H0'),\n", " y='count()')\\\n", " .properties(\n", " width = WIDTH,\n", " height = 200)\\\n", " .save('./img/estimate-diff-percentage-packet-loss.png')" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "![](./img/estimate-diff-percentage-packet-loss.png?modified=333)" ], "text/plain": [ "" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utils.show_chart('./img/estimate-diff-percentage-packet-loss.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's the probability of obtaining a value of our test statistic (difference in the percentage of packet loss) that is at least as extreme as what was observed, under the assumption that the null hypothesis is true?" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0 %\n" ] } ], "source": [ "# One-sided p-value: share of replicates at least as negative as the observed difference\n", "print(100 * sum(replicates <= observed_diff) / len(replicates), '%')" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "ripe-ncc-recurring-traceroute-to-ttn.ipynb", "provenance": [], "version": "0.3.2" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 1 }