{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## TriScale\n", "# Case Study - Failure Detection\n", "\n", "This notebook presents a case study of the TriScale framework. It revisits the analysis of [Blink](https://www.usenix.org/conference/nsdi19/presentation/holterbach), an algorithm that detects failures and reroutes traffic directly in the data plane. Parts of this case study are described in the [TriScale paper](https://doi.org/10.5281/zenodo.3464273).\n", "\n", "\n", "## List of Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import copy\n", "from pathlib import Path\n", "import zipfile\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import plotly.graph_objects as go\n", "\n", "import triscale\n", "import triplots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download Source Files and Data\n", "[[Back to top](#TriScale)]\n", "\n", "The dataset for this case study is available on Zenodo: \n", "\n", "[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3451417.svg)](https://doi.org/10.5281/zenodo.3451417)\n", "\n", "\n", "The wget commands below download the required files to reproduce this case study. 
Downloading and unzipping might take a while...\n", "> **The .zip file is ~100 kB**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nothing to download\n" ] } ], "source": [ "# Set `download = True` to download (and extract) the data from this case study\n", "# If needed, adjust the record_id for the file version you are interested in.\n", "\n", "# For reproducing the results of the TriScale paper, set `record_id = 3666724`\n", "\n", "download = True\n", "record_id = 3666724  # v3.0.1 (https://doi.org/10.5281/zenodo.3666724)\n", "\n", "files = ['UseCase_FailureDetection.zip']\n", "if download:\n", "    for file in files:\n", "        print(file)\n", "        url = 'https://zenodo.org/record/' + str(record_id) + '/files/' + file\n", "        os.system('wget %s' % url)\n", "        if file.endswith('.zip'):\n", "            with zipfile.ZipFile(file, 'r') as zip_file:\n", "                zip_file.extractall()\n", "    print('Done.')\n", "else:\n", "    print('Nothing to download')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now import the custom module for the case study. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import UseCase_FailureDetection.failuredetection as fd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation objectives\n", "[[Back to top](#TriScale)]\n", "\n", "In this case study, 30 prefixes from each of 15 different Internet traces have been selected. For each of these prefixes, 5 artificial traces have been generated, all of which include a failure. We are interested in evaluating:\n", "1. The ratio of failures that are correctly detected (true positives)\n", "2. The time taken until the traffic is rerouted\n", "\n", "The experiment was designed and performed by the authors of [the Blink paper](https://www.usenix.org/conference/nsdi19/presentation/holterbach). 
In this case study, we only perform the data analysis, using the _TriScale_ approach to generalize the results.\n", "\n", "### 1. Compute the Metrics\n", "For each prefix, we compute two metrics:\n", "1. The true positive rate (TPR); that is, the ratio of failures correctly detected by the algorithm. Since there are 5 synthetic traces per prefix, this metric takes values in {0, 0.2, 0.4, 0.6, 0.8, 1}.\n", "2. The median time taken to reroute the traffic, considering only the failures that have been detected (column `Speed_s`, in seconds; NaN when no failure is detected).\n", "\n", "The metric values are computed by the `compute_metrics()` function, called in the cell below." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output retrieved from file. Skipping computation.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Protocol Trace Prefix TPR Speed_s\n", "0 blink 1 0 0.6 1.998730\n", "1 blink 1 1 1.0 1.579861\n", "2 blink 1 2 0.0 NaN\n", "3 blink 1 3 1.0 1.707236\n", "4 blink 1 4 0.8 1.419164\n", "... ... ... ... ... ...\n", "1345 infinite_timeout 15 25 0.8 1.681014\n", "1346 infinite_timeout 15 26 0.0 NaN\n", "1347 infinite_timeout 15 27 0.4 2.107471\n", "1348 infinite_timeout 15 28 1.0 0.717849\n", "1349 infinite_timeout 15 29 1.0 0.743098\n", "\n", "[1350 rows x 5 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Construct the path to the different test results\n", "result_dir = Path('UseCase_FailureDetection')\n", "config_file = Path('UseCase_FailureDetection/config.yml')\n", "\n", "out_file = result_dir / 'metrics.csv'\n", "df = fd.compute_metrics(config_file, result_dir, out_name=out_file)\n", "display(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Compute the KPIs\n", "For each trace (i.e., each set of 30 prefixes), we compute one KPI per metric (TPR and rerouting time): the 95% confidence interval on the median." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output retrieved from file. Skipping computation.\n" ] } ], "source": [ "KPI = { 'percentile' : 50,\n", " 'confidence' : 95,\n", " 'bounds': [0,1],\n", " 'bound': 'lower',\n", " }\n", "out_file = result_dir / 'kpis.csv'\n", "kpis = fd.compute_kpis(df,KPI,config_file,out_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then plot these KPIs for each of the Internet traces."
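, "\n", "\n", "As a rough illustration of what this KPI computes, assuming it follows the usual distribution-free construction from order statistics (an assumption, not checked against the `triscale` source): let $X_{(1)} \\le \\dots \\le X_{(n)}$ be the sorted values of one metric over the $n = 30$ prefixes of a trace, and let $x_p$ be the true $p$-th percentile of that metric. The event $X_{(k)} \\le x_p$ means that at least $k$ of the $n$ values fall at or below $x_p$. For a continuous metric, this has probability\n", "\n", "$$\\Pr\\left(X_{(k)} \\le x_p\\right) = \\sum_{i=k}^{n} \\binom{n}{i} p^{i} (1-p)^{n-i} = \\Pr(B \\ge k), \\qquad B \\sim \\mathrm{Bin}(n, p).$$\n", "\n", "Hence $X_{(k)}$ is a lower bound on $x_p$ with confidence $C$ whenever $\\Pr(B \\le k-1) \\le 1 - C$, and the tightest such bound uses the largest valid $k$. With $n = 30$, $p = 0.5$ and $C = 0.95$ (the `'percentile'` and `'confidence'` values above), we get:\n", "\n", "$$\\Pr(B \\le 10) = \\frac{53009102}{2^{30}} \\approx 0.049 \\le 0.05, \\qquad \\Pr(B \\le 11) \\approx 0.100 > 0.05,$$\n", "\n", "so $k = 11$. Under this construction, the lower-bound KPI of a trace is the 11th smallest of its 30 per-prefix values. By the symmetry of $\\mathrm{Bin}(30, 0.5)$, the corresponding one-sided upper bound would be $X_{(20)}$."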
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "marker": { "color": "rgb(0, 127, 0)" }, "name": "Blink", "type": "bar", "x": [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ], "y": [ 0.6, 1, 1, 1, 0.8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ] }, { "marker": { "color": "rgb(250, 128, 114)" }, "name": "All flows", "type": "bar", "x": [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ], "y": [ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ] }, { "marker": { "color": "rgb(146, 146, 146)" }, "name": "Inf. Timeout", "type": "bar", "x": [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ], "y": [ 0, 0.4, 0, 0, 0.2, 0.6, 0, 0, 0, 0, 0.8, 0, 0, 0.4, 0.8 ] } ], "layout": { "bargap": 0.3, "template": { "data": { "scatter": [ { "type": "scatter" } ] } }, "xaxis": { "tickvals": [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ], "title": { "text": "Trace ID" } }, "yaxis": { "title": { "text": "True Positive Rate" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "marker": { "color": "rgb(0, 127, 0)" }, "name": "Blink", "type": "bar", "x": [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ], "y": [ 0.7076479999999999, 0.7029770000000001, 0.668427, 0.656931, 0.6665270000000001, 0.6663100000000001, 0.658366, 1.018786, 0.6303979999999999, 0.918007, 0.619273, 0.950063, 0.935169, 0.630408, 0.619758 ] }, { "marker": { "color": "rgb(250, 128, 114)" }, "name": "All flows", "type": "bar", "x": [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ], "y": [ 0.323455, 0.31403200000000003, 0.301345, 0.262991, 0.306316, 0.285565, 0.265935, 0.446373, 0.24131100000000005, 0.373739, 0.227582, 0.492607, 0.408211, 0.247434, 0.230497 ] }, { "marker": { "color": "rgb(146, 146, 146)" }, "name": "Inf. Timeout", "type": "bar", "x": [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ], "y": [ 0.784892, 1.1097845, 0.697445, 0.675285, 0.8291440000000001, 0.752655, 1.013427, 0.932456, 0.665686, 0.995168, 0.654143, 1.170505, 1.113701, 0.7526609999999999, 1.024533 ] } ], "layout": { "bargap": 0.3, "template": { "data": { "scatter": [ { "type": "scatter" } ] } }, "xaxis": { "tickvals": [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ], "title": { "text": "Trace ID" } }, "yaxis": { "title": { "text": "Speed [s]" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "figure = fd.plot_TPR(kpis,config_file)\n", "figure.show()\n", "figure = fd.plot_speed(kpis,config_file)\n", "figure.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using TriScale, we can generalize the results. For each\n", "trace, the evaluation of Blink on one prefix can be seen as a\n", "TriScale run. Since the prefixes are randomly selected from\n", "a fixed set, runs are i.i.d. and we can use TriScale’s KPI to\n", "derive the expected performance of Blink for any set of prefixes. \n", "\n", "> We can claim with 95% confidence that, for\n", "at least 50% of the prefixes, \n", "Blink always detects link failures\n", "(TPR = 1) and reroutes traffic in 1 s or less." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }