{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# IPYNB - Alejandro Franco Vázquez\n", "### Project Code: Implementing an IDS with Machine Learning Algorithms\n", " Last updated: [04/06/2018]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This Jupyter notebook shows examples of the code used in the project. The data has been simplified so that the outputs show only a few rows (5), which keeps them easy to read. The most important parts are commented.\n", "\n", "*The code blocks must be run in order, because later blocks rely on variables defined in earlier ones.*\n", "\n", "**Most of the code blocks were written by the project's author.**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00) \n", "[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]\n" ] } ], "source": [ "import sys\n", "print(sys.version)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Table of contents\n", "\n", "- [1. Data import](#1.-Importación-de-datos)\n", "- [2. Data grouping](#2.-Agrupación-de-datos)\n", "- [3. Data normalization](#3.-Normalización-de-datos)\n", " * [3.1 Vanilla Python](#3.1-Vanilla-Python)\n", " * [3.2 NumPy](#3.2-NumPy)\n", " * [3.3 Visualization](#3.3-Visualization)\n", " \n", " \n", "- [4. Isolation Forest](#4.-Isolation-Forest)\n", " * [4.1 Plot Isolation Forest](#4.1-Plot-Isolation-Forest)\n", " \n", " \n", "- [5. Plots](#5.-Gráficas)\n", " * [5.1 Without sorting times](#5.1-Sin-ordenar-Tiempos)\n", " * [5.2 Sorting times](#5.2-Ordenando-Tiempos)\n", " \n", " \n", "- [6. Anomaly map](#6.-Mapa-anomalías)\n", "\n", "- [8. Public or private IP detector](#8.-Public-or-Private-IP-detector)\n", "- [9. 
Country detector](#9.-Detector-pais)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Importación de datos" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
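| | no | time | x | info | ipsrc | ipdst | proto | len | count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2017-03-20 17:08:53 | 0s | null | 10.3.20.102 | 37.202.7.169 | TCP | 66 | 1 |
| 1 | 2 | 2017-03-20 17:08:54 | 0s | null | 37.202.7.169 | 10.3.20.102 | TCP | 60 | 1 |
| 2 | 3 | 2017-03-20 17:08:54 | 0s | null | 10.3.20.102 | 37.202.7.169 | TCP | 60 | 1 |
| 3 | 4 | 2017-03-20 17:08:54 | 0s | null | 10.3.20.102 | 37.202.7.169 | HTTP | 309 | 1 |
| 4 | 5 | 2017-03-20 17:08:54 | 0s | null | 37.202.7.169 | 10.3.20.102 | TCP | 60 | 1 |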
\n", "
" ], "text/plain": [ " no time x info ipsrc ipdst proto len \\\n", "0 1 2017-03-20 17:08:53 0s null 10.3.20.102 37.202.7.169 TCP 66 \n", "1 2 2017-03-20 17:08:54 0s null 37.202.7.169 10.3.20.102 TCP 60 \n", "2 3 2017-03-20 17:08:54 0s null 10.3.20.102 37.202.7.169 TCP 60 \n", "3 4 2017-03-20 17:08:54 0s null 10.3.20.102 37.202.7.169 HTTP 309 \n", "4 5 2017-03-20 17:08:54 0s null 37.202.7.169 10.3.20.102 TCP 60 \n", "\n", " count \n", "0 1 \n", "1 1 \n", "2 1 \n", "3 1 \n", "4 1 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "df = pd.read_csv('/home/alexfrancow/AAA/ransomware2s.csv')\n", "df.columns = ['no', 'time', 'x', 'info', 'ipsrc', 'ipdst', 'proto', 'len']\n", "df['info'] = \"null\"\n", "df.parse_dates=[\"time\"]\n", "df['time'] = pd.to_datetime(df['time'])\n", "\n", "# Se añade la columna [count] con valor 1 para luego hacer las sumas.\n", "df['count'] = 1\n", "\n", "df.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Agrupación de datos" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3.25 s, sys: 46.9 ms, total: 3.3 s\n", "Wall time: 3.3 s\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
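| | ipdst | proto | time | count |
|---|---|---|---|---|
| 0 | 10.3.20.102 | HTTP | 2017-03-20 17:08:55 | 3 |
| 7 | 10.3.20.102 | HTTP | 2017-03-20 17:09:30 | 1 |
| 8 | 10.3.20.102 | TCP | 2017-03-20 17:08:50 | 3 |
| 9 | 10.3.20.102 | TCP | 2017-03-20 17:08:55 | 104 |
| 10 | 10.3.20.102 | TCP | 2017-03-20 17:09:00 | 204 |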
\n", "
" ], "text/plain": [ " ipdst proto time count\n", "0 10.3.20.102 HTTP 2017-03-20 17:08:55 3\n", "7 10.3.20.102 HTTP 2017-03-20 17:09:30 1\n", "8 10.3.20.102 TCP 2017-03-20 17:08:50 3\n", "9 10.3.20.102 TCP 2017-03-20 17:08:55 104\n", "10 10.3.20.102 TCP 2017-03-20 17:09:00 204" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Se hace la agrupación por [ipdst] y [proto], un .resample del [time] en 5 segundos y se hace la suma. \n", " # También resetea el index y se dropea los valores NaN.\n", "%time dataGroup2 = df.groupby(['ipdst','proto']).resample('5S', on='time').sum().reset_index().dropna()\n", "\n", "# Quitamos los decimales.\n", "pd.options.display.float_format = '{:,.0f}'.format\n", "\n", "# Se depura la salida seleccionando unas columnas y un número de filas.\n", "dataGroup2 = dataGroup2.head()[['ipdst','proto','time','count']]\n", "dataGroup2\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Podemos usar:\n", "```python \n", "dataGroup2 = dataGroup2[dataGroup2.ipsrc != '10.10.31.101'] \n", "dataGroup2 =dataGroup2[dataGroup2.ipdst != '10.10.31.101']\n", "```\n", "para eliminar la fila que tenga esa IP, esto es útil si queremos sacar nuestra IP de la lista." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Normalización de datos\n", "http://sebastianraschka.com/Articles/2014_about_feature_scaling.html#about-min-max-scaling " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
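| | ipdst | proto | time | count | count_n |
|---|---|---|---|---|---|
| 0 | 10.3.20.102 | HTTP | 2017-03-20 17:08:55 | 3 | 0 |
| 7 | 10.3.20.102 | HTTP | 2017-03-20 17:09:30 | 1 | 0 |
| 8 | 10.3.20.102 | TCP | 2017-03-20 17:08:50 | 3 | 0 |
| 9 | 10.3.20.102 | TCP | 2017-03-20 17:08:55 | 104 | 1 |
| 10 | 10.3.20.102 | TCP | 2017-03-20 17:09:00 | 204 | 1 |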
\n", "
" ], "text/plain": [ " ipdst proto time count count_n\n", "0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 0\n", "7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 0\n", "8 10.3.20.102 TCP 2017-03-20 17:08:50 3 0\n", "9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1\n", "10 10.3.20.102 TCP 2017-03-20 17:09:00 204 1" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataNorm = dataGroup2.copy()\n", "\n", "# Se aplica la fórmula de escalado de variables.\n", "dataNorm['count_n'] = (dataGroup2['count'] - dataGroup2['count'].min()) / (dataGroup2['count'].max() - dataGroup2['count'].min())\n", "\n", "dataNorm = dataNorm.head(5)\n", "dataNorm" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjgAAAGoCAYAAABL+58oAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt0VOW9//HPNwEJEAhBaUQCghXlUpJQIwXpoaHUgi4K\nyHIVLW1JsdLjD2urZXGwFtFWWnsWpyytrT1UKboqBUUo1LZewI63UrlYBMOlYg0QTkSKQyBIFMjz\n+yOTNEgiuU0m+eb9WitrJjt79n6Sp1Pe7r1nxkIIAgAA8CQp0QMAAABoagQOAABwh8ABAADuEDgA\nAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAO+0SPQBJOu+880Lfvn2bbX/Hjh1T586dm21/\niC/m0xfm0xfm05eWMJ+bN2/+Vwihx9nWaxGB07dvX23atKnZ9heJRJSXl9ds+0N8MZ++MJ++MJ++\ntIT5NLM9dVmPU1QAAMAdAgcAALhD4AAAAHdaxDU4NTlx4oSKiopUVlbW5NtOS0vTjh07mny7SIy2\nPJ8pKSnKzMxU+/btEz0UAGhRWmzgFBUVqUuXLurbt6/MrEm3ffToUXXp0qVJt4nEaavzGULQoUOH\nVFRUpH79+iV6OADQorTYU1RlZWU699xzmzxuAC/MTOeee25cjnICQGvXYgNHEnEDnAXPEQCoWYsO\nHAAAgIYgcD7GFVdc0eTbLCws1NKlS5t8u5UikYjGjx8ft+0DANAatNiLjOtr6ztbtXLnSu0t2as+\naX00ecBkZZ2f1aht/vWvf22i0f1bZeB85StfafJtAwCACi6O4Gx9Z6sWrF+g6PGoMrtmKno8qgXr\nF2jrO1sbtd3U1FRJ/35r6muvvVYDBgzQ1KlTFUKQVPExE7Nnz9aQIUM0bNgw7d69W5KUn5+vFStW\nnLGtOXPm6KWXXlJOTo4WLlx42v6Ki4s1atQo5eTk6FOf+pReeuklSdLTTz+tT3/608rOztaYMWMk\nSRs2bNCIESM0dOhQXXHFFdq1a9cZ4z927JimT5+uYcOGaejQoVq9enWj/h4AALQWLo7grNy5Uukp\n6UrvmC5JVbcrd65s9FGcSn//+99VUFCgCy64QCNHjtQrr7yiz372s5Iq3odl27ZtevTRR/Xd735X\nTz31VK3buffee7VgwYIa11m6dKnGjh2rO+64Q6dOndL777+vgwcP6sYbb
9SLL76ofv366b333pMk\nDRgwQC+99JLatWuntWvX6vvf/76efPLJ07Y3f/58ff7zn9fixYt1+PBhDRs2TF/4whcS/kFpAADE\nm4vA2VuyV5ldM09blpaSpr0le5tsH8OGDVNmZsU+cnJyVFhYWBU4119/fdXtrbfe2uB9XH755Zo+\nfbpOnDihSZMmKScnR5FIRKNGjap6n5Pu3btLkkpKSjRt2jS9+eabMjOdOHHijO09++yzWrNmjRYs\nWCCp4qX3e/fu1cCBAxs8RgAAWgMXgdMnrY+ix6NVR24kqaSsRH3S+jTZPjp06FB1Pzk5WSdPnqz6\nvvpLdSvvt2vXTuXl5ZKk8vJyffjhh2fdx6hRo/Tiiy/qj3/8o/Lz83XbbbcpPT29xnXnzp2r0aNH\na9WqVSosLKzx011DCHryySd16aWX1ul3BACgoeJxLWxjuLgGZ/KAyYqWRRU9HlV5KFf0eFTRsqgm\nD5jcLPtfvnx51e2IESMkVVybs3nzZknSmjVrqo6wdOnSRUePHq1xO3v27FFGRoZuvPFGffOb39Rr\nr72m4cOH68UXX9Tbb78tSVWnqEpKStSrVy9J0pIlS2rc3tixY/Xzn/+86nqhv//9703w2wIAcLp4\nXQvbGC4CJ+v8LM0aMUvpHdNVdKRI6R3TNWvErGYrx2g0qqysLN13331VFw7feOONeuGFF5Sdna31\n69dXXfeSlZWl5ORkZWdnn3GRcSQSUXZ2toYOHarly5frO9/5jnr06KFFixZp8uTJys7O1pQpUyRJ\ns2fP1u23366hQ4eedjSpurlz5+rEiRPKysrS4MGDNXfu3Dj+FQAAbVX1a2GTLEnpHdOVnpKulTtX\nJmxMVvlf94mUm5sbNm3adNqyHTt2xO1akab87KK+fftq06ZNOu+885pke6i/tvpZVJXi+VxJhMpX\nLcIH5tOX2uZz+urpyuyaqST793GT8lCuoiNFWjxxcZOOwcw2hxByz7aeiyM4AAAgcfqk9VFJWclp\ny5r6Wtj6InAaqbCwkKM3AIA2LdHXwtbkrIFjZr3N7C9mtt3MCszsO7Hl3c3sOTN7M3abXu0xt5vZ\nbjPbZWZj4/kLAACAxEr0tbA1qcvLxE9K+l4I4TUz6yJps5k9Jylf0roQwr1mNkfSHEn/ZWaDJF0n\nabCkCyStNbNLQgin4vMrAACARMs6PyuhQfNRZz2CE0IoDiG8Frt/VNIOSb0kTZT0SGy1RyRNit2f\nKGlZCOGDEMLbknZLGtbUAwcAAKhNva7BMbO+koZKelVSRgihOPajdyRlxO73krSv2sOKYssAAACa\nRZ3fydjMUiU9Kem7IYQj1d+9N4QQzKxerzc3sxmSZkhSRkaGIpHIaT9PS0ur9Q3xGuvUqVNx2zaa\nX1ufz7KysjOeP61ZaWmpq9+nrWM+fWlV8xlCOOuXpPaSnpF0W7VluyT1jN3vKWlX7P7tkm6vtt4z\nkkZ83PYvu+yy8FHbt28/Y9nH2bcvhJUrQ/jf/6243bev9nWPHDlSr21/nNWrV4ef/OQnTba9luTC\nCy8MBw8erPP6v/nNb8LMmTPPWD5v3rxwwQUXhLlz59Zr/zfccEMoKCg463pnm88lS5aEiy++OFx8\n8cVhyZIl9RpDc3rhhRfC0KFDQ3JycnjiiSeqlu/evTtkZ2eHzp071/i4+j5XWrq//OUviR4CmhDz\n6UtLmE9Jm0Id2qUur6IySQ9L2hFC+Fm1H62RNC12f5qk1dWWX2dmHcysn6T+kjY0uMDqoKhIWr1a\nev99KSOj4nb16orl8TZhwgTNmTMn/jtq5W699Vb98Ic/rNdjHnroIQ0aNKhR+33vvfd0991369VX\nX9WGDRt09913KxqNNmqb8dKnTx8tWbJEX/nKV05b/slPflJbtmxJ0KgAoHWqyzU4IyV9TdLnzWxL\n7OtqSfdKutLM3pT0hdj3CiEUSHpc0
nZJT0uaGeL8CqqNG6Vu3aSuXaWkpIrbbt0qljdUYWGhBgwY\noPz8fF1yySWaOnWq1q5dq5EjR6p///7asKGi2ZYsWaKbb75ZkpSfn69bbrlFV1xxhS666CKtWLGi\nxm3n5+frpptu0vDhw3XRRRcpEolo+vTpGjhwoPLz86vWu+mmm5Sbm6vBgwdr3rx5kio+g+rSSy/V\nrl27JFV8gvmvf/3rM/YxZ84cDRo0SFlZWZo1a5Yk6cCBA7rmmmuUnZ2t7Oxs/fWvf5UkTZo0SZdd\ndpkGDx6sRYsW1Tjm3/72txo2bJhycnL0rW99S6dOVUzpb37zG11yySUaNmyYXnnllTr9be+66y5N\nmzZN//Ef/6ELL7xQK1eu1OzZszVkyBCNGzeu6nO78vLyVPkO16mpqbrjjjuUnZ2t4cOH68CBA3Xa\n1zPPPKMrr7xS3bt3V3p6uq688ko9/fTTH/uY5pifmvTt21dZWVlKSuLtqQCgseryKqqXQwgWQsgK\nIeTEvv4UQjgUQhgTQugfQvhCCOG9ao+ZH0L4ZAjh0hDCn+P7K0gHD0qpqacvS02tWN4Yu3fv1ve+\n9z3t3LlTO3fu1NKlS/Xyyy9rwYIF+vGPf1zjY4qLi/Xyyy/rqaee+tgjO9FoVOvXr9fChQs1YcIE\n3XrrrSooKNC2bduq/mt9/vz52rRpk7Zu3aoXXnhBW7duVVpamh544AHl5+dr2bJlikajuvHGG0/b\n9qFDh7Rq1SoVFBRo69at+sEPfiBJuuWWW/S5z31Or7/+ul577TUNHjxYkrR48WJt3rxZmzZt0v33\n369Dhw6dtr0dO3Zo+fLleuWVV7RlyxYlJyfrscceU3FxsebNm6dXXnlFL7/8srZv317nv+1bb72l\n559/XmvWrNFXv/pVjR49Wtu2bVPHjh31xz/+8Yz1jx07puHDh+v111/XqFGjqqLhscce08iRI5WT\nk3Pa17XXXitJ2r9/v3r37l21nczMTO3fv/+s44vH/EyZMuWMcebk5OjRRx+t898NAFA3db7IuCXr\n0UMqLa04clOptLRieWP069dPQ4YMkSQNHjxYY8aMkZlpyJAhKiwsrPExkyZNUlJSkgYNGvSxRxm+\n9KUvVW0rIyPjtP0UFhYqJydHjz/+uBYtWqSTJ0+quLhY27dvV1ZWlq688ko98cQTmjlzpl5//fUz\ntp2WlqaUlBTdcMMNGj9+vMaPHy9Jev7556v+MU1OTlZaWpok6f7779eqVaskSfv27dObb76pc889\nt2p769at0+bNm3X55ZdLko4fP65PfOITevXVV5WXl6cesT/0lClT9I9//KNOf9urrrpK7du315Ah\nQ3Tq1CmNGzdOkmr9255zzjlVv8dll12m5557TpI0depUTZgwock/iyoe81P5qfMAgPhzETiXX15x\nzY1UceSmtFQ6fFj63Ocat90OHTpU3U9KSqr6PikpqdZP8K7+mBD7INM77rij6qhE5X/9V9/WR/dz\n8uRJvf3221qwYIE2btyo9PR05efnq6ysTJJUXl6uHTt2qFOnTopGo8rMzDxtDO3atdOGDRu0bt06\nrVixQg888ICef/75GscbiUS0du1arV+/Xp06dVJeXl7Vfqr/HtOmTdNPfvKT05b//ve/r3GbdVH9\n92/fvr0qX5VX29+2+jrJyclV6zz22GP66U9/esZpnYsvvlgrVqxQr169Trviv6ioqE4f/BeP+Zky\nZUrVqavqbrvtNn39618/65gAAHXn4mR/ZqY0caLUqZN04EDF7cSJFctbgvnz52vLli31ulD0yJEj\n6ty5s9LS0nTgwAH9+c//PtO3cOFCDRw4UEuXLtU3vvGNqmtWKpWWlqqkpERXX321Fi5cWHUUYcyY\nMXrwwQclVby0uqSkRCUlJUpPT1enTp20c+dO/e1vfztjLGPGjNGKFSv07rvvSqq4cHfPnj36zGc+\no
xdeeEGHDh3SiRMn9MQTT9T7b9NYU6dOrTp1Vv2r8vqnsWPH6tlnn1U0GlU0GtWzzz6rsWMrPj3k\n9ttvrzpyVV8NmZ/ly5efMc4tW7YQNwAQBy6O4EgVMdNSgqYpZGdna+jQoRowYIB69+6tkSNHSpJ2\n7dqlhx56SBs2bFCXLl00atQo3XPPPbr77rurHnv06FFNnDhRZWVlCiHoZz+rePHbfffdpxkzZujh\nhx9WcnKyHnzwQY0bN06/+tWvNHDgQF166aUaPnz4GWMZNGiQ7rnnHn3xi19UeXm52rdvr1/84hca\nPny47rrrLo0YMULdunVTTk5O8/xx6qF79+6aO3du1em1O++8U927d5ckbdu2TRMmTGjQdhszP7XZ\nuHGjrrnmGkWjUf3hD3/QvHnzVFBQ0KDxAUBbZ5WnURIpNzc3VL5aptKOHTs0cODAuOzv6NGjTX7N\nBmp31113KTU1terVXE2tofM5duxYPfPMM3EYUXykpqaqtLT0jOXxfK4kQiQSqdNpRLQOzKcvLWE+\nzWxzCCH3bOu5OEWFli01NVWLFi3SnXfemeihnKa1xM1bb72lnJwcZWRknH1lAICkFn6KKoSg6h8J\ngdZp1qxZcTt60xZ83Bv9tYQjsADQErXYIzgpKSk6dOgQ/wcO1CKEoEOHDiklJSXRQwGAFqfFHsHJ\nzMxUUVGRDjb23fpqUFZWxj8KjrTl+UxJSTnjbQIAAC04cNq3b69+/frFZduRSERDhw6Ny7bR/JhP\nAMBHtdhTVAAAAA1F4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD\n4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6B\nAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQO\nAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgA\nAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAA\nAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA\n3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4M5ZA8fMFpvZu2b2RrVld5nZ\nfjPbEvu6utrPbjez3Wa2y8zGxmvgAAAAtanLEZwlksbVsHxhCCEn9vUnSTKzQZKukzQ49phfmlly\nUw0WAACgLs4aOCGEFyW9V8ftTZS0LITwQQjhbUm7JQ1rxPgAAADqrTHX4HzbzLbGTmGlx5b1krSv\n2jpFsWUAAADNpl0DH/egpB9JCrHb/5E0vT4bMLMZkmZIUkZGhiKRSAOHUn+lpaXNuj/EF/PpC/Pp\nC/PpS2uazwYFTgjhQOV9M/u1pKdi3+6X1LvaqpmxZTVtY5GkRZKUm5sb8vLyGjKUBolEImrO/SG+\nmE9fmE9fmE9fWtN8NugUlZn1rPbtNZIqX2G1RtJ1ZtbBzPpJ6i9pQ+OGCAAAUD9nPYJjZr+TlCfp\nPDMrkjRPUp6Z5ajiFFWhpG9JUgihwMwel7Rd0klJM0MIp+IzdAAAgJqdNXBCCNfXsPjhj1l/vqT5\njRkUAABAY/BOxgAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAA\nwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAA\ndwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwA
AuEPgAAAAdwgcAADgDoEDAADc\nIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCH\nwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0C\nBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgc\nAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAA\nAIA7BA4AAHCnXaIHAAB1tfWdrVq5c6X2luxVn7Q+mjxgsrLOz0r0sAC0QBzBAdAqbH1nqxasX6Do\n8agyu2YqejyqBesXaOs7WxM9NAAtEIEDoFVYuXOl0lPSld4xXUmWpPSO6UpPSdfKnSsTPTQALRCB\nA6BV2FuyV2kpaactS0tJ096SvQkaEYCWjMAB0Cr0SeujkrKS05aVlJWoT1qfBI0IQEtG4ABoFSYP\nmKxoWVTR41GVh3JFj0cVLYtq8oDJiR4agBborIFjZovN7F0ze6Pasu5m9pyZvRm7Ta/2s9vNbLeZ\n7TKzsfEaOIC2Jev8LM0aMUvpHdNVdKRI6R3TNWvELF5FBaBGdXmZ+BJJD0h6tNqyOZLWhRDuNbM5\nse//y8wGSbpO0mBJF0haa2aXhBBONe2wAbRFWednETQA6uSsR3BCCC9Keu8jiydKeiR2/xFJk6ot\nXxZC+CCE8Lak3ZKGNdFYAQAA6qSh1+BkhBCKY/ffkZQRu99L0r5q6xXFlgEAADSbRr+TcQghmFmo\n7+PMbIakGZKUkZGhSCTS2KHUWWlpabPuD/HFfPrCfPrCfPrSmuazoYFzwMx6hhCKzaynpHdjy/dL\n6l1tvczYsjOEEBZJWiRJubm5IS8vr4FDqb9IJKLm3B/ii/n0hfn0hfn0pTXNZ0NPUa2RNC12f5qk\n1dWWX2dmHcysn6T+kjY0bogAAAD1c9YjOGb2O0l5ks4zsyJJ8yTdK+lxM7tB0h5JX5akEEKBmT0u\nabukk5Jm8goqAADQ3M4aOCGE62v50Zha1p8vaX5jBgUAANAYvJMxAABwh8ABAADuEDgAAMAdAgcA\nALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA\n4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACA\nOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADu\nEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD\n4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6B\nAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQO\nAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgA\nAMAdAgcAALhD4AAAAHfaNebBZlYo6aikU5JOhhByzay7pOWS+koqlPTlEEK0ccMEAACou6Y4gjM6\nhJATQsiNfT9H0roQQn9J62LfAwAANJt4nKKaKOmR2P1HJE2Kwz4AAABq1djACZLWmtlmM5sRW5YR\nQiiO3X9HUkYj9wEAAFAvFkJo+IPNeoUQ9pvZJyQ9J+nbktaEELpVWycaQkiv4bEzJM2QpIyMjMuW\nLVvW4HHUV2lpqVJTU5ttf4g
v5tMX5tMX5tOXljCfo0eP3lztsphaNeoi4xDC/tjtu2a2StIwSQfM\nrGcIodjMekp6t5bHLpK0SJJyc3NDXl5eY4ZSL5FIRM25P8QX8+kL8+kL8+lLa5rPBp+iMrPOZtal\n8r6kL0p6Q9IaSdNiq02TtLqxgwQAAKiPxhzByZC0yswqt7M0hPC0mW2U9LiZ3SBpj6QvN36YAAAA\nddfgwAkh/FNSdg3LD0ka05hBAQAANAbvZAwAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4\nQ+AAAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAO\ngQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMAANwhcAAAgDsE\nDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4\nAADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AA\nAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMA\nANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAA\ncIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgROzomCF8pbkqf/P+ytvSZ5WFKxI9JAAAEAD\ntUv0AFqCFQUrNHvtbHU9p6t6du6pw8cPa/ba2ZKkawdfm+DRAQCA+uIIjqQHNj6grud0VbeO3ZSU\nlKRuHbup6zld9cDGBxI9NAAA0AAEjqT9R/era4eupy3r2qGr9h/dn6ARAQCAxmiTp6iOvF+mKT9a\npn3FZerdM0Up7S/WkXbF6tax27/X+eCIenXplcBRAgCAhorbERwzG2dmu8xst5nNidd+6uuxP/9D\ne/ef0Bsv9ZEd7qfi/e10/I0r9U5xOx0+fljl5eU6fPywjnx4RDdffnOihwsAABogLoFjZsmSfiHp\nKkmDJF1vZoPisa/6WLJE+n/55+nDY51UvL2//rWnh479Xx9dkNpH/d7/srp17KbiYxVHcv77C//N\nBcYAALRS8TpFNUzS7hDCPyXJzJZJmihpe5z2d1ZPPSX98IfSBx9KSclB5SeTtH9HpnoNLFJKag9Z\nUqki+ZFEDQ8AADSheJ2i6iVpX7Xvi2LLEubhh6UOHaTOXU5KQWrf4ZTanXNCB/f00OF/tVfvnimJ\nHB4AAGhCFkJo+o2aXStpXAjhm7HvvybpMyGEm6utM0PSDEnKyMi4bNmyZU0+jup27qy4PXWqXD16\nHNP/vdNRMqn8lOmczu+rT6/26tqJyGmNSktLlZqamuhhoIkwn74wn760hPkcPXr05hBC7tnWi9cp\nqv2Self7PjO2rEoIYZGkRZKUm5sb8vLy4jSUCvfdJx08KB0/Lk2f/rx+tOBTOna0nTqcI/1yyb80\n4apL4rp/xE8kElG8//eD5sN8+sJ8+tKa5jNep6g2SupvZv3M7BxJ10laE6d91ckNN1TETceOUnJy\nkj7R+RO6IL277vtpd00lbgAAcCUuR3BCCCfN7GZJz0hKlrQ4hFAQj33V1fjxFbcPP1xxO2BARfRU\nLgcAAH7E7Y3+Qgh/kvSneG2/IcaPr/iKRKT//M9EjwYAAMQLH9UAAADcIXAAAIA7BA4AAHCHwAEA\nAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAA\nuEPgAAAAdwgcAADgjoUQEj0GmdlBSXuacZfnSfpXM+4P8cV8+sJ8+sJ8+tIS5vPCEEKPs63UI
gKn\nuZnZphBCbqLHgabBfPrCfPrCfPrSmuaTU1QAAMAdAgcAALjTVgNnUaIHgCbFfPrCfPrCfPrSauaz\nTV6DAwBQleYSAAADJklEQVQAfGurR3AAAIBjBA4AAHCnTQWOmY0zs11mttvM5iR6PKg/Mys0s21m\ntsXMNsWWdTez58zszdhteqLHiZqZ2WIze9fM3qi2rNb5M7PbY8/XXWY2NjGjRm1qmc+7zGx/7Dm6\nxcyurvYz5rMFM7PeZvYXM9tuZgVm9p3Y8lb5HG0zgWNmyZJ+IekqSYMkXW9mgxI7KjTQ6BBCTrX3\nYpgjaV0Iob+kdbHv0TItkTTuI8tqnL/Y8/M6SYNjj/ll7HmMlmOJzpxPSVoYe47mhBD+JDGfrcRJ\nSd8LIQySNFzSzNi8tcrnaJsJHEnDJO0OIfwzhPChpGWSJiZ4TGgaEyU9Erv/iKRJCRwLPkYI4UVJ\n731kcW3zN1HSshDCByGEtyXtVsXzGC1ELfNZG+azhQshFIcQXovdPypph6ReaqXP0bYUOL0k7av2\nfVFsGVqXIGmtmW02sxmxZRkhhOLY/XckZSRmaGig2uaP52zr9W0z2xo7hVV5OoP5bEXMrK+koZJe\nVSt9jralwIEPnw0h5KjiVONMMxtV/Yeh4n0PeO+DVor5c+FBSRdJypFULOl/Ejsc1JeZpUp6UtJ3\nQwhHqv+sNT1H21Lg7JfUu9r3mbFlaEVCCPtjt+9KWqWKw6EHzKynJMVu303cCNEAtc0fz9lWKIRw\nIIRwKoRQLunX+vcpC+azFTCz9qqIm8dCCCtji1vlc7QtBc5GSf3NrJ+ZnaOKC6PWJHhMqAcz62xm\nXSrvS/qipDdUMY/TYqtNk7Q6MSNEA9U2f2skXWdmHcysn6T+kjYkYHyoh8p/CGOuUcVzVGI+Wzwz\nM0kPS9oRQvhZtR+1yudou0QPoLmEEE6a2c2SnpGULGlxCKEgwcNC/WRIWlXxHFQ7SUtDCE+b2UZJ\nj5vZDZL2SPpyAseIj2Fmv5OUJ+k8MyuSNE/Svaph/kIIBWb2uKTtqnh1x8wQwqmEDBw1qmU+88ws\nRxWnMQolfUtiPluJkZK+JmmbmW2JLfu+WulzlI9qAAAA7rSlU1QAAKCNIHAAAIA7BA4AAHCHwAEA\nAO4QOAAAwB0CBwAAuEPgAAAAd/4/ogjKQ4+3cRIAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "\n", "from matplotlib import pyplot as plt\n", "\n", "def plot():\n", " plt.figure(figsize=(8,6))\n", "\n", " plt.scatter(dataGroup2['count'], dataGroup2['count'],\n", " color='green', label='input scale', alpha=0.5)\n", " \n", " plt.scatter(dataNorm['count_n'], dataNorm['count_n'],\n", " color='blue', label='min-max scaled [min=0, max=1]', alpha=0.3)\n", " \n", " plt.legend(loc='upper left')\n", " plt.grid()\n", "\n", " plt.tight_layout()\n", "\n", "plot()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Vanilla Python" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "[-0.74299773604806973, -0.76776432724967203, -0.74299773604806973, 0.50771511963284766, 1.7460446797129638]\n", "[0.009852216748768473, 0.0, 0.009852216748768473, 0.5073891625615764, 1.0]\n" ] } ], "source": [ "# Standardization\n", "\n", "x = dataGroup2['count']\n", "mean = sum(x)/len(x)\n", "std_dev = (1/len(x) * sum([ (x_i - mean)**2 for x_i in x]))**0.5\n", "\n", "z_scores = [(x_i - mean)/std_dev for x_i in x]\n", "print(z_scores)\n", "# Min-Max scaling\n", "\n", "minmax = [(x_i - min(x)) / (max(x) - min(x)) for x_i in x]\n", "print(minmax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 NumPy" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.74299774 -0.76776433 -0.74299774 0.50771512 1.74604468]\n", "[ 0.00985222 0. 0.00985222 0.50738916 1. ]\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Standardization\n", "\n", "x_np = np.asarray(x)\n", "z_scores_np = (x_np - x_np.mean()) / x_np.std()\n", "print(z_scores_np)\n", "\n", "# Min-Max scaling\n", "\n", "np_minmax = (x_np - x_np.min()) / (x_np.max() - x_np.min())\n", "print(np_minmax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Visualization" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAqYAAAFgCAYAAABpIrurAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm4JHV97/H3lxmIDoNsgwgIDG6JKIEALvGiwSUqRKNJ\nDEFHQGKcCMFrrlkkEm06BkO4TxIlXjC4RJZRwA1FxYXoiAsYGEVGRAPqDKuygwwoDPO9f9TvQJ/m\nLH1m+pz+dZ/363nm4XRXddX311X97U9XVTeRmUiSJEmDttmgC5AkSZLAYCpJkqRKGEwlSZJUBYOp\nJEmSqmAwlSRJUhUMppIkSaqCwXTIRTvWRDteNOg6ZlO0I6MdT+rj8h56zqIdb4t2fKBfy+5Yx/ui\nHW/v93Il1ae2PhzteG6040eDrmOQoh0rox1/Vv5eFu340qBrUm8WDrqA+SjasQbYEXgQWAdcAByT\nrbxnmsd9GLg+W/n3s11jr6IdS4GfAptnK9cPtpqZy1a+a1OXEe14HfBn2coDOpb7xk1drqTZM4x9\nONpxPNAC/jJb+Z6O+98MvBtoZyuPz1Z+Hfj1TVjPGmBnYOds5a0d938X2AfYI1u5ZmOXP9eylSuA\nFYOuQ73xiOngvDxbuRjYF9gfqCZsjopohx+8JE1lGPvw/wCHd913RLm/n34KvHrsRrRjL2BRn9ch\nPYJv3AOWrbwh2nEB8PRoxx8Dx2Yr9xubHu14C/A7wOeAZUBGO/4S+Gq28uVltn2iHf8K7A58ATgi\nW/nL8vg3AG8FtgO+AbwxW3ljmZbAUcBfATvQfKI8JluP/N+BRTueCZwCPAW4D1iRrXwLcFGZ5c5o\nB8DvAjcD7wf2BhL4IvAX2co7y7LWAO+laa4T1fw3wFvKY8e9UUQ7fg/4R+CJwF3AB7OVx5dpS2ma\n6Z/RHFVYAzwv2nFYecxi4F+7lnc88KRs5WujHe8FXtcx+VHAP2Yrj492HAu8AXgscB1wXLbyU9GO\npwLvAzaPdtwDrM9WbtN9VKVf20FS/w1LHy4uBfaLdjwtW3lltONpNL3q0o56DwTOylY+vtxewxQ9\ndxJnlvn/vdw+AjiDppeOrWeqfvwnwInA3tnKu6MdBwH/CeyVrbylc0XRjkcBHwAOAhYAVwMvy1b+\nPNqxHfAvwEuARwNfy1a+MtqxbanxWTRZ5pvleb2+eyDdZ7Wmes6jHQuAk8p4f1HW/e8M6VnBYeQR\n0wGLduwKHAx8F/gMsEcJO2MOA87IVp5G8+I5KVu5uKMZAhwCvBTYA/hNSriKdrwA+KcyfSdgLXB2\nVwkvA55RHncIzYt/Iu8B3pOtfAxNEzq33P+88t9tSl0XA1HWuzPwVGBX4Piu5U1W80uBv6YJuE8G\nuq/bWkfTLLcBfg84Ktrxyq55fqes9yXRjj2BU2mex52B7YHHTzTAbOUxZQyLgQOAO4BPl8k/Bp4L\nbA20gbOiHTtlK68C3ghcXB67Tfdy+7wdJPXZEPXhMWOhEZoAdWYPw5ywvilcAjwm2vHUEtYOBc7q\nmmfSfpytPAf4FnBytGN74IM04fAWHukImt66K02PfiPNAZCxsS4CnkZzYODfyv2b0QTd3YHdyvzv\nnWZMnSZ7zt9AE5D3oTmS3v3+olnmEdPBOS/asZ7mU+bngHdlK38V7TgHeC1wXPkkvBT47DTLOrnj\n0/f5NC8oaD7Zfyhb+Z0y7e+AO6IdSzuuDzqxHMm8M9rx1fLYL0ywjgeAJ0U7lpRrji6ZrJhs5TXA\nNeXmLeUoQqvHmg8B/jNb+f0y7Xg6TidlK1d2LOOKaMdHaYLoeR33H5+tXFce/yrgs9nKi8rttwPH\nTFZ7mWeHsrw3ZSu/W9b7sY5ZzinP5TN5OLhOpZ/bQVL/DFsfHnMW8I1ox9/TBMb/RRN+N6a+qYwF\n4K8BVwE3dE7soR//BXAFsBI4P1s52XP4AE0gfVK28gpgValzJ
5qQuH228o4y79fKum8DPjG2gGjH\nCcBXexjTmMme80NoDsJcX5Z7IvDCGSxXm8hgOjivzFZeOMH9pwMfLQ3nMODcbOWvplnWzzr+vpfm\nyCDlv98Zm5CtvCfacRuwC81p7okeu3iSdbwe+Afgh9GOn9JcZD9hk4l27EhzhPW5wFY0n2zv6Jpt\nqppXdUxb27XsZ9GcHno6sAXwa0BnaITmVPuYnTtvZyvXledgQtGOzYGPAx/JVp7dcf/hNJcXLC13\nLQaWTLacLv3cDpL6Z9j68Ngyro12XAO8C7g6W3lduZRqxvWVSxieW+7/8/JFoTFn0lyutQfNafxx\npuvH2co7ox0fo+mdfzRFbWfSHC09O9qxDU3wPq7cd3tHKO1c9yKao6cvBbYtd28V7ViQrXxwinWN\nmew5H/ee0fW35oCn8iuTrbwEuJ+mUbyG8adoZnrN4Y00pzkAiHZsSfOp9IZJHzF5XVdnK19Ncyrl\nn4GPl+VNVNO7yv17lVP/r6U5vd+Lm2ia0ZjduqZ/hOZU267Zyq1pru/sXnZnTeOWV5rZ9lOs/9+B\nu+m4tjXasTvNNbPH0Hxy3wb4fsd6p9sufdsOkmZfrX24yxk010g+IjDORLbyoLFLmLpCKdnKtTTX\n7R8MfHKCh0/Zj6Md+wB/CnwUOHmKGh7IVrazlXsCz6E5zX44TSjcroTVbn9F88sDzyrvM2OXlfX6\nXjOZmxh/udeuk82o2WEwrdMZNNfKPJCt/EbH/T8HnjCD5XwUODLasU+049doAuO3N+ZnPqIdr412\n7JCt3ADcWe7eANxS/ttZ11bAPcBd0Y5dgL+ZwarOBV4X7dizhMjuSwC2ovkE/cvyhazXTLO8jwMv\ni3YcEO3Yguao74T7fbTjz2lOQy0r4xwzFsBvKfMdSXOEYMzPgceX5U+kb9tB0pyprg93OQd4MQ9f\n7z9bXg+8YOzyqC6T9uPyhaazgLcBRwK7RDuOnmgF0Y7nRzv2Ktey3k1zan9DtvImmp/xOiXasW20\nY/Nox1gA3YrmutI7yxekut8rNta5wJujHbuUQPzWPi1XPTKY1ulMmuDTfaH5B4E9ox13RjvOe+TD\nxiunqN5Ocx3OTTRfWjp0I2t6KXBlNN88fw9waLbyvmzlvcAJwDdLXc+m+XLQvjx83dZEn7Qnq/kC\nmt/j+wrNdapf6ZrlaOAfoh2/AN7BNE05W3klzXVOH6F5Du4AHvGtzeLVNG84N0Y77in/3pat/AHN\nNzMvpnlT2ovmG6BjvgJcCfws2nFr90L7vB0kzY0a+3Dncu/LVl6Yrbxv+rk3aT0/zlZeNsnkqfrx\nPwHXZStPLZdBvBb4x2jHkydYzuNoDiLcTXMt69d4+Cj1YTRB9Yc0v/jyl+X+d9N8S3/sOw/9uib/\n/cCXaK6N/S7weWA9ze/dag5E+os01Yl2PJrmBbhvtvLqQdcjSfONfVgA5Weu3pet3H3amdUXfvmp\nTkcBl9oMJWlg7MPzUPlA8nyao6Y70lwi8KmBFjXPGEwrE80PIQf+dpokDYR9eF4LmsvRzqG5hvVz\nNJcpaI54Kl+SJElV8MtPkiRJqsKMTuUvWbIkly5dOkul9N+6devYcsstB11G3zmu4TKq44LhGNuq\nVatuzcwdBl1Hv/SjDw/DdtsYjmt4jOKYwHFNpddePKNgunTpUi67bLJfjajPypUrOfDAAwddRt85\nruEyquOC4RhbRKydfq7h0Y8+PAzbbWM4ruEximMCxzWVXnuxp/IlSZJUBYOpJEmSqmAwlSRJUhUM\nppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJ\nqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFU\nkiRJVTCYSpIkqQoGU0mSJ
FXBYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElV\nMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqS\nJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoG\nU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIk\nVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVWY9WC6YvUKlr57KdGOh/4t\n/IeFHP25o2d71RoiK1avYPXNq9msvRlL372UFatXDLokVWCsf7hf9MfY87nqplUsftdiFvzDAnuy\nxvE1p06dPWOu9odZDaYrVq9g+fnLWXvX2nH3P5gPcuplp9oIBTy8n9z/4P0kydq71rL8/OU2xHmu\ns3+4X2y67n687oF1bMgNgD1ZDV9z6tTdM+Zqf5jVYHrcfx3HvQ/cO+n001adNpur15CYaD+594F7\nOe6/jhtQRaqB+0V/TdePwZ483/maU6dB7Q+zGkyvvevaKac/mA/O5uo1JCbbT6bbfzTa3C/6q5fn\nzZ48v/maU6dB7Q+zGkx323q3KacviAWzuXoNicn2k+n2H40294v+6uV5syfPb77m1GlQ+8OsBtMT\nXngCizZfNOn05fstn83Va0hMtJ8s2nwRJ7zwhAFVpBq4X/TXdP0Y7Mnzna85dRrU/rBwNhe+bK9l\nQHOdQucXoBbEApbvt5xTfu+U2Vy9hsTYfnL7VbcTBLttvRsnvPCEh+7X/NTZP66961r3i03U+XwC\nbLn5lty3/j425AZ7sgBfcxqvu2fsvvXuc7I/zGowhWZg7tSazrK9lrHytpVsOGTDoEtRRewf/TX2\nfK5cuZJ7Xn3PoMtRhXzNqVNnz1jz6jVzsk5/YF+SJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJ\nklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWD\nqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmS\nqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCV\nJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUkSVIV\nDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIk\nSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLB\nVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKkRm9j5zxC3A2tkrp++WALcOuohZ4LiGy6iOC4ZjbLtn\n5g6DLqJf+tSHh2G7bQzHNTxGcUzguKbSUy+eUTAdNhFxWWbuP+g6+s1xDZdRHReM9thG2ahuN8c1\nPEZxTOC4+sFT+ZIkSaqCwVSSJElVGPVgetqgC5gljmu4jOq4YLTHNspGdbs5ruEximMCx7XJRvoa\nU0mSJA2PUT9iKkmSpCFhMJUkSVIVRiqYRsR2EfHliLi6/HfbSeZbExGrI+LyiLhsruvsVUS8NCJ+\nFBHXRMSxE0yPiDi5TL8iIvYdRJ0z1cO4DoyIu8r2uTwi3jGIOmciIj4UETdHxPcnmT6U2wp6GtvQ\nba/5Yh73kGVlPKsj4lsRsfcg6pyJ6cbUMd8zImJ9RLxqLuvbWL2Mq/SQyyPiyoj42lzXuDF
62Ae3\njojzI+J7ZVxHDqLOmajmfSwzR+YfcBJwbPn7WOCfJ5lvDbBk0PVOM5YFwI+BJwBbAN8D9uya52Dg\nAiCAZwPfHnTdfRrXgcBnB13rDMf1PGBf4PuTTB+6bTWDsQ3d9poP/+Z5D3kOsG35+6Dax9XLmDrm\n+wrweeBVg667T9tqG+AHwG7l9mMHXXefxvW2sQwC7ADcDmwx6NqnGVcV72MjdcQUeAVwevn7dOCV\nA6xlUz0TuCYzf5KZ9wNn04yv0yuAM7JxCbBNROw014XOUC/jGjqZeRFN45nMMG4roKexqU7ztodk\n5rcy845y8xLg8XNc40z12hffBHwCuHkui9sEvYzrNcAnM/NagMwchrH1Mq4EtoqIABbT9ND1c1vm\nzNTyPjZqwXTHzLyp/P0zYMdJ5kvgwohYFRHL56a0GdsFuK7j9vXlvpnOU5tea35OOVVwQUQ8bW5K\nm1XDuK1mYtS21yiY7z1kzOtpjvLUbNoxRcQuwB8Ap85hXZuql231FGDbiFhZ3pMPn7PqNl4v43ov\n8FTgRmA18ObM3DA35c2aOekXC/u9wNkWERcCj5tg0nGdNzIzI2Ky38I6IDNviIjHAl+OiB+WTwqq\nw3doTuvcExEHA+cBTx5wTZqc20tViojn0wTTAwZdSx+8G3hrZm5oDsKNjIXAfsALgUcDF0fEJZn5\nP4Mta5O9BLgceAHwRJqs8fXMvHuwZdVv6IJpZr5osmkR8fOI2CkzbyqHlyc8JZCZN5T/3hwRn6I5\nLF9bML0B2LXj9uPLfTOdpzbT1tz5ws3Mz0fEKRGxJDNvnaMaZ8MwbquejOj2GgXztocARMRvAh8A\nDsrM2+aoto3Vy5j2B84uoXQJcHBErM/M8+amxI3Sy7iuB27LzHXAuoi4CNgbqDmY9jKuI4ETs7k4\n85qI+CnwG8B/z02Js2JO+sWoncr/DHBE+fsI4NPdM0TElhGx1djfwIuBCb+BNmCXAk+OiD0iYgvg\nUJrxdfoMcHj5ptyzgbs6LmWo1bTjiojHletyiIhn0uyntb+xTGcYt1VPRnR7jYL53EN2Az4JHDYk\nR96mHVNm7pGZSzNzKfBx4OjKQyn0tg9+GjggIhZGxCLgWcBVc1znTPUyrmtpjgITETsCvw78ZE6r\n7L856RdDd8R0GicC50bE64G1wCEAEbEz8IHMPJjmutNPlffRhcBHMvMLA6p3Upm5PiKOAb5I8w3A\nD2XmlRHxxjL9fTTfzDwYuAa4l+YTWtV6HNergKMiYj1wH3Bo+dRZrYj4KM2305dExPVAC9gchndb\njelhbEO3veaDed5D3gFsD5xSev36zNx/UDVPp8cxDZ1expWZV0XEF4ArgA0079U1Hix6SI/b653A\nhyNiNc232N9a+1mkWt7H/F+SSpIkqQqjdipfkiRJQ8pgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElS\nFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJ0DEayJ4EWD\nrkOPFMGHI/jHPi7v+AjOKn/vFsE9ESzo1/LLcp8bwY/6uUxplNmD+yeCKyM4cNB1DEoEr4vgGx23\n74ngCYOsadQYTCdRGtl9Zaf7eQkwi3t4XF+DzkyUUJQRHNJx38Jy39JZWN+BEWwoz9EvIvhRBEf2\neR0rI/izfi5zrmRybSaLM3lwU5ZTtt+TOpb79Ux+fdMrlOplD+5pfX3pwREsLTV+t+v+JRHcH8Ga\nsfsyeVomKzey3rHn581d97+53H/8xix3kEqP/8mg6xglBtOpvTyTxcC+wP7A3w+4nl7cDrT7fZRu\nCjeW5+gxwFuB90ew5xyte2AiiAhfP9IsswdPr589eFEET++4/Rrgp5taYJf/AQ7vuu+Icr/kG2sv\nMrkBuAB4egR/HMGqzukRvCWCT0ewHFgG/G35BHt+x2z
7RHBFBHdFcE4Ej+p4/BsiuCaC2yP4TAQ7\nd0zLCN4YwdUR3BnB/4sgpij3C8D9wGsnmth9BHKC0xIZwdFlfb+I4J0RPDGCb0VwdwTnRrDFBM9R\nZnIecAewZwSfi+BNXeu+IoI/mKCmR0VwVgS3lTFeGsGOEZwAPBd4b3k+31vmf08E15V6VkXw3I5l\nHV9qPKPUf2UE+3dM/60IvlOmnQPjtsO2EXw2glsiuKP8/fiu5+6ECL4J3As8IYI9IvhaWd6XgSUd\n848dhVgYwW+XMYz9++XYUYgInhnBxWXsN0Xw3rHnOIKLyuK+Vx73J+UoyfUd63lqqe3OMt7f75j2\n4bLPfK7U+O0InvjIPUOqlz14dntwhzNpQuKYw4Ezupbx0GUR0/XbSVxKE4CfVpbxNJo+fGnHOibt\nxRFsF8H1Eby83F5ctl132B1b1usi+Emp76cRLOuY9oYIrirTfhDBvuX+YyP4ccf9kz5n0XFGa7p+\nG8GLozmqfVcEp5T3jqE8IzibDKY9iGBX4GDgu8BngD0ieGrHLIcBZ2RyGrACOKkc3n95xzyHAC8F\n9gB+E3hdWfYLgH8q03cC1gJnd5XwMuAZ5XGHAC+ZotwE3g60Ith8xoNtvATYD3g28LfAaTRNdlfg\n6cCrux8QwWblxbsNsBo4nY7GHMHewC7A5yZY3xHA1mX52wNvBO7L5Djg68Ax5fk8psx/KbAPsB3w\nEeBjnW8ywO/TPIfb0GyvsUC7BXAeTfPdDvgY8Ecdj9sM+E9gd2A34L6xx3Y4DFgObEWzrT4CrKIJ\npO9kfFN/SCYXlzEsBrYFvg18tEx+EPg/ZRm/DbwQOLo87nllnr3L48/pXG7ZxucDXwIeC7wJWBEx\n7lT/oUC7rPca4ISJapRqZQ+e9R485izg0AgWRHPUdTFNr5rKhP12Gmfy8FHTI8rtTpP24kxuB/6U\n5sjwY4F/Ay7PHB+gASLYEjgZOCiTrYDnAJeXaX8MHF/qeEwZx23loT+mOSiyNU3vPCuCnXoYF0zS\nbyNYAnwc+Dua97kflXrUxWA6tfMiuBP4BvA14F2Z/Ao4h/KCL5/2lgKfnWZZJ2dyY3lRnU8TrKD5\ndP+hTL5Tlv13wG9HjLse6cRM7szkWuCrHY+dUCafAW6Bjf4kdlImd2dyJfB94EuZ/CSTu2iOWvxW\nx7w7l+foVqAFHJbJj2ga1FMieHKZ7zDgnEzun2B9D9C8UJ+UyYOZrMrk7inGd1Ymt2WyPpN/AX4N\nxgWxb2Ty+XJt55nA3uX+ZwObA+/O5IFMPk7Hp/SyzE9kcm8mv6BpKL/TtfoPZ3JlJutp3sSeAbw9\nk19lchGMO0IzmZOBXwDHlfWuyuSSMp41wH9MsN7JPJvmzePETO7P5Cs0+2LnG9enMvnvUvMKptl/\npIrYg+emB4+5niYwvYgmsHUHxolM1m+nchbw6hLcDy23HzJdL87kSzQHFv6L5gPLn0+xrg00R9of\nnclN5TmFZtuclMml5WjzNZmsLcv/WNlXNpSDAVcDz+xhXDB5vz0YuDKTT5ZpJwM/63GZ84rBdGqv\nzGSbTHbP5OhM7iv3nw68ppzOOQw4tzS0qXTugPfCQxfx70zzCR2ATO6h+dS2Sw+Pncrf0wSfR003\n4wR+3vH3fRPc7lz/jeU52i6TfTKbIw2Z/JLy5hHNtZivZvImdybwReDsCG6M4KSpjjRE8Nfl9Mtd\npSFvTccpdB75fD0qgoU0z/UNmWTH9Iee+wgWRfAfEayN4G7gImCbGH+t2HUdf+8M3JHJuomWN0nt\nfw4cCLwmkw3lvqeUU1U/K+t9V9d4prIzcN3Ysjpq2NT9R6qBPXhuenCnM2iOJvc6/4T9NoJl8fCl\nSxd0PqAE/Gtoet3VmeP6aq+9+DSao8cfznzoSOc4pTf/Cc1ZuJvKKfbfKJN3pTky+ggRHB7B5dFc\nunFnWU+vPXmq/ey
hcZb3oevRIxhMN0Iml9BcQ/RcmovDO1+8OeGDJncjzekK4KFTD9sDN2xijV+m\neeEf3TVpHbCo4/bjNmU90zid5mjEC4F7M7l4opnK0ct2JnvSnNp4GQ+f5hn3fEZzPenf0pxO2zaT\nbYC7YMprvsbcBOwS468P263j77+iOfL6rEweAw+dRu+cv7Oem4BtyzabaHnjlNrfCbyi64jwqcAP\ngSeX9b6tx/FAs//sGuO/iLUbm7j/SDWzB/espx7c5RPA7wE/KQFyo2SyolxOsTiTgyaY5QyanvuI\nU/BM04tLQD2tPPbo6PjVkgnq+GImv0tzhuuHwPvLpOvgkdfbR7B7mecYYPvyHvN9eu/Jk7kJxn1n\nITpv62EG0413Bs01Lw9kPnzhOs0n25n8ptlHgSMj2CeCX6P5BPntckp3Ux1HE+I6XQ78YflE+iTg\n9X1Yz4RKE9wA/AtTfPKO4PkR7FWazd00p/bHjgB2P59bAetpTpMtjOAdNNcH9eLi8tj/HcHmEfwh\n40/PbEVzNOLOCLajOS021fjWApfRfAN3iwgOgHHXtHWOcVfgXODwzEd8+3QrmnHfUz7NH9U1fap9\n6ts0n8r/tozpwFJD9zVy0qixB0+j1x7c9Zh1wAvY+MsQenUO8GKavthtul78NpoPIH8K/F/gjJjg\nVxCi+RLtK8qHjV8B9/Dwe8sHgL+OYL9ofmXlSSWUblmWfUtZxpEw7pcKNtbngL0ieGU5g/cXzO6H\nkqFlMN14Z9LsrGd13f9Bmm9E3hnBedMtJJMLaS6U/wTNJ6on0lxzs8ky+Sbw3113/xvNkYaf03ya\nXtGPdU3hDGAvHvk8dXoczUXhdwNX0VxLNtZE3wO8KppvZp5Mc8r/CzQ/LbIW+CXjT69Pqlxb9Yc0\np6lupznF88mOWd4NPJrmWq1Lynqm8xrgWWV5LSb+9A/NEYsdgY93nN4au9bpr8tyfkHzSf2crsce\nD5xe9qlDOieUMb0cOKjUfQpN+P1hD7VLw8we3JteevA4mVyWOfFp7n7J5L5MLuy4PKPTpL04gv2A\nt9D0uQeBf6YJksdOsJzNyrw30vTo36F88M/kYzTXrn6EpveeB2yXyQ9ogvzFNNtoL+CbfRjvrcAf\nAyfRXCqyJ82BjekuQZl3InOmZz0EEMGjgZuBfTO5etD11Cqan/BYnskBg65F0uiwB/fGHlyncvnV\n9cCyTL466Hpq4hHTjXcUcKkNcXIRLKK5vuq0QdciaeTYg6dhD65LBC+JYJtyycjYdwkuGXBZ1Vk4\n6AKGUTT5+rYKAAALFklEQVQ/jB7AKwdcSrUieAnNafILaU6VSFJf2IOnZw+u0m/TbIstgB/Q/OrE\nRJcyzGueypckSVIVPJUvSZKkKszoVP6SJUty6dKls1RK/61bt44tt9xy+hmHjOMaLqM6LhiOsa1a\nterWzNxh0HX0Sz/68DBst43huIbHKI4JHNdUeu3FMwqmS5cu5bLLLtv4qubYypUrOfDAAwddRt85\nruEyquOC4RhbREz5f+MaNv3ow8Ow3TaG4xoeozgmcFxT6bUXeypfkiRJVTCYSpIkqQoGU0mSJFXB\nYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmS\npCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhM\nJUmSVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJU\nBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kk\nSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSV
AWDqSRJkqpg\nMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJ\nUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCrMejBdsQKWLoWIh/8tXAhHHz3ba9Yw\nWbECVq+GzTZr9pcVKwZdkWow1j/cL/pj7PlctQoWL4YFC+zJGs/XnDp19oy52h9mNZiuWAHLl8Pa\ntePvf/BBOPVUG6EaY/vJ/fdDZrO/LF9uQ5zvOvuH+8Wm6+7H69bBhg3N3/Zkga85jdfdM+Zqf5jV\nYHrccXDvvZNPP+202Vy7hsVE+8m99zb3a/5yv+iv6fox2JPnO19z6jSo/WFWg+m11049/cEHZ3Pt\nGhaT7SfT7T8abe4X/dXL82ZPnt98zanToPaHWQ2mu+029fQFC2Zz7RoWk+0n0+0/Gm3uF/3Vy/Nm\nT57ffM2p06D2h1kNpiecAIsWTT59+fLZXLuGxUT7yaJFzf2av9wv+mu6fgz25PnO15w6DWp/mNVg\numxZc83S7ruPv3/BAjjqKDjllNlcu4bF2H6yxRbNN4R33725vWzZoCvTIHX2D/eLTdfdj7fcsvnm\nNdiT1fA1p07dPWOu9oeFs7v4ZgDu1JrOsmWwcuXD3xKWwP7Rb2PP58qVcM89g65GNfI1p06dPWPN\nmrlZpz+wL0mSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRV\nwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJ\nkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoY\nTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmS\nVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOp\nJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKq\nYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUk\nSVIVDKaSJEmqQmRm7zNH3AKsnb1y+m4JcOugi5gFjmu4jOq4YDjGtntm7jDoIvqlT314GLbbxnBc\nw2MUxwSOayo99eIZBdNhExGXZeb+g66j3xzXcBnVccFoj22Ujep2c1zDYxTHBI6rHzyVL0mSpCoY\nTCVJklSFUQ+mpw26gFniuIbLqI4LRntso2xUt5vjGh6jOCZwXJtspK8xlSRJ0vAY9SOmkiRJGhIG\nU0mSJFVhpIJpRGwXEV+OiKvLf7edZL41EbE6Ii6PiMvmus5eRcRLI+JHEXFNRBw7wfSIiJPL9Csi\nYt9B1DlTPYzrwIi4q2yfyyPiHYOocyYi4kMRcXNEfH+S6UO5raCnsQ3d9pov5nEPWVbGszoivhUR\new+izpmYbkwd8z0jItZHxKvmsr6N1cu4Sg+5PCKujIivzXWNG6OHfXDriDg/Ir5XxnXkIOqciWre\nxzJzZP4BJwHHlr+PBf55kvnWAEsGXe80Y1kA/Bh4ArAF8D1gz655DgYuAAJ4NvDtQdfdp3EdCHx2\n0LXOcFzPA/YFvj/J9KHbVjMY29Btr/nwb573kOcA25a/D6p9XL2MqWO+rwCfB1416Lr7tK22AX4A\n7FZuP3bQdfdpXG8byyDADsDtwBaDrn2acVXxPjZSR0yBVwCnl79PB145wFo21TOBazLzJ5l5P3A2\nzfg6vQI4IxuXANtExE5zXegM9TKuoZOZF9E0n
skM47YCehqb6jRve0hmfisz7yg3LwEeP8c1zlSv\nffFNwCeAm+eyuE3Qy7heA3wyM68FyMxhGFsv40pgq4gIYDFND10/t2XOTC3vY6MWTHfMzJvK3z8D\ndpxkvgQujIhVEbF8bkqbsV2A6zpuX1/um+k8tem15ueUUwUXRMTT5qa0WTWM22omRm17jYL53kPG\nvJ7mKE/Nph1TROwC/AFw6hzWtal62VZPAbaNiJXlPfnwOatu4/UyrvcCTwVuBFYDb87MDXNT3qyZ\nk36xsN8LnG0RcSHwuAkmHdd5IzMzIib7LawDMvOGiHgs8OWI+GH5pKA6fIfmtM49EXEwcB7w5AHX\npMm5vVSliHg+TTA9YNC19MG7gbdm5obmINzIWAjsB7wQeDRwcURckpn/M9iyNtlLgMuBFwBPpMka\nX8/MuwdbVv2GLphm5osmmxYRP4+InTLzpnJ4ecJTApl5Q/nvzRHxKZrD8rUF0xuAXTtuP77cN9N5\najNtzZ0v3Mz8fEScEhFLMvPWOapxNgzjturJiG6vUTBvewhARPwm8AHgoMy8bY5q21i9jGl/4OwS\nSpcAB0fE+sw8b25K3Ci9jOt64LbMXAesi4iLgL2BmoNpL+M6Ejgxm4szr4mInwK/Afz33JQ4K+ak\nX4zaqfzPAEeUv48APt09Q0RsGRFbjf0NvBiY8BtoA3Yp8OSI2CMitgAOpRlfp88Ah5dvyj0buKvj\nUoZaTTuuiHhcuS6HiHgmzX5a+xvLdIZxW/VkRLfXKJjPPWQ34JPAYUNy5G3aMWXmHpm5NDOXAh8H\njq48lEJv++CngQMiYmFELAKeBVw1x3XOVC/jupbmKDARsSPw68BP5rTK/puTfjF0R0yncSJwbkS8\nHlgLHAIQETsDH8jMg2muO/1UeR9dCHwkM78woHonlZnrI+IY4Is03wD8UGZeGRFvLNPfR/PNzIOB\na4B7aT6hVa3Hcb0KOCoi1gP3AYeWT53VioiP0nw7fUlEXA+0gM1heLfVmB7GNnTbaz6Y5z3kHcD2\nwCml16/PzP0HVfN0ehzT0OllXJl5VUR8AbgC2EDzXl3jwaKH9Li93gl8OCJW03yL/a21n0Wq5X3M\n/yWpJEmSqjBqp/IlSZI0pAymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVIX/\nDykR6mgcGYLTAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from matplotlib import pyplot as plt\n", "\n", "fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10,5))\n", "\n", "y_pos = [0 for i in range(len(x))]\n", "\n", "ax1.scatter(z_scores, y_pos, color='g')\n", "ax1.set_title('Python standardization', color='g')\n", "\n", "ax2.scatter(minmax, y_pos, color='g')\n", "ax2.set_title('Python Min-Max scaling', color='g')\n", "\n", "ax3.scatter(z_scores_np, y_pos, color='b')\n", "ax3.set_title('Python NumPy standardization', color='b')\n", "\n", "ax4.scatter(np_minmax, y_pos, color='b')\n", "ax4.set_title('Python NumPy Min-Max scaling', color='b')\n", "\n", "plt.tight_layout()\n", "\n", "for ax in 
(ax1, ax2, ax3, ax4):\n", " ax.get_yaxis().set_visible(False)\n", " ax.grid()\n", "\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 4. Isolation Forest" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
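<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>count</th>\n", " <th>prediction</th>\n", " </tr>\n", " </thead>\n",
" <tbody>\n",
" <tr>\n", " <th>0</th>\n", " <td>3</td>\n", " <td>1</td>\n", " </tr>\n",
" <tr>\n", " <th>7</th>\n", " <td>1</td>\n", " <td>1</td>\n", " </tr>\n",
" <tr>\n", " <th>8</th>\n", " <td>3</td>\n", " <td>1</td>\n", " </tr>\n",
" <tr>\n", " <th>9</th>\n", " <td>104</td>\n", " <td>1</td>\n", " </tr>\n",
" <tr>\n", " <th>10</th>\n", " <td>204</td>\n", " <td>-1</td>\n", " </tr>\n",
" </tbody>\n", "</table>\n", "</div>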
" ], "text/plain": [ " count prediction\n", "0 3 1\n", "7 1 1\n", "8 3 1\n", "9 104 1\n", "10 204 -1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.pipeline import Pipeline\n", "from sklearn.ensemble import IsolationForest\n", "\n", "dataNorm = dataNorm[['count','count_n']]\n", "\n", "# La funcion iloc nos permite seleccionar desde una posición a otra en un array.\n", "dataTrain = dataNorm.iloc[0:5]\n", "\n", "iforest = IsolationForest(n_estimators=100, contamination=0.00001, max_samples=5)\n", "iforest.fit(dataTrain)\n", "clf = iforest.fit(dataTrain)\n", "prediction = iforest.predict(dataNorm)\n", "\n", "dataGroup2['prediction'] = prediction\n", "dataGroup2[['count','prediction']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example *iloc*:\n", "```python\n", "#return second position (python counts from 0, so 1)\n", "print (df.columns.get_loc('Taste'))\n", "1\n", "\n", "df.iloc[0:2, df.columns.get_loc('Taste')] = 'good'\n", "df.iloc[2:6, df.columns.get_loc('Taste')] = 'bad'\n", "print (df)\n", " Food Taste\n", "0 Apple good\n", "1 Banana good\n", "2 Candy bad\n", "3 Milk bad\n", "4 Bread bad\n", "5 Strawberry bad\n", "```\n", "#### Examples *Isolation Forest*:\n", "- n_estimators : int, optional (default=100)\n", " \n", " The number of base estimators in the ensemble.\n", " \n", " \n", "- contamination : float in (0., 0.5), optional (default=0.1)\n", " \n", " The amount of contamination of the data set, i.e. the proportion of outliers in the data set. 
Used when fitting to define the threshold on the decision function.\n", " \n", " \n", "- max_features : int or float, optional (default=1.0)\n", "\n", " The number of features to draw from X to train each base estimator.\n", " If int, then draw max_features features.\n", " If float, then draw max_features * X.shape[1] features.\n", "\n", "\n", "\n", "```python\n", "\n", "iforest = IsolationForest(n_estimators=100, contamination=0.1)\n", "\n", "```\n", "\n", "```\n", " \tcount \tprediction\n", "76 \t34 \t -1\n", "77 \t31 \t -1\n", "78 \t2 \t 1\n", "79 \t68 \t -1\n", "80 \t98 \t -1\n", "83 \t4 \t 1\n", "92 \t1 \t 1\n", "95 \t4 \t 1\n", "... 1 1\n", "... 1 1\n", "... 1 1\n", "\n", "```\n", "```python\n", "\n", "iforest = IsolationForest(n_estimators=100, contamination=0.01)\n", "\n", "```\n", "\n", "```\n", " \tcount \tprediction\n", "76 \t34 \t 1\n", "77 \t31 \t 1\n", "78 \t2 \t 1\n", "79 \t68 \t 1\n", "80 \t98 \t -1\n", "83 \t4 \t 1\n", "92 \t1 \t 1\n", "95 \t4 \t 1\n", "... 1 1\n", "... 1 1\n", "... 
1 1\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.1 Plot Isolation Forest" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjgAAAGoCAYAAABL+58oAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHg9JREFUeJzt3X+QnXVh7/HPN0sgNRtCUG9kTNqElqshhB+y5Uap7Ubu\nAMIV9HpLcWxLxCua0k7t1XFQp1Ln6lirlnGqXm6qKKPUlKt1ZBREcLrYcUjlxwAS0AtIgDAQfsiP\nLBgxy/f+cU5yl5AlP3aXs/vd12tm55zz7LPP+X7znMO+eZ5zzpZaawAAWjKr1wMAAJhoAgcAaI7A\nAQCaI3AAgOYIHACgOQIHAGiOwAEAmiNwAIDmCBwAoDn79XoASfKyl72sLlmyZMK3+9RTT2Xu3LkT\nvt2pxjzbM1Pmap7tmSlzNc/eueGGGx6ptb58d+tNicBZsmRJrr/++gnf7tDQUAYHByd8u1ONebZn\npszVPNszU+Zqnr1TSrlnT9ZzigoAaI7AAQCaI3AAgOZMidfg7Mqvf/3rbNq0KVu3bt3nbcyfPz+3\n3377BI5qatrbec6ZMyeLFi3K7NmzJ3FUANA7UzZwNm3alHnz5mXJkiUppezTNrZs2ZJ58+ZN8Mim\nnr2ZZ601jz76aDZt2pSlS5dO8sgAoDem7CmqrVu35qUvfek+xw27VkrJS1/60nEdGQOAqW7KBk4S\ncTNJ/LsC0LopHTgAAPtC4Exhg4ODk/IBiADQOoEzSbZt29brIQDAjNVU4IyMJE880bmcCBs3bsyy\nZcvyrne9K8uXL8+JJ56YX/7yl7npppuycuXKHHnkkXnLW96Sxx57LEnniMt73/veDAwM5LOf/WxW\nr16dNWvWZOXKlTn00EMzNDSUs88+O8uWLcvq1at33M+aNWsyMDCQ5cuX5/zzz5+YwQPADNZM4IyM\nJN/9bnLJJZ3LiYqcO+64I+eee242bNiQgw46KN/85jfzp3/6p/nkJz+ZW265JStWrMhHP/rRHes/\n88wzuf766/O+970vSfLYY4/l2muvzQUXXJDTTjstf/VXf5UNGzbkJz/5SW666aYkycc//vFcf/31\nueWWW3LNNdfklltumZjBA8AM1UzgDA8nmzYlixd3LoeHJ2a7S5cuzdFHH50kOfbYY3PXXXfl8ccf\nzx/8wR8kSc4666z88Ic/3LH+H/3RHz3n59/0pjellJIVK1Zk4cKFWbFiRWbNmpXly5dn48aNSZJL\nL700r3nNa3LMMcdkw4YNue222yZm8AAwQ03ZD/rbW/39yaJFyX33dS77+5Onnx7/dg844IAd1/v6\n+vL444+/4Po7/1n57T8/a9as52xr1qxZ2bZtW+6+++58+tOfznXXXZcFCxZk9erVPqMGAMapmSM4\nfX3Jqacmb39757Kvb3LuZ/78+VmwYEH+7d/+LUny1a9+dcfRnH3x5JNPZu7cuZk/f342b96cK664\nYqKGCgAzVjNHcJJO1MyfP/n3c/HFF+c973lPnn766Rx66KH58pe/vM/bOuqoo3LMMcfk1a9+dRYv\nXpzjjz9+AkcKAJOo1mT0h8fufLuHmgqcibZkyZLceuutO26///3v33F9/fr1z1t/aGjoObe/8pWv\njLmt0d8bff2FtgcAU8bQULJ1a3LSSZ2oqTW58spkzpxkcLDXo2vnFBUA8CKptRM369dn5PIr88Tj\nNSOXX5msX99ZXmuvR+gIDgCwl0pJTjopIyPJnV9bnye/sD4HHpj8zh+vTN/2Izo95g
gOALD3Ssnw\n8SflySeTAw9MnnwyGT5+asRNInAAgH1Ra/p/dOWOuDnwwKT/R1dOidNTiVNUAMDe6r6guO+69fmd\nP16Z4eNPSv+POrfTl///wuMeEjgAwN4ppfNuqZWd19zMLyU55aRO3MyZ0/O4SfbgFFUpZXEp5V9L\nKbeVUjaUUv6yu/zgUspVpZQ7upcLRv3MB0spd5ZSflZKOWkyJ7DDzofEpsghsokwODiY66+/Pkly\nyimn7PbTlAFg0g0OPvdITfeFx1PhLeLJnr0GZ1uS99VaD0+yMsm5pZTDk5yX5Ae11sOS/KB7O93v\nnZlkeZKTk3yhlDJJnyvcNTTUee/99qjZfuis+2nDLbn88stz0EEH9XoYAPD8IzVT4MjNdrsNnFrr\nA7XWG7vXtyS5Pckrk5ye5OLuahcneXP3+ulJ1tVaf1VrvTvJnUmOm+iBjxrgjvfi74icKyfuvfhv\nfvObc+yxx2b58uVZu3ZtkqS/vz8f/vCHc9RRR2XlypXZvHlzkmTjxo15wxvekCOPPDInnHBC7r33\n3iTJ6tWrs2bNmqxcuTKHHnpohoaGcvbZZ2fZsmVZvXr1jvtas2ZNBgYGsnz58px//vm7HM+SJUvy\nyCOPJEm+9rWv5bjjjsvxxx+fd7/73RkZGcnIyEhWr16dI444IitWrMgFF1wwrvkDwHS0V6/BKaUs\nSXJMkn9PsrDW+kD3Ww8mWdi9/sokoz/md1N32c7bOifJOUmycOHC531q7/z587Nly5Y9G9jrXpe+\np59O3zXXJNdckyQZ+d3fzTOrVmVknH9W/LOf/WwOPvjg/PKXv8zg4GBOPPHEPPXUUznqqKNy3nnn\n5a//+q/zuc99Lh/4wAeyZs2anHHGGXn729+er371q/mzP/uzfP3rX8+vf/3rbNmyJd///vdz+eWX\n57TTTsv3v//9XHDBBRkcHMyPfvSjHHnkkTnvvPNy8MEHZ2RkJG9605ty8skn54gjjsjIyEieeuqp\nbNmyJbXWDA8PZ+PGjbnkkkvyve99L7Nmzcr73//+fPGLX8yyZcty77335tprr02SPP7447v8d9y6\ndeu0+6Tk4eHhaTfmfTVT5mqe7ZkpczXPqW+PA6eU0p/km0neW2t9sow6DFVrraWUvTpUUmtdm2Rt\nkgwMDNTBnc7Z3X777Zk3b96eb/Atb0luueU5t0eGh/duG7vwmc98Jt/61reSJPfff38efPDB7L//\n/vnDP/zDlFLy2te+NldddVXmzZuX6667Lpdddllmz56dd73rXfnIRz6SefPmZfbs2TnllFNy4IEH\n5rjjjsvChQuzcuXKJMmKFSvy8MMPZ968ebnkkkuydu3abNu2LQ888EDuueeevPa1r01fX1/mzp2b\nefPmpZSS/v7+fOc738nNN9+cN7zhDXn22Wfzq1/9KosWLcoZZ5yRe+65Jx/60Idy6qmn5sQTT8ys\nWc8/UDdnzpwcc8wx4/q3ebENDQ1l58dJq2bKXM2zPTNlruY59e1R4JRSZqcTN5fUWv+lu3hzKeWQ\nWusDpZRDkjzUXX5/ksWjfnxRd9nk2X5aarQrr0xe97pxbXZoaChXX311rr322rzkJS/J4OBgtm7d\nmtmzZ2d74PX19WXbtm273dYBBxyQJJk1a9aO69tvb9u2LXfffXc+/elP57rrrsuCBQuyevXqbN26\ndczt1Vpz1lln5ROf+ES2bNnynJC7+eabc+WVV+bCCy/MpZdemosuumhf/wkAYFrak3dRlSRfSnJ7\nrfXvR33rsiRnda+fleTbo5afWUo5oJSyNMlhSX48cUPeyejX3KxcmZx/fudy/fr0XX31uF6D88QT\nT2TBggV5yUtekp/+9Ke7/AObo73uda/LunXrkiSXXHJJXv/61+/xfT355JOZO3du5s+fn82bN+eK\nK654wfVPOOGEfOMb38hDD3W68he/+EXuueeePP
LII3n22Wfz1re+NR/72Mdy44037vEYAKAVe3IE\n5/gkf5LkJ6WUm7rLPpTkb5NcWkp5Z5J7kpyRJLXWDaWUS5Pcls47sM6ttY5M+Mi3G/Ve/B1vVzup\n+870Z58d1yu6Tz755Fx44YVZtmxZXvWqV+04rTSWf/iHf8g73vGOfOpTn8rLX/7yfPnLX97j+zrq\nqKNyzDHH5NWvfnUWL16c448//gXXP/zww/Oxj30sJ554YrZt25YDDjggn//85/Mbv/Ebecc73pFn\nn302SfKJT3xij8cAAK0odQp8XszAwEDd/jkv291+++1ZtmzZnm+k1ufGTK3ZMgGvwZkOdj5FtSf2\n+t93CpjO54L31kyZq3m2Z6bM1Tx7p5RyQ611YHfrtfO3qKbwe/EBgBdXO4EDANA1pQNnKpw+a5F/\nVwBaN2UDZ86cOXn00Uf9Mp5gtdY8+uijmTNnTq+HAgCTZsr+NfFFixZl06ZNefjhh/d5G1u3bp0R\nv8j3dp5z5szJokWLJnFEANBbUzZwZs+enaVLl45rG0NDQ9Pu03r3xUyZJwDsqSl7igoAYF8JHACg\nOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACg\nOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACg\nOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACg\nOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACg\nOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACg\nOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaM5uA6eUclEp5aFSyq2j\nlv1NKeX+UspN3a9TRn3vg6WUO0spPyulnDRZAwcAGMueHMH5SpKTd7H8glrr0d2vy5OklHJ4kjOT\nLO/+zBdKKX0TNVgAgD2x28Cptf4wyS/2cHunJ1lXa/1VrfXuJHcmOW4c4wMA2GvjeQ3OX5RSbume\nwlrQXfbKJPeNWmdTdxkAwIum1Fp3v1IpS5J8p9Z6RPf2wiSPJKlJ/meSQ2qtZ5dSPpdkfa31a931\nvpTkilrrN3axzXOSnJMkCxcuPHbdunUTMqHRhoeH09/fP+HbnWrMsz0zZa7m2Z6ZMlfz7J1Vq1bd\nUGsd2N16++3Lxmutm7dfL6X8Y5LvdG/en2TxqFUXdZftahtrk6xNkoGBgTo4OLgvQ3lBQ0NDmYzt\nTjXm2Z6ZMlfzbM9Mmat5Tn37dIqqlHLIqJtvSbL9HVaXJTmzlHJAKWVpksOS/Hh8QwQA2Du7PYJT\nSvl6ksEkLyulbEpyfpLBUsrR6Zyi2pjk3UlSa91QSrk0yW1JtiU5t9Y6MjlDBwDYtd0GTq31bbtY\n/KUXWP/jST4+nkEBAIyHTzIGAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACgOQIHAGiOwAEA\nmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACgOQIHAGiOwAEA\nmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACgOQIHAGiOwAEA\nmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACgOQIHAGiOwAEA\nmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACgOQIHAGiOwAEA\nmiNwAIDmCB
wAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7AAQCaI3AAgOYIHACgOQIHAGiOwAEA\nmiNwAIDmCBwAoDkCBwBojsABpqyRkeSJJzqXAHtjv14PAGBXRkaS73432bQpWbQoOfXUpK+v16MC\npgtHcIApaXi4EzeLF3cuh4d7PSJgOhE4wJTU3985cnPffZ3L/v5ejwiYTpyiAqakvr7Oaanh4U7c\nOD0F7A2BA0xZfX3J/Pm9HgUwHe32FFUp5aJSykOllFtHLTu4lHJVKeWO7uWCUd/7YCnlzlLKz0op\nJ03WwAEAxrInr8H5SpKTd1p2XpIf1FoPS/KD7u2UUg5PcmaS5d2f+UIpxYFlAOBFtdvAqbX+MMkv\ndlp8epKLu9cvTvLmUcvX1Vp/VWu9O8mdSY6boLECAOyRUmvd/UqlLEnynVrrEd3bj9daD+peL0ke\nq7UeVEr5XJL1tdavdb/3pSRX1Fq/sYttnpPknCRZuHDhsevWrZuYGY0yPDyc/hnw1gvzbM9Mmat5\ntmemzNU8e2fVqlU31FoHdrfeuF9kXGutpZTdV9Lzf25tkrVJMjAwUAcHB8c7lOcZGhrKZGx3qjHP\n9syUuZpne2bKXM1z6tvXz8HZXEo5JEm6lw91l9+fZPGo9RZ1lwEAvGj2NXAuS3JW9/pZSb49avmZ\npZQDSilLkxyW5MfjGyIAwN7Z7SmqUsrXkwwmeVkpZVOS85P8bZJLSynvTHJPkjOSpNa6oZRyaZLb\nkmxLcm6t1Z/JAwBeVLsNnFrr28b41gljrP/xJB8fz6AAAMbD36ICAJojcACA5ggcAKA5AgcAaI7A\nAQCaI3AAgOYIHACgOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7A\nAQCaI3AAgOYIHACgOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7A\nAQCaI3AAgOYIHACgOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7A\nAQCaI3AAgOYIHACgOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7A\nAQCaI3AAgOYIHACgOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7A\nAQCaI3AAgOYIHACgOQIHAGiOwAEAmiNwAIDmCBwAoDkCBwBojsABAJojcACA5ggcAKA5AgcAaI7A\nAQCas994friUsjHJliQjSbbVWgdKKQcn+eckS5JsTHJGrfWx8Q0TAGDPTcQRnFW11qNrrQPd2+cl\n+UGt9bAkP+jeBgB40UzGKarTk1zcvX5xkjdPwn0AAIyp1Fr3/YdLuTvJE+mcovrftda1pZTHa60H\ndb9fkjy2/fZOP3tOknOSZOHChceuW7dun8cxluHh4fT390/4dqca82zPTJmrebZnpszVPHtn1apV\nN4w6azSmcb0GJ8nv1VrvL6X8hyRXlVJ+OvqbtdZaStllQdVa1yZZmyQDAwN1cHBwnEN5vqGhoUzG\ndqca82zPTJmrebZnpszVPKe+cZ2iqrXe3718KMm3khyXZHMp5ZAk6V4+NN5BAgDsjX0OnFLK3FLK\nvO3Xk5yY5NYklyU5q7vaWUm+Pd5BAgDsjfGcolqY5Fudl9lkvyT/VGv9XinluiSXllLemeSeJGeM\nf5gAAHtunwOn1vrzJEftYvmjSU4Yz6AAAMbDJxkDAM0ROABAcwQOANAcgQMANEfgAADNETgAQHME\nDgDQHIEDADRH4AAAzRE4AEBzBA4A0ByBAwA0R+AAAM0ROABAcwQOANAcgQMANEfgAADNETgAQHME\nDgDQHIEDADRH4AAAzRE4AEBzBA4A0ByBAwA0R+AAAM0ROABAcwQOANAcgQMA
NEfgAADNETgAQHME\nDgDQHIEDADRH4AAAzRE4AEBzBA4A0ByBAwA0R+AAAM0ROABAcwQOANAcgQMANEfgAADNETgAQHME\nDgDQHIEDADRH4AAAzRE4AEBzBA4A0ByBAwA0R+AAAM0ROABAcwQOANAcgQMANEfgAADNETgAQHME\nDgDQHIEDADRH4AAAzRE4AEBzBA4A0ByBAwA0R+AAAM0ROABAcwQOANAcgQMANEfgAADNETgAQHME\nDgDQHIEDADRH4AAAzRE4AEBzBA4A0ByBAwA0R+AAAM0ROABAc2Zk4DzzTHLvvZ1LAKA9+/V6AC+2\nZ55JPvrR5K67kt/+7eT885P99+/1qACAiTTjjuA8+GAnbg47rHP54IO9HhEAMNGaD5yRkeSJJzqX\nSfKKV3SO3NxxR+fyFa/o7fgAgIk3aaeoSiknJ/lskr4kX6y1/u1k3dcL+e53k02bkkWLklNP7ZyO\nOv/8zpGbV7zC6SkAaNGkHMEppfQl+XySNyY5PMnbSimHT8Z9vZCRkU7cLF7cuRwe7izff//kN39T\n3ABAqybrFNVxSe6stf681vpMknVJTp+k+xpTX1/nyM1993Uu+/tf7BEAAL1Qaq0Tv9FS/luSk2ut\n/717+0+S/Kda65+PWuecJOckycKFC49dt27dhI9jeHg4/f39GRnpxE6rts+zdTNlnsnMmat5tmem\nzNU8e2fVqlU31FoHdrdez94mXmtdm2RtkgwMDNTBwcEJv4+hoaFMxnanGvNsz0yZq3m2Z6bM1Tyn\nvsk6RXV/ksWjbi/qLgMAmHSTFTjXJTmslLK0lLJ/kjOTXDZJ9wUA8ByTcoqq1rqtlPLnSa5M523i\nF9VaN0zGfQEA7GzSXoNTa708yeWTtX0AgLE0/0nGAMDMI3AAgOYIHACgOQIHAGiOwAEAmiNwAIDm\nCBwAoDkCBwBojsABAJojcACA5pRaa6/HkFLKw0numYRNvyzJI5Ow3anGPNszU+Zqnu2ZKXM1z975\nrVrry3e30pQInMlSSrm+1jrQ63FMNvNsz0yZq3m2Z6bM1TynPqeoAIDmCBwAoDmtB87aXg/gRWKe\n7ZkpczXP9syUuZrnFNf0a3AAgJmp9SM4AMAMJHAAgOY0GTillJNLKT8rpdxZSjmv1+OZSKWUxaWU\nfy2l3FZK2VBK+cvu8r8ppdxfSrmp+3VKr8c6XqWUjaWUn3Tnc3132cGllKtKKXd0Lxf0epzjUUp5\n1ah9dlMp5clSyntb2Z+llItKKQ+VUm4dtWzMfVhK+WD3efuzUspJvRn13htjnp8qpfy0lHJLKeVb\npZSDusuXlFJ+OWrfXti7ke+dMeY55mN1uu7PZMy5/vOoeW4spdzUXT6d9+lYv1Om//O01trUV5K+\nJHclOTTJ/kluTnJ4r8c1gfM7JMlrutfnJfm/SQ5P8jdJ3t/r8U3wXDcmedlOy/4uyXnd6+cl+WSv\nxzmB8+1L8mCS32plfyb5/SSvSXLr7vZh93F8c5IDkiztPo/7ej2HcczzxCT7da9/ctQ8l4xebzp9\njTHPXT5Wp/P+HGuuO33/M0k+0sA+Het3yrR/nrZ4BOe4JHfWWn9ea30mybokp/d4TBOm1vpArfXG\n7vUtSW5P8srejupFdXqSi7vXL07y5h6OZaKdkOSuWutkfKp3T9Raf5jkFzstHmsfnp5kXa31V7XW\nu5Pcmc7zecrb1Txrrd+vtW7r3lyfZNGLPrAJNsb+HMu03Z/JC8+1lFKSnJHk6y/qoCbBC/xOmfbP\n0xYD55VJ7ht1e1MaDYBSypIkxyT59+6iv+geDr9oup+66apJri6l3FBKOae7bGGt9YHu9QeTLOzN\n0CbFmXnufzBb25/bjbUPW37unp3kilG3l3ZPZVxTSnl9rwY1gXb1WG15f74+yeZa6x2jlk37fbrT\n75Rp/zxtMXBmhFJKf5JvJnlvrfXJJP8r
ndNyRyd5IJ3Dp9Pd79Vaj07yxiTnllJ+f/Q3a+d4aROf\nc1BK2T/JaUn+T3dRi/vzeVrah2MppXw4ybYkl3QXPZDkN7uP7f+R5J9KKQf2anwTYEY8Vnfytjz3\nf0am/T7dxe+UHabr87TFwLk/yeJRtxd1lzWjlDI7nQfiJbXWf0mSWuvmWutIrfXZJP+YKXrIcG/U\nWu/vXj6U5FvpzGlzKeWQJOlePtS7EU6oNya5sda6OWlzf44y1j5s7rlbSlmd5L8keXv3l0S6h/Yf\n7V6/IZ3XMPzHng1ynF7gsdrc/kySUsp+Sf5rkn/evmy679Nd/U5JA8/TFgPnuiSHlVKWdv+v+Mwk\nl/V4TBOme+73S0lur7X+/ajlh4xa7S1Jbt35Z6eTUsrcUsq87dfTecHmrensy7O6q52V5Nu9GeGE\ne87/Eba2P3cy1j68LMmZpZQDSilLkxyW5Mc9GN+EKKWcnOQDSU6rtT49avnLSyl93euHpjPPn/dm\nlOP3Ao/VpvbnKP85yU9rrZu2L5jO+3Ss3ylp4Xna61c5T8ZXklPSeSX4XUk+3OvxTPDcfi+dQ4W3\nJLmp+3VKkq8m+Ul3+WVJDun1WMc5z0PTeaX+zUk2bN+PSV6a5AdJ7khydZKDez3WCZjr3CSPJpk/\nalkT+zOdaHsgya/TOVf/zhfah0k+3H3e/izJG3s9/nHO8850Xquw/Xl6YXfdt3Yf0zcluTHJm3o9\n/nHOc8zH6nTdn2PNtbv8K0nes9O603mfjvU7Zdo/T/2pBgCgOS2eogIAZjiBAwA0R+AAAM0ROABA\ncwQOANAcgQMANEfgAADN+X/bOQmsNS8kIwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x = dataGroup2[(dataGroup2['prediction'] == -1)]['count'].values\n", "\n", "%matplotlib inline\n", "\n", "from matplotlib import pyplot as plt\n", "\n", "def plot():\n", " plt.figure(figsize=(8,6))\n", "\n", " plt.scatter(dataGroup2['count'], dataGroup2['count'], s=6, label=\"normal\", alpha=0.3, color=\"blue\")\n", " \n", " plt.scatter(x, x, marker=\"x\", color=\"red\", label=\"anomalies\", alpha=0.5)\n", " \n", " plt.legend(loc='upper left')\n", " plt.grid()\n", "\n", " plt.tight_layout()\n", "\n", "plot()\n", "plt.show()\n", "\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
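<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>ipdst</th>\n", " <th>proto</th>\n", " <th>time</th>\n", " <th>count</th>\n", " <th>prediction</th>\n", " <th>idpst_label</th>\n", " </tr>\n", " </thead>\n",
" <tbody>\n",
" <tr>\n", " <th>0</th>\n", " <td>10.3.20.102</td>\n", " <td>HTTP</td>\n", " <td>2017-03-20 17:08:55</td>\n", " <td>3</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n",
" <tr>\n", " <th>7</th>\n", " <td>10.3.20.102</td>\n", " <td>HTTP</td>\n", " <td>2017-03-20 17:09:30</td>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n",
" <tr>\n", " <th>8</th>\n", " <td>10.3.20.102</td>\n", " <td>TCP</td>\n", " <td>2017-03-20 17:08:50</td>\n", " <td>3</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n",
" <tr>\n", " <th>9</th>\n", " <td>10.3.20.102</td>\n", " <td>TCP</td>\n", " <td>2017-03-20 17:08:55</td>\n", " <td>104</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n",
" <tr>\n", " <th>10</th>\n", " <td>10.3.20.102</td>\n", " <td>TCP</td>\n", " <td>2017-03-20 17:09:00</td>\n", " <td>204</td>\n", " <td>-1</td>\n", " <td>0</td>\n", " </tr>\n",
" </tbody>\n", "</table>\n", "</div>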
" ], "text/plain": [ " ipdst proto time count prediction idpst_label\n", "0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 1 0\n", "7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 1 0\n", "8 10.3.20.102 TCP 2017-03-20 17:08:50 3 1 0\n", "9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1 0\n", "10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np, matplotlib.pyplot as plt\n", "from matplotlib.colors import ListedColormap\n", "\n", "def plot_decision(X, y, classifier, test_idx=None, resolution=0.02, figsize=(6,6)):\n", "\n", " # setup marker generator and color map\n", " markers = ('s', 'x', 'o', '^', 'v')\n", " colors = ('#cc0000', '#003399', '#00cc00', '#999999', '#66ffff')\n", " cmap = ListedColormap(colors[:len(np.unique(y))])\n", " \n", " # get dimensions\n", " x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n", " x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n", " xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))\n", " xmin = xx1.min()\n", " xmax = xx1.max()\n", " ymin = xx2.min()\n", " ymax = xx2.max()\n", " \n", " # create the figure\n", " fig, ax = plt.subplots(figsize=figsize)\n", " ax.set_xlim(xmin, xmax)\n", " ax.set_ylim(ymin, ymax)\n", " \n", " # plot the decision surface\n", " Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)\n", " Z = Z.reshape(xx1.shape)\n", " ax.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap, zorder=1)\n", " \n", " # plot all samples\n", " for idx, cl in enumerate(np.unique(y)):\n", " ax.scatter(x=X[y == cl, 0], \n", " y=X[y == cl, 1],\n", " alpha=0.6, \n", " c=cmap(idx),\n", " edgecolor='black',\n", " marker='o',#markers[idx],\n", " s=50,\n", " label=cl,\n", " zorder=3)\n", "\n", " # highlight test samples\n", " if test_idx:\n", " X_test, y_test = X[test_idx, :], y[test_idx]\n", " ax.scatter(X_test[:, 0],\n", " X_test[:, 1],\n", " c='w',\n", " alpha=1.0,\n", " 
edgecolor='black',\n", " linewidths=1,\n", " marker='o',\n", " s=150, \n", " label='test set',\n", " zorder=2)\n", " \n", "dataGroup2['idpst_label'], _ = pd.factorize(dataGroup2['ipdst'])\n", "dataGroup2" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Misclassified samples: 0\n", "Accuracy: 1.00\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAF3CAYAAAC/h9zqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAG+pJREFUeJzt3X2wXXV97/H3lySGxCSQk5BnTAggCtYeIBOqomLF+lCv\nEVsR6rUgTIOj5epc7xSUudbpXKdopVZp0cZKjb1cBAUMWkTBKoojIsGI4SGShEQTE2ISyYMhhCTf\n+8degU38nWQnOXuvfXLer5k9Z63fetjf8yNzPqy1fmutyEwkSdrbEXUXIEnqTgaEJKnIgJAkFRkQ\nkqQiA0KSVGRASJKKDAhJUpEBIUkqMiAkSUUGhCSpaGjdBRyKcaNG5bHjxtVdxoC1dft2XrR5M68Z\n+vv/DL67cydLjjqK5w8fXkNlktrpZ7/85frMPGZ/6w3ogDh23DjuuOKKussYsP7z/vvZcffdvLqn\n5/eWjd64keedeSZ/etppNVQmqZ0mXHLJylbW8xTTIDZuzBjW9rFsLTB+zJhOliOpyxgQg1jv9Oms\nHjGCZdu2Pad92bZtrB4xgt7p02uqTFI3aFtARMSxEfHdiHgoIh6MiPdX7T0RcUdEPFr9HNu0zYci\nYmlELImI17erNjUcOWwYF82Zw4IIbti4kbs2buSGjRtZEMFFc+YwfNiwukuUVKN2XoPYCXwwM++P\niNHAwoi4A7gQ+E5mXhkRlwOXA5dFxMnAecApwBTgzoh4YWbuOpAv3TV0KJtmzuTpkSP79Zfpb8O2\nbeOo5csZsnNnrXXMnDCB/33hhSxauZL1mzdz2pgxXDR9uuEgqX0BkZlrgDXV9JaIeBiYCswBzqpW\nmw98D7isav9yZj4FPBYRS4HZwI8O5Hs3zZzJ0ccey9jRo4mI/vhV+l1m8tstW3gC6PnFL+ouh+HD\nhnHGCSfUXYakLtORaxARMQM4FfgxMLEKD2hcC51YTU8FftW02aqq7YA8PXJkV4cDQEQwdvTorj/K\nkTS4tT0gImIUcBPwgczc3LwsG+87PaB3nkbE3Ii4LyLu27B1a1/rHGy5HTMQapQ0uLU1ICJiGI1w\nuC4zb66aH4+IydXyycC6qn01cGzT5tOqtufIzHmZOSszZ40bNap9xR+ib/3Xf3HKy17Gi2fP5hOf\n+Uzd5UjSAWvnKKYAvgA8nJn/2LToVuCCavoCYEFT+3kRMTwijgNOBO5tV317bN++nR/ecw9f+8Y3\n+OE997B9+/ZD3ueuXbt4/2WX8fXrr+dnd9/NDTffzENLlvRDtZLUOe0cxfQK4F3AzyNiUdX2YeBK\n4MaIuBhYCZwLkJkPRsSNwEM0RkC970BHMB2opcuXc82VVzJ5yxYmZfJoBDeNHs17L7+cE2bOPOj9\n/uT++zn+uOOYOWMGAOeecw5fv/12Tj7ppH6qXJLar52jmO4G+jrR/to+tvkY8LF21dRs+/btXHPl\nlbxl925OmDLlmfalmzZxzZVX8vdXX83wg3wO0eq1a5k29dnr61MnT+Yn999/yDVLUicN2jupFy5a\nxOQtWzjhqKOe037CUUcx
ecsWFi5a1MeWkjQ4DNqA+M369UzK8gCqSZmsW7/+oPc9ddIkVq1+9vr6\n6jVrmDJ58kHvT5LqMGgD4pjx41nbx1DTtRFMGD/+oPc969RTWbp8OY+tXMmOHTu48ZZbePPrfXKI\npIFl0AbE6b29rBk9mqWbNj2nfemmTawZPZrTe3sPet9Dhw7ln668kj99xzt46StewZ/PmcMpL3rR\noZYsSR01oN8HcSiOPPJI3nv55Y1RTL/+NZMyWRvBmmoU08FeoN7jjWefzRvPPrufqpWkzhu0AQFw\nwsyZ/P3VV7Nw0SLWrV/P8ePHc3pv7yGHgyQdDgZ1QAAMHz6cl59xRt1lSFLXGbTXICRJ+3ZYBkT2\nMXy1mwyEGiUNboddQAzbto3fbtnS1X+A97wPYther/qUpG5y2F2DOGr5cp4AftPl71rY80Y5SepW\nh11ADNm5syve0iZJA91hd4pJktQ/DAhJUpEBIUkqMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiS\nigwISVKRASFJKjIgJElFBoQkqciAkCQVGRCSpCIDQpJUZEBIkooMCElSUdsCIiKujYh1EbG4qe2G\niFhUfVZExKKqfUZEPNm07HPtqkuS1Jp2vpP6i8A/A1/a05CZ79gzHRFXAZua1l+Wmb1trEeSdADa\nFhCZ+f2ImFFaFhEBnAv8cbu+X5J0aOq6BvFK4PHMfLSp7bjq9NJdEfHKmuqSJFXaeYppX84Hrm+a\nXwO8IDM3RMTpwNci4pTM3Lz3hhExF5gLMK2npyPFStJg1PEjiIgYCrwNuGFPW2Y+lZkbqumFwDLg\nhaXtM3NeZs7KzFnjRo3qRMmSNCjVcYrpbOCRzFy1pyEijomIIdX0TOBEYHkNtUmSKu0c5no98CPg\npIhYFREXV4vO47mnlwBeBTxQDXv9KvCezNzYrtokSfvXzlFM5/fRfmGh7SbgpnbVIkk6cN5JLUkq\nMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElFBoQkqciAkCQVGRCSpCID\nQpJUZEBIkooMCElSkQEhSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KSVGRASJKKDAhJUpEBIUkqMiAk\nSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSitoWEBFxbUSsi4jFTW0fjYjVEbGo+rypadmHImJpRCyJ\niNe3qy5JUmvaeQTxReANhfZPZWZv9bkNICJOBs4DTqm2uSYihrSxNknSfrQtIDLz+8DGFlefA3w5\nM5/KzMeApcDsdtUmSdq/Oq5BXBoRD1SnoMZWbVOBXzWts6pqkyTVpNMB8VlgJtALrAGuOtAdRMTc\niLgvIu7bsHVrf9cnSap0NCAy8/HM3JWZu4HP8+xppNXAsU2rTqvaSvuYl5mzMnPWuFGj2luwJA1i\nHQ2IiJjcNHsOsGeE063AeRExPCKOA04E7u1kbZKk5xrarh1HxPXAWcD4iFgF/C1wVkT0AgmsAC4B\nyMwHI+JG4CFgJ/C+zNzVrtokSfvXtoDIzPMLzV/Yx/ofAz7WrnokSQfGO6klSUUGhCSpyICQJBUZ\nEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElFBoQkqciAkCQVGRCSpCIDQpJUZEBIkooMCElSkQEh\nSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KSVGRASJKKDAhJUpEBIUkqMiAkSUUGhCSpyICQJBUZEJKk\nIgNCklRkQEiSigwISVJR2wIiIq6NiHURsbip7R8i4pGIeCAibomIo6v2GRHxZEQsqj6fa1ddkqTW\ntPMI4ovAG/ZquwN4SWa+FPgF8KGmZcsys7f6vKeNdUmSWtC2gMjM7wMb92r7dmburGbvAaa16/sl\nSYemzmsQFwHfbJo/rjq9dFdEvLKuoiRJDUPr+NKIuALYCVxXNa0BXpCZGyLidOBrEXFKZm
4ubDsX\nmAswraenUyVL0qDT8SOIiLgQeDPwzsxMgMx8KjM3VNMLgWXAC0vbZ+a8zJyVmbPGjRrVoaolafDp\naEBExBuAvwHekpnbmtqPiYgh1fRM4ERgeSdrkyQ9V9tOMUXE9cBZwPiIWAX8LY1RS8OBOyIC4J5q\nxNKrgL+LiKeB3cB7MnNjcceSpI5oW0Bk5vmF5i/0se5NwE3tqkWSdOC8k1qSVGRASJKKDAhJUpEB\nIUkqMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElFBoQkqciAkCQVGRCS\npCIDQpJUZEBIkooMCElSUUsBERHfaaVNknT4GLqvhRFxJDASGB8RY4GoFo0Bpra5NklSjfYZEMAl\nwAeAKcBCng2IzcA/t7EuSVLN9hkQmflp4NMRcWlmXt2hmiRJXWB/RxAAZObVEfFyYEbzNpn5pTbV\nJUmqWUsBERH/ARwPLAJ2Vc0JGBCS2m7700+zaOVKNmzezLgxY+idPp0jhw2ru6zDXksBAcwCTs7M\nbGcxkrS35evWce2CBUx98kkmAT8FvjFiBBfNmcPMCRPqLu+w1up9EIuBSe0sRJL2tv3pp7l2wQLm\nZPKOnh5e3dPDO3p6mJPJtQsW8NTTT9dd4mGt1SOI8cBDEXEv8NSexsx8S1uqkiRg0cqVTH3ySY7v\n6XlO+/EjRzJ140YWrVzJGSecUFN1h79WA+Kj7SxCkko2bN7c56mLScD6zZs7Wc6g0+ooprvaXYgk\n7W3cmDH8tI9la4HTxozpZDmDTquP2tgSEZurz/aI2BURRrektuqdPp3VI0awbNu257Qv27aN1SNG\n0Dt9ek2VDQ4tBURmjs7MMZk5BhgB/Blwzb62iYhrI2JdRCxuauuJiDsi4tHq59imZR+KiKURsSQi\nXn+Qv4+kw8iRw4Zx0Zw5LIjgho0buWvjRm7YuJEFEVw0Zw7DHeraVnGwI1cj4qeZeeo+lr8K2Ap8\nKTNfUrV9AtiYmVdGxOXA2My8LCJOBq4HZtN4rMedwAszc1cfuwegd/r0vOOKKw6qfkkDx1PVfRDr\nN29mfHUfhOFw8CZccsnCzJy1v/VavVHubU2zR9C4L2L7vrbJzO9HxIy9mucAZ1XT84HvAZdV7V/O\nzKeAxyJiKY2w+FEr9Uk6vA0fNszRSjVodRTTf2ua3gmsoPFH/UBNzMw11fRaYGI1PRW4p2m9Vfi0\nWEmqVaujmN7d31+cmRkRB3x+KyLmAnMBpu01NlqS1H9aHcU0LSJuqS46r4uImyJi2kF83+MRMbna\n52RgXdW+Gji2ab1pVdvvycx5mTkrM2eNGzXqIEqQJLWi1Udt/DtwK40LyFOAr1dtB+pW4IJq+gJg\nQVP7eRExPCKOA04E7j2I/UuS+kmrAXFMZv57Zu6sPl8EjtnXBhFxPY2LzCdFxKqIuBi4EnhdRDwK\nnF3Nk5kPAjcCDwG3A+/b3wgmSVJ7tXqRekNE/HcaQ1EBzgc27GuDzDy/j0Wv7WP9jwEfa7EeSVKb\ntXoEcRFwLo2RR2uAPwcubFNNkqQu0OoRxN8BF2Tmb6FxRzTwSRrBIUk6DLV6BPHSPeEAkJkbgT7v\nopYkDXytBsQRez03qYfWjz4kSQNQq3/krwJ+FBFfqebfjheUJemw1uqd1F+KiPuAP66a3paZD7Wv\nLElS3Vo+TVQFgqEgSYNEq9cgJEmDjAEhSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KSVGRASJKKDAhJ\nUpEBIUkqMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElFBoQkqciAkCQV\nGRCSpCIDQpJUZEBIkoqGdvoLI+Ik4IamppnAR4Cjgb8CflO1fzgzb+tweZKkSscDIjOXAL0AETEE\nWA3cArwb+FRmfrLTNUmSfl/dp5heCyzLzJU11yFJ2k
vdAXEecH3T/KUR8UBEXBsRY0sbRMTciLgv\nIu7bsHVrZ6qUpEGotoCIiOcBbwG+UjV9lsb1iF5gDXBVabvMnJeZszJz1rhRozpSqyQNRnUeQbwR\nuD8zHwfIzMczc1dm7gY+D8yusTZJGvTqDIjzaTq9FBGTm5adAyzueEWSpGd0fBQTQEQ8H3gdcElT\n8yciohdIYMVeyyRJHVZLQGTm74Bxe7W9q45aJElldY9ikiR1KQNCklRkQEiSigwISVKRASFJKjIg\nJElFBoQkqciAkCQVGRCSpCIDQpJUZEBIkooMCElSkQEhSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KS\nVGRASJKKDAhJUpEBIUkqMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElF\nQ+v40ohYAWwBdgE7M3NWRPQANwAzgBXAuZn52zrqkyTVewTxmszszcxZ1fzlwHcy80TgO9W8JKkm\n3XSKaQ4wv5qeD7y1xlokadCrKyASuDMiFkbE3KptYmauqabXAhPrKU2SBDVdgwDOzMzVETEBuCMi\nHmlemJkZEVnasAqUuQDTenraX6kkDVK1HEFk5urq5zrgFmA28HhETAaofq7rY9t5mTkrM2eNGzWq\nUyVL0qDT8YCIiOdHxOg908CfAIuBW4ELqtUuABZ0ujZJ0rPqOMU0EbglIvZ8///LzNsj4ifAjRFx\nMbASOLeG2iRJlY4HRGYuB/6w0L4BeG2n65EklXXTMFdJUhcxICRJRQaEJKnIgJAkFRkQkqQiA0KS\nVGRASJKKDAhJUpEBIUkqMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElF\nBoQkqciAkCQVGRCSpCIDQpJUZEBIkooMCElSkQEhSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KSVGRA\nSJKKOh4QEXFsRHw3Ih6KiAcj4v1V+0cjYnVELKo+b+p0bZKkZw2t4Tt3Ah/MzPsjYjSwMCLuqJZ9\nKjM/WUNNkqS9dDwgMnMNsKaa3hIRDwNTO12HJGnfar0GEREzgFOBH1dNl0bEAxFxbUSMra0wSVJ9\nARERo4CbgA9k5mbgs8BMoJfGEcZVfWw3NyLui4j7Nmzd2rF6JWmwqSUgImIYjXC4LjNvBsjMxzNz\nV2buBj4PzC5tm5nzMnNWZs4aN2pU54qWpEGmjlFMAXwBeDgz/7GpfXLTaucAiztdmyTpWXWMYnoF\n8C7g5xGxqGr7MHB+RPQCCawALqmhNklSpY5RTHcDUVh0W6drkST1zTupJUlFBoQkqciAkCQVGRCS\npCIDQpJUZEBIkooMCElSkQEhSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KSVGRASJKKDAhJUpEBIUkq\nMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElFBoQkqWho3QVI0v48sW0b\nN997L6vXr2fq+PG8bfZsjh45su6yDnsGhKSudtfDD3PV/PmcvGMH0yP4RSZ/efvtfPCCC3j1i19c\nd3mHNQNCUtd6Yts2rpo/n/dm8tLRo59pf2D7dq6aP59TP/IRxngk0TZeg5DUtW6+915O3rGDPxg+\nnCd37GDr9u08Wc2fvGMHN917b90lHtY8gpDUtVavX8+xu3ezftMmhu7ezVBgO7D1iCM4NoJVGzbU\nXeJhreuOICLiDRGxJCKWRsTlddcjqT4Txo7lkR07GAWMGTKEkUOGMGbIEEYBj+zYwaSjj667xMNa\nVwVERAwB/gV4I3AycH5EnFxvVZLqcvzEiSyOYMnu3c9pX7J7N4sjOGHixJoqGxy67RTTbGBpZi4H\niIgvA3OAh2qtSlItntqxg9dNmcI1a9dyytNPMx1YCTx4xBG8bsoUtu3YUXeJh7VuC4ipwK+a5lcB\nZ9RUi6SajRszhr
GjR3PN1Kl8fd06Vm3fzguOPJL3TZjAf27ezPgxY+ou8bDWbQGxXxExF5hbzT41\n4ZJLFtdZT4vGA+vrLqIFA6VOGDi1WuehiQkwYz3ERNg1DY4cCtvnL1s25AbIdQ8+uALIuoss6Nb+\n3GN6Kyt1W0CsBo5tmp9WtT0jM+cB8wAi4r7MnNW58g6Odfa/gVKrdfav5jo/U3cx+zBQ+nN/uuoi\nNfAT4MSIOC4ingecB9xac02SNCh11RFEZu6MiL8GvgUMAa7NzAdrLkuSBqWuCgiAzLwNuK3F1ee1\ns5Z+ZJ39b6DUap39yzo7KDK78fqOJKlu3XYNQpLUJQZUQETEP0TEIxHxQETcEhHF++zrflxHRLw9\nIh6MiN0R0edIhohYERE/j4hFEXFfJ2usvr/VOuvuz56IuCMiHq1+ju1jvVr6c3/9Ew2fqZY/EBGn\ndaq2A6zzrIjYVPXfooj4SE11XhsR6yKiOIS9i/pzf3V2RX8ekswcMB/gT4Ch1fTHgY8X1hkCLANm\nAs8Dfgac3OE6XwycBHwPmLWP9VYA42vsz/3W2SX9+Qng8mr68tJ/97r6s5X+Ad4EfBMI4I+AH9fw\n37qVOs8CvlHHv8W96ngVcBqwuI/ltfdni3V2RX8eymdAHUFk5rczc2c1ew+N+yT29szjOjJzB7Dn\ncR0dk5kPZ+aSTn7nwWixztr7s/q++dX0fOCtHf7+fWmlf+YAX8qGe4CjI2JyF9bZFTLz+8DGfazS\nDf3ZSp0D3oAKiL1cROP/IvZWelzH1I5UdOASuDMiFlZ3iHejbujPiZm5pppeC/T1hLY6+rOV/umG\nPmy1hpdXp22+GRGndKa0A9YN/dmqgdCffeq6Ya4RcScwqbDoisxcUK1zBbATuK6TtTVrpc4WnJmZ\nqyNiAnBHRDxS/V9Jv+mnOttuX3U2z2RmRkRfQ+/a3p+HufuBF2Tm1oh4E/A14MSaaxrIBnx/dl1A\nZObZ+1oeERcCbwZem9WJvr3s93Ed/WF/dba4j9XVz3URcQuN0wD9+getH+qsvT8j4vGImJyZa6pT\nCev62Efb+7Oglf7pSB/uRyuPsdncNH1bRFwTEeMzs9ueKdQN/blfA6g/+zSgTjFFxBuAvwHekpnb\n+lhtQDyuIyKeHxGj90zTuADfjQ8e7Ib+vBW4oJq+APi9I58a+7OV/rkV+Mtq9M0fAZuaTpl1yn7r\njIhJERHV9Gwafx+68ZVt3dCf+zWA+rNvdV8lP5APsJTGucdF1edzVfsU4Lam9d4E/ILGqI0raqjz\nHBrnRZ8CHge+tXedNEaT/Kz6PNitdXZJf44DvgM8CtwJ9HRTf5b6B3gP8J5qOmi8CGsZ8HP2MbKt\n5jr/uuq7n9EYBPLymuq8HlgDPF39+7y4S/tzf3V2RX8eysc7qSVJRQPqFJMkqXMMCElSkQEhSSoy\nICRJRQaEJKnIgJD6QURsrX5OiYiv7mfdD0TEyKb526KPJxNLdXKYq9SHiBiSmbtaXHdrZo5qcd0V\nNMbuD5g7ajU4eQShQSkiZkTj3SLXRcTDEfHViBgZjXdKfDwi7gfeHhHHR8Tt1QMAfxARL6q2Py4i\nfhSN90/8n732u7iaHhIRn4yIxdUD2y6NiP9B4wa/70bEd6v1VkTE+Gr6f1brL46IDzTt8+GI+Hw0\n3t/x7YgY0eEu0yBkQGgwOwm4JjNfDGwG3lu1b8jM0zLzyzTeLXxpZp4O/C/gmmqdTwOfzcw/oHE3\nbclcYAbQm5kvBa7LzM8AvwZek5mvaV45Ik4H3g2cQeM9B38VEadWi08E/iUzTwGeAP7s0H51af8M\nCA1mv8rMH1bT/xc4s5q+ASAiRgEvB74SEYuAfwX2vHfgFTQetQDwH33s/2zgX7N6h0lm7u/dAWcC\nt2Tm7zJzK3Az8Mpq2WOZuaiaXkgjeKS26rqnuUodtPcFuD3zv6t+HgE8kZm9LW7f
Tk81Te8CPMWk\ntvMIQoPZCyLiZdX0XwB3Ny/MxuOaH4uIt8Mz70L+w2rxD2k8ERXgnX3s/w7gkogYWm3fU7VvAUYX\n1v8B8NbqWsjzaTxM8QcH/mtJ/cOA0GC2BHhfRDwMjAU+W1jnncDFEbHnKbF7XtP5/mrbn9P328z+\nDfgl8EC1/V9U7fOA2/dcpN4jM+8HvgjcC/wY+LfM/OlB/m7SIXOYqwaliJhB44XyL6m5FKlreQQh\nSSryCEKSVOQRhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVLR/wf/BFOH0okqngAAAABJRU5ErkJg\ngg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np, pandas as pd, matplotlib.pyplot as plt, pydotplus\n", "from sklearn import tree, metrics, model_selection, preprocessing\n", "from IPython.display import Image, display\n", "\n", "dataGroup2['idpst_label'], _ = pd.factorize(dataGroup2['ipdst'])\n", "\n", "y = dataGroup2['idpst_label']\n", "X = dataGroup2[['prediction', 'count']]\n", "\n", "# split data randomly into 70% training and 30% test\n", "X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=0)\n", "\n", "# train the decision tree\n", "dtree = tree.DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)\n", "dtree.fit(X_train, y_train)\n", "\n", "# use the model to make predictions with the test data\n", "y_pred = dtree.predict(X_test)\n", "\n", "# how did our model perform?\n", "count_misclassified = (y_test != y_pred).sum()\n", "print('Misclassified samples: {}'.format(count_misclassified))\n", "accuracy = metrics.accuracy_score(y_test, y_pred)\n", "print('Accuracy: {:.2f}'.format(accuracy))\n", "\n", "\n", "\n", "# visualize the model's decision regions to see how it separates the samples\n", "X_combined = np.vstack((X_train, X_test))\n", "y_combined = np.hstack((y_train, y_test))\n", "plot_decision(X=X_combined, y=y_combined, classifier=dtree)\n", "plt.xlabel('prediction')\n", "plt.ylabel('count')\n", "plt.legend(loc='upper left')\n", "plt.show()\n", "\n", "# Solo tenemos una IP." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAKUAAABRCAIAAAA8SvGCAAAABmJLR0QA/wD/AP+gvaeTAAAMJklE\nQVR4nO2ca0wUVxvH/0PWLZfdtUhl2UVutiSokFXTViCxUlNRA+KFgrZiQawFW1OjtknxSzVNYz80\njf1gmrSxlmqCSjRpGolUUILYtW1M1SpYEsEL4OqyLCyFggLTD+d9x+PsMjPLzqywO79PO8+e51zm\nmXOZmf8chmVZqAQNIc+6Aip+RY13cKHGO7jQ0AdDQ0M1NTWjo6PPqjYqsvPKK68kJiY+OWYpTp48\n+czqpaIMGzZsoEP8VP8eGRkBoK7YA4bCwkLeaK3O38GFGu/gQo13cKHGO7hQ4x1cqPH2H2fPnn39\n9dcNBoPBYFi6dGldXZ1yXuOhxttPVFZWZmdnp6WltbW1tbW1paamZmdnHz16VAkvIeib8ePHj/Ms\niuJegUClq6srIiIiIyNjbGyMWMbGxtLT0/V6vc1mk9eLpqCgoKCggLao/dsfHDp0aGBgoLS0lGEY\nYmEYprS0tL+///vvv5fXSxg13v6ATLqLFi2ijeTwl19+kddLmAnG++HDh9u2bZs1a5ZWq42NjX3v\nvfdsNhv3L/N/7t27t3r1ar1ebzQai4qKHA4HnYZO/O677/J8b926tW7dusjISHJI/rXZbGVlZaTc\nWbNmlZeXP3jwwL3c5ubmFStWGAwGnU6Xk5PT0tLCS0A4duwYsScmJtKluMNIQOB0kQrExcXRxvj4\neAA3b96U10sEenCXOH/bbLaEhASj0VhbW9vf39/Y2JiQkJCUlOR0Ork0JPONGzc2Nzf39vZu27YN\nQElJCZ2PewVo+7Jlyy5evDg4OFhTU0OS3b9/Py4uzmw219fXu1yuurq6mJiYhIQEejIjvpmZmU1N\nTf39/SRNZGRke3s7SUA6jclkevToEef13Xff5eTkiDZ8wmi1WgCPHz+mjY8fPwbw3HPPyetF4z5/\nTyTeZWVlAA4dOsRZTp06BWDPnj1P8gUANDQ0kMP29nYAZrP5qbIF433+/HmefevWrQCOHDnCWX74\n4QcAZWVlPN+amhpemuLiYs5isVgAVFZWcpa0tLSzZ8+KNnzCTO14m81mAF1dXZylu7sbQFpa2pN8\nAQAul4scDg8PA2AY5qmyBeM9MDDAs5tMJgCdnZ2cpaOjA0BsbCzPlx5pSBqTycRZyBUwf/58clhf\nXz9v3jzRVvtCdHQ0r1YsyzqdTgAxMTHyetHIsz5/+PAh6azc1PXCCy8AuHXrFi+lXq8nP8ilynrz\npjU8PJxnsdvtAEhZBPKb1Ifm+eef56UhvoS33nrLZDJduXLl3LlzAL7++usdO3YIV8bH+XvOnDkA\n7t27Rxvv3r0LICUlRV4vYSYSb6PRCKCnp4d3NQ0MDEysEhIh1zsZSwjkN7HT0AtDkmbmzJmcRavV\nbt++HcBXX33V1tZmtVqLioqEi5bSmQTc33jjDQC//fYbbfz9998BZGdny+slAl1jieP5Bx98AODU\nqVO0sbGxcdGiRbzGu58O2kJ68KNHjwYGBmbMmCGQkkDWDT/++CNnISNzeXk5z/enn37ipaHnb5Zl\nHQ5HeHg4wzA5OTkVFRWiTfaRzs7OiIiIzMxM2piZmanT6e7fvy+vF40883d3d3dycrLJZKquru7u\n7na5XD///HNSUhK3OmOlxTs9PR1AU1PTsWPHcnNzBVISyH0Btz6vr683mUwe1+crV668cOFCf38/\nSUOvzznILYNGo+no6BBtsu8cPnwYwI4dO+x2u91u//DDDxmGoa9d1lPDpXgJIE+8WZbt6enZtWtX\nUlLStGnTjEbjqlWrrFYrr9
507T2OKH/88YfFYgkPD09PT//777/dU7pXhtx/m81mjUZjNpvJff9T\n7QEAtLe35+bm6vX6iIiIlStXNjc3uzehtbU1JCSEJ+9SlNra2iVLluh0Op1Ol5WV5X5H4LHJol4C\nyBbvSct4Y4M7o6OjJpOJvkwDD/X5+RNOnz4dHx9P5pTgIejizTDMpUuXnE7nvn379uzZ86yr428C\nKt70M3mBZBkZGcnJybm5uXl5eX6p1yRCI55k6sBKeJ4jJU0AE1D9W0WUwI+3lOedz5xr1659/PHH\nc+fODQ0NjY6Ofu211xT6tivw4z0lBnCLxWK1WquqqpxO57lz50ZHR998880vv/xS9oICP95ThcOH\nD1sslrCwsNTU1G+//RbAgQMHZC8loNZrUxfeIJSUlATA5XLJXpDavycjly9fBpCVlSV7zrLFu6+v\nb+fOnbNnzw4NDY2KisrMzPzoo4/IyztCXV1dXl5eZGRkaGjowoULOe0YgVtVdXV15efn6/X6qKio\n4uLivr6+27dv5+XlGQyGmJiYkpKS3t5ed6/xBGvjIay/E22LOz6+IKeLPnPmTGlp6cKFCw8ePCjF\nxTvoh6u+PD9fvXo1gAMHDvzzzz/Dw8M3b95cu3YtnRuANWvW2O32O3fuLFu2DMCZM2foHEh9ioqK\niOSNvHXNyclZu3YtLYLbunWru5eAYI11e6guqr8TbYtC7N+/n1R13bp1f/31l+8ZKvi+xGAwAKiu\nruYsnZ2dvHhzMSD9b/HixU9VBQAleSPutIUoPWj1EuclLFjjxVtUfyfaFuUYHh5ubW3du3dvWFhY\nSUnJ4OCgL7kpGO/NmzeT0xoXF7dly5bjx48PDw+Pl5hsJBEVFfVUVQBQkjduYwKexaMITliwxou3\nqP7Oq7YoBFmcv//++75komC8x8bGTp48mZ+fHxkZSU5WfHz8n3/+Sf51Op0VFRUpKSk6nW682UQu\ny9DQEACNRjNeGo3G811JeHi4lLZ4xGOGPETPIQ25amnZzwTwx/vv0dHRxsbG5cuXg9KAkgn7008/\ndTgc/ytY1nh3d3dzFtH+HRsbC0/6O4lt8Q89PT0AwsLCfMlEwfffDMOQEx0SErJ48WJy6XDr5IsX\nLwLYvXv3jBkzABB5soyQ/AnkiwIBRd+aNWsANDQ00MYLFy5w78KF26IEDMPwPhmpra0F8PLLL8tc\nEh18X/o3gOXLl1+/fn1oaMhms1VUVADIy8sj/5IuUlFR4XQ6HQ7Hrl273Ev3xSIsWON5iervhNui\nBAAWLFjQ0NDgcrkcDkdVVVVUVFRYWJiP8hsFx/Ompqbi4uLExMRp06ZNnz7dYrF8/vnn3DcDDx48\n2LRpU3R0tFarTU1NJQXRYXC/BKVYOKOAYM2jl7D+TrgtSmC1WsvKylJSUkJDQ7VabUJCQnFxsUfZ\nnVe4x5thqTNy4sSJ9evXs9JWH5ME8hxjatXZbxQWFgI4ceIEZ1GfpwYXaryDi6kdb4mCNRWOqf0+\nVJ22vWVq928Vb5m88Z4SujOCL+ozebdXE2XyxnsKjdUTVp/Jv72aKPTN+GT7fsy9hpMTAK2trdzh\n9evX4fbe1h3ft1cTRf1+TBFYlk1OTuYOJarPlNheTRQ13vIjUX2mxPZqoigSb1q0VV5eTowdHR28\nJZiwok0gWwELxLRpohX2m/pMke3VRKEHdxnn7/z8fACffPIJbfzss894MiMpijavLFL2hlMIb9Vn\nvm+3JYr/vvcnas7p06f39fURy+DgoNFovHHjxpOypSnavLJI2RtOObxSnwVUvFmWXbp0KYAvvviC\nHB48eFDgFbKAos0ri5S94fyAFPWZ79urieLXeBOFRkxMzNDQ0MjIyOzZs3/99VfuX4UUbaLaNI+M\nO9uNUzFRpKjPlixZAuDatWu08erVqwCysrK8Km48/Ho/lp2dvWDBApvNVllZWV1dHRsbm5GR
wf1b\nWFi4f//+9evX37lzh1RFSp5k3UQGPQB9fX28BBPbG07KufOq7WSrsX///VcgjSLbq4lCN0n25y1k\nyf3SSy/Nnz//9OnT9F/kjHBaYyIqhVhvJltqcpfI+fPneWmk7A0nOwBaWlpoS1VVFdyWIzx8315N\nFH/vzzQyMvLiiy/C0/Q5MUXbO++8A2D79u29vb0tLS3cvohcAil7w8kOpKnP3Jvj4/ZqojyD/bi+\n+eYbAEePHuXZJ6BoY1nWbre//fbbM2fOjIiIWLVqFdlMlJdGWJumBBLVZ+7xZn3bXk2UANSvqQig\n6teCHTXewYUa7+BCjXdwocY7uFDjHVyo8Q4u1HgHFx7eJpGbdJUAwGq10u+owOvfr7766oYNG/xb\nJRUFycjIKCgooC2M+vQ0qFDn7+BCjXdwocY7uPgP9a1rUF1siogAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dot_data = tree.export_graphviz(dtree, out_file=None, filled=False, rounded=False,\n", "                                # feature names must match the columns of X, in order\n", "                                feature_names=['prediction', 'count'],\n", "                                class_names=['10.3.20.102'])\n", "                                #class_names=['ip1', 'ip2', 'ip3', ...])\n", "graph = pydotplus.graph_from_dot_data(dot_data)\n", "display(Image(graph.create_png()))\n", "\n", "# Only one class is shown because the data contains a single IP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With more IPs, the result would look like this:\n", "![](prediction.png)\n", "![](tree.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Gráficas" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly.plotly as py\n", "from plotly import __version__\n", "from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot\n", "from plotly.graph_objs import Scatter, Figure, Layout\n", "init_notebook_mode(connected=True)\n", "\n", "import plotly.offline as offline\n", "import plotly.graph_objs as go\n", "from plotly.graph_objs import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.1 Sin ordenar Tiempos" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
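<table><thead><tr><th></th><th>ipdst</th><th>proto</th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th></tr></thead><tbody>
<tr><th>0</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:08:55</td><td>3</td><td>1</td><td>0</td></tr>
<tr><th>7</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:09:30</td><td>1</td><td>1</td><td>0</td></tr>
<tr><th>8</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:50</td><td>3</td><td>1</td><td>0</td></tr>
<tr><th>9</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:55</td><td>104</td><td>1</td><td>0</td></tr>
<tr><th>10</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td></tr></tbody></table>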
\n", "
" ], "text/plain": [ " ipdst proto time count prediction idpst_label\n", "0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 1 0\n", "7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 1 0\n", "8 10.3.20.102 TCP 2017-03-20 17:08:50 3 1 0\n", "9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1 0\n", "10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataGroup2" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "data": [ { "mode": "lines+markers", "name": "Normal Traffic", "type": "scatter", "x": [ "2017-03-20 17:08:55", "2017-03-20 17:09:30", "2017-03-20 17:08:50", "2017-03-20 17:08:55" ], "y": [ 3, 1, 3, 104 ] }, { "marker": { "color": "rgb(255, 0, 0)", "size": 7, "symbol": "circle" }, "mode": "markers", "name": "Anomalies", "opacity": 0.8, "x": [ "2017-03-20 17:09:00" ], "y": [ 204 ] } ], "layout": { "legend": { "bgcolor": "#E2E2E2", "bordercolor": "#FFFFFF", "borderwidth": 2, "font": { "color": "#000", "family": "sans-serif", "size": 12 }, "traceorder": "normal", "x": 0, "y": 1 }, "title": "Peticiones totales por tiempo", "xaxis": { "rangeslider": {}, "title": "Date", "type": "date" }, "yaxis": { "title": "Nº packets" } } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Normal Traffic\n", "nor = dataGroup2[(dataGroup2['prediction'] == 1)]['count']\n", "#Anomalies\n", "ano = dataGroup2[(dataGroup2['prediction'] == -1)]['count']\n", "\n", "\n", "normal = go.Scatter(\n", " x = dataGroup2[(dataGroup2['prediction'] == 1)]['time'],\n", " y = nor,\n", " mode = \"lines+markers\",\n", " name = \"Normal Traffic\"\n", ")\n", "\n", "\n", "anomalies = dict(\n", " x=dataGroup2[(dataGroup2['prediction'] == -1)]['time'],\n", " y=ano,\n", " name = \"Anomalies\",\n", " mode = 'markers',\n", " marker=Marker(\n", " size=7,\n", " symbol= \"circle\",\n", " color='rgb(255, 0, 0)'\n", " ),\n", " opacity = 0.8)\n", "\n", "data = [normal, anomalies]\n", "\n", "layout = dict(\n", " title='Peticiones totales por tiempo',\n", " xaxis=dict(\n", " title = 'Date',\n", " rangeslider=dict(),\n", " type='date'\n", " ),\n", " yaxis=dict(\n", " title = 'Nº packets'\n", " ),\n", " legend=dict(\n", " x=0,\n", " y=1,\n", " traceorder='normal',\n", " font=dict(\n", " family='sans-serif',\n", " size=12,\n", " color='#000'\n", " ),\n", " bgcolor='#E2E2E2',\n", " bordercolor='#FFFFFF',\n", " borderwidth=2\n", " ) \n", ")\n", "\n", "fig = dict(data=data, layout=layout)\n", "iplot(fig, filename = \"Peticiones totales por tiempo\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.2 Ordenando Tiempos" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
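<table><thead><tr><th></th><th>ipdst</th><th>proto</th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th></tr></thead><tbody>
<tr><th>8</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:50</td><td>3</td><td>1</td><td>0</td></tr>
<tr><th>0</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:08:55</td><td>3</td><td>1</td><td>0</td></tr>
<tr><th>9</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:55</td><td>104</td><td>1</td><td>0</td></tr>
<tr><th>10</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td></tr>
<tr><th>7</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:09:30</td><td>1</td><td>1</td><td>0</td></tr></tbody></table>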
\n", "
" ], "text/plain": [ " ipdst proto time count prediction idpst_label\n", "8 10.3.20.102 TCP 2017-03-20 17:08:50 3 1 0\n", "0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 1 0\n", "9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1 0\n", "10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0\n", "7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 1 0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataGroup3 = dataGroup2.sort_values(by=['time'])\n", "dataGroup3" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "data": [ { "mode": "lines+markers", "name": "Normal Traffic", "type": "scatter", "x": [ "2017-03-20 17:08:50", "2017-03-20 17:08:55", "2017-03-20 17:08:55", "2017-03-20 17:09:30" ], "y": [ 3, 3, 104, 1 ] }, { "marker": { "color": "rgb(255, 0, 0)", "size": 7, "symbol": "circle" }, "mode": "markers", "name": "Anomalies", "opacity": 0.8, "x": [ "2017-03-20 17:09:00" ], "y": [ 204 ] } ], "layout": { "legend": { "bgcolor": "#E2E2E2", "bordercolor": "#FFFFFF", "borderwidth": 2, "font": { "color": "#000", "family": "sans-serif", "size": 12 }, "traceorder": "normal", "x": 0, "y": 1 }, "title": "Peticiones totales por tiempo", "xaxis": { "rangeslider": {}, "title": "Date", "type": "date" }, "yaxis": { "title": "Nº packets" } } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Normal Traffic\n", "nor = dataGroup3[(dataGroup3['prediction'] == 1)]['count']\n", "#Anomalies\n", "ano = dataGroup3[(dataGroup3['prediction'] == -1)]['count']\n", "\n", "\n", "normal = go.Scatter(\n", " x = dataGroup3[(dataGroup3['prediction'] == 1)]['time'],\n", " y = nor,\n", " mode = \"lines+markers\",\n", " name = \"Normal Traffic\"\n", ")\n", "\n", "\n", "anomalies = dict(\n", " x=dataGroup3[(dataGroup3['prediction'] == -1)]['time'],\n", " y=ano,\n", " name = \"Anomalies\",\n", " mode = 'markers',\n", " marker=Marker(\n", " size=7,\n", " symbol= \"circle\",\n", " color='rgb(255, 0, 0)'\n", " ),\n", " opacity = 0.8)\n", "\n", "data = [normal, anomalies]\n", "\n", "layout = dict(\n", " title='Peticiones totales por tiempo',\n", " xaxis=dict(\n", " title = 'Date',\n", " rangeslider=dict(),\n", " type='date'\n", " ),\n", " yaxis=dict(\n", " title = 'Nº packets'\n", " ),\n", " legend=dict(\n", " x=0,\n", " y=1,\n", " traceorder='normal',\n", " font=dict(\n", " family='sans-serif',\n", " size=12,\n", " color='#000'\n", " ),\n", " bgcolor='#E2E2E2',\n", " bordercolor='#FFFFFF',\n", " borderwidth=2\n", " ) \n", ")\n", "\n", "fig = dict(data=data, layout=layout)\n", "iplot(fig, filename = \"Peticiones totales por tiempo\")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
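<table><thead><tr><th></th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th></tr></thead><tbody>
<tr><th>0</th><td>2017-03-20 17:08:50</td><td>3</td><td>1</td><td>0</td></tr>
<tr><th>1</th><td>2017-03-20 17:08:55</td><td>107</td><td>2</td><td>0</td></tr>
<tr><th>2</th><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td></tr>
<tr><th>3</th><td>2017-03-20 17:09:30</td><td>1</td><td>1</td><td>0</td></tr></tbody></table>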
\n", "
" ], "text/plain": [ " time count prediction idpst_label\n", "0 2017-03-20 17:08:50 3 1 0\n", "1 2017-03-20 17:08:55 107 2 0\n", "2 2017-03-20 17:09:00 204 -1 0\n", "3 2017-03-20 17:09:30 1 1 0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataGroup4 = dataGroup2.groupby(['time']).sum().reset_index().dropna()\n", "dataGroup4" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "data": [ { "mode": "lines+markers", "name": "Normal Traffic", "type": "scatter", "x": [ "2017-03-20 17:08:50", "2017-03-20 17:09:30" ], "y": [ 3, 1 ] }, { "marker": { "color": "rgb(255, 0, 0)", "size": 7, "symbol": "circle" }, "mode": "markers", "name": "Anomalies", "opacity": 0.8, "x": [ "2017-03-20 17:09:00" ], "y": [ 204 ] } ], "layout": { "legend": { "bgcolor": "#E2E2E2", "bordercolor": "#FFFFFF", "borderwidth": 2, "font": { "color": "#000", "family": "sans-serif", "size": 12 }, "traceorder": "normal", "x": 0, "y": 1 }, "title": "Peticiones totales por tiempo", "xaxis": { "rangeslider": {}, "title": "Date", "type": "date" }, "yaxis": { "title": "Nº packets" } } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Normal Traffic\n", "nor = dataGroup4[(dataGroup4['prediction'] == 1)]['count']\n", "#Anomalies\n", "ano = dataGroup4[(dataGroup4['prediction'] == -1)]['count']\n", "\n", "\n", "normal = go.Scatter(\n", " x = dataGroup4[(dataGroup4['prediction'] == 1)]['time'],\n", " y = nor,\n", " mode = \"lines+markers\",\n", " name = \"Normal Traffic\"\n", ")\n", "\n", "\n", "anomalies = dict(\n", " x=dataGroup4[(dataGroup4['prediction'] == -1)]['time'],\n", " y=ano,\n", " name = \"Anomalies\",\n", " mode = 'markers',\n", " marker=Marker(\n", " size=7,\n", " symbol= \"circle\",\n", " color='rgb(255, 0, 0)'\n", " ),\n", " opacity = 0.8)\n", "\n", "data = [normal, anomalies]\n", "\n", "layout = dict(\n", " title='Peticiones totales por tiempo',\n", " xaxis=dict(\n", " title = 'Date',\n", " rangeslider=dict(),\n", " type='date'\n", " ),\n", " yaxis=dict(\n", " title = 'Nº packets'\n", " ),\n", " legend=dict(\n", " x=0,\n", " y=1,\n", " traceorder='normal',\n", " font=dict(\n", " family='sans-serif',\n", " size=12,\n", " color='#000'\n", " ),\n", " bgcolor='#E2E2E2',\n", " bordercolor='#FFFFFF',\n", " borderwidth=2\n", " ) \n", ")\n", "\n", "fig = dict(data=data, layout=layout)\n", "iplot(fig, filename = \"Peticiones totales por tiempo\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 
Mapa anomalías" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "data": [ { "lat": [ "37.459", "29.4889", "37.7758", "43.3701" ], "lon": [ "-122.1781", "-98.3987", "-122.4128", "-8.3288" ], "marker": { "color": "rgb(255, 0, 0)", "opacity": 0.7, "size": 14 }, "mode": "markers", "text": [ "157.240.21.35", "23.253.135.79", "104.244.42.193", "213.60.47.49" ], "type": "scattermapbox" } ], "layout": { "autosize": true, "hovermode": "closest", "legend": { "bgcolor": "#E2E2E2", "bordercolor": "#FFFFFF", "borderwidth": 2, "font": { "color": "#000", "family": "sans-serif", "size": 12 }, "traceorder": "normal", "x": 0, "y": 1 }, "mapbox": { "accesstoken": "pk.eyJ1IjoiYWxleGZyYW5jb3ciLCJhIjoiY2pnbHlncDF5MHU4OTJ3cGhpNjE1eTV6ZCJ9.9RoVOSpRXa2JE9j_qnELdw", "bearing": 0, "center": { "lat": "43.3701", "lon": "-8.3288" }, "pitch": 0, "style": "light", "zoom": 1 }, "showlegend": false } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import re\n", "import json\n", "from urllib.request import urlopen\n", "import plotly.plotly as py\n", "from plotly.graph_objs import *\n", "import numpy\n", "\n", "mapbox_access_token = 'pk.eyJ1IjoiYWxleGZyYW5jb3ciLCJhIjoiY2pnbHlncDF5MHU4OTJ3cGhpNjE1eTV6ZCJ9.9RoVOSpRXa2JE9j_qnELdw'\n", "\n", "# The dataframe under analysis contains no public IPs, so 4 manually chosen public IPs are used instead.\n", "#ips = dataGroup2[(dataGroup2['prediction'] == -1)]['ipdst'].values\n", "ips = ['157.240.21.35','23.253.135.79','104.244.42.193', '213.60.47.49']\n", "\n", "\n", "outputLat = []\n", "outputLon = []\n", "for ip in ips:\n", "    #url = 'http://ip-api.com/json/'+ip\n", "    url = 'http://freegeoip.net/json/'+ip\n", "    response = urlopen(url)\n", "    data = json.load(response)\n", "    #print(ip+\": \")\n", "\n", "    try:\n", "        # if the reply has a 'message' key, the IP is treated as private\n", "        data['message']\n", "        print(\"Private IP\")\n", "\n", "    except (KeyError, TypeError) as e:\n", "        # no 'message' key: read the coordinates from the reply\n", "        lat = str(data['latitude'])\n", "        lon = str(data['longitude'])\n", "        #print(lat, lon)\n", "        outputLat.append(lat)\n", "        outputLon.append(lon)\n", "\n", "#debug lat and lon array\n", "# print(outputLat)\n", "# print(outputLon)\n", "\n", "data = Data([\n", "    Scattermapbox(\n", "        lat=outputLat,\n", "        lon=outputLon,\n", "        mode='markers',\n", "        marker=Marker(\n", "            size=14,\n", "            color='rgb(255, 0, 0)',\n", "            opacity=0.7\n", "        ),\n", "        text=ips,\n", "    ),\n", "])\n", "\n", "#debug data\n", "# print(data)\n", "\n", "layout = Layout(\n", "    autosize=True,\n", "    hovermode='closest',\n", "    showlegend=False,\n", "    mapbox=dict(\n", "        accesstoken=mapbox_access_token,\n", "        bearing=0,\n", "        # lat and lon still hold the coordinates of the last IP resolved in the loop\n", "        center=dict(\n", "            lat=lat,\n", "            lon=lon\n", "        ),\n", "        pitch=0,\n", "        style='light',\n", "        zoom=1\n", "    ),\n", "    legend=dict(\n", "        x=0,\n", "        y=1,\n", "        traceorder='normal',\n", "        font=dict(\n", "            family='sans-serif',\n", 
"            size=12,\n", "            color='#000'\n", "        ),\n", "        bgcolor='#E2E2E2',\n", "        bordercolor='#FFFFFF',\n", "        borderwidth=2\n", "    ),\n", ")\n", "\n", "fig = dict(data=data, layout=layout)\n", "iplot(fig, filename='Montreal Mapbox')\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 7. Visualizing a Single Decision Tree\n", "https://towardsdatascience.com/random-forest-in-python-24d0893d51c0" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
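<table><thead><tr><th></th><th>ipdst</th><th>proto</th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th></tr></thead><tbody>
<tr><th>0</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:08:55</td><td>3</td><td>1</td><td>0</td></tr>
<tr><th>7</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:09:30</td><td>1</td><td>1</td><td>0</td></tr>
<tr><th>8</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:50</td><td>3</td><td>1</td><td>0</td></tr>
<tr><th>9</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:55</td><td>104</td><td>1</td><td>0</td></tr>
<tr><th>10</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td></tr></tbody></table>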
\n", "
" ], "text/plain": [ " ipdst proto time count prediction idpst_label\n", "0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 1 0\n", "7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 1 0\n", "8 10.3.20.102 TCP 2017-03-20 17:08:50 3 1 0\n", "9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1 0\n", "10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataGroup2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](tree.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "dot_data = tree.export_graphviz(clf, out_file=None,\n", "                                feature_names=feature_list,\n", "                                class_names=feature_list,\n", "                                filled=True, rounded=True,\n", "                                special_characters=True)\n", "\n", "graph = pydotplus.graph_from_dot_data(dot_data)\n", "graph.write_pdf(\"tree-vis.pdf\")\n", "joblib.dump(clf, 'CART.pkl')\n", "```" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## 8. Public or Private IP detector\n", "\n", "URL: https://chrisalbon.com/python/data_wrangling/pandas_create_column_with_loop/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predictions" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10.3.20.102 privada\n" ] } ], "source": [ "ips = dataGroup3[(dataGroup3['prediction'] == -1)]['ipdst']\n", "\n", "def is_public_ip(ip):\n", "    # only the first two octets are needed to recognise the private ranges\n", "    ip = list(map(int, ip.strip().split('.')[:2]))\n", "    if ip[0] == 10: return False\n", "    if ip[0] == 172 and ip[1] in range(16, 32): return False\n", "    if ip[0] == 192 and ip[1] == 168: return False\n", "    return True\n", "\n", "# print each anomalous IP with its type\n", "for ip in ips:\n", "    if is_public_ip(ip):\n", "        print(ip + ' publica')\n", "    else:\n", "        print(ip + ' privada')" ] }, { "cell_type": "code", "execution_count": 
26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
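<table><thead><tr><th></th><th>ipdst</th><th>proto</th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th><th>tipo</th></tr></thead><tbody>
<tr><th>10</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td><td>privada</td></tr></tbody></table>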
\n", "
" ], "text/plain": [ " ipdst proto time count prediction idpst_label \\\n", "10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0 \n", "\n", " tipo \n", "10 privada " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ips = dataGroup3[(dataGroup3['prediction'] == -1)]['ipdst']\n", "# .copy() gives an independent frame, so adding 'tipo' does not write to a slice of dataGroup3\n", "dataGroup5 = dataGroup3[(dataGroup3['prediction'] == -1)].copy()\n", "\n", "def is_public_ip(ip):\n", "    ip = list(map(int, ip.strip().split('.')[:2]))\n", "    if ip[0] == 10: return False\n", "    if ip[0] == 172 and ip[1] in range(16, 32): return False\n", "    if ip[0] == 192 and ip[1] == 168: return False\n", "    return True\n", "\n", "tipo = []\n", "for ip in ips:\n", "    if is_public_ip(ip):\n", "        tipo.append('publica')\n", "    else:\n", "        tipo.append('privada')\n", "\n", "dataGroup5['tipo'] = tipo\n", "dataGroup5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Detector pais\n", "\n", "All the IPs are private, so this section produces no output." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Series([], Name: ipdst, dtype: object)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataGroup5[(dataGroup5['tipo'] == 'publica')]['ipdst']" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ips = dataGroup5[(dataGroup5['tipo'] == 'publica')]['ipdst']\n", "\n", "# look up the country of each public IP\n", "for ip in ips:\n", "    #url = 'http://ip-api.com/json/'+ip\n", "    url = 'http://freegeoip.net/json/'+ip\n", "    response = urlopen(url)\n", "    data = json.load(response)\n", "    print(ip+\": \")\n", "    country = str(data['country_name'])\n", "    print(country)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example output if there were a public IP:\n", "```\n", "92.53.104.78: \n", "Russia\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], 
"source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }