{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# IPYNB - Alejandro Franco Vázquez\n",
"### Project code: Implementing an IDS with Machine Learning algorithms\n",
" Last updated: [04/06/2018]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This Jupyter notebook shows examples of the code used in the project. The data is simplified so that the outputs show only a few rows (5) and stay easy to follow. The most important parts are commented.\n",
"\n",
"*The code cells must be run in order, so that the variables each one defines are available to the cells that follow.*\n",
"\n",
"**Most of the code cells were written by the author of the project.**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00) \n",
"[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]\n"
]
}
],
"source": [
"import sys\n",
"print (sys.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of contents\n",
"\n",
"- [1. Data import](#1.-Importación-de-datos)\n",
"- [2. Data grouping](#2.-Agrupación-de-datos)\n",
"- [3. Data normalization](#3.-Normalización-de-datos)\n",
" * [3.1 Vanilla Python](#3.1-Vanilla-Python)\n",
" * [3.2 NumPy](#3.2-NumPy)\n",
" * [3.3 Visualization](#3.3-Visualization)\n",
" \n",
" \n",
"- [4. Isolation Forest](#4.-Isolation-Forest)\n",
" * [4.1 Plot Isolation Forest](#4.1-Plot-Isolation-Forest)\n",
" \n",
" \n",
"- [5. Plots](#5.-Gráficas)\n",
" * [5.1 Unsorted times](#5.1-Sin-ordenar-Tiempos)\n",
" * [5.2 Sorted times](#5.2-Ordenando-Tiempos)\n",
" \n",
" \n",
"- [6. Anomaly map](#6.-Mapa-anomalías)\n",
"\n",
"- [8. Public or private IP detector](#8.-Public-or-Private-IP-detector)\n",
"- [9. Country detector](#9.-Detector-pais)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='1.-Importación-de-datos'></a>\n",
"## 1. Data import"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" no time x info ipsrc ipdst proto len \\\n",
"0 1 2017-03-20 17:08:53 0s null 10.3.20.102 37.202.7.169 TCP 66 \n",
"1 2 2017-03-20 17:08:54 0s null 37.202.7.169 10.3.20.102 TCP 60 \n",
"2 3 2017-03-20 17:08:54 0s null 10.3.20.102 37.202.7.169 TCP 60 \n",
"3 4 2017-03-20 17:08:54 0s null 10.3.20.102 37.202.7.169 HTTP 309 \n",
"4 5 2017-03-20 17:08:54 0s null 37.202.7.169 10.3.20.102 TCP 60 \n",
"\n",
" count \n",
"0 1 \n",
"1 1 \n",
"2 1 \n",
"3 1 \n",
"4 1 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
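"import pandas as pd\n",
"import numpy as np\n",
"\n",
"# Load the capture exported to CSV and give the columns short names.\n",
"df = pd.read_csv('/home/alexfrancow/AAA/ransomware2s.csv')\n",
"df.columns = ['no', 'time', 'x', 'info', 'ipsrc', 'ipdst', 'proto', 'len']\n",
"\n",
"# Overwrite [info] with a placeholder value.\n",
"df['info'] = \"null\"\n",
"\n",
"# Parse [time] into real timestamps; .resample() needs them later.\n",
"# (Assigning df.parse_dates has no effect on a DataFrame, so that line is dropped.)\n",
"df['time'] = pd.to_datetime(df['time'])\n",
"\n",
"# Add a [count] column set to 1 so that the sums below count packets.\n",
"df['count'] = 1\n",
"\n",
"df.head(5)"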
]
},
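{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, separate from the project's code: `pd.read_csv` can parse the timestamp column while loading through its `parse_dates` parameter, instead of converting it afterwards. The inline CSV text, its header names and the `df_demo` name are made up for illustration.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical two-row capture standing in for the real CSV file.\n",
"csv_text = '''No,Time,X,Info,Source,Destination,Protocol,Length\n",
"1,2017-03-20 17:08:53,0s,a,10.3.20.102,37.202.7.169,TCP,66\n",
"2,2017-03-20 17:08:54,0s,b,37.202.7.169,10.3.20.102,TCP,60\n",
"'''\n",
"\n",
"names = ['no', 'time', 'x', 'info', 'ipsrc', 'ipdst', 'proto', 'len']\n",
"\n",
"# header=0 replaces the file's own header row with `names`;\n",
"# parse_dates converts the renamed [time] column while reading.\n",
"df_demo = pd.read_csv(io.StringIO(csv_text), header=0, names=names, parse_dates=['time'])\n",
"df_demo['count'] = 1\n",
"print(df_demo.dtypes)\n",
"```\n",
"\n",
"Parsing at load time keeps the import in a single call; converting afterwards with `pd.to_datetime`, as the cell above does, gives the same column type."
]
},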
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='2.-Agrupación-de-datos'></a>\n",
"## 2. Data grouping"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 3.25 s, sys: 46.9 ms, total: 3.3 s\n",
"Wall time: 3.3 s\n"
]
},
{
"data": {
"text/plain": [
" ipdst proto time count\n",
"0 10.3.20.102 HTTP 2017-03-20 17:08:55 3\n",
"7 10.3.20.102 HTTP 2017-03-20 17:09:30 1\n",
"8 10.3.20.102 TCP 2017-03-20 17:08:50 3\n",
"9 10.3.20.102 TCP 2017-03-20 17:08:55 104\n",
"10 10.3.20.102 TCP 2017-03-20 17:09:00 204"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Group by [ipdst] and [proto], resample [time] into 5-second bins and sum each bin.\n",
"# Then reset the index and drop the rows that contain NaN values.\n",
"%time dataGroup2 = df.groupby(['ipdst','proto']).resample('5S', on='time').sum().reset_index().dropna()\n",
"\n",
"# Display floats without decimals.\n",
"pd.options.display.float_format = '{:,.0f}'.format\n",
"\n",
"# Trim the output: keep the first rows and only a few columns.\n",
"dataGroup2 = dataGroup2.head()[['ipdst','proto','time','count']]\n",
"dataGroup2\n"
]
},
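{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the 5-second binning concrete, here is a small sketch on made-up packets. It is not the project's code: it uses `pd.Grouper` as an alternative way to express the same idea, and the `packets`, `five_seconds` and `per_bin` names are hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical packets: three fall in the 17:08:50 bin, one in the 17:08:55 bin.\n",
"packets = pd.DataFrame({\n",
"    'ipdst': ['10.3.20.102'] * 4,\n",
"    'proto': ['TCP'] * 4,\n",
"    'time': pd.to_datetime([\n",
"        '2017-03-20 17:08:53',\n",
"        '2017-03-20 17:08:54',\n",
"        '2017-03-20 17:08:54',\n",
"        '2017-03-20 17:08:56',\n",
"    ]),\n",
"    'count': 1,\n",
"})\n",
"\n",
"# Group by destination, protocol and 5-second time bin, then count packets per bin.\n",
"five_seconds = pd.Grouper(key='time', freq=pd.Timedelta(seconds=5))\n",
"per_bin = packets.groupby(['ipdst', 'proto', five_seconds])['count'].sum().reset_index()\n",
"print(per_bin)\n",
"```\n",
"\n",
"Each row of `per_bin` is one (destination, protocol, time bin) combination, with `count` holding the number of packets seen in that bin."
]
},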
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> We can use:\n",
"```python \n",
"df = df[df.ipsrc != '10.10.31.101']  # apply before grouping: [ipsrc] only exists in df\n",
"dataGroup2 = dataGroup2[dataGroup2.ipdst != '10.10.31.101']\n",
"```\n",
"to drop the rows that contain that IP; this is useful if we want to leave our own IP out of the list."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='3.-Normalización-de-datos'></a>\n",
"## 3. Data normalization\n",
"Reference on min-max scaling: http://sebastianraschka.com/Articles/2014_about_feature_scaling.html#about-min-max-scaling "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" ipdst proto time count count_n\n",
"0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 0\n",
"7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 0\n",
"8 10.3.20.102 TCP 2017-03-20 17:08:50 3 0\n",
"9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1\n",
"10 10.3.20.102 TCP 2017-03-20 17:09:00 204 1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataNorm = dataGroup2.copy()\n",
"\n",
"# Apply the min-max feature-scaling formula.\n",
"dataNorm['count_n'] = (dataGroup2['count'] - dataGroup2['count'].min()) / (dataGroup2['count'].max() - dataGroup2['count'].min())\n",
"\n",
"# The '{:,.0f}' display format set in section 2 still applies, so [count_n] is shown rounded to whole numbers.\n",
"dataNorm = dataNorm.head(5)\n",
"dataNorm"
]
},
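{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the scaling applied in the cell above is the usual min-max formula, written here for a single count $x_i$:\n",
"\n",
"$$x_i' = \\frac{x_i - \\min(x)}{\\max(x) - \\min(x)}$$\n",
"\n",
"With the five counts shown (3, 1, 3, 104, 204), $\\min(x) = 1$ and $\\max(x) = 204$, so for example $104 \\mapsto (104 - 1)/(204 - 1) = 103/203 \\approx 0.507$ and $3 \\mapsto 2/203 \\approx 0.0099$, which agrees with the values printed in section 3.1."
]
},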
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjgAAAGoCAYAAABL+58oAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xt0VOW9//HPNwEJEAhBaUQCghXlUpJQIwXpoaHUgi4K\nyHIVLW1JsdLjD2urZXGwFtFWWnsWpyytrT1UKboqBUUo1LZewI63UrlYBMOlYg0QTkSKQyBIFMjz\n+yOTNEgiuU0m+eb9WitrJjt79n6Sp1Pe7r1nxkIIAgAA8CQp0QMAAABoagQOAABwh8ABAADuEDgA\nAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAO+0SPQBJOu+880Lfvn2bbX/Hjh1T586dm21/\niC/m0xfm0xfm05eWMJ+bN2/+Vwihx9nWaxGB07dvX23atKnZ9heJRJSXl9ds+0N8MZ++MJ++MJ++\ntIT5NLM9dVmPU1QAAMAdAgcAALhD4AAAAHdaxDU4NTlx4oSKiopUVlbW5NtOS0vTjh07mny7SIy2\nPJ8pKSnKzMxU+/btEz0UAGhRWmzgFBUVqUuXLurbt6/MrEm3ffToUXXp0qVJt4nEaavzGULQoUOH\nVFRUpH79+iV6OADQorTYU1RlZWU699xzmzxuAC/MTOeee25cjnICQGvXYgNHEnEDnAXPEQCoWYsO\nHAAAgIYgcD7GFVdc0eTbLCws1NKlS5t8u5UikYjGjx8ft+0DANAatNiLjOtr6ztbtXLnSu0t2as+\naX00ecBkZZ2f1aht/vWvf22i0f1bZeB85StfafJtAwCACi6O4Gx9Z6sWrF+g6PGoMrtmKno8qgXr\nF2jrO1sbtd3U1FRJ/35r6muvvVYDBgzQ1KlTFUKQVPExE7Nnz9aQIUM0bNgw7d69W5KUn5+vFStW\nnLGtOXPm6KWXXlJOTo4WLlx42v6Ki4s1atQo5eTk6FOf+pReeuklSdLTTz+tT3/608rOztaYMWMk\nSRs2bNCIESM0dOhQXXHFFdq1a9cZ4z927JimT5+uYcOGaejQoVq9enWj/h4AALQWLo7grNy5Uukp\n6UrvmC5JVbcrd65s9FGcSn//+99VUFCgCy64QCNHjtQrr7yiz372s5Iq3odl27ZtevTRR/Xd735X\nTz31VK3buffee7VgwYIa11m6dKnGjh2rO+64Q6dOndL777+vgwcP6sYbb9SLL76ofv366b333pMk\nDRgwQC+99JLatWuntWvX6vvf/76efPLJ07Y3f/58ff7zn9fixYt1+PBhDRs2TF/4whcS/kFpAADE\nm4vA2VuyV5ldM09blpaSpr0le5tsH8OGDVNmZsU+cnJyVFhYWBU4119/fdXtrbfe2uB9XH755Zo+\nfbpOnDihSZMmKScnR5FIRKNGjap6n5Pu3btLkkpKSjRt2jS9+eabMjOdOHHijO09++yzWrNmjRYs\nWCCp4qX3e/fu1cCBAxs8RgAAWgMXgdMnrY+ix6NVR24kqaSsRH3S+jTZPjp06FB1Pzk5WSdPnqz6\nvvpLdSvvt2vXTuXl5ZKk8vJyffjhh2fdx6hRo/Tiiy/qj3/8o/Lz83XbbbcpPT29xnXnzp2r0aNH\na9WqVSosLKzx011DCHryySd16aWX1ul3BACgoeJxLWxjuLgGZ/KAyYqWRRU9HlV5KFf0eFTRsqgm\nD5jcLPtfvnx51e2IESMkVVybs3nzZknSmjVrqo6wdOnSRUePHq1xO3v27FFGRoZuvPFGffOb39Rr\nr72m4cOH68UXX9Tbb78tSVWnqEpKStSrVy9J0pIlS2rc3tixY/Xzn/+86nqhv//9703w2wIAcLp4\nXQvbGC4CJ+v8LM0aMUvpHdNVdKRI6R3TNWvErGYrx2g0qqysLN13331VFw7feOONeuGFF5Sdna31\n69dXXfeSlZWl5ORkZWdnn3GRcSQSUXZ2toYO
Harly5frO9/5jnr06KFFixZp8uTJys7O1pQpUyRJ\ns2fP1u23366hQ4eedjSpurlz5+rEiRPKysrS4MGDNXfu3Dj+FQAAbVX1a2GTLEnpHdOVnpKulTtX\nJmxMVvlf94mUm5sbNm3adNqyHTt2xO1akab87KK+fftq06ZNOu+885pke6i/tvpZVJXi+VxJhMpX\nLcIH5tOX2uZz+urpyuyaqST793GT8lCuoiNFWjxxcZOOwcw2hxByz7aeiyM4AAAgcfqk9VFJWclp\ny5r6Wtj6InAaqbCwkKM3AIA2LdHXwtbkrIFjZr3N7C9mtt3MCszsO7Hl3c3sOTN7M3abXu0xt5vZ\nbjPbZWZj4/kLAACAxEr0tbA1qcvLxE9K+l4I4TUz6yJps5k9Jylf0roQwr1mNkfSHEn/ZWaDJF0n\nabCkCyStNbNLQgin4vMrAACARMs6PyuhQfNRZz2CE0IoDiG8Frt/VNIOSb0kTZT0SGy1RyRNit2f\nKGlZCOGDEMLbknZLGtbUAwcAAKhNva7BMbO+koZKelVSRgihOPajdyRlxO73krSv2sOKYssAAACa\nRZ3fydjMUiU9Kem7IYQj1d+9N4QQzKxerzc3sxmSZkhSRkaGIpHIaT9PS0ur9Q3xGuvUqVNx2zaa\nX1ufz7KysjOeP61ZaWmpq9+nrWM+fWlV8xlCOOuXpPaSnpF0W7VluyT1jN3vKWlX7P7tkm6vtt4z\nkkZ83PYvu+yy8FHbt28/Y9nH2bcvhJUrQ/jf/6243bev9nWPHDlSr21/nNWrV4ef/OQnTba9luTC\nCy8MBw8erPP6v/nNb8LMmTPPWD5v3rxwwQUXhLlz59Zr/zfccEMoKCg463pnm88lS5aEiy++OFx8\n8cVhyZIl9RpDc3rhhRfC0KFDQ3JycnjiiSeqlu/evTtkZ2eHzp071/i4+j5XWrq//OUviR4CmhDz\n6UtLmE9Jm0Id2qUur6IySQ9L2hFC+Fm1H62RNC12f5qk1dWWX2dmHcysn6T+kjY0uMDqoKhIWr1a\nev99KSOj4nb16orl8TZhwgTNmTMn/jtq5W699Vb98Ic/rNdjHnroIQ0aNKhR+33vvfd0991369VX\nX9WGDRt09913KxqNNmqb8dKnTx8tWbJEX/nKV05b/slPflJbtmxJ0KgAoHWqyzU4IyV9TdLnzWxL\n7OtqSfdKutLM3pT0hdj3CiEUSHpc0nZJT0uaGeL8CqqNG6Vu3aSuXaWkpIrbbt0qljdUYWGhBgwY\noPz8fF1yySWaOnWq1q5dq5EjR6p///7asKGi2ZYsWaKbb75ZkpSfn69bbrlFV1xxhS666CKtWLGi\nxm3n5+frpptu0vDhw3XRRRcpEolo+vTpGjhwoPLz86vWu+mmm5Sbm6vBgwdr3rx5kio+g+rSSy/V\nrl27JFV8gvmvf/3rM/YxZ84cDRo0SFlZWZo1a5Yk6cCBA7rmmmuUnZ2t7Oxs/fWvf5UkTZo0SZdd\ndpkGDx6sRYsW1Tjm3/72txo2bJhycnL0rW99S6dOVUzpb37zG11yySUaNmyYXnnllTr9be+66y5N\nmzZN//Ef/6ELL7xQK1eu1OzZszVkyBCNGzeu6nO78vLyVPkO16mpqbrjjjuUnZ2t4cOH68CBA3Xa\n1zPPPKMrr7xS3bt3V3p6uq688ko9/fTTH/uY5pifmvTt21dZWVlKSuLtqQCgseryKqqXQwgWQsgK\nIeTEvv4UQjgUQhgTQugfQvhCCOG9ao+ZH0L4ZAjh0hDCn+P7K0gHD0qpqacvS02tWN4Yu3fv1ve+\n9z3t3LlTO3fu1NKlS/Xyyy9rwYIF+vGPf1zjY4qLi/Xyyy/rqaee+tgjO9FoVOvXr9fChQs1YcIE\n3XrrrSooKNC2bduq/mt9/vz52rRpk7Zu3aoXXnhBW7duVVpamh544AHl5+dr2bJlikajuvHGG0/b\n9qFDh7Rq
1SoVFBRo69at+sEPfiBJuuWWW/S5z31Or7/+ul577TUNHjxYkrR48WJt3rxZmzZt0v33\n369Dhw6dtr0dO3Zo+fLleuWVV7RlyxYlJyfrscceU3FxsebNm6dXXnlFL7/8srZv317nv+1bb72l\n559/XmvWrNFXv/pVjR49Wtu2bVPHjh31xz/+8Yz1jx07puHDh+v111/XqFGjqqLhscce08iRI5WT\nk3Pa17XXXitJ2r9/v3r37l21nczMTO3fv/+s44vH/EyZMuWMcebk5OjRRx+t898NAFA3db7IuCXr\n0UMqLa04clOptLRieWP069dPQ4YMkSQNHjxYY8aMkZlpyJAhKiwsrPExkyZNUlJSkgYNGvSxRxm+\n9KUvVW0rIyPjtP0UFhYqJydHjz/+uBYtWqSTJ0+quLhY27dvV1ZWlq688ko98cQTmjlzpl5//fUz\ntp2WlqaUlBTdcMMNGj9+vMaPHy9Jev7556v+MU1OTlZaWpok6f7779eqVaskSfv27dObb76pc889\nt2p769at0+bNm3X55ZdLko4fP65PfOITevXVV5WXl6cesT/0lClT9I9//KNOf9urrrpK7du315Ah\nQ3Tq1CmNGzdOkmr9255zzjlVv8dll12m5557TpI0depUTZgwock/iyoe81P5qfMAgPhzETiXX15x\nzY1UceSmtFQ6fFj63Ocat90OHTpU3U9KSqr6PikpqdZP8K7+mBD7INM77rij6qhE5X/9V9/WR/dz\n8uRJvf3221qwYIE2btyo9PR05efnq6ysTJJUXl6uHTt2qFOnTopGo8rMzDxtDO3atdOGDRu0bt06\nrVixQg888ICef/75GscbiUS0du1arV+/Xp06dVJeXl7Vfqr/HtOmTdNPfvKT05b//ve/r3GbdVH9\n92/fvr0qX5VX29+2+jrJyclV6zz22GP66U9/esZpnYsvvlgrVqxQr169Trviv6ioqE4f/BeP+Zky\nZUrVqavqbrvtNn39618/65gAAHXn4mR/ZqY0caLUqZN04EDF7cSJFctbgvnz52vLli31ulD0yJEj\n6ty5s9LS0nTgwAH9+c//PtO3cOFCDRw4UEuXLtU3vvGNqmtWKpWWlqqkpERXX321Fi5cWHUUYcyY\nMXrwwQclVby0uqSkRCUlJUpPT1enTp20c+dO/e1vfztjLGPGjNGKFSv07rvvSqq4cHfPnj36zGc+\noxdeeEGHDh3SiRMn9MQTT9T7b9NYU6dOrTp1Vv2r8vqnsWPH6tlnn1U0GlU0GtWzzz6rsWMrPj3k\n9ttvrzpyVV8NmZ/ly5efMc4tW7YQNwAQBy6O4EgVMdNSgqYpZGdna+jQoRowYIB69+6tkSNHSpJ2\n7dqlhx56SBs2bFCXLl00atQo3XPPPbr77rurHnv06FFNnDhRZWVlCiHoZz+rePHbfffdpxkzZujh\nhx9WcnKyHnzwQY0bN06/+tWvNHDgQF166aUaPnz4GWMZNGiQ7rnnHn3xi19UeXm52rdvr1/84hca\nPny47rrrLo0YMULdunVTTk5O8/xx6qF79+6aO3du1em1O++8U927d5ckbdu2TRMmTGjQdhszP7XZ\nuHGjrrnmGkWjUf3hD3/QvHnzVFBQ0KDxAUBbZ5WnURIpNzc3VL5aptKOHTs0cODAuOzv6NGjTX7N\nBmp31113KTU1terVXE2tofM5duxYPfPMM3EYUXykpqaqtLT0jOXxfK4kQiQSqdNpRLQOzKcvLWE+\nzWxzCCH3bOu5OEWFli01NVWLFi3SnXfemeihnKa1xM1bb72lnJwcZWRknH1lAICkFn6KKoSg6h8J\ngdZp1qxZcTt60xZ83Bv9tYQjsADQErXYIzgpKSk6dOgQ/wcO1CKEoEOHDiklJSXRQwGAFqfFHsHJ\nzMxUUVGRDjb23fpqUFZWxj8KjrTl+UxJSTnjbQIAAC04cNq3b69+/frFZd
uRSERDhw6Ny7bR/JhP\nAMBHtdhTVAAAAA1F4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD\n4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6B\nAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQO\nAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgA\nAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAA\nAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA\n3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4M5ZA8fMFpvZu2b2RrVld5nZ\nfjPbEvu6utrPbjez3Wa2y8zGxmvgAAAAtanLEZwlksbVsHxhCCEn9vUnSTKzQZKukzQ49phfmlly\nUw0WAACgLs4aOCGEFyW9V8ftTZS0LITwQQjhbUm7JQ1rxPgAAADqrTHX4HzbzLbGTmGlx5b1krSv\n2jpFsWUAAADNpl0DH/egpB9JCrHb/5E0vT4bMLMZkmZIUkZGhiKRSAOHUn+lpaXNuj/EF/PpC/Pp\nC/PpS2uazwYFTgjhQOV9M/u1pKdi3+6X1LvaqpmxZTVtY5GkRZKUm5sb8vLyGjKUBolEImrO/SG+\nmE9fmE9fmE9fWtN8NugUlZn1rPbtNZIqX2G1RtJ1ZtbBzPpJ6i9pQ+OGCAAAUD9nPYJjZr+TlCfp\nPDMrkjRPUp6Z5ajiFFWhpG9JUgihwMwel7Rd0klJM0MIp+IzdAAAgJqdNXBCCNfXsPjhj1l/vqT5\njRkUAABAY/BOxgAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAA\nwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAA\ndwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADc\nIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCH\nwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0C\nBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgc\nAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAA\nAIA7BA4AAHCnXaIHAAB1tfWdrVq5c6X2luxVn7Q+mjxgsrLOz0r0sAC0QBzBAdAqbH1nqxasX6Do\n8agyu2YqejyqBesXaOs7WxM9NAAtEIEDoFVYuXOl0lPSld4xXUmWpPSO6UpPSdfKnSsTPTQALRCB\nA6BV2FuyV2kpaactS0tJ096SvQkaEYCWjMAB0Cr0SeujkrKS05aVlJWoT1qfBI0IQEtG4ABoFSYP\nmKxoWVTR41GVh3JFj0cVLYtq8oDJiR4agBborIFjZovN7F0ze6Pasu5m9pyZvRm7Ta/2s9vNbLeZ\n7TKzsfEaOIC2Jev8LM0aMUvpHdNVdKRI6R3TNWvELF5FBaBGdXmZ+BJJD0h6tNqyOZLWhRDuNbM5\nse//y8wGSbpO0mBJF0haa2aXhBBONe
2wAbRFWednETQA6uSsR3BCCC9Keu8jiydKeiR2/xFJk6ot\nXxZC+CCE8Lak3ZKGNdFYAQAA6qSh1+BkhBCKY/ffkZQRu99L0r5q6xXFlgEAADSbRr+TcQghmFmo\n7+PMbIakGZKUkZGhSCTS2KHUWWlpabPuD/HFfPrCfPrCfPrSmuazoYFzwMx6hhCKzaynpHdjy/dL\n6l1tvczYsjOEEBZJWiRJubm5IS8vr4FDqb9IJKLm3B/ii/n0hfn0hfn0pTXNZ0NPUa2RNC12f5qk\n1dWWX2dmHcysn6T+kjY0bogAAAD1c9YjOGb2O0l5ks4zsyJJ8yTdK+lxM7tB0h5JX5akEEKBmT0u\nabukk5Jm8goqAADQ3M4aOCGE62v50Zha1p8vaX5jBgUAANAYvJMxAABwh8ABAADuEDgAAMAdAgcA\nALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA\n4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACA\nOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADu\nEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD\n4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6B\nAwAA3CFwAACAOwQOAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQO\nAABwh8ABAADuEDgAAMAdAgcAALhD4AAAAHcIHAAA4A6BAwAA3CFwAACAOwQOAABwh8ABAADuEDgA\nAMAdAgcAALhD4AAAAHfaNebBZlYo6aikU5JOhhByzay7pOWS+koqlPTlEEK0ccMEAACou6Y4gjM6\nhJATQsiNfT9H0roQQn9J62LfAwAANJt4nKKaKOmR2P1HJE2Kwz4AAABq1djACZLWmtlmM5sRW5YR\nQiiO3X9HUkYj9wEAAFAvFkJo+IPNeoUQ9pvZJyQ9J+nbktaEELpVWycaQkiv4bEzJM2QpIyMjMuW\nLVvW4HHUV2lpqVJTU5ttf4gv5tMX5tMX5tOXljCfo0eP3lztsphaNeoi4xDC/tjtu2a2StIwSQfM\nrGcIodjMekp6t5bHLpK0SJJyc3NDXl5eY4ZSL5FIRM25P8QX8+kL8+kL8+lLa5rPBp+iMrPOZtal\n8r6kL0p6Q9IaSdNiq02TtLqxgwQAAKiPxhzByZC0yswqt7M0hPC0mW2U9LiZ3SBpj6QvN36YAAAA\nddfgwAkh/FNSdg3LD0ka05hBAQAANAbvZAwAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4\nQ+AAAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAO\ngQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMAANwhcAAAgDsE\nDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4\nAADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AA\nAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMA\nANwhcAAAgDsEDgAAcIfAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgQMAANwhcAAAgDsEDgAA\ncI
fAAQAA7hA4AADAHQIHAAC4Q+AAAAB3CBwAAOAOgROzomCF8pbkqf/P+ytvSZ5WFKxI9JAAAEAD\ntUv0AFqCFQUrNHvtbHU9p6t6du6pw8cPa/ba2ZKkawdfm+DRAQCA+uIIjqQHNj6grud0VbeO3ZSU\nlKRuHbup6zld9cDGBxI9NAAA0AAEjqT9R/era4eupy3r2qGr9h/dn6ARAQCAxmiTp6iOvF+mKT9a\npn3FZerdM0Up7S/WkXbF6tax27/X+eCIenXplcBRAgCAhorbERwzG2dmu8xst5nNidd+6uuxP/9D\ne/ef0Bsv9ZEd7qfi/e10/I0r9U5xOx0+fljl5eU6fPywjnx4RDdffnOihwsAABogLoFjZsmSfiHp\nKkmDJF1vZoPisa/6WLJE+n/55+nDY51UvL2//rWnh479Xx9dkNpH/d7/srp17KbiYxVHcv77C//N\nBcYAALRS8TpFNUzS7hDCPyXJzJZJmihpe5z2d1ZPPSX98IfSBx9KSclB5SeTtH9HpnoNLFJKag9Z\nUqki+ZFEDQ8AADSheJ2i6iVpX7Xvi2LLEubhh6UOHaTOXU5KQWrf4ZTanXNCB/f00OF/tVfvnimJ\nHB4AAGhCFkJo+o2aXStpXAjhm7HvvybpMyGEm6utM0PSDEnKyMi4bNmyZU0+jup27qy4PXWqXD16\nHNP/vdNRMqn8lOmczu+rT6/26tqJyGmNSktLlZqamuhhoIkwn74wn760hPkcPXr05hBC7tnWi9cp\nqv2Self7PjO2rEoIYZGkRZKUm5sb8vLy4jSUCvfdJx08KB0/Lk2f/rx+tOBTOna0nTqcI/1yyb80\n4apL4rp/xE8kElG8//eD5sN8+sJ8+tKa5jNep6g2SupvZv3M7BxJ10laE6d91ckNN1TETceOUnJy\nkj7R+RO6IL277vtpd00lbgAAcCUuR3BCCCfN7GZJz0hKlrQ4hFAQj33V1fjxFbcPP1xxO2BARfRU\nLgcAAH7E7Y3+Qgh/kvSneG2/IcaPr/iKRKT//M9EjwYAAMQLH9UAAADcIXAAAIA7BA4AAHCHwAEA\nAO4QOAAAwB0CBwAAuEPgAAAAdwgcAADgDoEDAADcIXAAAIA7BA4AAHCHwAEAAO4QOAAAwB0CBwAA\nuEPgAAAAdwgcAADgjoUQEj0GmdlBSXuacZfnSfpXM+4P8cV8+sJ8+sJ8+tIS5vPCEEKPs63UIgKn\nuZnZphBCbqLHgabBfPrCfPrCfPrSmuaTU1QAAMAdAgcAALjTVgNnUaIHgCbFfPrCfPrCfPrSauaz\nTV6DAwBQleYSAAADJklEQVQAfGurR3AAAIBjBA4AAHCnTQWOmY0zs11mttvM5iR6PKg/Mys0s21m\ntsXMNsWWdTez58zszdhteqLHiZqZ2WIze9fM3qi2rNb5M7PbY8/XXWY2NjGjRm1qmc+7zGx/7Dm6\nxcyurvYz5rMFM7PeZvYXM9tuZgVm9p3Y8lb5HG0zgWNmyZJ+IekqSYMkXW9mgxI7KjTQ6BBCTrX3\nYpgjaV0Iob+kdbHv0TItkTTuI8tqnL/Y8/M6SYNjj/ll7HmMlmOJzpxPSVoYe47mhBD+JDGfrcRJ\nSd8LIQySNFzSzNi8tcrnaJsJHEnDJO0OIfwzhPChpGWSJiZ4TGgaEyU9Erv/iKRJCRwLPkYI4UVJ\n731kcW3zN1HSshDCByGEtyXtVsXzGC1ELfNZG+azhQshFIcQXovdPypph6ReaqXP0bYUOL0k7av2\nfVFsGVqXIGmtmW02sxmxZRkhhOLY/XckZSRmaGig2uaP52zr9W0z2xo7hVV5OoP5bEXMrK+koZJe\nVSt9jralwIEPnw0h5KjiVONMMxtV/Yeh4n0PeO+DVor5c+FBSRdJypFULOl/Ejsc1JeZpUp6UtJ3\nQwhHqv+sNT1H21Lg7JfUu9r3mbFlaEVCCPtjt+9KWqWKw6EHzKyn
JMVu303cCNEAtc0fz9lWKIRw\nIIRwKoRQLunX+vcpC+azFTCz9qqIm8dCCCtji1vlc7QtBc5GSf3NrJ+ZnaOKC6PWJHhMqAcz62xm\nXSrvS/qipDdUMY/TYqtNk7Q6MSNEA9U2f2skXWdmHcysn6T+kjYkYHyoh8p/CGOuUcVzVGI+Wzwz\nM0kPS9oRQvhZtR+1yudou0QPoLmEEE6a2c2SnpGULGlxCKEgwcNC/WRIWlXxHFQ7SUtDCE+b2UZJ\nj5vZDZL2SPpyAseIj2Fmv5OUJ+k8MyuSNE/Svaph/kIIBWb2uKTtqnh1x8wQwqmEDBw1qmU+88ws\nRxWnMQolfUtiPluJkZK+JmmbmW2JLfu+WulzlI9qAAAA7rSlU1QAAKCNIHAAAIA7BA4AAHCHwAEA\nAO4QOAAAwB0CBwAAuEPgAAAAd/4/ogjKQ4+3cRIAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"\n",
"from matplotlib import pyplot as plt\n",
"\n",
"def plot():\n",
"    # Each series is plotted against itself, so both fall on the diagonal\n",
"    # and only their ranges differ.\n",
"    plt.figure(figsize=(8,6))\n",
"\n",
"    plt.scatter(dataGroup2['count'], dataGroup2['count'],\n",
"                color='green', label='input scale', alpha=0.5)\n",
"\n",
"    plt.scatter(dataNorm['count_n'], dataNorm['count_n'],\n",
"                color='blue', label='min-max scaled [min=0, max=1]', alpha=0.3)\n",
"\n",
"    plt.legend(loc='upper left')\n",
"    plt.grid()\n",
"\n",
"    plt.tight_layout()\n",
"\n",
"plot()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1 Vanilla Python"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.74299773604806973, -0.76776432724967203, -0.74299773604806973, 0.50771511963284766, 1.7460446797129638]\n",
"[0.009852216748768473, 0.0, 0.009852216748768473, 0.5073891625615764, 1.0]\n"
]
}
],
"source": [
"# Standardization\n",
"\n",
"x = dataGroup2['count']\n",
"mean = sum(x)/len(x)\n",
"# Population standard deviation (divides by n, not n - 1).\n",
"std_dev = (1/len(x) * sum([ (x_i - mean)**2 for x_i in x]))**0.5\n",
"\n",
"z_scores = [(x_i - mean)/std_dev for x_i in x]\n",
"print(z_scores)\n",
"\n",
"# Min-Max scaling\n",
"\n",
"minmax = [(x_i - min(x)) / (max(x) - min(x)) for x_i in x]\n",
"print(minmax)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 NumPy"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.74299774 -0.76776433 -0.74299774 0.50771512 1.74604468]\n",
"[ 0.00985222 0. 0.00985222 0.50738916 1. ]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"# Standardization\n",
"\n",
"x_np = np.asarray(x)\n",
"z_scores_np = (x_np - x_np.mean()) / x_np.std()\n",
"print(z_scores_np)\n",
"\n",
"# Min-Max scaling\n",
"\n",
"np_minmax = (x_np - x_np.min()) / (x_np.max() - x_np.min())\n",
"print(np_minmax)"
]
},
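{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged aside, not part of the original notebook: the z-scores above use the population standard deviation (divide by n), which is also the NumPy default. `Series.std()` in pandas defaults to the sample version (`ddof=1`), so standardizing directly on the pandas column would give slightly different values. The sketch below compares both on the five counts shown; the variable names are made up for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"counts = pd.Series([3, 1, 3, 104, 204])  # the five counts shown above\n",
"\n",
"pop_std = np.asarray(counts).std()  # NumPy default: ddof=0 (population)\n",
"sample_std = counts.std()           # pandas default: ddof=1 (sample)\n",
"\n",
"# Standardize with each version of the standard deviation.\n",
"z_pop = (counts - counts.mean()) / pop_std\n",
"z_sample = (counts - counts.mean()) / sample_std\n",
"\n",
"print(z_pop.round(4).tolist())\n",
"print(z_sample.round(4).tolist())\n",
"```\n",
"\n",
"Passing `ddof=0` to `counts.std()` makes pandas match the NumPy result, which matters mostly for small samples like this one."
]
},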
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 Visualization"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAqYAAAFgCAYAAABpIrurAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm4JHV97/H3lxmIDoNsgwgIDG6JKIEALvGiwSUqRKNJ\nDEFHQGKcCMFrrlkkEm06BkO4TxIlXjC4RJZRwA1FxYXoiAsYGEVGRAPqDKuygwwoDPO9f9TvQJ/m\nLH1m+pz+dZ/363nm4XRXddX311X97U9XVTeRmUiSJEmDttmgC5AkSZLAYCpJkqRKGEwlSZJUBYOp\nJEmSqmAwlSRJUhUMppIkSaqCwXTIRTvWRDteNOg6ZlO0I6MdT+rj8h56zqIdb4t2fKBfy+5Yx/ui\nHW/v93Il1ae2PhzteG6040eDrmOQoh0rox1/Vv5eFu340qBrUm8WDrqA+SjasQbYEXgQWAdcAByT\nrbxnmsd9GLg+W/n3s11jr6IdS4GfAptnK9cPtpqZy1a+a1OXEe14HfBn2coDOpb7xk1drqTZM4x9\nONpxPNAC/jJb+Z6O+98MvBtoZyuPz1Z+Hfj1TVjPGmBnYOds5a0d938X2AfYI1u5ZmOXP9eylSuA\nFYOuQ73xiOngvDxbuRjYF9gfqCZsjopohx+8JE1lGPvw/wCHd913RLm/n34KvHrsRrRjL2BRn9ch\nPYJv3AOWrbwh2nEB8PRoxx8Dx2Yr9xubHu14C/A7wOeAZUBGO/4S+Gq28uVltn2iHf8K7A58ATgi\nW/nL8vg3AG8FtgO+AbwxW3ljmZbAUcBfATvQfKI8JluP/N+BRTueCZwCPAW4D1iRrXwLcFGZ5c5o\nB8DvAjcD7wf2BhL4IvAX2co7y7LWAO+laa4T1fw3wFvKY8e9UUQ7fg/4R+CJwF3AB7OVx5dpS2ma\n6Z/RHFVYAzwv2nFYecxi4F+7lnc88KRs5WujHe8FXtcx+VHAP2Yrj492HAu8AXgscB1wXLbyU9GO\npwLvAzaPdtwDrM9WbtN9VKVf20FS/w1LHy4uBfaLdjwtW3lltONpNL3q0o56DwTOylY+vtxewxQ9\ndxJnlvn/vdw+AjiDppeOrWeqfvwnwInA3tnKu6MdBwH/CeyVrbylc0XRjkcBHwAOAhYAVwMvy1b+\nPNqxHfAvwEuARwNfy1a+MtqxbanxWTRZ5pvleb2+eyDdZ7Wmes6jHQuAk8p4f1HW/e8M6VnBYeQR\n0wGLduwKHAx8F/gMsEcJO2MOA87IVp5G8+I5KVu5uKMZAhwCvBTYA/hNSriKdrwA+KcyfSdgLXB2\nVwkvA55RHncIzYt/Iu8B3pOtfAxNEzq33P+88t9tSl0XA1HWuzPwVGBX4Piu5U1W80uBv6YJuE8G\nuq/bWkfTLLcBfg84Ktrxyq55fqes9yXRjj2BU2mex52B7YHHTzTAbOUxZQyLgQOAO4BPl8k/Bp4L\nbA20gbOiHTtlK68C3ghcXB67Tfdy+7wdJPXZEPXhMWOhEZoAdWYPw5ywvilcAjwm2vHUEtYOBc7q\nmmfSfpytPAf4FnBytGN74IM04fAWHukImt66K02PfiPNAZCxsS4CnkZzYODfyv2b0QTd3YHdyvzv\nnWZMnSZ7zt9AE5D3oTmS3v3+olnmEdPBOS/asZ7mU+bngHdlK38V7TgHeC1wXPkkvBT47DTLOrnj\n0/f5NC8oaD7Zfyhb+Z0y7e+AO6IdSzuuDzqxHMm8M9rx1fLYL0ywjgeAJ0U7lpRrji6ZrJhs5TXA\nNeXmLeUoQqvHmg8B/jNb+f0y7Xg6TidlK1d2LOOKaMdHaYLoeR33H5+tXFce/yrgs9nKi8rttwPH\nTFZ7mWeHsrw3ZSu/W9b7sY5ZzinP5TN5OLhOpZ/bQVL/DFsfHnMW8I1ox9/TBMb/RRN+N6a+qYwF\n4K8BVwE3dE7soR//BXAFsBI4P1s52XP4AE0g
fVK28gpgValzJ5qQuH228o4y79fKum8DPjG2gGjH\nCcBXexjTmMme80NoDsJcX5Z7IvDCGSxXm8hgOjivzFZeOMH9pwMfLQ3nMODcbOWvplnWzzr+vpfm\nyCDlv98Zm5CtvCfacRuwC81p7okeu3iSdbwe+Afgh9GOn9JcZD9hk4l27EhzhPW5wFY0n2zv6Jpt\nqppXdUxb27XsZ9GcHno6sAXwa0BnaITmVPuYnTtvZyvXledgQtGOzYGPAx/JVp7dcf/hNJcXLC13\nLQaWTLacLv3cDpL6Z9j68Ngyro12XAO8C7g6W3lduZRqxvWVSxieW+7/8/JFoTFn0lyutQfNafxx\npuvH2co7ox0fo+mdfzRFbWfSHC09O9qxDU3wPq7cd3tHKO1c9yKao6cvBbYtd28V7ViQrXxwinWN\nmew5H/ee0fW35oCn8iuTrbwEuJ+mUbyG8adoZnrN4Y00pzkAiHZsSfOp9IZJHzF5XVdnK19Ncyrl\nn4GPl+VNVNO7yv17lVP/r6U5vd+Lm2ia0ZjduqZ/hOZU267Zyq1pru/sXnZnTeOWV5rZ9lOs/9+B\nu+m4tjXasTvNNbPH0Hxy3wb4fsd6p9sufdsOkmZfrX24yxk010g+IjDORLbyoLFLmLpCKdnKtTTX\n7R8MfHKCh0/Zj6Md+wB/CnwUOHmKGh7IVrazlXsCz6E5zX44TSjcroTVbn9F88sDzyrvM2OXlfX6\nXjOZmxh/udeuk82o2WEwrdMZNNfKPJCt/EbH/T8HnjCD5XwUODLasU+049doAuO3N+ZnPqIdr412\n7JCt3ADcWe7eANxS/ttZ11bAPcBd0Y5dgL+ZwarOBV4X7dizhMjuSwC2ovkE/cvyhazXTLO8jwMv\ni3YcEO3Yguao74T7fbTjz2lOQy0r4xwzFsBvKfMdSXOEYMzPgceX5U+kb9tB0pyprg93OQd4MQ9f\n7z9bXg+8YOzyqC6T9uPyhaazgLcBRwK7RDuOnmgF0Y7nRzv2Ktey3k1zan9DtvImmp/xOiXasW20\nY/Nox1gA3YrmutI7yxekut8rNta5wJujHbuUQPzWPi1XPTKY1ulMmuDTfaH5B4E9ox13RjvOe+TD\nxiunqN5Ocx3OTTRfWjp0I2t6KXBlNN88fw9waLbyvmzlvcAJwDdLXc+m+XLQvjx83dZEn7Qnq/kC\nmt/j+wrNdapf6ZrlaOAfoh2/AN7BNE05W3klzXVOH6F5Du4AHvGtzeLVNG84N0Y77in/3pat/AHN\nNzMvpnlT2ovmG6BjvgJcCfws2nFr90L7vB0kzY0a+3Dncu/LVl6Yrbxv+rk3aT0/zlZeNsnkqfrx\nPwHXZStPLZdBvBb4x2jHkydYzuNoDiLcTXMt69d4+Cj1YTRB9Yc0v/jyl+X+d9N8S3/sOw/9uib/\n/cCXaK6N/S7weWA9ze/dag5E+os01Yl2PJrmBbhvtvLqQdcjSfONfVgA5Weu3pet3H3amdUXfvmp\nTkcBl9oMJWlg7MPzUPlA8nyao6Y70lwi8KmBFjXPGEwrE80PIQf+dpokDYR9eF4LmsvRzqG5hvVz\nNJcpaI54Kl+SJElV8MtPkiRJqsKMTuUvWbIkly5dOkul9N+6devYcsstB11G3zmu4TKq44LhGNuq\nVatuzcwdBl1Hv/SjDw/DdtsYjmt4jOKYwHFNpddePKNgunTpUi67bLJfjajPypUrOfDAAwddRt85\nruEyquOC4RhbRKydfq7h0Y8+PAzbbWM4ruEximMCxzWVXnuxp/IlSZJUBYOpJEmSqmAwlSRJUhUM\nppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJ\nqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFU\nkiRJVTCY
SpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElV\nMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqS\nJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoG\nU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIk\nVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVWY9WC6YvUKlr57KdGOh/4t\n/IeFHP25o2d71RoiK1avYPXNq9msvRlL372UFatXDLokVWCsf7hf9MfY87nqplUsftdiFvzDAnuy\nxvE1p06dPWOu9odZDaYrVq9g+fnLWXvX2nH3P5gPcuplp9oIBTy8n9z/4P0kydq71rL8/OU2xHmu\ns3+4X2y67n687oF1bMgNgD1ZDV9z6tTdM+Zqf5jVYHrcfx3HvQ/cO+n001adNpur15CYaD+594F7\nOe6/jhtQRaqB+0V/TdePwZ483/maU6dB7Q+zGkyvvevaKac/mA/O5uo1JCbbT6bbfzTa3C/6q5fn\nzZ48v/maU6dB7Q+zGkx323q3KacviAWzuXoNicn2k+n2H40294v+6uV5syfPb77m1GlQ+8OsBtMT\nXngCizZfNOn05fstn83Va0hMtJ8s2nwRJ7zwhAFVpBq4X/TXdP0Y7Mnzna85dRrU/rBwNhe+bK9l\nQHOdQucXoBbEApbvt5xTfu+U2Vy9hsTYfnL7VbcTBLttvRsnvPCEh+7X/NTZP66961r3i03U+XwC\nbLn5lty3/j425AZ7sgBfcxqvu2fsvvXuc7I/zGowhWZg7tSazrK9lrHytpVsOGTDoEtRRewf/TX2\nfK5cuZJ7Xn3PoMtRhXzNqVNnz1jz6jVzsk5/YF+SJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJ\nklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWD\nqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmS\nqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCV\nJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUkSVIV\nDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIk\nSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLB\nVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKkRm9j5zxC3A2tkrp++WALcOuohZ4LiGy6iOC4ZjbLtn\n5g6DLqJf+tSHh2G7bQzHNTxGcUzguKbSUy+eUTAdNhFxWWbuP+g6+s1xDZdRHReM9thG2ahuN8c1\nPEZxTOC4+sFT+ZIkSaqCwVSSJElVGPVgetqgC5gljmu4jOq4YLTHNspGdbs5ruEximMCx7XJRvoa\nU0mSJA2PUT9iKkmSpCFhMJUkSVIVRiqYRsR2EfHliLi6/HfbSeZbExGrI+LyiLhsruvsVUS8NCJ+\nFBHXRMSxE0yPiDi5TL8iIvYdRJ0z1cO4DoyIu8r2uTwi3jGIOmciIj4UETdHxPcnmT6U2wp6GtvQ\nba/5Yh73kGVlPKsj4lsRsfcg6pyJ6cbUMd8zImJ9RLxqLuvbWL2Mq/SQyy
Piyoj42lzXuDF62Ae3\njojzI+J7ZVxHDqLOmajmfSwzR+YfcBJwbPn7WOCfJ5lvDbBk0PVOM5YFwI+BJwBbAN8D9uya52Dg\nAiCAZwPfHnTdfRrXgcBnB13rDMf1PGBf4PuTTB+6bTWDsQ3d9poP/+Z5D3kOsG35+6Dax9XLmDrm\n+wrweeBVg667T9tqG+AHwG7l9mMHXXefxvW2sQwC7ADcDmwx6NqnGVcV72MjdcQUeAVwevn7dOCV\nA6xlUz0TuCYzf5KZ9wNn04yv0yuAM7JxCbBNROw014XOUC/jGjqZeRFN45nMMG4roKexqU7ztodk\n5rcy845y8xLg8XNc40z12hffBHwCuHkui9sEvYzrNcAnM/NagMwchrH1Mq4EtoqIABbT9ND1c1vm\nzNTyPjZqwXTHzLyp/P0zYMdJ5kvgwohYFRHL56a0GdsFuK7j9vXlvpnOU5tea35OOVVwQUQ8bW5K\nm1XDuK1mYtS21yiY7z1kzOtpjvLUbNoxRcQuwB8Ap85hXZuql231FGDbiFhZ3pMPn7PqNl4v43ov\n8FTgRmA18ObM3DA35c2aOekXC/u9wNkWERcCj5tg0nGdNzIzI2Ky38I6IDNviIjHAl+OiB+WTwqq\nw3doTuvcExEHA+cBTx5wTZqc20tViojn0wTTAwZdSx+8G3hrZm5oDsKNjIXAfsALgUcDF0fEJZn5\nP4Mta5O9BLgceAHwRJqs8fXMvHuwZdVv6IJpZr5osmkR8fOI2CkzbyqHlyc8JZCZN5T/3hwRn6I5\nLF9bML0B2LXj9uPLfTOdpzbT1tz5ws3Mz0fEKRGxJDNvnaMaZ8MwbquejOj2GgXztocARMRvAh8A\nDsrM2+aoto3Vy5j2B84uoXQJcHBErM/M8+amxI3Sy7iuB27LzHXAuoi4CNgbqDmY9jKuI4ETs7k4\n85qI+CnwG8B/z02Js2JO+sWoncr/DHBE+fsI4NPdM0TElhGx1djfwIuBCb+BNmCXAk+OiD0iYgvg\nUJrxdfoMcHj5ptyzgbs6LmWo1bTjiojHletyiIhn0uyntb+xTGcYt1VPRnR7jYL53EN2Az4JHDYk\nR96mHVNm7pGZSzNzKfBx4OjKQyn0tg9+GjggIhZGxCLgWcBVc1znTPUyrmtpjgITETsCvw78ZE6r\n7L856RdDd8R0GicC50bE64G1wCEAEbEz8IHMPJjmutNPlffRhcBHMvMLA6p3Upm5PiKOAb5I8w3A\nD2XmlRHxxjL9fTTfzDwYuAa4l+YTWtV6HNergKMiYj1wH3Bo+dRZrYj4KM2305dExPVAC9gchndb\njelhbEO3veaDed5D3gFsD5xSev36zNx/UDVPp8cxDZ1expWZV0XEF4ArgA0079U1Hix6SI/b653A\nhyNiNc232N9a+1mkWt7H/F+SSpIkqQqjdipfkiRJQ8pgKkmSpCoYTCVJklQFg6kkSZKqYDCVJElS\nFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJ0DEayJ4EWD\nrkOPFMGHI/jHPi7v+AjOKn/vFsE9ESzo1/LLcp8bwY/6uUxplNmD+yeCKyM4cNB1DEoEr4vgGx23\n74ngCYOsadQYTCdRGtl9Zaf7eQkwi3t4XF+DzkyUUJQRHNJx38Jy39JZWN+BEWwoz9EvIvhRBEf2\neR0rI/izfi5zrmRybSaLM3lwU5ZTtt+TOpb79Ux+fdMrlOplD+5pfX3pwREsLTV+t+v+JRHcH8Ga\nsfsyeVomKzey3rHn581d97+53H/8xix3kEqP/8mg6xglBtOpvTyTxcC+wP7A3w+4nl7cDrT7fZRu\nCjeW5+gxwFuB90ew5xyte2AiiAhfP9IsswdPr589eFEET++4/Rrgp5taYJf/AQ7vuu+Icr/kG2sv\nMrkBuAB4egR/HMGqzukRvCWCT0ewHF
gG/G35BHt+x2z7RHBFBHdFcE4Ej+p4/BsiuCaC2yP4TAQ7\nd0zLCN4YwdUR3BnB/4sgpij3C8D9wGsnmth9BHKC0xIZwdFlfb+I4J0RPDGCb0VwdwTnRrDFBM9R\nZnIecAewZwSfi+BNXeu+IoI/mKCmR0VwVgS3lTFeGsGOEZwAPBd4b3k+31vmf08E15V6VkXw3I5l\nHV9qPKPUf2UE+3dM/60IvlOmnQPjtsO2EXw2glsiuKP8/fiu5+6ECL4J3As8IYI9IvhaWd6XgSUd\n848dhVgYwW+XMYz9++XYUYgInhnBxWXsN0Xw3rHnOIKLyuK+Vx73J+UoyfUd63lqqe3OMt7f75j2\n4bLPfK7U+O0InvjIPUOqlz14dntwhzNpQuKYw4Ezupbx0GUR0/XbSVxKE4CfVpbxNJo+fGnHOibt\nxRFsF8H1Eby83F5ctl132B1b1usi+Emp76cRLOuY9oYIrirTfhDBvuX+YyP4ccf9kz5n0XFGa7p+\nG8GLozmqfVcEp5T3jqE8IzibDKY9iGBX4GDgu8BngD0ieGrHLIcBZ2RyGrACOKkc3n95xzyHAC8F\n9gB+E3hdWfYLgH8q03cC1gJnd5XwMuAZ5XGHAC+ZotwE3g60Ith8xoNtvATYD3g28LfAaTRNdlfg\n6cCrux8QwWblxbsNsBo4nY7GHMHewC7A5yZY3xHA1mX52wNvBO7L5Djg68Ax5fk8psx/KbAPsB3w\nEeBjnW8ywO/TPIfb0GyvsUC7BXAeTfPdDvgY8Ecdj9sM+E9gd2A34L6xx3Y4DFgObEWzrT4CrKIJ\npO9kfFN/SCYXlzEsBrYFvg18tEx+EPg/ZRm/DbwQOLo87nllnr3L48/pXG7ZxucDXwIeC7wJWBEx\n7lT/oUC7rPca4ISJapRqZQ+e9R485izg0AgWRHPUdTFNr5rKhP12Gmfy8FHTI8rtTpP24kxuB/6U\n5sjwY4F/Ay7PHB+gASLYEjgZOCiTrYDnAJeXaX8MHF/qeEwZx23loT+mOSiyNU3vPCuCnXoYF0zS\nbyNYAnwc+Dua97kflXrUxWA6tfMiuBP4BvA14F2Z/Ao4h/KCL5/2lgKfnWZZJ2dyY3lRnU8TrKD5\ndP+hTL5Tlv13wG9HjLse6cRM7szkWuCrHY+dUCafAW6Bjf4kdlImd2dyJfB94EuZ/CSTu2iOWvxW\nx7w7l+foVqAFHJbJj2ga1FMieHKZ7zDgnEzun2B9D9C8UJ+UyYOZrMrk7inGd1Ymt2WyPpN/AX4N\nxgWxb2Ty+XJt55nA3uX+ZwObA+/O5IFMPk7Hp/SyzE9kcm8mv6BpKL/TtfoPZ3JlJutp3sSeAbw9\nk19lchGMO0IzmZOBXwDHlfWuyuSSMp41wH9MsN7JPJvmzePETO7P5Cs0+2LnG9enMvnvUvMKptl/\npIrYg+emB4+5niYwvYgmsHUHxolM1m+nchbw6hLcDy23HzJdL87kSzQHFv6L5gPLn0+xrg00R9of\nnclN5TmFZtuclMml5WjzNZmsLcv/WNlXNpSDAVcDz+xhXDB5vz0YuDKTT5ZpJwM/63GZ84rBdGqv\nzGSbTHbP5OhM7iv3nw68ppzOOQw4tzS0qXTugPfCQxfx70zzCR2ATO6h+dS2Sw+Pncrf0wSfR003\n4wR+3vH3fRPc7lz/jeU52i6TfTKbIw2Z/JLy5hHNtZivZvImdybwReDsCG6M4KSpjjRE8Nfl9Mtd\npSFvTccpdB75fD0qgoU0z/UNmWTH9Iee+wgWRfAfEayN4G7gImCbGH+t2HUdf+8M3JHJuomWN0nt\nfw4cCLwmkw3lvqeUU1U/K+t9V9d4prIzcN3Ysjpq2NT9R6qBPXhuenCnM2iOJvc6/4T9NoJl8fCl\nSxd0PqAE/Gtoet3VmeP6aq+9+DSao8cfznzoSOc4pTf/Cc1ZuJvKKfbfKJN3pTky+ggRHB7B5dFc\nun
FnWU+vPXmq/eyhcZb3oevRIxhMN0Iml9BcQ/RcmovDO1+8OeGDJncjzekK4KFTD9sDN2xijV+m\neeEf3TVpHbCo4/bjNmU90zid5mjEC4F7M7l4opnK0ct2JnvSnNp4GQ+f5hn3fEZzPenf0pxO2zaT\nbYC7YMprvsbcBOwS468P263j77+iOfL6rEweAw+dRu+cv7Oem4BtyzabaHnjlNrfCbyi64jwqcAP\ngSeX9b6tx/FAs//sGuO/iLUbm7j/SDWzB/espx7c5RPA7wE/KQFyo2SyolxOsTiTgyaY5QyanvuI\nU/BM04tLQD2tPPbo6PjVkgnq+GImv0tzhuuHwPvLpOvgkdfbR7B7mecYYPvyHvN9eu/Jk7kJxn1n\nITpv62EG0413Bs01Lw9kPnzhOs0n25n8ptlHgSMj2CeCX6P5BPntckp3Ux1HE+I6XQ78YflE+iTg\n9X1Yz4RKE9wA/AtTfPKO4PkR7FWazd00p/bHjgB2P59bAetpTpMtjOAdNNcH9eLi8tj/HcHmEfwh\n40/PbEVzNOLOCLajOS021fjWApfRfAN3iwgOgHHXtHWOcVfgXODwzEd8+3QrmnHfUz7NH9U1fap9\n6ts0n8r/tozpwFJD9zVy0qixB0+j1x7c9Zh1wAvY+MsQenUO8GKavthtul78NpoPIH8K/F/gjJjg\nVxCi+RLtK8qHjV8B9/Dwe8sHgL+OYL9ofmXlSSWUblmWfUtZxpEw7pcKNtbngL0ieGU5g/cXzO6H\nkqFlMN14Z9LsrGd13f9Bmm9E3hnBedMtJJMLaS6U/wTNJ6on0lxzs8ky+Sbw3113/xvNkYaf03ya\nXtGPdU3hDGAvHvk8dXoczUXhdwNX0VxLNtZE3wO8KppvZp5Mc8r/CzQ/LbIW+CXjT69Pqlxb9Yc0\np6lupznF88mOWd4NPJrmWq1Lynqm8xrgWWV5LSb+9A/NEYsdgY93nN4au9bpr8tyfkHzSf2crsce\nD5xe9qlDOieUMb0cOKjUfQpN+P1hD7VLw8we3JteevA4mVyWOfFp7n7J5L5MLuy4PKPTpL04gv2A\nt9D0uQeBf6YJksdOsJzNyrw30vTo36F88M/kYzTXrn6EpveeB2yXyQ9ogvzFNNtoL+CbfRjvrcAf\nAyfRXCqyJ82BjekuQZl3InOmZz0EEMGjgZuBfTO5etD11Cqan/BYnskBg65F0uiwB/fGHlyncvnV\n9cCyTL466Hpq4hHTjXcUcKkNcXIRLKK5vuq0QdciaeTYg6dhD65LBC+JYJtyycjYdwkuGXBZ1Vk4\n6AKGUTT5+rYKAAALFklEQVQ/jB7AKwdcSrUieAnNafILaU6VSFJf2IOnZw+u0m/TbIstgB/Q/OrE\nRJcyzGueypckSVIVPJUvSZKkKszoVP6SJUty6dKls1RK/61bt44tt9xy+hmHjOMaLqM6LhiOsa1a\nterWzNxh0HX0Sz/68DBst43huIbHKI4JHNdUeu3FMwqmS5cu5bLLLtv4qubYypUrOfDAAwddRt85\nruEyquOC4RhbREz5f+MaNv3ow8Ow3TaG4xoeozgmcFxT6bUXeypfkiRJVTCYSpIkqQoGU0mSJFXB\nYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmS\npCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhM\nJUmSVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJU\nBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kk\nSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAq
SZKkKhhMJUmSVAWDqSRJkqpg\nMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOpJEmSqmAwlSRJ\nUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCrMejBdsQKWLoWIh/8tXAhHHz3ba9Yw\nWbECVq+GzTZr9pcVKwZdkWow1j/cL/pj7PlctQoWL4YFC+zJGs/XnDp19oy52h9mNZiuWAHLl8Pa\ntePvf/BBOPVUG6EaY/vJ/fdDZrO/LF9uQ5zvOvuH+8Wm6+7H69bBhg3N3/Zkga85jdfdM+Zqf5jV\nYHrccXDvvZNPP+202Vy7hsVE+8m99zb3a/5yv+iv6fox2JPnO19z6jSo/WFWg+m11049/cEHZ3Pt\nGhaT7SfT7T8abe4X/dXL82ZPnt98zanToPaHWQ2mu+029fQFC2Zz7RoWk+0n0+0/Gm3uF/3Vy/Nm\nT57ffM2p06D2h1kNpiecAIsWTT59+fLZXLuGxUT7yaJFzf2av9wv+mu6fgz25PnO15w6DWp/mNVg\numxZc83S7ruPv3/BAjjqKDjllNlcu4bF2H6yxRbNN4R33725vWzZoCvTIHX2D/eLTdfdj7fcsvnm\nNdiT1fA1p07dPWOu9oeFs7v4ZgDu1JrOsmWwcuXD3xKWwP7Rb2PP58qVcM89g65GNfI1p06dPWPN\nmrlZpz+wL0mSpCoYTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRV\nwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJ\nkqQqGEwlSZJUBYOpJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoY\nTCVJklQFg6kkSZKqYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmS\nVAWDqSRJkqpgMJUkSVIVDKaSJEmqgsFUkiRJVTCYSpIkqQoGU0mSJFXBYCpJkqQqGEwlSZJUBYOp\nJEmSqmAwlSRJUhUMppIkSaqCwVSSJElVMJhKkiSpCgZTSZIkVcFgKkmSpCoYTCVJklQFg6kkSZKq\nYDCVJElSFQymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVAWDqSRJkqpgMJUk\nSVIVDKaSJEmqQmRm7zNH3AKsnb1y+m4JcOugi5gFjmu4jOq4YDjGtntm7jDoIvqlT314GLbbxnBc\nw2MUxwSOayo99eIZBdNhExGXZeb+g66j3xzXcBnVccFoj22Ujep2c1zDYxTHBI6rHzyVL0mSpCoY\nTCVJklSFUQ+mpw26gFniuIbLqI4LRntso2xUt5vjGh6jOCZwXJtspK8xlSRJ0vAY9SOmkiRJGhIG\nU0mSJFVhpIJpRGwXEV+OiKvLf7edZL41EbE6Ii6PiMvmus5eRcRLI+JHEXFNRBw7wfSIiJPL9Csi\nYt9B1DlTPYzrwIi4q2yfyyPiHYOocyYi4kMRcXNEfH+S6UO5raCnsQ3d9pov5nEPWVbGszoivhUR\new+izpmYbkwd8z0jItZHxKvmsr6N1cu4Sg+5PCKujIivzXWNG6OHfXDriDg/Ir5XxnXkIOqciWre\nxzJzZP4BJwHHlr+PBf55kvnWAEsGXe80Y1kA/Bh4ArAF8D1gz655DgYuAAJ4NvDtQdfdp3EdCHx2\n0LXOcFzPA/YFvj/J9KHbVjMY29Btr/nwb573kOcA25a/D6p9XL2MqWO+rwCfB1416Lr7tK22AX4A\n7FZuP3bQdfdpXG8byyDADsDtwBaDrn2acVXxPjZSR0yBVwCnl79PB145wFo21TOBazLzJ5l5P3A2\nzfg6vQI4IxuXANtExE5zXegM
9TKuoZOZF9E0nskM47YCehqb6jRve0hmfisz7yg3LwEeP8c1zlSv\nffFNwCeAm+eyuE3Qy7heA3wyM68FyMxhGFsv40pgq4gIYDFND10/t2XOTC3vY6MWTHfMzJvK3z8D\ndpxkvgQujIhVEbF8bkqbsV2A6zpuX1/um+k8tem15ueUUwUXRMTT5qa0WTWM22omRm17jYL53kPG\nvJ7mKE/Nph1TROwC/AFw6hzWtal62VZPAbaNiJXlPfnwOatu4/UyrvcCTwVuBFYDb87MDXNT3qyZ\nk36xsN8LnG0RcSHwuAkmHdd5IzMzIib7LawDMvOGiHgs8OWI+GH5pKA6fIfmtM49EXEwcB7w5AHX\npMm5vVSliHg+TTA9YNC19MG7gbdm5obmINzIWAjsB7wQeDRwcURckpn/M9iyNtlLgMuBFwBPpMka\nX8/MuwdbVv2GLphm5osmmxYRP4+InTLzpnJ4ecJTApl5Q/nvzRHxKZrD8rUF0xuAXTtuP77cN9N5\najNtzZ0v3Mz8fEScEhFLMvPWOapxNgzjturJiG6vUTBvewhARPwm8AHgoMy8bY5q21i9jGl/4OwS\nSpcAB0fE+sw8b25K3Ci9jOt64LbMXAesi4iLgL2BmoNpL+M6Ejgxm4szr4mInwK/Afz33JQ4K+ak\nX4zaqfzPAEeUv48APt09Q0RsGRFbjf0NvBiY8BtoA3Yp8OSI2CMitgAOpRlfp88Ah5dvyj0buKvj\nUoZaTTuuiHhcuS6HiHgmzX5a+xvLdIZxW/VkRLfXKJjPPWQ34JPAYUNy5G3aMWXmHpm5NDOXAh8H\njq48lEJv++CngQMiYmFELAKeBVw1x3XOVC/jupbmKDARsSPw68BP5rTK/puTfjF0R0yncSJwbkS8\nHlgLHAIQETsDH8jMg2muO/1UeR9dCHwkM78woHonlZnrI+IY4Is03wD8UGZeGRFvLNPfR/PNzIOB\na4B7aT6hVa3Hcb0KOCoi1gP3AYeWT53VioiP0nw7fUlEXA+0gM1heLfVmB7GNnTbaz6Y5z3kHcD2\nwCml16/PzP0HVfN0ehzT0OllXJl5VUR8AbgC2EDzXl3jwaKH9Li93gl8OCJW03yL/a21n0Wq5X3M\n/yWpJEmSqjBqp/IlSZI0pAymkiRJqoLBVJIkSVUwmEqSJKkKBlNJkiRVwWAqSZKkKhhMJUmSVIX/\nDykR6mgcGYLTAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from matplotlib import pyplot as plt\n",
"\n",
"fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10,5))\n",
"\n",
"# Place every point on a single horizontal line (y = 0) so only the scaled values vary.\n",
"y_pos = [0 for i in range(len(x))]\n",
"\n",
"ax1.scatter(z_scores, y_pos, color='g')\n",
"ax1.set_title('Python standardization', color='g')\n",
"\n",
"ax2.scatter(minmax, y_pos, color='g')\n",
"ax2.set_title('Python Min-Max scaling', color='g')\n",
"\n",
"ax3.scatter(z_scores_np, y_pos, color='b')\n",
"ax3.set_title('Python NumPy standardization', color='b')\n",
"\n",
"ax4.scatter(np_minmax, y_pos, color='b')\n",
"ax4.set_title('Python NumPy Min-Max scaling', color='b')\n",
"\n",
"plt.tight_layout()\n",
"\n",
"# Hide the y axis (it carries no information) and draw a grid on each panel.\n",
"for ax in (ax1, ax2, ax3, ax4):\n",
"    ax.get_yaxis().set_visible(False)\n",
"    ax.grid()\n",
"\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## 4. Isolation Forest"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" count prediction\n",
"0 3 1\n",
"7 1 1\n",
"8 3 1\n",
"9 104 1\n",
"10 204 -1"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.pipeline import Pipeline\n",
"from sklearn.ensemble import IsolationForest\n",
"\n",
"dataNorm = dataNorm[['count','count_n']]\n",
"\n",
"# iloc selects rows by integer position; here the first five rows are used for training.\n",
"dataTrain = dataNorm.iloc[0:5]\n",
"\n",
"iforest = IsolationForest(n_estimators=100, contamination=0.00001, max_samples=5)\n",
"# Fit once: fit() returns the fitted estimator itself, so clf and iforest are the same object.\n",
"clf = iforest.fit(dataTrain)\n",
"prediction = iforest.predict(dataNorm)\n",
"\n",
"dataGroup2['prediction'] = prediction\n",
"dataGroup2[['count','prediction']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example *iloc*:\n",
"```python\n",
"# get_loc returns the integer position of a column (Python counts from 0, so the second column is 1)\n",
"print (df.columns.get_loc('Taste'))\n",
"# 1\n",
"\n",
"# rows 0-1 are set to 'good' and rows 2-5 to 'bad' (the end of an iloc slice is exclusive)\n",
"df.iloc[0:2, df.columns.get_loc('Taste')] = 'good'\n",
"df.iloc[2:6, df.columns.get_loc('Taste')] = 'bad'\n",
"print (df)\n",
"#          Food Taste\n",
"# 0       Apple  good\n",
"# 1      Banana  good\n",
"# 2       Candy   bad\n",
"# 3        Milk   bad\n",
"# 4       Bread   bad\n",
"# 5  Strawberry   bad\n",
"```\n",
"#### Examples *Isolation Forest*:\n",
"- n_estimators : int, optional (default=100)\n",
" \n",
" The number of base estimators in the ensemble.\n",
" \n",
" \n",
"- contamination : float in (0., 0.5), optional (default=0.1)\n",
" \n",
" The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.\n",
" \n",
" \n",
"- max_features : int or float, optional (default=1.0)\n",
"\n",
" The number of features to draw from X to train each base estimator.\n",
" If int, then draw max_features features.\n",
" If float, then draw max_features * X.shape[1] features.\n",
"\n",
"\n",
"\n",
"```python\n",
"\n",
"iforest = IsolationForest(n_estimators=100, contamination=0.1)\n",
"\n",
"```\n",
"\n",
"```\n",
" \tcount \tprediction\n",
"76 \t34 \t -1\n",
"77 \t31 \t -1\n",
"78 \t2 \t 1\n",
"79 \t68 \t -1\n",
"80 \t98 \t -1\n",
"83 \t4 \t 1\n",
"92 \t1 \t 1\n",
"95 \t4 \t 1\n",
"... 1 1\n",
"... 1 1\n",
"... 1 1\n",
"\n",
"```\n",
"```python\n",
"\n",
"iforest = IsolationForest(n_estimators=100, contamination=0.01)\n",
"\n",
"```\n",
"\n",
"```\n",
" \tcount \tprediction\n",
"76 \t34 \t 1\n",
"77 \t31 \t 1\n",
"78 \t2 \t 1\n",
"79 \t68 \t 1\n",
"80 \t98 \t -1\n",
"83 \t4 \t 1\n",
"92 \t1 \t 1\n",
"95 \t4 \t 1\n",
"... 1 1\n",
"... 1 1\n",
"... 1 1\n",
"\n",
"```"
]
},
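{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example *contamination*:\n",
"\n",
"The two runs above suggest that a larger `contamination` flags more rows as anomalies. The following is a minimal sketch of that behaviour, not the project's code: it assumes synthetic per-interval counts (hypothetical data, not the captured traffic) and a fixed `random_state` so that both fits build the same trees. The exact numbers depend on the data and on the scikit-learn version.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.ensemble import IsolationForest\n",
"\n",
"# Hypothetical data: 95 small counts plus 5 large bursts.\n",
"rng = np.random.RandomState(0)\n",
"counts = np.concatenate([rng.randint(1, 10, size=95), [150, 204, 310, 98, 120]])\n",
"X = counts.reshape(-1, 1).astype(float)\n",
"\n",
"flagged = {}\n",
"for contamination in (0.1, 0.01):\n",
"    # Same data and seed in both runs, so only the decision threshold changes.\n",
"    iforest = IsolationForest(n_estimators=100, contamination=contamination, random_state=0)\n",
"    prediction = iforest.fit(X).predict(X)\n",
"    # predict() returns 1 for inliers and -1 for anomalies.\n",
"    flagged[contamination] = int((prediction == -1).sum())\n",
"    print(contamination, flagged[contamination])\n",
"```\n",
"\n",
"Since both fits share the data and the seed, the anomaly scores are identical and only the threshold moves, so the lower `contamination` should never flag more rows than the higher one."
]
},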
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.1 Plot Isolation Forest"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Counts of the rows that the Isolation Forest flagged as anomalies (prediction == -1)\n",
"x = dataGroup2[(dataGroup2['prediction'] == -1)]['count'].values\n",
"\n",
"%matplotlib inline\n",
"\n",
"from matplotlib import pyplot as plt\n",
"\n",
"def plot():\n",
"    plt.figure(figsize=(8,6))\n",
"\n",
"    # All rows, plotted as count against count (points lie on the diagonal)\n",
"    plt.scatter(dataGroup2['count'], dataGroup2['count'], s=6, label=\"normal\", alpha=0.3, color=\"blue\")\n",
"\n",
"    # Rows flagged as anomalies, overlaid as red crosses\n",
"    plt.scatter(x, x, marker=\"x\", color=\"red\", label=\"anomalies\", alpha=0.5)\n",
"\n",
"    plt.legend(loc='upper left')\n",
"    plt.grid()\n",
"\n",
"    plt.tight_layout()\n",
"\n",
"plot()\n",
"plt.show()\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" ipdst proto time count prediction idpst_label\n",
"0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 1 0\n",
"7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 1 0\n",
"8 10.3.20.102 TCP 2017-03-20 17:08:50 3 1 0\n",
"9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1 0\n",
"10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np, matplotlib.pyplot as plt\n",
"from matplotlib.colors import ListedColormap\n",
"\n",
"def plot_decision(X, y, classifier, test_idx=None, resolution=0.02, figsize=(6,6)):\n",
"\n",
"    # setup marker generator and color map\n",
"    markers = ('s', 'x', 'o', '^', 'v')\n",
"    colors = ('#cc0000', '#003399', '#00cc00', '#999999', '#66ffff')\n",
"    cmap = ListedColormap(colors[:len(np.unique(y))])\n",
"\n",
"    # get dimensions\n",
"    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n",
"    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n",
"    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))\n",
"    xmin = xx1.min()\n",
"    xmax = xx1.max()\n",
"    ymin = xx2.min()\n",
"    ymax = xx2.max()\n",
"\n",
"    # create the figure\n",
"    fig, ax = plt.subplots(figsize=figsize)\n",
"    ax.set_xlim(xmin, xmax)\n",
"    ax.set_ylim(ymin, ymax)\n",
"\n",
"    # plot the decision surface\n",
"    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)\n",
"    Z = Z.reshape(xx1.shape)\n",
"    ax.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap, zorder=1)\n",
"\n",
"    # plot all samples\n",
"    for idx, cl in enumerate(np.unique(y)):\n",
"        ax.scatter(x=X[y == cl, 0],\n",
"                   y=X[y == cl, 1],\n",
"                   alpha=0.6,\n",
"                   c=cmap(idx),\n",
"                   edgecolor='black',\n",
"                   marker='o',#markers[idx],\n",
"                   s=50,\n",
"                   label=cl,\n",
"                   zorder=3)\n",
"\n",
"    # highlight test samples (explicit None check, so an index array does not raise on truth-testing)\n",
"    if test_idx is not None:\n",
"        X_test, y_test = X[test_idx, :], y[test_idx]\n",
"        ax.scatter(X_test[:, 0],\n",
"                   X_test[:, 1],\n",
"                   c='w',\n",
"                   alpha=1.0,\n",
"                   edgecolor='black',\n",
"                   linewidths=1,\n",
"                   marker='o',\n",
"                   s=150,\n",
"                   label='test set',\n",
"                   zorder=2)\n",
" \n",
"dataGroup2['idpst_label'], _ = pd.factorize(dataGroup2['ipdst'])\n",
"dataGroup2"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Misclassified samples: 0\n",
"Accuracy: 1.00\n"
]
},
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import numpy as np, pandas as pd, matplotlib.pyplot as plt, pydotplus\n",
"from sklearn import tree, metrics, model_selection, preprocessing\n",
"from IPython.display import Image, display\n",
"\n",
"# encode each destination IP as an integer class label\n",
"dataGroup2['idpst_label'], _ = pd.factorize(dataGroup2['ipdst'])\n",
"\n",
"y = dataGroup2['idpst_label']\n",
"X = dataGroup2[['prediction', 'count']]\n",
"\n",
"# split data randomly into 70% training and 30% test\n",
"X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=0)\n",
"\n",
"# train the decision tree\n",
"dtree = tree.DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)\n",
"dtree.fit(X_train, y_train)\n",
"\n",
"# use the model to make predictions with the test data\n",
"y_pred = dtree.predict(X_test)\n",
"\n",
"# how did our model perform?\n",
"count_misclassified = (y_test != y_pred).sum()\n",
"print('Misclassified samples: {}'.format(count_misclassified))\n",
"accuracy = metrics.accuracy_score(y_test, y_pred)\n",
"print('Accuracy: {:.2f}'.format(accuracy))\n",
"\n",
"# visualize the model's decision regions to see how it separates the samples\n",
"X_combined = np.vstack((X_train, X_test))\n",
"y_combined = np.hstack((y_train, y_test))\n",
"plot_decision(X=X_combined, y=y_combined, classifier=dtree)\n",
"plt.xlabel('prediction')\n",
"plt.ylabel('count')\n",
"plt.legend(loc='upper left')\n",
"plt.show()\n",
"\n",
"# We only have one IP, so there is a single class to predict."
]
},
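{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pd.factorize` in the cell above turns each distinct destination IP into an integer code. A pure-Python sketch of the same idea, assuming codes are handed out in order of first appearance (`factorize_sketch` is an illustrative name, not a pandas function):\n",
"\n",
"```python\n",
"def factorize_sketch(values):\n",
"    # Map each distinct value to an integer code, numbered by first appearance.\n",
"    code_of = {}\n",
"    codes = []\n",
"    for value in values:\n",
"        if value not in code_of:\n",
"            code_of[value] = len(code_of)\n",
"        codes.append(code_of[value])\n",
"    return codes, list(code_of)\n",
"\n",
"codes, uniques = factorize_sketch(['10.3.20.102', '157.240.21.35', '10.3.20.102'])\n",
"print(codes, uniques)\n",
"```\n",
"\n",
"With the five rows used in this notebook every code is 0, which is why the `idpst_label` column is constant."
]
},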
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAKUAAABRCAIAAAA8SvGCAAAABmJLR0QA/wD/AP+gvaeTAAAMJklE\nQVR4nO2ca0wUVxvH/0PWLZfdtUhl2UVutiSokFXTViCxUlNRA+KFgrZiQawFW1OjtknxSzVNYz80\njf1gmrSxlmqCSjRpGolUUILYtW1M1SpYEsEL4OqyLCyFggLTD+d9x+PsMjPLzqywO79PO8+e51zm\nmXOZmf8chmVZqAQNIc+6Aip+RY13cKHGO7jQ0AdDQ0M1NTWjo6PPqjYqsvPKK68kJiY+OWYpTp48\n+czqpaIMGzZsoEP8VP8eGRkBoK7YA4bCwkLeaK3O38GFGu/gQo13cKHGO7hQ4x1cqPH2H2fPnn39\n9dcNBoPBYFi6dGldXZ1yXuOhxttPVFZWZmdnp6WltbW1tbW1paamZmdnHz16VAkvIeib8ePHj/Ms\niuJegUClq6srIiIiIyNjbGyMWMbGxtLT0/V6vc1mk9eLpqCgoKCggLao/dsfHDp0aGBgoLS0lGEY\nYmEYprS0tL+///vvv5fXSxg13v6ATLqLFi2ijeTwl19+kddLmAnG++HDh9u2bZs1a5ZWq42NjX3v\nvfdsNhv3L/N/7t27t3r1ar1ebzQai4qKHA4HnYZO/O677/J8b926tW7dusjISHJI/rXZbGVlZaTc\nWbNmlZeXP3jwwL3c5ubmFStWGAwGnU6Xk5PT0tLCS0A4duwYsScmJtKluMNIQOB0kQrExcXRxvj4\neAA3b96U10sEenCXOH/bbLaEhASj0VhbW9vf39/Y2JiQkJCUlOR0Ork0JPONGzc2Nzf39vZu27YN\nQElJCZ2PewVo+7Jlyy5evDg4OFhTU0OS3b9/Py4uzmw219fXu1yuurq6mJiYhIQEejIjvpmZmU1N\nTf39/SRNZGRke3s7SUA6jclkevToEef13Xff5eTkiDZ8wmi1WgCPHz+mjY8fPwbw3HPPyetF4z5/\nTyTeZWVlAA4dOsRZTp06BWDPnj1P8gUANDQ0kMP29nYAZrP5qbIF433+/HmefevWrQCOHDnCWX74\n4QcAZWVlPN+amhpemuLiYs5isVgAVFZWcpa0tLSzZ8+KNnzCTO14m81mAF1dXZylu7sbQFpa2pN8\nAQAul4scDg8PA2AY5qmyBeM9MDDAs5tMJgCdnZ2cpaOjA0BsbCzPlx5pSBqTycRZyBUwf/58clhf\nXz9v3jzRVvtCdHQ0r1YsyzqdTgAxMTHyetHIsz5/+PAh6azc1PXCCy8AuHXrFi+lXq8nP8ilynrz\npjU8PJxnsdvtAEhZBPKb1Ifm+eef56UhvoS33nrLZDJduXLl3LlzAL7++usdO3YIV8bH+XvOnDkA\n7t27Rxvv3r0LICUlRV4vYSYSb6PRCKCnp4d3NQ0MDEysEhIh1zsZSwjkN7HT0AtDkmbmzJmcRavV\nbt++HcBXX33V1tZmtVqLioqEi5bSmQTc33jjDQC//fYbbfz9998BZGdny+slAl1jieP5Bx98AODU\nqVO0sbGxcdGiRbzGu58O2kJ68KNHjwYGBmbMmCGQkkDWDT/++CNnISNzeXk5z/enn37ipaHnb5Zl\nHQ5HeHg4wzA5OTkVFRWiTfaRzs7OiIiIzMxM2piZmanT6e7fvy+vF40883d3d3dycrLJZKquru7u\n7na5XD///HNSUhK3OmOlxTs9PR1AU1PTsWPHcnNzBVISyH0Btz6vr683mUwe1+crV668cOFCf38/\nSUOvzznILYNGo+no6BBtsu8cPnwYwI4dO+x2u91u//DDDxmGoa9d1lPDpXgJIE+8WZbt6enZtWtX\nUlLStGnTjEbjqlWrrFYrr9507T2OKH/88YfFYgkPD09PT//777/dU7pXhtx/m81mjUZjNpvJff9T\n7QEAtLe35+bm6vX6iIiIlStXNjc3uzehtbU1
JCSEJ+9SlNra2iVLluh0Op1Ol5WV5X5H4LHJol4C\nyBbvSct4Y4M7o6OjJpOJvkwDD/X5+RNOnz4dHx9P5pTgIejizTDMpUuXnE7nvn379uzZ86yr428C\nKt70M3mBZBkZGcnJybm5uXl5eX6p1yRCI55k6sBKeJ4jJU0AE1D9W0WUwI+3lOedz5xr1659/PHH\nc+fODQ0NjY6Ofu211xT6tivw4z0lBnCLxWK1WquqqpxO57lz50ZHR998880vv/xS9oICP95ThcOH\nD1sslrCwsNTU1G+//RbAgQMHZC8loNZrUxfeIJSUlATA5XLJXpDavycjly9fBpCVlSV7zrLFu6+v\nb+fOnbNnzw4NDY2KisrMzPzoo4/IyztCXV1dXl5eZGRkaGjowoULOe0YgVtVdXV15efn6/X6qKio\n4uLivr6+27dv5+XlGQyGmJiYkpKS3t5ed6/xBGvjIay/E22LOz6+IKeLPnPmTGlp6cKFCw8ePCjF\nxTvoh6u+PD9fvXo1gAMHDvzzzz/Dw8M3b95cu3YtnRuANWvW2O32O3fuLFu2DMCZM2foHEh9ioqK\niOSNvHXNyclZu3YtLYLbunWru5eAYI11e6guqr8TbYtC7N+/n1R13bp1f/31l+8ZKvi+xGAwAKiu\nruYsnZ2dvHhzMSD9b/HixU9VBQAleSPutIUoPWj1EuclLFjjxVtUfyfaFuUYHh5ubW3du3dvWFhY\nSUnJ4OCgL7kpGO/NmzeT0xoXF7dly5bjx48PDw+Pl5hsJBEVFfVUVQBQkjduYwKexaMITliwxou3\nqP7Oq7YoBFmcv//++75komC8x8bGTp48mZ+fHxkZSU5WfHz8n3/+Sf51Op0VFRUpKSk6nW682UQu\ny9DQEACNRjNeGo3G811JeHi4lLZ4xGOGPETPIQ25amnZzwTwx/vv0dHRxsbG5cuXg9KAkgn7008/\ndTgc/ytY1nh3d3dzFtH+HRsbC0/6O4lt8Q89PT0AwsLCfMlEwfffDMOQEx0SErJ48WJy6XDr5IsX\nLwLYvXv3jBkzABB5soyQ/AnkiwIBRd+aNWsANDQ00MYLFy5w78KF26IEDMPwPhmpra0F8PLLL8tc\nEh18X/o3gOXLl1+/fn1oaMhms1VUVADIy8sj/5IuUlFR4XQ6HQ7Hrl273Ev3xSIsWON5iervhNui\nBAAWLFjQ0NDgcrkcDkdVVVVUVFRYWJiP8hsFx/Ompqbi4uLExMRp06ZNnz7dYrF8/vnn3DcDDx48\n2LRpU3R0tFarTU1NJQXRYXC/BKVYOKOAYM2jl7D+TrgtSmC1WsvKylJSUkJDQ7VabUJCQnFxsUfZ\nnVe4x5thqTNy4sSJ9evXs9JWH5ME8hxjatXZbxQWFgI4ceIEZ1GfpwYXaryDi6kdb4mCNRWOqf0+\nVJ22vWVq928Vb5m88Z4SujOCL+ozebdXE2XyxnsKjdUTVp/Jv72aKPTN+GT7fsy9hpMTAK2trdzh\n9evX4fbe1h3ft1cTRf1+TBFYlk1OTuYOJarPlNheTRQ13vIjUX2mxPZqoigSb1q0VV5eTowdHR28\nJZiwok0gWwELxLRpohX2m/pMke3VRKEHdxnn7/z8fACffPIJbfzss894MiMpijavLFL2hlMIb9Vn\nvm+3JYr/vvcnas7p06f39fURy+DgoNFovHHjxpOypSnavLJI2RtOObxSnwVUvFmWXbp0KYAvvviC\nHB48eFDgFbKAos0ri5S94fyAFPWZ79urieLXeBOFRkxMzNDQ0MjIyOzZs3/99VfuX4UUbaLaNI+M\nO9uNUzFRpKjPlixZAuDatWu08erVqwCysrK8Km48/Ho/lp2dvWDBApvNVllZWV1dHRsbm5GRwf1b\nWFi4f//+9evX37lzh1RFSp5k3UQGPQB9fX28BBPbG07KufOq7WSrsX///VcgjSLbq4lCN0n25y1k\nyf3SSy/N
nz//9OnT9F/kjHBaYyIqhVhvJltqcpfI+fPneWmk7A0nOwBaWlpoS1VVFdyWIzx8315N\nFH/vzzQyMvLiiy/C0/Q5MUXbO++8A2D79u29vb0tLS3cvohcAil7w8kOpKnP3Jvj4/ZqojyD/bi+\n+eYbAEePHuXZJ6BoY1nWbre//fbbM2fOjIiIWLVqFdlMlJdGWJumBBLVZ+7xZn3bXk2UANSvqQig\n6teCHTXewYUa7+BCjXdwocY7uFDjHVyo8Q4u1HgHFx7eJpGbdJUAwGq10u+owOvfr7766oYNG/xb\nJRUFycjIKCgooC2M+vQ0qFDn7+BCjXdwocY7uPgP9a1rUF1siogAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dot_data = tree.export_graphviz(dtree, out_file=None, filled=False, rounded=False,\n",
"                                feature_names=['prediction', 'count'],\n",
"                                class_names=['10.3.20.102'])\n",
"                                #class_names=['ip1', 'ip2', 'ip3', ...])\n",
"graph = pydotplus.graph_from_dot_data(dot_data)\n",
"display(Image(graph.create_png()))\n",
"\n",
"# Only one node is shown because we only have one IP"
]
},
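{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tree above collapses to a single node because every row carries the same destination IP. A minimal sketch of why, assuming the usual Shannon entropy behind `criterion='entropy'` (the helper `shannon_entropy` is hypothetical and not part of scikit-learn):\n",
"\n",
"```python\n",
"from collections import Counter\n",
"from math import log2\n",
"\n",
"def shannon_entropy(labels):\n",
"    # Entropy in bits of the label distribution; zero means a pure node.\n",
"    total = len(labels)\n",
"    counts = Counter(labels)\n",
"    return sum(-(n / total) * log2(n / total) for n in counts.values())\n",
"\n",
"one_ip = ['10.3.20.102'] * 5\n",
"two_ips = ['10.3.20.102', '10.3.20.102', '157.240.21.35', '157.240.21.35']\n",
"print(shannon_entropy(one_ip))\n",
"print(shannon_entropy(two_ips))\n",
"```\n",
"\n",
"With a single class the entropy is already zero, so no split can reduce it and the tree stays a single leaf."
]
},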
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The example with more IPs would look like this:\n",
"![](prediction.png)\n",
"![](tree.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Gráficas"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/vnd.plotly.v1+html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import plotly.plotly as py\n",
"from plotly import __version__\n",
"from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot\n",
"from plotly.graph_objs import Scatter, Figure, Layout\n",
"init_notebook_mode(connected=True)\n",
"\n",
"import plotly.offline as offline\n",
"import plotly.graph_objs as go\n",
"from plotly.graph_objs import *"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.1 Sin ordenar Tiempos"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>ipdst</th><th>proto</th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:08:55</td><td>3</td><td>1</td><td>0</td></tr>\n",
"<tr><th>7</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:09:30</td><td>1</td><td>1</td><td>0</td></tr>\n",
"<tr><th>8</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:50</td><td>3</td><td>1</td><td>0</td></tr>\n",
"<tr><th>9</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:55</td><td>104</td><td>1</td><td>0</td></tr>\n",
"<tr><th>10</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
" ipdst proto time count prediction idpst_label\n",
"0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 1 0\n",
"7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 1 0\n",
"8 10.3.20.102 TCP 2017-03-20 17:08:50 3 1 0\n",
"9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1 0\n",
"10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataGroup2"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"data": [
{
"mode": "lines+markers",
"name": "Normal Traffic",
"type": "scatter",
"x": [
"2017-03-20 17:08:55",
"2017-03-20 17:09:30",
"2017-03-20 17:08:50",
"2017-03-20 17:08:55"
],
"y": [
3,
1,
3,
104
]
},
{
"marker": {
"color": "rgb(255, 0, 0)",
"size": 7,
"symbol": "circle"
},
"mode": "markers",
"name": "Anomalies",
"opacity": 0.8,
"x": [
"2017-03-20 17:09:00"
],
"y": [
204
]
}
],
"layout": {
"legend": {
"bgcolor": "#E2E2E2",
"bordercolor": "#FFFFFF",
"borderwidth": 2,
"font": {
"color": "#000",
"family": "sans-serif",
"size": 12
},
"traceorder": "normal",
"x": 0,
"y": 1
},
"title": "Peticiones totales por tiempo",
"xaxis": {
"rangeslider": {},
"title": "Date",
"type": "date"
},
"yaxis": {
"title": "Nº packets"
}
}
},
"text/html": [
""
],
"text/vnd.plotly.v1+html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Normal Traffic\n",
"nor = dataGroup2[(dataGroup2['prediction'] == 1)]['count']\n",
"#Anomalies\n",
"ano = dataGroup2[(dataGroup2['prediction'] == -1)]['count']\n",
"\n",
"\n",
"normal = go.Scatter(\n",
" x = dataGroup2[(dataGroup2['prediction'] == 1)]['time'],\n",
" y = nor,\n",
" mode = \"lines+markers\",\n",
" name = \"Normal Traffic\"\n",
")\n",
"\n",
"\n",
"anomalies = dict(\n",
" x=dataGroup2[(dataGroup2['prediction'] == -1)]['time'],\n",
" y=ano,\n",
" name = \"Anomalies\",\n",
" mode = 'markers',\n",
" marker=Marker(\n",
" size=7,\n",
" symbol= \"circle\",\n",
" color='rgb(255, 0, 0)'\n",
" ),\n",
" opacity = 0.8)\n",
"\n",
"data = [normal, anomalies]\n",
"\n",
"layout = dict(\n",
" title='Peticiones totales por tiempo',\n",
" xaxis=dict(\n",
" title = 'Date',\n",
" rangeslider=dict(),\n",
" type='date'\n",
" ),\n",
" yaxis=dict(\n",
" title = 'Nº packets'\n",
" ),\n",
" legend=dict(\n",
" x=0,\n",
" y=1,\n",
" traceorder='normal',\n",
" font=dict(\n",
" family='sans-serif',\n",
" size=12,\n",
" color='#000'\n",
" ),\n",
" bgcolor='#E2E2E2',\n",
" bordercolor='#FFFFFF',\n",
" borderwidth=2\n",
" ) \n",
")\n",
"\n",
"fig = dict(data=data, layout=layout)\n",
"iplot(fig, filename = \"Peticiones totales por tiempo\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2 Ordenando Tiempos"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>ipdst</th><th>proto</th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>8</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:50</td><td>3</td><td>1</td><td>0</td></tr>\n",
"<tr><th>0</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:08:55</td><td>3</td><td>1</td><td>0</td></tr>\n",
"<tr><th>9</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:55</td><td>104</td><td>1</td><td>0</td></tr>\n",
"<tr><th>10</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td></tr>\n",
"<tr><th>7</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:09:30</td><td>1</td><td>1</td><td>0</td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
" ipdst proto time count prediction idpst_label\n",
"8 10.3.20.102 TCP 2017-03-20 17:08:50 3 1 0\n",
"0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 1 0\n",
"9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1 0\n",
"10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0\n",
"7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 1 0"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataGroup3 = dataGroup2.sort_values(by=['time'])\n",
"dataGroup3"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"data": [
{
"mode": "lines+markers",
"name": "Normal Traffic",
"type": "scatter",
"x": [
"2017-03-20 17:08:50",
"2017-03-20 17:08:55",
"2017-03-20 17:08:55",
"2017-03-20 17:09:30"
],
"y": [
3,
3,
104,
1
]
},
{
"marker": {
"color": "rgb(255, 0, 0)",
"size": 7,
"symbol": "circle"
},
"mode": "markers",
"name": "Anomalies",
"opacity": 0.8,
"x": [
"2017-03-20 17:09:00"
],
"y": [
204
]
}
],
"layout": {
"legend": {
"bgcolor": "#E2E2E2",
"bordercolor": "#FFFFFF",
"borderwidth": 2,
"font": {
"color": "#000",
"family": "sans-serif",
"size": 12
},
"traceorder": "normal",
"x": 0,
"y": 1
},
"title": "Peticiones totales por tiempo",
"xaxis": {
"rangeslider": {},
"title": "Date",
"type": "date"
},
"yaxis": {
"title": "Nº packets"
}
}
},
"text/html": [
""
],
"text/vnd.plotly.v1+html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Normal Traffic\n",
"nor = dataGroup3[(dataGroup3['prediction'] == 1)]['count']\n",
"#Anomalies\n",
"ano = dataGroup3[(dataGroup3['prediction'] == -1)]['count']\n",
"\n",
"\n",
"normal = go.Scatter(\n",
" x = dataGroup3[(dataGroup3['prediction'] == 1)]['time'],\n",
" y = nor,\n",
" mode = \"lines+markers\",\n",
" name = \"Normal Traffic\"\n",
")\n",
"\n",
"\n",
"anomalies = dict(\n",
" x=dataGroup3[(dataGroup3['prediction'] == -1)]['time'],\n",
" y=ano,\n",
" name = \"Anomalies\",\n",
" mode = 'markers',\n",
" marker=Marker(\n",
" size=7,\n",
" symbol= \"circle\",\n",
" color='rgb(255, 0, 0)'\n",
" ),\n",
" opacity = 0.8)\n",
"\n",
"data = [normal, anomalies]\n",
"\n",
"layout = dict(\n",
" title='Peticiones totales por tiempo',\n",
" xaxis=dict(\n",
" title = 'Date',\n",
" rangeslider=dict(),\n",
" type='date'\n",
" ),\n",
" yaxis=dict(\n",
" title = 'Nº packets'\n",
" ),\n",
" legend=dict(\n",
" x=0,\n",
" y=1,\n",
" traceorder='normal',\n",
" font=dict(\n",
" family='sans-serif',\n",
" size=12,\n",
" color='#000'\n",
" ),\n",
" bgcolor='#E2E2E2',\n",
" bordercolor='#FFFFFF',\n",
" borderwidth=2\n",
" ) \n",
")\n",
"\n",
"fig = dict(data=data, layout=layout)\n",
"iplot(fig, filename = \"Peticiones totales por tiempo\")"
]
},
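{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both charts so far build the same two traces from different DataFrames. A minimal sketch of that split using plain dictionaries, which `iplot` accepts in place of `go.Scatter` objects as the `anomalies` trace above already shows (`build_traces` and the tuple layout are assumptions made for illustration):\n",
"\n",
"```python\n",
"def build_traces(rows):\n",
"    # rows: iterable of (time, count, prediction); prediction is 1 (normal) or -1 (anomaly).\n",
"    normal = {'x': [], 'y': [], 'mode': 'lines+markers', 'name': 'Normal Traffic'}\n",
"    anomalies = {'x': [], 'y': [], 'mode': 'markers', 'name': 'Anomalies',\n",
"                 'marker': {'size': 7, 'symbol': 'circle', 'color': 'rgb(255, 0, 0)'},\n",
"                 'opacity': 0.8}\n",
"    # Sorting the tuples orders them by time first, as in section 5.2.\n",
"    for time, count, prediction in sorted(rows):\n",
"        target = anomalies if prediction == -1 else normal\n",
"        target['x'].append(time)\n",
"        target['y'].append(count)\n",
"    return [normal, anomalies]\n",
"\n",
"rows = [\n",
"    ('2017-03-20 17:08:55', 3, 1),\n",
"    ('2017-03-20 17:09:30', 1, 1),\n",
"    ('2017-03-20 17:08:50', 3, 1),\n",
"    ('2017-03-20 17:08:55', 104, 1),\n",
"    ('2017-03-20 17:09:00', 204, -1),\n",
"]\n",
"traces = build_traces(rows)\n",
"print(traces[0]['x'])\n",
"print(traces[1]['y'])\n",
"```\n",
"\n",
"The returned list can be passed as `data` in `dict(data=..., layout=layout)`, so the layout dictionary only needs to be written once."
]
},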
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>2017-03-20 17:08:50</td><td>3</td><td>1</td><td>0</td></tr>\n",
"<tr><th>1</th><td>2017-03-20 17:08:55</td><td>107</td><td>2</td><td>0</td></tr>\n",
"<tr><th>2</th><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td></tr>\n",
"<tr><th>3</th><td>2017-03-20 17:09:30</td><td>1</td><td>1</td><td>0</td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
" time count prediction idpst_label\n",
"0 2017-03-20 17:08:50 3 1 0\n",
"1 2017-03-20 17:08:55 107 2 0\n",
"2 2017-03-20 17:09:00 204 -1 0\n",
"3 2017-03-20 17:09:30 1 1 0"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
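],
"source": [
"# Note: sum() also adds up the 'prediction' labels, so a timestamp with two normal rows\n",
"# gets prediction == 2 and is matched by neither the == 1 nor the == -1 filter used below.\n",
"dataGroup4 = dataGroup2.groupby(['time']).sum().reset_index().dropna()\n",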
"dataGroup4"
]
},
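{
"cell_type": "markdown",
"metadata": {},
"source": [
"Summing every column per timestamp also sums the `prediction` labels, so the 17:08:55 row above ends up with 2 and matches neither `== 1` nor `== -1`. One way around it, sketched in plain Python under the assumption that a timestamp should count as anomalous if any of its rows is (the function name is hypothetical; in pandas the same idea would be `groupby('time').agg({'count': 'sum', 'prediction': 'min'})`):\n",
"\n",
"```python\n",
"def aggregate_by_time(rows):\n",
"    # rows: iterable of (time, count, prediction) with prediction in {1, -1}.\n",
"    totals = {}\n",
"    for time, count, prediction in rows:\n",
"        old_count, old_pred = totals.get(time, (0, 1))\n",
"        # Add the packet counts; keep -1 if any row at this time was flagged.\n",
"        totals[time] = (old_count + count, min(old_pred, prediction))\n",
"    return [(time,) + totals[time] for time in sorted(totals)]\n",
"\n",
"rows = [\n",
"    ('2017-03-20 17:08:55', 3, 1),\n",
"    ('2017-03-20 17:09:30', 1, 1),\n",
"    ('2017-03-20 17:08:50', 3, 1),\n",
"    ('2017-03-20 17:08:55', 104, 1),\n",
"    ('2017-03-20 17:09:00', 204, -1),\n",
"]\n",
"for row in aggregate_by_time(rows):\n",
"    print(row)\n",
"```\n",
"\n",
"Taking the minimum keeps the label inside the set {1, -1}, so the filters used for plotting still select every timestamp."
]
},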
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"data": [
{
"mode": "lines+markers",
"name": "Normal Traffic",
"type": "scatter",
"x": [
"2017-03-20 17:08:50",
"2017-03-20 17:09:30"
],
"y": [
3,
1
]
},
{
"marker": {
"color": "rgb(255, 0, 0)",
"size": 7,
"symbol": "circle"
},
"mode": "markers",
"name": "Anomalies",
"opacity": 0.8,
"x": [
"2017-03-20 17:09:00"
],
"y": [
204
]
}
],
"layout": {
"legend": {
"bgcolor": "#E2E2E2",
"bordercolor": "#FFFFFF",
"borderwidth": 2,
"font": {
"color": "#000",
"family": "sans-serif",
"size": 12
},
"traceorder": "normal",
"x": 0,
"y": 1
},
"title": "Peticiones totales por tiempo",
"xaxis": {
"rangeslider": {},
"title": "Date",
"type": "date"
},
"yaxis": {
"title": "Nº packets"
}
}
},
"text/html": [
""
],
"text/vnd.plotly.v1+html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Normal Traffic\n",
"nor = dataGroup4[(dataGroup4['prediction'] == 1)]['count']\n",
"#Anomalies\n",
"ano = dataGroup4[(dataGroup4['prediction'] == -1)]['count']\n",
"\n",
"\n",
"normal = go.Scatter(\n",
" x = dataGroup4[(dataGroup4['prediction'] == 1)]['time'],\n",
" y = nor,\n",
" mode = \"lines+markers\",\n",
" name = \"Normal Traffic\"\n",
")\n",
"\n",
"\n",
"anomalies = dict(\n",
" x=dataGroup4[(dataGroup4['prediction'] == -1)]['time'],\n",
" y=ano,\n",
" name = \"Anomalies\",\n",
" mode = 'markers',\n",
" marker=Marker(\n",
" size=7,\n",
" symbol= \"circle\",\n",
" color='rgb(255, 0, 0)'\n",
" ),\n",
" opacity = 0.8)\n",
"\n",
"data = [normal, anomalies]\n",
"\n",
"layout = dict(\n",
" title='Peticiones totales por tiempo',\n",
" xaxis=dict(\n",
" title = 'Date',\n",
" rangeslider=dict(),\n",
" type='date'\n",
" ),\n",
" yaxis=dict(\n",
" title = 'Nº packets'\n",
" ),\n",
" legend=dict(\n",
" x=0,\n",
" y=1,\n",
" traceorder='normal',\n",
" font=dict(\n",
" family='sans-serif',\n",
" size=12,\n",
" color='#000'\n",
" ),\n",
" bgcolor='#E2E2E2',\n",
" bordercolor='#FFFFFF',\n",
" borderwidth=2\n",
" ) \n",
")\n",
"\n",
"fig = dict(data=data, layout=layout)\n",
"iplot(fig, filename = \"Peticiones totales por tiempo\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Mapa anomalías"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"data": [
{
"lat": [
"37.459",
"29.4889",
"37.7758",
"43.3701"
],
"lon": [
"-122.1781",
"-98.3987",
"-122.4128",
"-8.3288"
],
"marker": {
"color": "rgb(255, 0, 0)",
"opacity": 0.7,
"size": 14
},
"mode": "markers",
"text": [
"157.240.21.35",
"23.253.135.79",
"104.244.42.193",
"213.60.47.49"
],
"type": "scattermapbox"
}
],
"layout": {
"autosize": true,
"hovermode": "closest",
"legend": {
"bgcolor": "#E2E2E2",
"bordercolor": "#FFFFFF",
"borderwidth": 2,
"font": {
"color": "#000",
"family": "sans-serif",
"size": 12
},
"traceorder": "normal",
"x": 0,
"y": 1
},
"mapbox": {
"accesstoken": "pk.eyJ1IjoiYWxleGZyYW5jb3ciLCJhIjoiY2pnbHlncDF5MHU4OTJ3cGhpNjE1eTV6ZCJ9.9RoVOSpRXa2JE9j_qnELdw",
"bearing": 0,
"center": {
"lat": "43.3701",
"lon": "-8.3288"
},
"pitch": 0,
"style": "light",
"zoom": 1
},
"showlegend": false
}
},
"text/html": [
""
],
"text/vnd.plotly.v1+html": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import re\n",
"import json\n",
"from urllib.request import urlopen\n",
"import plotly.plotly as py\n",
"from plotly.graph_objs import *\n",
"import numpy\n",
"\n",
"mapbox_access_token = 'pk.eyJ1IjoiYWxleGZyYW5jb3ciLCJhIjoiY2pnbHlncDF5MHU4OTJ3cGhpNjE1eTV6ZCJ9.9RoVOSpRXa2JE9j_qnELdw'\n",
"\n",
"# Como el dataframe a analizar no tiene ninguna IP pública se hará uso de 4 IPs públicas elegidas manualmente.\n",
"#ips = dataGroup2[(dataGroup2['prediction'] == -1)]['ipdst'].values\n",
"ips = ['157.240.21.35','23.253.135.79','104.244.42.193', '213.60.47.49']\n",
"\n",
"\n",
"outputLat = []\n",
"outputLon = []\n",
"for ip in ips:\n",
" #url = 'http://ip-api.com/json/'+ip\n",
" url = 'http://freegeoip.net/json/'+ip\n",
" response = urlopen(url)\n",
" data = json.load(response)\n",
" #print(ip+\": \")\n",
"\n",
" try:\n",
" data['message']\n",
" print(\"IP Privada\")\n",
"\n",
" except (KeyError, TypeError) as e:\n",
" lat = str(data['latitude'])\n",
" latList = str(data['latitude']).split()\n",
" lon = str(data['longitude'])\n",
" lonList = str(data['longitude']).split()\n",
" #print(lat, lon)\n",
" outputLat.append(lat)\n",
" outputLon.append(lon)\n",
" \n",
"#debug lat and lon array \n",
"# print(outputLat)\n",
"# print(outputLon)\n",
" \n",
"data = Data([\n",
" Scattermapbox(\n",
" lat=outputLat,\n",
" lon=outputLon,\n",
" mode='markers',\n",
" marker=Marker(\n",
" size=14,\n",
" color='rgb(255, 0, 0)',\n",
" opacity=0.7\n",
" ),\n",
" text=ips,\n",
" ), \n",
"])\n",
"\n",
"#debug data\n",
"# print(data)\n",
"\n",
"layout = Layout(\n",
" autosize=True,\n",
" hovermode='closest',\n",
" showlegend=False,\n",
" mapbox=dict(\n",
" accesstoken=mapbox_access_token,\n",
" bearing=0,\n",
" center=dict(\n",
" lat=lat,\n",
" lon=lon\n",
" ),\n",
" pitch=0,\n",
" style='light',\n",
" zoom=1\n",
" ),\n",
" legend=dict(\n",
" x=0,\n",
" y=1,\n",
" traceorder='normal',\n",
" font=dict(\n",
" family='sans-serif',\n",
" size=12,\n",
" color='#000'\n",
" ),\n",
" bgcolor='#E2E2E2',\n",
" bordercolor='#FFFFFF',\n",
" borderwidth=2\n",
" ), \n",
")\n",
"\n",
"fig = dict(data=data, layout=layout)\n",
"iplot(fig, filename='Montreal Mapbox')\n"
]
},
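{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lookup loop above tells a failed reply from a successful one by probing for a `message` key. A small sketch of that check as a standalone function, assuming the reply is a JSON object with the `latitude`, `longitude` and `message` fields the cell relies on (the sample payloads are made up for illustration and are not real service responses):\n",
"\n",
"```python\n",
"def extract_coords(payload):\n",
"    # Return (lat, lon) as strings, or None when the reply is an error or incomplete.\n",
"    if not isinstance(payload, dict) or 'message' in payload:\n",
"        return None\n",
"    try:\n",
"        return str(payload['latitude']), str(payload['longitude'])\n",
"    except KeyError:\n",
"        return None\n",
"\n",
"ok = {'latitude': 43.3701, 'longitude': -8.3288}\n",
"failed = {'message': 'lookup failed'}\n",
"print(extract_coords(ok))\n",
"print(extract_coords(failed))\n",
"```\n",
"\n",
"Returning `None` instead of raising lets the caller skip an address and keep the latitude, longitude and label lists the same length."
]
},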
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## 7. Visualizing a Single Decision Tree\n",
"https://towardsdatascience.com/random-forest-in-python-24d0893d51c0"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>ipdst</th><th>proto</th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:08:55</td><td>3</td><td>1</td><td>0</td></tr>\n",
"<tr><th>7</th><td>10.3.20.102</td><td>HTTP</td><td>2017-03-20 17:09:30</td><td>1</td><td>1</td><td>0</td></tr>\n",
"<tr><th>8</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:50</td><td>3</td><td>1</td><td>0</td></tr>\n",
"<tr><th>9</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:08:55</td><td>104</td><td>1</td><td>0</td></tr>\n",
"<tr><th>10</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
" ipdst proto time count prediction idpst_label\n",
"0 10.3.20.102 HTTP 2017-03-20 17:08:55 3 1 0\n",
"7 10.3.20.102 HTTP 2017-03-20 17:09:30 1 1 0\n",
"8 10.3.20.102 TCP 2017-03-20 17:08:50 3 1 0\n",
"9 10.3.20.102 TCP 2017-03-20 17:08:55 104 1 0\n",
"10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataGroup2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](tree.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"dot_data = tree.export_graphviz(clf, out_file=None,\n",
" feature_names=feature_list,\n",
" class_names=feature_list,\n",
" filled=True, rounded=True,\n",
" special_characters=True)\n",
"\n",
"graph = pydotplus.graph_from_dot_data(dot_data)\n",
"graph.write_pdf(\"tree-vis.pdf\")\n",
"joblib.dump(clf, 'CART.pkl') \n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## 8. Public or Private IP detector\n",
"\n",
"URL: https://chrisalbon.com/python/data_wrangling/pandas_create_column_with_loop/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predictions"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10.3.20.102 privada\n"
]
}
],
"source": [
"ips = dataGroup3[(dataGroup3['prediction'] == -1)]['ipdst']\n",
"\n",
"def is_public_ip(ip):\n",
"    # private ranges: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16\n",
"    ip = list(map(int, ip.strip().split('.')[:2]))\n",
"    if ip[0] == 10: return False\n",
"    if ip[0] == 172 and ip[1] in range(16, 32): return False\n",
"    if ip[0] == 192 and ip[1] == 168: return False\n",
"    return True\n",
"\n",
"for ip in ips:\n",
"    # print each anomalous IP with its type instead of the whole Series\n",
"    if is_public_ip(ip):\n",
"        print(ip + ' publica')\n",
"    else:\n",
"        print(ip + ' privada')"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th></th><th>ipdst</th><th>proto</th><th>time</th><th>count</th><th>prediction</th><th>idpst_label</th><th>tipo</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>10</th><td>10.3.20.102</td><td>TCP</td><td>2017-03-20 17:09:00</td><td>204</td><td>-1</td><td>0</td><td>privada</td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
" ipdst proto time count prediction idpst_label \\\n",
"10 10.3.20.102 TCP 2017-03-20 17:09:00 204 -1 0 \n",
"\n",
" tipo \n",
"10 privada "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ips = dataGroup3[(dataGroup3['prediction'] == -1)]['ipdst']\n",
"# work on an explicit copy so that adding a column does not trigger SettingWithCopyWarning\n",
"dataGroup5 = dataGroup3[(dataGroup3['prediction'] == -1)].copy()\n",
"\n",
"def is_public_ip(ip):\n",
"    # private ranges: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16\n",
"    ip = list(map(int, ip.strip().split('.')[:2]))\n",
"    if ip[0] == 10: return False\n",
"    if ip[0] == 172 and ip[1] in range(16, 32): return False\n",
"    if ip[0] == 192 and ip[1] == 168: return False\n",
"    return True\n",
"\n",
"tipo = []\n",
"for ip in ips:\n",
"    if is_public_ip(ip):\n",
"        tipo.append('publica')\n",
"    else:\n",
"        tipo.append('privada')\n",
"\n",
"dataGroup5['tipo'] = tipo\n",
"dataGroup5"
]
},
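{
"cell_type": "markdown",
"metadata": {},
"source": [
"The hand-written range checks above can also be expressed with the standard library `ipaddress` module. A minimal sketch, with the caveat that `is_private` covers more than the three RFC 1918 blocks (loopback and link-local addresses, for instance, also count as private), so results can differ from `is_public_ip` for such addresses; `classify_ip` is a hypothetical name:\n",
"\n",
"```python\n",
"import ipaddress\n",
"\n",
"def classify_ip(ip):\n",
"    # Label an IP address string as 'privada' or 'publica'.\n",
"    return 'privada' if ipaddress.ip_address(ip.strip()).is_private else 'publica'\n",
"\n",
"for ip in ['10.3.20.102', '172.16.0.1', '192.168.1.10', '157.240.21.35']:\n",
"    print(ip, classify_ip(ip))\n",
"```\n",
"\n",
"Unlike splitting on dots, `ipaddress.ip_address` raises `ValueError` for malformed input, which surfaces bad rows early."
]
},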
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. Detector pais\n",
"\n",
"All the IPs are private, so this section produces no output."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Series([], Name: ipdst, dtype: object)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataGroup5[(dataGroup5['tipo'] == 'publica')]['ipdst']"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"ips = dataGroup5[(dataGroup5['tipo'] == 'publica')]['ipdst']\n",
"\n",
"for ip in ips:\n",
"    #url = 'http://ip-api.com/json/'+ip\n",
"    url = 'http://freegeoip.net/json/'+ip\n",
"    response = urlopen(url)\n",
"    data = json.load(response)\n",
"    print(ip+\": \")\n",
"    country = str(data['country_name'])\n",
"    print(country)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Example output if there were a public IP:\n",
"```\n",
"92.53.104.78: \n",
"Russia\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}