{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## Anomaly Exploration (understanding 'Odd')\n", "In this notebook we're going to use the zat Python module for processing, transformation, and anomaly detection on Zeek network data. We're going to look at 'normal' HTTP traffic and demonstrate the use of Isolation Forests for anomaly detection. We'll then explore those anomalies with clustering (KMeans) and PCA.\n", "\n", "**Software**\n", "- zat: https://github.com/SuperCowPowers/zat\n", "- Pandas: https://github.com/pandas-dev/pandas\n", "- Scikit-Learn: http://scikit-learn.org/stable/index.html\n", "\n", "**Techniques**\n", "- One Hot Encoding: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html\n", "- Isolation Forest: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html\n", "- PCA: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html\n", "\n", "**Related Notebooks**\n", "- Zeek to Scikit-Learn: https://nbviewer.jupyter.org/github/SuperCowPowers/zat/blob/main/notebooks/Zeek_to_Scikit_Learn.ipynb\n", "\n", "**Note:** A previous version of this notebook used a large HTTP log (1 million rows), but we wanted people to be able to run the notebook themselves, so we've changed it to run on the local example http.log."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zat: 0.4.4\n", "Pandas: 1.3.5\n", "Numpy: 1.22.1\n", "Scikit Learn Version: 1.0.2\n" ] } ], "source": [ "import zat\n", "from zat.log_to_dataframe import LogToDataFrame\n", "from zat.dataframe_to_matrix import DataFrameToMatrix\n", "print('zat: {:s}'.format(zat.__version__))\n", "import pandas as pd\n", "print('Pandas: {:s}'.format(pd.__version__))\n", "import numpy as np\n", "print('Numpy: {:s}'.format(np.__version__))\n", "import sklearn\n", "from sklearn.ensemble import IsolationForest\n", "from sklearn.decomposition import PCA\n", "from sklearn.cluster import KMeans\n", "print('Scikit Learn Version:', sklearn.__version__)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Read in 150 Rows...\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " uid id.orig_h id.orig_p \\\n", "ts \n", "2013-09-15 23:44:27.668081920 CyIaMO7IheOh38Zsi 192.168.33.10 1031 \n", "2013-09-15 23:44:27.731702016 CoyZrY2g74UvMMgp4a 192.168.33.10 1032 \n", "2013-09-15 23:44:28.092921856 CoyZrY2g74UvMMgp4a 192.168.33.10 1032 \n", "2013-09-15 23:44:28.150300928 CiCKTz4e0fkYYazBS3 192.168.33.10 1040 \n", "2013-09-15 23:44:28.150601984 C1YBkC1uuO9bzndRvh 192.168.33.10 1041 \n", "\n", " id.resp_h id.resp_p trans_depth method \\\n", "ts \n", "2013-09-15 23:44:27.668081920 54.245.228.191 80 1 GET \n", "2013-09-15 23:44:27.731702016 54.245.228.191 80 1 GET \n", "2013-09-15 23:44:28.092921856 54.245.228.191 80 2 GET \n", "2013-09-15 23:44:28.150300928 54.245.228.191 80 1 GET \n", "2013-09-15 23:44:28.150601984 54.245.228.191 80 1 GET \n", "\n", " host \\\n", "ts \n", "2013-09-15 23:44:27.668081920 guyspy.com \n", "2013-09-15 23:44:27.731702016 www.guyspy.com \n", "2013-09-15 23:44:28.092921856 www.guyspy.com \n", "2013-09-15 23:44:28.150300928 www.guyspy.com \n", "2013-09-15 23:44:28.150601984 www.guyspy.com \n", "\n", " uri \\\n", "ts \n", "2013-09-15 23:44:27.668081920 / \n", "2013-09-15 23:44:27.731702016 / \n", "2013-09-15 23:44:28.092921856 /wp-content/plugins/slider-pro/css/advanced-sl... \n", "2013-09-15 23:44:28.150300928 /wp-content/plugins/contact-form-7/includes/cs... \n", "2013-09-15 23:44:28.150601984 /wp-content/plugins/slider-pro/css/slider/adva... \n", "\n", " referrer ... info_msg filename \\\n", "ts ... \n", "2013-09-15 23:44:27.668081920 NaN ... NaN NaN \n", "2013-09-15 23:44:27.731702016 NaN ... NaN NaN \n", "2013-09-15 23:44:28.092921856 http://www.guyspy.com/ ... NaN NaN \n", "2013-09-15 23:44:28.150300928 http://www.guyspy.com/ ... NaN NaN \n", "2013-09-15 23:44:28.150601984 http://www.guyspy.com/ ... 
NaN NaN \n", "\n", " tags username password proxied orig_fuids \\\n", "ts \n", "2013-09-15 23:44:27.668081920 (empty) NaN NaN NaN NaN \n", "2013-09-15 23:44:27.731702016 (empty) NaN NaN NaN NaN \n", "2013-09-15 23:44:28.092921856 (empty) NaN NaN NaN NaN \n", "2013-09-15 23:44:28.150300928 (empty) NaN NaN NaN NaN \n", "2013-09-15 23:44:28.150601984 (empty) NaN NaN NaN NaN \n", "\n", " orig_mime_types resp_fuids \\\n", "ts \n", "2013-09-15 23:44:27.668081920 NaN Fnjq3r4R0VGmHVWiN5 \n", "2013-09-15 23:44:27.731702016 NaN FCQ5aX37YzsjAKpcv8 \n", "2013-09-15 23:44:28.092921856 NaN FD9Xu815Hwui3sniSf \n", "2013-09-15 23:44:28.150300928 NaN FMZXWm1yCdsCAU3K9d \n", "2013-09-15 23:44:28.150601984 NaN FA4NM039Rf9Y8Sn2Rh \n", "\n", " resp_mime_types \n", "ts \n", "2013-09-15 23:44:27.668081920 text/html \n", "2013-09-15 23:44:27.731702016 text/html \n", "2013-09-15 23:44:28.092921856 text/html \n", "2013-09-15 23:44:28.150300928 text/plain \n", "2013-09-15 23:44:28.150601984 text/plain \n", "\n", "[5 rows x 26 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a Pandas dataframe from the Zeek HTTP log\n", "log_to_df = LogToDataFrame()\n", "zeek_df = log_to_df.create_dataframe('../data/http.log')\n", "print('Read in {:d} Rows...'.format(len(zeek_df)))\n", "zeek_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
\n", "## So... what just happened?\n", "**Yep it was quick... the two little lines of code above turned a Zeek log (any log) into a Pandas DataFrame. The zat package also supports streaming data from dynamic/active logs, handles log rotations, and in general tries to make your life a bit easier when doing data analysis and machine learning on Zeek data.**\n", "\n", "**Now that we have the data in a dataframe there are a million wonderful things we could do for data munging, processing and analysis, but that will have to wait for another time/notebook.**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# We're going to pick some features that might be interesting\n", "# Some of the features are numerical and some are categorical\n", "features = ['id.resp_p', 'method', 'resp_mime_types', 'request_body_len']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Our HTTP features are a mix of numeric and categorical data\n", "When we look at the HTTP records, some of the data is numerical and some of it is categorical, so we'll need a way of handling both data types in a generalized way. zat has a DataFrameToMatrix class, which we'll use below, that handles a lot of the details and mechanics of combining numerical and categorical data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id.resp_p method resp_mime_types \\\n", "ts \n", "2013-09-15 23:44:27.668081920 80 GET text/html \n", "2013-09-15 23:44:27.731702016 80 GET text/html \n", "2013-09-15 23:44:28.092921856 80 GET text/html \n", "2013-09-15 23:44:28.150300928 80 GET text/plain \n", "2013-09-15 23:44:28.150601984 80 GET text/plain \n", "\n", " request_body_len \n", "ts \n", "2013-09-15 23:44:27.668081920 0 \n", "2013-09-15 23:44:27.731702016 0 \n", "2013-09-15 23:44:28.092921856 0 \n", "2013-09-15 23:44:28.150300928 0 \n", "2013-09-15 23:44:28.150601984 0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Show the dataframe with mixed feature types\n", "zeek_df[features].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## Transformers\n", "**We'll now use a scikit-learn transformer class to convert the Pandas DataFrame to a numpy ndarray (matrix). Yes, it's awesome... I'm not sure it's Optimus Prime awesome... but it's still pretty nice.**" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Normalizing column id.resp_p...\n", "Normalizing column request_body_len...\n", "(150, 12)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/briford/.pyenv/versions/3.9.9/envs/py39/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:2631: FutureWarning: The `inplace` parameter in pandas.Categorical.add_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.\n", " res = method(*args, **kwargs)\n" ] }, { "data": { "text/plain": [ "array([[0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0.]], dtype=float32)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use the zat DataFrameToMatrix class (handles categorical data)\n", "# You can see below it uses a heuristic to detect category data. When doing
When doing\n", "# this for real we should explicitly convert before sending to the transformer.\n", "to_matrix = DataFrameToMatrix()\n", "zeek_matrix = to_matrix.fit_transform(zeek_df[features], normalize=True)\n", "print(zeek_matrix.shape)\n", "zeek_matrix[:1]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "IsolationForest(contamination=0.25)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Train/fit and Predict anomalous instances using the Isolation Forest model\n", "odd_clf = IsolationForest(contamination=0.25) # Marking 25% odd\n", "odd_clf.fit(zeek_matrix)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(32, 4)\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id.resp_p method resp_mime_types \\\n", "ts \n", "2013-09-15 23:44:47.464161024 80 GET application/x-dosexec \n", "2013-09-15 23:44:47.464161024 80 GET application/x-dosexec \n", "2013-09-15 23:44:49.221977856 80 GET application/x-dosexec \n", "2013-09-15 23:44:50.805350144 80 GET application/x-dosexec \n", "2013-09-15 23:44:51.404619008 80 GET application/x-dosexec \n", "\n", " request_body_len \n", "ts \n", "2013-09-15 23:44:47.464161024 0 \n", "2013-09-15 23:44:47.464161024 0 \n", "2013-09-15 23:44:49.221977856 0 \n", "2013-09-15 23:44:50.805350144 0 \n", "2013-09-15 23:44:51.404619008 0 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Now we create a new dataframe using the prediction from our classifier\n", "odd_df = zeek_df[features][odd_clf.predict(zeek_matrix) == -1]\n", "print(odd_df.shape)\n", "odd_df.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Normalizing column id.resp_p...\n", "Normalizing column request_body_len...\n" ] } ], "source": [ "# Now we're going to explore our odd dataframe with help from KMeans and PCA algorithms\n", "odd_matrix = to_matrix.fit_transform(odd_df)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id.resp_p method resp_mime_types \\\n", "ts \n", "2013-09-15 23:44:47.464161024 80 GET application/x-dosexec \n", "2013-09-15 23:44:47.464161024 80 GET application/x-dosexec \n", "2013-09-15 23:44:49.221977856 80 GET application/x-dosexec \n", "2013-09-15 23:44:50.805350144 80 GET application/x-dosexec \n", "2013-09-15 23:44:51.404619008 80 GET application/x-dosexec \n", "\n", " request_body_len x y cluster \n", "ts \n", "2013-09-15 23:44:47.464161024 0 1.112839 -0.615775 0 \n", "2013-09-15 23:44:47.464161024 0 1.112838 -0.615774 0 \n", "2013-09-15 23:44:49.221977856 0 1.112838 -0.615774 0 \n", "2013-09-15 23:44:50.805350144 0 1.112838 -0.615774 0 \n", "2013-09-15 23:44:51.404619008 0 1.112838 -0.615774 0 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Just some simple stuff for this example, KMeans and PCA\n", "kmeans = KMeans(n_clusters=4).fit_predict(odd_matrix) # Change this to 3/5 for fun\n", "pca = PCA(n_components=3).fit_transform(odd_matrix)\n", "\n", "# Now we can put our ML results back onto our dataframe!\n", "odd_df['x'] = pca[:, 0] # PCA X Column\n", "odd_df['y'] = pca[:, 1] # PCA Y Column\n", "odd_df['cluster'] = kmeans\n", "odd_df.head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Plotting defaults\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.rcParams['font.size'] = 14.0\n", "plt.rcParams['figure.figsize'] = 15.0, 6.0\n", "\n", "# Helper method for scatter/beeswarm plot\n", "def jitter(arr):\n", " stdev = .02*(max(arr)-min(arr))\n", " return arr + np.random.randn(len(arr)) * stdev" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Jitter so we can see instances that are projected coincident in 2D\n", "odd_df['jx'] = jitter(odd_df['x'])\n", "odd_df['jy'] = jitter(odd_df['y'])\n", "\n", "# Now use dataframe group by cluster\n", "cluster_groups = odd_df.groupby('cluster')\n", "\n", "# Plot the Machine Learning results\n", "colors = {0:'green', 1:'blue', 2:'red', 3:'orange', 4:'purple', 5:'brown'}\n", "fig, ax = plt.subplots()\n", "for key, group in cluster_groups:\n", " group.plot(ax=ax, kind='scatter', x='jx', y='jy', alpha=0.5, s=250,\n", " label='Cluster: {:d}'.format(key), color=colors[key])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Cluster 0: 8 observations\n", " id.resp_p method resp_mime_types request_body_len\n", "ts \n", "2013-09-15 23:44:47.464161024 80 GET application/x-dosexec 0\n", "2013-09-15 23:44:47.464161024 80 GET application/x-dosexec 0\n", "2013-09-15 23:44:49.221977856 80 GET application/x-dosexec 0\n", "2013-09-15 23:44:50.805350144 80 GET application/x-dosexec 0\n", "2013-09-15 23:44:51.404619008 80 GET application/x-dosexec 0\n", "\n", "Cluster 1: 10 observations\n", " id.resp_p method resp_mime_types request_body_len\n", "ts \n", "2013-09-15 23:48:10.495719936 80 POST text/plain 69823\n", "2013-09-15 23:48:11.495719936 80 POST text/plain 69993\n", "2013-09-15 23:48:12.495719936 80 POST text/plain 71993\n", "2013-09-15 23:48:13.495719936 80 POST text/plain 70993\n", "2013-09-15 23:48:14.495719936 80 POST text/plain 72993\n", "\n", "Cluster 2: 7 observations\n", " id.resp_p method resp_mime_types request_body_len\n", "ts \n", "2013-09-15 23:48:03.495719936 8080 GET text/plain 0\n", "2013-09-15 23:48:04.495719936 8080 GET text/plain 0\n", "2013-09-15 23:48:04.495719936 8080 GET text/plain 0\n", "2013-09-15 23:48:04.495719936 8080 GET text/plain 0\n", "2013-09-15 
23:48:04.495719936 8080 GET text/plain 0\n", "\n", "Cluster 3: 7 observations\n", " id.resp_p method resp_mime_types request_body_len\n", "ts \n", "2013-09-15 23:48:06.495719936 80 OPTIONS text/plain 0\n", "2013-09-15 23:48:07.495719936 80 OPTIONS text/plain 0\n", "2013-09-15 23:48:08.495719936 80 OPTIONS text/plain 0\n", "2013-09-15 23:48:08.495719936 80 OPTIONS text/plain 0\n", "2013-09-15 23:48:08.495719936 80 OPTIONS text/plain 0\n" ] } ], "source": [ "# Now print out the details for each cluster\n", "pd.set_option('display.width', 1000)\n", "for key, group in cluster_groups:\n", " print('\\nCluster {:d}: {:d} observations'.format(key, len(group)))\n", " print(group[features].head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####
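KMeans assigns cluster IDs arbitrarily, and with the default `random_state=None` the numbering can change between runs (passing something like `random_state=0` to `KMeans` pins it for a given dataset and scikit-learn version). Describing a cluster by its content is therefore more robust than citing its number. A minimal sketch, assuming a DataFrame shaped like `odd_df` with a `cluster` column; `describe_clusters` is a hypothetical helper (not part of zat or scikit-learn), and the toy rows are made up to resemble the printout above.

```python
# Hypothetical helper: summarize each cluster by its most common feature values.
import pandas as pd


def describe_clusters(df, cluster_col, features):
    # For every cluster, take the most frequent value (mode) of each feature
    summary = df.groupby(cluster_col)[features].agg(lambda col: col.mode().iloc[0])
    # Add the number of rows in each cluster
    summary['size'] = df.groupby(cluster_col).size()
    return summary


# Made-up stand-in for odd_df, loosely modeled on the cluster printout
toy = pd.DataFrame({
    'id.resp_p': [80, 80, 8080, 80, 80],
    'method': ['GET', 'GET', 'GET', 'OPTIONS', 'POST'],
    'resp_mime_types': ['application/x-dosexec', 'application/x-dosexec',
                        'text/plain', 'text/plain', 'text/plain'],
    'request_body_len': [0, 0, 0, 0, 69823],
    'cluster': [0, 0, 2, 3, 1],
})
print(describe_clusters(toy, 'cluster',
                        ['id.resp_p', 'method', 'resp_mime_types', 'request_body_len']))
```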
\n", "## Categorical variables that are anomalous\n", "- Cluster 0: application/x-dosexec mime_types\n", "- Cluster 1: http method of OPTIONS (instead of normal GET/POST)\n", "- Cluster 2: See Below\n", "- Cluster 3: response port of 8080 (instead of 80)\n", "\n", "## Numerical variable outliers\n", "- Cluster 2: The request_body_len values are outliers (for this demo dataset)\n", "\n", "**The important thing here is that both categorical and numerical variables were properly handled and the machine learning algorithm 'did the right thing' when marking outliers (for categorical and numerical fields)**" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "For this small demo dataset almost all request_body_len are 0\n", "Cluster 2 represents outliers\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA3QAAAF9CAYAAABMACpaAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAopElEQVR4nO3de5hlZX0n+u9PQGVsRBO0MRguHi8hSg6RNhEiWm1EkxiNUXMcISomEY2KOsOMipIEr4NOvBDQHHA0iIKtiSdeUcHRkgQMI4wXUFCPghi5NAhBmzQi5J0/1qqw2RbQ1b2r9l7U5/M869l7v+vda717/aqr97fWrVprAQAAYHjuMu0BAAAAsHUEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugA4HZU1dFV1apq1xVY13xVzQ9t2QBMj0AHwJ1eVf1OVR097XEAwKQJdACsBr+T5C+mPQgAmDSBDoBFVdU9pj0GAOD2CXQAjJ4n9rCqel9VXZPkgn7e46vqC1W1qZ8+XVX7LrKMp1TVBVV1Q//4+1V1UlVdMtJnrl/P3Nh79+zbDx1rf3BVfaiqftgv98tV9fSxPttX1VFV9a2q2lxV11TVOVX11H7+SUle1D9vI9OeS9xMP1dVp1bVdVV1bVWdUFVrFtkOLxjZDlf0/X5ukX6HVdV3+jH/r6o6cGz+PavqX6vqrxZ5789X1Y1V9aYlfobx5VRVHV5V5/fj3VhV766qXcb6XdLX/VH9WG+oqu9W1bO3Zf0AbDuBDoBRH0xy7yRHJTm2qg5O8ukkNyQ5MsnRSR6Q5B+q6pcW3lRVj0/y4f7lq5L8fZL3JFm3tQOpqr2TnJNknyRvTnJEkh8m+duq+sORrn+R5LVJvpDkJf3zi5L8Wj//hCRn9M+fNTJdtcQhbUi3bV6V5O+SHJbkQ2NjPirJXye5Msl/7d/zR0k+V1V3G+n3x/24rkjyiiT/kOSjSX5xoU9r7UdJPpLkGVW1/dhYnpFkhyQnL/EzjPvrJG9Nt51fmuTEJE9P8vmquvtY373Sfe4z0tXi2iQnVdVDt3EMAGyD8f8gAFjdLmqtPS3590Muv5/kpNbaHy10qKp3J/lmkj9Pc
nDf/KYkG5P8Rmvtur7f55P8zyTf28qxHJvksiTrWmub+7Z3VNXpSY6pqlNaay3J7yY5rbX2vMUW0lr7YlV9K8lBrbX3b+VYkuQHSX6nX2eq6vIkf1ZVj2utfbaq7pPkz9J95ie01m7u+30lyd8keV6S46tqhyRvTPKVJOtbazf2/b6e5N3ptvmCk5M8M8njk5w20v6HSb7cWvv61n6YqjogyfOTPKe1dvJI+6fTBcxnpwt4Cx6c5DGttTP7fh/qx/rcJP9la8cBwLaxhw6AUX898vygdHukTq2qXRamJNul+8K/Pkmq6n5J9k3yvoUwlySttc8l2arA0R+i+Lh0e8DuMbb+TyfZLV3ASJLrkjy0qh68+NIm5viFMNdbOBTyd/vHxyW5a5JjF8Jc733p9tg9sX+9Lsl9k7xrIcz1Tk7yL2PrPCPJ5en2KCZJquoBSfbvl7st/p8km5J8emz7XtSPd/1Y/28thLkkaa1dlS7YP2AbxwHANrCHDoBR3xl5vhCQzlisY5J/6x/36B+/vUifbyV5+FaM44FJKt0hnkffRp/75pY9hR9J8s2q+kaSzyQ5tbV27las9/bc6vO11q6uqmuT7Nk3LWyHb471u7mqvr1Iv/Hl3VRVFy/y3vcneVFV7dRa+3G6vXM3J/nAtn2cPDjJmnThbTH3HXt96SJ9rk0X+gGYEoEOgFGbR54vHMVxaLrDDSeh3Ub7dmOvF9b9ttz6UMNRFyRJa+3Mqvq/kjwp3aGJz07ysqp6ZWvtzds43llwcrrz8Z6a5L1JDklyRmvtim1c7l3SnZP4H29j/rVjr29etFcXvAGYEoEOgNuysLfuqtbaZ2+n38I5cg9aZN74YZALIeFeY+17jL3+bv940x2sO0nSWrs2XfA5uap2TBcCX1NVb+kPf7ytILkUD0q3xzFJ0h+eeO8kl/RNC9vhIWP97tK/98tj/R6Ukb2f/YVP9kry1bHPdkFV/e8kz6qqC9Nt09dM4PN8J91htf/UWts0geUBMAXOoQPgtnwm3Tldr6qqu47P7C8Cktba5eku8PGsqtp5ZP5jk4xfAfF76fb0PHqs/YWjL1prG5N8Psnzqmq321p3//znx967Od15YHdPsmPffH3fd1sOD3xxVY3ujXpJ//jJ/vGMJDcmeUkf4hYckmRtkk/0r89Nd4XN541t12fnZ4PugvemO6ft5Ul+nO4qotvqg+m+B/z5+Iyq2m4btxUAK8QeOgAW1Vr7UVW9IMkpSb5cVR9Id77V7kl+K90FTw7tux+ZLtj8Y1X9TbpgcnjfZ83IMq+rqr9NcnhVtXR7iX43P3u+VpL8aZKzknytqt7V971vkl9P8svpzrNLkgur6swkX0pydZL/O8mfJPnEyJ6nhfPpjq+qTyW5KcnHW2vXL2GT7JbktKr6RL+O5yU5vbV2Rv/Zrq6q1yV5XZLTq+oj6S4Y8uJ0e93+R9/vp/3tDU5Id3uADenOr3tubtkzOe4DSf4yydPSXXV0823022L9oarvSPJfq+pX0gX4n6Tbrk9PF/RO2tb1ALC8BDoAblNr7YNVdVm6e68dkW6v12XpgtYJI/0+XVV/kOT16S7J/51091/7vSRzY4s9PN091F6QLkB8KN05YheMrfubVbUu3X3mnp1kl3R7tr6a7vYAC96e5MlJHptuj9z3kxyT7lYKC/6/vt8z+6nSHd64lED3zCSv7j9fSxfQjhgb8+ur6ur+M74l3R7Ok5Ic2Vr7yUi/E6tqu/5z//ck56fbVq9bbMWttav6IPrkbPvVLUeX++L+cM4XJHlDuqB7abqafG5S6wFg+dStr8AMAJNTVSclmWut7TnloQxev2fzkUn2aK392x31B2B1cA4dAMy4qrpv+r1zwhwAoxxyCcCq1F/AZcc76HbV2E3CV1RV7ZXkN9IdvvpvSd65SJ9d72AxN7bWrlmG4QEwAwQ6AFarY5M85w767JVbbkswDY9J8jfpzgs8tLX2z4v0ufwOlvGF/
Ox5jADcSTiHDoBVqap+Ockv3EG3f2yt3bAS49laVfW4O+hybWvtvBUZDAArTqADAAAYqEEccrnLLru0Pffcc9rD+BnXX3997nGPe0x7GIxQk9mkLrNHTWaTusweNZlN6jJ71GT5nXfeeVe31u4z3j6IQLfnnnvm3HPPveOOK2x+fj5zc3PTHgYj1GQ2qcvsUZPZpC6zR01mk7rMHjVZflX1vcXa3bYAAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgtijQVdWjq+pjVfWDqmpVdejt9D2h7/NfxtrvVlXHVdXVVXV9v7z7b+P4AQAAVq0t3UO3JskFSV6aZPNtdaqqpyf5tSSXLTL77UmeluSZSQ5Mcs8kn6iq7ZYwXgAAAHrbb0mn1tppSU5Lkqo6abE+VbVHkmOTPC7Jp8bm7Zzkj5M8t7V2Rt/2rCTf6/t/ZuuGDwAAsHpN5By6qto+yQeSvL61duEiXfZLskOS0xcaWmvfT3JhkgMmMQYAAIDVplprS3tD1aYkL26tnTTS9oYk+7TWnty/viTJ8a21v+xfH5zk5CQ7tJEVVtXnkny7tfb8RdZzWJLDkmTt2rX7bdiwYWmfbAVsvOa6XHmbB6Aybp/ddl72dWzatClr1qxZ9vWwNOoye9RkNqnL7FGT2aQus0dNlt/69evPa62tG2/fokMub09VzSU5NMm+27qsUa21E5OcmCTr1q1rc3Nzk1z8RBx3ykfzlvO3eROuGpccMrfs65ifn88s/qysduoye9RkNqnL7FGT2aQus0dNpmcSh1zOJblfksur6qaquinJHkneVFX/3Pe5Isl2SXYZe+/afh4AAABLNIlA984kv5JuD93CdFmStyX5zb7PeUl+muSghTf1tyzYO8nZExgDAADAqrNFxwtW1ZokD+xf3iXJ7lW1b5JrWmuXJtk41v+nSa5orX0zSVpr11XVu5O8uao2Jvlhkrcm+VqSz07igwAAAKw2W7qHbl2SL/fTjkle0z9/7RLW9bIkf5/kg0nOSrIpyZNaazcvYRkAAAD0tvQ+dPNJaksX2lrbc5G2nyQ5vJ8AAADYRhO5Dx0AAAArT6ADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABioLQp0VfXoqvpYVf2gqlpVHToyb4eqelNVfa2qrq+qy6vq1KrafWwZd6uq46rq6r7fx6rq/hP+PAAAAKvGlu6hW5PkgiQvTbJ5bN5/SPLwJG/oH38vyS8m+XRVbT/S7+1JnpbkmUkOTHLPJJ+oqu22dvAAAACr2fZ33CVprZ2W5LQkqaqTxuZdl+Sg0baqen6SryfZO8n5VbVzkj9O8tzW2hl9n2cl+V6SxyX5zDZ9CgAAgFVouc6hu2f/eG3/uF+SHZKcvtChtfb9JBcmOWCZxgAAAHCnVq21pb2halOSF7fWTrqN+XdN8vkkP2ytPblvOzjJyUl2aCMrrKrPJfl2a+35iyznsCSHJcnatWv327Bhw5LGuRI2XnNdrhw/AJXbtM9uOy/7OjZt2pQ1a9Ys+3pYGnWZPWoym9Rl9qjJbFKX2aMmy2/9+vXntdbWjbdv0SGXW6o/Z+79Se6V5MnbsqzW2olJTkySdevWtbm5uW0d3sQdd8pH85bzJ7oJ79QuOWRu2dcxPz+fWfxZWe3UZfaoyWxSl9mjJrNJXWaPmkzPxA657MPcB5L8SpLfbK39cGT2FUm2S7LL2NvW9vMAAABYo
okEuqraIckH04W59a218ZB2XpKfZuTiKf0tC/ZOcvYkxgAAALDabNHxglW1JskD+5d3SbJ7Ve2b5JoklyX52ySPSPKkJK2qdu37Xtda29xau66q3p3kzVW1MckPk7w1ydeSfHZSHwYAAGA12dI9dOuSfLmfdkzymv75a5PcP929534h3Z64y0emZ4ws42VJ/j7dnryzkmxK8qTW2s3b+iEAAABWoy29D918krqdLrc3b2EZP0lyeD8BAACwjZbrPnQAAAAsM4EOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABioLQp0VfXoqvpYVf2gqlpVHTo2v6rq6Kq6rKo2V9V8VT10rM+9q+p9VXVdP72vqu41uY8CAACwumzpHro1SS5I8tIkmxeZ//IkRyQ5PMkjkmxMckZV7TTS59QkD0/yW/308CTv27phAwAAsP2WdGqtnZbktCSpqpNG51VVJXlZkmNaax/u256TLtQdnOSEqto7XYh7VGvti32f5yf5h6p6SGvtmxP5NAAAAKvIJM6h2yvJrklOX2horW1OcmaSA/qm/ZNsSnL2yPvOSnL9SB8AAACWYIv20N2BXfvHK8far0yy20ifq1prbWFma61V1caR999KVR2W5LAkWbt2bebn5ycw1Mlau2NyxD43TXsYg7ESNdy0adNM/qysduoye9RkNqnL7FGT2aQus0dNpmcSgW5ZtNZOTHJikqxbt67Nzc1Nd0CLOO6Uj+Yt58/sJpw5lxwyt+zrmJ+fzyz+rKx26jJ71GQ2qcvsUZPZpC6zR02mZxKHXF7RP64da187Mu+KJPfpz7dL8u/n3t13pA8AAABLMIlAd3G6UHbQQkNV3T3JgbnlnLkvprtS5v4j79s/yT1y6/PqAAAA2EJbdLxgVa1J8sD+5V2S7F5V+ya5prV2aVW9PcmrquqiJN9KclS6i6CcmiSttQur6tPprnh5WL+cE5J8whUuAQAAts6W7qFbl+TL/bRjktf0z1/bz39zkrcleUeSc5PcL8njW2s/HlnGwUm+muQz/fTVJM/axvEDAACsWlt6H7r5JHU781uSo/vptvpcm+QPlzQ6AAAAbtMkzqEDAABgCgQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgJhLoqmq7qnpdVV1cVTf0j6+vqu1H+lRVHV1Vl1XV5qqar6qHTmL9AAAAq9Gk9tC9IsmLkrwkyS8leWn/+siRPi9PckSSw5M8IsnGJGdU1U4TGgMAAMCqsv0dd9kiByT5eGvt4/3rS6rqY0l+Pen2ziV5WZJjWmsf7tueky7UHZzkhAmNAwAAYNWY1B66f0yyvqp+KUmq6peTPDbJaf38vZLsmuT0hTe01jYnOTNdGAQAAGCJqrW27Qvp9sC9Pt0hljen2/P3htbaUf38A5KclWSP1tqlI+97T5LdWmtPWGSZhyU5LEnWrl2734YNG7Z5nJO28ZrrcuXmaY9iOPbZbedlX8emTZuyZs2aZV8PS6Mus0dNZpO6zB41mU3qMnvUZPmtX7/+vNbauvH2S
R1y+Ywkz053+OTXk+yb5Niquri19u6tWWBr7cQkJybJunXr2tzc3GRGOkHHnfLRvOX8SW3CO79LDplb9nXMz89nFn9WVjt1mT1qMpvUZfaoyWxSl9mjJtMzqTTy35P8ZWttYTfa+VW1R7o9du9OckXfvjbJpSPvWzsyDwAAgCWY1Dl0/yHdoZajbh5Z/sXpgttBCzOr6u5JDkxy9oTGAAAAsKpMag/dx5O8sqouTnfI5a8m+c9JTk6S1lqrqrcneVVVXZTkW0mOSrIpyakTGgMAAMCqMqlAd3iS1yV5Z5L7Jrk8ybuSvHakz5uT7JjkHUnuneScJI9vrf14QmMAAABYVSYS6PpQ9rJ+uq0+LcnR/QQAAMA2mtQ5dAAAAKwwgQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAmFuiq6n5V9d6quqqqbqiqb1TVY0bmV1UdXVWXVdXmqpqvqodOav0AAACrzUQCXVXdK8lZSSrJE5PsneTwJBtHur08yRF9+yP6eWdU1U6TGAMAAMBqs/2ElvPyJJe31p490nbxwpOqqiQvS3JMa+3Dfdtz0oW6g5OcMKFxAAAArBqTOuTyKUnOqaoPVtXGqvpKVb24D3JJsleSXZOcvvCG1trmJGcmOWBCYwAAAFhVqrW27QupuqF/+rYkH0qyb5LjkryytXZ8VR2Q7pDMPVprl4687z1JdmutPWGRZR6W5LAkWbt27X4bNmzY5nFO2sZrrsuVm6c9iuHYZ7edl30dmzZtypo1a5Z9PSyNusweNZlN6jJ71GQ2qcvsUZPlt379+vNaa+vG2yd1yOVdkpzbWjuyf/3lqnpQkhclOX5rFthaOzHJiUmybt26Njc3N4lxTtRxp3w0bzl/Upvwzu+SQ+aWfR3z8/OZxZ+V1U5dZo+azCZ1mT1qMpvUZfaoyfRM6pDLy5N8Y6ztwiS798+v6B/XjvVZOzIPAACAJZhUoDsryUPG2h6c5Hv984vTBbeDFmZW1d2THJjk7AmNAQAAYFWZVKB7W5JHVtWrq+qBVfUHSV6S5B1J0roT9d6e5BVV9dSqeliSk5JsSnLqhMYAAACwqkzkBLDW2peq6ilJ3pjkz5Jc2j++c6Tbm5PsmC7k3TvJOUke31r78STGAAAAsNpM7IoerbVPJvnk7cxvSY7uJwAAALbRpA65BAAAYIUJdAAAAAMl0AEAAAyUQAcAADBQAh0AAMBACXQAAAADJdABAAAMlEAHAAAwUAIdAADAQAl0AAAAAyXQAQAADJRABwAAMFACHQAAwEAJdAAAAAMl0AEAAAyUQAcAADBQAh0AAMBACXQAAAADJdABAAAMlEAHAAAwUAIdAADAQAl0AAAAAyXQAQAADJRABwAAMFACHQAAwEAJdAAAAAMl0AEAAAyUQAcAADBQAh0AAMBACXQAAAADJdABAAAMlEAHAAAwUAIdAADAQC1LoKuqI6uqVdXxI21VVUdX1WVVtbmq5qvqocuxfgAAgNVg4oGuqh6Z5LAkXxub9fIkRyQ5PMkjkmxMckZV7TTpMQAAAKwGEw10VbVzklOS/FGSa0faK8nLkhzTWvtwa+2CJM9JslOSgyc5BgAAgNVi0nvoTkzyd621z4+175Vk1ySnLzS01jYnOTPJARMeAwAAwKpQrbXJLKjqeUlekOSRrbWfVtV8kgtaay+uqgOSnJVkj9bapSPveU+S3VprT1hkeYelO3Qza9eu3W/Dhg0TGeckbbzmuly5edqjGI59dtt52dexa
dOmrFmzZtnXw9Koy+xRk9mkLrNHTWaTusweNVl+69evP6+1tm68fftJLLyqHpLkjUke1Vr76SSW2Vo7Md0ev6xbt67Nzc1NYrETddwpH81bzp/IJlwVLjlkbtnXMT8/n1n8WVnt1GX2qMlsUpfZoyazSV1mj5pMz6QOudw/yS5Jvl5VN1XVTUkek+SF/fMf9v3Wjr1vbZIrJjQGAACAVWVSge4jSfZJsu/IdG6SDf3zb6ULbgctvKGq7p7kwCRnT2gMAAAAq8pEjhdsrf1Lkn8Zbauq65Nc01/RMlX19iSvqqqL0gW8o5JsSnLqJMYAAACw2qzkCWBvTrJjknckuXeSc5I8vrX24xUcAwAAwJ3GsgW61trc2OuW5Oh+AgAAYBtN+j50AAAArBCBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYqIkEuqo6sqq+VFU/qqqrqurjVfWwsT5VVUdX1WVVtbmq5qvqoZNYPwAAwGo0qT10c0nemeSAJI9NclOSz1bVz430eXmSI5IcnuQRSTYmOaOqdprQGAAAAFaV7SexkNbaE0ZfV9WzklyX5DeSfLyqKsnLkhzTWvtw3+c56ULdwUlOmMQ4AAAAVpPlOodup37Z1/av90qya5LTFzq01jYnOTPdXj0AAACWqFprk19o1YeSPCjJutbazVV1QJKzkuzRWrt0pN97kuw2voevn3dYksOSZO3atftt2LBh4uPcVhuvuS5Xbp72KIZjn912XvZ1bNq0KWvWrFn29bA06jJ71GQ2qcvsUZPZpC6zR02W3/r1689rra0bb5/IIZejquqtSR6V5FGttZu3djmttROTnJgk69ata3Nzc5MZ4AQdd8pH85bzJ74J77QuOWRu2dcxPz+fWfxZWe3UZfaoyWxSl9mjJrNJXWaPmkzPRA+5rKq3JXlmkse21r47MuuK/nHt2FvWjswDAABgCSYW6Krq2NwS5i4am31xuuB20Ej/uyc5MMnZkxoDAADAajKR4wWr6h1JnpXkKUmurapd+1mbWmubWmutqt6e5FVVdVGSbyU5KsmmJKdOYgwAAACrzaROAHth//g/x9pfk+To/vmbk+yY5B1J7p3knCSPb639eEJjAAAAWFUmdR+62oI+LV24O3oS6wQAAFjtlus+dAAAACwzgQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGavtpDwAAAFg+e77yk8u+jiP2uSmHrsB6VsIlxzxx2kNYEnvoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZKoAMAABgogQ4AAGCgBDoAAICBEugAAAAGSqADAAAYKIEOAABgoAQ6AACAgRLoAAAABkqgAwAAGCiBDgAAYKAEOgAAgIES6AAAAAZqxQNdVb2wqi6uqhuq6ryqOnClxwAAAHBnsKKBrqqekeTYJG9M8qtJzk7yqarafSXHAQAAcGew0nvo/nOSk1pr72qtXdhaOzzJ5Un+dIXHAQAAMHgrFuiq6q5J9kty+tis05McsFLjAAAAuLPYfgXXtUuS7ZJcOdZ+ZZLHjXeuqsOSHNa/3FRV31ze4W2VXZJcPe1BDEW9aUVWoyazSV1mj5rMJnWZPWoym9RlxrzkTlSTFfrOujX2WKxxJQPdkrTWTkxy4rTHcXuq6tzW2rppj4NbqMlsUpfZoyazSV1mj
5rMJnWZPWoyPSt5Dt3VSW5OsnasfW2SK1ZwHAAAAHcKKxboWms3JjkvyUFjsw5Kd7VLAAAAlmClD7l8a5L3VdX/SnJWkhck+YUk/+8Kj2NSZvqQ0FVKTWaTusweNZlN6jJ71GQ2qcvsUZMpqdbayq6w6oVJXp7kfkkuSPKfWmtnruggAAAA7gRWPNABAAAwGSt9Y3EAAAAmRKDbClX1wqq6uKpuqKrzqurAaY9pqKrq0VX1sar6QVW1qjp0bH5V1dFVdVlVba6q+ap66Fife1fV+6rqun56X1Xda6zPPlX1hX4ZP6iqP6+qGuvztKr6RlX9pH/8/eX63LOsqo6sqi9V1Y+q6qqq+nhVPWysj7qsoKp6UVV9ra/Jj6rqi1X1xJH56jFl/b+bVlXHj7Spywrrt3cbm64Yma8mU1JV96uq91b3/8oN/TZ5zMh8tVlBVXXJIv9WWlV9cqTP7X7fraq7VdVxVXV1VV1f3fe5+4/12b267xHX9/3+qqruOtbnMf3yb6iq71bVC5b3098JtdZMS5iSPCPJT5M8L8neSY5LsinJ7tMe2xCnJL+T5I1Jnp7kX5McOjb/FUl+nORpSR6W5ENJLkuy00ifTyX5epL9++nrST4+Mv+e6W6N8aF+GU/vl3nESJ/9k9yU5NV9XV/dv/71aW+jKdTkM0me22+rfZL8fb/9fk5dplaT30vy20kemOTBSd7Q/x76FfWY/pTkkUkuTvLVJMePtKvLytfi6CQXJdl1ZLqPmky9LvdK8t0kJyf5tSR7JfnNJHurzdRqcp+xfye/muTfkjynn3+H33eT/HVfo4OSPDzJfJKvJNmun79dkvP79of3/S5LctzIMvZKcn2//L379f00ydOmvY2GNE19AEObkpyT5F1jbd9O8t+mPbahT/0vikNHXleSy5O8eqRtx/6X8/P713snaUl+Y6TPo/q2h/Sv/zTJj5LsONLnqCQ/yC3nkX4wyRlj4/lskg9Me7tMe0qyJt09JJ+kLrMzJbkmyfPVY+p12DnJd5Ks77+0HN+3q8t06nF0kgtuY56aTK8ub0xy1u3MV5vp1+jVSf5lYdvlDr7v9r/7bkxyyMj8X0wXCp/Qv/7t/vUvjvT5wyQ3JLln//pNSb49tp7/keSL094mQ5occrkE/S7i/ZKcPjbr9CQHrPyI7vT2SvdXo3/f3q21zUnOzC3be/90QXD0XoZnpftrz2iff+jfu+Az6W6ZsedIn/G6fibqmiQ7pTs8+9r+tbpMUVVtV1X/MV3QPjvqMW0nJvm71trnx9rVZXoe0B+2d3FVbaiqB/TtajI9T0lyTlV9sKo2VtVXqurFI4dCqs0U9XX44yTvb61t3sLvu/sl2SG3rtn3k1yYW9fjwr59wWeS3K1//0Kfxeqxrqp22JbPtZoIdEuzS7rdx1eOtV+Z7hcRk7WwTW9ve++a5KrW/0knSfrnG8f6LLaMbEEfdU2OTXcIxRf71+oyBf15IZuS/CTdvTt/v7V2ftRjaqrqeekOgz1qkdnqMh3nJDk0yW+lO3Rr1yRnV9XPR02m6QFJXpjusMsnpPt/5ZgkL+rnq810HZQuVL+rf70l33d3TXf0ztV30Gd8GVf377ujemzfj4MtsNI3FgcGpKremu6Qlke11m6e9nhWuW8m2TfdYS5PT/Leqpqb4nhWtap6SLrDyB7VWvvptMdDp7X2qdHXVfVP6ULEc5L801QGRdLtQDi3tXZk//rLVfWgdIHu+Nt+GyvkeUm+1Fr76rQHwtaxh25pFv6qsHasfW26k3CZrIVtenvb+4ok9xm9glX//L5jfRZbRragz6qta1W9Lckzkzy2tfbdkVnqMgWttRtba/9/a+28/kvRV5L8p6jHtOyf7q/HX6+qm6rqpiSPSfLC/vkP+37qMkWttU3pLpzxoPi3Mk2XJ/nGWNuFSXbvn6vNlFTVfdNdeOtdI81b8n33inR78cb3oo33GV/Gwt6/O6rHTfnZvX/cBoFuCVprN
yY5L92u6VEH5dbHdDMZF6f7h/7v27uq7p7kwNyyvb+Y7lyi/Ufet3+Se4z1ObB/74KFKy1dMtJHXXtVdWxuCXMXjc1Wl9lwl3TnIajHdHwk3VVg9x2Zzk2yoX/+rajL1PXb7ZfSBQr/VqbnrCQPGWt7cJLv9c/VZnoOTXco/wcWGrbw++556a5GOVqz+6e7eM1oPfYeu5XBQf36zhvps9h6znX0wxJM+6osQ5vSXcb1xiR/ku6H9th0J+nuMe2xDXFK98t533761yR/3j/fvZ//iiTXJXlquksQb8jilzE+P7dcxvj83Poyxjun+49iQ7+Mp6a7CtboZYwPSPfXoFem+8//yHS/qFbVZYz7bfGOfvs8Nre+pPGakT7qsrI1OSbdF5s904WI/5buymG/rR6zM2XkKpfqMrUa/GW6PaV7Jfn1JJ/ot9ceajLVujyi//yvTnfe6R/0dXjRSB+1Wfm6VLo/Pr1rkXl3+H033W0L/jnJ49Ld9uDzWfy2BZ/r5z8u3RVHF7ttwdv79fxJv163LVhKLac9gCFO6U7svSS3/IXh0dMe01CnJHPpLjk8Pp3Uz690l6G+PN1lbr+Q5GFjy7h3kvf3v7R/1D+/11iffdJdLeuGfll/kf4SxiN9np7u/kU3pjsU5KnT3j5Tqsli9WhJjh7poy4rW5OT0v0l+yfpLgDw2fSXhVaP2Znys4FOXVa+Bgsh4MZ0Xxw/nOSX1WT6U5InprtX4w3pQsRLRreZ2kylJuvT/f/+a7cx/3a/76Y7SuS4dIeY/2uSj2fkFgV9n93T/WHlX/t+f5XkbmN9HpPkf/fruTjJC6a9bYY2LdyTAwAAgIFxDh0AAMBACXQAAAADJdABAAAMlEAHAAAwUAIdAADAQAl0AAAAAyXQAQAADJRABwAAMFACHQAAwED9H+7XatI7Q85VAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Distribution of the request body length\n", "zeek_df[['request_body_len']].hist()\n", "print('\\nFor this small demo dataset almost all request_body_len are 0\\nCluster 2 represents outliers')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## The anomalies identified by the model might be fine/expected\n", "Looking at the anomalous clusters for this small demo http log reveals four clusters that may be perfectly fine, so we're not equating anomalous with 'bad'. An anomaly detection algorithm can bring latent issues to the attention of threat hunters and system administrators. A flagged result might be expected behavior, a misconfigured appliance, or something more nefarious that needs attention from the security team.\n", "\n", "\n", "If you liked this notebook, please visit the [zat](https://github.com/SuperCowPowers/zat) project for more notebooks and examples.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" } }, "nbformat": 4, "nbformat_minor": 2 }