{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Investigate Web Application Firewall (WAF) Data
\n", "\n", "**Author:** Vani Asawa
\n", "**Date:** December 2020
\n", "**Notebook Version:** 1.0
\n", "**Python Version:** Python 3.6
\n", "**Required Packages:** msticpy, pandas, kqlmagic
\n", "**Data Sources Required:** WAF data (AzureDiagnostics)
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is the purpose of this Notebook?\n", "\n", "Web Application Firewall (WAF) data records the monitored and blocked HTTP traffic to and from a web service. \n", "Because of the large volume of HTTP requests made to such services in any workspace, the data tends to be very noisy, which can prevent an analyst from determining whether any bad requests, potentially part of a malicious attack, were made to the servers.\n", "\n", "\n", "This notebook analyses the blocked WAF alerts and aims to surface any unusual HTTP requests made by the client IPs to the servers, using a variety of statistical techniques applied to several features of the WAF data, such as the Rule ID of the triggering event, the HTTP status code returned to the client from the alerts, and the contents of the request URIs themselves.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "**[Distribution of WAF logs and blocked alerts over an extended time frame](#DistributionofWAF)**\n", "1. Set an extended time frame to visualise the distribution of the logs/alerts on a bar graph\n", "\n", "**[What is the distribution of WAF blocked alerts over Rule IDs, http-status codes, and client IP entities?](#DistOver_rID_http_ip)**\n", "1. Set a time frame (recommended: time period of interest, after analysing the distribution of alerts in the extended time frame)\n", "2. Pick a host entity to explore in further detail\n", "3. Set x and y axes from the variables above, and view the number of alerts over the designated time frame.\n", "\n", "**[Cluster the request URIs in WAF blocked alerts, based on TFIDF scores](#ClusterURIs)**\n", "\n", "*Term frequency-inverse document frequency (TFIDF)* score is a numerical statistic of how important a variable is to a document.
The score increases with the variable's frequency in the document, and decreases with the number of documents that contain the variable. More information about TFIDF can be found [here](https://www.researchgate.net/publication/326425709_Text_Mining_Use_of_TF-IDF_to_Examine_the_Relevance_of_Words_to_Documents)\n", "\n", "In our analysis, the *variables* are the 'split URIs' and 'rule IDs', while a single *document* is all the blocked alerts for a single client IP in the selected time frame. We will assess the relative importance of every token of the split request URIs, and of the number of times a rule ID is triggered for our blocked alerts, over multiple such 'documents'. We will use these two sets of scores to cluster the request URIs, and obtain single/grouped sets of interesting (and potentially malicious) request URIs that were blocked by the WAF.\n", "\n", "1. Compute TFIDF scores based on the following 2 approaches:\n", "   - Request URIs split on \"/\" against the client IP entities\n", "   - Number of blocked alerts for every Rule ID against the client IP entities\n", "2. Visualise the TFIDF scores for both approaches\n", "3. Perform DBSCAN clustering + PCA to obtain the clustered and outlier request URIs for both approaches\n", "4. Run a KQL query to further examine the WAF logs and blocked alerts in the time frames with outlier request URIs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the Notebook\n", "\n", "**Prerequisites**\n", "\n", "- msticpy - install the latest using pip install --upgrade msticpy\n", "- pandas - install using pip install pandas\n", "- kqlmagic\n", "\n", "**Running the Notebook**\n", "\n", "The best way of using the notebook is as follows:\n", "\n", "1. 
Individually run all of the cells up to the start of Section I:\n", "   - Initialization and installation of libraries\n", "   - Authenticating to the workspace\n", "   - Setting notebook parameters\n", "\n", "2. Default parameters will allow the entire notebook to run from Section I using the 'Run Selected Cell and All Below' option under the Run tab. However, for added value, run the cells sequentially in any given section. \n", "   - At the beginning of each section, set the time parameters. It is recommended that the first and third sections have a larger time frame than the second and fourth sections.\n", "   - Wait for the cell to finish running before proceeding\n", "   - Select the options from the widget boxes when displayed and proceed." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\r\n", "import os\r\n", "import sys\r\n", "from IPython.display import display, HTML\r\n", "\r\n", "REQ_PYTHON_VER=(3, 6)\r\n", "REQ_MSTICPY_VER=(1, 0, 0)\r\n", "REQ_MP_EXTRAS = [\"ml\", \"kql\"]\r\n", "\r\n", "update_nbcheck = (\r\n", "    \"

\"\r\n", " \"Warning: we needed to update 'utils/nb_check.py'
\"\r\n", " \"Please restart the kernel and re-run this cell.\"\r\n", " \"

\"\r\n", ")\r\n", "\r\n", "display(HTML(\"

Starting Notebook setup...

\"))\r\n", "if Path(\"./utils/nb_check.py\").is_file():\r\n", "    try:\r\n", "        from utils.nb_check import check_versions\r\n", "    except ImportError as err:\r\n", "        %xmode Minimal\r\n", "        !curl https://raw.githubusercontent.com/Azure/Azure-Sentinel-Notebooks/master/utils/nb_check.py > ./utils/nb_check.py 2>/dev/null\r\n", "        display(HTML(update_nbcheck))\r\n", "    if \"check_versions\" not in globals():\r\n", "        raise ImportError(\"Old version of nb_check.py detected - see instructions below.\")\r\n", "    %xmode Verbose\r\n", "    check_versions(REQ_PYTHON_VER, REQ_MSTICPY_VER, REQ_MP_EXTRAS)\r\n", "\r\n", "\r\n", "# If not using Azure Notebooks, install msticpy with\r\n", "# !pip install msticpy\r\n", "from msticpy.nbtools import nbinit\r\n", "nbinit.init_notebook(\r\n", "    namespace=globals(),\r\n", "    additional_packages=[\"adjustText\", \"plotly\"]\r\n", ");\r\n", "\r\n", "from ipywidgets import widgets\r\n", "import plotly.graph_objects as go\r\n", "import plotly.express as px\r\n", "import re\r\n", "from sklearn.feature_extraction.text import TfidfVectorizer\r\n", "\r\n", "%matplotlib inline\r\n", "\r\n", "from sklearn.cluster import KMeans\r\n", "from sklearn import metrics\r\n", "from sklearn.cluster import DBSCAN\r\n", "from sklearn.decomposition import PCA\r\n", "from adjustText import adjust_text\r\n", "import itertools\r\n", "import ipaddress\r\n", "import traceback\r\n", "\r\n", "pd.set_option('display.max_rows', 100)\r\n", "pd.set_option('display.max_columns', 50)\r\n", "pd.set_option('display.max_colwidth', None)\r\n", "\r\n", "layout = widgets.Layout(width=\"50%\", height=\"80px\")\r\n", "style = {\"description_width\": \"200px\"}\r\n", "\r\n", "class color:\r\n", "    BOLD = '\\033[1m'\r\n", "    END = '\\033[0m'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "63fccc98a51a416099e1a34899300648", "version_major": 2, 
"version_minor": 0 }, "text/plain": [ "Text(value='', description='Enter Tenant ID: ', layout=Layout(height='30px', width='50%'), placeholder='Enter …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "12b9e02ad30140f1b7694f5e337ce42d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='', description='Enter Workspace ID: ', layout=Layout(height='30px', width='50%'), placeholder='Ent…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# See if we have an Azure Sentinel Workspace defined in our config file.\n", "# If not, let the user specify Workspace and Tenant IDs\n", "\n", "ws_config = WorkspaceConfig()\n", "if not ws_config.config_loaded:\n", "    ws_config.prompt_for_ws()\n", "\n", "qry_prov = QueryProvider(data_environment=\"AzureSentinel\")\n", "print(\"done\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Authenticate to Azure Sentinel workspace\n", "qry_prov.connect(ws_config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Querying Function**: Runs a Kusto query, returns the results as a pandas dataframe, and removes columns that are entirely empty or null" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def showQuery(query):\n", "    df = qry_prov.exec_query(query)\n", "    trimDF(df)\n", "    return df\n", "\n", "def trimDF(df):\n", "    # Store names of columns with null values for all entries\n", "    empty_null_cols = [col for col in df.columns if df[col].isnull().all()]\n", "\n", "    # Store names of columns with empty string '' values for all entries\n", "    empty_str_cols = []\n", "    for col in df.columns:\n", "        try:\n", "            if ''.join(df[col].map(str)) == '':\n", "                empty_str_cols.append(col)\n", "        except Exception:\n", "            # Skip columns whose values cannot be converted to strings\n", "            continue\n", "\n", "    # Drop the collected columns in place\n", "    df.drop(empty_null_cols + empty_str_cols, axis=1, inplace=True)\n", "\n", "binIntervals = ['1m', 
'5m', '10m', '15m', '30m', '1h', '12h', '1d', '5d', '10d']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Selecting a Host**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def queryHost(startTime, endTime):\n", "    query = '''\n", "    AzureDiagnostics\n", "    | where TimeGenerated between (datetime({startTime}).. datetime({endTime}))\n", "    | where Category == \"ApplicationGatewayFirewallLog\"\n", "    | where action_s == 'Blocked' or isempty(action_s)\n", "    | summarize AlertCountPerHost = count() by hostname_s, bin(timeStamp_t, {binInterval})\n", "    | render timechart\n", "    '''.format(startTime = startTime, endTime = endTime, binInterval = '1h')\n", "    return query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Automatically determine the mask bits for grouping IPs**" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def maskBitsVal(uniqueIPLen):\n", "    # Use a coarser prefix (fewer mask bits) as the number of unique IPs grows\n", "    if uniqueIPLen > 150:\n", "        return '/8'\n", "    elif uniqueIPLen > 40:\n", "        return '/16'\n", "    elif uniqueIPLen > 15:\n", "        return '/24'\n", "    return '/32'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section I: Distribution of WAF logs and blocked alerts over an extended time frame \n", "\n", "Select an extended time frame to view the distribution of WAF logs and blocked alerts over all hosts." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "426477f005ae4eed88d9c63bf04b86d4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HTML(value='

Set query time boundaries

')" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f5a5edba3cb44e4e87cc9b0020934f8c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(DatePicker(value=datetime.date(2020, 12, 4), description='Origin Date'), Text(value='10:31:59.0…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "47bbfe58b9224650a8e1b2ed613e0f51", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(IntRangeSlider(value=(-15, 1), description='Time Range (day):', layout=Layout(width='80%'), max…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query_times_1 = nbwidgets.QueryTime(units='day', max_before=30, before=-15, max_after=-1)\n", "query_times_1.display()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "322d48b8b2eb4eda81da0bf6ef795a9c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Select(description='Choose logs/alerts: ', layout=Layout(height='80px', width='50%'), op…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "categories = ['ApplicationGatewayAccessLog', 'ApplicationGatewayFirewallLog']\r\n", "\r\n", "def viewLogs(category):\r\n", " log_alert_query = '''\r\n", " AzureDiagnostics\r\n", " | where TimeGenerated between (datetime({startTime}).. 
datetime({endTime}))\r\n", " | where Category == \"{category}\"\r\n", " | where action_s == 'Blocked' or isempty(action_s)\r\n", " | summarize NoOfAlerts= count() by bin(timeStamp_t, {binInterval})\r\n", " | render timechart '''.format(startTime = query_times_1.start, endTime = query_times_1.end, category = category, binInterval = '1h')\r\n", "\r\n", " %kql -query log_alert_query\r\n", " \r\n", " rawDataQuery = \"\"\"\r\n", " AzureDiagnostics\r\n", " | where TimeGenerated between (datetime({startTime}).. datetime({endTime}))\r\n", " | where Category == '{category}'\r\n", " | where action_s == 'Blocked' or isempty(action_s)\r\n", " | take 15\r\n", " \"\"\".format(startTime = query_times_1.start, endTime = query_times_1.end, category = category)\r\n", "\r\n", " display(showQuery(rawDataQuery).head(5))\r\n", "\r\n", "category = widgets.Select(options = categories, style = style, layout = layout, description = 'Choose logs/alerts: ')\r\n", "display(category) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "viewLogs(category = category.value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section II: What is the distribution of blocked WAF alerts over Rule IDs, http-status codes, and client IP Entities? \n", "\n", "Select a time frame of interest to view the distribution of WAF blocked alerts over all hosts.\n", "\n", "*Recommended:* Analyse a shorter time frame than Section I for more detail" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9bd778a844094d1f8ebfa717fce038d8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HTML(value='

Set query time boundaries

')" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "fbc28001a1b9497fa32e839210a9f47b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(DatePicker(value=datetime.date(2020, 12, 4), description='Origin Date'), Text(value='10:32:15.6…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1802daf652fc4b358322321abc0c1e43", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(IntRangeSlider(value=(-10, 1), description='Time Range (day):', layout=Layout(width='80%'), max…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query_times_2 = nbwidgets.QueryTime(units='day', max_before=30, before=-10, max_after=-1)\n", "query_times_2.display()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select a host entity\n", "\n", "The following host entity will be used for the remainder of this section" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", "

 * 8ecf8077-cf51-4820-aadd-14040956f35d@loganalytics

\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "line": { "color": "rgb(31, 118, 179)", "width": 1 }, "name": ":AlertCountPerHost", "opacity": 0.8, "type": "scatter", "x": [ "2020-11-04T11:00:00+00:00", "2020-11-04T12:00:00+00:00", "2020-11-04T13:00:00+00:00", "2020-11-04T17:00:00+00:00", "2020-11-04T18:00:00+00:00", "2020-11-04T22:00:00+00:00", "2020-11-05T02:00:00+00:00", "2020-11-05T08:00:00+00:00", "2020-11-05T10:00:00+00:00", "2020-11-05T11:00:00+00:00", "2020-11-05T14:00:00+00:00", "2020-11-05T21:00:00+00:00", "2020-11-06T01:00:00+00:00", "2020-11-06T03:00:00+00:00", "2020-11-06T05:00:00+00:00", "2020-11-06T06:00:00+00:00", "2020-11-06T09:00:00+00:00", "2020-11-06T11:00:00+00:00", "2020-11-06T12:00:00+00:00", "2020-11-06T13:00:00+00:00", "2020-11-06T21:00:00+00:00", "2020-11-06T22:00:00+00:00", "2020-11-06T23:00:00+00:00", "2020-11-07T05:00:00+00:00", "2020-11-07T06:00:00+00:00", "2020-11-07T07:00:00+00:00", "2020-11-07T11:00:00+00:00", "2020-11-07T12:00:00+00:00", "2020-11-07T15:00:00+00:00", "2020-11-07T20:00:00+00:00", "2020-11-07T21:00:00+00:00", "2020-11-07T23:00:00+00:00", "2020-11-08T01:00:00+00:00", "2020-11-08T02:00:00+00:00", "2020-11-08T03:00:00+00:00", "2020-11-08T06:00:00+00:00", "2020-11-08T07:00:00+00:00", "2020-11-08T11:00:00+00:00", "2020-11-08T12:00:00+00:00", "2020-11-08T13:00:00+00:00", "2020-11-08T14:00:00+00:00", "2020-11-08T17:00:00+00:00", "2020-11-08T21:00:00+00:00", "2020-11-09T01:00:00+00:00", "2020-11-09T03:00:00+00:00", "2020-11-09T05:00:00+00:00", "2020-11-09T06:00:00+00:00", "2020-11-09T08:00:00+00:00" ], "y": [ 2, null, null, null, null, null, null, null, null, 2, null, null, null, null, null, null, 2, 2, null, null, null, 2, null, null, null, null, 2, null, null, null, null, null, null, null, null, null, 2, 2, null, null, null, null, null, null, null, 2, null, null ] }, { 
"line": { "color": "rgb(254, 127, 14)", "width": 1 }, "name": "13.89.108.163:AlertCountPerHost", "opacity": 0.8, "type": "scatter", "x": [ "2020-11-04T11:00:00+00:00", "2020-11-04T12:00:00+00:00", "2020-11-04T13:00:00+00:00", "2020-11-04T17:00:00+00:00", "2020-11-04T18:00:00+00:00", "2020-11-04T22:00:00+00:00", "2020-11-05T02:00:00+00:00", "2020-11-05T08:00:00+00:00", "2020-11-05T10:00:00+00:00", "2020-11-05T11:00:00+00:00", "2020-11-05T14:00:00+00:00", "2020-11-05T21:00:00+00:00", "2020-11-06T01:00:00+00:00", "2020-11-06T03:00:00+00:00", "2020-11-06T05:00:00+00:00", "2020-11-06T06:00:00+00:00", "2020-11-06T09:00:00+00:00", "2020-11-06T11:00:00+00:00", "2020-11-06T12:00:00+00:00", "2020-11-06T13:00:00+00:00", "2020-11-06T21:00:00+00:00", "2020-11-06T22:00:00+00:00", "2020-11-06T23:00:00+00:00", "2020-11-07T05:00:00+00:00", "2020-11-07T06:00:00+00:00", "2020-11-07T07:00:00+00:00", "2020-11-07T11:00:00+00:00", "2020-11-07T12:00:00+00:00", "2020-11-07T15:00:00+00:00", "2020-11-07T20:00:00+00:00", "2020-11-07T21:00:00+00:00", "2020-11-07T23:00:00+00:00", "2020-11-08T01:00:00+00:00", "2020-11-08T02:00:00+00:00", "2020-11-08T03:00:00+00:00", "2020-11-08T06:00:00+00:00", "2020-11-08T07:00:00+00:00", "2020-11-08T11:00:00+00:00", "2020-11-08T12:00:00+00:00", "2020-11-08T13:00:00+00:00", "2020-11-08T14:00:00+00:00", "2020-11-08T17:00:00+00:00", "2020-11-08T21:00:00+00:00", "2020-11-09T01:00:00+00:00", "2020-11-09T03:00:00+00:00", "2020-11-09T05:00:00+00:00", "2020-11-09T06:00:00+00:00", "2020-11-09T08:00:00+00:00" ], "y": [ null, 4, 4, null, 4, null, 2, 4, 3, null, 4, 4, 8, 8, 4, 2, 2, 2, null, 4, 6, null, 1, 4, null, 6, null, 8, null, null, 4, 2, 2, null, 2, 4, 4, null, 1, 4, 1, null, 4, null, null, null, 4, 4 ] }, { "line": { "color": "rgb(44, 160, 44)", "width": 1 }, "name": "13.89.108.163:80:AlertCountPerHost", "opacity": 0.8, "type": "scatter", "x": [ "2020-11-04T11:00:00+00:00", "2020-11-04T12:00:00+00:00", "2020-11-04T13:00:00+00:00", "2020-11-04T17:00:00+00:00", 
"2020-11-04T18:00:00+00:00", "2020-11-04T22:00:00+00:00", "2020-11-05T02:00:00+00:00", "2020-11-05T08:00:00+00:00", "2020-11-05T10:00:00+00:00", "2020-11-05T11:00:00+00:00", "2020-11-05T14:00:00+00:00", "2020-11-05T21:00:00+00:00", "2020-11-06T01:00:00+00:00", "2020-11-06T03:00:00+00:00", "2020-11-06T05:00:00+00:00", "2020-11-06T06:00:00+00:00", "2020-11-06T09:00:00+00:00", "2020-11-06T11:00:00+00:00", "2020-11-06T12:00:00+00:00", "2020-11-06T13:00:00+00:00", "2020-11-06T21:00:00+00:00", "2020-11-06T22:00:00+00:00", "2020-11-06T23:00:00+00:00", "2020-11-07T05:00:00+00:00", "2020-11-07T06:00:00+00:00", "2020-11-07T07:00:00+00:00", "2020-11-07T11:00:00+00:00", "2020-11-07T12:00:00+00:00", "2020-11-07T15:00:00+00:00", "2020-11-07T20:00:00+00:00", "2020-11-07T21:00:00+00:00", "2020-11-07T23:00:00+00:00", "2020-11-08T01:00:00+00:00", "2020-11-08T02:00:00+00:00", "2020-11-08T03:00:00+00:00", "2020-11-08T06:00:00+00:00", "2020-11-08T07:00:00+00:00", "2020-11-08T11:00:00+00:00", "2020-11-08T12:00:00+00:00", "2020-11-08T13:00:00+00:00", "2020-11-08T14:00:00+00:00", "2020-11-08T17:00:00+00:00", "2020-11-08T21:00:00+00:00", "2020-11-09T01:00:00+00:00", "2020-11-09T03:00:00+00:00", "2020-11-09T05:00:00+00:00", "2020-11-09T06:00:00+00:00", "2020-11-09T08:00:00+00:00" ], "y": [ null, null, null, 8, null, 2, null, null, 8, null, 8, null, 1, 1, null, null, null, null, 8, null, null, null, null, null, 11, null, null, 1, 8, null, null, null, null, 10, null, null, null, 1, null, null, null, 8, null, null, 8, null, null, null ] }, { "line": { "color": "rgb(214, 39, 39)", "width": 1 }, "name": ":AlertCountPerHost", "opacity": 0.8, "type": "scatter", "x": [ "2020-11-04T11:00:00+00:00", "2020-11-04T12:00:00+00:00", "2020-11-04T13:00:00+00:00", "2020-11-04T17:00:00+00:00", "2020-11-04T18:00:00+00:00", "2020-11-04T22:00:00+00:00", "2020-11-05T02:00:00+00:00", "2020-11-05T08:00:00+00:00", "2020-11-05T10:00:00+00:00", "2020-11-05T11:00:00+00:00", "2020-11-05T14:00:00+00:00", 
"2020-11-05T21:00:00+00:00", "2020-11-06T01:00:00+00:00", "2020-11-06T03:00:00+00:00", "2020-11-06T05:00:00+00:00", "2020-11-06T06:00:00+00:00", "2020-11-06T09:00:00+00:00", "2020-11-06T11:00:00+00:00", "2020-11-06T12:00:00+00:00", "2020-11-06T13:00:00+00:00", "2020-11-06T21:00:00+00:00", "2020-11-06T22:00:00+00:00", "2020-11-06T23:00:00+00:00", "2020-11-07T05:00:00+00:00", "2020-11-07T06:00:00+00:00", "2020-11-07T07:00:00+00:00", "2020-11-07T11:00:00+00:00", "2020-11-07T12:00:00+00:00", "2020-11-07T15:00:00+00:00", "2020-11-07T20:00:00+00:00", "2020-11-07T21:00:00+00:00", "2020-11-07T23:00:00+00:00", "2020-11-08T01:00:00+00:00", "2020-11-08T02:00:00+00:00", "2020-11-08T03:00:00+00:00", "2020-11-08T06:00:00+00:00", "2020-11-08T07:00:00+00:00", "2020-11-08T11:00:00+00:00", "2020-11-08T12:00:00+00:00", "2020-11-08T13:00:00+00:00", "2020-11-08T14:00:00+00:00", "2020-11-08T17:00:00+00:00", "2020-11-08T21:00:00+00:00", "2020-11-09T01:00:00+00:00", "2020-11-09T03:00:00+00:00", "2020-11-09T05:00:00+00:00", "2020-11-09T06:00:00+00:00", "2020-11-09T08:00:00+00:00" ], "y": [ null, null, null, null, 1, null, null, null, null, null, null, null, 1, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 1, null, null, null, null, null, null, null, null, null, null, 1, null, null, null, null ] }, { "line": { "color": "rgb(147, 102, 189)", "width": 1 }, "name": "127.0.0.1:AlertCountPerHost", "opacity": 0.8, "type": "scatter", "x": [ "2020-11-04T11:00:00+00:00", "2020-11-04T12:00:00+00:00", "2020-11-04T13:00:00+00:00", "2020-11-04T17:00:00+00:00", "2020-11-04T18:00:00+00:00", "2020-11-04T22:00:00+00:00", "2020-11-05T02:00:00+00:00", "2020-11-05T08:00:00+00:00", "2020-11-05T10:00:00+00:00", "2020-11-05T11:00:00+00:00", "2020-11-05T14:00:00+00:00", "2020-11-05T21:00:00+00:00", "2020-11-06T01:00:00+00:00", "2020-11-06T03:00:00+00:00", "2020-11-06T05:00:00+00:00", "2020-11-06T06:00:00+00:00", "2020-11-06T09:00:00+00:00", 
"2020-11-06T11:00:00+00:00", "2020-11-06T12:00:00+00:00", "2020-11-06T13:00:00+00:00", "2020-11-06T21:00:00+00:00", "2020-11-06T22:00:00+00:00", "2020-11-06T23:00:00+00:00", "2020-11-07T05:00:00+00:00", "2020-11-07T06:00:00+00:00", "2020-11-07T07:00:00+00:00", "2020-11-07T11:00:00+00:00", "2020-11-07T12:00:00+00:00", "2020-11-07T15:00:00+00:00", "2020-11-07T20:00:00+00:00", "2020-11-07T21:00:00+00:00", "2020-11-07T23:00:00+00:00", "2020-11-08T01:00:00+00:00", "2020-11-08T02:00:00+00:00", "2020-11-08T03:00:00+00:00", "2020-11-08T06:00:00+00:00", "2020-11-08T07:00:00+00:00", "2020-11-08T11:00:00+00:00", "2020-11-08T12:00:00+00:00", "2020-11-08T13:00:00+00:00", "2020-11-08T14:00:00+00:00", "2020-11-08T17:00:00+00:00", "2020-11-08T21:00:00+00:00", "2020-11-09T01:00:00+00:00", "2020-11-09T03:00:00+00:00", "2020-11-09T05:00:00+00:00", "2020-11-09T06:00:00+00:00", "2020-11-09T08:00:00+00:00" ], "y": [ null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 2, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null ] } ], "layout": { "autosize": true, "showlegend": true, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": 
"" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 
0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ 
{ "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", 
"showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "title": { "text": "timechart" }, "xaxis": { "autorange": true, "range": [ "2020-11-04 11:00", "2020-11-09 08:00" ], "title": { "text": "timeStamp_t" }, "type": "date" }, "yaxis": { "autorange": true, "range": [ 0.4444444444444444, 11.555555555555555 ], "ticksuffix": "", "title": { "text": "AlertCountPerHost" }, "type": "linear" } } }, "image/png": "", "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", "

Done (00:02.647): 60 records

\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
                  Num_blocked_alerts
hostname_s
                                   4
127.0.0.1                          2
13.89.108.163                    120
13.89.108.163:80                  83
<undefined>                       18
\n", "
" ], "text/plain": [ " Num_blocked_alerts\n", "hostname_s \n", " 4\n", "127.0.0.1 2\n", "13.89.108.163 120\n", "13.89.108.163:80 83\n", " 18" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ff2504111c0743048ac34b02fc9893d8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Select(description='Select Host: ', index=4, layout=Layout(height='80px', width='50%'), options=('', '127.0.0.…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = queryHost(query_times_2.start, query_times_2.end)\n", "%kql -query query\n", "\n", "try:\n", "    df_host = showQuery(query)\n", "    list_hosts = set([x for x in df_host['hostname_s']])\n", "    df = df_host.groupby(['hostname_s']).agg({'AlertCountPerHost': sum}).rename(columns = {'AlertCountPerHost': 'Num_blocked_alerts'})\n", "    hosts = widgets.Select(options=list_hosts, style = style, layout = layout, value=df['Num_blocked_alerts'].idxmax(), description = 'Select Host: ')\n", "    display(df)\n", "    display(hosts)\n", "except Exception as e:\n", "    print('Error: ' + str(e))\n", "    traceback.print_exc()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Render visualisations of the distribution of blocked alerts for the selected host\n", "\n", "We will be using balloon plots to visualise the number of WAF alerts over rule IDs, HTTP status codes, and client IP entities, for the selected host entity." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query_distribution = '''\n", "AzureDiagnostics\n", "| where TimeGenerated between (datetime({startTime}).. 
datetime({endTime}))\n", "| where Category == \"ApplicationGatewayFirewallLog\"\n", "| where hostname_s == \"{host}\"\n", "| where action_s == 'Blocked' or isempty(action_s)\n", "| join kind=leftouter ( AzureDiagnostics\n", "| where TimeGenerated between (datetime({startTime}).. datetime({endTime}))\n", "| where Category == \"ApplicationGatewayAccessLog\"\n", "| summarize by requestUri_s, httpStatus_d\n", ") on requestUri_s \n", "| summarize NoOfAlerts = count(), make_set(requestUri_s), DistinctURIs = dcount(requestUri_s) by clientIp_s, ruleId_s, httpStatus_d1\n", "'''.format(startTime = query_times_2.start, endTime = query_times_2.end, host = hosts.value)\n", "\n", "try:\n", "    df_distribution = showQuery(query_distribution)\n", "    df_distribution.rename(columns = {'clientIp_s':'Ip Address', 'ruleId_s':'Rule ID', 'set_requestUri_s': 'Request Uris'}, inplace = True)\n", "\n", "    # The HTTP status column is only present when the join with the access log matched\n", "    if 'httpStatus_d1' in df_distribution.columns:\n", "        df_distribution = df_distribution.sort_values(by=['httpStatus_d1'], ascending = True).reset_index(drop = True)\n", "        df_distribution.rename(columns = {'httpStatus_d1':'Http status'}, inplace = True) \n", "        df_distribution['Http status'] = 'h: ' + df_distribution['Http status'].astype(str)\n", "    \n", "    # Aggregate client IPs into networks using the mask returned by maskBitsVal\n", "    maskBits = maskBitsVal(len(df_distribution['Ip Address'].unique()))\n", "    df_distribution['Ip Address'] = df_distribution['Ip Address'].apply(lambda x: ipaddress.IPv4Network(x + maskBits, strict = False))\n", "    \n", "    df_distribution['Ip Address'], df_distribution['Rule ID'] = 'Ip ' + df_distribution['Ip Address'].astype(str), 'rID ' + df_distribution['Rule ID'].astype(str)\n", "except Exception as e:\n", "    print('Error: ' + str(e))\n", "    traceback.print_exc()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f4c2af97e438486589f06399bde71a4d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Select(description='Select 
x-axis: ', layout=Layout(height='80px', width='50%'), options…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "options = ['Ip Address', 'Rule ID']\r\n", "if 'Http status' in df_distribution.columns:\r\n", "    options += ['Http status']\r\n", "\r\n", "def viewBalloonPlot(x_axis, y_axis, display_rawResult):\r\n", "    try: \r\n", "        df_balloon_plot = (df_distribution\r\n", "            .groupby([x_axis, y_axis], as_index=False)\r\n", "            .agg({'NoOfAlerts': sum, 'DistinctURIs': sum, 'Request Uris': list})\r\n", "            .reset_index(drop = True))\r\n", "        fig = px.scatter(df_balloon_plot, x=df_balloon_plot[x_axis], y = df_balloon_plot[y_axis], \r\n", "                         size= np.log(1 + df_balloon_plot['NoOfAlerts'] ), color = 'NoOfAlerts',\r\n", "                         hover_data=['NoOfAlerts', 'DistinctURIs']) \r\n", "\r\n", "        fig.update_layout(height = max(300, 30 * len(set(df_balloon_plot[y_axis]))), title_text='Alert Distribution for host ID '+ str(hosts.value))\r\n", "\r\n", "        fig.show()\r\n", "        if display_rawResult == 'Yes':\r\n", "            print('Top 5 raw results with the highest number of alerts: \\n')\r\n", "            df_balloon_plot['Request Uris'] = [np.unique(list(itertools.chain(*row['Request Uris']))) for index, row in df_balloon_plot.iterrows() ]\r\n", "            df_balloon_plot['DistinctURIs'] = df_balloon_plot['Request Uris'].str.len()\r\n", "            display(df_balloon_plot[[y_axis, x_axis, 'NoOfAlerts','Request Uris', 'DistinctURIs']].sort_values(by='NoOfAlerts', ascending = False).head(5))\r\n", "    except ValueError:\r\n", "        print('ValueError: Choose distinct x and y axes')\r\n", "    except Exception as e:\r\n", "        print('Error: ' + str(e))\r\n", "        traceback.print_exc()\r\n", "    \r\n", "\r\n", "x_axis = widgets.Select(options = options, style = style, layout = layout, description = 'Select x-axis: ')\r\n", "y_axis = widgets.Select(options = options, style = style, layout = layout, description = 'Select y-axis: ')\r\n", "display_rawResult = widgets.Select(options = ['Yes', 'No'], description = 'Display raw results: ')\r\n", "\r\n", 
"md(\"Select graph properties:\", \"bold\")\r\n", "display(x_axis)\r\n", "display(y_axis)\r\n", "display(display_rawResult)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Render the balloon plot for the axes selected in the widgets above\r\n", "viewBalloonPlot(x_axis = x_axis.value, \r\n", "                y_axis = y_axis.value, display_rawResult = display_rawResult.value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section III: Cluster the request URIs in blocked WAF Alerts, based on TFIDF scores \n", "\n", "Select the timeframe and host entity for this section of the notebook. \n", "\n", "*Recommended*: Set a timeframe of >20 days" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "240c8fd8dcc04798a556c911fe0fbabd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HTML(value='

Set query time boundaries

')" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "aed6e6a6719f4170a03d160ca7618b9a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(DatePicker(value=datetime.date(2020, 12, 4), description='Origin Date'), Text(value='10:32:42.8…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5c08089058604c4e9621bdd999e625a2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(IntRangeSlider(value=(-10, 1), description='Time Range (day):', layout=Layout(width='80%'), max…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query_times_3 = nbwidgets.QueryTime(units='day', max_before=30, before=10, max_after=-1)\n", "query_times_3.display()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
                     Num_blocked_alerts
hostname_s
                                     20
112.124.42.80:63435                   1
123.125.114.144                       1
127.0.0.1                             7
127.0.0.1:80                         11
13.89.108.163                     37341
13.89.108.163:80                    317
<undefined>                          86
\n", "
" ], "text/plain": [ " Num_blocked_alerts\n", "hostname_s \n", " 20\n", "112.124.42.80:63435 1\n", "123.125.114.144 1\n", "127.0.0.1 7\n", "127.0.0.1:80 11\n", "13.89.108.163 37341\n", "13.89.108.163:80 317\n", " 86" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7413032cfc1f4db69ccac877fbb6fd23", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Select(description='Select Host: ', index=6, options=('', '127.0.0.1', '13.89.108.163:80', '112.124.42.80:6343…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_host_2 = showQuery(queryHost(query_times_3.start, query_times_3.end))\n", "df = df_host_2.groupby(['hostname_s']).agg({'AlertCountPerHost': sum}).rename(columns = {'AlertCountPerHost': 'Num_blocked_alerts'})\n", "hosts_2 = widgets.Select(options=set([x for x in df_host_2['hostname_s']]), value=df['Num_blocked_alerts'].idxmax(), description = 'Select Host: ')\n", "display(df)\n", "display(hosts_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Enter min_df and max_df value parameters**\n", "\n", "*min_df*: The min_df variable is used to eliminate terms that do not appear very frequently in our data. A min_df value of 0.01 implies eliminating terms that appear in less than 1% of the data.\n", "\n", "*max_df*: The max_df variable eliminates terms that appear very frequently in our data. 
A max_df value of 0.9 implies eliminating terms that appear in more than 90% of the data.\n", "\n", "For more information about these parameters in the TFIDF vectorizer, please see [here](https://stackoverflow.com/questions/27697766/understanding-min-df-and-max-df-in-scikit-countvectorizer)\n", "\n", "\n", "**Note:** In the case of errors running the code below for the two approaches (Request URIs split on \"/\" against the client IP entities OR Number of blocked alerts for every Rule ID against the client IPs), run the TFIDF vectoriser for ALL the data \n", "\n", "If you would like to view the TFIDF scores for all the data, change the following code in the `tfidfScores` function:\n", "\n", "`vectorizer = TfidfVectorizer(tokenizer=identity_tokenizer, lowercase=False, min_df = min_df_value, max_df = max_df_value) `\n", "\n", "to\n", "\n", "`vectorizer = TfidfVectorizer(tokenizer=identity_tokenizer, lowercase=False) ` " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7d0ead1e93854831a2cb7e6eb8cc853b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='0.01', description='Enter min_df: ', layout=Layout(height='30px', width='50%'), placeholder='% or …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1854c018ece04802beb37f2fd8120fd0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='0.9', description='Enter max_df: ', layout=Layout(height='30px', width='50%'), placeholder='% or I…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "min_df_widget = widgets.Text(style = style, layout = widgets.Layout(width=\"50%\", height=\"30px\"), description = 'Enter min_df: ', placeholder = '% or Integer or None', value = '0.01')\n", "max_df_widget = widgets.Text(style = style, layout = widgets.Layout(width=\"50%\", height=\"30px\"), description = 
'Enter max_df: ', placeholder = '% or Integer or None', value = '0.9')\n", "\n", "display(min_df_widget)\n", "display(max_df_widget)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "try:\n", "    min_df_value = float(min_df_widget.value)\n", "    max_df_value = float(max_df_widget.value)\n", "except Exception as e:\n", "    print('Error: ' + str(e))\n", "    traceback.print_exc()\n", "    \n", "def tfidfScores(df, tokenList = None):\n", "    def identity_tokenizer(text):\n", "        return text\n", "    \n", "    vectorizer = TfidfVectorizer(tokenizer=identity_tokenizer, lowercase=False, min_df = min_df_value, max_df = max_df_value) \n", "    vectors = vectorizer.fit_transform(tokenList)\n", "    feature_names = vectorizer.get_feature_names()\n", "    dense = vectors.todense()\n", "    denselist = dense.tolist()\n", "    df_scores = pd.DataFrame(denselist, columns = feature_names)\n", "    multicol1 = pd.MultiIndex.from_tuples([('weight', str(j)) for j in df_scores.columns])\n", "    df_multiIndex = pd.DataFrame([list(df_scores.iloc[i]) for i in range(0, len(df_scores))], index=[df['Ip Address']], columns=multicol1)\n", "    return df_multiIndex" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Approach I: Compute TFIDF scores for split request URIs in the blocked WAF Alerts against client IP entities" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
weight
!.php  $DefaultNav  ${@die(md5(HelloThinkPHP))}  %62%61%73%65  %68%74%6D%6C%6F%66%66%69%63%65%73%65%72%76%6C%65%74  %69%6D  %69%73%70%69%72%69%74  %70%6F%73%74%2E%70%68%70  %72%65%67%69%73%74%65%72?%65%6c%65%6d%65%6e%74%5f%70%61%72%65%6e%74%73=%74%69%6d%65%7a%6f%6e%65%2f%74%69%6d%65%7a%6f%6e%65%2f%23%76%61%6c%75%65\\u0026%61%6a%61%78%5f%66%6f%72%6d=1\\u0026%5f%77%72%61%70%70%65%72%5f%66%6f%72%6d%61%74=%64%72%75%70%61%6c%5f%61%6a%61%78  %73%65%65%79%6F%6E  %75%70%6C%6F%61%64%2E%70%68%70  %75%73%65%72  %75%73%65%72%2e%70%68%70  -  .bzr  .bzrignore  .config.php  .env  .git  .git.php  .gitignore  .hg  .hgignore  .htaccess  .htaccess.svn-base  ...  zencart  zhk.php  zhui.php  zikula  zip  zipfiles  zmp.php  zp-core  zshmindex.php  zuo.php  zuoindex.php  zuos.php  zuoshou.php  zuoshss.php  zuoss.php  zxc.php  zxc0.php  zxc1.php  zxc2.php  zxy.php  zyc.php  zz.php  zza.php  zzk.php  zzz.php
Ip Address
20.51.0.0/16   0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  ...  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0
45.249.0.0/16  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  ...  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0
20.57.0.0/16   0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  ...  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0
20.51.0.0/16   0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  ...  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0
91.241.0.0/16  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  ...  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0  0.0 0.0 0.0 0.0 0.0
\n", "

5 rows × 2737 columns

\n", "
" ], "text/plain": [ " weight \\\n", " !.php $DefaultNav ${@die(md5(HelloThinkPHP))} %62%61%73%65 \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 0.0 0.0 \n", "\n", " \\\n", " %68%74%6D%6C%6F%66%66%69%63%65%73%65%72%76%6C%65%74 %69%6D \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 \n", "\n", " \\\n", " %69%73%70%69%72%69%74 %70%6F%73%74%2E%70%68%70 \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 \n", "\n", " \\\n", " %72%65%67%69%73%74%65%72?%65%6c%65%6d%65%6e%74%5f%70%61%72%65%6e%74%73=%74%69%6d%65%7a%6f%6e%65%2f%74%69%6d%65%7a%6f%6e%65%2f%23%76%61%6c%75%65\\u0026%61%6a%61%78%5f%66%6f%72%6d=1\\u0026%5f%77%72%61%70%70%65%72%5f%66%6f%72%6d%61%74=%64%72%75%70%61%6c%5f%61%6a%61%78 \n", "Ip Address \n", "20.51.0.0/16 0.0 \n", "45.249.0.0/16 0.0 \n", "20.57.0.0/16 0.0 \n", "20.51.0.0/16 0.0 \n", "91.241.0.0/16 0.0 \n", "\n", " \\\n", " %73%65%65%79%6F%6E %75%70%6C%6F%61%64%2E%70%68%70 %75%73%65%72 \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 0.0 \n", "\n", " \\\n", " %75%73%65%72%2e%70%68%70 - .bzr .bzrignore .config.php .env \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " \\\n", " .git .git.php .gitignore .hg .hgignore .htaccess \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 
0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " ... \\\n", " .htaccess.svn-base ... zencart zhk.php zhui.php zikula zip \n", "Ip Address ... \n", "20.51.0.0/16 0.0 ... 0.0 0.0 0.0 0.0 0.0 \n", "45.249.0.0/16 0.0 ... 0.0 0.0 0.0 0.0 0.0 \n", "20.57.0.0/16 0.0 ... 0.0 0.0 0.0 0.0 0.0 \n", "20.51.0.0/16 0.0 ... 0.0 0.0 0.0 0.0 0.0 \n", "91.241.0.0/16 0.0 ... 0.0 0.0 0.0 0.0 0.0 \n", "\n", " \\\n", " zipfiles zmp.php zp-core zshmindex.php zuo.php zuoindex.php \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " \\\n", " zuos.php zuoshou.php zuoshss.php zuoss.php zxc.php zxc0.php \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " \n", " zxc1.php zxc2.php zxy.php zyc.php zz.php zza.php zzk.php zzz.php \n", "Ip Address \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "45.249.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.57.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "20.51.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "91.241.0.0/16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", "[5 rows x 2737 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query_URIs = '''\n", "AzureDiagnostics\n", "| where TimeGenerated between (datetime({startTime}).. 
datetime({endTime}))\n", "| where Category == \"ApplicationGatewayFirewallLog\"\n", "| where hostname_s startswith \"{host}\"\n", "| where action_s == 'Blocked' or isempty(action_s)\n", "| distinct clientIp_s, requestUri_s\n", "| summarize make_list(requestUri_s) by clientIp_s\n", "'''.format(startTime = query_times_3.start, endTime = query_times_3.end, host = hosts_2.value)\n", "\n", "try:\n", "    df_URIs = showQuery(query_URIs)\n", "    df_URIs.rename(columns = {'clientIp_s':'Ip Address', 'list_requestUri_s': 'RequestUris'}, inplace = True) \n", "\n", "    viewData_splitUri = df_URIs.copy()\n", "    maskBits = maskBitsVal(len(viewData_splitUri['Ip Address'].unique()))\n", "    viewData_splitUri['Ip Address'] = viewData_splitUri['Ip Address'].apply(lambda x: ipaddress.IPv4Network(x + maskBits, strict = False))\n", "    viewData_splitUri.groupby([\"Ip Address\"], as_index=False).agg({'RequestUris': list})\n", "\n", "    # Build one token list per row by splitting the concatenated request URIs on '/'\n", "    tokenList = []\n", "    for index, row in viewData_splitUri.iterrows():\n", "        splitUris = re.split('/', ''.join(row['RequestUris']))\n", "        tokenList = tokenList + [splitUris] \n", "\n", "    df_splitUri_tfidf = tfidfScores(viewData_splitUri, tokenList)\n", "except Exception as e:\n", "    print('Error: ' + str(e))\n", "    traceback.print_exc()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Approach II: Compute TFIDF scores for volume of blocked WAF alerts for Rule Ids against the client IP entities" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
                weight
                949110    980130    BlockGeoLocationChina  BlockInternetExplorer11
Ip Address
3.16.0.0/16     0.707107  0.707107  0.0                    0.0
3.236.0.0/16    0.707107  0.707107  0.0                    0.0
13.65.0.0/16    0.707107  0.707107  0.0                    0.0
13.68.0.0/16    0.707107  0.707107  0.0                    0.0
13.84.0.0/16    0.707107  0.707107  0.0                    0.0
\n", "
" ], "text/plain": [ " weight \n", " 949110 980130 BlockGeoLocationChina BlockInternetExplorer11\n", "Ip Address \n", "3.16.0.0/16 0.707107 0.707107 0.0 0.0\n", "3.236.0.0/16 0.707107 0.707107 0.0 0.0\n", "13.65.0.0/16 0.707107 0.707107 0.0 0.0\n", "13.68.0.0/16 0.707107 0.707107 0.0 0.0\n", "13.84.0.0/16 0.707107 0.707107 0.0 0.0" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query_RuleIds = '''\n", "AzureDiagnostics\n", "| where TimeGenerated between (datetime({startTime}).. datetime({endTime}))\n", "| where Category == \"ApplicationGatewayFirewallLog\"\n", "| where hostname_s startswith \"{host}\"\n", "| where action_s == 'Blocked'\n", "| summarize alertCount = count(), make_set(requestUri_s) by clientIp_s, ruleId_s\n", "'''.format(startTime = query_times_3.start, endTime = query_times_3.end, host = hosts_2.value)\n", "\n", "try:\n", "    df_RuleIds = showQuery(query_RuleIds)\n", "    df_RuleIds.rename(columns = {'clientIp_s':'Ip Address', 'ruleId_s':'RuleId', 'set_requestUri_s': 'RequestUris'}, inplace = True) \n", "    \n", "    maskBits = maskBitsVal(len(df_RuleIds['Ip Address'].unique()))\n", "    df_RuleIds['Ip Address'] = df_RuleIds['Ip Address'].apply(lambda x: ipaddress.IPv4Network(x + maskBits, strict = False))\n", "\n", "    viewData_ruleId = df_RuleIds.groupby([\"Ip Address\"], as_index=False).agg({'RuleId': list, 'alertCount': list, 'RequestUris': list})\n", "    # Repeat each rule ID once per alert so that term frequency reflects alert volume\n", "    tokenList = [sum([[s] * n for s, n in zip(viewData_ruleId['RuleId'][x], viewData_ruleId['alertCount'][x])], []) for x in range(0, len(viewData_ruleId))]\n", "\n", "    df_ruleId_tfidf = tfidfScores(viewData_ruleId, tokenList)\n", "except Exception as e:\n", "    print('Error: ' + str(e))\n", "    traceback.print_exc()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualisation of the TFIDF scores for both approaches\n", "\n", "We will be using balloon plots to view the TFIDF scores for the two approaches" ] }, { "cell_type": "code", 
"execution_count": 32, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "857bed850716444cbd1b3236f9576abe", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Select(description='TFIDF approach: ', layout=Layout(height='80px', width='50%'), option…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "options = ['RuleId', 'SplitUris']\r\n", "\r\n", "def visualiseTFIDF(TfidfCategory):\r\n", "    try:\r\n", "        max_category = 30\r\n", "        if TfidfCategory == 'RuleId':\r\n", "            df = df_ruleId_tfidf.copy()\r\n", "        else: \r\n", "            df = df_splitUri_tfidf.copy()\r\n", "\r\n", "        df_tfidf = df.iloc[:, : max_category].stack().reset_index(drop = False).rename(columns = {'level_1':TfidfCategory, 'weight':'tfidf'})\r\n", "        df_tfidf['Ip Address'] = 'Ip ' + df_tfidf['Ip Address'].astype(str)\r\n", "        if 'RuleId' == TfidfCategory: \r\n", "            df_tfidf['RuleId'] = 'rID ' + df_tfidf['RuleId'].astype(str)\r\n", "        else:\r\n", "            df_tfidf['SplitUris'] = df_tfidf['SplitUris'].apply(lambda x: (x[0:20]+ '...') if len(x)> 20 else x)\r\n", "        \r\n", "        fig = px.scatter(df_tfidf, x = df_tfidf[TfidfCategory], y = df_tfidf['Ip Address'],\r\n", "                         size= np.log(1 + df_tfidf['tfidf']), color = df_tfidf['tfidf'],\r\n", "                         hover_data=[df_tfidf['tfidf']]) \r\n", "        fig.update_layout(height = max(800, 20 * len(set(df_tfidf[TfidfCategory]))), title_text= 'TFIDF distribution of ' + TfidfCategory + ' against client IPs', width = 1700)\r\n", "        fig.show()\r\n", "    except Exception as e:\r\n", "        print('Error: ' + str(e))\r\n", "        traceback.print_exc()\r\n", "TfidfCategory = widgets.Select(options = options, style = style, layout = layout, description = 'TFIDF approach: ')\r\n", "display(TfidfCategory)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "visualiseTFIDF(TfidfCategory = TfidfCategory.value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 
DBSCAN Clustering and PCA of the request URIs for both approaches\n", "\n", "\n", "DBSCAN is a non-parametric, density-based spatial clustering algorithm that groups points that are closely packed together. Points that lie in low-density regions are marked as outliers. For more information, please see [here](https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf). We use DBSCAN on our data to group request URIs that are similar to each other and to surface unusual request URIs as outliers. The clustering uses the TFIDF scores obtained for the rule ID and split URI approaches respectively. \n", "\n", "Select the eps and min_samples values for DBSCAN and the n_components value for PCA below. More information about these parameters can be found [here](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.dbscan.html) and [here](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html).\n", "\n", "**DBSCAN:**\n", "\n", "*eps value:* The maximum distance between two points for them to be considered neighbors.\n", "\n", "*min_samples:* The minimum number of neighbors that a point should have in order to be classified as a core point. The core point is included in the min_samples count.\n", "\n", "**PCA:** PCA is a dimensionality reduction technique that compresses multivariate data into principal components, which describe most of the variation in the original dataset. 
In our case, we are able to better visualise the clubbing of similar and outlier request URIs by visualising the first two Principal components.\n", "\n", "*n_components:* Number of principal components" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8aed9bf4295548b680680fb5c2332b1e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='0.4', description='DBSCAN : Enter eps value', layout=Layout(height='30px', width='50%'), style=Des…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4e02f9bbb3b142e3945642ae42d8475e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "IntSlider(value=5, description='DBSCAN : Enter min samples', layout=Layout(height='30px', width='50%'), style=…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4aa3d9cbcc7844f8ab0b988cfa00e561", "version_major": 2, "version_minor": 0 }, "text/plain": [ "IntSlider(value=2, description='PCA : Enter n_components', layout=Layout(height='30px', width='50%'), style=Sl…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "eps_widget = widgets.Text(style = style, layout = widgets.Layout(width=\"50%\", height=\"30px\"), description = 'DBSCAN - Enter eps value', value = '0.4')\n", "min_samples_widget = widgets.IntSlider(style = style, layout = widgets.Layout(width=\"50%\", height=\"30px\"), description='DBSCAN - Enter min samples', start=1, end=15, step=1, value=5)\n", "n_components_widget = widgets.IntSlider(style = style, layout = widgets.Layout(width=\"50%\", height=\"30px\"), description='PCA - Enter n_components', start=1, end=15, step=1, value=2)\n", "\n", "display(eps_widget)\n", "display(min_samples_widget)\n", "display(n_components_widget)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, 
"outputs": [], "source": [ "def db_scan_clustering(data, eps = None):\n", "    # Read the widget values at call time so that changes made in the widgets take effect\n", "    if eps is None:\n", "        eps = float(eps_widget.value)\n", "    dbscan = DBSCAN(eps = eps, min_samples = int(min_samples_widget.value))\n", "    dbscan.fit(data)\n", "    return dbscan.labels_\n", "\n", "def principal_component_analysis(data, eps = None):\n", "    # Project the TFIDF features onto the principal components\n", "    pca = PCA(n_components = int(n_components_widget.value))\n", "    x_pca = pca.fit_transform(data)\n", "    clusters = db_scan_clustering(data.values, eps)\n", "\n", "    plt.figure(figsize=(20,15))\n", "    scatter = plt.scatter(x_pca[:,0], x_pca[:,1], c = clusters, cmap = 'rainbow')\n", "    handles, labels = scatter.legend_elements(prop=\"colors\", alpha=0.6)\n", "    plt.legend(handles, labels, loc=\"upper right\", title=\"Clusters\")\n", "    # Label each point with its row position in the data\n", "    texts = []\n", "    for i in range(len(x_pca)):\n", "        texts.append(plt.text(x_pca[i, 0], x_pca[i, 1], str(i)))\n", "    adjust_text(texts)\n", "\n", "    plt.show()" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1mPrincipal Component Analysis \n", "\u001b[0m\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "420eb9856d1d483695aa7c14ff62cfd3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Select(description='TFIDF approach: ', layout=Layout(height='80px', width='50%'), option…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "options1 = ['RuleId', 'SplitUris']\r\n", "\r\n", "def viewPCA(tfidfCategory):\r\n", "    df = df_splitUri_tfidf.copy()\r\n", "    viewData = viewData_splitUri.copy()\r\n", "    if tfidfCategory == 'RuleId': \r\n", "        df = df_ruleId_tfidf.copy()\r\n", "        viewData = viewData_ruleId.copy()\r\n", "\r\n", "    print(tfidfCategory + ' approach (Outliers + Clustered request URI data): \\n')\r\n", "    # Report a failure once instead of retrying in an endless loop\r\n", "    try:\r\n", "        principal_component_analysis(df)\r\n", "    except Exception as e:\r\n", "        print('Error: ' + str(e))\r\n", "        traceback.print_exc()\r\n", "    \r\n", "print(color.BOLD + 'Principal Component Analysis \\n' + color.END)\r\n", "tfidfCategory = widgets.Select(options = options1, style = style, layout = layout, description = 'TFIDF approach: ')\r\n", "display(tfidfCategory)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "viewPCA(tfidfCategory = tfidfCategory.value)" ] },
{ "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1mDBScan Clustering of the Request URIs \n", "\u001b[0m\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "718fb765f4894a35924b8eb1752fcf17", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Select(description='TFIDF approach: ', layout=Layout(height='80px', width='50%'), option…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "options1 = ['RuleId', 'SplitUris']\r\n", "options2 = ['Outlier', 'Clustered']\r\n", "\r\n", "def viewClusters(tfidfCategory, requestURIs):\r\n", "    try:\r\n", "        df = df_splitUri_tfidf.copy()\r\n", "        viewData = viewData_splitUri.copy()\r\n", "        if tfidfCategory == 'RuleId': \r\n", "            df = df_ruleId_tfidf.copy()\r\n", "            viewData = viewData_ruleId.copy()\r\n", "\r\n", "        clusters = db_scan_clustering(df.values)\r\n", "        print(requestURIs + ' URIs for ' + tfidfCategory+ ': \\n')\r\n", "\r\n", "        # DBSCAN labels outliers as -1; keep only the real cluster labels here\r\n", "        clusterList = list(set(clusters))\r\n", "        try:\r\n", "            clusterList.remove(-1)\r\n", "        except ValueError:\r\n", "            print()\r\n", "\r\n", "        if requestURIs == 'Outlier':\r\n", "            clusterList = [-1]\r\n", "\r\n", "        if clusterList:\r\n", "            for k in clusterList:\r\n", "                print('Cluster ' + str(k))\r\n", "                display(viewData[viewData['Ip Address'].isin(df.index.get_level_values(0)[clusters == k])])\r\n", "        else:\r\n", "            print('No Data')\r\n", "    except Exception as e:\r\n", "        
print('Error: ' + str(e))\r\n", "        traceback.print_exc()\r\n", "\r\n", "print(color.BOLD + 'DBScan Clustering of the Request URIs \\n' + color.END)\r\n", "tfidfCategory = widgets.Select(options = options1, style = style, layout = layout, description = 'TFIDF approach: ')\r\n", "requestURIs = widgets.Select(options = options2, style = style, layout = layout, description = 'Request URIs: ')\r\n", "display(tfidfCategory)\r\n", "display(requestURIs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "viewClusters(tfidfCategory = tfidfCategory.value, requestURIs = requestURIs.value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Kusto query to further examine the WAF logs and blocked alerts in the time frames with outlier request URIs" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m\n", "Start time: \u001b[0m2020-11-04 10:32:42.885697\n", "\n", "\u001b[1mEnd time: \u001b[0m2020-11-26 10:32:42.885697\n", "\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "df7b1fa8e6624b74ade7091c3b3492b0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='', description='IP address: ', layout=Layout(height='30px', width='50%'), placeholder='Enter maske…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f3595ee81cd542b199f8d11993f23288", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Text(value='', description='Request URI: ', layout=Layout(height='30px', width='50%'), placeholder='Enter requ…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ipAddress = widgets.Text(style = style, layout = 
widgets.Layout(width=\"50%\", height=\"30px\"), description = 'IP address: ', placeholder = 'Enter masked IP address from the results above. Include masking bits.')\n", "requestURI = widgets.Text(style = style, layout = widgets.Layout(width=\"50%\", height=\"30px\"), description = 'Request URI: ', placeholder = 'Enter request URI from the results above')\n", "\n", "print(color.BOLD + '\\nStart time: ' + color.END + str(query_times_3.start) + '\\n')\n", "print(color.BOLD + 'End time: ' + color.END + str(query_times_3.end) + '\\n')\n", "\n", "display(ipAddress)\n", "display(requestURI)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m\n", "Start time: \u001b[0m2020-11-04 10:32:42.885697\n", "\n", "\u001b[1mEnd time: \u001b[0m2020-11-26 10:32:42.885697\n", "\n", "\u001b[1mIp Address entered: \u001b[0m108.4.0.0/16\n", "\n", "\u001b[1mRequest Uri entered: \u001b[0m\\\\xcc\\\\xb2\\\\xcc\\\\x85]-1572603645543.jpg\n", "\n" ] }, { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "ApplicationGatewayAccessLog (Raw) Data- \n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: []" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "ApplicationGatewayFirewallLog (Alert) Data- \n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " TenantId TimeGenerated \\\n", "0 8ecf8077-cf51-4820-aadd-14040956f35d 2020-11-10 16:17:16.533000+00:00 \n", "1 8ecf8077-cf51-4820-aadd-14040956f35d 2020-11-10 16:17:16.533000+00:00 \n", "\n", " ResourceId \\\n", "0 /SUBSCRIPTIONS/D1D8779D-38D7-4F06-91DB-9CBC8DE0176F/RESOURCEGROUPS/SOC-NS/PROVIDERS/MICROSOFT.NETWORK/APPLICATIONGATEWAYS/SOC-NS-AG-WAFV2 \n", "1 /SUBSCRIPTIONS/D1D8779D-38D7-4F06-91DB-9CBC8DE0176F/RESOURCEGROUPS/SOC-NS/PROVIDERS/MICROSOFT.NETWORK/APPLICATIONGATEWAYS/SOC-NS-AG-WAFV2 \n", "\n", " Category ResourceGroup \\\n", "0 ApplicationGatewayFirewallLog SOC-NS \n", "1 ApplicationGatewayFirewallLog SOC-NS \n", "\n", " SubscriptionId ResourceProvider Resource \\\n", "0 d1d8779d-38d7-4f06-91db-9cbc8de0176f MICROSOFT.NETWORK SOC-NS-AG-WAFV2 \n", "1 d1d8779d-38d7-4f06-91db-9cbc8de0176f MICROSOFT.NETWORK SOC-NS-AG-WAFV2 \n", "\n", " ResourceType OperationName \\\n", "0 APPLICATIONGATEWAYS ApplicationGatewayFirewall \n", "1 APPLICATIONGATEWAYS ApplicationGatewayFirewall \n", "\n", " requestUri_s \\\n", "0 /assets/public/images/uploads/my-rare-collectors-item!-[\\\\xcc\\\\xb2\\\\xcc\\\\x85$\\\\xcc\\\\xb2\\\\xcc\\\\x85(\\\\xcc\\\\xb2\\\\xcc\\\\x85-\\\\xcd\\\\xa1\\\\xc2\\\\xb0-\\\\xcd\\\\x9c\\\\xca\\\\x96-\\\\xcd\\\\xa1\\\\xc2\\\\xb0\\\\xcc\\\\xb2\\\\xcc\\\\x85)\\\\xcc\\\\xb2\\\\xcc\\\\x85$\\\\xcc\\\\xb2\\\\xcc\\\\x85]-1572603645543.jpg \n", "1 /assets/public/images/uploads/my-rare-collectors-item!-[\\\\xcc\\\\xb2\\\\xcc\\\\x85$\\\\xcc\\\\xb2\\\\xcc\\\\x85(\\\\xcc\\\\xb2\\\\xcc\\\\x85-\\\\xcd\\\\xa1\\\\xc2\\\\xb0-\\\\xcd\\\\x9c\\\\xca\\\\x96-\\\\xcd\\\\xa1\\\\xc2\\\\xb0\\\\xcc\\\\xb2\\\\xcc\\\\x85)\\\\xcc\\\\xb2\\\\xcc\\\\x85$\\\\xcc\\\\xb2\\\\xcc\\\\x85]-1572603645543.jpg \n", "\n", " Message \\\n", "0 Mandatory rule. Cannot be disabled. Inbound Anomaly Score Exceeded (Total Score: 5) \n", "1 Mandatory rule. Cannot be disabled. 
Inbound Anomaly Score Exceeded (Total Inbound Score: 5 - SQLI=0,XSS=0,RFI=0,LFI=0,RCE=0,PHPI=5,HTTP=0,SESS=0): PHP Injection Attack: Variable Function Call Found; individual paranoia level scores: 5, 0, 0, 0 \n", "\n", " instanceId_s SourceSystem ruleSetType_s ruleSetVersion_s ruleId_s action_s \\\n", "0 appgw_1 Azure OWASP_CRS 3.1.0 949110 Blocked \n", "1 appgw_1 Azure OWASP_CRS 3.1.0 980130 Blocked \n", "\n", " site_s \\\n", "0 Global \n", "1 Global \n", "\n", " details_message_s \\\n", "0 Access denied with code 403 (phase 2). Operator GE matched 5 at TX:anomaly_score. \n", "1 Warning. Operator GE matched 5 at TX:inbound_anomaly_score. \n", "\n", " details_file_s details_line_s hostname_s \\\n", "0 rules/REQUEST-949-BLOCKING-EVALUATION.conf 93 13.89.108.163 \n", "1 rules/RESPONSE-980-CORRELATION.conf 86 13.89.108.163 \n", "\n", " transactionId_g policyId_s policyScope_s \\\n", "0 ecdc12b7-045a-1063-0485-9ca508f6fd2b default Global \n", "1 ecdc12b7-045a-1063-0485-9ca508f6fd2b default Global \n", "\n", " policyScopeName_s timeStamp_t clientIp_s \\\n", "0 Global 2020-11-10 16:16:00+00:00 108.4.232.173 \n", "1 Global 2020-11-10 16:16:00+00:00 108.4.232.173 \n", "\n", " Type \\\n", "0 AzureDiagnostics \n", "1 AzureDiagnostics \n", "\n", " _ResourceId \n", "0 /subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/soc-ns/providers/microsoft.network/applicationgateways/soc-ns-ag-wafv2 \n", "1 /subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/soc-ns/providers/microsoft.network/applicationgateways/soc-ns-ag-wafv2 " ] }, "metadata": {}, "output_type": "display_data" } ],
"source": [ "try:\n", "    pd.set_option('display.max_colwidth', 20)\n", "    kql_query = '''\n", "    AzureDiagnostics\n", "    | where TimeGenerated between (datetime({startTime}).. datetime({endTime}))\n", "    | where Category == \"{category}\"\n", "    | where {hostname} startswith \"{host}\" \n", "    | where action_s == 'Blocked' or isempty(action_s)\n", "    | where {ip} startswith \"{ipaddress}\"\n", "    | extend originalRequestUriWithArgs_s = column_ifexists(\"originalRequestUriWithArgs_s\", \"\")\n", "    | where requestUri_s contains {uri} or originalRequestUriWithArgs_s contains {uri}\n", "    | take 10\n", "    '''\n", "    # Number of leading octets to keep for each supported mask length\n", "    octets_for_mask = {8: 1, 16: 2, 24: 3, 32: 4}\n", "    \n", "    if ipAddress.value != '':\n", "        ip_parts = str(ipAddress.value).strip().split('/')\n", "        # Treat an address entered without masking bits as a full /32 address\n", "        maskBits = int(ip_parts[1]) if len(ip_parts) > 1 else 32\n", "        if maskBits not in octets_for_mask:\n", "            raise ValueError('Masking bits must be one of 8, 16, 24 or 32')\n", "        ipaddress = '.'.join(ip_parts[0].split('.')[0:octets_for_mask[maskBits]])\n", "        # End a partial prefix with a dot so that 108.4 does not also match 108.40.x.x\n", "        if maskBits < 32:\n", "            ipaddress += '.'\n", "    else:\n", "        ipaddress = ''\n", "    \n", "    print(color.BOLD + '\\nStart time: ' + color.END + str(query_times_3.start) + '\\n')\n", "    print(color.BOLD + 'End time: '+ color.END + str(query_times_3.end) + '\\n')\n", "    \n", "    print(color.BOLD + 'Ip Address entered: ' + color.END + str(ipAddress.value) + '\\n')\n", "    print(color.BOLD + 'Request Uri entered: ' + color.END + str((requestURI.value).strip()) + '\\n' )\n", "    \n", "    # Access logs and firewall logs use different column names for the client IP and host\n", "    category = 'ApplicationGatewayAccessLog'\n", "    ip_var = 'clientIP_s'\n", "    host_var = 'host_s'\n", "    uri = '\\'' + (requestURI.value).strip() + '\\''\n", "    kql_accessLogs = kql_query.format(hostname = host_var, startTime = query_times_3.start, endTime = query_times_3.end, host = hosts_2.value, category = category, ip = ip_var, ipaddress = ipaddress, uri = uri)\n", "    df_rawAccessKustoQuery = showQuery(kql_accessLogs)\n", "    print(category + ' (Raw) Data- \\n')\n", "    display(df_rawAccessKustoQuery.head(10))\n", "    \n", "    category = 'ApplicationGatewayFirewallLog'\n", "    ip_var = 'clientIp_s'\n", "    host_var = 'hostname_s'\n", "    uri = '@' + '\\'' + (requestURI.value).strip() + '\\''\n", "    kql_firewallLogs = kql_query.format(hostname = host_var, startTime = query_times_3.start, endTime = query_times_3.end, host = hosts_2.value, category = category, ip = ip_var, ipaddress = ipaddress, uri = uri)\n", "    df_rawFirewallKustoQuery = showQuery(kql_firewallLogs)\n", "    print(category + ' (Alert) Data- \\n')\n", "    display(df_rawFirewallKustoQuery.head(10))\n", "    pd.reset_option('display.max_colwidth')\n", "    \n", "except Exception as e:\n", "    print('Error: ' + str(e))\n", "    traceback.print_exc()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8 - AzureML", "language": "python", "name": "python38-azureml" }, "language_info": { "name": "python", "version": "" } }, "nbformat": 4, "nbformat_minor": 4 }