{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Entity Explorer - Linux Host\r\n",
    " <details>\r\n",
    "     <summary>&nbsp;<u>Details...</u></summary>\r\n",
    "\r\n",
    " **Notebook Version:** 1.1<br>\r\n",
    " **Python Version:** Python 3.6 (including Python 3.6 - AzureML)<br>\r\n",
    " **Required Packages**: kqlmagic, msticpy, pandas, pandas_bokeh, numpy, matplotlib, networkx, seaborn, datetime, ipywidgets, ipython, dnspython, ipwhois, folium, maxminddb_geolite2<br>\r\n",
    "\r\n",
    " **Data Sources Required**:\r\n",
    " - Log Analytics/Azure Sentinel - Syslog, Secuirty Alerts, Auditd, Azure Network Analytics.\r\n",
    " - (Optional) - AlienVault OTX (requires account and API key)\r\n",
    " </details>\r\n",
    "\r\n",
    "This Notebooks brings together a series of tools and techniques to enable threat hunting within the context of a singular Linux host. The notebook utilizes a range of data sources to achieve this but in order to support the widest possible range of scenarios this Notebook prioritizes using common Syslog data. If there is detailed auditd data available for a host you may wish to edit the Notebook to rely primarily on this dataset, as it currently stands auditd is used when available to provide insight not otherwise available via Syslog."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><ul class=\"toc-item\"><li><span><a href=\"#Notebook-initialization\" data-toc-modified-id=\"Notebook-initialization-0.1\"><span class=\"toc-item-num\">0.1&nbsp;&nbsp;</span>Notebook initialization</a></span></li><li><span><a href=\"#Get-WorkspaceId-and-Authenticate-to-Log-Analytics\" data-toc-modified-id=\"Get-WorkspaceId-and-Authenticate-to-Log-Analytics-0.2\"><span class=\"toc-item-num\">0.2&nbsp;&nbsp;</span>Get WorkspaceId and Authenticate to Log Analytics</a></span></li></ul></li><li><span><a href=\"#Set-Hunting-Time-Frame\" data-toc-modified-id=\"Set-Hunting-Time-Frame-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Set Hunting Time Frame</a></span><ul class=\"toc-item\"><li><span><a href=\"#Select-Host-to-Investigate\" data-toc-modified-id=\"Select-Host-to-Investigate-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Select Host to Investigate</a></span></li></ul></li><li><span><a href=\"#Host-Summary\" data-toc-modified-id=\"Host-Summary-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Host Summary</a></span><ul class=\"toc-item\"><li><span><a href=\"#Host-Alerts\" data-toc-modified-id=\"Host-Alerts-2.1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Host Alerts</a></span></li></ul></li><li><span><a href=\"#Re-scope-Hunting-Time-Frame\" data-toc-modified-id=\"Re-scope-Hunting-Time-Frame-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Re-scope Hunting Time Frame</a></span></li><li><span><a href=\"#How-to-use-this-Notebook\" data-toc-modified-id=\"How-to-use-this-Notebook-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>How to use this Notebook</a></span></li><li><span><a href=\"#Host-Logon-Events\" data-toc-modified-id=\"Host-Logon-Events-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Host Logon Events</a></span><ul class=\"toc-item\"><li><span><a href=\"#Logon-Sessions\" data-toc-modified-id=\"Logon-Sessions-5.1\"><span class=\"toc-item-num\">5.1&nbsp;&nbsp;</span>Logon Sessions</a></span><ul class=\"toc-item\"><li><span><a href=\"#Session-Details\" data-toc-modified-id=\"Session-Details-5.1.1\"><span class=\"toc-item-num\">5.1.1&nbsp;&nbsp;</span>Session Details</a></span></li><li><span><a href=\"#Raw-data-from-user-session\" data-toc-modified-id=\"Raw-data-from-user-session-5.1.2\"><span class=\"toc-item-num\">5.1.2&nbsp;&nbsp;</span>Raw data from user session</a></span></li></ul></li><li><span><a href=\"#Process-Tree-from-session\" data-toc-modified-id=\"Process-Tree-from-session-5.2\"><span class=\"toc-item-num\">5.2&nbsp;&nbsp;</span>Process Tree from session</a></span></li><li><span><a href=\"#Sudo-Session-Investigation\" data-toc-modified-id=\"Sudo-Session-Investigation-5.3\"><span class=\"toc-item-num\">5.3&nbsp;&nbsp;</span>Sudo Session Investigation</a></span></li></ul></li><li><span><a href=\"#User-Activity\" data-toc-modified-id=\"User-Activity-6\"><span class=\"toc-item-num\">6&nbsp;&nbsp;</span>User Activity</a></span></li><li><span><a href=\"#Application-Activity\" data-toc-modified-id=\"Application-Activity-7\"><span class=\"toc-item-num\">7&nbsp;&nbsp;</span>Application Activity</a></span><ul class=\"toc-item\"><li><span><a href=\"#Display-process-tree\" data-toc-modified-id=\"Display-process-tree-7.1\"><span class=\"toc-item-num\">7.1&nbsp;&nbsp;</span>Display process tree</a></span></li><li><span><a href=\"#Application-Logs-with-associated-Threat-Intelligence\" data-toc-modified-id=\"Application-Logs-with-associated-Threat-Intelligence-7.2\"><span class=\"toc-item-num\">7.2&nbsp;&nbsp;</span>Application Logs with associated Threat Intelligence</a></span></li></ul></li><li><span><a href=\"#Network-Activity\" data-toc-modified-id=\"Network-Activity-8\"><span class=\"toc-item-num\">8&nbsp;&nbsp;</span>Network Activity</a></span><ul class=\"toc-item\"><li><span><a href=\"#Choose-ASNs/IPs-to-Check-for-Threat-Intel-Reports\" data-toc-modified-id=\"Choose-ASNs/IPs-to-Check-for-Threat-Intel-Reports-8.1\"><span class=\"toc-item-num\">8.1&nbsp;&nbsp;</span>Choose ASNs/IPs to Check for Threat Intel Reports</a></span></li></ul></li><li><span><a href=\"#Configuration\" data-toc-modified-id=\"Configuration-9\"><span class=\"toc-item-num\">9&nbsp;&nbsp;</span>Configuration</a></span><ul class=\"toc-item\"><li><span><a href=\"#msticpyconfig.yaml-configuration-File\" data-toc-modified-id=\"msticpyconfig.yaml-configuration-File-9.1\"><span class=\"toc-item-num\">9.1&nbsp;&nbsp;</span><code>msticpyconfig.yaml</code> configuration File</a></span></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hunting Hypothesis: \n",
    "Our broad initial hunting hypothesis is that a particular Linux host in our environment has been compromised, we will need to hunt from a range of different positions to validate or disprove this hypothesis.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "### Notebook initialization\n",
    "The next cell:\n",
    "- Checks for the correct Python version\n",
    "- Checks versions and optionally installs required packages\n",
    "- Imports the required packages into the notebook\n",
    "- Sets a number of configuration options.\n",
    "\n",
    "This should complete without errors. If you encounter errors or warnings look at the following two notebooks:\n",
    "- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)\n",
    "- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n",
    "\n",
    "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n",
    "- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)\n",
    "- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)\n",
    "\n",
    "You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. \n",
    "There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:\n",
    "- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)\n",
    "- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:51:59.386590Z",
     "start_time": "2020-06-24T01:51:55.136591Z"
    }
   },
   "outputs": [],
   "source": [
    "from pathlib import Path\r\n",
    "from IPython.display import display, HTML\r\n",
    "\r\n",
    "REQ_PYTHON_VER=(3, 6)\r\n",
    "REQ_MSTICPY_VER=(1, 0, 0)\r\n",
    "REQ_MP_EXTRAS = [\"ml\"]\r\n",
    "\r\n",
    "update_nbcheck = (\r\n",
    "    \"<p style='color: orange; text-align=left'>\"\r\n",
    "    \"<b>Warning: we needed to update '<i>utils/nb_check.py</i>'</b><br>\"\r\n",
    "    \"Please restart the kernel and re-run this cell.\"\r\n",
    "    \"</p>\"\r\n",
    ")\r\n",
    "\r\n",
    "display(HTML(\"<h3>Starting Notebook setup...</h3>\"))\r\n",
    "if Path(\"./utils/nb_check.py\").is_file():\r\n",
    "    try:\r\n",
    "        from utils.nb_check import check_versions\r\n",
    "    except ImportError as err:\r\n",
    "        %xmode Minimal\r\n",
    "        !curl https://raw.githubusercontent.com/Azure/Azure-Sentinel-Notebooks/master/utils/nb_check.py > ./utils/nb_check.py 2>/dev/null\r\n",
    "        display(HTML(update_nbcheck))\r\n",
    "    if \"check_versions\" not in globals():\r\n",
    "        raise ImportError(\"Old version of nb_check.py detected - see instructions below.\")\r\n",
    "    %xmode Verbose\r\n",
    "    check_versions(REQ_PYTHON_VER, REQ_MSTICPY_VER, REQ_MP_EXTRAS)\r\n",
    "\r\n",
    "# If the installation fails try to manually install using\r\n",
    "# !pip install --upgrade msticpy\r\n",
    "\r\n",
    "from msticpy.nbtools import nbinit\r\n",
    "additional_packages = [\r\n",
    "    \"oauthlib\", \"pyvis\", \"python-whois\", \"seaborn\"\r\n",
    "]\r\n",
    "nbinit.init_notebook(\r\n",
    "    namespace=globals(),\r\n",
    "    additional_packages=additional_packages,\r\n",
    "    extra_imports=extra_imports,\r\n",
    ");\r\n",
    "\r\n",
    "\r\n",
    "from bokeh.models import ColumnDataSource, FactorRange\r\n",
    "from bokeh.palettes import viridis\r\n",
    "from bokeh.plotting import show, Row, figure\r\n",
    "from bokeh.transform import factor_cmap, cumsum\r\n",
    "from dns import reversename, resolver\r\n",
    "from functools import lru_cache\r\n",
    "from ipaddress import ip_address\r\n",
    "from ipwhois import IPWhois\r\n",
    "from math import pi\r\n",
    "from msticpy.common.exceptions import MsticpyException\r\n",
    "from msticpy.nbtools import observationlist\r\n",
    "from msticpy.nbtools.foliummap import get_map_center\r\n",
    "from msticpy.sectools import auditdextract\r\n",
    "from msticpy.sectools.cmd_line import risky_cmd_line\r\n",
    "from msticpy.sectools.ip_utils import convert_to_ip_entities\r\n",
    "from msticpy.sectools.syslog_utils import create_host_record, cluster_syslog_logons_df, risky_sudo_sessions\r\n",
    "from pyvis.network import Network\r\n",
    "import datetime as dt\r\n",
    "import re\r\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "start_time": "2019-09-05T18:05:09.026Z"
    }
   },
   "source": [
    "### Get WorkspaceId and Authenticate to Log Analytics\n",
    " <details>\n",
    "    <summary> <u>Details...</u></summary>\n",
    "If you are using user/device authentication, run the following cell. \n",
    "- Click the 'Copy code to clipboard and authenticate' button.\n",
    "- This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard. \n",
    "- Select the text box and paste (Ctrl-V/Cmd-V) the copied value. \n",
    "- You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.\n",
    "\n",
    "Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:\n",
    "```\n",
    "%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)\n",
    "```\n",
    "instead of\n",
    "```\n",
    "%kql loganalytics://code().workspace(WORKSPACE_ID)\n",
    "```\n",
    "\n",
    "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.<br>\n",
    "On successful authentication you should see a ```popup schema``` button.\n",
    "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n",
    " </details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:51:59.434663Z",
     "start_time": "2020-06-24T01:51:59.420592Z"
    }
   },
   "outputs": [],
   "source": [
    "# See if we have an Azure Sentinel Workspace defined in our config file.\n",
    "# If not, let the user specify Workspace and Tenant IDs\n",
    "\n",
    "ws_config = WorkspaceConfig()\n",
    "if not ws_config.config_loaded:\n",
    "    ws_config.prompt_for_ws()\n",
    "    \n",
    "qry_prov = QueryProvider(data_environment=\"AzureSentinel\")\n",
    "print(\"done\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:52:41.282988Z",
     "start_time": "2020-06-24T01:52:00.925257Z"
    }
   },
   "outputs": [],
   "source": [
    "# Authenticate to Azure Sentinel workspace\n",
    "qry_prov.connect(ws_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set Hunting Time Frame\n",
    "To begin the hunt we need to et the time frame in which you wish to test your compromised host hunting hypothesis within. Use the widget below to select your start and end time for the hunt. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:52:41.392989Z",
     "start_time": "2020-06-24T01:52:41.334990Z"
    }
   },
   "outputs": [],
   "source": [
    "query_times = nbwidgets.QueryTime(units='day',\n",
    "                                  max_before=14, max_after=1, before=1)\n",
    "query_times.display()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Select Host to Investigate\n",
    "Select the host you want to test your hunting hypothesis against, only hosts with Syslog data within the time frame you specified are available. If the host you wish to select is not present try adjusting your time frame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Get a list of hosts with syslog data in our hunting timegframe to provide easy selection\n",
    "syslog_query = f\"\"\"Syslog | where TimeGenerated between (datetime({query_times.start}) .. datetime({query_times.end})) | summarize by Computer\"\"\"\n",
    "md(\"Collecting avaliable host details...\")\n",
    "hosts_list = qry_prov._query_provider.query(query=syslog_query)\n",
    "if isinstance(hosts_list, pd.DataFrame) and not hosts_list.empty:\n",
    "    hosts = hosts_list[\"Computer\"].unique().tolist()\n",
    "    host_text = nbwidgets.SelectItem(description='Select host to investigate: ', \n",
    "                             item_list=hosts, width='75%', auto_display=True)\n",
    "else:\n",
    "    display(md(\"There are no hosts with syslog data in this time period to investigate\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Host Summary\n",
    "Below is a overview of the selected host based on available data sources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hostname=host_text.value\n",
    "az_net_df = None\n",
    "# Collect data on the host\n",
    "all_syslog_query = f\"Syslog | where TimeGenerated between (datetime({query_times.start}) .. datetime({query_times.end})) | where Computer =~ '{hostname}'\"\"\"\n",
    "all_syslog_data = qry_prov.exec_query(all_syslog_query)\n",
    "if isinstance(all_syslog_data, pd.DataFrame) and not all_syslog_data.empty:\n",
    "    heartbeat_query = f\"\"\"Heartbeat | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end})| where Computer == '{hostname}' | top 1 by TimeGenerated desc nulls last\"\"\"\n",
    "    if \"AzureNetworkAnalytics_CL\" in qry_prov.schema:\n",
    "        aznet_query = f\"\"\"AzureNetworkAnalytics_CL | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end}) | where VirtualMachine_s has '{hostname}' | where ResourceType == 'NetworkInterface' | top 1 by TimeGenerated desc | project PrivateIPAddresses = PrivateIPAddresses_s, PublicIPAddresses = PublicIPAddresses_s\"\"\"\n",
    "        print(\"Getting network data...\")\n",
    "        az_net_df = qry_prov.exec_query(query=aznet_query)\n",
    "    print(\"Getting host data...\")\n",
    "    host_hb = qry_prov.exec_query(query=heartbeat_query)\n",
    "\n",
    "    # Create host entity record, with Azure network data if any is avaliable\n",
    "    if az_net_df is not None and isinstance(az_net_df, pd.DataFrame) and not az_net_df.empty:\n",
    "        host_entity = create_host_record(syslog_df=all_syslog_data, heartbeat_df=host_hb, az_net_df=az_net_df)\n",
    "    else:\n",
    "        host_entity = create_host_record(syslog_df=all_syslog_data, heartbeat_df=host_hb)\n",
    "\n",
    "    md(\n",
    "                \"<b>Host Details</b><br>\"\n",
    "                f\"<b>Hostname</b>: {host_entity.computer}<br>\"\n",
    "                f\"<b>OS</b>: {host_entity.OSType} {host_entity.OSName}<br>\"\n",
    "                f\"<b>IP Address</b>: {host_entity.IPAddress.Address}<br>\"\n",
    "                f\"<b>Location</b>: {host_entity.IPAddress.Location.CountryName}<br>\"\n",
    "                f\"<b>Installed Applications</b>: {host_entity.Applications}<br>\"\n",
    "            )\n",
    "else:\n",
    "    md_warn(\"No Syslog data found, check hostname and timeframe.\")\n",
    "    md(\"The data query may be timing out, consider reducing the timeframe size.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Host Alerts & Bookmarks\n",
    "This section provides an overview of any security alerts or Hunting Bookmarks in Azure Sentinel related to this host, this will help scope and guide our hunt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "related_alerts = qry_prov.SecurityAlert.list_related_alerts(\n",
    "    query_times, host_name=hostname)\n",
    "realted_bookmarks = qry_prov.AzureSentinel.list_bookmarks_for_entity(query_times, entity_id=hostname)\n",
    "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n",
    "    host_alert_items = (related_alerts[['AlertName', 'TimeGenerated']]\n",
    "                        .groupby('AlertName').TimeGenerated.agg('count').to_dict())\n",
    "\n",
    "    def print_related_alerts(alertDict, entityType, entityName):\n",
    "        if len(alertDict) > 0:\n",
    "            md(f\"Found {len(alertDict)} different alert types related to this {entityType} (\\'{entityName}\\')\")\n",
    "            for (k, v) in alertDict.items():\n",
    "                md(f\"- {k}, Count of alerts: {v}\")\n",
    "        else:\n",
    "            md(f\"No alerts for {entityType} entity \\'{entityName}\\'\")\n",
    "\n",
    "    print_related_alerts(host_alert_items, 'host', host_entity.HostName)\n",
    "    nbdisplay.display_timeline(\n",
    "        data=related_alerts, source_columns=[\"AlertName\"], title=\"Host alerts over time\", height=300, color=\"red\")\n",
    "else:\n",
    "    md('No related alerts found.')\n",
    "    \n",
    "if isinstance(realted_bookmarks, pd.DataFrame) and not realted_bookmarks.empty:\n",
    "    nbdisplay.display_timeline(data=realted_bookmarks, source_columns=[\"BookmarkName\"], height=200, color=\"orange\", title=\"Host bookmarks over time\",)\n",
    "else:\n",
    "    md('No related bookmarks found.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:53:31.887372Z",
     "start_time": "2020-06-24T01:53:31.826372Z"
    }
   },
   "outputs": [],
   "source": [
    "rel_alert_select = None\n",
    "\n",
    "def show_full_alert(selected_alert):\n",
    "    global security_alert, alert_ip_entities\n",
    "    security_alert = SecurityAlert(\n",
    "        rel_alert_select.selected_alert)\n",
    "    nbdisplay.display_alert(security_alert, show_entities=True)\n",
    "\n",
    "# Show selected alert when selected\n",
    "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n",
    "    related_alerts['CompromisedEntity'] = related_alerts['Computer']\n",
    "    md('### Click on alert to view details.')\n",
    "    rel_alert_select = nbwidgets.SelectAlert(alerts=related_alerts,\n",
    "                                               action=show_full_alert)\n",
    "    rel_alert_select.display()\n",
    "else:\n",
    "    md('No related alerts found.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Re-scope Hunting Time Frame\n",
    "Based on the security alerts for this host we can choose to re-scope our hunting time frame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:53:32.233372Z",
     "start_time": "2020-06-24T01:53:32.172372Z"
    }
   },
   "outputs": [],
   "source": [
    "if rel_alert_select is None or rel_alert_select.selected_alert is None:\n",
    "    start = query_times.start\n",
    "else:\n",
    "    start = rel_alert_select.selected_alert['TimeGenerated']\n",
    "\n",
    "# Set new investigation time windows based on the selected alert\n",
    "invest_times = nbwidgets.QueryTime(\n",
    "    units='day', max_before=24, max_after=12, before=1, after=1, origin_time=start)\n",
    "invest_times.display()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to use this Notebook\n",
    "Whilst this notebook is linear in layout it doesn't need to be linear in usage. We have selected our host to investigate and set an initial hunting time-frame to work within. We can now start to test more specific hunting hypothesis with the aim of validating our broader initial hunting hypothesis. To do this we can start by looking at:\n",
    "- <a>Host Logon Events</a>\n",
    "- <a>User Activity</a>\n",
    "- <a>Application Activity</a>\n",
    "- <a>Network Activity</a>\n",
    "\n",
    "You can choose to start below with a hunt in host logon events or choose to jump to one of the other sections listed above. The order in which you choose to run each of these major sections doesn't matter, they are each self contained. You may also choose to rerun sections based on your findings from running other sections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook uses external threat intelligence sources to enrich data. The next cell loads the TILookup class.\n",
    "> **Note**: to use TILookup you will need configuration settings in your msticpyconfig.yaml\n",
    "> <br>see [TIProviders documenation](https://msticpy.readthedocs.io/en/latest/TIProviders.html)\n",
    "> <br>and [Configuring Notebook Environment notebook](./ConfiguringNotebookEnvironment.ipynb)\n",
    "> <br>or [ConfiguringNotebookEnvironment (GitHub static view)](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tilookup = TILookup()\n",
    "md(\"Threat intelligence provider loading complete.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Host Logon Events\n",
    "**Hypothesis:** That an attacker has gained legitimate access to the host via compromised credentials and has logged into the host to conduct malicious activity. \n",
    "\n",
    "This section provides an overview of logon activity for the host within our hunting time frame, the purpose of this is to allow for the identification of anomalous logons or attempted logons."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\r\n",
    "# Collect logon events for this, seperate them into sucessful and unsucessful and cluster sucessful one into sessions\r\n",
    "logon_events = qry_prov.LinuxSyslog.user_logon(start=invest_times.start, end=invest_times.end, host_name=hostname)\r\n",
    "remote_logons = None\r\n",
    "failed_logons = None\r\n",
    "\r\n",
    "if isinstance(logon_events, pd.DataFrame) and not logon_events.empty:\r\n",
    "    remote_logons = (logon_events[logon_events['LogonResult'] == 'Success'])\r\n",
    "    failed_logons = (logon_events[logon_events['LogonResult'] == 'Failure'])\r\n",
    "else:\r\n",
    "    print(\"No logon events in this timeframe\")\r\n",
    "\r\n",
    "\r\n",
    "if (isinstance(remote_logons, pd.DataFrame) and not remote_logons.empty) or (isinstance(failed_logons, pd.DataFrame) and not failed_logons.empty):\r\n",
    "#Provide a timeline of sucessful and failed logon attempts to aid identification of potential brute force attacks\r\n",
    "    display(Markdown('### Timeline of sucessful host logons.'))\r\n",
    "    tooltip_cols = ['User', 'ProcessName', 'SourceIP']\r\n",
    "    if rel_alert_select is not None:\r\n",
    "        logon_timeline = nbdisplay.display_timeline(data=remote_logons, overlay_data=failed_logons, source_columns=tooltip_cols, height=200, overlay_color=\"red\", alert = rel_alert_select.selected_alert)\r\n",
    "    else:\r\n",
    "        logon_timeline = nbdisplay.display_timeline(data=remote_logons, overlay_data=failed_logons, source_columns=tooltip_cols, height=200, overlay_color=\"red\")\r\n",
    "    display(Markdown('<b>Key:</b><p style=\"color:darkblue\">Sucessful logons </p><p style=\"color:Red\">Failed Logon Attempts (via su)</p>'))  \r\n",
    "\r\n",
    "    all_df = pd.DataFrame(dict(successful= remote_logons['ProcessName'].value_counts(), failed = failed_logons['ProcessName'].value_counts())).fillna(0)\r\n",
    "    fail_data = pd.value_counts(failed_logons['User'].values, sort=True).head(10).reset_index(name='value').rename(columns={'User':'Count'})\r\n",
    "    fail_data['angle'] = fail_data['value']/fail_data['value'].sum() * 2*pi\r\n",
    "    fail_data['color'] = viridis(len(fail_data))\r\n",
    "    fp = figure(plot_height=350, plot_width=450, title=\"Relative Frequencies of Failed Logons by Account\", toolbar_location=None, tools=\"hover\", tooltips=\"@index: @value\")\r\n",
    "    fp.wedge(x=0, y=1, radius=0.5, start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'), line_color=\"white\", fill_color='color', legend='index', source=fail_data)\r\n",
    "\r\n",
    "    sucess_data = pd.value_counts(remote_logons['User'].values, sort=False).reset_index(name='value').rename(columns={'User':'Count'})\r\n",
    "    sucess_data['angle'] = sucess_data['value']/sucess_data['value'].sum() * 2*pi\r\n",
    "    sucess_data['color'] = viridis(len(sucess_data))\r\n",
    "    sp = figure(plot_height=350, width=450, title=\"Relative Frequencies of Sucessful Logons by Account\", toolbar_location=None, tools=\"hover\", tooltips=\"@index: @value\")\r\n",
    "    sp.wedge(x=0, y=1, radius=0.5, start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'), line_color=\"white\", fill_color='color', legend='index', source=sucess_data)\r\n",
    "\r\n",
    "    fp.axis.axis_label=None\r\n",
    "    fp.axis.visible=False\r\n",
    "    fp.grid.grid_line_color = None\r\n",
    "    sp.axis.axis_label=None\r\n",
    "    sp.axis.visible=False\r\n",
    "    sp.grid.grid_line_color = None\r\n",
    "\r\n",
    "\r\n",
    "    processes = all_df.index.values.tolist()\r\n",
    "    results = all_df.columns.values.tolist()\r\n",
    "    fail_sucess_data = {'processes' :processes,\r\n",
    "           'sucess' : all_df['successful'].values.tolist(),\r\n",
    "           'failure': all_df['failed'].values.tolist()}\r\n",
    "\r\n",
    "    palette = viridis(2)\r\n",
    "    x = [ (process, result) for process in processes for result in results ]\r\n",
    "    counts = sum(zip(fail_sucess_data['sucess'], fail_sucess_data['failure']), ()) \r\n",
    "    source = ColumnDataSource(data=dict(x=x, counts=counts))\r\n",
    "    b = figure(x_range=FactorRange(*x), plot_height=350,  plot_width=450, title=\"Failed and Sucessful logon attempts by process\",\r\n",
    "               toolbar_location=None, tools=\"\", y_minor_ticks=2)\r\n",
    "    b.vbar(x='x', top='counts', width=0.9, source=source, line_color=\"white\",\r\n",
    "           fill_color=factor_cmap('x', palette=palette, factors=results, start=1, end=2))\r\n",
    "    b.y_range.start = 0\r\n",
    "    b.x_range.range_padding = 0.1\r\n",
    "    b.xaxis.major_label_orientation = 1\r\n",
    "    b.xgrid.grid_line_color = None\r\n",
    "\r\n",
    "    show(Row(sp,fp,b))\r\n",
    "\r\n",
    "    ip_list = [convert_to_ip_entities(i, ip_col=\"SourceIP\")[0] for i in remote_logons['SourceIP'].unique() if i != \"\"]\r\n",
    "    ip_fail_list = [convert_to_ip_entities(i)[0] for i in failed_logons['SourceIP'].unique() if i != \"\"]\r\n",
    "    \r\n",
    "    location = get_map_center(ip_list + ip_fail_list)\r\n",
    "    folium_map = FoliumMap(location = location, zoom_start=1.4)\r\n",
    "    #Map logon locations to allow for identification of anomolous locations\r\n",
    "    if len(ip_fail_list) > 0:\r\n",
    "        md('<h3>Map of Originating Location of Logon Attempts</h3>')\r\n",
    "        icon_props = {'color': 'red'}\r\n",
    "        folium_map.add_ip_cluster(ip_entities=ip_fail_list, **icon_props)\r\n",
    "    if len(ip_list) > 0:\r\n",
    "        icon_props = {'color': 'green'}\r\n",
    "        folium_map.add_ip_cluster(ip_entities=ip_list, **icon_props)\r\n",
    "        display(folium_map.folium_map)\r\n",
    "        md('<p style=\"color:red\">Warning: the folium mapping library '\r\n",
    "                         'does not display correctly in some browsers.</p><br>'\r\n",
    "                         'If you see a blank image please retry with a different browser.')  \r\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Logon Sessions\n",
    "Based on the detail above if you wish to focus your hunt on a particular user jump to the [User Activity](#user) section. Alternatively to further further refine our hunt we need to select a logon session to view in more detail. Select a session from the list below to continue. Sessions that occurred at the time an alert was raised for this host, or where the user has a abnormal ratio of failed to successful login attempts are highlighted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:53:38.073770Z",
     "start_time": "2020-06-24T01:53:37.978770Z"
    }
   },
   "outputs": [],
   "source": [
    "logon_sessions_df = None\n",
    "try:\n",
    "    print(\"Clustering logon sessions...\")\n",
    "    logon_sessions_df = cluster_syslog_logons_df(logon_events)\n",
    "except Exception as err:\n",
    "    print(f\"Error clustering logons: {err}\")\n",
    "\n",
    "if logon_sessions_df is not None:\n",
    "    logon_sessions_df[\"Alerts during session?\"] = np.nan\n",
    "    # check if any alerts occur during logon window.\n",
    "    logon_sessions_df['Start (UTC)'] = [(time - dt.timedelta(seconds=5)) for time in logon_sessions_df['Start']]\n",
    "    logon_sessions_df['End (UTC)'] = [(time + dt.timedelta(seconds=5)) for time in logon_sessions_df['End']]\n",
    "\n",
    "    for TimeGenerated in related_alerts['TimeGenerated']:\n",
    "        logon_sessions_df.loc[(TimeGenerated >= logon_sessions_df['Start (UTC)']) & (TimeGenerated <= logon_sessions_df['End (UTC)']), \"Alerts during session?\"] = \"Yes\"\n",
    "\n",
    "    logon_sessions_df.loc[logon_sessions_df['User'] == 'root', \"Root?\"] = \"Yes\"\n",
    "    logon_sessions_df.replace(np.nan, \"No\", inplace=True)\n",
    "\n",
    "    ratios = []\n",
    "    for _, row in logon_sessions_df.iterrows():\n",
    "        suc_fail = logon_events.apply(lambda x: True if x['User'] == row['User'] and x[\"LogonResult\"] == 'Success' else(\n",
    "            False if x['User'] == row['User'] and x[\"LogonResult\"] == 'Failure' else None), axis=1)\n",
    "        numofsucess = len(suc_fail[suc_fail == True].index)\n",
    "        numoffail = len(suc_fail[suc_fail == False].index)\n",
    "        if numoffail == 0:\n",
    "            ratio = 1\n",
    "        else:\n",
    "            ratio = numofsucess/numoffail\n",
    "        ratios.append(ratio)\n",
    "    logon_sessions_df[\"Sucessful to failed logon ratio\"] = ratios\n",
    "\n",
    "    def color_cells(val):\n",
    "        if isinstance(val, str):\n",
    "            color = 'yellow' if val == \"Yes\" else 'white'\n",
    "        elif isinstance(val, float):\n",
    "            color = 'yellow' if val > 0.5 else 'white'\n",
    "        else:\n",
    "            color = 'white'\n",
    "        return 'background-color: %s' % color \n",
    "\n",
    "    display(logon_sessions_df[['User','Start (UTC)', 'End (UTC)', 'Alerts during session?', 'Sucessful to failed logon ratio', 'Root?']]\n",
    "                        .style.applymap(color_cells).hide_index())\n",
    "\n",
    "    logon_items = (\n",
    "        logon_sessions_df[['User','Start (UTC)', 'End (UTC)']]\n",
    "        .to_string(header=False, index=False, index_names=False)\n",
    "        .split('\\n')\n",
    "    )\n",
    "    logon_sessions_df[\"Key\"] = logon_items    \n",
    "    logon_sessions_df.set_index('Key', inplace=True)\n",
    "    logon_dict = logon_sessions_df[['User','Start (UTC)', 'End (UTC)']].to_dict('index')\n",
    "\n",
    "    logon_selection = nbwidgets.SelectItem(description='Select logon session to investigate: ',\n",
    "                                                 item_dict=logon_dict , width='80%', auto_display=True)\n",
    "else:\n",
    "    md(\"No logon sessions during this timeframe\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Session Details"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:53:44.059818Z",
     "start_time": "2020-06-24T01:53:40.909226Z"
    }
   },
   "outputs": [],
   "source": [
    "def view_syslog(selected_facility):\r\n",
    "    return [syslog_events.query('Facility == @selected_facility')]\r\n",
    "\r\n",
    "# Produce a summary of user modification actions taken\r\n",
    "    if \"Add\" in x:\r\n",
    "        return len(add_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist())\r\n",
    "    elif \"Modify\" in x:\r\n",
    "        return len(mod_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist())\r\n",
    "    elif \"Delete\" in x:\r\n",
    "        return len(del_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist())\r\n",
    "    else:\r\n",
    "        return \"\"\r\n",
    "\r\n",
    "crn_tl_data = {}\r\n",
    "user_tl_data = {}\r\n",
    "sudo_tl_data = {}\r\n",
    "sudo_sessions = None\r\n",
    "tooltip_cols = ['SyslogMessage']\r\n",
    "if logon_sessions_df is not None:\r\n",
    "    #Collect data based on the session selected for investigation\r\n",
    "    invest_sess = {'StartTimeUtc': logon_selection.value.get('Start (UTC)'), 'EndTimeUtc': logon_selection.value.get(\r\n",
    "        'End (UTC)'), 'Account': logon_selection.value.get('User'), 'Host': hostname}\r\n",
    "    session = entities.HostLogonSession(invest_sess)\r\n",
    "    syslog_events = qry_prov.LinuxSyslog.all_syslog(\r\n",
    "        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)\r\n",
    "    sudo_events = qry_prov.LinuxSyslog.sudo_activity(\r\n",
    "        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host, user=session.Account)\r\n",
    "    \r\n",
    "    if isinstance(sudo_events, pd.DataFrame) and not sudo_events.empty:\r\n",
    "        try:\r\n",
    "            sudo_sessions = cluster_syslog_logons_df(logon_events=sudo_events)\r\n",
    "        except MsticpyException:\r\n",
    "            pass\r\n",
    "\r\n",
    "    # Display summary of cron activity in session\r\n",
    "    cron_events = qry_prov.LinuxSyslog.cron_activity(\r\n",
    "        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)\r\n",
    "    if not isinstance(cron_events, pd.DataFrame) or cron_events.empty:\r\n",
    "        md(f'<h3> No Cron activity for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}</h3>')\r\n",
    "    else:\r\n",
    "        cron_events['CMD'].replace('', np.nan, inplace=True)\r\n",
    "        crn_tl_data = {\"Cron Exections\": {\"data\": cron_events[['TimeGenerated', 'CMD', 'CronUser', 'SyslogMessage']].dropna(), \"source_columns\": tooltip_cols, \"color\": \"Blue\"},\r\n",
    "                       \"Cron Edits\": {\"data\": cron_events.loc[cron_events['SyslogMessage'].str.contains('EDIT')], \"source_columns\": tooltip_cols, \"color\": \"Green\"}}\r\n",
    "        md('<h2> Most common commands run by cron:</h2>')\r\n",
    "        md('This shows how often each cron job was exected within the specified time window')\r\n",
    "        cron_commands = (cron_events[['EventTime', 'CMD']]\r\n",
    "                         .groupby(['CMD']).count()\r\n",
    "                         .dropna()\r\n",
    "                         .style\r\n",
    "                         .set_table_attributes('width=900px, text-align=center')\r\n",
    "                         .background_gradient(cmap='Reds', low=0.5, high=1)\r\n",
    "                         .format(\"{0:0>1.0f}\"))\r\n",
    "        display(cron_commands)\r\n",
    "\r\n",
    "    # Display summary of user and group creations, deletions and modifications during the session\r\n",
    "    user_activity = qry_prov.LinuxSyslog.user_group_activity(\r\n",
    "        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)\r\n",
    "    if not isinstance(user_activity, pd.DataFrame) or user_activity.empty:\r\n",
    "        md(f'<h3>No user or group moidifcations for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}></h3>')\r\n",
    "    else:\r\n",
    "        add_events = user_activity[user_activity['UserGroupAction'].str.contains(\r\n",
    "            'Add')]\r\n",
    "        del_events = user_activity[user_activity['UserGroupAction'].str.contains(\r\n",
    "            'Delete')]\r\n",
    "        mod_events = user_activity[user_activity['UserGroupAction'].str.contains(\r\n",
    "            'Modify')]\r\n",
    "        user_activity['Count'] = user_activity.groupby('UserGroupAction')['UserGroupAction'].transform('count')\r\n",
    "        if add_events.empty and del_events.empty and mod_events.empty:\r\n",
    "            md('<h2> Users and groups added or deleted:</h2<>')\r\n",
    "            md(f'No users or groups were added or deleted on {host_entity.HostName} between {query_times.start} and {query_times.end}')\r\n",
    "            user_tl_data = {}\r\n",
    "        else:\r\n",
    "            md(\"<h2>Users added, modified or deleted</h2>\")\r\n",
    "            display(user_activity[['UserGroupAction','Count']].drop_duplicates().style.hide_index())\r\n",
    "            account_actions = pd.DataFrame({\"User Additions\": [add_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist()],\r\n",
    "                                            \"User Modifications\": [mod_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist()],\r\n",
    "                                            \"User Deletions\": [del_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist()]})\r\n",
    "            display(account_actions.style.hide_index())\r\n",
    "            user_tl_data = {\"User adds\": {\"data\": add_events, \"source_columns\": tooltip_cols, \"color\": \"Orange\"},\r\n",
    "                            \"User deletes\": {\"data\": del_events, \"source_columns\": tooltip_cols, \"color\": \"Red\"},\r\n",
    "                            \"User modfications\": {\"data\": mod_events, \"source_columns\": tooltip_cols, \"color\": \"Grey\"}}\r\n",
    "        \r\n",
    "        # Display sudo activity during session\r\n",
    "    if not isinstance(sudo_sessions, pd.DataFrame) or sudo_sessions.empty:\r\n",
    "        md(f\"<h3>No Sudo sessions for {session.Host} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}</h3>\")\r\n",
    "        sudo_tl_data = {}\r\n",
    "    else:\r\n",
    "        sudo_start = sudo_events[sudo_events[\"SyslogMessage\"].str.contains(\r\n",
    "            \"pam_unix.+session opened\")].rename(columns={\"Sudoer\": \"User\"})\r\n",
    "        sudo_tl_data = {\"Host logons\": {\"data\": remote_logons, \"source_columns\": tooltip_cols, \"color\": \"Cyan\"},\r\n",
    "                        \"Sudo sessions\": {\"data\": sudo_start, \"source_columns\": tooltip_cols, \"color\": \"Purple\"}}\r\n",
    "        try:\r\n",
    "            risky_actions = cmd_line.risky_cmd_line(events=sudo_events, log_type=\"Syslog\")\r\n",
    "            suspicious_events = cmd_speed(\r\n",
    "                cmd_events=sudo_events, time=60, events=2, cmd_field=\"Command\")\r\n",
    "        except:\r\n",
    "            risky_actions = None\r\n",
    "            suspicious_events = None\r\n",
    "        if risky_actions is None and suspicious_events is None:\r\n",
    "            pass\r\n",
    "        else:\r\n",
    "            risky_sessions = risky_sudo_sessions(\r\n",
    "                risky_actions=risky_actions, sudo_sessions=sudo_sessions, suspicious_actions=suspicious_events)\r\n",
    "            for key in risky_sessions:\r\n",
    "                if key in sudo_sessions:\r\n",
    "                    sudo_sessions[f\"{key} - {risky_sessions[key]}\"] = sudo_sessions.pop(\r\n",
    "                        key)\r\n",
    "    \r\n",
    "        if isinstance(sudo_events, pd.DataFrame):\r\n",
    "            sudo_events_val = sudo_events[['EventTime', 'CommandCall']][sudo_events['CommandCall']!=\"\"].dropna(how='any', subset=['CommandCall'])\r\n",
    "            if sudo_events_val.empty:\r\n",
    "                md(f\"No sucessful sudo activity for {hostname} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}\")\r\n",
    "            else:\r\n",
    "                sudo_events.replace(\"\", np.nan, inplace=True)\r\n",
    "                md('<h2> Frequency of sudo commands</h2>')\r\n",
    "                md('This shows how many times each command has been run with sudo. /bin/bash is usally associated with the use of \"sudo -i\"')\r\n",
    "                sudo_commands = (sudo_events[['EventTime', 'CommandCall']]\r\n",
    "                                .groupby(['CommandCall'])\r\n",
    "                                .count()\r\n",
    "                                .dropna()\r\n",
    "                                .style\r\n",
    "                                .set_table_attributes('width=900px, text-align=center')\r\n",
    "                                .background_gradient(cmap='Reds', low=.5, high=1)\r\n",
    "                                .format(\"{0:0>3.0f}\"))\r\n",
    "                display(sudo_commands)\r\n",
    "        else:\r\n",
    "            md(f\"No sucessful sudo activity for {hostname} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}\")  \r\n",
    "\r\n",
    "    # Display a timeline of all activity during session\r\n",
    "    crn_tl_data.update(user_tl_data)\r\n",
    "    crn_tl_data.update(sudo_tl_data)\r\n",
    "    if crn_tl_data:\r\n",
    "        md('<h2> Session Timeline.</h2>')\r\n",
    "        nbdisplay.display_timeline(\r\n",
    "            data=crn_tl_data, title='Session Timeline', height=300)\r\n",
    "else:\r\n",
    "    md(\"No logon sessions during this timeframe\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Raw data from user session\n",
    "Use this syslog message data to further investigate suspicous activity during the session"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:53:47.432915Z",
     "start_time": "2020-06-24T01:53:45.628367Z"
    }
   },
   "outputs": [],
   "source": [
    "if isinstance(logon_sessions_df, pd.DataFrame) and not logon_sessions_df.empty:\r\n",
    "    #Return syslog data and present it to the use for investigation\r\n",
    "    session_syslog = qry_prov.LinuxSyslog.all_syslog(\r\n",
    "        start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)\r\n",
    "    if session_syslog.empty:\r\n",
    "        display(HTML(\r\n",
    "            f' No syslog for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}'))\r\n",
    "\r\n",
    "\r\n",
    "    def view_sudo(selected_cmd):\r\n",
    "        return [sudo_events.query('CommandCall == @selected_cmd')[\r\n",
    "                ['TimeGenerated', 'SyslogMessage', 'Sudoer', 'SudoTo', 'Command', 'CommandCall']]]\r\n",
    "\r\n",
    "    # Show syslog messages associated with selected sudo command\r\n",
    "    items = sudo_events['CommandCall'].dropna().unique().tolist()\r\n",
    "    if items:\r\n",
    "        md(\"<h3>View all messages associated with a sudo command</h3>\")\r\n",
    "        display(nbwidgets.SelectItem(item_list=items, action=view_sudo))\r\n",
    "else:\r\n",
    "    md(\"No logon sessions during this timeframe\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:53:48.221915Z",
     "start_time": "2020-06-24T01:53:48.175914Z"
    }
   },
   "outputs": [],
   "source": [
    "if isinstance(logon_sessions_df, pd.DataFrame) and not logon_sessions_df.empty:\n",
    "    # Display syslog messages from the session witht he facility selected\n",
    "    items = syslog_events['Facility'].dropna().unique().tolist()\n",
    "    md(\"<h3>View all messages associated with a syslog facility</h3>\")\n",
    "    display(nbwidgets.SelectItem(item_list=items, action=view_syslog))\n",
    "else:\n",
    "    md(\"No logon sessions during this timeframe\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Process Tree from session"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:53:51.672525Z",
     "start_time": "2020-06-24T01:53:50.175953Z"
    }
   },
   "outputs": [],
   "source": [
    "if isinstance(logon_sessions_df, pd.DataFrame) and not logon_sessions_df.empty:\r\n",
    "    display(HTML(\"<h3>Process Trees from session</h3>\"))\r\n",
    "    print(\"Building process tree, this may take some time...\")\r\n",
    "    # Find the table with auditd data in\r\n",
    "    regex = '.*audit.*\\_cl?'\r\n",
    "    matches = ((re.match(regex, key, re.IGNORECASE)) for key in qry_prov.schema)\r\n",
    "    for match in matches:\r\n",
    "        if match != None:\r\n",
    "            audit_table = match.group(0)\r\n",
    "        else:\r\n",
    "            audit_table = None\r\n",
    "\r\n",
    "    # Retrieve auditd data\r\n",
    "    if audit_table:\r\n",
    "        audit_data = qry_prov.LinuxAudit.auditd_all(\r\n",
    "            start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=hostname\r\n",
    "        )\r\n",
    "        if isinstance(audit_data, pd.DataFrame) and not audit_data.empty:\r\n",
    "            audit_events = auditdextract.extract_events_to_df(\r\n",
    "                data=audit_data\r\n",
    "            )\r\n",
    "\r\n",
    "            process_tree = auditdextract.generate_process_tree(audit_data=audit_events)\r\n",
    "            process_tree.mp_process_tree.plot()\r\n",
    "        else:\r\n",
    "            display(HTML(\"No auditd data avaliable to build process tree\"))\r\n",
    "    else:\r\n",
    "        display(HTML(\"No auditd data avaliable to build process tree\"))\r\n",
    "else:\r\n",
    "    md(\"No logon sessions during this timeframe\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Click [here](#app) to start a process/application focused hunt or continue with session based hunt below by selecting a sudo session to investigate."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sudo Session Investigation\n",
    "Sudo activity is often required by an attacker to conduct actions on target, and more granular data is avalibale for sudo sessions allowing for deeper level hunting within these sesions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:53:55.462637Z",
     "start_time": "2020-06-24T01:53:55.422637Z"
    }
   },
   "outputs": [],
   "source": [
    "if logon_sessions_df is not None and sudo_sessions is not None:\n",
    "    sudo_items = sudo_sessions[['User','Start', 'End']].to_string(header=False,\n",
    "                      index=False,\n",
    "                      index_names=False).split('\\n')\n",
    "    sudo_sessions[\"Key\"] = sudo_items\n",
    "    sudo_sessions.set_index('Key', inplace=True)\n",
    "    sudo_dict = sudo_sessions[['User','Start', 'End']].to_dict('index')\n",
    "\n",
    "    sudo_selection = nbwidgets.SelectItem(description='Select sudo session to investigate: ',\n",
    "                                                item_dict=sudo_dict, width='100%', height='300px', auto_display=True)\n",
    "else:\n",
    "    sudo_selection = None\n",
    "    md(\"No logon sessions during this timeframe\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:57:23.902023Z",
     "start_time": "2020-06-24T01:57:21.856481Z"
    }
   },
   "outputs": [],
   "source": [
    "#Collect data associated with the sudo session selected\r\n",
    "sudo_events = None\r\n",
    "from msticpy.sectools.tiproviders.ti_provider_base import TISeverity\r\n",
    "\r\n",
    "def ti_check_sev(severity, threshold):\r\n",
    "    severity = TISeverity.parse(severity)\r\n",
    "    threshold = TISeverity.parse(threshold)\r\n",
    "    return severity.value >= threshold.value\r\n",
    "\r\n",
    "if sudo_selection:\r\n",
    "    sudo_sess = {'StartTimeUtc': sudo_selection.value.get('Start'), 'EndTimeUtc': sudo_selection.value.get(\r\n",
    "        'End'), 'Account': sudo_selection.value.get('User'), 'Host': hostname}\r\n",
    "    sudo_session = entities.HostLogonSession(sudo_sess)\r\n",
    "    sudo_events = qry_prov.LinuxSyslog.sudo_activity(start=sudo_session.StartTimeUtc.round(\r\n",
    "        '-1s') - pd.Timedelta(seconds=1), end=(sudo_session.EndTimeUtc.round('1s')+ pd.Timedelta(seconds=1)), host_name=sudo_session.Host)\r\n",
    "    if isinstance(sudo_events, pd.DataFrame) and not sudo_events.empty:\r\n",
    "        display(sudo_events.replace('', np.nan).dropna(axis=0, subset=['Command'])[\r\n",
    "                ['TimeGenerated', 'Command', 'CommandCall', 'SyslogMessage']])\r\n",
    "        # Extract IOCs from the data\r\n",
    "        ioc_extractor = iocextract.IoCExtract()\r\n",
    "        os_family = host_entity.OSType if host_entity.OSType else 'Linux'\r\n",
    "        print('Extracting IoCs.......')\r\n",
    "        ioc_df = ioc_extractor.extract(data=sudo_events,\r\n",
    "                                       columns=['SyslogMessage'],\r\n",
    "                                       os_family=os_family,\r\n",
    "                                       ioc_types=['ipv4', 'ipv6', 'dns', 'url',\r\n",
    "                                                  'md5_hash', 'sha1_hash', 'sha256_hash'])\r\n",
    "        if len(ioc_df) > 0:\r\n",
    "            ioc_count = len(\r\n",
    "                ioc_df[[\"IoCType\", \"Observable\"]].drop_duplicates())\r\n",
    "            md(f\"Found {ioc_count} IOCs\")\r\n",
    "            #Lookup the extracted IOCs in TI feed\r\n",
    "            ti_resps = tilookup.lookup_iocs(data=ioc_df[[\"IoCType\", \"Observable\"]].drop_duplicates(\r\n",
    "            ).reset_index(), obs_col='Observable', ioc_type_col='IoCType')\r\n",
    "            i = 0\r\n",
    "            ti_hits = []\r\n",
    "            ti_resps.reset_index(drop=True, inplace=True)\r\n",
    "            while i < len(ti_resps):\r\n",
    "                if ti_resps['Result'][i] == True and ti_check_sev(ti_resps['Severity'][i], 1):\r\n",
    "                    ti_hits.append(ti_resps['Ioc'][i])\r\n",
    "                    i += 1\r\n",
    "                else:\r\n",
    "                    i += 1\r\n",
    "            md(f\"Found {len(ti_hits)} IoCs in Threat Intelligence\")\r\n",
    "            for ioc in ti_hits:\r\n",
    "                md(f\"Messages containing IoC found in TI feed: {ioc}\")\r\n",
    "                display(sudo_events[sudo_events['SyslogMessage'].str.contains(\r\n",
    "                    ioc)][['TimeGenerated', 'SyslogMessage']])\r\n",
    "        else:\r\n",
    "            md(\"No IoC patterns found in Syslog Messages.\")\r\n",
    "    else:\r\n",
    "        md('No sudo messages for this session')\r\n",
    "\r\n",
    "\r\n",
    "else:\r\n",
    "    md(\"No Sudo session to investigate\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-09-23T23:54:07.485475Z",
     "start_time": "2019-09-23T23:54:07.480507Z"
    }
   },
   "source": [
    "Jump to:\n",
    "- <a>Host Logon Events</a>\n",
    "- <a>Application Activity</a>\n",
    "- <a>Network Activity</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a></a>\n",
    "## User Activity\n",
    "**Hypothesis:** That an attacker has gained  access to the host and is using a user account to conduct actions on the host.\n",
    "\n",
    "This section provides an overview of activity by user within our hunting time frame, the purpose of this is to allow for the identification  of anomalous activity by a user. This hunt can be driven be investigation of suspected users or as a hunt across all users seen on the host."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:57:32.366086Z",
     "start_time": "2020-06-24T01:57:31.372985Z"
    }
   },
   "outputs": [],
   "source": [
    "# Get list of users with logon or sudo sessions on host\n",
    "logon_events = qry_prov.LinuxSyslog.user_logon(query_times, host_name=hostname)\n",
    "users = logon_events['User'].replace('', np.nan).dropna().unique().tolist()\n",
    "all_users = list(users)\n",
    "\n",
    "\n",
    "if isinstance(sudo_events, pd.DataFrame) and not sudo_events.empty:\n",
    "    sudoers = sudo_events['Sudoer'].replace(\n",
    "        '', np.nan).dropna().unique().tolist()\n",
    "    all_users.extend(x for x in sudoers if x not in all_users)\n",
    "\n",
    "# Pick Users\n",
    "if not logon_events.empty:\n",
    "    user_select = nbwidgets.SelectItem(description='Select user to investigate: ',\n",
    "                                             item_list=all_users, width='75%', auto_display=True)\n",
    "else:\n",
    "    md(\"There was no user activity in the timeframe specified.\")\n",
    "    user_select = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:57:35.805460Z",
     "start_time": "2020-06-24T01:57:33.955397Z"
    }
   },
   "outputs": [],
   "source": [
    "folium_user_map = FoliumMap()\n",
    "\n",
    "def view_sudo(cmd):\n",
    "    return [user_sudo_hold.query('CommandCall == @cmd')[\n",
    "            ['TimeGenerated', 'HostName', 'Command', 'CommandCall', 'SyslogMessage']]]\n",
    "user_sudo_hold = None\n",
    "if user_select is not None:\n",
    "    # Get all syslog relating to these users\n",
    "    username = user_select.value\n",
    "    user_events = all_syslog_data[all_syslog_data['SyslogMessage'].str.contains(username)]\n",
    "    logon_sessions = cluster_syslog_logons_df(logon_events)\n",
    "\n",
    "    # Display all logons associated with the user\n",
    "    md(f\"<h1> User Logon Activity for {username}</h1>\")\n",
    "    user_logon_events = logon_events[logon_events['User'] == username]\n",
    "    try:\n",
    "        user_logon_sessions = cluster_syslog_logons_df(user_logon_events)\n",
    "    except:\n",
    "        user_logon_sessions = None\n",
    "    \n",
    "    user_remote_logons = (\n",
    "        user_logon_events[user_logon_events['LogonResult'] == 'Success']\n",
    "    )\n",
    "    user_failed_logons = (\n",
    "        user_logon_events[user_logon_events['LogonResult'] == 'Failure']\n",
    "    )\n",
    "    if not user_remote_logons.empty:\n",
    "        for _, row in logon_sessions_df.iterrows():\n",
    "            end = row['End']\n",
    "        user_sudo_events = qry_prov.LinuxSyslog.sudo_activity(start=user_remote_logons.sort_values(\n",
    "            by='TimeGenerated')['TimeGenerated'].iloc[0], end=end, host_name=hostname, user=username)\n",
    "    else: \n",
    "        user_sudo_events = None\n",
    "\n",
    "    if user_logon_sessions is None and user_remote_logons.empty and user_failed_logons.empty:\n",
    "        pass\n",
    "    else:\n",
    "        display(HTML(\n",
    "            f\"{len(user_remote_logons)} sucessfull logons and {len(user_failed_logons)} failed logons for {username}\"))\n",
    "\n",
    "        display(Markdown('### Timeline of host logon attempts.'))\n",
    "        tooltip_cols = ['SyslogMessage']\n",
    "        dfs = {\"User Logons\" :user_remote_logons, \"Failed Logons\": user_failed_logons, \"Sudo Events\" :user_sudo_events}\n",
    "        user_tl_data = {}\n",
    "\n",
    "        for k,v in dfs.items():\n",
    "            if v is not None and not v.empty:\n",
    "                user_tl_data.update({k :{\"data\":v,\"source_columns\":tooltip_cols}})\n",
    "\n",
    "        nbdisplay.display_timeline(\n",
    "            data=user_tl_data, title=\"User logon timeline\", height=300)\n",
    "        \n",
    "        all_user_df = pd.DataFrame(dict(successful= user_remote_logons['ProcessName'].value_counts(), failed = user_failed_logons['ProcessName'].value_counts())).fillna(0)\n",
    "        processes = all_user_df.index.values.tolist()\n",
    "        results = all_user_df.columns.values.tolist()\n",
    "        user_fail_sucess_data = {'processes' :processes,\n",
    "               'sucess' : all_user_df['successful'].values.tolist(),\n",
    "               'failure': all_user_df['failed'].values.tolist()}\n",
    "\n",
    "        palette = viridis(2)\n",
    "        x = [ (process, result) for process in processes for result in results ]\n",
    "        counts = sum(zip(user_fail_sucess_data['sucess'], fail_sucess_data['failure']), ()) \n",
    "        source = ColumnDataSource(data=dict(x=x, counts=counts))\n",
    "        b = figure(x_range=FactorRange(*x), plot_height=350,  plot_width=450, title=\"Failed and Sucessful logon attempts by process\",\n",
    "                   toolbar_location=None, tools=\"\", y_minor_ticks=2)\n",
    "        b.vbar(x='x', top='counts', width=0.9, source=source, line_color=\"white\",\n",
    "               fill_color=factor_cmap('x', palette=palette, factors=results, start=1, end=2))\n",
    "        b.y_range.start = 0\n",
    "        b.x_range.range_padding = 0.1\n",
    "        b.xaxis.major_label_orientation = 1\n",
    "        b.xgrid.grid_line_color = None\n",
    "        user_logons = pd.DataFrame({\"Sucessful Logons\" : [int(all_user_df['successful'].sum())],\n",
    "                        \"Failed Logons\" : [int(all_user_df['failed'].sum())]}).T\n",
    "        user_logon_data = pd.value_counts(user_logon_events['LogonResult'].values, sort=True).head(10).reset_index(name='value').rename(columns={'User':'Count'})\n",
    "        user_logon_data = user_logon_data[user_logon_data['index']!=\"Unknown\"].copy()\n",
    "        user_logon_data['angle'] = user_logon_data['value']/user_logon_data['value'].sum() * 2*pi\n",
    "        user_logon_data['color'] = viridis(len(user_logon_data))\n",
    "        p = figure(plot_height=350, plot_width=450, title=\"Relative Frequencies of Failed Logons by Account\", toolbar_location=None, tools=\"hover\", tooltips=\"@index: @value\")\n",
    "        p.axis.visible = False\n",
    "        p.xgrid.visible = False\n",
    "        p.ygrid.visible = False\n",
    "        p.wedge(x=0, y=1, radius=0.5, start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'), line_color=\"white\", fill_color='color', legend='index', source=user_logon_data)\n",
    "        show(Row(p,b))               \n",
    "        \n",
    "        user_ip_list = [convert_to_ip_entities(i)[0] for i in user_remote_logons['SourceIP']]\n",
    "        user_ip_fail_list = [convert_to_ip_entities(i)[0] for i in user_failed_logons['SourceIP']]\n",
    "    \n",
    "        user_location = get_map_center(ip_list + ip_fail_list)\n",
    "        user_folium_map = FoliumMap(location = location, zoom_start=1.4)\n",
    "        #Map logon locations to allow for identification of anomolous locations\n",
    "        if len(ip_fail_list) > 0:\n",
    "            md('<h3>Map of Originating Location of Logon Attempts</h3>')\n",
    "            icon_props = {'color': 'red'}\n",
    "            user_folium_map.add_ip_cluster(ip_entities=user_ip_fail_list, **icon_props)\n",
    "        if len(ip_list) > 0:\n",
    "            icon_props = {'color': 'green'}\n",
    "            user_folium_map.add_ip_cluster(ip_entities=user_ip_list, **icon_props)\n",
    "            display(user_folium_map.folium_map)\n",
    "            md('<p style=\"color:red\">Warning: the folium mapping library '\n",
    "                         'does not display correctly in some browsers.</p><br>'\n",
    "                         'If you see a blank image please retry with a different browser.')  \n",
    "        \n",
    "    #Display sudo activity of the user \n",
    "    if not isinstance(user_sudo_events, pd.DataFrame) or user_sudo_events.empty:\n",
    "        md(f\"<h3>No sucessful sudo activity for {username}</h3>\")\n",
    "    else:\n",
    "        user_sudo_hold = user_sudo_events\n",
    "        user_sudo_commands = (user_sudo_events[['EventTime', 'CommandCall']].replace('', np.nan).groupby(['CommandCall']).count().dropna().style.set_table_attributes('width=900px, text-align=center').background_gradient(cmap='Reds', low=.5, high=1).format(\"{0:0>3.0f}\"))\n",
    "        display(user_sudo_commands)\n",
    "        md(\"Select a sudo command to investigate in more detail\")\n",
    "        display(nbwidgets.SelectItem(item_list=items, action=view_sudo))\n",
    "else:\n",
    "    md(\"No user session selected\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:57:41.495503Z",
     "start_time": "2020-06-24T01:57:41.474501Z"
    }
   },
   "outputs": [],
   "source": [
    "# If the user has sudo activity extract and IOCs from the logs and look them up in TI feeds\r\n",
    "if not isinstance(user_sudo_hold, pd.DataFrame) or user_sudo_hold.empty:\r\n",
    "    md(f\"No sudo messages data\")\r\n",
    "else:\r\n",
    "    # Extract IOCs\r\n",
    "    ioc_extractor = iocextract.IoCExtract()\r\n",
    "    os_family = host_entity.OSType if host_entity.OSType else 'Linux'\r\n",
    "    print('Extracting IoCs.......')\r\n",
    "    ioc_df = ioc_extractor.extract(data=user_sudo_hold,\r\n",
    "                                   columns=['SyslogMessage'],\r\n",
    "                                   ioc_types=['ipv4', 'ipv6', 'dns', 'url', 'md5_hash', 'sha1_hash', 'sha256_hash'])\r\n",
    "    if len(ioc_df) > 0:\r\n",
    "        ioc_count = len(ioc_df[[\"IoCType\", \"Observable\"]].drop_duplicates())\r\n",
    "        md(f\"Found {ioc_count} IOCs\")\r\n",
    "        ti_resps = tilookup.lookup_iocs(data=ioc_df[[\"IoCType\", \"Observable\"]].drop_duplicates(\r\n",
    "        ).reset_index(), obs_col='Observable', ioc_type_col='IoCType')\r\n",
    "        i = 0\r\n",
    "        ti_hits = []\r\n",
    "        ti_resps.reset_index(drop=True, inplace=True)\r\n",
    "        while i < len(ti_resps):\r\n",
    "            if ti_resps['Result'][i] == True and ti_check_sev(ti_resps['Severity'][i], 1):\r\n",
    "                ti_hits.append(ti_resps['Ioc'][i])\r\n",
    "                i += 1\r\n",
    "            else:\r\n",
    "                i += 1\r\n",
    "        md(f\"Found {len(ti_hits)} IoCs in Threat Intelligence\")\r\n",
    "        for ioc in ti_hits:\r\n",
    "            md(f\"Messages containing IoC found in TI feed: {ioc}\")\r\n",
    "            display(user_sudo_hold[user_sudo_hold['SyslogMessage'].str.contains(\r\n",
    "                ioc)][['TimeGenerated', 'SyslogMessage']])\r\n",
    "    else:\r\n",
    "        md(\"No IoC patterns found in Syslog Message.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Jump to:\n",
    "- <a>Host Logon Events</a>\n",
    "- <a>User Activity</a>\n",
    "- <a>Network Activity</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a></a>\n",
    "## Application Activity\n",
    "\n",
    "**Hypothesis:** That an attacker has compromised an application running on the host and is using the applications process to conduct actions on the host.\n",
    "\n",
    "This section provides an overview of activity by application within our hunting time frame, the purpose of this is to allow for the identification of anomalous activity by an application. This hunt can be driven be investigation of suspected applications or as a hunt across all users seen on the host."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:57:45.323865Z",
     "start_time": "2020-06-24T01:57:45.274865Z"
    }
   },
   "outputs": [],
   "source": [
    "# Get list of Applications\n",
    "apps = all_syslog_data['ProcessName'].replace('', np.nan).dropna().unique().tolist()\n",
    "system_apps = ['sudo', 'CRON', 'systemd-resolved', 'snapd',\n",
    "               '50-motd-news', 'systemd-logind', 'dbus-deamon', 'crontab']\n",
    "if len(host_entity.Applications) > 0:\n",
    "    installed_apps = []\n",
    "    installed_apps.extend(x for x in apps if x not in system_apps)\n",
    "\n",
    "    # Pick Applications\n",
    "    app_select = nbwidgets.SelectItem(description='Select sudo session to investigate: ',\n",
    "                                            item_list=installed_apps, width='75%', auto_display=True)\n",
    "else:\n",
    "    display(HTML(\"No applications other than stand OS applications present\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:57:51.258753Z",
     "start_time": "2020-06-24T01:57:51.149753Z"
    }
   },
   "outputs": [],
   "source": [
    "# Get all syslog relating to these Applications\n",
    "app = app_select.value\n",
    "app_data = all_syslog_data[all_syslog_data['ProcessName'] == app].copy()\n",
    "\n",
    "# App log volume over time\n",
    "if isinstance(app_data, pd.DataFrame) and not app_data.empty:\n",
    "    app_data_volume = app_data.set_index(\n",
    "        \"TimeGenerated\").resample('5T').count()\n",
    "    app_data_volume.reset_index(level=0, inplace=True)\n",
    "    app_data_volume.rename(columns={\"TenantId\" : \"NoOfLogMessages\"}, inplace=True)\n",
    "    nbdisplay.display_timeline_values(data=app_data_volume, y='NoOfLogMessages', source_columns=['NoOfLogMessages'], title=f\"{app} log volume over time\") \n",
    "    \n",
    "    app_high_sev = app_data[app_data['SeverityLevel'].isin(\n",
    "        ['emerg', 'alert', 'crit', 'err', 'warning'])]\n",
    "    if isinstance(app_high_sev, pd.DataFrame) and not app_high_sev.empty:\n",
    "        app_hs_volume = app_high_sev.set_index(\n",
    "            \"TimeGenerated\").resample('5T').count()\n",
    "        app_hs_volume.reset_index(level=0, inplace=True)\n",
    "        app_hs_volume.rename(columns={\"TenantId\" : \"NoOfLogMessages\"}, inplace=True)\n",
    "        nbdisplay.display_timeline_values(data=app_hs_volume, y='NoOfLogMessages', source_columns=['NoOfLogMessages'], title=f\"{app} high severity log volume over time\") \n",
    "\n",
    "risky_messages = risky_cmd_line(events=app_data, log_type=\"Syslog\", cmd_field=\"SyslogMessage\")\n",
    "if risky_messages:\n",
    "    print(risky_messages)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Display process tree\n",
    "Due to the large volume of data involved you may wish to make you query window smaller"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T01:59:29.756566Z",
     "start_time": "2020-06-24T01:59:29.702565Z"
    }
   },
   "outputs": [],
   "source": [
    "if rel_alert_select is None or rel_alert_select.selected_alert is None:\n",
    "    start = query_times.start\n",
    "else:\n",
    "    start = rel_alert_select.selected_alert['TimeGenerated']\n",
    "\n",
    "# Set new investigation time windows based on the selected alert\n",
    "proc_invest_times = nbwidgets.QueryTime(units='hours',\n",
    "                                       max_before=6, max_after=3, before=2, origin_time=start)\n",
    "proc_invest_times.display()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T02:01:09.922496Z",
     "start_time": "2020-06-24T02:00:19.315827Z"
    }
   },
   "outputs": [],
   "source": [
    "audit_table = None\n",
    "app_audit_data = None\n",
    "app = app_select.value\n",
    "process_tree_data = None\n",
    "regex = '.*audit.*\\_cl?'\n",
    "# Find the table with auditd data in and collect the data\n",
    "matches = ((re.match(regex, key, re.IGNORECASE)) for key in qry_prov.schema)\n",
    "for match in matches:\n",
    "    if match != None:\n",
    "        audit_table = match.group(0)\n",
    "\n",
    "#Check if the amount of data expected to be returned is a reasonable size, if not prompt before continuing\n",
    "if audit_table != None:\n",
    "    if isinstance(app_audit_data, pd.DataFrame):\n",
    "        pass\n",
    "    else:\n",
    "        print('Collecting audit data, please wait this may take some time....')\n",
    "        app_audit_query_count = f\"\"\"{audit_table} \n",
    "                    | where TimeGenerated >= datetime({proc_invest_times.start}) \n",
    "                    | where TimeGenerated <= datetime({proc_invest_times.end}) \n",
    "                    | where Computer == '{hostname}'\n",
    "                    | summarize count()\n",
    "                   \"\"\"\n",
    "        \n",
    "        count_check = qry_prov.exec_query(query=app_audit_query_count)\n",
    "\n",
    "        if count_check['count_'].iloc[0] > 100000 and not count_check.empty:\n",
    "            size = count_check['count_'].iloc[0]\n",
    "            print(\n",
    "                f\"You are returning a very large dataset ({size} rows).\",\n",
    "                \"It is reccomended that you consider scoping the size\\n\",\n",
    "                \"of your query down.\\n\",\n",
    "                \"Are you sure you want to proceed?\"\n",
    "            )\n",
    "            response = (input(\"Y/N\") or \"N\")\n",
    "        \n",
    "        if (\n",
    "            (count_check['count_'].iloc[0] < 100000)\n",
    "            or (count_check['count_'].iloc[0] > 100000\n",
    "                and response.casefold().startswith(\"y\"))\n",
    "        ):\n",
    "            print(\"querying audit data...\")\n",
    "            audit_data = qry_prov.LinuxAudit.auditd_all(\n",
    "                start=proc_invest_times.start, end=proc_invest_times.end, host_name=hostname\n",
    "                )\n",
    "            if isinstance(audit_data, pd.DataFrame) and not audit_data.empty:\n",
    "                print(\"building process tree...\")\n",
    "                audit_events = auditdextract.extract_events_to_df(\n",
    "                    data=audit_data\n",
    "                )\n",
    "                \n",
    "                process_tree_data = auditdextract.generate_process_tree(audit_data=audit_events)\n",
    "                plot_lim = 1000\n",
    "                if len(process_tree) > plot_lim:\n",
    "                    md_warn(f\"More than {plot_lim} processes to plot, limiting to top {plot_lim}.\")\n",
    "                    process_tree[:plot_lim].mp_process_tree.plot(legend_col=\"exe\")\n",
    "                else:\n",
    "                    process_tree.mp_process_tree.plot(legend_col=\"exe\")\n",
    "                size = audit_events.size\n",
    "                print(f\"Collected {size} rows of data\")\n",
    "            else:\n",
    "                md(\"No audit events avalaible\")\n",
    "        else:\n",
    "            print(\"Resize query window\")\n",
    "    \n",
    "else:\n",
    "    md(\"No audit events avalaible\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T02:01:43.252644Z",
     "start_time": "2020-06-24T02:01:42.969634Z"
    }
   },
   "outputs": [],
   "source": [
    "md(f\"<h3>Process tree for {app}</h3>\")\n",
    "if process_tree_data is not None:\n",
    "    process_tree_df = process_tree_data[process_tree_data[\"exe\"].str.contains(app, na=False)].copy()\n",
    "    if not process_tree_df.empty:    \n",
    "        app_roots = process_tree_data.apply(lambda x: ptree.get_root(process_tree_data, x), axis=1)\n",
    "        trees = []\n",
    "        for root in app_roots[\"source_index\"].unique():\n",
    "            trees.append(process_tree_data[process_tree_data[\"path\"].str.startswith(root)])\n",
    "        app_proc_trees = pd.concat(trees)\n",
    "        app_proc_trees.mp_process_tree.plot(legend_col=\"exe\", show_table=True)\n",
    "    else:\n",
    "        display(f\"No process tree data avaliable for {app}\")\n",
    "        process_tree = None\n",
    "else:\n",
    "    md(\"No data avaliable to build process tree\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Application Logs with associated Threat Intelligence\n",
    "These logs are associated with the process being investigated and include IOCs that appear in our TI feeds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T02:01:50.058394Z",
     "start_time": "2020-06-24T02:01:49.715903Z"
    }
   },
   "outputs": [],
   "source": [
    "# Extract IOCs from syslog assocated with the selected process\r\n",
    "ioc_extractor = iocextract.IoCExtract()\r\n",
    "os_family = host_entity.OSType if host_entity.OSType else 'Linux'\r\n",
    "md('Extracting IoCs...')\r\n",
    "ioc_df = ioc_extractor.extract(data=app_data,\r\n",
    "                               columns=['SyslogMessage'],\r\n",
    "                               ioc_types=['ipv4', 'ipv6', 'dns', 'url',\r\n",
    "                                          'md5_hash', 'sha1_hash', 'sha256_hash'])\r\n",
    "\r\n",
    "if process_tree_data is not None and not process_tree_data.empty:\r\n",
    "    app_process_tree = app_proc_trees.dropna(subset=['cmdline'])\r\n",
    "    audit_ioc_df = ioc_extractor.extract(data=app_process_tree,\r\n",
    "                                         columns=['cmdline'],\r\n",
    "                                         ioc_types=['ipv4', 'ipv6', 'dns', 'url',\r\n",
    "                                                    'md5_hash', 'sha1_hash', 'sha256_hash'])\r\n",
    "\r\n",
    "    ioc_df = ioc_df.append(audit_ioc_df)\r\n",
    "# Look up IOCs in TI feeds\r\n",
    "if len(ioc_df) > 0:\r\n",
    "    ioc_count = len(ioc_df[[\"IoCType\", \"Observable\"]].drop_duplicates())\r\n",
    "    md(f\"Found {ioc_count} IOCs\")\r\n",
    "    md(\"Looking up threat intel...\")\r\n",
    "    ti_resps = tilookup.lookup_iocs(data=ioc_df[[\r\n",
    "                                     \"IoCType\", \"Observable\"]].drop_duplicates().reset_index(drop=True), obs_col='Observable')\r\n",
    "    i = 0\r\n",
    "    ti_hits = []\r\n",
    "    ti_resps.reset_index(drop=True, inplace=True)\r\n",
    "    while i < len(ti_resps):\r\n",
    "        if ti_resps['Result'][i] == True and ti_check_sev(ti_resps['Severity'][i], 1):\r\n",
    "            ti_hits.append(ti_resps['Ioc'][i])\r\n",
    "            i += 1\r\n",
    "        else:\r\n",
    "            i += 1\r\n",
    "    display(HTML(f\"Found {len(ti_hits)} IoCs in Threat Intelligence\"))\r\n",
    "    for ioc in ti_hits:\r\n",
    "        display(HTML(f\"Messages containing IoC found in TI feed: {ioc}\"))\r\n",
    "        display(app_data[app_data['SyslogMessage'].str.contains(\r\n",
    "            ioc)][['TimeGenerated', 'SyslogMessage']])\r\n",
    "else:\r\n",
    "    md(\"<h3>No IoC patterns found in Syslog Message.</h3>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-09-23T23:55:34.409792Z",
     "start_time": "2019-09-23T23:55:34.404795Z"
    }
   },
   "source": [
    "Jump to:\n",
    "- <a>Host Logon Events</a>\n",
    "- <a>User Activity</a>\n",
    "- <a>Application Activity</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Network Activity\n",
    "**Hypothesis:** That an attacker is remotely communicating with the host in order to compromise the host or for C2 or data exfiltration purposes after compromising the host.\n",
    "\n",
    "This section provides an overview of network activity to and from the host during hunting time frame, the purpose of this is to allow for the identification of anomalous network traffic. If you wish to investigate a specific IP in detail it is recommended that you use the IP Explorer Notebook (include link)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T02:02:21.843587Z",
     "start_time": "2020-06-24T02:02:11.835821Z"
    }
   },
   "outputs": [],
   "source": [
    "# Get list of IPs from Syslog and Azure Network Data\r\n",
    "ioc_extractor = iocextract.IoCExtract()\r\n",
    "os_family = host_entity.OSType if host_entity.OSType else 'Linux'\r\n",
    "print('Finding IP Addresses this may take a few minutes.......')\r\n",
    "syslog_ips = ioc_extractor.extract(data=all_syslog_data,\r\n",
    "                                   columns=['SyslogMessage'],\r\n",
    "                                   ioc_types=['ipv4', 'ipv6'])\r\n",
    "\r\n",
    "\r\n",
    "if 'AzureNetworkAnalytics_CL' not in qry_prov.schema:\r\n",
    "    az_net_comms_df = None\r\n",
    "    az_ips = None\r\n",
    "else:\r\n",
    "    if hasattr(host_entity, 'private_ips') and hasattr(host_entity, 'public_ips'):\r\n",
    "        all_host_ips = host_entity.private_ips + \\\r\n",
    "            host_entity.public_ips + [host_entity.IPAddress]\r\n",
    "    else:\r\n",
    "        all_host_ips = [host_entity.IPAddress]\r\n",
    "    host_ips = {'\\'{}\\''.format(i.Address) for i in all_host_ips}\r\n",
    "    host_ip_list = ','.join(host_ips)\r\n",
    "\r\n",
    "    az_ip_where = f\"\"\"| where (VMIPAddress in (\"{host_ip_list}\") or SrcIP in (\"{host_ip_list}\") or DestIP in (\"{host_ip_list}\")) and (AllowedOutFlows > 0 or AllowedInFlows > 0)\"\"\"\r\n",
    "    az_net_comms_df = qry_prov.AzureNetwork.az_net_analytics(\r\n",
    "        start=query_times.start, end=query_times.end, host_name=hostname, where_clause=az_ip_where)\r\n",
    "    if isinstance(az_net_comms_df, pd.DataFrame) and not az_net_comms_df.empty:\r\n",
    "        az_ips = az_net_comms_df.query(\"PublicIPs != @host_entity.IPAddress\")\r\n",
    "    else:\r\n",
    "        az_ips = None\r\n",
    "if len(syslog_ips):\r\n",
    "    IPs = syslog_ips[['IoCType', 'Observable']].drop_duplicates('Observable')\r\n",
    "    display(f\"Found {len(IPs)} IP Addresses assoicated with the host\")\r\n",
    "else:\r\n",
    "    md(\"### No IoC patterns found in Syslog Message.\")\r\n",
    "    \r\n",
    "if az_ips is not None:\r\n",
    "    ips = az_ips['PublicIps'].drop_duplicates(\r\n",
    "    ) + syslog_ips['Observable'].drop_duplicates()\r\n",
    "else:\r\n",
    "    ips = syslog_ips['Observable'].drop_duplicates()\r\n",
    "\r\n",
    "if isinstance(az_net_comms_df, pd.DataFrame) and not az_net_comms_df.empty:\r\n",
    "    import warnings\r\n",
    "\r\n",
    "    with warnings.catch_warnings():\r\n",
    "        warnings.simplefilter(\"ignore\")\r\n",
    "\r\n",
    "        az_net_comms_df['TotalAllowedFlows'] = az_net_comms_df['AllowedOutFlows'] + \\\r\n",
    "            az_net_comms_df['AllowedInFlows']\r\n",
    "        sns.catplot(x=\"L7Protocol\", y=\"TotalAllowedFlows\",\r\n",
    "                    col=\"FlowDirection\", data=az_net_comms_df)\r\n",
    "        sns.relplot(x=\"FlowStartTime\", y=\"TotalAllowedFlows\",\r\n",
    "                    col=\"FlowDirection\", kind=\"line\",\r\n",
    "                    hue=\"L7Protocol\", data=az_net_comms_df).set_xticklabels(rotation=50)\r\n",
    "\r\n",
    "    nbdisplay.display_timeline(data=az_net_comms_df.query('AllowedOutFlows > 0'),\r\n",
    "                               overlay_data=az_net_comms_df.query(\r\n",
    "                                   'AllowedInFlows > 0'),\r\n",
    "                               title='Network Flows (out=blue, in=green)',\r\n",
    "                               time_column='FlowStartTime',\r\n",
    "                               source_columns=[\r\n",
    "                                   'FlowType', 'AllExtIPs', 'L7Protocol', 'FlowDirection'],\r\n",
    "                               height=300)\r\n",
    "else:\r\n",
    "    md('<h3>No Azure network data for specified time range.</h3>')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Choose ASNs/IPs to Check for Threat Intel Reports\n",
    "Choose from the list of Selected ASNs for the IPs you wish to check on. Then select the IP(s) that you wish to check against Threat Intelligence data.\n",
    "The Source list is populated with all ASNs found in the syslog and network flow data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T02:02:28.305211Z",
     "start_time": "2020-06-24T02:02:27.707241Z"
    }
   },
   "outputs": [],
   "source": [
    "#Lookup each IP in whois data and extract the ASN\n",
    "@lru_cache(maxsize=1024)\n",
    "def whois_desc(ip_lookup, progress=False):\n",
    "    try:\n",
    "        ip = ip_address(ip_lookup)\n",
    "    except ValueError:\n",
    "        return \"Not an IP Address\"\n",
    "    if ip.is_private:\n",
    "        return \"private address\"\n",
    "    if not ip.is_global:\n",
    "        return \"other address\"\n",
    "    whois = IPWhois(ip)\n",
    "    whois_result = whois.lookup_whois()\n",
    "    if progress:\n",
    "        print(\".\", end=\"\")\n",
    "    return whois_result[\"asn_description\"]\n",
    "\n",
    "# Summarise network data by ASN\n",
    "ASN_List = []\n",
    "print(\"WhoIs Lookups\")\n",
    "ASNs = ips.apply(lambda x: whois_desc(x, True))\n",
    "IP_ASN = pd.DataFrame(dict(IPs=ips, ASN=ASNs)).reset_index()\n",
    "x = IP_ASN.groupby([\"ASN\"]).count().drop(\n",
    "    'index', axis=1).sort_values('IPs', ascending=False)\n",
    "display(x)\n",
    "ASN_List = x.index\n",
    "\n",
    "# Select an ASN to investigate in more detail\n",
    "selection = widgets.SelectMultiple(\n",
    "    options=ASN_List,\n",
    "    width=900,\n",
    "    description='Select ASN to investigate',\n",
    "    disabled=False\n",
    ")\n",
    "display(selection)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T02:03:09.018331Z",
     "start_time": "2020-06-24T02:03:08.996333Z"
    }
   },
   "outputs": [],
   "source": [
    "# For every IP associated with the selected ASN look them up in TI feeds\n",
    "ip_invest_list = None\n",
    "ip_selection = None\n",
    "for ASN in selection.value:\n",
    "    if ip_invest_list is None:\n",
    "        ip_invest_list = (IP_ASN[IP_ASN[\"ASN\"] == ASN]['IPs'].tolist())\n",
    "    else:\n",
    "        ip_invest_list + (IP_ASN[IP_ASN[\"ASN\"] == ASN]['IPs'].tolist())\n",
    "\n",
    "if ip_invest_list is not None:\n",
    "    ioc_ip_list = []\n",
    "    if len(ip_invest_list) > 0:\n",
    "        ti_resps = tilookup.lookup_iocs(data=ip_invest_list, providers=[\"OTX\"])\n",
    "        i = 0\n",
    "        ti_hits = []\n",
    "        while i < len(ti_resps):\n",
    "            if ti_resps['Details'][i]['pulse_count'] > 0:\n",
    "                ti_hits.append(ti_resps['Ioc'][i])\n",
    "                i += 1\n",
    "            else:\n",
    "                i += 1\n",
    "        display(HTML(f\"Found {len(ti_hits)} IoCs in Threat Intelligence\"))\n",
    "        for ioc in ti_hits:\n",
    "            ioc_ip_list.append(ioc)\n",
    "\n",
    "    #Show IPs found in TI feeds for further investigation        \n",
    "    if len(ioc_ip_list) > 0: \n",
    "        display(HTML(\"Select an IP whcih appeared in TI to investigate further\"))\n",
    "        ip_selection = nbwidgets.SelectItem(description='Select IP Address to investigate: ', item_list = ioc_ip_list, width='95%', auto_display=True)\n",
    "       \n",
    "else:\n",
    "    md(\"No IPs to investigate\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-24T02:03:11.613331Z",
     "start_time": "2020-06-24T02:03:11.600332Z"
    }
   },
   "outputs": [],
   "source": [
    "# Get all syslog for the IPs\n",
    "if ip_selection is not None:\n",
    "    display(HTML(\"Syslog data associated with this IP Address\"))\n",
    "    sys_hits = all_syslog_data[all_syslog_data['SyslogMessage'].str.contains(\n",
    "        ip_selection.value)]\n",
    "    display(sys_hits)\n",
    "    os_family = host_entity.OSType if host_entity.OSType else 'Linux'\n",
    "\n",
    "    display(HTML(\"TI result for this IP Address\"))\n",
    "    display(ti_resps[ti_resps['Ioc'] == ip_selection.value])\n",
    "else:\n",
    "    md(\"No IP address selected\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "### `msticpyconfig.yaml` configuration File\n",
    "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n",
    "\n",
    "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)"
   ]
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3.8 - AzureML",
   "language": "python",
   "name": "python38-azureml"
  },
  "language_info": {
   "name": "python",
   "version": ""
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {
    "height": "683px",
    "width": "424px"
   },
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "374.667px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}