{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " # Windows Host Explorer\n",
    " &lt;details&gt;\n",
    "     <summary> <u>Details...</u></summary>\n",
    "\n",
    " **Notebook Version:** 1.0<br>\n",
    " **Python Version:** Python 3.6 (including Python 3.6 - AzureML)<br>\n",
    " **Required Packages**: kqlmagic, msticpy, pandas, numpy, matplotlib, bokeh, networkx, ipywidgets, ipython, scikit_learn, dnspython, ipwhois, folium, maxminddb_geolite2<br>\n",
    " **Platforms Supported**:\n",
    " - Azure Notebooks Free Compute\n",
    " - Azure Notebooks DSVM\n",
    " - OS Independent\n",
    "\n",
    " **Data Sources Required**:\n",
    " - Log Analytics - SecurityAlert, SecurityEvent (EventIDs 4688 and 4624/25), AzureNetworkAnalytics_CL, Heartbeat\n",
    " - (Optional) - VirusTotal, AlienVault OTX, IBM XForce, Open Page Rank, (all require accounts and API keys)\n",
    " &lt;/details&gt;\n",
    "\n",
    " Brings together a series of queries and visualizations to help you determine the security state of the Windows host or virtual machine that you are investigating.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Windows-Host-Explorer\" data-toc-modified-id=\"Windows-Host-Explorer-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Windows Host Explorer</a></span><ul class=\"toc-item\"><li><span><a href=\"#Notebook-Initialization\" data-toc-modified-id=\"Notebook-Initialization-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Notebook Initialization</a></span></li><li><span><a href=\"#Get-WorkspaceId-and-Authenticate-to-Azure-Sentinel\" data-toc-modified-id=\"Get-WorkspaceId-and-Authenticate-to-Azure-Sentinel-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Get WorkspaceId and Authenticate to Azure Sentinel</a></span><ul class=\"toc-item\"><li><span><a href=\"#Authentication-and-Configuration-Problems\" data-toc-modified-id=\"Authentication-and-Configuration-Problems-1.2.1\"><span class=\"toc-item-num\">1.2.1&nbsp;&nbsp;</span>Authentication and Configuration Problems</a></span></li></ul></li></ul></li><li><span><a href=\"#Search-for-a-Host-name-and-query-host-properties\" data-toc-modified-id=\"Search-for-a-Host-name-and-query-host-properties-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Search for a Host name and query host properties</a></span></li><li><span><a href=\"#Related-Alerts\" data-toc-modified-id=\"Related-Alerts-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Related Alerts</a></span><ul class=\"toc-item\"><li><span><a href=\"#Browse-List-of-Related-Alerts\" data-toc-modified-id=\"Browse-List-of-Related-Alerts-3.1\"><span class=\"toc-item-num\">3.1&nbsp;&nbsp;</span>Browse List of Related Alerts</a></span></li></ul></li><li><span><a href=\"#Host-Logons\" data-toc-modified-id=\"Host-Logons-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Host Logons</a></span><ul class=\"toc-item\"><li><span><a href=\"#Successful-Logons---Timeline-and-LogonType-breakdown\" data-toc-modified-id=\"Successful-Logons---Timeline-and-LogonType-breakdown-4.1\"><span class=\"toc-item-num\">4.1&nbsp;&nbsp;</span>Successful Logons - Timeline and LogonType breakdown</a></span></li><li><span><a href=\"#Failed-Logons\" data-toc-modified-id=\"Failed-Logons-4.2\"><span class=\"toc-item-num\">4.2&nbsp;&nbsp;</span>Failed Logons</a></span><ul class=\"toc-item\"><li><span><a href=\"#Accounts-With-Failed-And-Successful-Logons\" data-toc-modified-id=\"Accounts-With-Failed-And-Successful-Logons-4.2.1\"><span class=\"toc-item-num\">4.2.1&nbsp;&nbsp;</span>Accounts With Failed And Successful Logons</a></span></li></ul></li></ul></li><li><span><a href=\"#Other-Security-Events\" data-toc-modified-id=\"Other-Security-Events-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Other Security Events</a></span><ul class=\"toc-item\"><li><span><a href=\"#Parse-Event-Data-for-Selected-Events\" data-toc-modified-id=\"Parse-Event-Data-for-Selected-Events-5.1\"><span class=\"toc-item-num\">5.1&nbsp;&nbsp;</span>Parse Event Data for Selected Events</a></span></li><li><span><a href=\"#Account-Change-Events---Timeline\" data-toc-modified-id=\"Account-Change-Events---Timeline-5.2\"><span class=\"toc-item-num\">5.2&nbsp;&nbsp;</span>Account Change Events - Timeline</a></span></li><li><span><a href=\"#Show-Details-of-Selected-Events\" data-toc-modified-id=\"Show-Details-of-Selected-Events-5.3\"><span class=\"toc-item-num\">5.3&nbsp;&nbsp;</span>Show Details of Selected Events</a></span></li></ul></li><li><span><a href=\"#Examine-Logon-Sessions\" data-toc-modified-id=\"Examine-Logon-Sessions-6\"><span class=\"toc-item-num\">6&nbsp;&nbsp;</span>Examine Logon Sessions</a></span><ul class=\"toc-item\"><li><span><a href=\"#Browse-logon-account-details\" data-toc-modified-id=\"Browse-logon-account-details-6.1\"><span class=\"toc-item-num\">6.1&nbsp;&nbsp;</span>Browse logon account details</a></span></li><li><span><a href=\"#View-distinct-host-logon-patterns\" data-toc-modified-id=\"View-distinct-host-logon-patterns-6.2\"><span class=\"toc-item-num\">6.2&nbsp;&nbsp;</span>View distinct host logon patterns</a></span></li><li><span><a href=\"#Analyze-Processes-Patterns-for-logon-sessions\" data-toc-modified-id=\"Analyze-Processes-Patterns-for-logon-sessions-6.3\"><span class=\"toc-item-num\">6.3&nbsp;&nbsp;</span>Analyze Processes Patterns for logon sessions</a></span><ul class=\"toc-item\"><li><span><a href=\"#Compute-the-relative-rarity-of-processes-in-each-session\" data-toc-modified-id=\"Compute-the-relative-rarity-of-processes-in-each-session-6.3.1\"><span class=\"toc-item-num\">6.3.1&nbsp;&nbsp;</span>Compute the relative rarity of processes in each session</a></span></li><li><span><a href=\"#Overview-of-session-timelines-for-sessions-with-higher-rarity-score\" data-toc-modified-id=\"Overview-of-session-timelines-for-sessions-with-higher-rarity-score-6.3.2\"><span class=\"toc-item-num\">6.3.2&nbsp;&nbsp;</span>Overview of session timelines for sessions with higher rarity score</a></span></li><li><span><a href=\"#View-the-processes-for-these-Sessions\" data-toc-modified-id=\"View-the-processes-for-these-Sessions-6.3.3\"><span class=\"toc-item-num\">6.3.3&nbsp;&nbsp;</span>View the processes for these Sessions</a></span></li></ul></li><li><span><a href=\"#Browse-All-Sessions-(Optional)\" data-toc-modified-id=\"Browse-All-Sessions-(Optional)-6.4\"><span class=\"toc-item-num\">6.4&nbsp;&nbsp;</span>Browse All Sessions (Optional)</a></span><ul class=\"toc-item\"><li><span><a href=\"#Step-1---Select-a-logon-ID-and-Type\" data-toc-modified-id=\"Step-1---Select-a-logon-ID-and-Type-6.4.1\"><span class=\"toc-item-num\">6.4.1&nbsp;&nbsp;</span>Step 1 - Select a logon ID and Type</a></span></li><li><span><a href=\"#Step-2---Pick-a-logon-session-to-view-its-processes\" data-toc-modified-id=\"Step-2---Pick-a-logon-session-to-view-its-processes-6.4.2\"><span class=\"toc-item-num\">6.4.2&nbsp;&nbsp;</span>Step 2 - Pick a logon session to view its processes</a></span></li></ul></li></ul></li><li><span><a href=\"#Check-for-IOCs-in-Commandline-for-selected-session\" data-toc-modified-id=\"Check-for-IOCs-in-Commandline-for-selected-session-7\"><span class=\"toc-item-num\">7&nbsp;&nbsp;</span>Check for IOCs in Commandline for selected session</a></span><ul class=\"toc-item\"><li><span><a href=\"#Extract-IoCs\" data-toc-modified-id=\"Extract-IoCs-7.1\"><span class=\"toc-item-num\">7.1&nbsp;&nbsp;</span>Extract IoCs</a></span></li><li><span><a href=\"#If-any-Base64-encoded-strings,-decode-and-search-for-IoCs-in-the-results.\" data-toc-modified-id=\"If-any-Base64-encoded-strings,-decode-and-search-for-IoCs-in-the-results.-7.2\"><span class=\"toc-item-num\">7.2&nbsp;&nbsp;</span>If any Base64 encoded strings, decode and search for IoCs in the results.</a></span></li><li><span><a href=\"#Threat-Intel-Lookup\" data-toc-modified-id=\"Threat-Intel-Lookup-7.3\"><span class=\"toc-item-num\">7.3&nbsp;&nbsp;</span>Threat Intel Lookup</a></span></li></ul></li><li><span><a href=\"#Network-Check-Communications-with-Other-Hosts\" data-toc-modified-id=\"Network-Check-Communications-with-Other-Hosts-8\"><span class=\"toc-item-num\">8&nbsp;&nbsp;</span>Network Check Communications with Other Hosts</a></span><ul class=\"toc-item\"><li><span><a href=\"#Query-Flows-by-Host-IP-Addresses\" data-toc-modified-id=\"Query-Flows-by-Host-IP-Addresses-8.1\"><span class=\"toc-item-num\">8.1&nbsp;&nbsp;</span>Query Flows by Host IP Addresses</a></span></li><li><span><a href=\"#Flow-Summary\" data-toc-modified-id=\"Flow-Summary-8.2\"><span class=\"toc-item-num\">8.2&nbsp;&nbsp;</span>Flow Summary</a></span></li><li><span><a href=\"#Choose-ASNs/IPs-to-Check-for-Threat-Intel-Reports\" data-toc-modified-id=\"Choose-ASNs/IPs-to-Check-for-Threat-Intel-Reports-8.3\"><span class=\"toc-item-num\">8.3&nbsp;&nbsp;</span>Choose ASNs/IPs to Check for Threat Intel Reports</a></span></li><li><span><a href=\"#GeoIP-Map-of-External-IPs\" data-toc-modified-id=\"GeoIP-Map-of-External-IPs-8.4\"><span class=\"toc-item-num\">8.4&nbsp;&nbsp;</span>GeoIP Map of External IPs</a></span></li></ul></li><li><span><a href=\"#Appendices\" data-toc-modified-id=\"Appendices-9\"><span class=\"toc-item-num\">9&nbsp;&nbsp;</span>Appendices</a></span><ul class=\"toc-item\"><li><span><a href=\"#Available-DataFrames\" data-toc-modified-id=\"Available-DataFrames-9.1\"><span class=\"toc-item-num\">9.1&nbsp;&nbsp;</span>Available DataFrames</a></span></li><li><span><a href=\"#Saving-Data-to-Excel\" data-toc-modified-id=\"Saving-Data-to-Excel-9.2\"><span class=\"toc-item-num\">9.2&nbsp;&nbsp;</span>Saving Data to Excel</a></span></li><li><span><a href=\"#Configuration\" data-toc-modified-id=\"Configuration-9.3\"><span class=\"toc-item-num\">9.3&nbsp;&nbsp;</span>Configuration</a></span><ul class=\"toc-item\"><li><span><a href=\"#msticpyconfig.yaml-configuration-File\" data-toc-modified-id=\"msticpyconfig.yaml-configuration-File-9.3.1\"><span class=\"toc-item-num\">9.3.1&nbsp;&nbsp;</span><code>msticpyconfig.yaml</code> configuration File</a></span></li></ul></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "### Notebook initialization\n",
    "The next cell:\n",
    "- Checks for the correct Python version\n",
    "- Checks versions and optionally installs required packages\n",
    "- Imports the required packages into the notebook\n",
    "- Sets a number of configuration options.\n",
    "\n",
    "This should complete without errors. If you encounter errors or warnings look at the following two notebooks:\n",
    "- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)\n",
    "- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n",
    "\n",
    "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n",
    "- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)\n",
    "- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)\n",
    "\n",
    "You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. \n",
    "There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:\n",
    "- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)\n",
    "- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:27:18.623464Z",
     "start_time": "2020-05-15T23:27:15.156160Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<font color='green'>Environment setup has completed.</font>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Continuing wth notebook setup."
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Checking msticpy version..."
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "msticpy version 0.5.1 OK"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing imports....\n",
      "Checking configuration....\n",
      "Setting options....\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<h3>Notebook setup complete</h3>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "import os\n",
    "import sys\n",
    "import warnings\n",
    "from IPython.display import display, HTML, Markdown\n",
    "\n",
    "REQ_PYTHON_VER=(3, 6)\n",
    "REQ_MSTICPY_VER=(0, 5, 0)\n",
    "\n",
    "display(HTML(\"<h3>Starting Notebook setup...</h3>\"))\n",
    "if Path(\"./utils/nb_check.py\").is_file():\n",
    "    from utils.nb_check import check_python_ver, check_mp_ver\n",
    "\n",
    "    check_python_ver(min_py_ver=REQ_PYTHON_VER)\n",
    "    try:\n",
    "        check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n",
    "    except ImportError:\n",
    "        !pip install --upgrade msticpy\n",
    "        if \"msticpy\" in sys.modules:\n",
    "            importlib.reload(msticpy)\n",
    "        else:\n",
    "            import msticpy\n",
    "        check_mp_ver(REQ_PYTHON_VER)\n",
    "            \n",
    "\n",
    "# If not using Azure Notebooks, install msticpy with\n",
    "# !pip install msticpy\n",
    "from msticpy.nbtools import nbinit\n",
    "nbinit.init_notebook(\n",
    "    namespace=globals(),\n",
    "    extra_imports=[\"ipwhois, IPWhois\"]\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Get WorkspaceId and Authenticate to Azure Sentinel\n",
    " &lt;details&gt;\n",
    "     <summary><u>Details...</u></summary>\n",
    " If you are using user/device authentication, run the following cell.\n",
    " - Click the 'Copy code to clipboard and authenticate' button.\n",
    " - This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard.\n",
    " - Select the text box and paste (Ctrl-V/Cmd-V) the copied value.\n",
    " - You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.\n",
    "\n",
    " Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:\n",
    " ```\n",
    " %kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)\n",
    " ```\n",
    " instead of\n",
    " ```\n",
    " %kql loganalytics://code().workspace(WORKSPACE_ID)\n",
    " ```\n",
    "\n",
    " Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.<br>\n",
    " On successful authentication you should see a ```popup schema``` button.\n",
    " To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n",
    " &lt;/details&gt;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:27:22.847608Z",
     "start_time": "2020-05-15T23:27:22.839609Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style=''>Workspace details collected from config file</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#See if we have an Azure Sentinel Workspace defined in our config file, if not let the user specify Workspace and Tenant IDs\n",
    "from msticpy.nbtools.wsconfig import WorkspaceConfig\n",
    "# WorkspaceConfig.list_workspaces()\n",
    "# ws_config = WorkspaceConfig(workspace=\"My_Workspace_Name\")\n",
    "# calling WorkspaceConfig with no parameters will load the default workspace from msticpyconfig.yaml\n",
    "# or fall back on a config.json file.\n",
    "ws_config = WorkspaceConfig()\n",
    "try:\n",
    "    ws_id = ws_config['workspace_id']\n",
    "    ten_id = ws_config['tenant_id']\n",
    "    config = True\n",
    "    md(\"Workspace details collected from config file\")\n",
    "except KeyError:\n",
    "    md(('Please go to your Log Analytics workspace, copy the workspace ID'\n",
    "                 ' and/or tenant Id and paste here to enable connection to the workspace and querying of it..<br> '))\n",
    "    ws_id_wgt = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n",
    "                                        prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n",
    "    ten_id_wgt = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n",
    "                                         prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n",
    "    config = False\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:28:39.796803Z",
     "start_time": "2020-05-15T23:27:27.080209Z"
    }
   },
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "try {IPython.notebook.kernel.execute(\"NOTEBOOK_URL = '\" + window.location + \"'\");} catch(err) {;}"
      ],
      "text/plain": [
       "<IPython.core.display.Javascript object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<!DOCTYPE html>\n",
       "                    <html><body>\n",
       "\n",
       "                    <!-- h1 id=\"user_code_p\"><b>FEXRDM2SB</b><br></h1-->\n",
       "\n",
       "                    <input  id=\"kql_MagicCodeAuthInput\" type=\"text\" readonly style=\"font-weight: bold; border: none;\" size = '9' value='FEXRDM2SB'>\n",
       "\n",
       "                    <button id='kql_MagicCodeAuth_button', onclick=\"this.style.visibility='hidden';kql_MagicCodeAuthFunction()\">Copy code to clipboard and authenticate</button>\n",
       "\n",
       "                    <script>\n",
       "                    var kql_MagicUserCodeAuthWindow = null\n",
       "                    function kql_MagicCodeAuthFunction() {\n",
       "                        /* Get the text field */\n",
       "                        var copyText = document.getElementById(\"kql_MagicCodeAuthInput\");\n",
       "\n",
       "                        /* Select the text field */\n",
       "                        copyText.select();\n",
       "\n",
       "                        /* Copy the text inside the text field */\n",
       "                        document.execCommand(\"copy\");\n",
       "\n",
       "                        /* Alert the copied text */\n",
       "                        // alert(\"Copied the text: \" + copyText.value);\n",
       "\n",
       "                        var w = screen.width / 2;\n",
       "                        var h = screen.height / 2;\n",
       "                        params = 'width='+w+',height='+h\n",
       "                        kql_MagicUserCodeAuthWindow = window.open('https://microsoft.com/devicelogin', 'kql_MagicUserCodeAuthWindow', params);\n",
       "\n",
       "                        // TODO: save selected cell index, so that the clear will be done on the lince cell\n",
       "                    }\n",
       "                    </script>\n",
       "\n",
       "                    </body></html>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<!DOCTYPE html>\n",
       "                    <html><body><script>\n",
       "\n",
       "                        // close authentication window\n",
       "                        if (kql_MagicUserCodeAuthWindow && kql_MagicUserCodeAuthWindow.opener != null && !kql_MagicUserCodeAuthWindow.closed) {\n",
       "                            kql_MagicUserCodeAuthWindow.close()\n",
       "                        }\n",
       "                        // TODO: make sure, you clear the right cell. BTW, not sure it is a must to do any clearing\n",
       "\n",
       "                        // clear output cell\n",
       "                        Jupyter.notebook.clear_output(Jupyter.notebook.get_selected_index())\n",
       "\n",
       "                        // TODO: if in run all mode, move to last cell, otherwise move to next cell\n",
       "                        // move to next cell\n",
       "\n",
       "                    </script></body></html>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<!DOCTYPE html>\n",
       "            <html><body>\n",
       "            <div style=''>\n",
       "            \n",
       "\n",
       "            <button onclick=\"this.style.visibility='visible';kql_MagicLaunchWindowFunction('Kqlmagic_temp_files/_52b1ab41-869e-4138-9e40-2a4457f09bf0_at_loganalytics_schema.html','fullscreen=no,directories=no,location=no,menubar=no,resizable=yes,scrollbars=yes,status=no,titlebar=no,toolbar=no,','_52b1ab41_869e_4138_9e40_2a4457f09bf0_at_loganalytics_schema','')\">popup schema 52b1ab41-869e-4138-9e40-2a4457f09bf0@loganalytics</button>\n",
       "            \n",
       "            </div>\n",
       "\n",
       "            <script>\n",
       "\n",
       "            function kql_MagicLaunchWindowFunction(file_path, window_params, window_name, notebooks_host) {\n",
       "                var url;\n",
       "                if (notebooks_host == 'text') {\n",
       "                    url = ''\n",
       "                } else if (file_path.startsWith('http')) {\n",
       "                    url = file_path;\n",
       "                } else {\n",
       "                    var base_url = '';\n",
       "\n",
       "                    // check if azure notebook\n",
       "                    var azure_host = (notebooks_host == null || notebooks_host.length == 0) ? 'https://notebooks.azure.com' : notebooks_host;\n",
       "                    var start = azure_host.search('//');\n",
       "                    var azure_host_suffix = '.' + azure_host.substring(start+2);\n",
       "\n",
       "                    var loc = String(window.location);\n",
       "                    var end = loc.search(azure_host_suffix);\n",
       "                    start = loc.search('//');\n",
       "                    if (start > 0 && end > 0) {\n",
       "                        var parts = loc.substring(start+2, end).split('-');\n",
       "                        if (parts.length == 2) {\n",
       "                            var library = parts[0];\n",
       "                            var user = parts[1];\n",
       "                            base_url = azure_host + '/api/user/' +user+ '/library/' +library+ '/html/';\n",
       "                        }\n",
       "                    }\n",
       "\n",
       "                    // check if local jupyter lab\n",
       "                    if (base_url.length == 0) {\n",
       "                        var configDataScipt  = document.getElementById('jupyter-config-data');\n",
       "                        if (configDataScipt != null) {\n",
       "                            var jupyterConfigData = JSON.parse(configDataScipt.textContent);\n",
       "                            if (jupyterConfigData['appName'] == 'JupyterLab' && jupyterConfigData['serverRoot'] != null &&  jupyterConfigData['treeUrl'] != null) {\n",
       "                                var basePath = 'e:/src/Azure-Sentinel-Notebooks' + '/';\n",
       "                                if (basePath.startsWith(jupyterConfigData['serverRoot'])) {\n",
       "                                    base_url = '/files/' + basePath.substring(jupyterConfigData['serverRoot'].length+1);\n",
       "                                }\n",
       "                            } \n",
       "                        }\n",
       "                    }\n",
       "\n",
       "                    // assume local jupyter notebook\n",
       "                    if (base_url.length == 0) {\n",
       "\n",
       "                        var parts = loc.split('/');\n",
       "                        parts.pop();\n",
       "                        base_url = parts.join('/') + '/';\n",
       "                    }\n",
       "                    url = base_url + file_path;\n",
       "                }\n",
       "\n",
       "                window.focus();\n",
       "                var w = screen.width / 2;\n",
       "                var h = screen.height / 2;\n",
       "                params = 'width='+w+',height='+h;\n",
       "                kql_Magic__52b1ab41_869e_4138_9e40_2a4457f09bf0_at_loganalytics_schema = window.open(url, window_name, window_params + params);\n",
       "                if (url == '') {\n",
       "                    var el = kql_Magic__52b1ab41_869e_4138_9e40_2a4457f09bf0_at_loganalytics_schema.document.createElement('p');\n",
       "                    kql_Magic__52b1ab41_869e_4138_9e40_2a4457f09bf0_at_loganalytics_schema.document.body.overflow = 'auto';\n",
       "                    el.style.top = 0;\n",
       "                    el.style.left = 0;\n",
       "                    el.innerHTML = file_path;\n",
       "                    kql_Magic__52b1ab41_869e_4138_9e40_2a4457f09bf0_at_loganalytics_schema.document.body.appendChild(el);\n",
       "                }\n",
       "            }\n",
       "            </script>\n",
       "\n",
       "            </body></html>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "if config is False:\n",
    "    ws_id = ws_id_wgt.value\n",
    "    ten_id = ten_id_wgt.value\n",
    "# Establish a query provider for Azure Sentinel and connect to it\n",
    "qry_prov = QueryProvider('LogAnalytics')\n",
    "qry_prov.connect(connection_str=ws_config.code_connect_str)\n",
    "table_index = qry_prov.schema_tables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-10-31T23:37:18.211230Z",
     "start_time": "2019-10-31T23:37:18.204259Z"
    }
   },
   "source": [
    "### Authentication and Configuration Problems\n",
    "\n",
    "<br>\n",
    "<details>\n",
    "    <summary>Click for details about configuring your authentication parameters</summary>\n",
    "    \n",
    "    \n",
    "The notebook is expecting your Azure Sentinel Tenant ID and Workspace ID to be configured in one of the following places:\n",
    "- `config.json` in the current folder\n",
    "- `msticpyconfig.yaml` in the current folder or location specified by `MSTICPYCONFIG` environment variable.\n",
    "    \n",
    "For help with setting up your `config.json` file (if this hasn't been done automatically) see the [`ConfiguringNotebookEnvironment`](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) notebook in the root folder of your Azure-Sentinel-Notebooks project. This shows you how to obtain your Workspace and Subscription IDs from the Azure Sentinel Portal. You can use the SubscriptionID to find your Tenant ID). To view the current `config.json` run the following in a code cell.\n",
    "\n",
    "```%pfile config.json```\n",
    "\n",
    "For help with setting up your `msticpyconfig.yaml` see the [Setup](#Setup) section at the end of this notebook and the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Contents](#Contents)\n",
    " # Search for a Host name and query host properties"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:28:41.610484Z",
     "start_time": "2020-05-15T23:28:41.598485Z"
    }
   },
   "outputs": [],
   "source": [
    "host_text = widgets.Text(\n",
    "    description=\"Enter the Host name to search for:\", **WIDGET_DEFAULTS\n",
    ")\n",
    "display(host_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:28:46.198826Z",
     "start_time": "2020-05-15T23:28:46.144827Z"
    }
   },
   "outputs": [],
   "source": [
    "query_times = nbwidgets.QueryTime(units=\"day\", max_before=20, before=5, max_after=1)\n",
    "query_times.display()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:28:58.158859Z",
     "start_time": "2020-05-15T23:28:55.922817Z"
    }
   },
   "outputs": [],
   "source": [
    "# Get single event - try process creation\n",
    "if \"SecurityEvent\" not in table_index:\n",
    "    raise ValueError(\"No Windows event log data available in the workspace\")\n",
    "host_name = None\n",
    "matching_hosts_df = qry_prov.WindowsSecurity.list_host_processes(\n",
    "    query_times, host_name=host_text.value.strip(), add_query_items=\"| distinct Computer\"\n",
    ")\n",
    "if len(matching_hosts_df) > 1:\n",
    "    print(f\"Multiple matches for '{host_text.value}'. Please select a host from the list.\")\n",
    "    choose_host = nbwidgets.SelectString(\n",
    "        item_list=list(matching_hosts_df[\"Computer\"].values),\n",
    "        description=\"Select the host.\",\n",
    "        auto_display=True,\n",
    "    )\n",
    "elif not matching_hosts_df.empty:\n",
    "    host_name = matching_hosts_df[\"Computer\"].iloc[0]\n",
    "    print(f\"Unique host found: {host_name}\")\n",
    "else:\n",
    "    print(f\"Host not found: {host_text.value}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:12.506439Z",
     "start_time": "2020-05-15T23:29:01.493356Z"
    }
   },
   "outputs": [],
   "source": [
    "if not host_name:\n",
    "    host_name = choose_host.value\n",
    "\n",
    "host_entity = None\n",
    "if not matching_hosts_df.empty:\n",
    "    host_entity = entities.Host(src_event=matching_hosts_df[matching_hosts_df[\"Computer\"] == host_name].iloc[0])\n",
    "if not host_entity:\n",
    "    raise LookupError(f\"Could not find Windows events the name {host_name}\")\n",
    "\n",
    "def populate_heartbeat_details(host_hb_df, host_entity=None):\n",
    "    if not host_hb_df.empty:\n",
    "        host_hb = host_hb_df.iloc[0]\n",
    "        if not host_entity:\n",
    "            host_entity = entities.Host(host_hb[\"Computer\"])\n",
    "        host_entity.SourceComputerId = host_hb[\"SourceComputerId\"]\n",
    "        host_entity.OSType = host_hb[\"OSType\"]\n",
    "        host_entity.OSMajorVersion = host_hb[\"OSMajorVersion\"]\n",
    "        host_entity.OSMinorVersion = host_hb[\"OSMinorVersion\"]\n",
    "        host_entity.ComputerEnvironment = host_hb[\"ComputerEnvironment\"]\n",
    "        host_entity.ResourceId = host_hb[\"ResourceId\"]\n",
    "        host_entity.OmsSolutions = [\n",
    "            sol.strip() for sol in host_hb[\"Solutions\"].split(\",\")\n",
    "        ]\n",
    "        host_entity.VMUUID = host_hb[\"VMUUID\"]\n",
    "\n",
    "        ip_entity = entities.IpAddress()\n",
    "        ip_entity.Address = host_hb[\"ComputerIP\"]\n",
    "        geoloc_entity = entities.GeoLocation()\n",
    "        geoloc_entity.CountryName = host_hb[\"RemoteIPCountry\"]\n",
    "        geoloc_entity.Longitude = host_hb[\"RemoteIPLongitude\"]\n",
    "        geoloc_entity.Latitude = host_hb[\"RemoteIPLatitude\"]\n",
    "        ip_entity.Location = geoloc_entity\n",
    "        host_entity.IPAddress = ip_entity  # TODO change to graph edge\n",
    "    return host_entity\n",
    "\n",
    "def convert_to_ip_entities(ip_str):\n",
    "    iplocation = GeoLiteLookup()\n",
    "    ip_entities = []\n",
    "    if ip_str:\n",
    "        if \",\" in ip_str:\n",
    "            addrs = ip_str.split(\",\")\n",
    "        elif \" \" in ip_str:\n",
    "            addrs = ip_str.split(\" \")\n",
    "        else:\n",
    "            addrs = [ip_str]\n",
    "        for addr in addrs:\n",
    "            ip_entity = entities.IpAddress()\n",
    "            ip_entity.Address = addr.strip()\n",
    "            iplocation.lookup_ip(ip_entity=ip_entity)\n",
    "            ip_entities.append(ip_entity)\n",
    "    return ip_entities\n",
    "\n",
    "# Add this information to our inv_host_entity\n",
    "def populate_host_aznet_ips(az_net_df, host_entity):\n",
    "    retrieved_address = []\n",
    "    if len(az_net_df) == 1:\n",
    "        host_entity.private_ips = convert_to_ip_entities(\n",
    "            az_net_df[\"PrivateIPAddresses\"].iloc[0]\n",
    "        )\n",
    "        host_entity.public_ips = convert_to_ip_entities(\n",
    "            az_net_df[\"PublicIPAddresses\"].iloc[0]\n",
    "        )\n",
    "        retrieved_address = [ip.Address for ip in host_entity.public_ips]\n",
    "    else:\n",
    "        if \"private_ips\" not in host_entity:\n",
    "            host_entity.private_ips = []\n",
    "        if \"public_ips\" not in host_entity:\n",
    "            host_entity.public_ips = []\n",
    "\n",
    "\n",
    "iplocation = GeoLiteLookup()\n",
    "\n",
    "# Try to get an OMS Heartbeat for this computer\n",
    "if \"Heartbeat\" in table_index:\n",
    "    print(f\"Looking for {host_name} in OMS Heartbeat data...\")\n",
    "    host_hb_df = qry_prov.Network.get_heartbeat_for_host(host_name=host_name)\n",
    "    host_entity = populate_heartbeat_details(host_hb_df, host_entity)\n",
    "\n",
    "if \"AzureNetworkAnalytics_CL\" in table_index:\n",
    "    print(f\"Looking for {host_name} IP addresses in network flows...\")\n",
    "    az_net_df = qry_prov.Network.get_ips_for_host(host_name=host_name)\n",
    "    populate_host_aznet_ips(az_net_df, host_entity)\n",
    "\n",
    "md(\"Host Details\", \"bold\")\n",
    "print(host_entity)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Contents](#Contents)\n",
    " # Related Alerts\n",
    " Look for any related alerts around this time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:12.712441Z",
     "start_time": "2020-05-15T23:29:12.667444Z"
    }
   },
   "outputs": [],
   "source": [
    "ra_query_times = nbwidgets.QueryTime(\n",
    "    units=\"day\",\n",
    "    origin_time=query_times.origin_time,\n",
    "    max_before=28,\n",
    "    max_after=5,\n",
    "    before=5,\n",
    "    auto_display=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:15.961693Z",
     "start_time": "2020-05-15T23:29:14.578094Z"
    }
   },
   "outputs": [],
   "source": [
    "\n",
    "related_alerts = qry_prov.SecurityAlert.list_related_alerts(\n",
    "    ra_query_times, host_name=host_entity.HostName\n",
    ")\n",
    "\n",
    "def print_related_alerts(alertDict, entityType, entityName):\n",
    "    if len(alertDict) > 0:\n",
    "        display(\n",
    "            Markdown(\n",
    "                f\"### Found {len(alertDict)} different alert types related to this {entityType} (`{entityName}`)\"\n",
    "            )\n",
    "        )\n",
    "        for (k, v) in alertDict.items():\n",
    "            print(f\"- {k}, # Alerts: {v}\")\n",
    "    else:\n",
    "        print(f\"No alerts for {entityType} entity `{entityName}`\")\n",
    "\n",
    "\n",
    "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n",
    "    host_alert_items = (\n",
    "        related_alerts[[\"AlertName\", \"TimeGenerated\"]]\n",
    "        .groupby(\"AlertName\")\n",
    "        .TimeGenerated.agg(\"count\")\n",
    "        .to_dict()\n",
    "    )\n",
    "    print_related_alerts(host_alert_items, \"host\", host_entity.HostName)\n",
    "    if len(host_alert_items) > 1:\n",
    "        nbdisplay.display_timeline(\n",
    "            data=related_alerts, title=\"Alerts\", source_columns=[\"AlertName\"], height=200\n",
    "        )\n",
    "else:\n",
    "    display(Markdown(\"No related alerts found.\"))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Browse List of Related Alerts\n",
    " Select an Alert to view details"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:19.615661Z",
     "start_time": "2020-05-15T23:29:19.546661Z"
    }
   },
   "outputs": [],
   "source": [
    "def disp_full_alert(alert):\n",
    "    global related_alert\n",
    "    related_alert = SecurityAlert(alert)\n",
    "    nbdisplay.display_alert(related_alert, show_entities=True)\n",
    "\n",
    "recenter_wgt = widgets.Checkbox(\n",
    "    value=True,\n",
    "    description='Center subsequent query times round selected Alert?',\n",
    "    disabled=False,\n",
    "    **WIDGET_DEFAULTS\n",
    ")\n",
    "if related_alerts is not None and not related_alerts.empty:\n",
    "    related_alerts[\"CompromisedEntity\"] = related_alerts[\"Computer\"]\n",
    "    display(Markdown(\"### Click on alert to view details.\"))\n",
    "    display(recenter_wgt)\n",
    "    rel_alert_select = nbwidgets.AlertSelector(\n",
    "        alerts=related_alerts,\n",
    "        action=disp_full_alert,\n",
    "    )\n",
    "    rel_alert_select.display()\n",
    "    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Contents](#toc)\n",
    " # Host Logons\n",
    " This section looks at successful and failed logons on the host."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:25.409590Z",
     "start_time": "2020-05-15T23:29:25.359585Z"
    }
   },
   "outputs": [],
   "source": [
    "# set the origin time to the time of our alert\n",
    "origin_time = (related_alert.TimeGenerated \n",
    "               if recenter_wgt.value \n",
    "               else query_times.origin_time)\n",
    "logon_query_times = nbwidgets.QueryTime(\n",
    "    units=\"day\",\n",
    "    origin_time=origin_time,\n",
    "    before=5,\n",
    "    after=1,\n",
    "    max_before=20,\n",
    "    max_after=20,\n",
    ")\n",
    "logon_query_times.display()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Successful Logons - Timeline and LogonType breakdown"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:29.150066Z",
     "start_time": "2020-05-15T23:29:27.337759Z"
    }
   },
   "outputs": [],
   "source": [
    "host_logons = qry_prov.WindowsSecurity.list_host_logons(\n",
    "    logon_query_times, host_name=host_entity.HostName\n",
    ")\n",
    "\n",
    "if host_logons is not None and not host_logons.empty:\n",
    "    display(Markdown(\"### Logon timeline.\"))\n",
    "    tooltip_cols = [\n",
    "        \"TargetUserName\",\n",
    "        \"TargetDomainName\",\n",
    "        \"SubjectUserName\",\n",
    "        \"SubjectDomainName\",\n",
    "        \"LogonType\",\n",
    "        \"IpAddress\",\n",
    "    ]\n",
    "    nbdisplay.display_timeline(\n",
    "        data=host_logons,\n",
    "        group_by=\"TargetUserName\",\n",
    "        source_columns=tooltip_cols,\n",
    "        legend=\"right\", yaxis=True\n",
    "    )\n",
    "\n",
    "    display(Markdown(\"### Counts of logon events by logon type.\"))\n",
    "    display(Markdown(\"Min counts for each logon type highlighted.\"))\n",
    "    logon_by_type = (\n",
    "        host_logons[[\"Account\", \"LogonType\", \"EventID\"]]\n",
    "        .astype({'LogonType': 'int32'})\n",
    "        .merge(right=pd.Series(data=nbdisplay._WIN_LOGON_TYPE_MAP, name=\"LogonTypeDesc\"),\n",
    "               left_on=\"LogonType\", right_index=True)\n",
    "        .drop(columns=\"LogonType\")\n",
    "        .groupby([\"Account\", \"LogonTypeDesc\"])\n",
    "        .count()\n",
    "        .unstack()\n",
    "        .rename(columns={\"EventID\": \"LogonCount\"})\n",
    "        .fillna(0)\n",
    "        .style\n",
    "        .background_gradient(cmap=\"viridis\", low=0.5, high=0)\n",
    "        .format(\"{0:0>3.0f}\")\n",
    "    )\n",
    "    display(logon_by_type)\n",
    "else:\n",
    "    display(Markdown(\"No logon events found for host.\"))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a></a>[Contents](#toc)\n",
    " ## Failed Logons"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:32.494051Z",
     "start_time": "2020-05-15T23:29:31.042487Z"
    }
   },
   "outputs": [],
   "source": [
    "failedLogons = qry_prov.WindowsSecurity.list_host_logon_failures(\n",
    "    logon_query_times, host_name=host_entity.HostName\n",
    ")\n",
    "if failedLogons.empty:\n",
    "    print(\"No logon failures recorded for this host between \",\n",
    "          f\" {logon_query_times.start} and {logon_query_times.end}\"\n",
    "        )\n",
    "else:\n",
    "    nbdisplay.display_timeline(\n",
    "        data=host_logons.query('TargetLogonId != \"0x3e7\"'),\n",
    "        overlay_data=failedLogons,\n",
    "        alert=related_alert,\n",
    "        title=\"Logons (blue=user-success, green=failed)\",\n",
    "        source_columns=tooltip_cols,\n",
    "        height=200,\n",
    "    )\n",
    "    display(failedLogons\n",
    " .astype({'LogonType': 'int32'})\n",
    " .merge(right=pd.Series(data=nbdisplay._WIN_LOGON_TYPE_MAP, name=\"LogonTypeDesc\"),\n",
    "               left_on=\"LogonType\", right_index=True)\n",
    " [['Account', 'EventID', 'TimeGenerated',\n",
    "  'Computer', 'SubjectUserName', 'SubjectDomainName',\n",
    "   'TargetUserName', 'TargetDomainName',\n",
    "   'LogonTypeDesc','IpAddress', 'WorkstationName'\n",
    "  ]])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Accounts With Failed And Successful Logons\n",
    "This query joins failed and successful logons for the same account name. Multiple logon failures followed by a sucessful logon might indicate attempts to guess or probe the user password."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:35.995808Z",
     "start_time": "2020-05-15T23:29:35.834809Z"
    }
   },
   "outputs": [],
   "source": [
    "if not failedLogons.empty:\n",
    "    combined = pd.concat([failedLogons,\n",
    "                          host_logons[host_logons[\"TargetUserName\"]\n",
    "                                      .isin(failedLogons[\"TargetUserName\"]\n",
    "                                            .drop_duplicates())]])\n",
    "    display(combined.head())\n",
    "    combined[\"LogonStatus\"] = combined.apply(lambda x: \"Failed\" if x.EventID == 4625 else \"Success\", axis=1)\n",
    "    nbdisplay.display_timeline(data=combined,\n",
    "                               group_by=\"LogonStatus\",\n",
    "                               source_columns=[\"TargetUserName\", \"LogonType\", \"SubjectUserName\", \"TargetLogonId\"],\n",
    "                               legend=\"inline\",\n",
    "                               yaxis=True,\n",
    "                               height=200)\n",
    "    display(combined.sort_values(\"TimeGenerated\"))\n",
    "else:\n",
    "    md(f\"No logon failures recorded for this host between {logon_query_times.start} and {logon_query_times.end}\") "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Contents](#Contents)\n",
    "# Other Security Events\n",
    " It's often useful to look at what other events were being logged\n",
    " at the time of the attack.\n",
    " \n",
    " We show events here grouped by Account. Things to look for are:\n",
    " \n",
    " - Unexpected events that change system security such as the addition of accounts or services\n",
    " - Event types that occur for only a single account - especially if there are a lot of event types only executed by a single account."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:47.340949Z",
     "start_time": "2020-05-15T23:29:38.308340Z"
    }
   },
   "outputs": [],
   "source": [
    "md(f\"Collecting Windows Event Logs for {host_entity.HostName}, this may take a few minutes...\")\n",
    "\n",
    "all_events_df = qry_prov.WindowsSecurity.list_host_events(\n",
    "    logon_query_times,\n",
    "    host_name=host_entity.HostName,\n",
    "    add_query_items=\"| where EventID != 4688 and EventID != 4624\",\n",
    ")\n",
    "\n",
    "# Create a pivot of Event vs. Account\n",
    "win_events_acc = all_events_df[[\"Account\", \"Activity\", \"TimeGenerated\"]].copy()\n",
    "win_events_acc = win_events_acc.replace(\"-\\\\-\", \"No Account\").replace(\n",
    "    {\"Account\": \"\"}, value=\"No Account\"\n",
    ")\n",
    "win_events_acc[\"Account\"] = win_events_acc.apply(lambda x: x.Account.split(\"\\\\\")[-1], axis=1)\n",
    "event_pivot = (\n",
    "    pd.pivot_table(\n",
    "        win_events_acc,\n",
    "        values=\"TimeGenerated\",\n",
    "        index=[\"Activity\"],\n",
    "        columns=[\"Account\"],\n",
    "        aggfunc=\"count\",\n",
    "    )\n",
    "    .fillna(0)\n",
    "    .reset_index()\n",
    ")\n",
    "display(Markdown(\"Yellow highlights indicate account with highest event count\"))\n",
    "(\n",
    "    event_pivot.style\n",
    "    .applymap(lambda x: \"color: white\" if x == 0 else \"\")\n",
    "    .applymap(\n",
    "        lambda x: \"background-color: lightblue\"\n",
    "        if not isinstance(x, str) and x > 0\n",
    "        else \"\"\n",
    "    )\n",
    "    .set_properties(subset=[\"Activity\"], **{\"width\": \"400px\", \"text-align\": \"left\"})\n",
    "    .highlight_max(axis=1)\n",
    "    .hide_index()\n",
    ")\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parse Event Data for Selected Events\n",
    "For events that you want to look at in more detail you can parse out the full EventData field (containing all fields of the original event). The `parse_event_data` function below does that - transforming the EventData XML into a dictionary of property/value pairs). The `expand_event_properties` function takes this dictionary and transforms into columns in the output DataFrame.\n",
    "\n",
    "<br>\n",
    "&lt;details&gt;\n",
    "     <summary> <u>More details...</u></summary>\n",
    "You can do this for multiple event types in a single pass but, dependng on the schema of each event you may end up with a lot of sparsely populated columns. E.g. suppose EventID 1 has EventData fields A, B and C and EventID 2 has fields A, D, E. If you parse both IDs you'll will end up with a DataFrame with columns A, B, C, D and E with contents populated only for the rows that with corresponding data.\n",
    "\n",
    "We recommend that you process batches of related event types (e.g. all user account change events) that have similar sets of fields to keep the output DataFrame manageable.\n",
    "&lt;/details&gt;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:29:58.968325Z",
     "start_time": "2020-05-15T23:29:58.080325Z"
    }
   },
   "outputs": [],
   "source": [
    "# Function to convert EventData XML into dictionary and\n",
    "# populate columns into DataFrame from previous query result\n",
    "import xml.etree.ElementTree as ET\n",
    "from xml.etree.ElementTree import ParseError\n",
    "\n",
    "SCHEMA = \"http://schemas.microsoft.com/win/2004/08/events/event\"\n",
    "\n",
    "\n",
    "def parse_event_data(row):\n",
    "    try:\n",
    "        xdoc = ET.fromstring(row.EventData)\n",
    "        col_dict = {\n",
    "            elem.attrib[\"Name\"]: elem.text for elem in xdoc.findall(f\"{{{SCHEMA}}}Data\")\n",
    "        }\n",
    "        reassigned = set()\n",
    "        for k, v in col_dict.items():\n",
    "            if k in row and not row[k]:\n",
    "                row[k] = v\n",
    "                reassigned.add(k)\n",
    "        if reassigned:\n",
    "            # print('Reassigned: ', ', '.join(reassigned))\n",
    "            for k in reassigned:\n",
    "                col_dict.pop(k)\n",
    "        return col_dict\n",
    "    except (ParseError, TypeError):\n",
    "        return None\n",
    "\n",
    "\n",
    "# Parse event properties into a dictionary\n",
    "all_events_df[\"EventProperties\"] = all_events_df.apply(parse_event_data, axis=1)\n",
    "\n",
    "# For a specific event ID you can explode the EventProperties values into their own columns\n",
    "# using this function. You can do this for the whole data set but it will result\n",
    "# in a lot of sparse columns in the output data frame\n",
    "def expand_event_properties(input_df):\n",
    "    exp_df = input_df.apply(lambda x: pd.Series(x.EventProperties), axis=1)\n",
    "    return (\n",
    "        exp_df.drop(set(input_df.columns).intersection(exp_df.columns), axis=1)\n",
    "        .merge(\n",
    "            input_df.drop(\"EventProperties\", axis=1),\n",
    "            how=\"inner\",\n",
    "            left_index=True,\n",
    "            right_index=True,\n",
    "        )\n",
    "        .replace(\"\", np.nan)  # these 3 lines get rid of blank columns\n",
    "        .dropna(axis=1, how=\"all\")\n",
    "        .fillna(\"\")\n",
    "    )\n",
    "\n",
    "\n",
    "expand_event_properties(all_events_df[all_events_df[\"EventID\"] == 4724]).head(2)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Account Change Events - Timeline\n",
    "Here we want to focus on a some specific subcategories of events. Attackers commonly try to add or change user accounts and group memberships. We also include events related to addition or change of scheduled tasks and Windows services. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:30:02.077311Z",
     "start_time": "2020-05-15T23:30:01.842315Z"
    }
   },
   "outputs": [],
   "source": [
    "# Get a full list of Windows Security Events\n",
    "import pkgutil\n",
    "import os\n",
    "w_evt = pkgutil.get_data(\"msticpy\", f\"resources{os.sep}WinSecurityEvent.json\")\n",
    "win_event_df = pd.read_json(w_evt)\n",
    "\n",
    "# Create criteria for events that we're interested in\n",
    "acct_sel = win_event_df[\"subcategory\"] == \"User Account Management\"\n",
    "group_sel = win_event_df[\"subcategory\"] == \"Security Group Management\"\n",
    "schtask_sel = (win_event_df[\"subcategory\"] == \"Other Object Access Events\") & (\n",
    "    win_event_df[\"description\"].str.contains(\"scheduled task\")\n",
    ")\n",
    "\n",
    "event_list = win_event_df[acct_sel | group_sel | schtask_sel][\"event_id\"].to_list()\n",
    "# Add Service install event\n",
    "event_list.append(7045)\n",
    "\n",
    "# Plot events on a timeline\n",
    "p = nbdisplay.display_timeline(\n",
    "    data=all_events_df[all_events_df[\"EventID\"].isin(event_list)],\n",
    "    group_by=\"EventID\",\n",
    "    source_columns=[\"Activity\", \"Account\"],\n",
    "    legend=\"right\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Show Details of Selected Events\n",
    "From the above data - pick which event types you want to view (by default, all are selected).\n",
    "The second cell will display the event types selected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-11-01T19:38:42.155265Z",
     "start_time": "2019-11-01T19:38:42.104295Z"
    }
   },
   "outputs": [],
   "source": [
    "# populate actual events IDs to select from\n",
    "recorded_events = (all_events_df['EventID']\n",
    "                   [all_events_df[\"EventID\"]\n",
    "                    .isin(event_list)].drop_duplicates().values)\n",
    "event_subset = win_event_df[win_event_df[\"event_id\"].isin(event_list)\n",
    "                            & win_event_df[\"event_id\"].isin(recorded_events)]\n",
    "items = list(event_subset.apply(lambda x: (x.full_desc, x.event_id), axis=1).values)\n",
    "ss = nbwidgets.SelectSubset(\n",
    "    items,\n",
    "    default_selected=items\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-11-01T19:38:48.430726Z",
     "start_time": "2019-11-01T19:38:48.412764Z"
    }
   },
   "outputs": [],
   "source": [
    "col_names = ['TimeGenerated', 'Account', 'AccountType',\n",
    "             'Computer', 'EventID', 'Activity', 'SubjectAccount',\n",
    "             'SubjectDomainName', 'SubjectLogonId', 'SubjectUserName',\n",
    "             'TargetAccount', 'TargetDomainName', 'TargetSid', 'TargetUserName']\n",
    "display(all_events_df[all_events_df[\"EventID\"].isin(ss.selected_values)]\n",
    " [col_names]\n",
    " .replace(to_replace=\"\", value=np.NAN)\n",
    " .dropna(axis=1, how=\"all\"))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Contents](#Contents)\n",
    "# Examine Logon Sessions\n",
    "Looking at characteristics and activity of individual logon sessions is an effective way of spottting clusters of attacker activity.\n",
    "\n",
    "The biggest problem is deciding which logon sessions are the ones to look at. We may already have some indicators of sessions that we want to examine from earlier sections:\n",
    "\n",
    "- Accounts that experienced a series of failed logons followed by successful logons [see](#Accounts With Failed And Successful Logons)\n",
    "- Accounts that triggered unexpected events [see](#Show-Timeline-of-Account-Change-Events)\n",
    "\n",
    "In this section we use clustering to collapse repetive logons and show details of the distinct logon patterns\n",
    "\n",
    " ## Browse logon account details"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:30:15.514844Z",
     "start_time": "2020-05-15T23:30:14.697818Z"
    }
   },
   "outputs": [],
   "source": [
    "from msticpy.sectools.eventcluster import (\n",
    "    dbcluster_events,\n",
    "    add_process_features,\n",
    "    _string_score,\n",
    ")\n",
    "\n",
    "if host_logons is None or host_logons.empty:\n",
    "    display(Markdown(\"No host logons recorded. This section cannot be run.\"))\n",
    "    raise ValueError(\"aborted\")\n",
    "\n",
    "# Set up clustering features and run DBScan clustering\n",
    "logon_features = host_logons.copy()\n",
    "logon_features[\"AccountNum\"] = host_logons.apply(\n",
    "    lambda x: _string_score(x.Account), axis=1\n",
    ")\n",
    "logon_features[\"TargetUserNum\"] = host_logons.apply(\n",
    "    lambda x: _string_score(x.TargetUserName), axis=1\n",
    ")\n",
    "logon_features[\"LogonHour\"] = host_logons.apply(lambda x: x.TimeGenerated.hour, axis=1)\n",
    "\n",
    "# you might need to play around with the max_cluster_distance parameter.\n",
    "# decreasing this gives more clusters.\n",
    "(clus_logons, _, _) = dbcluster_events(\n",
    "    data=logon_features,\n",
    "    time_column=\"TimeGenerated\",\n",
    "    cluster_columns=[\"AccountNum\", \"LogonType\", \"TargetUserNum\"],\n",
    "    max_cluster_distance=0.0001,\n",
    ")\n",
    "display(Markdown(f\"Number of input events: {len(host_logons)}\"))\n",
    "display(Markdown(f\"Number of clustered events: {len(clus_logons)}\"))\n",
    "\n",
    "display(Markdown(\"### Relative frequencies by account pattern\"))\n",
    "plt.rcParams[\"figure.figsize\"] = (12, 4)\n",
    "clus_logons.sort_values(\"Account\").plot.barh(x=\"Account\", y=\"ClusterSize\");\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## View distinct host logon patterns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:30:17.480349Z",
     "start_time": "2020-05-15T23:30:17.426340Z"
    }
   },
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Build a list of distinct logon patterns from the clustered data\n",
    "dist_logons = clus_logons.sort_values(\"TimeGenerated\")[\n",
    "    [\"TargetUserName\", \"TimeGenerated\", \"LastEventTime\", \"LogonType\", \"ClusterSize\"]\n",
    "]\n",
    "dist_logons = dist_logons.apply(\n",
    "    lambda x: (\n",
    "        f\"{x.TargetUserName}:    \"\n",
    "        f\"(logontype {x.LogonType})   \"\n",
    "        f\"timerange: {x.TimeGenerated} - {x.LastEventTime}    \"\n",
    "        f\"count: {x.ClusterSize}\"\n",
    "    ),\n",
    "    axis=1,\n",
    ")\n",
    "# Convert to dict, flipping keys/values\n",
    "dist_logons = {v: k for k, v in dist_logons.to_dict().items()}\n",
    "\n",
    "\n",
    "def get_selected_logon_cluster(selected_item):\n",
    "    acct = clus_logons.loc[selected_item][\"TargetUserName\"]\n",
    "    logon_type = clus_logons.loc[selected_item][\"LogonType\"]\n",
    "    return host_logons.query(\"TargetUserName == @acct and LogonType == @logon_type\")\n",
    "\n",
    "\n",
    "# Create an Output widget to show the Logon Details\n",
    "w_output = widgets.Output(layout={\"border\": \"1px solid black\"})\n",
    "\n",
    "\n",
    "def show_logon(idx):\n",
    "    w_output.clear_output()\n",
    "    with w_output:\n",
    "        nbdisplay.display_logon_data(pd.DataFrame(clus_logons.loc[idx]).T)\n",
    "\n",
    "\n",
    "logon_wgt = nbwidgets.SelectString(\n",
    "    description=\"Select logon cluster to examine\",\n",
    "    item_dict=dist_logons,\n",
    "    action=show_logon,\n",
    "    height=\"200px\",\n",
    "    width=\"100%\",\n",
    "    auto_display=True,\n",
    ")\n",
    "display(w_output)\n",
    "# Display the first item on first view\n",
    "top_item = next(iter(dist_logons.values()))\n",
    "with w_output:\n",
    "    nbdisplay.display_logon_data(pd.DataFrame(clus_logons.loc[top_item]).T)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Analyze Processes Patterns for logon sessions\n",
    " \n",
    "In this section we look at the types of processes run in each logon session. For each process (and process characteristics such as command line structure) we measure its rarity compared to other processes on the same host. We then calculate the mean rarity of all processes in a logon session and display the results ordered by rarity. One is the highest possible score and would indicate all processes in the session have a unique execution pattern.\n",
    " \n",
    "Note: The next section retrieves processes for time period around the logons for the user ID selected in the previous session. If you want to view a broader time boundary please adjust the query time boundaries in below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:30:31.264572Z",
     "start_time": "2020-05-15T23:30:31.213572Z"
    }
   },
   "outputs": [],
   "source": [
    "# set the origin time to start at the first logon in our set\n",
    "# end end 2hrs after the last\n",
    "start_time = host_logons[\"TimeGenerated\"].min()\n",
    "end_time = host_logons[\"TimeGenerated\"].max()\n",
    "time_diff = int((end_time - start_time).total_seconds() / (60 * 60) + 2)\n",
    "proc_query_times = nbwidgets.QueryTime(\n",
    "    units=\"hours\",\n",
    "    origin_time=start_time,\n",
    "    before=1,\n",
    "    after=time_diff + 1,\n",
    "    max_before=20,\n",
    "    max_after=time_diff + 20,\n",
    ")\n",
    "proc_query_times.display()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ### Compute the relative rarity of processes in each session\n",
    " This should be a good guide to which sessions are the more interesting to look at.\n",
    " \n",
    " **Note** Clustering lots (1000s) of events will take a little time to compute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:30:56.847458Z",
     "start_time": "2020-05-15T23:30:36.409635Z"
    }
   },
   "outputs": [],
   "source": [
    "from msticpy.sectools.eventcluster import dbcluster_events, add_process_features\n",
    "from collections import Counter\n",
    "\n",
    "print(\"Getting process events...\", end=\"\")\n",
    "processes_on_host = qry_prov.WindowsSecurity.list_host_processes(\n",
    "    proc_query_times, host_name=host_entity.HostName\n",
    ")\n",
    "print(f\"done. {len(processes_on_host)} events\")\n",
    "print(\"Clustering. Please wait...\", end=\"\")\n",
    "feature_procs = add_process_features(input_frame=processes_on_host, path_separator=\"\\\\\")\n",
    "\n",
    "feature_procs[\"accountNum\"] = feature_procs.apply(\n",
    "    lambda x: _string_score(x.Account), axis=1\n",
    ")\n",
    "# you might need to play around with the max_cluster_distance parameter.\n",
    "# decreasing this gives more clusters.\n",
    "(clus_events, dbcluster, x_data) = dbcluster_events(\n",
    "    data=feature_procs,\n",
    "    cluster_columns=[\n",
    "        \"commandlineTokensFull\",\n",
    "        \"pathScore\",\n",
    "        \"accountNum\",\n",
    "        \"isSystemSession\",\n",
    "    ],\n",
    "    max_cluster_distance=0.0001,\n",
    ")\n",
    "print(\"done\")\n",
    "print(\"Number of input events:\", len(feature_procs))\n",
    "print(\"Number of clustered events:\", len(clus_events))\n",
    "\n",
    "# Join the clustered results back to the original process frame\n",
    "procs_with_cluster = feature_procs.merge(\n",
    "    clus_events[\n",
    "        [\n",
    "            \"commandlineTokensFull\",\n",
    "            \"accountNum\",\n",
    "            \"pathScore\",\n",
    "            \"isSystemSession\",\n",
    "            \"ClusterSize\",\n",
    "        ]\n",
    "    ],\n",
    "    on=[\"commandlineTokensFull\", \"accountNum\", \"pathScore\", \"isSystemSession\"],\n",
    ")\n",
    "# Rarity = inverse of cluster size\n",
    "procs_with_cluster[\"Rarity\"] = 1 / procs_with_cluster[\"ClusterSize\"]\n",
    "# count the number of processes for each logon ID\n",
    "lgn_proc_count = (\n",
    "    pd.concat(\n",
    "        [\n",
    "            processes_on_host.groupby(\"TargetLogonId\")[\"TargetLogonId\"].count(),\n",
    "            processes_on_host.groupby(\"SubjectLogonId\")[\"SubjectLogonId\"].count(),\n",
    "        ]\n",
    "    ).sum(level=0)\n",
    ").to_dict()\n",
    "\n",
    "# Display the results\n",
    "md(\"Sessions ordered by process rarity\", 'bold')\n",
    "md(\"Higher score indicates higher number of unusual processes\")\n",
    "process_rarity = (procs_with_cluster.groupby([\"SubjectUserName\", \"SubjectLogonId\"])\n",
    "    .agg({\"Rarity\": \"mean\", \"TimeGenerated\": \"count\"})\n",
    "    .rename(columns={\"TimeGenerated\": \"ProcessCount\"})\n",
    "    .reset_index())\n",
    "display(\n",
    "    process_rarity\n",
    "    .sort_values(\"Rarity\", ascending=False)\n",
    "    .style.bar(subset=[\"Rarity\"], color=\"#d65f5f\")\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Overview of session timelines for sessions with higher rarity score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:31:02.979872Z",
     "start_time": "2020-05-15T23:31:02.675871Z"
    }
   },
   "outputs": [],
   "source": [
    "# Display process timeline for 75% percentile rarest scores\n",
    "rare_sess = process_rarity[process_rarity[\"Rarity\"]\n",
    "                           > process_rarity[\"Rarity\"].quantile(.25)]\n",
    "rare_sessions = processes_on_host[(processes_on_host[\"SubjectLogonId\"]\n",
    "                                   .isin(rare_sess[\"SubjectLogonId\"]))\n",
    "                                  & (processes_on_host[\"SubjectUserName\"]\n",
    "                                     .isin(rare_sess[\"SubjectUserName\"]))]\n",
    "\n",
    "md(\"Timeline of sessions with higher process rarity\", \"large\")\n",
    "md(\"Multiple sessions (y-axis) may be are shown for each account.\")\n",
    "md(\"You will likely need to zoom in to see the individual session processes.\")\n",
    "\n",
    "nbdisplay.display_timeline(\n",
    "    data=rare_sessions,\n",
    "    group_by=\"SubjectLogonId\",\n",
    "    source_columns=[\"SubjectUserName\", \"SubjectLogonId\", \"NewProcessName\", \"CommandLine\"],\n",
    "    legend=\"right\",\n",
    "    yaxis=True\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View the processes for these Sessions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:31:09.344452Z",
     "start_time": "2020-05-15T23:31:09.280062Z"
    }
   },
   "outputs": [],
   "source": [
    "def view_logon_sess(logon_id=\"\"):\n",
    "    global selected_logon\n",
    "    selected_logon = host_logons[host_logons[\"TargetLogonId\"] == logon_id]\n",
    "\n",
    "    if all_procs.value:\n",
    "        sess_procs = processes_on_host.query(\n",
    "            \"TargetLogonId == @logon_id | SubjectLogonId == @logon_id\"\n",
    "        )\n",
    "    else:\n",
    "        sess_procs = procs_with_cluster.query(\"SubjectLogonId == @logon_id\")[\n",
    "            [\"NewProcessName\", \"CommandLine\", \"SubjectLogonId\", \"ClusterSize\"]\n",
    "        ].drop_duplicates()\n",
    "    display(sess_procs)\n",
    "\n",
    "sessions = list(process_rarity\n",
    "                .sort_values(\"Rarity\", ascending=False)\n",
    "                .apply(lambda x: (f\"{x.SubjectLogonId}  {x.SubjectUserName}   Rarity={x.Rarity}\",\n",
    "                                  x.SubjectLogonId), \n",
    "                       axis=1))\n",
    "all_procs = widgets.Checkbox(\n",
    "    value=False,\n",
    "    description=\"View All Processes (Show clustered only if not checked)\",\n",
    "    **WIDGET_DEFAULTS,\n",
    ")\n",
    "display(all_procs)\n",
    "logon_wgt = nbwidgets.SelectString(\n",
    "    description=\"Select logon session to examine\",\n",
    "    item_dict={label: val for label, val in sessions},\n",
    "    height=\"300px\",\n",
    "    width=\"100%\",\n",
    "    auto_display=True,\n",
    "    action=view_logon_sess,\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Browse All Sessions (Optional)\n",
    "\n",
    "**If the previous section did not reveal anything interesting you can opt to browse all logon sessions.**\n",
    "\n",
    "**Otherwise, skip to the [Check Commandline for IoCs section](#Check-for-IOCs-in-Commandline-for-selected-session)**\n",
    "\n",
    "To do this you need to first pick an account + logon type (in the following cell) then pick a particular session that you want to view in the subsequent cell. Use the rarity score from the previous graph to guide you.\n",
    "\n",
    " ### Step 1 - Select a logon ID and Type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-10-19T01:18:41.70951Z",
     "start_time": "2019-10-19T01:18:41.686523Z"
    }
   },
   "outputs": [],
   "source": [
    "logon_wgt2 = nbwidgets.SelectString(\n",
    "    description=\"Select logon cluster to examine\",\n",
    "    item_dict=dist_logons,\n",
    "    height=\"200px\",\n",
    "    width=\"100%\",\n",
    "    auto_display=True,\n",
    ")\n",
    "all_procs = widgets.Checkbox(\n",
    "    value=False,\n",
    "    description=\"View All Processes (Clustered only if not checked)\",\n",
    "    **WIDGET_DEFAULTS,\n",
    ")\n",
    "display(all_procs)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ### Step 2 - Pick a logon session to view its processes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-10-19T01:18:55.502417Z",
     "start_time": "2019-10-19T01:18:55.450467Z"
    }
   },
   "outputs": [],
   "source": [
    "selected_logon_cluster = get_selected_logon_cluster(logon_wgt2.value)\n",
    "\n",
    "selected_tgt_logon = selected_logon_cluster[\"TargetUserName\"].iat[0]\n",
    "system_logon = selected_tgt_logon.lower() == \"system\" or selected_tgt_logon.endswith(\n",
    "    \"$\"\n",
    ")\n",
    "\n",
    "if system_logon:\n",
    "    display(\n",
    "        Markdown(\n",
    "            '<h3><p style=\"color:red\">Warning: the selected '\n",
    "            \"account name appears to be a system account.</p></h1><br>\"\n",
    "            \"<i>It is difficult to accurately associate processes \"\n",
    "            \"with the specific logon sessions.<br>\"\n",
    "            \"Showing clustered events for entire time selection.\"\n",
    "        )\n",
    "    )\n",
    "    display(\n",
    "        clus_events.sort_values(\"TimeGenerated\")[\n",
    "            [\n",
    "                \"TimeGenerated\",\n",
    "                \"LastEventTime\",\n",
    "                \"NewProcessName\",\n",
    "                \"CommandLine\",\n",
    "                \"ClusterSize\",\n",
    "                \"commandlineTokensFull\",\n",
    "                \"pathScore\",\n",
    "                \"isSystemSession\",\n",
    "            ]\n",
    "        ]\n",
    "    )\n",
    "\n",
    "# Display a pick list for logon instances\n",
    "sel_1 = host_logons[\"TargetLogonId\"].isin(lgn_proc_count)\n",
    "sel_2 = host_logons[\"TargetUserName\"] == selected_tgt_logon\n",
    "items = (\n",
    "    host_logons[sel_1 & sel_2]\n",
    "    .sort_values(\"TimeGenerated\")\n",
    "    .apply(\n",
    "        lambda x: (\n",
    "            f\"{x.TargetUserName}:    \"\n",
    "            f\"(logontype={x.LogonType})   \"\n",
    "            f\"(timestamp={x.TimeGenerated})    \"\n",
    "            f\"logonid={x.TargetLogonId}\"\n",
    "        ),\n",
    "        axis=1,\n",
    "    )\n",
    "    .values.tolist()\n",
    ")\n",
    "if not items:\n",
    "    items = [\"No processes for logon\"]\n",
    "sess_w = widgets.Select(\n",
    "    options=items, description=\"Select logon instance to examine\", **WIDGET_DEFAULTS\n",
    ")\n",
    "\n",
    "import re\n",
    "\n",
    "logon_list_regex = r\"\"\"\n",
    "(?P<acct>[^:]+):\\s+\n",
    "\\(logontype=(?P<logon_type>[^)]+)\\)\\s+\n",
    "\\(timestamp=(?P<time>[^)]+)\\)\\s+\n",
    "logonid=(?P<logonid>[0-9a-fx)]+)\n",
    "\"\"\"\n",
    "\n",
    "\n",
    "def get_selected_logon(selected_item):\n",
    "    acct_match = re.search(logon_list_regex, selected_item, re.VERBOSE)\n",
    "    if acct_match:\n",
    "        acct = acct_match[\"acct\"]\n",
    "        logon_type = int(acct_match[\"logon_type\"])\n",
    "        time_stamp = pd.to_datetime(acct_match[\"time\"])\n",
    "        logon_id = acct_match[\"logonid\"]\n",
    "        return host_logons.query(\n",
    "            \"TargetUserName == @acct and LogonType == @logon_type\"\n",
    "            \" and TargetLogonId == @logon_id\"\n",
    "        )\n",
    "\n",
    "\n",
    "def view_logon_sess(x=\"\"):\n",
    "    global selected_logon\n",
    "    selected_logon = get_selected_logon(x)\n",
    "    if selected_logon is not None:\n",
    "        logonId = selected_logon[\"TargetLogonId\"].iloc[0]\n",
    "        if all_procs.value:\n",
    "            sess_procs = processes_on_host.query(\n",
    "                \"TargetLogonId == @logonId | SubjectLogonId == @logonId\"\n",
    "            )\n",
    "        else:\n",
    "            sess_procs = procs_with_cluster.query(\"SubjectLogonId == @logonId\")[\n",
    "                [\"NewProcessName\", \"CommandLine\", \"SubjectLogonId\", \"ClusterSize\"]\n",
    "            ].drop_duplicates()\n",
    "        display(sess_procs)\n",
    "\n",
    "\n",
    "widgets.interactive(view_logon_sess, x=sess_w)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a></a>[Contents](#toc)\n",
    " # Check for IOCs in Commandline for selected session\n",
    " This section looks for Indicators of Compromise (IoC) within the data sets passed to it.\n",
    " \n",
    " The input data for this comes from the session you picked in the following sections:\n",
    " - [Compute the relative rarity of processes in each session](#View-the-processes-for-these-Sessions)\n",
    " - [Browse All Sessions](#Step-2---Pick-a-logon-session-to-view-its-processes)\n",
    " \n",
    "To change which section is used - go back to the desired section and pick a session or re-run the entire cell.\n",
    "\n",
    "## Extract IoCs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-09-12T22:34:30.885408Z",
     "start_time": "2019-09-12T22:34:30.882411Z"
    }
   },
   "outputs": [],
   "source": [
    "# Use this to search all process commandlines\n",
    "# ioc_df = ioc_extractor.extract(\n",
    "#     data=procs_with_cluster,\n",
    "#     columns=[\"CommandLine\"],\n",
    "#     os_family=os_family,\n",
    "#     ioc_types=[\"ipv4\", \"ipv6\", \"dns\", \"url\", \"md5_hash\", \"sha1_hash\", \"sha256_hash\"],\n",
    "# )\n",
    "# ioc_df[[\"IoCType\", \"Observable\"]].drop_duplicates()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:31:26.280399Z",
     "start_time": "2020-05-15T23:31:25.106400Z"
    }
   },
   "outputs": [],
   "source": [
    "selected_tgt_logon = selected_logon[\"TargetUserName\"].iat[0]\n",
    "system_logon = (selected_tgt_logon.lower() == \"system\" \n",
    "                or selected_tgt_logon.endswith(\"$\"))\n",
    "if not system_logon:\n",
    "    logonId = selected_logon[\"TargetLogonId\"].iloc[0]\n",
    "    sess_procs = processes_on_host.query(\n",
    "        \"TargetLogonId == @logonId | SubjectLogonId == @logonId\"\n",
    "    )\n",
    "else:\n",
    "    sess_procs = clus_events\n",
    "\n",
    "ioc_extractor = IoCExtract()\n",
    "os_family = host_entity.OSType if host_entity.OSType else \"Windows\"\n",
    "\n",
    "ioc_df = ioc_extractor.extract(\n",
    "    data=sess_procs,\n",
    "    columns=[\"CommandLine\"],\n",
    "    os_family=os_family,\n",
    "    ioc_types=[\"ipv4\", \"ipv6\", \"dns\", \"url\", \"md5_hash\", \"sha1_hash\", \"sha256_hash\"],\n",
    ")\n",
    "if len(ioc_df):\n",
    "    display(Markdown(\"### IoC patterns found in process set.\"))\n",
    "    display(ioc_df[[\"IoCType\", \"Observable\"]].drop_duplicates())\n",
    "else:\n",
    "    display(Markdown(\"### No IoC patterns found in process tree.\"))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## If any Base64 encoded strings, decode and search for IoCs in the results.\n",
    " &lt;details&gt;\n",
    "     <summary> <u>Details...</u></summary>\n",
    " This section looks for base64 encoded strings within the data - this is a common way of hiding attacker intent. It attempts to decode any strings that look like base64. Additionally, if the base64 decode operation returns any items that look like a base64 encoded string or file, a gzipped binary sequence, a zipped or tar archive, it will attempt to extract the contents before searching for potentially interesting IoC observables within the decoded data.\n",
    "\n",
    " For simple strings the Base64 decoded output is straightforward. However for nested encodings this can get a little complex and difficult to represent in a tabular format.\n",
    "\n",
    " **Columns**\n",
    "  - reference - The index of the row item in dotted notation in depth.seq pairs (e.g. 1.2.2.3 would be the 3 item at depth 3 that is a child of the 2nd item found at depth 1). This may not always be an accurate notation - it is mainly use to allow you to associate an individual row with the reference value contained in the full_decoded_string column of the topmost item).\n",
    "  - original_string - the original string before decoding.\n",
    "  - file_name - filename, if any (only if this is an item in zip or tar file).\n",
    "  - file_type - a guess at the file type (this is currently elementary and only includes a few file types).\n",
    "  - input_bytes - the decoded bytes as a Python bytes string.\n",
    "  - decoded_string - the decoded string if it can be decoded as a UTF-8 or UTF-16 string. Note: binary sequences may often successfully decode as UTF-16 strings but, in these cases, the decodings are meaningless.\n",
    "  - encoding_type - encoding type (UTF-8 or UTF-16) if a decoding was possible, otherwise 'binary'.\n",
    "  - file_hashes - collection of file hashes for any decoded item.\n",
    "  - md5 - md5 hash as a separate column.\n",
    "  - sha1 - sha1 hash as a separate column.\n",
    "  - sha256 - sha256 hash as a separate column.\n",
    "  - printable_bytes - printable version of input_bytes as a string of \\xNN values\n",
    "  - src_index - the index of the row in the input dataframe from which the data came.\n",
    "  - full_decoded_string - the full decoded string with any decoded replacements. This is only really useful for top-level items, since nested items will only show the 'full' string representing the child fragment.\n",
    " &lt;/details&gt;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:31:27.893142Z",
     "start_time": "2020-05-15T23:31:27.880144Z"
    }
   },
   "outputs": [],
   "source": [
    "\n",
    "dec_df = base64.unpack_items(data=sess_procs, column=\"CommandLine\")\n",
    "if dec_df is not None and len(dec_df) > 0:\n",
    "    display(HTML(\"<h3>Decoded base 64 command lines</h3>\"))\n",
    "    display(HTML(\"Decoded values and hashes of decoded values shown below.\"))\n",
    "    display(\n",
    "        HTML(\n",
    "            \"Warning - some binary patterns may be decodable as unicode strings. \"\n",
    "            'In these cases you should ignore the \"decoded_string\" column '\n",
    "            'and treat the encoded item as a binary - using the \"printable_bytes\" '\n",
    "            \"column or treat the decoded_string as a binary (bytes) value.\"\n",
    "        )\n",
    "    )\n",
    "\n",
    "    display(\n",
    "        dec_df[\n",
    "            [\n",
    "                \"full_decoded_string\",\n",
    "                \"decoded_string\",\n",
    "                \"original_string\",\n",
    "                \"printable_bytes\",\n",
    "                \"file_hashes\",\n",
    "            ]\n",
    "        ]\n",
    "    )\n",
    "\n",
    "    ioc_dec_df = ioc_extractor.extract(data=dec_df, columns=[\"full_decoded_string\"])\n",
    "    if len(ioc_dec_df):\n",
    "        display(HTML(\"<h3>IoC patterns found in events with base64 decoded data</h3>\"))\n",
    "        display(ioc_dec_df)\n",
    "        ioc_df = ioc_df.append(ioc_dec_df, ignore_index=True)\n",
    "else:\n",
    "    print(\"No base64 encodings found.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "  [Contents](#Contents)\n",
    " ## Threat Intel Lookup\n",
    "\n",
    " This section takes the output from the IoC Extraction from commandlines and submits it to Threat Intelligence services to see if any of the IoC are known threats.\n",
    "\n",
    "Please take a moment to review the `Selected` list below and remove any items that are obviously not items that you want to lookup (e.g. `myadmintool.ps` is almost certainly a PowerShell script but is also a valid match for a legal DNS domain).\n",
    "<details>\n",
    "    <summary>TI Configuration</summary>\n",
    "If you have not used msticpy threat intelligence lookups before you will need to supply API keys for the \n",
    "TI Providers that you want to use. Please see the section on configuring [msticpyconfig.yaml](#msticpyconfig.yaml-configuration-File)\n",
    "\n",
    "Then reload provider settings:\n",
    "```\n",
    "mylookup = TILookup()\n",
    "mylookup.reload_provider_settings()\n",
    "```\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:31:36.912323Z",
     "start_time": "2020-05-15T23:31:30.600559Z"
    }
   },
   "outputs": [],
   "source": [
    "ti_lookup = TILookup()\n",
    "ti_lookup.provider_status\n",
    "\n",
    "items = list(ioc_df[\"Observable\"].values)\n",
    "ioc_ss = nbwidgets.SelectSubset(\n",
    "    items,\n",
    "    default_selected=items\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:31:47.445775Z",
     "start_time": "2020-05-15T23:31:37.910324Z"
    }
   },
   "outputs": [],
   "source": [
    "iocs_to_check = (ioc_df[ioc_df[\"Observable\"].isin(ioc_ss.selected_items)]\n",
    "                 [[\"IoCType\", \"Observable\"]].drop_duplicates())\n",
    "\n",
    "ti_lookup.lookup_iocs(data=iocs_to_check, obs_col=\"Observable\", ioc_type_col=\"IoCType\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a></a>[Contents](#toc)\n",
    " # Network Check Communications with Other Hosts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:32:07.424166Z",
     "start_time": "2020-05-15T23:32:07.379166Z"
    }
   },
   "outputs": [],
   "source": [
    "ip_q_times = nbwidgets.QueryTime(\n",
    "    label=\"Set time bounds for network queries\",\n",
    "    units=\"day\",\n",
    "    max_before=28,\n",
    "    before=2,\n",
    "    after=5,\n",
    "    max_after=28,\n",
    "    origin_time=proc_query_times.origin_time\n",
    ")\n",
    "ip_q_times.display()\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Query Flows by Host IP Addresses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:32:15.750114Z",
     "start_time": "2020-05-15T23:32:09.049085Z"
    }
   },
   "outputs": [],
   "source": [
    "if \"AzureNetworkAnalytics_CL\" not in table_index:\n",
    "    md_warn(\"AzureNetworkAnalytics_CL table is not available.\")\n",
    "    md(\"No network flow data available. \"\n",
    "       + \"Please skip the remainder of this section.\",\n",
    "      \"blue, bold\")\n",
    "    az_net_comms_df = None\n",
    "else:\n",
    "    all_host_ips = (\n",
    "        host_entity.private_ips + host_entity.public_ips + [host_entity.IPAddress]\n",
    "    )\n",
    "    host_ips = [i.Address for i in all_host_ips]\n",
    "\n",
    "    az_net_comms_df = qry_prov.Network.list_azure_network_flows_by_ip(\n",
    "        ip_q_times, ip_address_list=host_ips\n",
    "    )\n",
    "\n",
    "    if isinstance(az_net_comms_df, pd.DataFrame) and not az_net_comms_df.empty:\n",
    "        az_net_comms_df['TotalAllowedFlows'] = az_net_comms_df['AllowedOutFlows'] + az_net_comms_df['AllowedInFlows']\n",
    "        nbdisplay.display_timeline(\n",
    "            data=az_net_comms_df,\n",
    "            group_by=\"L7Protocol\",\n",
    "            title=\"Network Flows by Protocol\",\n",
    "            time_column=\"FlowStartTime\",\n",
    "            source_columns=[\"FlowType\", \"AllExtIPs\", \"L7Protocol\", \"FlowDirection\"],\n",
    "            height=300,\n",
    "            legend=\"right\",\n",
    "            yaxis=True\n",
    "        )\n",
    "        nbdisplay.display_timeline(\n",
    "            data=az_net_comms_df,\n",
    "            group_by=\"FlowDirection\",\n",
    "            title=\"Network Flows by Direction\",\n",
    "            time_column=\"FlowStartTime\",\n",
    "            source_columns=[\"FlowType\", \"AllExtIPs\", \"L7Protocol\", \"FlowDirection\"],\n",
    "            height=300,\n",
    "            legend=\"right\",\n",
    "            yaxis=True\n",
    "        )\n",
    "    else:\n",
    "        print(\"No network data for specified time range.\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:32:17.381114Z",
     "start_time": "2020-05-15T23:32:17.067117Z"
    }
   },
   "outputs": [],
   "source": [
    "if az_net_comms_df is not None and not az_net_comms_df.empty:\n",
    "    flow_plot = nbdisplay.display_timeline_values(\n",
    "        data=az_net_comms_df,\n",
    "        group_by=\"L7Protocol\",\n",
    "        source_columns=[\"FlowType\", \n",
    "                        \"AllExtIPs\", \n",
    "                        \"L7Protocol\", \n",
    "                        \"FlowDirection\", \n",
    "                        \"TotalAllowedFlows\"],\n",
    "        time_column=\"FlowStartTime\",\n",
    "        title=\"Network flows by Layer 7 Protocol\",\n",
    "        y=\"TotalAllowedFlows\",\n",
    "        ref_event=related_alert,\n",
    "        legend=\"right\",\n",
    "        height=500,\n",
    "        kind=[\"vbar\", \"circle\"]\n",
    "    );\n",
    "else:\n",
    "    md(\"No network data for specified time range.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Flow Summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:32:18.693115Z",
     "start_time": "2020-05-15T23:32:18.586117Z"
    }
   },
   "outputs": [],
   "source": [
    "if az_net_comms_df is not None and not az_net_comms_df.empty:\n",
    "    cm = sns.light_palette(\"green\", as_cmap=True)\n",
    "\n",
    "    cols = [\n",
    "        \"VMName\",\n",
    "        \"VMIPAddress\",\n",
    "        \"PublicIPs\",\n",
    "        \"SrcIP\",\n",
    "        \"DestIP\",\n",
    "        \"L4Protocol\",\n",
    "        \"L7Protocol\",\n",
    "        \"DestPort\",\n",
    "        \"FlowDirection\",\n",
    "        \"AllExtIPs\",\n",
    "        \"TotalAllowedFlows\",\n",
    "    ]\n",
    "    flow_index = az_net_comms_df[cols].copy()\n",
    "\n",
    "    def get_source_ip(row):\n",
    "        if row.FlowDirection == \"O\":\n",
    "            return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n",
    "        else:\n",
    "            return row.AllExtIPs if row.AllExtIPs else row.DestIP\n",
    "\n",
    "    def get_dest_ip(row):\n",
    "        if row.FlowDirection == \"O\":\n",
    "            return row.AllExtIPs if row.AllExtIPs else row.DestIP\n",
    "        else:\n",
    "            return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n",
    "\n",
    "    flow_index[\"source\"] = flow_index.apply(get_source_ip, axis=1)\n",
    "    flow_index[\"dest\"] = flow_index.apply(get_dest_ip, axis=1)\n",
    "    md(f\"{len(flow_index)} events in flow index\")\n",
    "\n",
    "    # Uncomment to view flow_index results\n",
    "#     with warnings.catch_warnings():\n",
    "#         warnings.simplefilter(\"ignore\")\n",
    "#         display(\n",
    "#             flow_index[\n",
    "#                 [\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\", \"TotalAllowedFlows\"]\n",
    "#             ]\n",
    "#             .groupby([\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\"])\n",
    "#             .sum()\n",
    "#             .reset_index()\n",
    "#             .style.bar(subset=[\"TotalAllowedFlows\"], color=\"#d65f5f\")\n",
    "#         )\n",
    "else:\n",
    "    md(\"No network data for specified time range.\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:33:03.907521Z",
     "start_time": "2020-05-15T23:32:26.377680Z"
    }
   },
   "outputs": [],
   "source": [
    "# WHOIS lookup function\n",
    "from functools import lru_cache\n",
    "from ipwhois import IPWhois\n",
    "from ipaddress import ip_address\n",
    "\n",
    "\n",
    "@lru_cache(maxsize=1024)\n",
    "def get_whois_info(ip_lookup, show_progress=False):\n",
    "    try:\n",
    "        ip = ip_address(ip_lookup)\n",
    "    except ValueError:\n",
    "        return \"Not an IP Address\", {}\n",
    "    if ip.is_private:\n",
    "        return \"private address\", {}\n",
    "    if not ip.is_global:\n",
    "        return \"other address\", {}\n",
    "    whois = IPWhois(ip)\n",
    "    whois_result = whois.lookup_whois()\n",
    "    if show_progress:\n",
    "        print(\".\", end=\"\")\n",
    "    return whois_result[\"asn_description\"], whois_result\n",
    "\n",
    "\n",
    "# Add ASN informatio from Whois\n",
    "flows_df = (\n",
    "    flow_index[[\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\", \"TotalAllowedFlows\"]]\n",
    "    .groupby([\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\"])\n",
    "    .sum()\n",
    "    .reset_index()\n",
    ")\n",
    "\n",
    "num_ips = len(flows_df[\"source\"].unique()) + len(flows_df[\"dest\"].unique())\n",
    "print(f\"Performing WhoIs lookups for {num_ips} IPs \", end=\"\")\n",
    "#flows_df = flows_df.assign(DestASN=\"\", DestASNFull=\"\", SourceASN=\"\", SourceASNFull=\"\")\n",
    "flows_df[\"DestASN\"] = flows_df.apply(lambda x: get_whois_info(x.dest, True), axis=1)\n",
    "flows_df[\"SourceASN\"] = flows_df.apply(lambda x: get_whois_info(x.source, True), axis=1)\n",
    "print(\"done\")\n",
    "\n",
    "# Split the tuple returned by get_whois_info into separate columns\n",
    "flows_df[\"DestASNFull\"] = flows_df.apply(lambda x: x.DestASN[1], axis=1)\n",
    "flows_df[\"DestASN\"] = flows_df.apply(lambda x: x.DestASN[0], axis=1)\n",
    "flows_df[\"SourceASNFull\"] = flows_df.apply(lambda x: x.SourceASN[1], axis=1)\n",
    "flows_df[\"SourceASN\"] = flows_df.apply(lambda x: x.SourceASN[0], axis=1)\n",
    "\n",
    "our_host_asns = [get_whois_info(ip.Address)[0] for ip in host_entity.public_ips]\n",
    "md(f\"Host {host_entity.HostName} ASNs:\", \"bold\")\n",
    "md(str(our_host_asns))\n",
    "\n",
    "flow_sum_df = flows_df.groupby([\"DestASN\", \"SourceASN\"]).agg(\n",
    "    TotalAllowedFlows=pd.NamedAgg(column=\"TotalAllowedFlows\", aggfunc=\"sum\"),\n",
    "    L7Protocols=pd.NamedAgg(column=\"L7Protocol\", aggfunc=lambda x: x.unique().tolist()),\n",
    "    source_ips=pd.NamedAgg(column=\"source\", aggfunc=lambda x: x.unique().tolist()),\n",
    "    dest_ips=pd.NamedAgg(column=\"dest\", aggfunc=lambda x: x.unique().tolist()),\n",
    ").reset_index()\n",
    "flow_sum_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Choose ASNs/IPs to Check for Threat Intel Reports\n",
    "Choose from the list of Selected ASNs for the IPs you wish to check on.\n",
    "The Source list is been pre-populated with all ASNs found in the network flow summary.\n",
    "\n",
    "As an example, we've populated the `Selected` list with the ASNs that have the lowest number of flows to and from the host. We also remove the ASN that matches the ASN of the host we are investigating.\n",
    "\n",
    "Please edit this list, using flow summary data above as a guide and leaving only ASNs that you are suspicious about. Typicially these would be ones with relatively low `TotalAllowedFlows` and possibly with unusual `L7Protocols`.\n",
    "<details>\n",
    "    <summary>TI Configuration</summary>\n",
    "If you have not used msticpy threat intelligence lookups before you will need to supply API keys for the \n",
    "TI Providers that you want to use. Please see the section on configuring [msticpyconfig.yaml](#msticpyconfig.yaml-configuration-File)\n",
    "\n",
    "Then reload provider settings:\n",
    "```\n",
    "mylookup = TILookup()\n",
    "mylookup.reload_provider_settings()\n",
    "```\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:33:05.240520Z",
     "start_time": "2020-05-15T23:33:05.179518Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "all_asns = list(flow_sum_df[\"DestASN\"].unique()) + list(flow_sum_df[\"SourceASN\"].unique())\n",
    "all_asns = set(all_asns) - set([\"private address\"])\n",
    "\n",
    "# Select the ASNs in the 25th percentile (lowest number of flows)\n",
    "quant_25pc = flow_sum_df[\"TotalAllowedFlows\"].quantile(q=[0.25]).iat[0]\n",
    "quant_25pc_df = flow_sum_df[flow_sum_df[\"TotalAllowedFlows\"] <= quant_25pc]\n",
    "other_asns = list(quant_25pc_df[\"DestASN\"].unique()) + list(quant_25pc_df[\"SourceASN\"].unique())\n",
    "other_asns = set(other_asns) - set(our_host_asns)\n",
    "md(\"Choose IPs from Selected ASNs to look up for Threat Intel.\", \"bold\")\n",
    "sel_asn = nbwidgets.SelectSubset(source_items=all_asns, default_selected=other_asns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-05-15T23:33:20.587377Z",
     "start_time": "2020-05-15T23:33:06.602518Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from itertools import chain\n",
    "from msticpy.sectools.tiproviders.ti_provider_base import TISeverity\n",
    "\n",
    "def ti_check_ser_sev(severity, threshold):\n",
    "    threshold = TISeverity.parse(threshold)\n",
    "    return severity.apply(lambda x: TISeverity.parse(x) >= threshold)\n",
    "\n",
    "\n",
    "dest_ips = set(chain.from_iterable(flow_sum_df[flow_sum_df[\"DestASN\"].isin(sel_asn.selected_items)][\"dest_ips\"]))\n",
    "src_ips = set(chain.from_iterable(flow_sum_df[flow_sum_df[\"SourceASN\"].isin(sel_asn.selected_items)][\"source_ips\"]))\n",
    "selected_ips = dest_ips | src_ips\n",
    "md(f\"{len(selected_ips)} unique IPs in selected ASNs\")\n",
    "\n",
    "# Add the IoCType to save cost of inferring each item\n",
    "md(\"Looking up TI...\")\n",
    "selected_ip_dict = {ip: \"ipv4\" for ip in selected_ips}\n",
    "ti_results = ti_lookup.lookup_iocs(data=selected_ip_dict)\n",
    "\n",
    "md(f\"{len(ti_results)} results received.\")\n",
    "\n",
    "ti_results_pos = ti_results[ti_check_ser_sev(ti_results[\"Severity\"], 1)]\n",
    "print(f\"{len(ti_results_pos)} positive results found.\")\n",
    "\n",
    "if not ti_results_pos.empty:\n",
    "    src_pos = flows_df.merge(ti_results_pos, left_on=\"source\", right_on=\"Ioc\")\n",
    "    dest_pos = flows_df.merge(ti_results_pos, left_on=\"dest\", right_on=\"Ioc\")\n",
    "    ti_ip_results = pd.concat([src_pos, dest_pos])\n",
    "    md_warn(\"Positive Threat Intel Results found for the following flows\")\n",
    "    md(\"Please examine these IP flows using the IP Explorer notebook.\", \"bold, large\")\n",
    "    display(ti_ip_results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## GeoIP Map of External IPs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-10-19T01:25:26.845038Z",
     "start_time": "2019-10-19T01:25:26.778039Z"
    }
   },
   "outputs": [],
   "source": [
    "def format_ip_entity(row, ip_col):\n",
    "    ip_entity = entities.IpAddress(Address=row[ip_col])\n",
    "    iplocation.lookup_ip(ip_entity=ip_entity)\n",
    "    ip_entity.AdditionalData[\"protocol\"] = row.L7Protocol\n",
    "    if \"severity\" in row:\n",
    "        ip_entity.AdditionalData[\"threat severity\"] = row[\"severity\"]\n",
    "    if \"Details\" in row:\n",
    "        ip_entity.AdditionalData[\"threat details\"] = row[\"Details\"]\n",
    "    return ip_entity\n",
    "\n",
    "# from msticpy.nbtools.foliummap import FoliumMap\n",
    "folium_map = FoliumMap(zoom_start=4)\n",
    "if az_net_comms_df is None or az_net_comms_df.empty:\n",
    "    print(\"No network flow data available.\")\n",
    "else:\n",
    "    # Get the flow records for all flows not in the TI results\n",
    "    selected_out = flows_df[flows_df[\"DestASN\"].isin(sel_asn.selected_items)]\n",
    "    selected_out = selected_out[~selected_out[\"dest\"].isin(ti_ip_results[\"Ioc\"])]\n",
    "    if selected_out.empty:\n",
    "        ips_out = []\n",
    "    else:\n",
    "        ips_out = list(selected_out.apply(lambda x: format_ip_entity(x, \"dest\"), axis=1))\n",
    "    \n",
    "    selected_in = flows_df[flows_df[\"SourceASN\"].isin(sel_asn.selected_items)]\n",
    "    selected_in = selected_in[~selected_in[\"source\"].isin(ti_ip_results[\"Ioc\"])]\n",
    "    if selected_in.empty:\n",
    "        ips_in = []\n",
    "    else:\n",
    "        ips_in = list(selected_in.apply(lambda x: format_ip_entity(x, \"source\"), axis=1))\n",
    "\n",
    "    ips_threats = list(ti_ip_results.apply(lambda x: format_ip_entity(x, \"Ioc\"), axis=1))\n",
    "\n",
    "    display(HTML(\"<h3>External IP Addresses communicating with host</h3>\"))\n",
    "    display(HTML(\"Numbered circles indicate multiple items - click to expand\"))\n",
    "    display(HTML(\"Location markers: <br>Blue = outbound, Purple = inbound, Green = Host, Red = Threats\"))\n",
    "\n",
    "    icon_props = {\"color\": \"green\"}\n",
    "    for ips in host_entity.public_ips:\n",
    "        ips.AdditionalData[\"host\"] = host_entity.HostName\n",
    "    folium_map.add_ip_cluster(ip_entities=host_entity.public_ips, **icon_props)\n",
    "    icon_props = {\"color\": \"blue\"}\n",
    "    folium_map.add_ip_cluster(ip_entities=ips_out, **icon_props)\n",
    "    icon_props = {\"color\": \"purple\"}\n",
    "    folium_map.add_ip_cluster(ip_entities=ips_in, **icon_props)\n",
    "    icon_props = {\"color\": \"red\"}\n",
    "    folium_map.add_ip_cluster(ip_entities=ips_threats, **icon_props)\n",
    "    folium_map.center_map()\n",
    "    display(folium_map)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " <a></a>[Contents](#toc)\n",
    " # Appendices"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Available DataFrames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"List of current DataFrames in Notebook\")\n",
    "print(\"-\" * 50)\n",
    "current_vars = list(locals().keys())\n",
    "for var_name in current_vars:\n",
    "    if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith(\"_\"):\n",
    "        print(var_name)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Saving Data to Excel\n",
    " To save the contents of a pandas DataFrame to an Excel spreadsheet\n",
    " use the following syntax\n",
    " ```\n",
    " writer = pd.ExcelWriter('myWorksheet.xlsx')\n",
    " my_data_frame.to_excel(writer,'Sheet1')\n",
    " writer.save()\n",
    " ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "### `msticpyconfig.yaml` configuration File\n",
    "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n",
    "\n",
    "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)"
   ]
  }
 ],
 "metadata": {
  "file_extension": ".py",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python (condadev)",
   "language": "python",
   "name": "condadev"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "mimetype": "text/x-python",
  "name": "python",
  "npconvert_exporter": "python",
  "pygments_lexer": "ipython3",
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "323.667px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "position": {
    "height": "649.85px",
    "left": "1596px",
    "right": "20px",
    "top": "120px",
    "width": "350px"
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  },
  "version": 3,
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}