{ "cells": [ { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-25T19:53:48.349636Z", "start_time": "2019-09-25T19:53:48.344638Z" } }, "source": [ "# Entity Explorer - Domain and URL\r\n", "
\r\n", "  Details...\r\n", "\r\n", " **Notebook Version:** 1.0
\r\n", " **Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\r\n", " **Required Packages**: kqlmagic, msticpy, pandas, numpy, matplotlib, networkx, ipywidgets, ipython, dnspython, ipwhois, folium, maxminddb_geolite2
\r\n", "\r\n", " **Data Sources Required**:\r\n", " - Log Analytics - Syslog, SecurityEvent, DnsEvents, CommonSecurityLog, AzureNetworkAnalytics_CL
\r\n", "**TI Proviers Used**\r\n", " - VirusTotal, Open Page Rank, BrowShot(all required for certain elements), AlienVault OTX, IBM XForce (optional) - all providers require accounts and API keys\r\n", "
\r\n", "\r\n", "This Notebooks brings together a series of tools and techniques to enable threat hunting within the context of a domain name or URL that has been identified as of interest. It provides a series of techniques to assist in determining whether a domain or URL is malicious. Once this has been established it provides an overview of the scope of the domain or URL across an environment, along with indicators of areas for further investigation such as hosts of interest. " ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

<h1>Table of Contents</h1>\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hunting Hypothesis: \n", "Our broad initial hunting hypothesis is that a particular Linux host in our environment\n", "has been compromised, we will need to hunt from a range of different positions to\n", "validate or disprove this hypothesis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Notebook initialization\n", "The next cell:\n", "- Checks for the correct Python version\n", "- Checks versions and optionally installs required packages\n", "- Imports the required packages into the notebook\n", "- Sets a number of configuration options.\n", "\n", "This should complete without errors. If you encounter errors or warnings look at the following two notebooks:\n", "- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)\n", "- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n", "\n", "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n", "- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)\n", "- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)\n", "\n", "You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. \n", "There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:\n", "- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)\n", "- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:19:52.320806Z", "start_time": "2020-05-15T23:19:48.201597Z" } }, "outputs": [], "source": [ "from pathlib import Path\r\n", "from IPython.display import display, HTML, Image\r\n", "\r\n", "REQ_PYTHON_VER=(3, 6)\r\n", "REQ_MSTICPY_VER=(1, 0, 0)\r\n", "\r\n", "update_nbcheck = (\r\n", " \"

\"\r\n", " \"Warning: we needed to update 'utils/nb_check.py'
\"\r\n", " \"Please restart the kernel and re-run this cell.\"\r\n", " \"

\"\r\n", ")\r\n", "\r\n", "display(HTML(\"

Starting Notebook setup...

\"))\r\n", "if Path(\"./utils/nb_check.py\").is_file():\r\n", " try:\r\n", " from utils.nb_check import check_versions\r\n", " except ImportError as err:\r\n", " %xmode Minimal\r\n", " !curl https://raw.githubusercontent.com/Azure/Azure-Sentinel-Notebooks/master/utils/nb_check.py > ./utils/nb_check.py 2>/dev/null\r\n", " display(HTML(update_nbcheck))\r\n", " if \"check_versions\" not in globals():\r\n", " raise ImportError(\"Old version of nb_check.py detected - see instructions below.\")\r\n", " %xmode Verbose\r\n", " check_versions(REQ_PYTHON_VER, REQ_MSTICPY_VER)\r\n", "\r\n", "# If not using Azure Notebooks, install msticpy with\r\n", "# !pip install msticpy\r\n", "\r\n", "from msticpy.nbtools import nbinit\r\n", "extra_imports = [\r\n", " \"msticpy.nbtools, observationlist\",\r\n", " \"msticpy.sectools, domain_utils\",\r\n", " \"pyvis.network, Network\",\r\n", "]\r\n", "nbinit.init_notebook(\r\n", " namespace=globals(),\r\n", " additional_packages=[\"pyvis\", \"python-whois\"],\r\n", " extra_imports=extra_imports,\r\n", ");" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-25T20:20:04.563899Z", "start_time": "2019-09-25T20:20:04.507874Z" } }, "source": [ "### Get WorkspaceId and Authenticate to Log Analytics\n", "
\n", "  Details...\n", "If you are using user/device authentication, run the following cell. \n", "- Click the 'Copy code to clipboard and authenticate' button.\n", "- This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard. \n", "- Select the text box and paste (Ctrl-V/Cmd-V) the copied value. \n", "- You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.\n", "\n", "Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:\n", "```\n", "%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)\n", "```\n", "instead of\n", "```\n", "%kql loganalytics://code().workspace(WORKSPACE_ID)\n", "```\n", "\n", "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", "On successful authentication you should see a ```popup schema``` button.\n", "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:20:07.182177Z", "start_time": "2020-05-15T23:20:07.161178Z" } }, "outputs": [], "source": [ "# See if we have an Azure Sentinel Workspace defined in our config file.\n", "# If not, let the user specify Workspace and Tenant IDs\n", "\n", "ws_config = WorkspaceConfig()\n", "if not ws_config.config_loaded:\n", " ws_config.prompt_for_ws()\n", " \n", "qry_prov = QueryProvider(data_environment=\"AzureSentinel\")\n", "print(\"done\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:20:42.295726Z", "start_time": "2020-05-15T23:20:10.940842Z" } }, "outputs": [], "source": [ "# Authenticate to Azure Sentinel workspace\n", "qry_prov.connect(ws_config)\n", "# Load TI Providers\n", "tilookup = TILookup()\n", "tilookup.reload_providers()\n", "tilookup.provider_status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Authentication and Configuration Problems\n", "\n", "
\n", "
\n", " Click for details about configuring your authentication parameters\n", " \n", "The notebook is expecting your Azure Sentinel Tenant ID and Workspace ID to be configured in one of the following places:\n", "- `config.json` in the current folder\n", "- `msticpyconfig.yaml` in the current folder or location specified by `MSTICPYCONFIG` environment variable.\n", " \n", "For help with setting up your `config.json` file (if this hasn't been done automatically) see the [`ConfiguringNotebookEnvironment`](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) notebook in the root folder of your Azure-Sentinel-Notebooks project. This shows you how to obtain your Workspace and Subscription IDs from the Azure Sentinel Portal. You can use the SubscriptionID to find your Tenant ID). To view the current `config.json` run the following in a code cell.\n", "\n", "```%pfile config.json```\n", "\n", "For help with setting up your `msticpyconfig.yaml` see the [Setup](#Setup) section at the end of this notebook and the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Select the domain or URL you wish to investigate\n", "Enter the domain or URL you wish to investigate. e.g. www.microsoft.com/index.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:20:48.657499Z", "start_time": "2020-05-15T23:20:48.634498Z" } }, "outputs": [], "source": [ "domain_url = widgets.Text(description='Please enter your the domain or URL to investigate:',\n", " **WIDGET_DEFAULTS)\n", "display(domain_url)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:22:12.208670Z", "start_time": "2020-05-15T23:22:11.064100Z" } }, "outputs": [], "source": [ "import tldextract\n", "graph_items = []\n", "dom_val = domain_utils.DomainValidator()\n", "summary = observationlist.Observations()\n", "dom_record = None\n", "url=domain_url.value.strip().lower()\n", "_, domain, tld = tldextract.extract(domain_url.value)\n", "domain = domain.lower() + \".\" + tld.lower()\n", "if dom_val.validate_tld(domain) is not True:\n", " md(f\"{domain} is not a valid domain name\", \"bold\")\n", "\n", "if url != domain:\n", " md(f\"Domain : {domain}\")\n", " md(f\"URL : {url}\")\n", " graph_items.append((domain,url))\n", "else:\n", " md(f\"Domain : {domain}\")\n", " url = None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are certain the above indicators are malicious and wish to jump straight to investigating thier scope of impact in the environment jump to Related Alerts.\n", "\n", "## Domain Overview" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Threat Intelligence\n", "As a first step we want to establish if this domain or URL is known to to be malicious by our Threat Intelligence providers.\n", "\n", "#### `msticpyconfig.yaml` configuration File\n", "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", "\n", "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) and [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file). 
\n", "\n", "For Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:22:21.151600Z", "start_time": "2020-05-15T23:22:16.479012Z" } }, "outputs": [], "source": [ "from msticpy.sectools.tiproviders.ti_provider_base import TISeverity\n", "def conv_severity(severity):\n", " try:\n", " if isinstance(severity, TISeverity):\n", " return severity\n", " if isinstance(severity, str):\n", " return TISeverity[severity]\n", " else:\n", " return TISeverity(severity)\n", " except (ValueError, KeyError):\n", " return TISeverity.information\n", "\n", "def ti_check_sev(severity, threshold):\n", " severity = conv_severity(severity)\n", " threshold = conv_severity(threshold)\n", " return severity.value >= threshold.value\n", "\n", "domain_ti = tilookup.result_to_df(tilookup.lookup_ioc(observable=domain, ioc_type='dns'))\n", "if url is not None:\n", " url_ti = tilookup.result_to_df(tilookup.lookup_ioc(observable=url, ioc_type='url'))\n", " md(f\"Threat Intelligence Results for {url}\", \"bold\")\n", " display(url_ti.T)\n", " summary.add_observation(caption=\"URL TI\", description=f\"Summary of TI for {url}\", data=url_ti)\n", " graph_items += [((url,provider)) for provider in url_ti.index\n", " if ti_check_sev(url_ti.loc[provider]['Severity'], 1)] \n", "md(f\"Threat Intelligence Results for {domain}\", \"bold\")\n", "display(domain_ti.T)\n", "summary.add_observation(caption=\"Domain TI\", description=f\"Summary of TI for {domain}\", data=domain_ti)\n", "graph_items += [((domain,provider)) for provider in domain_ti.index \n", " if ti_check_sev(domain_ti.loc[provider]['Severity'],1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Domain analysis\n", "To build up a fuller picture of the domain we can use whois, and other data sources to gather pertinent data. Indicators such as registration data, domain entropy, and registration details can provide indicators that a domain is not legitimate in nature.\n", "\n", "This cell uses the Open Page Rank API (https://www.domcop.com/openpagerank/) - in order to use this you need to add your API key to your `msticpyconfig.yaml` configuration file (as you did for other TI providers). \n", "\n", "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) and [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file). 
\n", "\n", "For Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:22:33.854606Z", "start_time": "2020-05-15T23:22:33.439867Z" } }, "outputs": [], "source": [ "from whois import whois\r\n", "from collections import Counter\r\n", "def Entropy(data):\r\n", " s, lens = Counter(data), np.float(len(data))\r\n", " return -sum(count/lens * np.log2(count/lens) for count in s.values())\r\n", "#Get a whois record for our domain\r\n", "wis = whois(domain)\r\n", "\r\n", "if wis.domain_name is not None:\r\n", " # Create domain record from whois data\r\n", " dom_record = pd.DataFrame({\"Domain\":[domain],\r\n", " \"Name\":[wis['name']],\r\n", " \"Org\":[wis['org']],\r\n", " \"DNSSec\":[wis['dnssec']],\r\n", " \"City\":[wis['city']],\r\n", " \"State\":[wis['state']],\r\n", " \"Country\":[wis['country']],\r\n", " \"Registrar\": [wis['registrar']],\r\n", " \"Status\": [wis['status']],\r\n", " \"Created\":[wis['creation_date']],\r\n", " \"Expiration\" : [wis['expiration_date']],\r\n", " \"Last Updated\" : [wis['updated_date']],\r\n", " \"Name Servers\": [wis['name_servers']]})\r\n", " ns_domains = []\r\n", " \r\n", " # Remove duplicate Name Server records\r\n", " for server in wis['name_servers']:\r\n", " ns_sub_d, ns_domain, ns_tld = tldextract.extract(server)\r\n", " ns_dom = ns_domain.lower() + \".\" + ns_tld.lower()\r\n", " if domain not in ns_domains:\r\n", " ns_domains.append(ns_dom) \r\n", " \r\n", " # Identity domains populatirty with Open Page Rank\r\n", " page_rank = tilookup.result_to_df(tilookup.lookup_ioc(observable=domain, providers=[\"OPR\"]))\r\n", " if page_rank['RawResult'][0]:\r\n", " page_rank_score = page_rank['RawResult'][0]['response'][0]['page_rank_integer']\r\n", " else:\r\n", " page_rank_score = 0\r\n", " dom_record[\"Page Rank\"] = [page_rank_score]\r\n", " \r\n", " # Get a list of subdomains for the domain\r\n", " url_ti = tilookup.result_to_df(tilookup.lookup_ioc(observable=domain, providers=[\"VirusTotal\"]))\r\n", " if url_ti['RawResult'][0]:\r\n", " sub_doms = url_ti['RawResult'][0]['subdomains']\r\n", " else:\r\n", " sub_doms = 0\r\n", " graph_items.append((domain, \"Sub Domains\"))\r\n", " graph_items += [(sub,\"Sub Domains\") for sub in sub_doms]\r\n", " dom_record['Sub Domains'] = [sub_doms]\r\n", " \r\n", " # Work out domain entropy to identity possible DGA\r\n", " dom_ent = Entropy(domain)\r\n", " dom_record['Domains Entropy'] = [dom_ent]\r\n", " \r\n", " # Add elements to graph for later plotting\r\n", " if isinstance(dom_record['Created'],list): \r\n", " graph_items.append((domain,dom_record['Created'][0][0]))\r\n", " else:\r\n", " graph_items.append((domain,dom_record['Created'][0]))\r\n", " graph_items.append((domain, \"Name Servers\"))\r\n", " graph_items += [((\"Name Servers\", ns)) for ns in dom_record['Name Servers'][0]]\r\n", " graph_items += [(domain,dom_record['Registrar'][0]), (domain,dom_record['Country'][0]),(domain,f\"Page Rank : {dom_record['Page Rank'][0]}\")]\r\n", " \r\n", " #Highlight domains with low PageRank score or if thier entropy is more than 2 standard deviations from the average for the top 1 million domains\r\n", " def color_cells(val):\r\n", " if isinstance(val, int):\r\n", " color = 'yellow' if val < 3 else 'white'\r\n", " elif isinstance(val, float):\r\n", " color = 'yellow' if val > 4.30891 or val < 2.72120 else 'white'\r\n", " 
else:\r\n", " color = 'white'\r\n", " return 'background-color: %s' % color\r\n", " \r\n", " # Display whois details and highlight interesting values\r\n", " display(dom_record.T.style.applymap(color_cells, subset=pd.IndexSlice[['Page Rank', 'Domains Entropy'],0]))\r\n", " summary.add_observation(caption=\"Domain Summary\", description=f\"Summary of public domain records for {domain}\", data=dom_record)\r\n", " md(\"If Page Rank or Domain Entropy are highlighted this indicates that their values are outside the expected values of a legitimate website\")\r\n", " md(f\"The average entropy for the 1M most popular domains is 3.2675\")\r\n", "\r\n", "else:\r\n", " # If there is no whois data see what we can use from TI\r\n", " url_ti = tilookup.result_to_df(tilookup.lookup_ioc(observable=domain, providers=[\"VirusTotal\"]))\r\n", " md(f\"No current whois record exists for {domain} below are historical records\")\r\n", " print(url_ti['RawResult'][0]['whois'])" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-26T23:52:25.059639Z", "start_time": "2019-09-26T23:52:25.056638Z" } }, "source": [ "### TLS Cert Details\n", "Does the domain have an associated tls certificate and if so is that certificate in the malicious certs list held by abuse.ch?\n", "Details such as the certificate's subject and issuer can also provide indicators as to the domains nature." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:22:47.645866Z", "start_time": "2020-05-15T23:22:47.154488Z" } }, "outputs": [], "source": [ "if url is not None:\n", " scope = url\n", "else:\n", " scope = domain\n", "\n", "# See if TLS cert is in abuse.ch malicious certs list and get cert details\n", "result, x509 = dom_val.in_abuse_list(scope)\n", "\n", "if x509 is not None:\n", " cert_df = pd.DataFrame({\"SN\" :[x509.serial_number],\n", " \"Subject\":[[(i.value) for i in x509.subject]],\n", " \"Issuer\": [[(i.value) for i in x509.issuer]],\n", " \"Expired\": [x509.not_valid_after],\n", " \"InAbuseList\": result})\n", "\n", " display(cert_df.T)\n", " summary.add_observation(caption=\"TLS Summary\", description=f\"Summary of TLS certificate for {domain}\", data=cert_df)\n", " md(\"If 'InAbuseList' is True this shows that the SSL certificate fingerprint appeared in the abuse.ch list\")\n", " graph_items.append((domain,result))\n", "\n", "else:\n", " md(\"No TLS certificate was found in abuse.ch lists.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reverse DNS details\n", "What IP address is assocatiated with this domain, what do we know about that IP?\n", "What other domains have been associated with this IP, and is it a known ToR exit node?\n", "\n", "In order to use this ToR lookup functionality of MSTICpy you need to configure it as a provider in your `msticpyconfig.yaml` configuration file. No API key is required to use this functionality. \n", "\n", "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) and [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file). 
\n", "\n", "For Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:23:01.169948Z", "start_time": "2020-05-15T23:22:50.884598Z" } }, "outputs": [], "source": [ "import dns.resolver\n", "from dns.resolver import NXDOMAIN\n", "from ipwhois import IPWhois\n", "primary_providers = [prov[0] for prov in tilookup._providers.items()]\n", "\n", "if \"VirusTotal\" in tilookup.loaded_providers and \"VirusTotal\" not in primary_providers:\n", " primary_providers.append(\"VirusTotal\")\n", "\n", "if dom_val.is_resolvable(domain) is True:\n", " try:\n", " answer = dns.resolver.query(domain, 'A')\n", " except NXDOMAIN:\n", " raise ValueError(\"Could not resolve IP addresses from domain.\")\n", " x = answer[0].to_text()\n", " whois = IPWhois(x)\n", " ipwis = whois.lookup_whois()\n", " ip_rec = pd.DataFrame({\"IP Address\": [x],\n", " \"ASN\" : [ipwis['asn']],\n", " \"ASN Owner\": [ipwis['asn_description']],\n", " \"Country\" : [ipwis['asn_country_code']],\n", " \"Date\": [ipwis['asn_date']]})\n", " ip_addresses = ip_rec['IP Address'].to_list()\n", " graph_items += [\n", " (ip_rec[\"IP Address\"][0],domain),\n", " (ip_rec[\"IP Address\"][0],ip_rec[\"ASN\"][0]),\n", " (ip_rec[\"ASN Owner\"][0],ip_rec[\"ASN\"][0]),\n", " (ip_rec[\"Country\"][0],ip_rec[\"ASN\"][0])\n", " ]\n", " \n", " tor = None\n", " if \"Tor\" in tilookup.loaded_providers:\n", " tor = tilookup.result_to_df(tilookup.lookup_ioc(observable=ip_rec['IP Address'][0], providers=[\"Tor\"]))\n", " if tor is None or tor['Details'][0] == \"Not found.\":\n", " ip_rec['Tor Node?'] = \"No\"\n", " else:\n", " ip_rec['Tor Node?'] = \"Yes\"\n", " graph_items.append((ip_rec[\"IP Address\"][0],\"Tor Node\"))\n", " ip_ti = tilookup.result_to_df(tilookup.lookup_ioc(observable=ip_rec['IP Address'][0], providers=primary_providers))\n", " last_10 = []\n", " if \"VirusTotal\" in tilookup.loaded_providers:\n", " last_10 = ip_ti.T['VirusTotal']['RawResult'][\"resolutions\"][0:10]\n", " prev_domains = []\n", " for record in last_10:\n", " prev_domains.append(record['hostname'])\n", " graph_items.append((record['hostname'],ip_rec[\"IP Address\"][0])) \n", " ip_rec[\"Last 10 resolutions\"] = [prev_domains]\n", " display(ip_rec.T)\n", " summary.add_observation(caption=\"IP Summary\", description=f\"Summary of IP assocaiated with {domain}\", data=ip_rec)\n", "else:\n", " ip_ti = tilookup.result_to_df(tilookup.lookup_ioc(observable=answer[0].to_text()))\n", " print(ip_ti.T['VirusTotal']['RawResult'])" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-27T22:21:13.478223Z", "start_time": "2019-09-27T22:21:13.475222Z" } }, "source": [ "### Site Screenshot\n", "Using https://browshot.com/ return a screenshot of the domain or url being investigated. This can help us identify if the site is a phishing portal.\n", "\n", "As with other external providers you need an API key to use the BrowShot service, and have the provider configured in your `msticpyconfig.yaml` file. \n", "\n", "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) and [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file). 
\n", "\n", "For Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:23:01.580946Z", "start_time": "2020-05-15T23:23:01.409952Z" } }, "outputs": [], "source": [ "if url is not None:\n", " image_data = domain_utils.screenshot(url)\n", "else:\n", " image_data = domain_utils.screenshot(domain)\n", " \n", "with open('screenshot.png', 'wb') as f:\n", " f.write(image_data.content)\n", "\n", "display(Image(filename='screenshot.png'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Domain Summary\n", "In order to effectively evaluate the data collected above we will graph the elements to help highlight connections." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:23:03.815721Z", "start_time": "2020-05-15T23:23:03.810722Z" } }, "outputs": [], "source": [ "# Create graph from items saved to graph_items\n", "import networkx as nx\n", "import matplotlib.pyplot as plt\n", "G=nx.Graph()\n", "for item in graph_items:\n", " G.add_edge(item[0],str(item[1]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:23:07.014366Z", "start_time": "2020-05-15T23:23:06.689367Z" } }, "outputs": [], "source": [ "# Plot Graph with pyvis\r\n", "net=Network(height=900, width=900, notebook=True)\r\n", "net.barnes_hut()\r\n", "net.from_nx(G)\r\n", "net.set_options(\"\"\"\r\n", "var options = {\"nodes\": {\"color\": {\"highlight\": {\"border\": \"rgba(233,77,49,1)\"},\"hover\": {\"border\": \"rgba(233,77,49,1)\"}},\r\n", " \"scaling\": {\"min\": 1},\"size\": 7},\r\n", " \"edges\": {\"color\": {\"inherit\": true}, \"smooth\": false},\r\n", " \"interaction\": {\"hover\": true,\"multiselect\": true},\r\n", " \"manipulation\": {\"enabled\": true},\r\n", " \"physics\": {\"enabled\": false,\"barnesHut\": {\"gravitationalConstant\": -80000,\"springLength\": 250,\"springConstant\": 0.001},\"minVelocity\": 0.75}\r\n", "}\"\"\")\r\n", "net.show(\"graph.html\")\r\n", "# If the intereactive graph does not display correcrtly uncomment the three lines below to access display a non-interactive version\r\n", "import matplotlib.pyplot as plt\r\n", "plt.figure(3,figsize=(12,12))\r\n", "nx.draw(G, with_labels=True, font_weight='bold')" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-27T00:28:22.232839Z", "start_time": "2019-09-27T00:28:22.229839Z" } }, "source": [ "# Domain/URL in the Environment\n", "Once we have determined the nature of the domain or URL under investigation we want to see what the scope of impact is in our environment but identifying any presence of the domain or URL in our datasets.\n", "If the domain has a high page rank score it is likely that it will be highly prevalent in a large environment, therefore you may wish to consider whether or not to run these cells for such a domain due to the data volumes involved." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:23:46.219660Z", "start_time": "2020-05-15T23:23:46.205659Z" } }, "outputs": [], "source": [ "if dom_record is None or int(dom_record[\"Page Rank\"]) < 6:\n", " warning = None\n", " md(f\"The Page Rank score for {domain} is low, querying for this domain should not present issues.\")\n", "else:\n", " md_warn(f\"{domain} has a high Page Rank score, it is likely to be highly prevalent in the environment.\")\n", " md(\"Please confirm below that you wish to proceed, note that some queries are likely to be slow due to large amounts of data\", \"bold\")\n", " warning = widgets.Checkbox(\n", " value=False,\n", " description='Are you sure?',\n", " disabled=False\n", " )\n", " display(warning)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:23:56.721580Z", "start_time": "2020-05-15T23:23:56.703582Z" } }, "outputs": [], "source": [ "# Establish if we want to investigate just the URL or domain and URL\n", "if warning is not None and warning.value == False:\n", " md_warn(\"Please check the box above to confirm you wish to proceed\")\n", "else:\n", " if url is not None:\n", " md(\"Do you wish to search on the URL alone or URL and Domain? For mallicious URLs on known good domains you may wish to only search on the URL to get more granular results.\")\n", " scope_selection = widgets.RadioButtons(\n", " options=['URL Only', 'URL and Domain'],\n", " disabled=False\n", " )\n", " display(scope_selection)\n", " else:\n", " scope_selection = None\n", " md(f\"Searching data for {domain}\")\n", " \n", "host_list = []" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:23:58.836138Z", "start_time": "2020-05-15T23:23:58.779137Z" } }, "outputs": [], "source": [ "# Set a time scope for our investigation\n", "if scope_selection is not None:\n", " if scope_selection.value == \"URL Only\":\n", " scope = url\n", " else:\n", " scope = f\"{domain}|{url}\"\n", "else:\n", " scope = domain\n", "\n", "query_times = nbwidgets.QueryTime(units='day',\n", " max_before=20, max_after=1, before=3)\n", "query_times.display()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Related Alerts" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:24:32.411730Z", "start_time": "2020-05-15T23:24:30.075322Z" } }, "outputs": [], "source": [ "#Get any alerts associated with the domain or URL\n", "alerts = qry_prov.SecurityAlert.list_alerts(\n", " query_times)\n", "if isinstance(alerts, pd.DataFrame) and not alerts.empty:\n", " related_alerts = alerts[alerts[\"Entities\"].str.contains(scope)]\n", "else:\n", " alerts = None\n", " display(HTML(\"No alerts found\"))\n", "\n", "\n", "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n", " related_alerts_items = (related_alerts[['AlertName', 'TimeGenerated']]\n", " .groupby('AlertName').TimeGenerated.agg('count').to_dict())\n", "\n", " def print_related_alerts(alertDict, entityType, entityName):\n", " if len(alertDict) > 0:\n", " display(Markdown(\n", " f\"### Found {len(alertDict)} different alert types related to this {entityType} (\\'{entityName}\\')\"))\n", " for (k, v) in alertDict.items():\n", " display(Markdown(f\"- {k}, Count of alerts: {v}\"))\n", " else:\n", " display(\n", " Markdown(f\"No alerts for {entityType} entity \\'{entityName}\\'\"))\n", "\n", "\n", "# 
Display alerts on timeline to aid in visual grouping\n", " print_related_alerts(related_alerts_items, 'domain', domain)\n", " nbdisplay.display_timeline(\n", " data=related_alerts, source_columns=[\"AlertName\"], title=\"Related alerts over time\", height=300, color=\"red\")\n", " score = len(related_alerts.index)/2\n", " summary.add_observation(caption=\"Alerts\", description=f\"Alerts linked to {scope}\", data=related_alerts, score=score)\n", "else:\n", " md(\"No related alerts found.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:24:46.090216Z", "start_time": "2020-05-15T23:24:46.038215Z" } }, "outputs": [], "source": [ "rel_alert_select = None\n", "\n", "def show_full_alert(selected_alert):\n", " global security_alert, alert_ip_entities\n", " security_alert = SecurityAlert(\n", " rel_alert_select.selected_alert)\n", " nbdisplay.display_alert(security_alert, show_entities=True)\n", "\n", "# Show the full details when an alert is selected\n", "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n", " display(Markdown('### Click on alert to view details.'))\n", " rel_alert_select = nbwidgets.SelectAlert(alerts=related_alerts,\n", " action=show_full_alert)\n", " rel_alert_select.display()\n", "else:\n", " md('No related alerts found.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Domain or URL in Logs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Host Logs\n", "Hosts that have communicated with the domain or URL under investigation may have indicators of this activity in their logs, especially if the domain or URL was referenced in a command-line argument. The context in which the domain or URL is observed may provide some indication of what the activity was." 
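, "\n", "Note that the next cell embeds the search scope in KQL `matches regex` clauses; because `.` is a regex metacharacter, you may want to escape the value first - a small sketch (the `scope_rx` variable is hypothetical and not used by the query as written):\n", "\n", "```python\n", "import re\n", "\n", "# Escape regex metacharacters so the KQL regex matches the literal domain/URL\n", "scope_rx = re.escape(scope)\n", "```"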
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:24:57.201789Z", "start_time": "2020-05-15T23:24:54.612261Z" } }, "outputs": [], "source": [ "host_log_query = f\"\"\"\n", " Syslog \n", " | where TimeGenerated >= datetime({query_times.start}) \n", " | where TimeGenerated <= datetime({query_times.end})\n", " | where SyslogMessage matches regex \"{scope}\"\n", " | union isfuzzy = true (\n", " SecurityEvent\n", " | where TimeGenerated >= datetime({query_times.start}) \n", " | where TimeGenerated <= datetime({query_times.end})\n", " | where CommandLine matches regex \"{scope}\")\n", "\"\"\"\n", "# Identify any hosts with logs relating to this URL or domain and provide a summary of those hosts\n", "host_logs_df = qry_prov.exec_query(host_log_query)\n", "if not host_logs_df.empty:\n", " md(f\"Summary of logs containing {scope} by host:\", \"bold\")\n", " host_log_sum = pd.DataFrame({'Log Count' : host_logs_df.groupby(['Computer']).count()['TimeGenerated']}).reset_index()\n", " display(host_log_sum.style.hide_index())\n", " #Add details to a summary for later use\n", " summary.add_observation(caption=\"Host Log Summary\", description=f\"Summary of logs containing {scope} by host\", data=host_log_sum)\n", " ioc_extractor = iocextract.IoCExtract()\n", " print('Extracting IPs, Domains and URLs from logs.......')\n", " ioc_df = ioc_extractor.extract(data=host_logs_df,\n", " columns=['SyslogMessage', 'CommandLine'],\n", " os_family='Linux',\n", " ioc_types=['ipv4', 'ipv6', 'dns', 'url'])\n", " md(\"Network artifacts found in logs:\", \"bold\")\n", " display(ioc_df.drop('SourceIndex', axis=1).style.hide_index())\n", " # Collect a list of ip addresses associated with the domain or url\n", " ip_addresses += [(ip) for ip in ioc_df[ioc_df['IoCType'] == \"ipv4\"]['Observable'] if ip not in ip_addresses]\n", "\n", "else:\n", " md(f\"No host logs found containing {domain} or {url}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:00.537657Z", "start_time": "2020-05-15T23:25:00.454659Z" } }, "outputs": [], "source": [ "#Display the logs associated with the domain or URL for each host\n", "def view_logs(host):\n", " display(host_logs_df.query('Computer == @host'))\n", "\n", "if not host_logs_df.empty:\n", " items = host_log_sum['Computer'].dropna().unique().tolist()\n", " host_list = items\n", " md(f\"

View all host logs that contain {scope}

\")\n", " log_view = widgets.Dropdown(\n", " options=items, description='Select Computer to view raw logs', disabled=False, **WIDGET_DEFAULTS)\n", " display(widgets.interactive(view_logs, host=log_view))\n", "else:\n", " md(f\"No host logs found containing {domain} or {url}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Network Device Logs\n", "Often network devices will logs connection activity that can help identity which hosts have communicated with a given domain or URL, and may provide additional detail as to the nature of this communication." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:11.257722Z", "start_time": "2020-05-15T23:25:09.682424Z" } }, "outputs": [], "source": [ "net_query = f\"\"\"\n", " CommonSecurityLog\n", " | where TimeGenerated > datetime({query_times.start})\n", " | where TimeGenerated < datetime({query_times.end})\n", " | where RequestURL contains \"{scope}\" or AdditionalExtensions contains \"{scope}\"\n", " \"\"\"\n", "\n", "net_logs_df = qry_prov.exec_query(net_query)\n", "# Search for indicators of network device logs containing the domain or URL. If any area summarize this data and add indicators to lists.\n", "if not net_logs_df.empty:\n", " md(f\"Count of network connections to {scope} by hosts:\")\n", " host_count = pd.DataFrame({'Connection Count' : net_logs_df.groupby(['SourceIP','DestinationIP','DestinationPort', 'RequestURL']).count()['TimeGenerated']}).reset_index()\n", " display(host_count.style.hide_index())\n", " summary.add_observation(caption=\"Network Log Summary\", description=f\"Summary of network connections to {scope} by host\", data=host_count)\n", " ip.addresses += [(ip) for ip in host_count['DestinationIP'] if ip not in ip_addresses]\n", "else:\n", " md(f\"No network device logs found containing {scope}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:12.535230Z", "start_time": "2020-05-15T23:25:12.514236Z" } }, "outputs": [], "source": [ "def view_net_logs(host):\n", " display(net_logs_df.query('SourceIP == @host'))\n", "\n", "if not net_logs_df.empty:\n", " # Display logs from any network devices that contain the domain or URL\n", " items = net_logs_df['SourceIP'].dropna().unique().tolist()\n", " host_list += items\n", " md(f\"

View all network device logs that contain {scope}

\")\n", " net_log_view = widgets.Dropdown(\n", " options=items, description='Select IP to view raw logs', disabled=False, **WIDGET_DEFAULTS)\n", " display(widgets.interactive(view_net_logs, host=net_log_view))\n", "else:\n", " md(f\"No network device logs found containing {scope}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DNS Logs\n", "A host communicating with a domain is going to need to resolve that domain first, this can provide us details of other IP addresses associated with the domain. In addition the type of requests made can help us identify activity such as data exfiltration via DNS." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:16.739029Z", "start_time": "2020-05-15T23:25:15.129254Z" } }, "outputs": [], "source": [ "if \"DnsEvents\" in qry_prov.schema:\n", " dns_query = f\"\"\"\n", " DnsEvents\n", " | where TimeGenerated > datetime({query_times.start})\n", " | where TimeGenerated < datetime({query_times.end})\n", " | where SubType == \"LookupQuery\"\n", " | where tolower(Name) contains \"{scope}\"\n", " | where isnotempty(IPAddresses)\n", " \"\"\"\n", " # Seach DNS logs for resolutions of the domain\n", " dns_logs_df = qry_prov.exec_query(dns_query)\n", " if not dns_logs_df.empty:\n", " ip_addr = dns_logs_df[dns_logs_df['TimeGenerated'] == dns_logs_df['TimeGenerated'].max()]['IPAddresses'].replace(\"\", np.nan).dropna().to_list()\n", " new_ips = len(ip_addresses)\n", " # Identity any DNS responses for the domain that contain IP addresses not previously identified\n", " ip_addresses += [(ip) for ip in ip_addr if ip not in ip_addresses]\n", " if len(ip_addresses) > new_ips:\n", " md(f\"New IP Addresses found for {domain}: \")\n", " print(ip_addresses[(new_ips-1):])\n", " host_list += dns_logs_df['ClientIP'].unique().tolist()\n", " host_count = dns_logs_df.groupby('ClientIP').count()['Name']\n", " host_resolutions = pd.DataFrame({\"Count of DNS Lookups\": dns_logs_df.groupby('ClientIP').count()['Name']}).reset_index()\n", " md(f\"Count of resolutions for {domain} by host:\")\n", " display(host_resolutions.style.hide_index())\n", " summary.add_observation(caption=\"DNS Log Summary\", description=f\"Summary of DNS resolutions of {scope} by host\", data=host_resolutions)\n", " else:\n", " md(f\"No DNS device logs found containing {scope}\")\n", "else:\n", " dns_logs_df = None\n", " md(\"No DNS events avaliable in workspace\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:17.782870Z", "start_time": "2020-05-15T23:25:17.763870Z" } }, "outputs": [], "source": [ "# Check DNS logs for indicators of data exfiltration or tunnelling via DNS\r\n", "if dns_logs_df is not None:\r\n", " import msticpy.sectools.base64unpack as b64\r\n", " lookups = dns_logs_df['Name'].dropna().unique().tolist()\r\n", " potential_tunnels = []\r\n", " for lookup in lookups:\r\n", " if len(lookup) > 250:\r\n", " print(f\"Suspicious domain length {lookup}\")\r\n", " sub_d, _, _ = tldextract.extract(lookup)\r\n", " req = sub_d.replace(\".\",\"\")\r\n", " score = Entropy(req)\r\n", " if score > (3.2675 + 0.5) or score < (3.2675 - 0.5):\r\n", " potential_tunnels.append(lookup)\r\n", " base64 = b64.unpack(req)\r\n", " if not base64[1].empty:\r\n", " potential_tunnels.append(lookup)\r\n", " suspicious_queries = dns_logs_df[dns_logs_df['Name'].isin(potential_tunnels)]\r\n", " if suspicious_queries.empty:\r\n", " md(f\"No DNS lookups found for {domain}\")\r\n", " 
suspect_tunnels = None\r\n", " else:\r\n", " md(\"Potential DNS Tunnelling:\")\r\n", " suspect_tunnels = pd.DataFrame({\"Count of DNS Lookups\": suspicious_queries.groupby(['Name','ClientIP']).count()['TimeGenerated']})\r\n", " display(suspect_tunnels.reset_index().style.hide_index())\r\n", " summary.add_observation(caption=\"DNS Tunnelling\", description=\"Potential DNS Tunnelling\", data=suspect_tunnels)\r\n", "else:\r\n", " md(\"No DNS events available in workspace\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Flow Logs\n", "In Microsoft Azure, network flow logs can help identify hosts connecting to the domain or URL, as well as provide some context as to the nature of these connections." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:38.849288Z", "start_time": "2020-05-15T23:25:35.444591Z" } }, "outputs": [], "source": [ "# Check Azure flow logs for any connections to the domain or URL.\n", "if 'AzureNetworkAnalytics_CL' not in qry_prov.schema:\n", " az_net_comms_df = None\n", " md('No Azure network data available in this workspace.')\n", "else:\n", " az_net_comms_df = qry_prov.Network.list_azure_network_flows_by_ip(query_times, ip_address_list=ip_addresses)\n", " if isinstance(az_net_comms_df, pd.DataFrame) and not az_net_comms_df.empty:\n", " az_net_comms_df['TotalAllowedFlows'] = az_net_comms_df['AllowedOutFlows'] + az_net_comms_df['AllowedInFlows']\n", " nbdisplay.display_timeline(\n", " data=az_net_comms_df,\n", " group_by=\"L7Protocol\",\n", " title=\"Network Flows by Protocol\",\n", " time_column=\"FlowStartTime\",\n", " source_columns=[\"FlowType\", \"AllExtIPs\", \"L7Protocol\", \"FlowDirection\"],\n", " height=300,\n", " legend=\"right\",\n", " yaxis=True\n", " )\n", " nbdisplay.display_timeline(\n", " data=az_net_comms_df,\n", " group_by=\"FlowDirection\",\n", " title=\"Network Flows by Direction\",\n", " time_column=\"FlowStartTime\",\n", " source_columns=[\"FlowType\", \"AllExtIPs\", \"L7Protocol\", \"FlowDirection\"],\n", " height=300,\n", " legend=\"right\",\n", " yaxis=True\n", " )\n", " else:\n", " md(f\"No Azure network data for {domain} in this time range.\") " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:39.321799Z", "start_time": "2020-05-15T23:25:39.308800Z" } }, "outputs": [], "source": [ "if az_net_comms_df is not None and not az_net_comms_df.empty:\n", " flow_plot = nbdisplay.display_timeline_values(data=az_net_comms_df,\n", " group_by=\"L7Protocol\",\n", " source_columns=[\"FlowType\", \n", " \"AllExtIPs\", \n", " \"L7Protocol\", \n", " \"FlowDirection\", \n", " \"TotalAllowedFlows\"],\n", " time_column=\"FlowStartTime\",\n", " y=\"TotalAllowedFlows\",\n", " legend=\"right\",\n", " legend_column=\"L7Protocol\", \n", " height=500,\n", " kind=[\"vbar\", \"circle\"]);\n", "else:\n", " md(\"No Azure network data available.\") " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:40.833504Z", "start_time": "2020-05-15T23:25:40.811505Z" } }, "outputs": [], "source": [ "if az_net_comms_df is not None and not az_net_comms_df.empty:\n", " cols = [\n", " \"VMName\",\n", " \"VMIPAddress\",\n", " \"PublicIPs\",\n", " \"SrcIP\",\n", " \"DestIP\",\n", " \"L4Protocol\",\n", " \"L7Protocol\",\n", " \"DestPort\",\n", " \"FlowDirection\",\n", " \"AllExtIPs\",\n", " \"TotalAllowedFlows\",\n", " ]\n", " flow_index = 
az_net_comms_df[cols].copy()\n", "\n", " def get_source_ip(row):\n", " if row.FlowDirection == \"O\":\n", " return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n", " else:\n", " return row.AllExtIPs if row.AllExtIPs else row.DestIP\n", "\n", " def get_dest_ip(row):\n", " if row.FlowDirection == \"O\":\n", " return row.AllExtIPs if row.AllExtIPs else row.DestIP\n", " else:\n", " return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n", "\n", " flow_index[\"source\"] = flow_index.apply(get_source_ip, axis=1)\n", " flow_index[\"dest\"] = flow_index.apply(get_dest_ip, axis=1)\n", " \n", " with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " display(\n", " flow_index[\n", " [\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\", \"TotalAllowedFlows\"]\n", " ]\n", " .groupby([\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\"])\n", " .sum()\n", " .reset_index()\n", " .style.bar(subset=[\"TotalAllowedFlows\"], color=\"#d65f5f\")\n", " )\n", " summary.add_observation(caption=\"Network Flow Summary\", description=f\"Summary of network flows to and from IPs associated with {scope}\", data=flow_index) \n", "\n", "else:\n", " flow_index = None\n", " md(\"No Azure network data available.\") " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:41.940337Z", "start_time": "2020-05-15T23:25:41.932336Z" } }, "outputs": [], "source": [ "if flow_index is not None and not flow_index.empty:\n", " net_ips = flow_index['source'].dropna().unique().tolist() + flow_index['dest'].dropna().unique().tolist()\n", " md(\"Resolving hostnames - please be patient, this may take some time\")\n", " ip_addresses = ip_addresses + [(ip) for ip in net_ips if ip not in ip_addresses] \n", " for ip in ip_addresses:\n", " host_res = qry_prov.Network.get_host_for_ip(query_times, ip_address=ip)\n", " if not host_res.empty:\n", " host_list.append(host_res['Computer'][0])\n", " md(\"Hosts added to host list\")\n", "else:\n", " md(\"No Azure network data available.\") " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### All Hosts Observed Communicating with the Domain or URL\n", "In the cells executed above we have identified hosts communicating with the domain or IP in question. These hosts are potential candidates for further investigation using Azure Sentinel or the other entity explorer notebooks. This cell provides a summary of these hosts, as well as details of any alerts we have that are associated with these hosts." 
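, "\n", "Host names can be collected more than once on the way through the notebook, so it can be worth de-duplicating the list first - a minimal sketch:\n", "\n", "```python\n", "# De-duplicate host_list while preserving order (dict keys keep insertion order)\n", "host_list = list(dict.fromkeys(host_list))\n", "```"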
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:44.248494Z", "start_time": "2020-05-15T23:25:44.234496Z" } }, "outputs": [], "source": [ "import re\r\n", "pattern = re.compile(\"^(?:[0-9]{1,3}\\.){3}[0-9]{1,3}$\")\r\n", "# Simplify to list \r\n", "host_ip_list = [(host) for host in host_list if pattern.match(host)] \r\n", "\r\n", "for ip in host_ip_list:\r\n", " host_list.remove(ip)\r\n", " host_name = qry_prov.Network.get_host_for_ip( query_times, ip_address=ip)\r\n", " if not host_name.empty:\r\n", " host_list.append(host_name['Computer'][0]) \r\n", "if alerts is not None:\r\n", " alert_count = [((len(alerts[alerts[\"Entities\"].str.contains(host)].index))) for host in host_list]\r\n", " host_alerts = pd.DataFrame({\"Hosts\":host_list,\r\n", " \"Count of Host Alerts\": alert_count})\r\n", " if host_alerts.empty:\r\n", " md(f\"No hosts observed having an association with {domain}\")\r\n", " else:\r\n", " summary.add_observation(caption=\"Host Alerts\", description=f\"A list of hosts observed communicating with {scope} and any alerts associated with them\", data=host_alerts) \r\n", " md(f\"\"\"\r\n", " During the investigation the following hosts have been observed as having an association with {domain}.\r\n", " The count of alerts for each host is to provide guidance on which hosts should be considered for prioritization \r\n", " in further investigation.\"\"\")\r\n", " display(host_alerts.style.hide_index())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary of Findings" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-15T23:25:47.519233Z", "start_time": "2020-05-15T23:25:47.379233Z" } }, "outputs": [], "source": [ "md(f\"Domain: {domain}\", \"bold\")\n", "md(f\"URL: {url}\", \"bold\")\n", "summary.display_observations()" ] } ], "metadata": { "celltoolbar": "Tags", "hide_input": false, "kernelspec": { "display_name": "Python 3.8 - AzureML", "language": "python", "name": "python38-azureml" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "352.33px" }, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false }, "widgets": { 
"application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }