{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Started with Azure Notebooks and Azure Sentinel\n", "**Notebook Version:** 1.0
\n", " **Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", " **Required Packages**:
\n", " **Platforms Supported**:\n", " - Azure Notebooks Free Compute\n", " - Azure Notebooks DSVM\n", " - OS Independent\n", "\n", "**Data Sources Required**:\n", " - Log Analytics - SiginLogs (Optional)\n", " - VirusTotal\n", " - MaxMind\n", " \n", " \n", "This notebook takes you through the basics needed to get started with Azure Notebooks and Azure Sentinel, and how to perform the basic actions of data acquisition, data enrichment, data analysis, and data visualization. These actions are the building blocks of threat hunting with notebooks and are useful to understand before running more complex notebooks. This notebook only lightly covers each topic but includes 'learn more' sections to provide you with the resource to deep dive into each of these topics. \n", "\n", "This notebook assumes that you are running this in an Azure Notebooks environment, however it will work in other Jupyter environments.\n", "\n", "**Note:**\n", "This notebooks uses SigninLogs from your Azure Sentinel Workspace. If you are not yet collecting SigninLogs configure this connector in the Azure Sentinel portal before running this notebook.\n", "This notebook also uses the VirusTotal API for data enrichment, for this you will require an API key which can be obtained by signing up for a free [VirusTotal community account](https://www.virustotal.com/gui/join-us)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## What is a Jupyter notebook?\n", "You are currently reading a Jupyter notebook. [Jupyter](http://jupyter.org/) is an interactive development and data manipulation environment presented in a browser. Using Jupyter you can create documents, called Notebooks. These documents are made up of cells that contain interactive code, alongside that code's output, and other items such as text and images (what you are looking at now is a cell of Markdown text).\n", "\n", "The name, Jupyter, comes from the core supported programming languages that it supports: Julia, Python, and R. Whilst you can use any of these languages we are going to use Python in this notebook, in addition the notebooks that come with Azure Sentinel are all written in Python. Whilst there are pros, and cons to each language Python is a well-established language that has a large number of materials and libraries well suited for data analysis and security investigation, making it ideal for our needs.\n", "\n", "### Learn more:\n", " - The [Infosec Jupyter Book](https://infosecjupyterbook.com/introduction.html) has more details on the technical working of Jupyter.\n", " - [The Jupyter Project documentation](https://jupyter.org/documentation)\n", "\n", "---\n", "## How to use a Jupyter notebook?\n", "To use a Jupyter notebook you need a Jupyter server that will render the notebook and execute the code within it. This can take the form of a local [Jupyter installation](https://pypi.org/project/jupyter/), or a remotely hosted version such as [Azure Notebooks](https://notebooks.azure.com/). If you are reading this it is highly likely that you already have a Jupyter server that this notebook is using.\n", "You can learn more about installing and running your own Jupyter server [here](https://realpython.com/jupyter-notebook-introduction/).\n", "\n", "### Using Azure Notebooks\n", "If you accessed this notebook from Azure Sentinel, you are probably using Azure Notebooks to run this notebook. 
Azure Notebooks runs in the same way that a local Jupyter server with, except with the additional feature of integrated project management and file storage. When you open a notebook in Azure Notebooks the user interface is nearly identical to a standard Jupyter notebook experience.\n", "\n", "Before you can start running code in a notebook you need to make sure that it is connected to a Jupyter server and you have the correct type of kernel configured. For this notebook we are going to be using Python 3.6, hopefully Azure Notebooks has already loaded this kernel for you - you can check this by looking at the top left corner of the screen where you should see the currently connected kernel. \n", "\n", "![KernelIssue](https://github.com/Azure/Azure-Sentinel-Notebooks/raw/master/images/nb_img1.png)\n", "\n", "If this does not read Python 3.6 you can select the correct kernel by selecting Kernel > Change kernel from the top menu and clicking Python 3.6.\n", "\n", "> **Note**: the notebook works with Python 3.6, 3.7 or later. If you are using this notebook in Azure ML or another Jupyter environment you can choose any kernel that supports Python 3.6 or later\n", "\n", "![KernelPicker](https://github.com/Azure/Azure-Sentinel-Notebooks/raw/master/images/nb_img2.png)\n", "\n", "Once you have done this you should be ready to move onto a code cell.\n", "> **Tip**: You can identify which cells are code by selecting them and looking at the drop down box at the center of the top menu. It will either read 'Code' (for interactive code cells), 'Markdown' (for Markdown text cells like this one), or RawNBConvert (these are just raw data and not interpreted by Jupyter - they can be used by tools that process notebook files, such as *nbconvert* to render the data into HTML or LaTeX). \n", "\n", "If you click on the cell below you should see this box change to 'Code'.\n", "\n", "### Learn More:\n", "More details on Azure Notebooks can be found in the [Azure Notebooks documentation](https://docs.microsoft.com/en-us/azure/notebooks/) and the [Azure Sentinel documentation](https://docs.microsoft.com/en-us/azure/sentinel/notebooks).\n", "\n", "---\n", "## Running code\n", "Once you have selected a code cell you can run it by clicking the run button at the menu bar at the top, or by pressing Ctrl+Enter.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is our first code cell, it contains basic Python code.\n", "# You can run a code cell by selecting it and clicking the Run button in the top menu, or by pressing Shift + Enter.\n", "# Once you run a code cell any output from that code will be displayed directly below it.\n", "print(\"Congratulations you just ran this code cell\")\n", "y = 2+2\n", "print(\"2 + 2 =\", y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variables set within a code cell persist between cells meaning you can chain cells together" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y + 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learn More : \n", " - The [Infosec Jupyter Book](https://infosecjupyterbook.com/) provides an infosec specific intro to Python.\n", " - [Real Python](https://realpython.com/) is a comprehensive set of Python learnings and tutorials.\n", "
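\n",
"If you want a little more practice before moving on, the next cell is a small, purely illustrative snippet (nothing in it is specific to Azure Sentinel) that combines variables, a list, and a dictionary lookup - feel free to change the values and re-run it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"# A purely illustrative example - it scores a list of alert severities\n",
"# using a dictionary lookup. Edit the values and re-run the cell.\n",
"severity_scores = {\"low\": 1, \"medium\": 5, \"high\": 10}\n",
"alerts = [\"low\", \"high\", \"medium\", \"high\"]\n",
"\n",
"total = sum(severity_scores[alert] for alert in alerts)\n",
"print(\"Number of alerts:\", len(alerts))\n",
"print(\"Total severity score:\", total)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "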
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you understand the basics we can move onto more complex code.\n", "\n", "---\n", "## Setting up the environment\n", "Code cells behave in the same way your code would in other environments, so you need to remember about common coding practices such as variable initialization and library imports. \n", "Before we execute more complex code we need to make sure the required packages are installed and libraries imported. At the top of many of the Azure Sentinel notebooks you will see large cells that will check kernel versions and then install and import all the libraries we are going to be using in the notebook, make sure you run this before running other cells in the notebook.\n", "If you are running notebooks locally or via dedicated compute in Azure Notebooks library installs will persist but this is not the case with Azure Notebooks free tier, so you will need to install each time you run. Even if running in a static environment imports are required for each run so make sure you run this cell regardless." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import os\n", "import sys\n", "import warnings\n", "from IPython.display import display, HTML, Markdown\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "# If you did not clone the entire Azure-Sentinel-Notebooks repo you may not have this file\n", "if Path(\"./utils/nb_check.py\").is_file():\n", " from utils.nb_check import check_python_ver, check_mp_ver\n", "\n", " check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", " try:\n", " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", " except ImportError:\n", " !pip install --upgrade msticpy\n", " if \"msticpy\" in sys.modules:\n", " importlib.reload(sys.modules[\"msticpy\"])\n", " else:\n", " import msticpy\n", " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", "from msticpy.nbtools import nbinit\n", "nbinit.init_notebook(\n", " namespace=globals(),\n", " extra_imports=[\"ipwhois, IPWhois\", \"urllib.request, urlretrieve\", \"yaml\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Configuration\n", "Once we have set up our Jupyter environment with the libraries that we'll use in the notebook, we need to make sure we have some configuration in place. Some of the notebook components need addtional configuration to connect to external services (e.g. API keys to retrieve Threat Intelligence data). This includes configuration for connection to our Azure Sentinel workspace, as well as some threat intelligence providers we will use later.\n", "The easiest way to handle the configuration for these services is to store them in a msticpyconfig file (`msticpyconfig.yaml`). More details on msticpyconfig can be found here: https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html\n", "\n", "### Learn more: \n", "- In this notebook we will setup the basic config we need to get started. If you need a more complete walk-through we have a separate notebook to help you: https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Azure-Sentinel-Notebooks GitHub repo contains an template msticpyconfig file ready to be populated. If you have run this notebook before you may have a msticpyconfig file already populated, the cell below allows you to checks if this file. If your config file does not contain details under Azure Sentinel > Workspaces, or TIProviders the following cells will populate these for you.
\n", "If you want to see an example of what a populated msticpyconfig file should look like a samples is included in the repo as msticpyconfig-sample.yaml." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import yaml\n", "def print_config():\n", " with open('msticpyconfig.yaml') as f:\n", " data = yaml.load(f, Loader=yaml.FullLoader)\n", " print(yaml.dump(data))\n", "try:\n", " print_config()\n", "except FileNotFoundError:\n", " print(\"No msticpyconfig.yaml was found in your current directory.\")\n", " print(\"We are downloading a template file for you.\")\n", " urlretrieve(\"https://raw.githubusercontent.com/Azure/Azure-Sentinel-Notebooks/master/msticpyconfig.yaml\", \"msticpyconfig.yaml\")\n", " print_config()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you do not have and msticpyconfig file we can populate one for you. Before you do this you will need a few things.\n", "\n", "The first is the Workspace ID and Tenant ID of the Azure Sentinel Workspace you wish to connect to.\n", "\n", " - You can get the workspace ID by opening Azure Sentinel in the [Azure Portal](https://portal.azure.com) and selecting Settings > Workspace Settings. Your Workspace ID is displayed near the top of this page.\n", "\n", "- You can get your tenant ID (also referred to organization or directory ID) via [Azure Active Directory](https://docs.microsoft.com/en-us/onedrive/find-your-office-365-tenant-id)\n", "\n", "We are going to use [VirusTotal](https://www.virustotal.com) to enrich our Azure Sentinel data. For this you will need a VirusTotal API key, one of these can be obtained for free (as a personnal key) via the [VirusTotal](https://developers.virustotal.com/v3.0/reference#getting-started) website.\n", "We are using VirusTotal for this notebook but we also support a range of other threat intelligence providers: https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html\n", "

\n", "In addition we are going to plot IP address locations on a map, in order to do this we are going to use [MaxMind](https://www.maxmind.com) to geolocate IP addresses which requires an API key. You can sign up for a free account and API key at https://www.maxmind.com/en/geolite2/signup. \n", "

\n", "Once you have these required items run the cell below and you will prompted to enter these elements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "ws_id = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", " prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n", "ten_id = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n", " prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n", "vt_key = nbwidgets.GetEnvironmentKey(env_var='VT_KEY',\n", " prompt='Please enter your VirusTotal API Key:', auto_display=True)\n", "mm_key = nbwidgets.GetEnvironmentKey(env_var='MM_KEY',\n", " prompt='Please enter your MaxMind API Key:', auto_display=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " The cell below will now populate a msticpyconfig file with these values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"msticpyconfig.yaml\") as config:\n", " data = yaml.load(config, Loader=yaml.Loader)\n", "data['AzureSentinel']\n", "\n", "workspace = {\"Default\":{\"WorkspaceId\": ws_id.value, \"TenantId\": ten_id.value}}\n", "ti = {\"VirusTotal\":{\"Args\": {\"AuthKey\" : vt_key.value}, \"Primary\" : True, \"Provider\": \"VirusTotal\"}}\n", "other_prov = {\"GeoIPLite\" : {\"Args\" : {\"AuthKey\" : mm_key.value, \"DBFolder\" : \"~/msticpy\"}, \"Provider\" : \"GeoLiteLookup\"}}\n", "data['AzureSentinel']['Workspaces'] = workspace\n", "data['TIProviders'] = ti\n", "data['OtherProviders'] = other_prov\n", "\n", "with open(\"msticpyconfig.yaml\", 'w') as config:\n", " yaml.dump(data, config)\n", " \n", "print(\"msticpyconfig.yaml updated\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now validate our configuration is correct." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from msticpy.common.pkg_config import refresh_config, validate_config\n", "refresh_config()\n", "validate_config()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **Note** you may see warnings for missing providers when running this cell.\n", "> This is not an issue as we will not be using all providers in this notebook\n", "> so long as you get thie message \"No errors found.\" you are OK to proceed.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Getting Data\n", "Now that we have configured the details necessary to connect to Azure Sentinel we can go ahead and get some data. We will do this with `QueryProvider()` from MSTICpy. \n", "You can use the `QueryProvider` class to connect to different data sources such as MDATP, the Security Graph API, and the one we will use here, Azure Sentinel. \n", "\n", "### Learn more:\n", " - More details on configuring and using QueryProviders can be found in the [MSTICpy Documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#instantiating-a-query-provider).\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For now, we are going to set up a QueryProvider for Azure Sentinel, pass it the details for our workspace that we just stored in the msticpyconfig file, and connect. The connection process will ask us to authenticate to our Azure Sentinel workspace via [device authorization](https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-device-code) with our Azure credentials. You can do this by clicking the device login code button that appears as the output of the next cell, or by navigating to https://microsoft.com/devicelogin and manually entering the code. Note that this authentication persists with the kernel you are using with the notebook, so if you restart the kernel you will need to re-authenticate.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initalize a QueryProvider for Azure Sentinel\n", "qry_prov = QueryProvider(\"LogAnalytics\")\n", "\n", "# Get the Azure Sentinel workspace details from msticpyconfig\n", "try:\n", " ws_config = WorkspaceConfig()\n", " md(\"Workspace details collected from config file\")\n", "except:\n", " raise Exception(\"No workspace settings are configured, please run the cells above to configure these.\")\n", " \n", "# Connect to Azure Sentinel with our QueryProvider and config details\n", "# ws_config.code_connect_str is a feature of MSTICpy that creates the required connection string from details in our msticpyconfig\n", "qry_prov.connect(connection_str=ws_config.code_connect_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have connected we can query Azure Sentinel for data, but before we do that we need to understand what data is avalaible to query. The QueryProvider object provides a way to get a list of tables as well as tables and table columns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get list of tables in our Workspace\n", "display(qry_prov.schema_tables [:5]) # We are outputting only the first 5 tables for brevity\n", "# Get list of tables and thier columns\n", "qry_prov.schema['SigninLogs'] # We are only displaying the columns for SigninLogs for brevity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MSTICpy includes a number of built in queries that you can run.
\n", "You can list available queries with .list_queries() and get specific details about a query by calling it with \"?\" as a parameter" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get a list of avaliable queries\n", "qry_prov.list_queries()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get details about a query\n", "qry_prov.Azure.list_all_signins_geo(\"?\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can then run the query by calling it with the required parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime, timedelta\n", "# set our query end time as now\n", "end = datetime.now()\n", "# set our query start time as 1 hour ago\n", "start = end - timedelta(hours=1)\n", "# run query with specified start and end times\n", "logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", "# display first 5 rows of any results\n", "logons_df.head() # If you have no data you will just see the column headings displayed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to run queries is to pass a string format of a KQL query to the query provider, this will run the query against the workspace connected to above, and will return the data in a [Pandas DataFrame](https://pandas.pydata.org/). We will look at working with Pandas in a bit more detail later." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-26T19:27:44.779558Z", "start_time": "2020-06-26T19:27:44.569079Z" } }, "outputs": [], "source": [ "# Define our query\n", "test_query = \"\"\"\n", "SigninLogs\n", "| where TimeGenerated > ago(7d)\n", "| take 10\n", "\"\"\"\n", "\n", "# Pass that query to our QueryProvider\n", "test_df = qry_prov.exec_query(test_query)\n", "\n", "# Check that we have some data\n", "if isinstance(test_df, pd.DataFrame) and not test_df.empty:\n", " # .head() returns the first 5 rows of our results DataFrame\n", " display(test_df.head())\n", "# If where is no data load some sample data to use instead\n", "else:\n", " md(\"You don't appear to have any SigninLogs - we will load sample data for you to use.\")\n", " if not Path(\"nbdemo/data/aad_logons.pkl\").exists():\n", " Path(\"nbdemo/data/\").mkdir(parents=True, exist_ok=True)\n", " urlretrieve('https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/nbdemo/data/aad_logons.pkl?raw=true', 'nbdemo/data/aad_logons.pkl')\n", " urlretrieve('https://raw.githubusercontent.com/Azure/Azure-Sentinel-Notebooks/master/nbdemo/data/queries.yaml', 'nbdemo/data/queries.yaml')\n", " qry_prov = QueryProvider(\"LocalData\", data_paths=[\"nbdemo/data/\"], query_paths=[\"nbdemo/data/\"])\n", " logons_df = qry_prov.Azure.list_all_signins_geo()\n", " display(logons_df.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learn more:\n", " - You can learn more about the MSTICpy pre-defined queries in the [MSTICpy Documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#running-an-pre-defined-query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Pandas\n", "Our query results are returned in the form of a Pandas DataFrame. 
DataFrames are a core component of the Azure Sentinel notebooks and of MSTICpy, and are used as both input and output formats.\n", "Pandas DataFrames are incredibly versatile data structures with a lot of useful features. We will cover a small number of them here, and we recommend that you check out the Learn more section below to explore more of Pandas' features.\n", "
\n", "
\n", "### Displaying a DataFrame:\n", "The first thing we want to do is display our DataFrame. You can either just run it or explicity display it by calling `display(df)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# For this section we are going to create a DataFrame from data we have saved in a csv file\n", "df = pd.read_csv(\"https://raw.githubusercontent.com/microsoft/msticpy/master/tests/testdata/host_logons.csv\", index_col=[0] )\n", "# Display our DataFrame\n", "df # or display(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **Note** if the dataframe variable (`df` in the example above) is the last statement in a \n", "> code cell, Jupyter will automatically display it without using the `display()` function. \n", "> However, if you want to display a DataFrame in the middle of \n", "> other code in a cell you must use the `display()` function.\n", "\n", "You may not want to display the whole DataFrame and instead display only a selection of items. There are numerous ways to do this and the cell below shows some of the most widely used functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "md(\"Display the first 2 rows using head(): \", \"bold\")\n", "display(df.head(2)) # we don't need to call display here but just for illustration" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "md(\"Display the 3rd row using iloc[]: \", \"bold\")\n", "df.iloc[3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "md(\"Show the column names in the DataFrame \", \"bold\")\n", "df.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "md(\"Display just the TimeGenerated and TenantId columnns: \", \"bold\")\n", "df[[\"TimeGenerated\", \"TenantId\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also choose to select a subsection of our DataFrame based on the contents of the DataFrame:\n", "\n", "> **Tip**: the syntax in these examples is using a technique called *boolean indexing*. \n", ">
`df[<boolean expression>]`\n", "> returns all rows in the dataframe where the boolean expression is True\n", ">
In the first example we telling pandas to return all rows where the column value of\n", "> 'TargetUserName' matches 'MSTICAdmin'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "md(\"Display only rows where TargetUserName value is 'MSTICAdmin': \", \"bold\")\n", "df[df['TargetUserName']==\"MSTICAdmin\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "md(\"Display rows where TargetUserName is either MSTICAdmin or adm1nistratror:\", \"bold\")\n", "display(df[df['TargetUserName'].isin(['adm1nistrator', 'MSTICAdmin'])])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our DataFrame call also be extended to add new columns with additional data if reqired:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"NewCol\"] = \"Look at my new data!\"\n", "display(df[[\"TenantId\",\"Account\", \"TimeGenerated\", \"NewCol\"]].head(2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learn more:\n", "There is a lot more you can do with Pandas, the links below provide some useful resources:\n", " - [Getting starting with Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html)\n", " - [Infosec Jupyerbook intro to Pandas](https://infosecjupyterbook.com/notebooks/tutorials/03_intro_to_pandas.html)\n", " - [A great list of Pandas hints and tricks](https://www.dataschool.io/python-pandas-tips-and-tricks/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Enriching data\n", "\n", "Now that we have seen how to query for data, and do some basic manipulation we can look at enriching this data with additional data sources. For this we are going to use an external threat intelligence provider to give us some more details about an IP address we have in our dataset using the [MSTICpy TIProvider](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html) feature." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime, timedelta\n", "# Check if we have logon data already and if not get some\n", "if not isinstance(logons_df, pd.DataFrame) or logons_df.empty:\n", " # set our query end time as now\n", " end = datetime.now()\n", " # set our query start time as 1 hour ago\n", " start = end - timedelta(days=1)\n", " # run query with specified start and end times\n", " logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", " \n", "# Create our TI provider\n", "ti = TILookup()\n", "# Get the first logon IP address from our dataset\n", "ip = logons_df.iloc[1]['IPAddress']\n", "# Look up the IP in VirusTotal\n", "ti_resp = ti.lookup_ioc(ip, providers=[\"VirusTotal\"])\n", "\n", "# Format our results as a DataFrame\n", "ti_resp = ti.result_to_df(ti_resp)\n", "display(ti_resp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the [Pandas apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) feature we can get results for all the IP addresses in our data set and add the lookup severity score as a new column in our DataFrame for easier reference." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Take the IP address in each row, look it up against TI and return the seveirty score\n", "def lookup_res(row):\n", " ip = row['IPAddress']\n", " resp = ti.lookup_ioc(ip, providers=[\"VirusTotal\"])\n", " resp = ti.result_to_df(resp)\n", " return resp[\"Severity\"].iloc[0]\n", "\n", "# Take the first 3 rows of data and copy they into a new DataFrame\n", "enrich_logons_df = logons_df.iloc[:3].copy()\n", "# Create a new column called TIRisk and populate that with the TI severity score of the IP Address in that row\n", "enrich_logons_df['TIRisk'] = enrich_logons_df.apply(lookup_res, axis=1)\n", "# Display a subset of columns from our DataFrame\n", "enrich_logons_df[[\"TimeGenerated\", \"ResultType\", \"UserPrincipalName\", \"IPAddress\", \"TIRisk\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learn more:\n", "MSTICpy includes further threat intelligence capabilities as well as other data enrichment options. More details on these can be found in the [documentation](https://msticpy.readthedocs.io/en/latest/DataEnrichment.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Analyzing data\n", "With the data we have collected we may wish to perform some analysis on it in order to better understand it. MSTICpy includes a number of features to help with this, and there are a vast array of other data analysis capabilities available via Python ranging from simple processes to complex ML models. We will start here by keeping it simple and look at how we can decode some Base64 encoded command line strings we have in order to allow us to understand their content." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from msticpy.sectools import base64unpack as b64\n", "# Take our encoded Powershell Command\n", "b64_cmd = \"powershell.exe -encodedCommand SW52b2tlLVdlYlJlcXVlc3QgaHR0cHM6Ly9jb250b3NvLmNvbS9tYWx3YXJlIC1PdXRGaWxlIEM6XG1hbHdhcmUuZXhl\"\n", "# Unpack the Base64 encoded elements\n", "unpack_txt = b64.unpack(input_string=b64_cmd)\n", "# Display our results and transform for easier reading\n", "unpack_txt[1].T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use MSTICpy to extract Indicators of Compromise (IoCs) from a dataset, this makes it easy to extract and match on a set of IoCs within our data. In the example below we take a US Cybersecurity & Infrastructure Security Agency (CISA) report and extract all domains listed in the report:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "# Set up our IoCExtract oject\n", "ioc_extractor = iocextract.IoCExtract()\n", "# Download our threat report\n", "data = requests.get(\"https://www.us-cert.gov/sites/default/files/publications/AA20-099A_WHITE.stix.xml\")\n", "# Extract domains listed in our report\n", "iocs = ioc_extractor.extract(data.text, ioc_types=\"dns\")['dns']\n", "# Display the first 5 iocs found in our report\n", "list(iocs)[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learn more:\n", "There are a wide range of options when it comes to data analysis in notebooks using Python. 
Here are some useful resources to get you started:\n", " - [MSTICpy DataAnalysis documentation](https://msticpy.readthedocs.io/en/latest/DataAnalysis.html)\n", " - Scikit-Learn is a popular Python ML data analysis library, which has a useful [tutorial](https://scikit-learn.org/stable/tutorial/basic/tutorial.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Visualizing data\n", "Visualizing data can provide an excellent way to analyse data and identify patterns and anomalies. Python has a wide range of data visualization capabilities, each of which has its own benefits and drawbacks. We will look at some basic capabilities as well as the built-in visualizations in MSTICpy.\n", "


\n", "**Basic Graphs**
\n", "Pandas and Matplotlib provide the easiest and simplest way to produce simple plots of data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vis_q = \"\"\"\n", "SigninLogs\n", "| where TimeGenerated > ago(7d)\n", "| sample 5\"\"\"\n", "\n", "# Try and query for data but if using sample data load that instead\n", "try:\n", " vis_data = qry_prov.exec_query(vis_q)\n", "except FileNotFoundError:\n", " vis_data = logons_df\n", "\n", "# Check we have some data in our results and if not use previously used dataset\n", "if not isinstance(vis_data, pd.DataFrame) or vis_data.empty:\n", " vis_data = logons_df\n", "\n", "# Plot up to the first 5 IP addresses\n", "vis_data.head()['IPAddress'].value_counts().plot.bar(title=\"IP prevelence\", legend=False)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pie_df = vis_data.copy()\n", " # If we have lots of data just plot the first 5 rows\n", "pie_df.head()['IPAddress'].value_counts().plot.pie(legend=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Bokeh](https://bokeh.org/) is a powerful visualization library that allows you to create complex, interactive visualizations. MSTICpy includes a number of pre-built visualizations using Bokeh including a timeline feature that can be used to represent events over time. You can interact with the timeline by zooming and panning, using the range selector, as well as hovering over data points to see more details." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime, timedelta\n", "# Check if we have logon data already and if not get some\n", "if not isinstance(logons_df, pd.DataFrame) or logons_df.empty:\n", " # set our query end time as now\n", " end = datetime.now()\n", " # set our query start time as 1 hour ago\n", " start = end - timedelta(days=1)\n", " # run query with specified start and end times\n", " logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", " \n", "display(timeline.display_timeline(logons_df.head(10), source_columns=[\"TimeGenerated\", \"ResultType\", \"UserPrincipalName\", \"IPAddress\"], group_by=\"AppDisplayName\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MSTICpy also includes a feature to allow you to map locations, this can be particularily useful when looking at the distribution of remote network connections or other events. Below we plot the locations of remote logons observed in our Azure AD data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from msticpy.sectools.ip_utils import convert_to_ip_entities\n", "from msticpy.nbtools.foliummap import FoliumMap, get_map_center\n", "\n", "# Convert our IP addresses in string format into an ip address entity\n", "ip_entity = entityschema.IpAddress()\n", "ip_list = [convert_to_ip_entities(i)[0] for i in logons_df['IPAddress'].head(10)]\n", " \n", "# Get center location of all IP locaitons to center the map on\n", "location = get_map_center(ip_list)\n", "logon_map = FoliumMap(location=location, zoom_start=4)\n", "\n", "# Add location markers to our map and dsiplay it\n", "if len(ip_list) > 0:\n", " logon_map.add_ip_cluster(ip_entities=ip_list)\n", "display(logon_map.folium_map)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learn more:\n", " - The [Infosec Jupyterbook](https://infosecjupyterbook.com/) includes a section on data visualization.\n", " - [Bokeh Library Documentation](https://bokeh.org/)\n", " - [Matplotlib tutorial](https://matplotlib.org/3.2.0/tutorials/index.html)\n", " - [Seaborn visualization library tutorial](https://seaborn.pydata.org/tutorial.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Conclusion\n", "This notebook has showed you the basics of using notebooks and Azure Sentinel for security investigaitons. There are many more things possible using notebooks and it is stronly encouraged to read the material we have referenced in the learn more sections in this notebook. You can also explore the other Azure Sentinel notebooks in order to take advantage of the pre-built hunting logic, and understand other analysis techniques that are possible.
\n", "### Appendix:\n", " - [Jupyter Notebooks: An Introduction](https://realpython.com/jupyter-notebook-introduction/)\n", " - [Threat Hunting in the cloud with Azure Notebooks](https://medium.com/@maarten.goet/threat-hunting-in-the-cloud-with-azure-notebooks-supercharge-your-hunting-skills-using-jupyter-8d69218e7ca0)\n", " - [MSTICpy documentation](https://msticpy.readthedocs.io/)\n", " - [Azure Sentinel Notebooks documentation](https://docs.microsoft.com/en-us/azure/sentinel/notebooks)\n", " - [The Infosec Jupyterbook](https://infosecjupyterbook.com/introduction.html)\n", " - [Linux Host Explorer Notebook walkthrough](https://techcommunity.microsoft.com/t5/azure-sentinel/explorer-notebook-series-the-linux-host-explorer/ba-p/1138273)\n", " - [Why use Jupyter for Security Investigations](https://techcommunity.microsoft.com/t5/azure-sentinel/why-use-jupyter-for-security-investigations/ba-p/475729)\n", " - [Security Investigtions with Azure Sentinel & Notebooks](https://techcommunity.microsoft.com/t5/azure-sentinel/security-investigation-with-azure-sentinel-and-jupyter-notebooks/ba-p/432921)\n", " - [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html)\n", " - [Bokeh Documentation](https://docs.bokeh.org/en/latest/)" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3.8 - AzureML", "language": "python", "name": "python38-azureml" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }