{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MitreMap - Infer MITRE technique from Threat Intel Data\n",
    "\n",
    "__Notebook Version:__ 1.0 <br>\n",
    "__Notebook Author:__ Vani Asawa<br>\n",
    "\n",
    "\n",
    "__Python Version:__ Python >= 3.8<br>\n",
    "__Platforms Supported:__  Azure Machine Learning Notebooks<br>\n",
    "\n",
    "__Data Source Required:__ None<br>\n",
    "\n",
    "__GPU Compute Required:__ No<br>\n",
    "__GPU Compute Recommended:__ Yes<br>\n",
    "\n",
    "__Requirements Path:__ ```../mitremap-notebook/requirements.txt```<br>\n",
    "__Packages Downloaded:__\n",
    "- ipywidgets==7.5.1\n",
    "- transformers==4.5.1\n",
    "- torch==1.10.2\n",
    "- msticpy==2.1.2\n",
    "- nltk==3.6.2\n",
    "- iocextract==1.13.1\n",
    "- shap==0.41.0\n",
    "\n",
    "## Overview\n",
    "\n",
    "This notebook maps a free-text description of an incident onto relevant MITRE ATT&CK Enterprise techniques. It uses a [GPT-2](https://huggingface.co/gpt2) language model to associate terms in the description with similar descriptions from past incidents. It also extracts relevant Indicators of Compromise (IOCs) from the text.\n",
    "\n",
    "You can use the notebook with one of several pre-trained models or train your own model using your own threat reports or public sources.\n",
    "\n",
    "## Motivation\n",
    "\n",
    "Please refer to [Motivation](./README.md#motivation) and [Goals](./README.md#goals-of-the-mitremap-notebook) to learn more.\n",
    "\n",
    "## Prerequisites\n",
    "**Do not run all the notebook cells at once.** Run the cells in order, and confirm that each one completes successfully before moving on to the next.\n",
    "\n",
    "## Table of Contents\n",
    "\n",
    "0. Installations [One-Time Setup]\n",
    "1. Imports\n",
    "2. Configure Input Data and Model Parameters\n",
    "3. Run\n",
    "4. Results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. Installations [One-Time Setup]\n",
    "\n",
    "Please refer to [One-Time Setup](./README.md#one-time-setup) to configure the virtual environment, install the required packages, and download the model artifacts.\n",
    "\n",
    "Use the PowerShell or Bash script below to download the model artifacts.\n",
    "\n",
    "**Estimated time to download the model artifacts:** 5-10 minutes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Option 1: PowerShell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!PowerShell ./model.ps1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Option 2: Bash"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "./model.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Install the utils wheel to use the inference packages**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install utils-1.0-py3-none-any.whl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Restart the kernel and run the notebook from **1. Imports**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Imports\n",
    "\n",
    "The modules used to run this notebook can be found under ```mitremap-notebook/utils/*```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "\n",
    "# Add the current working directory to the import path so a local utils package can be found\n",
    "sys.path.append(os.getcwd())\n",
    "\n",
    "import utils\n",
    "from utils import main, inference, configs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Configure Input Data and Model Parameters\n",
    "\n",
    "Please refer to [Input Parameters](./README.md#input-parameters) to learn more about setting the input parameter configurations.\n",
    "\n",
    "Start using the notebook with one of the threat intel examples in the markdown script 😊"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "config_widgets = configs.configure_model_parameters()\n",
    "for k in config_widgets.keys():\n",
    "    display(config_widgets[k])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Run\n",
    "\n",
    "The run time of the ```main.go``` function depends on:\n",
    "\n",
    "1. The length of the threat intel report, and\n",
    "2. Whether **Model Explainability** is set to True.\n",
    "\n",
    "For the sample threat reports in the markdown script, you can expect:\n",
    "\n",
    "- Less than 1 minute without model explainability, and\n",
    "- 1-2 minutes with model explainability."
   ]
  },
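  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: this assignment rebinds `configs`, shadowing the imported configs module.\n",
    "# Re-run the imports cell before re-running the configuration cell in section 2.\n",
    "configs, inference_df, iocs_df = main.go(\n",
    "    config_widgets\n",
    ")"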
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Results\n",
    "\n",
    "- ```configs```: Stores the input configurations set by the user.\n",
    "\n",
    "- ```inference_df```: Stores the inference results for the threat intel data.\n",
    "\n",
    "- ```iocs_df```: Stores the IOCs extracted from the threat intel data.\n",
    "\n",
    "Use ```inference.print_detailed_report(inference_df, configs)``` to print a summary of the MITRE technique predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inference.print_detailed_report(\n",
    "    inference_df,\n",
    "    configs\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Summary Statistics for Inference Dataframe: ')\n",
    "print('Shape of Inference Dataframe: ', inference_df.shape)\n",
    "if not inference_df.empty:\n",
    "    print('Sample rows: ')\n",
    "    display(inference_df.head(5))\n",
    "else:\n",
    "    print('No results obtained.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Summary Statistics for IOCs Dataframe: ')\n",
    "print('Shape of IOCs Dataframe: ', iocs_df.shape)\n",
    "if not iocs_df.empty:\n",
    "    print('Counts for each category of IOCs: ')\n",
    "    display(iocs_df.groupby('IOC_Type').count().rename(columns={'IOC_Value': 'Count'}))\n",
    "    print('Sample rows: ')\n",
    "    display(iocs_df.head(5))\n",
    "else:\n",
    "    print('No IOCs obtained.')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.0 ('venv_conda': venv)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "aab2fdc36ff945907dc969e1819080c72b9a36ee5f36476bc76d3644e3267d5e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}