{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **MITRE ATT&CK PYTHON CLIENT**: Data Sources\n",
    "------------------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goals:\n",
    "* Access ATT&CK data sources in STIX format via a public TAXII server\n",
    "* Learn to interact with ATT&CK data all at once\n",
    "* Explore and idenfity patterns in the data retrieved\n",
    "* Learn more about ATT&CK data sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. ATT&CK Python Client Installation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can install it via PIP: **pip install attackcti**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Import ATT&CK API Client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from attackcti import attack_client"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Import Extra Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pandas import *\n",
    "import numpy as np\n",
    "\n",
    "import altair as alt\n",
    "alt.renderers.enable('notebook')\n",
    "\n",
    "import itertools"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Initialize ATT&CK Client Class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "lift = attack_client()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Getting Information About Techniques"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Getting ALL ATT&CK Techniques"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_techniques = lift.get_techniques(stix_format=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Showing the first technique in our list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'external_references': [{'source_name': 'mitre-attack',\n",
       "   'external_id': 'T1059.008',\n",
       "   'url': 'https://attack.mitre.org/techniques/T1059/008'},\n",
       "  {'source_name': 'Cisco Synful Knock Evolution',\n",
       "   'url': 'https://blogs.cisco.com/security/evolution-of-attacks-on-cisco-ios-devices',\n",
       "   'description': 'Graham Holmes. (2015, October 8). Evolution of attacks on Cisco IOS devices. Retrieved October 19, 2020.'},\n",
       "  {'source_name': 'Cisco IOS Software Integrity Assurance - Command History',\n",
       "   'url': 'https://tools.cisco.com/security/center/resources/integrity_assurance.html#23',\n",
       "   'description': 'Cisco. (n.d.). Cisco IOS Software Integrity Assurance - Command History. Retrieved October 21, 2020.'}],\n",
       " 'kill_chain_phases': [{'kill_chain_name': 'mitre-attack',\n",
       "   'phase_name': 'execution'}],\n",
       " 'x_mitre_is_subtechnique': True,\n",
       " 'x_mitre_version': '1.0',\n",
       " 'id': 'attack-pattern--818302b2-d640-477b-bf88-873120ce85c4',\n",
       " 'technique_description': 'Adversaries may abuse scripting or built-in command line interpreters (CLI) on network devices to execute malicious command and payloads. The CLI is the primary means through which users and administrators interact with the device in order to view system information, modify device operations, or perform diagnostic and administrative functions. CLIs typically contain various permission levels required for different commands. \\n\\nScripting interpreters automate tasks and extend functionality beyond the command set included in the network OS. The CLI and scripting interpreter are accessible through a direct console connection, or through remote means, such as telnet or secure shell (SSH).\\n\\nAdversaries can use the network CLI to change how network devices behave and operate. The CLI may be used to manipulate traffic flows to intercept or manipulate data, modify startup configuration parameters to load malicious system software, or to disable security features or logging to avoid detection. (Citation: Cisco Synful Knock Evolution)',\n",
       " 'technique': 'Network Device CLI',\n",
       " 'created_by_ref': 'identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5',\n",
       " 'object_marking_refs': ['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'],\n",
       " 'url': 'https://attack.mitre.org/techniques/T1059/008',\n",
       " 'matrix': 'mitre-attack',\n",
       " 'technique_id': 'T1059.008',\n",
       " 'type': 'attack-pattern',\n",
       " 'tactic': ['execution'],\n",
       " 'modified': '2020-10-22T16:43:38.388Z',\n",
       " 'created': '2020-10-20T00:09:33.072Z',\n",
       " 'data_sources': ['Network device logs',\n",
       "  'Network device run-time memory',\n",
       "  'Network device command history',\n",
       "  'Network device configuration'],\n",
       " 'platform': ['Network'],\n",
       " 'technique_detection': 'Consider reviewing command history in either the console or as part of the running memory to determine if unauthorized or suspicious commands were used to modify device configuration.(Citation: Cisco IOS Software Integrity Assurance - Command History)\\n\\nConsider comparing a copy of the network device configuration against a known-good version to discover unauthorized changes to the command interpreter. The same process can be accomplished through a comparison of the run-time memory, though this is non-trivial and may require assistance from the vendor.',\n",
       " 'permissions_required': ['Administrator', 'User']}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_techniques[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Normalizing semi-structured JSON data into a flat table via **pandas.io.json.json_normalize**\n",
    "* Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html"
   ]
  },
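  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the normalization does on a small scale, here is a minimal sketch on two hypothetical records. The `example_*` names and values are made up for illustration and are not ATT&CK data. Nested dictionaries should become dotted column names, lists stay in a single cell, and keys missing from a record are filled with `NaN`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas\n",
    "\n",
    "# Hypothetical records, loosely shaped like the technique dictionaries above\n",
    "example_records = [\n",
    "    {'technique': 'Example A', 'platform': ['Windows'], 'meta': {'version': '1.0'}},\n",
    "    {'technique': 'Example B', 'platform': ['Linux', 'macOS']},\n",
    "]\n",
    "\n",
    "# The nested 'meta' dict is flattened into a 'meta.version' column;\n",
    "# the 'platform' lists are kept as-is, one list per cell\n",
    "example_flat = pandas.json_normalize(example_records)\n",
    "example_flat"
   ]
  },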
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_normalized = pandas.json_normalize(all_techniques)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>external_references</th>\n",
       "      <th>kill_chain_phases</th>\n",
       "      <th>x_mitre_is_subtechnique</th>\n",
       "      <th>x_mitre_version</th>\n",
       "      <th>id</th>\n",
       "      <th>technique_description</th>\n",
       "      <th>technique</th>\n",
       "      <th>created_by_ref</th>\n",
       "      <th>object_marking_refs</th>\n",
       "      <th>url</th>\n",
       "      <th>...</th>\n",
       "      <th>remote_support</th>\n",
       "      <th>impact_type</th>\n",
       "      <th>revoked</th>\n",
       "      <th>x_mitre_deprecated</th>\n",
       "      <th>x_mitre_old_attack_id</th>\n",
       "      <th>difficulty_explanation</th>\n",
       "      <th>difficulty_for_adversary</th>\n",
       "      <th>detectable_explanation</th>\n",
       "      <th>detectable_by_common_defenses</th>\n",
       "      <th>tactic_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[{'source_name': 'mitre-attack', 'external_id'...</td>\n",
       "      <td>[{'kill_chain_name': 'mitre-attack', 'phase_na...</td>\n",
       "      <td>True</td>\n",
       "      <td>1.0</td>\n",
       "      <td>attack-pattern--818302b2-d640-477b-bf88-873120...</td>\n",
       "      <td>Adversaries may abuse scripting or built-in co...</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5</td>\n",
       "      <td>[marking-definition--fa42a846-8d90-4e51-bc29-7...</td>\n",
       "      <td>https://attack.mitre.org/techniques/T1059/008</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 37 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 external_references  \\\n",
       "0  [{'source_name': 'mitre-attack', 'external_id'...   \n",
       "\n",
       "                                   kill_chain_phases x_mitre_is_subtechnique  \\\n",
       "0  [{'kill_chain_name': 'mitre-attack', 'phase_na...                    True   \n",
       "\n",
       "  x_mitre_version                                                 id  \\\n",
       "0             1.0  attack-pattern--818302b2-d640-477b-bf88-873120...   \n",
       "\n",
       "                               technique_description           technique  \\\n",
       "0  Adversaries may abuse scripting or built-in co...  Network Device CLI   \n",
       "\n",
       "                                   created_by_ref  \\\n",
       "0  identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5   \n",
       "\n",
       "                                 object_marking_refs  \\\n",
       "0  [marking-definition--fa42a846-8d90-4e51-bc29-7...   \n",
       "\n",
       "                                             url  ... remote_support  \\\n",
       "0  https://attack.mitre.org/techniques/T1059/008  ...            NaN   \n",
       "\n",
       "  impact_type revoked x_mitre_deprecated x_mitre_old_attack_id  \\\n",
       "0         NaN     NaN                NaN                   NaN   \n",
       "\n",
       "  difficulty_explanation difficulty_for_adversary detectable_explanation  \\\n",
       "0                    NaN                      NaN                    NaN   \n",
       "\n",
       "  detectable_by_common_defenses tactic_type  \n",
       "0                           NaN         NaN  \n",
       "\n",
       "[1 rows x 37 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_normalized[0:1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Re-indexing Dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques = techniques_normalized.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)"
   ]
  },
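  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`reindex` with `axis=1` keeps only the listed columns, in the listed order. A listed column that does not exist in the frame is created and filled with `NaN` instead of raising an error. A minimal sketch on a hypothetical one-row frame (the `example_*` names and values are illustrative assumptions):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas\n",
    "\n",
    "# Hypothetical frame with no 'data_sources' column\n",
    "example_df = pandas.DataFrame({\n",
    "    'technique': ['Example A'],\n",
    "    'tactic': [['execution']],\n",
    "    'url': ['https://example.org'],\n",
    "})\n",
    "\n",
    "# 'tactic' and 'url' are dropped; 'data_sources' is added as an all-NaN column\n",
    "example_view = example_df.reindex(['technique', 'data_sources'], axis=1)\n",
    "example_view"
   ]
  },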
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[execution]</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>[Network device logs, Network device run-time ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[collection]</td>\n",
       "      <td>Network Device Configuration Dump</td>\n",
       "      <td>T1602.002</td>\n",
       "      <td>[Netflow/Enclave netflow, Network protocol ana...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[defense-evasion, persistence]</td>\n",
       "      <td>TFTP Boot</td>\n",
       "      <td>T1542.005</td>\n",
       "      <td>[Network device run-time memory, Network devic...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[defense-evasion, persistence]</td>\n",
       "      <td>ROMMONkit</td>\n",
       "      <td>T1542.004</td>\n",
       "      <td>[File monitoring, Netflow/Enclave netflow, Net...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[collection]</td>\n",
       "      <td>SNMP (MIB Dump)</td>\n",
       "      <td>T1602.001</td>\n",
       "      <td>[Netflow/Enclave netflow, Network protocol ana...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         matrix   platform                          tactic  \\\n",
       "0  mitre-attack  [Network]                     [execution]   \n",
       "1  mitre-attack  [Network]                    [collection]   \n",
       "2  mitre-attack  [Network]  [defense-evasion, persistence]   \n",
       "3  mitre-attack  [Network]  [defense-evasion, persistence]   \n",
       "4  mitre-attack  [Network]                    [collection]   \n",
       "\n",
       "                           technique technique_id  \\\n",
       "0                 Network Device CLI    T1059.008   \n",
       "1  Network Device Configuration Dump    T1602.002   \n",
       "2                          TFTP Boot    T1542.005   \n",
       "3                          ROMMONkit    T1542.004   \n",
       "4                    SNMP (MIB Dump)    T1602.001   \n",
       "\n",
       "                                        data_sources  \n",
       "0  [Network device logs, Network device run-time ...  \n",
       "1  [Netflow/Enclave netflow, Network protocol ana...  \n",
       "2  [Network device run-time memory, Network devic...  \n",
       "3  [File monitoring, Netflow/Enclave netflow, Net...  \n",
       "4  [Netflow/Enclave netflow, Network protocol ana...  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A total of  1024  techniques\n"
     ]
    }
   ],
   "source": [
    "print('A total of ',len(techniques),' techniques')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Removing Revoked Techniques"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_techniques_no_revoked = lift.remove_revoked(all_techniques)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A total of  878  techniques\n"
     ]
    }
   ],
   "source": [
    "print('A total of ',len(all_techniques_no_revoked),' techniques')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Extractinng Revoked Techniques"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_techniques_revoked = lift.extract_revoked(all_techniques)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A total of  146  techniques that have been revoked\n"
     ]
    }
   ],
   "source": [
    "print('A total of ',len(all_techniques_revoked),' techniques that have been revoked')"
   ]
  },
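  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two counts add up to the original total (878 + 146 = 1024). The same kind of partition can be sketched on hypothetical records, assuming a technique counts as revoked when its `revoked` field is `True`. This is an illustration with made-up `example_*` data, not the attackcti implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical technique records; 'revoked' may be missing, True or False\n",
    "example_techniques = [\n",
    "    {'technique': 'Example A'},\n",
    "    {'technique': 'Example B', 'revoked': True},\n",
    "    {'technique': 'Example C', 'revoked': False},\n",
    "]\n",
    "\n",
    "# Split the list into techniques to keep and techniques marked as revoked\n",
    "example_kept = [t for t in example_techniques if t.get('revoked') is not True]\n",
    "example_revoked = [t for t in example_techniques if t.get('revoked') is True]\n",
    "\n",
    "# Every record lands in exactly one of the two lists\n",
    "assert len(example_kept) + len(example_revoked) == len(example_techniques)\n",
    "[t['technique'] for t in example_revoked]"
   ]
  },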
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The revoked techniques are the following ones:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Web Session Cookie\n",
      "Emond\n",
      "Cloud Instance Metadata API\n",
      "Revert Cloud Instance\n",
      "Application Access Token\n",
      "Elevated Execution with Prompt\n",
      "Credentials from Web Browsers\n",
      "PowerShell Profile\n",
      "Parent PID Spoofing\n",
      "Compile After Delivery\n",
      "Systemd Service\n",
      "Runtime Data Manipulation\n",
      "Transmitted Data Manipulation\n",
      "Stored Data Manipulation\n",
      "Disk Content Wipe\n",
      "Disk Structure Wipe\n",
      "Domain Generation Algorithms\n",
      "Compiled HTML File\n",
      "Kernel Modules and Extensions\n",
      "Spearphishing Link\n",
      "CMSTP\n",
      "Credentials in Registry\n",
      "Control Panel Items\n",
      "Kerberoasting\n",
      "Spearphishing Attachment\n",
      "SIP and Trust Provider Hijacking\n",
      "Spearphishing via Service\n",
      "Sudo Caching\n",
      "Time Providers\n",
      "AppCert DLLs\n",
      "Dynamic Data Exchange\n",
      "Multi-hop Proxy\n",
      "Process Doppelgänging\n",
      "Extra Window Memory Injection\n",
      "Domain Fronting\n",
      "Mshta\n",
      "Hooking\n",
      "Image File Execution Options Injection\n",
      "LSASS Driver\n",
      "Screensaver\n",
      "LLMNR/NBT-NS Poisoning and Relay\n",
      "Password Filter DLL\n",
      "SSH Hijacking\n",
      "SID-History Injection\n",
      "Gatekeeper Bypass\n",
      "HISTCONTROL\n",
      "LC_LOAD_DYLIB Addition\n",
      "Launchctl\n",
      "Local Job Scheduling\n",
      "Private Keys\n",
      "Rc.common\n",
      "Space after Filename\n",
      "Application Shimming\n",
      "AppleScript\n",
      "Bash History\n",
      ".bash_profile and .bashrc\n",
      "Clear Command History\n",
      "Dylib Hijacking\n",
      "Hidden Window\n",
      "Launch Daemon\n",
      "Hidden Users\n",
      "Input Prompt\n",
      "Launch Agent\n",
      "Login Item\n",
      "Keychain\n",
      "Plist Modification\n",
      "Re-opened Applications\n",
      "Setuid and Setgid\n",
      "Hidden Files and Directories\n",
      "Startup Items\n",
      "Sudo\n",
      "Securityd Memory\n",
      "Trap\n",
      "Authentication Package\n",
      "Install Root Certificate\n",
      "Netsh Helper DLL\n",
      "Network Share Connection Removal\n",
      "Component Object Model Hijacking\n",
      "Regsvcs/Regasm\n",
      "InstallUtil\n",
      "Regsvr32\n",
      "Code Signing\n",
      "Component Firmware\n",
      "File Deletion\n",
      "AppInit DLLs\n",
      "Security Support Provider\n",
      "Web Shell\n",
      "Timestomp\n",
      "Pass the Ticket\n",
      "NTFS File Attributes\n",
      "Custom Command and Control Protocol\n",
      "Process Hollowing\n",
      "Disabling Security Tools\n",
      "Bypass User Account Control\n",
      "PowerShell\n",
      "Rundll32\n",
      "Windows Management Instrumentation Event Subscription\n",
      "Credentials in Files\n",
      "Multilayer Encryption\n",
      "Windows Admin Shares\n",
      "Remote Desktop Protocol\n",
      "Pass the Hash\n",
      "DLL Side-Loading\n",
      "Bootkit\n",
      "Indicator Removal from Tools\n",
      "Uncommonly Used Port\n",
      "Security Software Discovery\n",
      "Registry Run Keys / Startup Folder\n",
      "Service Registry Permissions Weakness\n",
      "Indicator Blocking\n",
      "New Service\n",
      "Software Packing\n",
      "File System Permissions Weakness\n",
      "Change Default File Association\n",
      "DLL Search Order Hijacking\n",
      "Service Execution\n",
      "Standard Cryptographic Protocol\n",
      "Modify Existing Service\n",
      "Windows Remote Management\n",
      "Custom Cryptographic Protocol\n",
      "Shortcut Modification\n",
      "Data Encrypted\n",
      "System Firmware\n",
      "Application Deployment Software\n",
      "Accessibility Features\n",
      "Port Monitors\n",
      "Binary Padding\n",
      "Winlogon Helper DLL\n",
      "Data Compressed\n",
      "Remotely Install Application\n",
      "Insecure Third-Party Libraries\n",
      "Fake Developer Accounts\n",
      "Device Type Discovery\n",
      "Detect App Analysis Environment\n",
      "Malicious Software Development Tools\n",
      "Biometric Spoofing\n",
      "Device Unlock Code Guessing or Brute Force\n",
      "Malicious Media Content\n",
      "URL Scheme Hijacking\n",
      "Abuse of iOS Enterprise App Signing Key\n",
      "App Delivered via Web Download\n",
      "App Delivered via Email Attachment\n",
      "Malicious or Vulnerable Built-in Device Functionality\n",
      "Malicious SMS Message\n",
      "Exploit Baseband Vulnerability\n",
      "Stolen Developer Credentials or Signing Keys\n"
     ]
    }
   ],
   "source": [
    "for t in all_techniques_revoked:\n",
    "    print(t['technique'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Updating our Dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_normalized = pandas.json_normalize(all_techniques_no_revoked)\n",
    "techniques = techniques_normalized.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. Techniques Per Matrix\n",
    "Using **altair** python library we can start showing a few charts stacking the number of techniques with or without data sources.\n",
    "Reference: https://altair-viz.github.io/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>technique</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>536</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-ics-attack</td>\n",
       "      <td>81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-mobile-attack</td>\n",
       "      <td>87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-pre-attack</td>\n",
       "      <td>174</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                matrix  technique\n",
       "0         mitre-attack        536\n",
       "1     mitre-ics-attack         81\n",
       "2  mitre-mobile-attack         87\n",
       "3     mitre-pre-attack        174"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = techniques\n",
    "data_2 = data.groupby(['matrix'])['technique'].count()\n",
    "data_3 = data_2.to_frame().reset_index()\n",
    "data_3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-184270221c81652fd0426411b5cc8a9f\"}, \"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"matrix\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"technique\"}, \"y\": {\"type\": \"nominal\", \"field\": \"matrix\"}}, \"height\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-184270221c81652fd0426411b5cc8a9f\": [{\"matrix\": \"mitre-attack\", \"technique\": 536}, {\"matrix\": \"mitre-ics-attack\", \"technique\": 81}, {\"matrix\": \"mitre-mobile-attack\", \"technique\": 87}, {\"matrix\": \"mitre-pre-attack\", \"technique\": 174}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"f4ebc20d-c16f-4ec6-90d8-ee18f2e853b3\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x12010beb0>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#f4ebc20d-c16f-4ec6-90d8-ee18f2e853b3"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "alt.Chart(data_3).mark_bar().encode(x='technique', y='matrix', color='matrix').properties(height = 200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 11. Techniques With and Without Data Sources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Techniques\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 200}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Techniques\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Techniques\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 200}], \"data\": {\"name\": \"data-bf80216faf3e46fa0916c0fe5230113d\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-bf80216faf3e46fa0916c0fe5230113d\": [{\"Techniques\": \"Without DS\", \"Count of Techniques\": 337}, {\"Techniques\": \"With DS\", \"Count of Techniques\": 541}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"82e36f62-3e49-41ca-a2a1-11888ec68245\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x11fda5550>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#82e36f62-3e49-41ca-a2a1-11888ec68245"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_source_distribution = pandas.DataFrame({\n",
    "    'Techniques': ['Without DS','With DS'],\n",
    "    'Count of Techniques': [techniques['data_sources'].isna().sum(),techniques['data_sources'].notna().sum()]})\n",
    "bars = alt.Chart(data_source_distribution).mark_bar().encode(x='Techniques',y='Count of Techniques',color='Techniques').properties(width=200,height=300)\n",
    "text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')\n",
    "bars + text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the distribution of techniques based on ATT&CK Matrix?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>Ind_DS</th>\n",
       "      <th>technique</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>With DS</td>\n",
       "      <td>474</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Without DS</td>\n",
       "      <td>62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-ics-attack</td>\n",
       "      <td>With DS</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-ics-attack</td>\n",
       "      <td>Without DS</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-mobile-attack</td>\n",
       "      <td>Without DS</td>\n",
       "      <td>87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>mitre-pre-attack</td>\n",
       "      <td>Without DS</td>\n",
       "      <td>174</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                matrix      Ind_DS  technique\n",
       "0         mitre-attack     With DS        474\n",
       "1         mitre-attack  Without DS         62\n",
       "2     mitre-ics-attack     With DS         67\n",
       "3     mitre-ics-attack  Without DS         14\n",
       "4  mitre-mobile-attack  Without DS         87\n",
       "5     mitre-pre-attack  Without DS        174"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
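   ],
   "source": [
    "# Note: `data` is a reference to `techniques` (not a copy), so the columns added below also appear in `techniques`\n",
    "data = techniques\n",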
    "data['Count_DS'] = data['data_sources'].str.len()\n",
    "data['Ind_DS'] = np.where(data['Count_DS']>0,'With DS','Without DS')\n",
    "data_2 = data.groupby(['matrix','Ind_DS'])['technique'].count()\n",
    "data_3 = data_2.to_frame().reset_index()\n",
    "data_3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-b034731fd80e42eb889ae43ae9d0d467\"}, \"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"matrix\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"technique\"}, \"y\": {\"type\": \"nominal\", \"field\": \"Ind_DS\"}}, \"height\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-b034731fd80e42eb889ae43ae9d0d467\": [{\"matrix\": \"mitre-attack\", \"Ind_DS\": \"With DS\", \"technique\": 474}, {\"matrix\": \"mitre-attack\", \"Ind_DS\": \"Without DS\", \"technique\": 62}, {\"matrix\": \"mitre-ics-attack\", \"Ind_DS\": \"With DS\", \"technique\": 67}, {\"matrix\": \"mitre-ics-attack\", \"Ind_DS\": \"Without DS\", \"technique\": 14}, {\"matrix\": \"mitre-mobile-attack\", \"Ind_DS\": \"Without DS\", \"technique\": 87}, {\"matrix\": \"mitre-pre-attack\", \"Ind_DS\": \"Without DS\", \"technique\": 174}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"c41580a3-6a3e-472f-86a6-5b5a975349cb\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x11fdb7700>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#c41580a3-6a3e-472f-86a6-5b5a975349cb"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "alt.Chart(data_3).mark_bar().encode(x='technique', y='Ind_DS', color='matrix').properties(height = 200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which mitre-attack techniques have no data sources?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "      <th>Count_DS</th>\n",
       "      <th>Ind_DS</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[resource-development]</td>\n",
       "      <td>Vulnerabilities</td>\n",
       "      <td>T1588.006</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[reconnaissance]</td>\n",
       "      <td>Spearphishing Service</td>\n",
       "      <td>T1598.001</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[reconnaissance]</td>\n",
       "      <td>Purchase Technical Data</td>\n",
       "      <td>T1597.002</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[reconnaissance]</td>\n",
       "      <td>Threat Intel Vendors</td>\n",
       "      <td>T1597.001</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[reconnaissance]</td>\n",
       "      <td>Search Closed Sources</td>\n",
       "      <td>T1597</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[resource-development]</td>\n",
       "      <td>Compromise Infrastructure</td>\n",
       "      <td>T1584</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>92</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[resource-development]</td>\n",
       "      <td>Acquire Infrastructure</td>\n",
       "      <td>T1583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>220</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Linux, macOS, Windows]</td>\n",
       "      <td>[collection]</td>\n",
       "      <td>Archive via Custom Method</td>\n",
       "      <td>T1560.003</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Linux]</td>\n",
       "      <td>[credential-access]</td>\n",
       "      <td>/etc/passwd and /etc/shadow</td>\n",
       "      <td>T1003.008</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>354</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Linux, macOS, Windows]</td>\n",
       "      <td>[persistence, privilege-escalation]</td>\n",
       "      <td>Boot or Logon Autostart Execution</td>\n",
       "      <td>T1547</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>62 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           matrix                 platform  \\\n",
       "17   mitre-attack                    [PRE]   \n",
       "23   mitre-attack                    [PRE]   \n",
       "25   mitre-attack                    [PRE]   \n",
       "26   mitre-attack                    [PRE]   \n",
       "27   mitre-attack                    [PRE]   \n",
       "..            ...                      ...   \n",
       "90   mitre-attack                    [PRE]   \n",
       "92   mitre-attack                    [PRE]   \n",
       "220  mitre-attack  [Linux, macOS, Windows]   \n",
       "260  mitre-attack                  [Linux]   \n",
       "354  mitre-attack  [Linux, macOS, Windows]   \n",
       "\n",
       "                                  tactic                          technique  \\\n",
       "17                [resource-development]                    Vulnerabilities   \n",
       "23                      [reconnaissance]              Spearphishing Service   \n",
       "25                      [reconnaissance]            Purchase Technical Data   \n",
       "26                      [reconnaissance]               Threat Intel Vendors   \n",
       "27                      [reconnaissance]              Search Closed Sources   \n",
       "..                                   ...                                ...   \n",
       "90                [resource-development]          Compromise Infrastructure   \n",
       "92                [resource-development]             Acquire Infrastructure   \n",
       "220                         [collection]          Archive via Custom Method   \n",
       "260                  [credential-access]        /etc/passwd and /etc/shadow   \n",
       "354  [persistence, privilege-escalation]  Boot or Logon Autostart Execution   \n",
       "\n",
       "    technique_id data_sources  Count_DS      Ind_DS  \n",
       "17     T1588.006          NaN       NaN  Without DS  \n",
       "23     T1598.001          NaN       NaN  Without DS  \n",
       "25     T1597.002          NaN       NaN  Without DS  \n",
       "26     T1597.001          NaN       NaN  Without DS  \n",
       "27         T1597          NaN       NaN  Without DS  \n",
       "..           ...          ...       ...         ...  \n",
       "90         T1584          NaN       NaN  Without DS  \n",
       "92         T1583          NaN       NaN  Without DS  \n",
       "220    T1560.003          NaN       NaN  Without DS  \n",
       "260    T1003.008          NaN       NaN  Without DS  \n",
       "354        T1547          NaN       NaN  Without DS  \n",
       "\n",
       "[62 rows x 8 columns]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[(data['matrix']=='mitre-attack') & (data['Ind_DS']=='Without DS')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Techniques Without Data Sources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_without_data_sources=techniques[techniques.data_sources.isnull()].reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "      <th>Count_DS</th>\n",
       "      <th>Ind_DS</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[resource-development]</td>\n",
       "      <td>Vulnerabilities</td>\n",
       "      <td>T1588.006</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[reconnaissance]</td>\n",
       "      <td>Spearphishing Service</td>\n",
       "      <td>T1598.001</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[reconnaissance]</td>\n",
       "      <td>Purchase Technical Data</td>\n",
       "      <td>T1597.002</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[reconnaissance]</td>\n",
       "      <td>Threat Intel Vendors</td>\n",
       "      <td>T1597.001</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[PRE]</td>\n",
       "      <td>[reconnaissance]</td>\n",
       "      <td>Search Closed Sources</td>\n",
       "      <td>T1597</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Without DS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         matrix platform                  tactic                technique  \\\n",
       "0  mitre-attack    [PRE]  [resource-development]          Vulnerabilities   \n",
       "1  mitre-attack    [PRE]        [reconnaissance]    Spearphishing Service   \n",
       "2  mitre-attack    [PRE]        [reconnaissance]  Purchase Technical Data   \n",
       "3  mitre-attack    [PRE]        [reconnaissance]     Threat Intel Vendors   \n",
       "4  mitre-attack    [PRE]        [reconnaissance]    Search Closed Sources   \n",
       "\n",
       "  technique_id data_sources  Count_DS      Ind_DS  \n",
       "0    T1588.006          NaN       NaN  Without DS  \n",
       "1    T1598.001          NaN       NaN  Without DS  \n",
       "2    T1597.002          NaN       NaN  Without DS  \n",
       "3    T1597.001          NaN       NaN  Without DS  \n",
       "4        T1597          NaN       NaN  Without DS  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_without_data_sources.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 337 techniques without data sources (38% of 878 techniques)\n"
     ]
    }
   ],
   "source": [
    "n_without = techniques['data_sources'].isna().sum()\n",
    "print(f'There are {n_without} techniques without data sources ({n_without / len(techniques):.0%} of {len(techniques)} techniques)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Techniques With Data Sources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_with_data_sources=techniques[techniques.data_sources.notnull()].reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "      <th>Count_DS</th>\n",
       "      <th>Ind_DS</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[execution]</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>[Network device logs, Network device run-time ...</td>\n",
       "      <td>4.0</td>\n",
       "      <td>With DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[collection]</td>\n",
       "      <td>Network Device Configuration Dump</td>\n",
       "      <td>T1602.002</td>\n",
       "      <td>[Netflow/Enclave netflow, Network protocol ana...</td>\n",
       "      <td>3.0</td>\n",
       "      <td>With DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[defense-evasion, persistence]</td>\n",
       "      <td>TFTP Boot</td>\n",
       "      <td>T1542.005</td>\n",
       "      <td>[Network device run-time memory, Network devic...</td>\n",
       "      <td>5.0</td>\n",
       "      <td>With DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[defense-evasion, persistence]</td>\n",
       "      <td>ROMMONkit</td>\n",
       "      <td>T1542.004</td>\n",
       "      <td>[File monitoring, Netflow/Enclave netflow, Net...</td>\n",
       "      <td>4.0</td>\n",
       "      <td>With DS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[collection]</td>\n",
       "      <td>SNMP (MIB Dump)</td>\n",
       "      <td>T1602.001</td>\n",
       "      <td>[Netflow/Enclave netflow, Network protocol ana...</td>\n",
       "      <td>3.0</td>\n",
       "      <td>With DS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         matrix   platform                          tactic  \\\n",
       "0  mitre-attack  [Network]                     [execution]   \n",
       "1  mitre-attack  [Network]                    [collection]   \n",
       "2  mitre-attack  [Network]  [defense-evasion, persistence]   \n",
       "3  mitre-attack  [Network]  [defense-evasion, persistence]   \n",
       "4  mitre-attack  [Network]                    [collection]   \n",
       "\n",
       "                           technique technique_id  \\\n",
       "0                 Network Device CLI    T1059.008   \n",
       "1  Network Device Configuration Dump    T1602.002   \n",
       "2                          TFTP Boot    T1542.005   \n",
       "3                          ROMMONkit    T1542.004   \n",
       "4                    SNMP (MIB Dump)    T1602.001   \n",
       "\n",
       "                                        data_sources  Count_DS   Ind_DS  \n",
       "0  [Network device logs, Network device run-time ...       4.0  With DS  \n",
       "1  [Netflow/Enclave netflow, Network protocol ana...       3.0  With DS  \n",
       "2  [Network device run-time memory, Network devic...       5.0  With DS  \n",
       "3  [File monitoring, Netflow/Enclave netflow, Net...       4.0  With DS  \n",
       "4  [Netflow/Enclave netflow, Network protocol ana...       3.0  With DS  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_with_data_sources.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 541 techniques with data sources (62% of 878 techniques)\n"
     ]
    }
   ],
   "source": [
    "n_with = techniques['data_sources'].notna().sum()\n",
    "print(f'There are {n_with} techniques with data sources ({n_with / len(techniques):.0%} of {len(techniques)} techniques)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 12. Grouping Techniques With Data Sources By Matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a graph to represent the number of techniques per matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"x\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"y\": {\"type\": \"nominal\", \"field\": \"Matrix\"}}, \"height\": 100, \"width\": 300}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 10, \"dy\": 0}, \"encoding\": {\"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"y\": {\"type\": \"nominal\", \"field\": \"Matrix\"}}, \"height\": 100, \"width\": 300}], \"data\": {\"name\": \"data-fb2770765a9a1c165be37278cc07fa93\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-fb2770765a9a1c165be37278cc07fa93\": [{\"Matrix\": \"mitre-attack\", \"Count of Techniques\": 474}, {\"Matrix\": \"mitre-ics-attack\", \"Count of Techniques\": 67}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"550c9a4e-6e47-4b38-b24f-ccdb98b73f04\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x11fc0fc10>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#550c9a4e-6e47-4b38-b24f-ccdb98b73f04"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Count techniques per matrix once and reuse the result\n",
    "matrix_counts = techniques_with_data_sources.groupby('matrix')['matrix'].count()\n",
    "matrix_distribution = DataFrame({\n",
    "    'Matrix': matrix_counts.index.tolist(),\n",
    "    'Count of Techniques': matrix_counts.tolist()})\n",
    "bars = alt.Chart(matrix_distribution).mark_bar().encode(y='Matrix',x='Count of Techniques').properties(width=300,height=100)\n",
    "text = bars.mark_text(align='center',baseline='middle',dx=10,dy=0).encode(text='Count of Techniques')\n",
    "bars + text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most techniques with data sources (474) belong to the **mitre-attack** matrix, which is the main **Enterprise** matrix; the remaining 67 belong to **mitre-ics-attack**. Reference: https://attack.mitre.org/wiki/Main_Page"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 13. Grouping Techniques With Data Sources by Platform"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we need to split the **platform** column values, because a technique can be mapped to more than one platform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_platform=techniques_with_data_sources\n",
    "\n",
    "attributes_1 = ['platform'] # Names of the list-valued columns to split\n",
    "\n",
    "for a in attributes_1:\n",
    "    s = techniques_platform.apply(lambda x: Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n",
    "    # \"s\" is a Series with one row per value in each list of column \"a\", indexed by the original row\n",
    "    s.name = a\n",
    "    # Name \"s\" after column \"a\" so the join below restores the column name\n",
    "    techniques_platform=techniques_platform.drop(a, axis=1).join(s).reset_index(drop=True)\n",
    "    # Drop the list-valued column \"a\" and join the split values back, giving one row per value\n",
    "\n",
    "# Let's re-arrange the columns from general to specific\n",
    "techniques_platform_2=techniques_platform.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now show techniques with data sources mapped to one platform at a time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>[execution]</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>[Network device logs, Network device run-time ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>[collection]</td>\n",
       "      <td>Network Device Configuration Dump</td>\n",
       "      <td>T1602.002</td>\n",
       "      <td>[Netflow/Enclave netflow, Network protocol ana...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>[defense-evasion, persistence]</td>\n",
       "      <td>TFTP Boot</td>\n",
       "      <td>T1542.005</td>\n",
       "      <td>[Network device run-time memory, Network devic...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>[defense-evasion, persistence]</td>\n",
       "      <td>ROMMONkit</td>\n",
       "      <td>T1542.004</td>\n",
       "      <td>[File monitoring, Netflow/Enclave netflow, Net...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>[collection]</td>\n",
       "      <td>SNMP (MIB Dump)</td>\n",
       "      <td>T1602.001</td>\n",
       "      <td>[Netflow/Enclave netflow, Network protocol ana...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         matrix platform                          tactic  \\\n",
       "0  mitre-attack  Network                     [execution]   \n",
       "1  mitre-attack  Network                    [collection]   \n",
       "2  mitre-attack  Network  [defense-evasion, persistence]   \n",
       "3  mitre-attack  Network  [defense-evasion, persistence]   \n",
       "4  mitre-attack  Network                    [collection]   \n",
       "\n",
       "                           technique technique_id  \\\n",
       "0                 Network Device CLI    T1059.008   \n",
       "1  Network Device Configuration Dump    T1602.002   \n",
       "2                          TFTP Boot    T1542.005   \n",
       "3                          ROMMONkit    T1542.004   \n",
       "4                    SNMP (MIB Dump)    T1602.001   \n",
       "\n",
       "                                        data_sources  \n",
       "0  [Network device logs, Network device run-time ...  \n",
       "1  [Netflow/Enclave netflow, Network protocol ana...  \n",
       "2  [Network device run-time memory, Network devic...  \n",
       "3  [File monitoring, Netflow/Enclave netflow, Net...  \n",
       "4  [Netflow/Enclave netflow, Network protocol ana...  "
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_platform_2.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a visualization to show the number of techniques grouped by platform:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Platform\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Platform\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 200}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Platform\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Platform\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 200}], \"data\": {\"name\": \"data-94eeddf8fc5f36e972721aadcb2c794d\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-94eeddf8fc5f36e972721aadcb2c794d\": [{\"Platform\": \"AWS\", \"Count of Techniques\": 53}, {\"Platform\": \"Azure\", \"Count of Techniques\": 53}, {\"Platform\": \"Azure AD\", \"Count of Techniques\": 30}, {\"Platform\": \"Control Server\", \"Count of Techniques\": 23}, {\"Platform\": \"Data Historian\", \"Count of Techniques\": 12}, {\"Platform\": \"Engineering Workstation\", \"Count of Techniques\": 13}, {\"Platform\": \"Field Controller/RTU/PLC/IED\", \"Count of Techniques\": 38}, {\"Platform\": \"GCP\", \"Count of Techniques\": 53}, {\"Platform\": \"Human-Machine Interface\", \"Count of Techniques\": 25}, {\"Platform\": \"Input/Output Server\", \"Count of Techniques\": 6}, {\"Platform\": \"Linux\", \"Count of Techniques\": 252}, {\"Platform\": \"Network\", \"Count of Techniques\": 28}, {\"Platform\": \"Office 365\", \"Count of Techniques\": 51}, {\"Platform\": \"PRE\", \"Count of Techniques\": 14}, {\"Platform\": \"SaaS\", \"Count of Techniques\": 35}, {\"Platform\": \"Safety Instrumented System/Protection Relay\", \"Count of 
Techniques\": 18}, {\"Platform\": \"Windows\", \"Count of Techniques\": 435}, {\"Platform\": \"macOS\", \"Count of Techniques\": 265}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"91350139-e783-4480-84ab-a442dc283743\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x11fe638b0>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#91350139-e783-4480-84ab-a442dc283743"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Count the rows (technique-platform pairs) for each platform once and reuse the result\n",
    "platform_counts = techniques_platform_2.groupby(['platform'])['platform'].count()\n",
    "platform_distribution = pandas.DataFrame({\n",
    "    'Platform': list(platform_counts.keys()),\n",
    "    'Count of Techniques': platform_counts.tolist()})\n",
    "\n",
    "# Bar chart with the count displayed above each bar\n",
    "bars = alt.Chart(platform_distribution, height=300).mark_bar().encode(\n",
    "    x='Platform', y='Count of Techniques', color='Platform').properties(width=200)\n",
    "text = bars.mark_text(align='center', baseline='middle', dx=0, dy=-5).encode(text='Count of Techniques')\n",
    "bars + text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bar chart above shows that Windows has the most techniques with data sources (435), followed by macOS (265) and Linux (252)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 14. Grouping Techniques With Data Sources by Tactic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, first we need to split the tactic column values because a technique might be mapped to more than one tactic:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_tactic = techniques_with_data_sources\n",
    "\n",
    "# Names of the list-valued columns to split into one row per value\n",
    "attributes_2 = ['tactic']\n",
    "\n",
    "for a in attributes_2:\n",
    "    # Expand each list in column \"a\" into one row per value, keeping the original row index\n",
    "    s = techniques_tactic.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n",
    "    # Give \"s\" the same name as column \"a\"\n",
    "    s.name = a\n",
    "    # Drop the list-valued column \"a\" and join \"s\" on the index, repeating each row once per value\n",
    "    techniques_tactic = techniques_tactic.drop(a, axis=1).join(s).reset_index(drop=True)\n",
    "\n",
    "# Re-arrange the columns from general to specific\n",
    "techniques_tactic_2 = techniques_tactic.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now show techniques with data sources mapped to one tactic at a time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>[Network device logs, Network device run-time ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>collection</td>\n",
       "      <td>Network Device Configuration Dump</td>\n",
       "      <td>T1602.002</td>\n",
       "      <td>[Netflow/Enclave netflow, Network protocol ana...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>defense-evasion</td>\n",
       "      <td>TFTP Boot</td>\n",
       "      <td>T1542.005</td>\n",
       "      <td>[Network device run-time memory, Network devic...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>persistence</td>\n",
       "      <td>TFTP Boot</td>\n",
       "      <td>T1542.005</td>\n",
       "      <td>[Network device run-time memory, Network devic...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>defense-evasion</td>\n",
       "      <td>ROMMONkit</td>\n",
       "      <td>T1542.004</td>\n",
       "      <td>[File monitoring, Netflow/Enclave netflow, Net...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         matrix   platform           tactic  \\\n",
       "0  mitre-attack  [Network]        execution   \n",
       "1  mitre-attack  [Network]       collection   \n",
       "2  mitre-attack  [Network]  defense-evasion   \n",
       "3  mitre-attack  [Network]      persistence   \n",
       "4  mitre-attack  [Network]  defense-evasion   \n",
       "\n",
       "                           technique technique_id  \\\n",
       "0                 Network Device CLI    T1059.008   \n",
       "1  Network Device Configuration Dump    T1602.002   \n",
       "2                          TFTP Boot    T1542.005   \n",
       "3                          TFTP Boot    T1542.005   \n",
       "4                          ROMMONkit    T1542.004   \n",
       "\n",
       "                                        data_sources  \n",
       "0  [Network device logs, Network device run-time ...  \n",
       "1  [Netflow/Enclave netflow, Network protocol ana...  \n",
       "2  [Network device run-time memory, Network devic...  \n",
       "3  [Network device run-time memory, Network devic...  \n",
       "4  [File monitoring, Netflow/Enclave netflow, Net...  "
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_tactic_2.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a visualization to show the number of techniques grouped by tactic:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Tactic\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Tactic\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 400}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Tactic\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Tactic\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 400}], \"data\": {\"name\": \"data-a36a295299fa7b623bea39cbd6dc16e5\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-a36a295299fa7b623bea39cbd6dc16e5\": [{\"Tactic\": \"command-and-control-ics\", \"Count of Techniques\": 2}, {\"Tactic\": \"lateral-movement-ics\", \"Count of Techniques\": 5}, {\"Tactic\": \"persistence-ics\", \"Count of Techniques\": 6}, {\"Tactic\": \"resource-development\", \"Count of Techniques\": 7}, {\"Tactic\": \"discovery-ics\", \"Count of Techniques\": 7}, {\"Tactic\": \"evasion-ics\", \"Count of Techniques\": 7}, {\"Tactic\": \"reconnaissance\", \"Count of Techniques\": 7}, {\"Tactic\": \"execution-ics\", \"Count of Techniques\": 8}, {\"Tactic\": \"initial-access-ics\", \"Count of Techniques\": 9}, {\"Tactic\": \"collection-ics\", \"Count of Techniques\": 11}, {\"Tactic\": \"impair-process-control\", \"Count of Techniques\": 11}, {\"Tactic\": \"inhibit-response-function\", \"Count of Techniques\": 15}, {\"Tactic\": \"exfiltration\", \"Count of Techniques\": 17}, {\"Tactic\": \"initial-access\", \"Count of Techniques\": 19}, {\"Tactic\": \"lateral-movement\", \"Count of Techniques\": 23}, {\"Tactic\": \"impact\", \"Count of 
Techniques\": 26}, {\"Tactic\": \"execution\", \"Count of Techniques\": 34}, {\"Tactic\": \"collection\", \"Count of Techniques\": 34}, {\"Tactic\": \"discovery\", \"Count of Techniques\": 36}, {\"Tactic\": \"command-and-control\", \"Count of Techniques\": 40}, {\"Tactic\": \"credential-access\", \"Count of Techniques\": 48}, {\"Tactic\": \"privilege-escalation\", \"Count of Techniques\": 89}, {\"Tactic\": \"persistence\", \"Count of Techniques\": 99}, {\"Tactic\": \"defense-evasion\", \"Count of Techniques\": 152}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"6bb193cd-6df5-404c-992b-3c19bd9bf3bc\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x11fd1a370>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#6bb193cd-6df5-404c-992b-3c19bd9bf3bc"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Count the rows (technique-tactic pairs) for each tactic once and reuse the result\n",
    "tactic_counts = techniques_tactic_2.groupby(['tactic'])['tactic'].count()\n",
    "tactic_distribution = pandas.DataFrame({\n",
    "    'Tactic': list(tactic_counts.keys()),\n",
    "    'Count of Techniques': tactic_counts.tolist()}).sort_values(by='Count of Techniques', ascending=True)\n",
    "\n",
    "# Bar chart with the count displayed above each bar\n",
    "bars = alt.Chart(tactic_distribution, height=300).mark_bar().encode(\n",
    "    x='Tactic', y='Count of Techniques', color='Tactic').properties(width=400)\n",
    "text = bars.mark_text(align='center', baseline='middle', dx=0, dy=-5).encode(text='Count of Techniques')\n",
    "bars + text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Defense-evasion (152) and persistence (99) are the tactics with the highest number of techniques with data sources."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 15. Grouping Techniques With Data Sources by Data Source"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to split the data source column values because a technique might be mapped to more than one data source:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_data_source = techniques_with_data_sources\n",
    "\n",
    "# Names of the list-valued columns to split into one row per value\n",
    "attributes_3 = ['data_sources']\n",
    "\n",
    "for a in attributes_3:\n",
    "    # Expand each list in column \"a\" into one row per value, keeping the original row index\n",
    "    s = techniques_data_source.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n",
    "    # Give \"s\" the same name as column \"a\"\n",
    "    s.name = a\n",
    "    # Drop the list-valued column \"a\" and join \"s\" on the index, repeating each row once per value\n",
    "    techniques_data_source = techniques_data_source.drop(a, axis=1).join(s).reset_index(drop=True)\n",
    "\n",
    "# Re-arrange the columns from general to specific\n",
    "techniques_data_source_2 = techniques_data_source.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)\n",
    "\n",
    "# Normalize the capitalization of two data source names to improve consistency\n",
    "techniques_data_source_3 = techniques_data_source_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now show techniques with data sources mapped to one data source at a time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[execution]</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[execution]</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device run-time memory</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[execution]</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device command history</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[execution]</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device configuration</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>[Network]</td>\n",
       "      <td>[collection]</td>\n",
       "      <td>Network Device Configuration Dump</td>\n",
       "      <td>T1602.002</td>\n",
       "      <td>Netflow/Enclave netflow</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         matrix   platform        tactic                          technique  \\\n",
       "0  mitre-attack  [Network]   [execution]                 Network Device CLI   \n",
       "1  mitre-attack  [Network]   [execution]                 Network Device CLI   \n",
       "2  mitre-attack  [Network]   [execution]                 Network Device CLI   \n",
       "3  mitre-attack  [Network]   [execution]                 Network Device CLI   \n",
       "4  mitre-attack  [Network]  [collection]  Network Device Configuration Dump   \n",
       "\n",
       "  technique_id                    data_sources  \n",
       "0    T1059.008             Network device logs  \n",
       "1    T1059.008  Network device run-time memory  \n",
       "2    T1059.008  Network device command history  \n",
       "3    T1059.008    Network device configuration  \n",
       "4    T1602.002         Netflow/Enclave netflow  "
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_data_source_3.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a visualization to show the number of techniques grouped by data source:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Data Source\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Data Source\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 1200}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Data Source\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Data Source\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 1200}], \"data\": {\"name\": \"data-0c7cf91db0a6e6401724291cca8f060b\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-0c7cf91db0a6e6401724291cca8f060b\": [{\"Data Source\": \"API monitoring\", \"Count of Techniques\": 82}, {\"Data Source\": \"AWS CloudTrail logs\", \"Count of Techniques\": 32}, {\"Data Source\": \"Access tokens\", \"Count of Techniques\": 4}, {\"Data Source\": \"Alarm History\", \"Count of Techniques\": 3}, {\"Data Source\": \"Alarm history\", \"Count of Techniques\": 9}, {\"Data Source\": \"Alarm thresholds\", \"Count of Techniques\": 1}, {\"Data Source\": \"Anti-virus\", \"Count of Techniques\": 11}, {\"Data Source\": \"Application Logs\", \"Count of Techniques\": 16}, {\"Data Source\": \"Asset management\", \"Count of Techniques\": 3}, {\"Data Source\": \"Authentication logs\", \"Count of Techniques\": 66}, {\"Data Source\": \"Azure activity logs\", \"Count of Techniques\": 32}, {\"Data Source\": \"BIOS\", \"Count of Techniques\": 5}, {\"Data Source\": \"Binary file metadata\", \"Count of Techniques\": 29}, {\"Data Source\": \"Browser extensions\", \"Count of Techniques\": 1}, {\"Data Source\": \"Component 
firmware\", \"Count of Techniques\": 4}, {\"Data Source\": \"Controller parameters\", \"Count of Techniques\": 1}, {\"Data Source\": \"Controller program\", \"Count of Techniques\": 7}, {\"Data Source\": \"DLL monitoring\", \"Count of Techniques\": 36}, {\"Data Source\": \"DNS records\", \"Count of Techniques\": 8}, {\"Data Source\": \"Data historian\", \"Count of Techniques\": 4}, {\"Data Source\": \"Data loss prevention\", \"Count of Techniques\": 10}, {\"Data Source\": \"Detonation chamber\", \"Count of Techniques\": 6}, {\"Data Source\": \"Digital certificate logs\", \"Count of Techniques\": 1}, {\"Data Source\": \"Digital signatures\", \"Count of Techniques\": 3}, {\"Data Source\": \"Disk forensics\", \"Count of Techniques\": 3}, {\"Data Source\": \"Domain registration\", \"Count of Techniques\": 1}, {\"Data Source\": \"EFI\", \"Count of Techniques\": 3}, {\"Data Source\": \"Email gateway\", \"Count of Techniques\": 12}, {\"Data Source\": \"Environment variable\", \"Count of Techniques\": 5}, {\"Data Source\": \"File Monitoring\", \"Count of Techniques\": 1}, {\"Data Source\": \"File monitoring\", \"Count of Techniques\": 196}, {\"Data Source\": \"GCP audit logs\", \"Count of Techniques\": 12}, {\"Data Source\": \"Host network interface\", \"Count of Techniques\": 7}, {\"Data Source\": \"Host network interfaces\", \"Count of Techniques\": 2}, {\"Data Source\": \"Kernel drivers\", \"Count of Techniques\": 6}, {\"Data Source\": \"Loaded DLLs\", \"Count of Techniques\": 23}, {\"Data Source\": \"MBR\", \"Count of Techniques\": 3}, {\"Data Source\": \"Mail server\", \"Count of Techniques\": 16}, {\"Data Source\": \"Malware reverse engineering\", \"Count of Techniques\": 11}, {\"Data Source\": \"Named Pipes\", \"Count of Techniques\": 1}, {\"Data Source\": \"Netflow/Enclave netflow\", \"Count of Techniques\": 74}, {\"Data Source\": \"Network device command history\", \"Count of Techniques\": 2}, {\"Data Source\": \"Network device configuration\", \"Count of 
Techniques\": 5}, {\"Data Source\": \"Network device logs\", \"Count of Techniques\": 24}, {\"Data Source\": \"Network device run-time memory\", \"Count of Techniques\": 4}, {\"Data Source\": \"Network intrusion detection system\", \"Count of Techniques\": 18}, {\"Data Source\": \"Network protocol analysis\", \"Count of Techniques\": 89}, {\"Data Source\": \"OAuth audit logs\", \"Count of Techniques\": 4}, {\"Data Source\": \"Office 365 account logs\", \"Count of Techniques\": 12}, {\"Data Source\": \"Office 365 audit logs\", \"Count of Techniques\": 8}, {\"Data Source\": \"Office 365 trace logs\", \"Count of Techniques\": 4}, {\"Data Source\": \"Packet capture\", \"Count of Techniques\": 118}, {\"Data Source\": \"PowerShell logs\", \"Count of Techniques\": 23}, {\"Data Source\": \"Process Monitoring\", \"Count of Techniques\": 320}, {\"Data Source\": \"Process command-line parameters\", \"Count of Techniques\": 199}, {\"Data Source\": \"Process use of network\", \"Count of Techniques\": 68}, {\"Data Source\": \"SSL/TLS certificates\", \"Count of Techniques\": 2}, {\"Data Source\": \"SSL/TLS inspection\", \"Count of Techniques\": 24}, {\"Data Source\": \"SSl/TLS inspection\", \"Count of Techniques\": 1}, {\"Data Source\": \"Sensor health and status\", \"Count of Techniques\": 4}, {\"Data Source\": \"Sequential Event Recorder\", \"Count of Techniques\": 1}, {\"Data Source\": \"Sequential event recorder\", \"Count of Techniques\": 14}, {\"Data Source\": \"Services\", \"Count of Techniques\": 5}, {\"Data Source\": \"Social media monitoring\", \"Count of Techniques\": 5}, {\"Data Source\": \"Stackdriver logs\", \"Count of Techniques\": 27}, {\"Data Source\": \"System calls\", \"Count of Techniques\": 10}, {\"Data Source\": \"Third-party application logs\", \"Count of Techniques\": 5}, {\"Data Source\": \"User interface\", \"Count of Techniques\": 4}, {\"Data Source\": \"VBR\", \"Count of Techniques\": 2}, {\"Data Source\": \"WMI Objects\", \"Count of Techniques\": 2}, 
{\"Data Source\": \"Web application firewall logs\", \"Count of Techniques\": 9}, {\"Data Source\": \"Web logs\", \"Count of Techniques\": 12}, {\"Data Source\": \"Web proxy\", \"Count of Techniques\": 11}, {\"Data Source\": \"Windows Error Reporting\", \"Count of Techniques\": 4}, {\"Data Source\": \"Windows Registry\", \"Count of Techniques\": 57}, {\"Data Source\": \"Windows error reporting\", \"Count of Techniques\": 1}, {\"Data Source\": \"Windows event logs\", \"Count of Techniques\": 51}, {\"Data Source\": \"Windows registry\", \"Count of Techniques\": 2}, {\"Data Source\": \"process use of network\", \"Count of Techniques\": 1}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"668c5615-0c95-4616-850e-c55bc0da70c2\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x11fe68250>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#668c5615-0c95-4616-850e-c55bc0da70c2"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_source_distribution = pandas.DataFrame({\n",
    "    'Data Source': list(techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().keys()),\n",
    "    'Count of Techniques': techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().tolist()})\n",
    "bars = alt.Chart(data_source_distribution,width=800,height=300).mark_bar().encode(x ='Data Source',y='Count of Techniques',color='Data Source').properties(width=1200)\n",
    "text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')\n",
    "bars + text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A few interesting things from the bar chart above:\n",
    "* Process Monitoring, File Monitoring, and Process Command-line parameters are the Data Sources with the highest number of techniques\n",
    "* There are some data source names that include string references to Windows such as PowerShell, Windows and wmi"
   ]
  },
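  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Several labels in the chart differ only by letter case (for example `Windows Registry` and `Windows registry`), which splits their counts across two bars. A minimal sketch of merging such variants before counting, using a small hypothetical DataFrame (`sample`) rather than the notebook's `techniques_data_source_3`:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical sample that mimics the (technique, data_sources) layout used above\n",
    "sample = pd.DataFrame({\n",
    "    'technique': ['T-A', 'T-B', 'T-C', 'T-D'],\n",
    "    'data_sources': ['Windows Registry', 'Windows registry',\n",
    "                     'Process Monitoring', 'process monitoring'],\n",
    "})\n",
    "\n",
    "# Lowercase the labels so that case variants collapse into a single name\n",
    "normalized_counts = sample['data_sources'].str.lower().value_counts()\n",
    "print(normalized_counts)\n",
    "```\n",
    "\n",
    "In this toy sample the four labels collapse into two names with two techniques each."
   ]
  },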
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 16. Most Relevant Groups Of Data Sources Per Technique"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Number Of Data Sources Per Technique"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although identifying the data sources with the highest number of techniques is a good start, they usually do not work alone. You might be collecting **Process Monitoring** already but you might be still missing a lot of context from a data perspective."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"x\": {\"type\": \"quantitative\", \"field\": \"Number of Data Sources\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"width\": 500}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"Number of Data Sources\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"width\": 500}], \"data\": {\"name\": \"data-b6f6d78cd7978454e387468282a2f262\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-b6f6d78cd7978454e387468282a2f262\": [{\"Number of Data Sources\": 1, \"Count of Techniques\": 37}, {\"Number of Data Sources\": 2, \"Count of Techniques\": 107}, {\"Number of Data Sources\": 3, \"Count of Techniques\": 125}, {\"Number of Data Sources\": 4, \"Count of Techniques\": 118}, {\"Number of Data Sources\": 5, \"Count of Techniques\": 49}, {\"Number of Data Sources\": 6, \"Count of Techniques\": 33}, {\"Number of Data Sources\": 7, \"Count of Techniques\": 14}, {\"Number of Data Sources\": 8, \"Count of Techniques\": 10}, {\"Number of Data Sources\": 9, \"Count of Techniques\": 5}, {\"Number of Data Sources\": 10, \"Count of Techniques\": 3}, {\"Number of Data Sources\": 11, \"Count of Techniques\": 3}, {\"Number of Data Sources\": 12, \"Count of Techniques\": 4}, {\"Number of Data Sources\": 13, \"Count of Techniques\": 1}, {\"Number of Data Sources\": 14, \"Count of Techniques\": 1}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"d80ddddd-15b9-47bb-894b-26b319632f83\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x11fd74af0>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#d80ddddd-15b9-47bb-894b-26b319632f83"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_source_distribution_2 = pandas.DataFrame({\n",
    "    'Techniques': list(techniques_data_source_3.groupby(['technique'])['technique'].count().keys()),\n",
    "    'Count of Data Sources': techniques_data_source_3.groupby(['technique'])['technique'].count().tolist()})\n",
    "\n",
    "data_source_distribution_3 = pandas.DataFrame({\n",
    "    'Number of Data Sources': list(data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().keys()),\n",
    "    'Count of Techniques': data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().tolist()})\n",
    "\n",
    "bars = alt.Chart(data_source_distribution_3).mark_bar().encode(x ='Number of Data Sources',y='Count of Techniques').properties(width=500)\n",
    "text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')\n",
    "bars + text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The image above shows you the number data sources needed per techniques according to ATT&CK:\n",
    "* There are 71 techniques that require 3 data sources as enough context to validate the detection of them according to ATT&CK\n",
    "* Only one technique has 12 data sources\n",
    "* One data source only applies to 19 techniques"
   ]
  },
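  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two-step counting behind this chart can also be sketched with `groupby` followed by `value_counts`. This is an illustrative alternative on a small hypothetical DataFrame (`pairs`), not the notebook's data:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical long-format sample: one row per (technique, data source) pair\n",
    "pairs = pd.DataFrame({\n",
    "    'technique': ['T-A', 'T-A', 'T-A', 'T-B', 'T-C', 'T-C', 'T-C'],\n",
    "    'data_sources': ['file monitoring', 'process monitoring', 'packet capture',\n",
    "                     'process monitoring',\n",
    "                     'api monitoring', 'process monitoring', 'windows registry'],\n",
    "})\n",
    "\n",
    "# Step 1: how many data sources does each technique list?\n",
    "sources_per_technique = pairs.groupby('technique')['data_sources'].count()\n",
    "\n",
    "# Step 2: how many techniques share each data source count?\n",
    "distribution = sources_per_technique.value_counts().sort_index()\n",
    "print(distribution)\n",
    "```\n",
    "\n",
    "Here two techniques list three data sources and one technique lists a single data source."
   ]
  },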
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create subsets of data sources with the data source column defining and using a python function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://stackoverflow.com/questions/26332412/python-recursive-function-to-display-all-subsets-of-given-set\n",
    "def subs(l):\n",
    "    res = []\n",
    "    for i in range(1, len(l) + 1):\n",
    "        for combo in itertools.combinations(l, i):\n",
    "            res.append(list(combo))\n",
    "    return res"
   ]
  },
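  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the function returns, here is a small usage sketch. It repeats the definition so that it runs on its own, and the three data source names are just an example input:\n",
    "\n",
    "```python\n",
    "import itertools\n",
    "\n",
    "def subs(l):\n",
    "    # Same logic as the cell above: every non-empty combination, smallest first\n",
    "    res = []\n",
    "    for i in range(1, len(l) + 1):\n",
    "        for combo in itertools.combinations(l, i):\n",
    "            res.append(list(combo))\n",
    "    return res\n",
    "\n",
    "example = subs(['file monitoring', 'packet capture', 'process monitoring'])\n",
    "print(len(example))  # 7 subsets, i.e. 2**3 - 1\n",
    "print(example[3])    # ['file monitoring', 'packet capture']\n",
    "```\n",
    "\n",
    "The number of subsets grows as 2**n - 1, so a technique that lists 14 data sources produces 16,383 subsets."
   ]
  },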
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before applying the function, we need to use lowercase data sources names and sort data sources names to improve consistency:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = techniques_with_data_sources[['data_sources']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "for index, row in df.iterrows():\n",
    "    row[\"data_sources\"]=[x.lower() for x in row[\"data_sources\"]]\n",
    "    row[\"data_sources\"].sort()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data_sources</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[netflow/enclave netflow, network protocol ana...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[file monitoring, network device command histo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[file monitoring, netflow/enclave netflow, net...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[netflow/enclave netflow, network protocol ana...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        data_sources\n",
       "0  [network device command history, network devic...\n",
       "1  [netflow/enclave netflow, network protocol ana...\n",
       "2  [file monitoring, network device command histo...\n",
       "3  [file monitoring, netflow/enclave netflow, net...\n",
       "4  [netflow/enclave netflow, network protocol ana..."
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's apply the function and split the subsets column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-44-9765a9dc0b2f>:1: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  df['subsets']=df['data_sources'].apply(subs)\n"
     ]
    }
   ],
   "source": [
    "df['subsets']=df['data_sources'].apply(subs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data_sources</th>\n",
       "      <th>subsets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[[network device command history], [network de...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[netflow/enclave netflow, network protocol ana...</td>\n",
       "      <td>[[netflow/enclave netflow], [network protocol ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[file monitoring, network device command histo...</td>\n",
       "      <td>[[file monitoring], [network device command hi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[file monitoring, netflow/enclave netflow, net...</td>\n",
       "      <td>[[file monitoring], [netflow/enclave netflow],...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[netflow/enclave netflow, network protocol ana...</td>\n",
       "      <td>[[netflow/enclave netflow], [network protocol ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        data_sources  \\\n",
       "0  [network device command history, network devic...   \n",
       "1  [netflow/enclave netflow, network protocol ana...   \n",
       "2  [file monitoring, network device command histo...   \n",
       "3  [file monitoring, netflow/enclave netflow, net...   \n",
       "4  [netflow/enclave netflow, network protocol ana...   \n",
       "\n",
       "                                             subsets  \n",
       "0  [[network device command history], [network de...  \n",
       "1  [[netflow/enclave netflow], [network protocol ...  \n",
       "2  [[file monitoring], [network device command hi...  \n",
       "3  [[file monitoring], [netflow/enclave netflow],...  \n",
       "4  [[netflow/enclave netflow], [network protocol ...  "
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to split the subsets column values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_with_data_sources_preview = df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "attributes_4 = ['subsets']\n",
    "\n",
    "for a in attributes_4:\n",
    "    s = techniques_with_data_sources_preview.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n",
    "    s.name = a\n",
    "    techniques_with_data_sources_preview = techniques_with_data_sources_preview.drop(a, axis=1).join(s).reset_index(drop=True)\n",
    "    \n",
    "techniques_with_data_sources_subsets = techniques_with_data_sources_preview.reindex(['data_sources','subsets'], axis=1)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data_sources</th>\n",
       "      <th>subsets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device command history]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device configuration]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device logs]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device run-time memory]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        data_sources  \\\n",
       "0  [network device command history, network devic...   \n",
       "1  [network device command history, network devic...   \n",
       "2  [network device command history, network devic...   \n",
       "3  [network device command history, network devic...   \n",
       "4  [network device command history, network devic...   \n",
       "\n",
       "                                             subsets  \n",
       "0                   [network device command history]  \n",
       "1                     [network device configuration]  \n",
       "2                              [network device logs]  \n",
       "3                   [network device run-time memory]  \n",
       "4  [network device command history, network devic...  "
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_with_data_sources_subsets.head()"
   ]
  },
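  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On pandas 0.25 or later, the same one-row-per-subset reshaping can be sketched with `DataFrame.explode`. The sample below is hypothetical and only illustrates the idea; it is not the notebook's data:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical sample with the same two columns as `df` above\n",
    "sample = pd.DataFrame({\n",
    "    'data_sources': [['file monitoring', 'process monitoring'], ['packet capture']],\n",
    "    'subsets': [\n",
    "        [['file monitoring'], ['process monitoring'],\n",
    "         ['file monitoring', 'process monitoring']],\n",
    "        [['packet capture']],\n",
    "    ],\n",
    "})\n",
    "\n",
    "# explode() emits one row per element of the outer list in `subsets`\n",
    "# and repeats the other columns for each new row\n",
    "exploded = sample.explode('subsets').reset_index(drop=True)\n",
    "print(exploded)\n",
    "```\n",
    "\n",
    "Only the outer list is unpacked, so each row of `exploded` still holds one subset as a list."
   ]
  },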
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's add three columns to analyse the dataframe: subsets_name (Changing Lists to Strings), subsets_number_elements ( Number of data sources per subset) and number_data_sources_per_technique"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "techniques_with_data_sources_subsets['subsets_name']=techniques_with_data_sources_subsets['subsets'].apply(lambda x: ','.join(map(str, x)))\n",
    "techniques_with_data_sources_subsets['subsets_number_elements']=techniques_with_data_sources_subsets['subsets'].str.len()\n",
    "techniques_with_data_sources_subsets['number_data_sources_per_technique']=techniques_with_data_sources_subsets['data_sources'].str.len()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data_sources</th>\n",
       "      <th>subsets</th>\n",
       "      <th>subsets_name</th>\n",
       "      <th>subsets_number_elements</th>\n",
       "      <th>number_data_sources_per_technique</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device command history]</td>\n",
       "      <td>network device command history</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device configuration]</td>\n",
       "      <td>network device configuration</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device logs]</td>\n",
       "      <td>network device logs</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device run-time memory]</td>\n",
       "      <td>network device run-time memory</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>network device command history,network device ...</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        data_sources  \\\n",
       "0  [network device command history, network devic...   \n",
       "1  [network device command history, network devic...   \n",
       "2  [network device command history, network devic...   \n",
       "3  [network device command history, network devic...   \n",
       "4  [network device command history, network devic...   \n",
       "\n",
       "                                             subsets  \\\n",
       "0                   [network device command history]   \n",
       "1                     [network device configuration]   \n",
       "2                              [network device logs]   \n",
       "3                   [network device run-time memory]   \n",
       "4  [network device command history, network devic...   \n",
       "\n",
       "                                        subsets_name  subsets_number_elements  \\\n",
       "0                     network device command history                        1   \n",
       "1                       network device configuration                        1   \n",
       "2                                network device logs                        1   \n",
       "3                     network device run-time memory                        1   \n",
       "4  network device command history,network device ...                        2   \n",
       "\n",
       "   number_data_sources_per_technique  \n",
       "0                                  4  \n",
       "1                                  4  \n",
       "2                                  4  \n",
       "3                                  4  \n",
       "4                                  4  "
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_with_data_sources_subsets.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As it was described above, we need to find grups pf data sources, so we are going to filter out all the subsets with only one data source:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "subsets = techniques_with_data_sources_subsets\n",
    "\n",
    "subsets_ok=subsets[subsets.subsets_number_elements != 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data_sources</th>\n",
       "      <th>subsets</th>\n",
       "      <th>subsets_name</th>\n",
       "      <th>subsets_number_elements</th>\n",
       "      <th>number_data_sources_per_technique</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>network device command history,network device ...</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>network device command history,network device ...</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>network device command history,network device ...</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device configuration, network device ...</td>\n",
       "      <td>network device configuration,network device logs</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>[network device command history, network devic...</td>\n",
       "      <td>[network device configuration, network device ...</td>\n",
       "      <td>network device configuration,network device ru...</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        data_sources  \\\n",
       "4  [network device command history, network devic...   \n",
       "5  [network device command history, network devic...   \n",
       "6  [network device command history, network devic...   \n",
       "7  [network device command history, network devic...   \n",
       "8  [network device command history, network devic...   \n",
       "\n",
       "                                             subsets  \\\n",
       "4  [network device command history, network devic...   \n",
       "5  [network device command history, network devic...   \n",
       "6  [network device command history, network devic...   \n",
       "7  [network device configuration, network device ...   \n",
       "8  [network device configuration, network device ...   \n",
       "\n",
       "                                        subsets_name  subsets_number_elements  \\\n",
       "4  network device command history,network device ...                        2   \n",
       "5  network device command history,network device ...                        2   \n",
       "6  network device command history,network device ...                        2   \n",
       "7   network device configuration,network device logs                        2   \n",
       "8  network device configuration,network device ru...                        2   \n",
       "\n",
       "   number_data_sources_per_technique  \n",
       "4                                  4  \n",
       "5                                  4  \n",
       "6                                  4  \n",
       "7                                  4  \n",
       "8                                  4  "
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "subsets_ok.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we calculate the most relevant groups of data sources (Top 15):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "subsets_graph = subsets_ok.groupby(['subsets_name'])['subsets_name'].count().to_frame(name='subsets_count').sort_values(by='subsets_count',ascending=False)[0:15]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>subsets_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>subsets_name</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>process command-line parameters,process monitoring</th>\n",
       "      <td>183</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>file monitoring,process monitoring</th>\n",
       "      <td>144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>file monitoring,process command-line parameters</th>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>file monitoring,process command-line parameters,process monitoring</th>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>network protocol analysis,packet capture</th>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>api monitoring,process monitoring</th>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>process monitoring,process use of network</th>\n",
       "      <td>56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>netflow/enclave netflow,packet capture</th>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>process monitoring,windows registry</th>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>packet capture,process use of network</th>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>packet capture,process monitoring</th>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>process command-line parameters,windows registry</th>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>netflow/enclave netflow,network protocol analysis</th>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>network protocol analysis,process use of network</th>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>netflow/enclave netflow,process monitoring</th>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    subsets_count\n",
       "subsets_name                                                     \n",
       "process command-line parameters,process monitoring            183\n",
       "file monitoring,process monitoring                            144\n",
       "file monitoring,process command-line parameters               100\n",
       "file monitoring,process command-line parameters...             88\n",
       "network protocol analysis,packet capture                       76\n",
       "api monitoring,process monitoring                              70\n",
       "process monitoring,process use of network                      56\n",
       "netflow/enclave netflow,packet capture                         55\n",
       "process monitoring,windows registry                            50\n",
       "packet capture,process use of network                          45\n",
       "packet capture,process monitoring                              43\n",
       "process command-line parameters,windows registry               41\n",
       "netflow/enclave netflow,network protocol analysis              41\n",
       "network protocol analysis,process use of network               40\n",
       "netflow/enclave netflow,process monitoring                     38"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "subsets_graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Data Sources\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Data Sources\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"width\": 500}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Data Sources\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Data Sources\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"width\": 500}], \"data\": {\"name\": \"data-ef18c839539c3164e0c40c20eb1da48e\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-ef18c839539c3164e0c40c20eb1da48e\": [{\"Data Sources\": \"process command-line parameters,process monitoring\", \"Count of Techniques\": 183}, {\"Data Sources\": \"file monitoring,process monitoring\", \"Count of Techniques\": 144}, {\"Data Sources\": \"file monitoring,process command-line parameters\", \"Count of Techniques\": 100}, {\"Data Sources\": \"file monitoring,process command-line parameters,process monitoring\", \"Count of Techniques\": 88}, {\"Data Sources\": \"network protocol analysis,packet capture\", \"Count of Techniques\": 76}, {\"Data Sources\": \"api monitoring,process monitoring\", \"Count of Techniques\": 70}, {\"Data Sources\": \"process monitoring,process use of network\", \"Count of Techniques\": 56}, {\"Data Sources\": \"netflow/enclave netflow,packet capture\", \"Count of Techniques\": 55}, {\"Data Sources\": \"process monitoring,windows registry\", \"Count of Techniques\": 50}, {\"Data Sources\": \"packet capture,process use of network\", \"Count of Techniques\": 45}, {\"Data Sources\": \"packet capture,process 
monitoring\", \"Count of Techniques\": 43}, {\"Data Sources\": \"process command-line parameters,windows registry\", \"Count of Techniques\": 41}, {\"Data Sources\": \"netflow/enclave netflow,network protocol analysis\", \"Count of Techniques\": 41}, {\"Data Sources\": \"network protocol analysis,process use of network\", \"Count of Techniques\": 40}, {\"Data Sources\": \"netflow/enclave netflow,process monitoring\", \"Count of Techniques\": 38}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"2214899b-49ff-44bd-8006-c13ea8aa10bc\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x11fe680d0>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#2214899b-49ff-44bd-8006-c13ea8aa10bc"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "subsets_graph_2 = pandas.DataFrame({\n",
    "    'Data Sources': list(subsets_graph.index),\n",
    "    'Count of Techniques': subsets_graph['subsets_count'].tolist()})\n",
    "\n",
    "bars = alt.Chart(subsets_graph_2).mark_bar().encode(x ='Data Sources', y ='Count of Techniques', color='Data Sources').properties(width=500)\n",
    "text = bars.mark_text(align='center',baseline='middle',dx= 0,dy=-5).encode(text='Count of Techniques')\n",
    "bars + text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Group (Process Monitoring - Process Command-line parameters) is the is the group of data sources with the highest number of techniques. This group of data sources are suggested to hunt 78 techniques"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 17. Let's Split all the Information About Techniques With Data Sources Defined: Matrix, Platform, Tactic and Data Source"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's split all the relevant columns of the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device run-time memory</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device command history</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device configuration</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>collection</td>\n",
       "      <td>Network Device Configuration Dump</td>\n",
       "      <td>T1602.002</td>\n",
       "      <td>Netflow/Enclave netflow</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         matrix platform      tactic                          technique  \\\n",
       "0  mitre-attack  Network   execution                 Network Device CLI   \n",
       "1  mitre-attack  Network   execution                 Network Device CLI   \n",
       "2  mitre-attack  Network   execution                 Network Device CLI   \n",
       "3  mitre-attack  Network   execution                 Network Device CLI   \n",
       "4  mitre-attack  Network  collection  Network Device Configuration Dump   \n",
       "\n",
       "  technique_id                    data_sources  \n",
       "0    T1059.008             Network device logs  \n",
       "1    T1059.008  Network device run-time memory  \n",
       "2    T1059.008  Network device command history  \n",
       "3    T1059.008    Network device configuration  \n",
       "4    T1602.002         Netflow/Enclave netflow  "
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_data = techniques_with_data_sources\n",
    "\n",
    "attributes = ['platform','tactic','data_sources'] # In attributes we are going to indicate the name of the columns that we need to split\n",
    "\n",
    "for a in attributes:\n",
    "    s = techniques_data.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n",
    "    # \"s\" is going to be a column of a frame with every value of the list inside each cell of the column \"a\"\n",
    "    s.name = a\n",
    "    # We name \"s\" with the same name of \"a\".\n",
    "    techniques_data=techniques_data.drop(a, axis=1).join(s).reset_index(drop=True)\n",
    "    # We drop the column \"a\" from \"techniques_data\", and then join \"techniques_data\" with \"s\"\n",
    "\n",
    "# Let's re-arrange the columns from general to specific\n",
    "techniques_data_2=techniques_data.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)\n",
    "\n",
    "# We are going to edit some names inside the dataframe to improve the consistency:\n",
    "techniques_data_3 = techniques_data_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])\n",
    "\n",
    "techniques_data_3.head()"
   ]
  },
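  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same kind of split can be sketched more compactly, assuming pandas 0.25 or later, with `DataFrame.explode`, which turns each element of a list-valued cell into its own row. The `toy` dataframe and its values below are made up for illustration and are not part of the ATT&CK data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical two-row frame with list-valued columns (illustration only)\n",
    "toy = pd.DataFrame({\n",
    "    'technique': ['A', 'B'],\n",
    "    'platform': [['Linux', 'macOS'], ['Windows']],\n",
    "    'data_sources': [['File monitoring', 'Process monitoring'], ['Windows Registry']],\n",
    "})\n",
    "\n",
    "# Explode one list column at a time; every pass yields one row per list element\n",
    "toy_split = toy\n",
    "for column in ['platform', 'data_sources']:\n",
    "    toy_split = toy_split.explode(column).reset_index(drop=True)\n",
    "\n",
    "toy_split"
   ]
  },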
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Do you remember data sources names with a reference to Windows? After splitting the dataframe by platforms, tactics and data sources, are there any macOC or linux techniques that consider windows data sources? Let's identify those rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "# After splitting the rows of the dataframe, there are some values that relate windows data sources with platforms like linux and masOS.\n",
    "# We need to identify those rows\n",
    "conditions = [(techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),\n",
    "             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),\n",
    "             (techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),\n",
    "             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),\n",
    "             (techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True),\n",
    "             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True)]\n",
    "# In conditions we indicate a logical test\n",
    "\n",
    "choices = ['NO OK','NO OK','NO OK','NO OK','NO OK','NO OK']\n",
    "# In choices, we indicate the result when the logical test is true\n",
    "\n",
    "techniques_data_3['Validation'] = np.select(conditions,choices,default='OK')\n",
    "# We add a column \"Validation\" to \"techniques_data_3\" with the result of the logical test. The default value is going to be \"OK\""
   ]
  },
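  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The six conditions above share one shape, so the same check could also be written as a single boolean mask. A minimal sketch on a made-up dataframe (the `toy` rows are hypothetical; the column names and the `OK` / `NO OK` labels mirror the ones used above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical rows (illustration only)\n",
    "toy = pd.DataFrame({\n",
    "    'platform': ['Linux', 'Windows', 'macOS', 'Linux'],\n",
    "    'data_sources': ['Windows Registry', 'Windows Registry', 'PowerShell logs', 'File monitoring'],\n",
    "})\n",
    "\n",
    "# True where the data source name mentions a Windows-only source (regex alternation)\n",
    "windows_only = toy['data_sources'].str.contains('windows|powershell|wmi', case=False)\n",
    "# True where the platform is one of the non-Windows platforms being checked\n",
    "non_windows = toy['platform'].isin(['Linux', 'macOS'])\n",
    "\n",
    "# Flag rows that pair a non-Windows platform with a Windows-only data source\n",
    "toy['Validation'] = np.where(windows_only & non_windows, 'NO OK', 'OK')\n",
    "toy"
   ]
  },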
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the inconsistent data?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "      <th>Validation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>162</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Linux</td>\n",
       "      <td>defense-evasion</td>\n",
       "      <td>Run Virtual Instance</td>\n",
       "      <td>T1564.006</td>\n",
       "      <td>Windows Registry</td>\n",
       "      <td>NO OK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>168</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>macOS</td>\n",
       "      <td>defense-evasion</td>\n",
       "      <td>Run Virtual Instance</td>\n",
       "      <td>T1564.006</td>\n",
       "      <td>Windows Registry</td>\n",
       "      <td>NO OK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>179</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Linux</td>\n",
       "      <td>defense-evasion</td>\n",
       "      <td>Hidden File System</td>\n",
       "      <td>T1564.005</td>\n",
       "      <td>Windows Registry</td>\n",
       "      <td>NO OK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>181</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>macOS</td>\n",
       "      <td>defense-evasion</td>\n",
       "      <td>Hidden File System</td>\n",
       "      <td>T1564.005</td>\n",
       "      <td>Windows Registry</td>\n",
       "      <td>NO OK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>794</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>macOS</td>\n",
       "      <td>defense-evasion</td>\n",
       "      <td>Hidden Window</td>\n",
       "      <td>T1564.003</td>\n",
       "      <td>PowerShell logs</td>\n",
       "      <td>NO OK</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           matrix platform           tactic             technique  \\\n",
       "162  mitre-attack    Linux  defense-evasion  Run Virtual Instance   \n",
       "168  mitre-attack    macOS  defense-evasion  Run Virtual Instance   \n",
       "179  mitre-attack    Linux  defense-evasion    Hidden File System   \n",
       "181  mitre-attack    macOS  defense-evasion    Hidden File System   \n",
       "794  mitre-attack    macOS  defense-evasion         Hidden Window   \n",
       "\n",
       "    technique_id      data_sources Validation  \n",
       "162    T1564.006  Windows Registry      NO OK  \n",
       "168    T1564.006  Windows Registry      NO OK  \n",
       "179    T1564.005  Windows Registry      NO OK  \n",
       "181    T1564.005  Windows Registry      NO OK  \n",
       "794    T1564.003   PowerShell logs      NO OK  "
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_analysis_data_no_ok = techniques_data_3[techniques_data_3.Validation == 'NO OK']\n",
    "# Finally, we are filtering all the values with NO OK\n",
    "\n",
    "techniques_analysis_data_no_ok.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are  136  rows with inconsistent data\n"
     ]
    }
   ],
   "source": [
    "print('There are ',len(techniques_analysis_data_no_ok),' rows with inconsistent data')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the impact of this inconsistent data from a platform and data sources perspective?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = techniques_with_data_sources\n",
    "\n",
    "attributes = ['platform','data_sources']\n",
    "\n",
    "for a in attributes:\n",
    "    s = df.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n",
    "    s.name = a\n",
    "    df=df.drop(a, axis=1).join(s).reset_index(drop=True)\n",
    "    \n",
    "df_2=df.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)\n",
    "df_3 = df_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])\n",
    "\n",
    "conditions = [(df_3['data_sources'].str.contains('windows',case=False)== True),\n",
    "              (df_3['data_sources'].str.contains('powershell',case=False)== True),\n",
    "              (df_3['data_sources'].str.contains('wmi',case=False)== True)]\n",
    "\n",
    "choices = ['Windows','Windows','Windows']\n",
    "\n",
    "df_3['Validation'] = np.select(conditions,choices,default='Other')\n",
    "df_3['Num_Tech'] = 1\n",
    "df_4 = df_3[df_3.Validation == 'Windows']\n",
    "df_5 = df_4.groupby(['data_sources','platform'])['technique'].nunique()\n",
    "df_6 = df_5.to_frame().reset_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/javascript": [
       "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-6d4700e1646c3dceebb7655c72e7b5ac\"}, \"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"platform\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"technique\", \"stack\": \"normalize\"}, \"y\": {\"type\": \"nominal\", \"field\": \"data_sources\"}}, \"height\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-6d4700e1646c3dceebb7655c72e7b5ac\": [{\"data_sources\": \"PowerShell logs\", \"platform\": \"Linux\", \"technique\": 9}, {\"data_sources\": \"PowerShell logs\", \"platform\": \"Network\", \"technique\": 2}, {\"data_sources\": \"PowerShell logs\", \"platform\": \"Windows\", \"technique\": 22}, {\"data_sources\": \"PowerShell logs\", \"platform\": \"macOS\", \"technique\": 13}, {\"data_sources\": \"WMI Objects\", \"platform\": \"Linux\", \"technique\": 1}, {\"data_sources\": \"WMI Objects\", \"platform\": \"Windows\", \"technique\": 2}, {\"data_sources\": \"WMI Objects\", \"platform\": \"macOS\", \"technique\": 1}, {\"data_sources\": \"Windows Error Reporting\", \"platform\": \"Linux\", \"technique\": 4}, {\"data_sources\": \"Windows Error Reporting\", \"platform\": \"Windows\", \"technique\": 4}, {\"data_sources\": \"Windows Error Reporting\", \"platform\": \"macOS\", \"technique\": 4}, {\"data_sources\": \"Windows Registry\", \"platform\": \"AWS\", \"technique\": 2}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Azure\", \"technique\": 2}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Azure AD\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Control Server\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Data Historian\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Engineering Workstation\", \"technique\": 1}, {\"data_sources\": \"Windows 
Registry\", \"platform\": \"GCP\", \"technique\": 2}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Human-Machine Interface\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Linux\", \"technique\": 19}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Network\", \"technique\": 3}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Office 365\", \"technique\": 5}, {\"data_sources\": \"Windows Registry\", \"platform\": \"SaaS\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Windows\", \"technique\": 55}, {\"data_sources\": \"Windows Registry\", \"platform\": \"macOS\", \"technique\": 19}, {\"data_sources\": \"Windows error reporting\", \"platform\": \"Data Historian\", \"technique\": 1}, {\"data_sources\": \"Windows error reporting\", \"platform\": \"Engineering Workstation\", \"technique\": 1}, {\"data_sources\": \"Windows error reporting\", \"platform\": \"Human-Machine Interface\", \"technique\": 1}, {\"data_sources\": \"Windows error reporting\", \"platform\": \"Windows\", \"technique\": 1}, {\"data_sources\": \"Windows event logs\", \"platform\": \"AWS\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Azure\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Azure AD\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Control Server\", \"technique\": 1}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Engineering Workstation\", \"technique\": 2}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Field Controller/RTU/PLC/IED\", \"technique\": 1}, {\"data_sources\": \"Windows event logs\", \"platform\": \"GCP\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Human-Machine Interface\", \"technique\": 2}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Linux\", \"technique\": 19}, {\"data_sources\": \"Windows event logs\", 
\"platform\": \"Network\", \"technique\": 2}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Office 365\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"SaaS\", \"technique\": 1}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Safety Instrumented System/Protection Relay\", \"technique\": 2}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Windows\", \"technique\": 50}, {\"data_sources\": \"Windows event logs\", \"platform\": \"macOS\", \"technique\": 18}, {\"data_sources\": \"Windows registry\", \"platform\": \"Engineering Workstation\", \"technique\": 1}, {\"data_sources\": \"Windows registry\", \"platform\": \"Field Controller/RTU/PLC/IED\", \"technique\": 1}, {\"data_sources\": \"Windows registry\", \"platform\": \"Windows\", \"technique\": 2}]}};\n",
       "const opt = {};\n",
       "const type = \"vega-lite\";\n",
       "const id = \"5e119597-5160-4769-a803-0ec11b1a8ecd\";\n",
       "\n",
       "const output_area = this;\n",
       "\n",
       "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n",
       "  const target = document.createElement(\"div\");\n",
       "  target.id = id;\n",
       "  target.className = \"vega-embed\";\n",
       "\n",
       "  const style = document.createElement(\"style\");\n",
       "  style.textContent = [\n",
       "    \".vega-embed .error p {\",\n",
       "    \"  color: firebrick;\",\n",
       "    \"  font-size: 14px;\",\n",
       "    \"}\",\n",
       "  ].join(\"\\\\n\");\n",
       "\n",
       "  // element is a jQuery wrapped DOM element inside the output area\n",
       "  // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n",
       "  // IPython.display.html#IPython.display.Javascript.__init__\n",
       "  element[0].appendChild(target);\n",
       "  element[0].appendChild(style);\n",
       "\n",
       "  vega.render(\"#\" + id, spec, type, opt, output_area);\n",
       "}, function (err) {\n",
       "  if (err.requireType !== \"scripterror\") {\n",
       "    throw(err);\n",
       "  }\n",
       "});\n"
      ],
      "text/plain": [
       "<vega.vegalite.VegaLite at 0x121116820>"
      ]
     },
     "metadata": {
      "jupyter-vega": "#5e119597-5160-4769-a803-0ec11b1a8ecd"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": []
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "alt.Chart(df_6).mark_bar().encode(x=alt.X('technique', stack=\"normalize\"),    y='data_sources',    color='platform').properties(height=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are techniques that consider Windows Error Reporting, Windows Registry, and Windows event logs as data sources and they also consider platforms like Linux and masOS. We do not need to consider this rows because those data sources can only be managed at a Windows environment. These are the techniques that we should not consider in our data base:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>technique</th>\n",
       "      <th>data_sources</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5953</th>\n",
       "      <td>OS Credential Dumping</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5832</th>\n",
       "      <td>Remote Services</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2814</th>\n",
       "      <td>Clear Command History</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2432</th>\n",
       "      <td>Credentials from Password Stores</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4564</th>\n",
       "      <td>Peripheral Device Discovery</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2271</th>\n",
       "      <td>Keychain</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2259</th>\n",
       "      <td>Credentials from Web Browsers</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2392</th>\n",
       "      <td>GUI Input Capture</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1831</th>\n",
       "      <td>Impair Command History Logging</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>794</th>\n",
       "      <td>Hidden Window</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1611</th>\n",
       "      <td>Hide Artifacts</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5431</th>\n",
       "      <td>Input Capture</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5402</th>\n",
       "      <td>Command and Scripting Interpreter</td>\n",
       "      <td>PowerShell logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3206</th>\n",
       "      <td>Event Triggered Execution</td>\n",
       "      <td>WMI Objects</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4156</th>\n",
       "      <td>Exploitation of Remote Services</td>\n",
       "      <td>Windows Error Reporting</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4206</th>\n",
       "      <td>Exploitation for Defense Evasion</td>\n",
       "      <td>Windows Error Reporting</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5361</th>\n",
       "      <td>Exploitation for Privilege Escalation</td>\n",
       "      <td>Windows Error Reporting</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4241</th>\n",
       "      <td>Exploitation for Credential Access</td>\n",
       "      <td>Windows Error Reporting</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3212</th>\n",
       "      <td>Event Triggered Execution</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5217</th>\n",
       "      <td>Software Deployment Tools</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4038</th>\n",
       "      <td>Service Stop</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4020</th>\n",
       "      <td>Inhibit System Recovery</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5426</th>\n",
       "      <td>Input Capture</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3389</th>\n",
       "      <td>Create or Modify System Process</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5827</th>\n",
       "      <td>Remote Services</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4373</th>\n",
       "      <td>Browser Extensions</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>162</th>\n",
       "      <td>Run Virtual Instance</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2414</th>\n",
       "      <td>Keylogging</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1875</th>\n",
       "      <td>Impair Defenses</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2599</th>\n",
       "      <td>Masquerade Task or Service</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1857</th>\n",
       "      <td>Disable or Modify Tools</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2654</th>\n",
       "      <td>Subvert Trust Controls</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1824</th>\n",
       "      <td>Disable or Modify System Firewall</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1204</th>\n",
       "      <td>System Services</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2341</th>\n",
       "      <td>Modify Authentication Process</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2722</th>\n",
       "      <td>Unsecured Credentials</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>179</th>\n",
       "      <td>Hidden File System</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2895</th>\n",
       "      <td>Abuse Elevation Control Mechanism</td>\n",
       "      <td>Windows Registry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5278</th>\n",
       "      <td>Indicator Removal on Host</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5775</th>\n",
       "      <td>Obfuscated Files or Information</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5401</th>\n",
       "      <td>Command and Scripting Interpreter</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5828</th>\n",
       "      <td>Remote Services</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5559</th>\n",
       "      <td>Scheduled Task/Job</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5427</th>\n",
       "      <td>Input Capture</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2970</th>\n",
       "      <td>Local Account</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3202</th>\n",
       "      <td>Event Triggered Execution</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4439</th>\n",
       "      <td>Create Account</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2602</th>\n",
       "      <td>Masquerade Task or Service</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2655</th>\n",
       "      <td>Subvert Trust Controls</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4078</th>\n",
       "      <td>File and Directory Permissions Modification</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2720</th>\n",
       "      <td>Unsecured Credentials</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4022</th>\n",
       "      <td>Inhibit System Recovery</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3624</th>\n",
       "      <td>System Shutdown/Reboot</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3605</th>\n",
       "      <td>Account Access Removal</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2962</th>\n",
       "      <td>Domain Account</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4909</th>\n",
       "      <td>Account Manipulation</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3388</th>\n",
       "      <td>Create or Modify System Process</td>\n",
       "      <td>Windows event logs</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        technique             data_sources\n",
       "5953                        OS Credential Dumping          PowerShell logs\n",
       "5832                              Remote Services          PowerShell logs\n",
       "2814                        Clear Command History          PowerShell logs\n",
       "2432             Credentials from Password Stores          PowerShell logs\n",
       "4564                  Peripheral Device Discovery          PowerShell logs\n",
       "2271                                     Keychain          PowerShell logs\n",
       "2259                Credentials from Web Browsers          PowerShell logs\n",
       "2392                            GUI Input Capture          PowerShell logs\n",
       "1831               Impair Command History Logging          PowerShell logs\n",
       "794                                 Hidden Window          PowerShell logs\n",
       "1611                               Hide Artifacts          PowerShell logs\n",
       "5431                                Input Capture          PowerShell logs\n",
       "5402            Command and Scripting Interpreter          PowerShell logs\n",
       "3206                    Event Triggered Execution              WMI Objects\n",
       "4156              Exploitation of Remote Services  Windows Error Reporting\n",
       "4206             Exploitation for Defense Evasion  Windows Error Reporting\n",
       "5361        Exploitation for Privilege Escalation  Windows Error Reporting\n",
       "4241           Exploitation for Credential Access  Windows Error Reporting\n",
       "3212                    Event Triggered Execution         Windows Registry\n",
       "5217                    Software Deployment Tools         Windows Registry\n",
       "4038                                 Service Stop         Windows Registry\n",
       "4020                      Inhibit System Recovery         Windows Registry\n",
       "5426                                Input Capture         Windows Registry\n",
       "3389              Create or Modify System Process         Windows Registry\n",
       "5827                              Remote Services         Windows Registry\n",
       "4373                           Browser Extensions         Windows Registry\n",
       "162                          Run Virtual Instance         Windows Registry\n",
       "2414                                   Keylogging         Windows Registry\n",
       "1875                              Impair Defenses         Windows Registry\n",
       "2599                   Masquerade Task or Service         Windows Registry\n",
       "1857                      Disable or Modify Tools         Windows Registry\n",
       "2654                       Subvert Trust Controls         Windows Registry\n",
       "1824            Disable or Modify System Firewall         Windows Registry\n",
       "1204                              System Services         Windows Registry\n",
       "2341                Modify Authentication Process         Windows Registry\n",
       "2722                        Unsecured Credentials         Windows Registry\n",
       "179                            Hidden File System         Windows Registry\n",
       "2895            Abuse Elevation Control Mechanism         Windows Registry\n",
       "5278                    Indicator Removal on Host       Windows event logs\n",
       "5775              Obfuscated Files or Information       Windows event logs\n",
       "5401            Command and Scripting Interpreter       Windows event logs\n",
       "5828                              Remote Services       Windows event logs\n",
       "5559                           Scheduled Task/Job       Windows event logs\n",
       "5427                                Input Capture       Windows event logs\n",
       "2970                                Local Account       Windows event logs\n",
       "3202                    Event Triggered Execution       Windows event logs\n",
       "4439                               Create Account       Windows event logs\n",
       "2602                   Masquerade Task or Service       Windows event logs\n",
       "2655                       Subvert Trust Controls       Windows event logs\n",
       "4078  File and Directory Permissions Modification       Windows event logs\n",
       "2720                        Unsecured Credentials       Windows event logs\n",
       "4022                      Inhibit System Recovery       Windows event logs\n",
       "3624                       System Shutdown/Reboot       Windows event logs\n",
       "3605                       Account Access Removal       Windows event logs\n",
       "2962                               Domain Account       Windows event logs\n",
       "4909                         Account Manipulation       Windows event logs\n",
       "3388              Create or Modify System Process       Windows event logs"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_analysis_data_no_ok[['technique','data_sources']].drop_duplicates().sort_values(by='data_sources',ascending=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Excluding these inconsistent rows, the final dataframe is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>matrix</th>\n",
       "      <th>platform</th>\n",
       "      <th>tactic</th>\n",
       "      <th>technique</th>\n",
       "      <th>technique_id</th>\n",
       "      <th>data_sources</th>\n",
       "      <th>Validation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device logs</td>\n",
       "      <td>OK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device run-time memory</td>\n",
       "      <td>OK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device command history</td>\n",
       "      <td>OK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>execution</td>\n",
       "      <td>Network Device CLI</td>\n",
       "      <td>T1059.008</td>\n",
       "      <td>Network device configuration</td>\n",
       "      <td>OK</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>mitre-attack</td>\n",
       "      <td>Network</td>\n",
       "      <td>collection</td>\n",
       "      <td>Network Device Configuration Dump</td>\n",
       "      <td>T1602.002</td>\n",
       "      <td>Netflow/Enclave netflow</td>\n",
       "      <td>OK</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         matrix platform      tactic                          technique  \\\n",
       "0  mitre-attack  Network   execution                 Network Device CLI   \n",
       "1  mitre-attack  Network   execution                 Network Device CLI   \n",
       "2  mitre-attack  Network   execution                 Network Device CLI   \n",
       "3  mitre-attack  Network   execution                 Network Device CLI   \n",
       "4  mitre-attack  Network  collection  Network Device Configuration Dump   \n",
       "\n",
       "  technique_id                    data_sources Validation  \n",
       "0    T1059.008             Network device logs         OK  \n",
       "1    T1059.008  Network device run-time memory         OK  \n",
       "2    T1059.008  Network device command history         OK  \n",
       "3    T1059.008    Network device configuration         OK  \n",
       "4    T1602.002         Netflow/Enclave netflow         OK  "
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "techniques_analysis_data_ok = techniques_data_3[techniques_data_3.Validation == 'OK']\n",
    "techniques_analysis_data_ok.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 6650 rows of data that you can play with\n"
     ]
    }
   ],
   "source": [
    "print('There are', len(techniques_analysis_data_ok), 'rows of data that you can play with')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 18. Getting Techniques by Data Sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
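    "`get_techniques_by_datasources` returns the techniques whose data sources include the ones you specify. As the calls below show, data source names can be written in any letter case, and several can be passed at once."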
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_source = 'PROCESS MONITORING'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = lift.get_techniques_by_datasources(data_source)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "320"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "list"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "results2 = lift.get_techniques_by_datasources('pRoceSS MoniTorinG','process commAnd-linE parameters')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "336"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(results2)"
   ]
  },
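  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The larger count for two data sources (336 versus 320) suggests that a technique is returned when it lists any of the requested data sources. Below is a minimal local sketch of that kind of case-insensitive matching. `filter_by_data_sources` and the sample records are hypothetical; this is not attackcti's implementation, which may match differently (for example on substrings) or return duplicates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper for illustration only; not part of attackcti.\n",
    "def filter_by_data_sources(techniques, *data_sources):\n",
    "    # Normalize the requested names once so that letter case is ignored\n",
    "    wanted = {ds.lower() for ds in data_sources}\n",
    "    matches = []\n",
    "    for technique in techniques:\n",
    "        sources = technique.get('x_mitre_data_sources', [])\n",
    "        # Keep the technique if ANY of its data sources was requested\n",
    "        if any(source.lower() in wanted for source in sources):\n",
    "            matches.append(technique)\n",
    "    return matches\n",
    "\n",
    "# Made-up sample records that only mimic the fields used above\n",
    "sample = [\n",
    "    {'name': 'Sample A', 'x_mitre_data_sources': ['Process monitoring', 'Windows Registry']},\n",
    "    {'name': 'Sample B', 'x_mitre_data_sources': ['Process command-line parameters']},\n",
    "    {'name': 'Sample C', 'x_mitre_data_sources': ['Netflow/Enclave netflow']},\n",
    "]\n",
    "\n",
    "[t['name'] for t in filter_by_data_sources(sample, 'pRoceSS MoniTorinG', 'process commAnd-linE parameters')]"
   ]
  },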
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AttackPattern(type='attack-pattern', id='attack-pattern--2de47683-f398-448f-b947-9abcc3e32fad', created_by_ref='identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5', created='2020-10-05T13:24:49.780Z', modified='2020-10-09T16:05:36.344Z', name='Print Processors', description='Adversaries may abuse print processors to run malicious DLLs during system boot for persistence and/or privilege escalation. Print processors are DLLs that are loaded by the print spooler service, spoolsv.exe, during boot. \\n\\nAdversaries may abuse the print spooler service by adding print processors that load malicious DLLs at startup. A print processor can be installed through the <code>AddPrintProcessor</code> API call with an account that has <code>SeLoadDriverPrivilege</code> enabled. Alternatively, a print processor can be registered to the print spooler service by adding the <code>HKLM\\\\SYSTEM\\\\\\\\[CurrentControlSet or ControlSet001]\\\\Control\\\\Print\\\\Environments\\\\\\\\[Windows architecture: e.g., Windows x64]\\\\Print Processors\\\\\\\\[user defined]\\\\Driver</code> Registry key that points to the DLL. For the print processor to be correctly installed, it must be located in the system print-processor directory that can be found with the <code>GetPrintProcessorDirectory</code> API call.(Citation: Microsoft AddPrintProcessor May 2018) After the print processors are installed, the print spooler service, which starts during boot, must be restarted in order for them to run.(Citation: ESET PipeMon May 2020) The print spooler service runs under SYSTEM level permissions, therefore print processors installed by an adversary may run under elevated privileges.', kill_chain_phases=[KillChainPhase(kill_chain_name='mitre-attack', phase_name='persistence'), KillChainPhase(kill_chain_name='mitre-attack', phase_name='privilege-escalation')], external_references=[ExternalReference(source_name='mitre-attack', url='https://attack.mitre.org/techniques/T1547/012', external_id='T1547.012'), ExternalReference(source_name='Microsoft AddPrintProcessor May 2018', description='Microsoft. (2018, May 31). AddPrintProcessor function. Retrieved October 5, 2020.', url='https://docs.microsoft.com/en-us/windows/win32/printdocs/addprintprocessor'), ExternalReference(source_name='ESET PipeMon May 2020', description='Tartare, M. et al. (2020, May 21). No “Game over” for the Winnti Group. Retrieved August 24, 2020.', url='https://www.welivesecurity.com/2020/05/21/no-game-over-winnti-group/')], object_marking_refs=['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'], x_mitre_contributors=['Mathieu Tartare, ESET'], x_mitre_data_sources=['Process monitoring', 'Windows Registry', 'File monitoring', 'DLL monitoring', 'API monitoring'], x_mitre_detection='Monitor process API calls to <code>AddPrintProcessor</code> and <code>GetPrintProcessorDirectory</code>. New print processor DLLs are written to the print processor directory. Also monitor Registry writes to <code>HKLM\\\\SYSTEM\\\\ControlSet001\\\\Control\\\\Print\\\\Environments\\\\\\\\[Windows architecture]\\\\Print Processors\\\\\\\\[user defined]\\\\\\\\Driver</code> or <code>HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Control\\\\Print\\\\Environments\\\\\\\\[Windows architecture]\\\\Print Processors\\\\\\\\[user defined]\\\\Driver</code> as they pertain to print processor installations.\\n\\nMonitor for abnormal DLLs that are loaded by spoolsv.exe. Print processors that do not correlate with known good software or patching may be suspicious.', x_mitre_is_subtechnique=True, x_mitre_permissions_required=['Administrator', 'SYSTEM'], x_mitre_platforms=['Windows'], x_mitre_version='1.0')"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results2[1]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}