{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# **MITRE ATT&CK PYTHON CLIENT**: Data Sources\n", "------------------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goals:\n", "* Access ATT&CK data sources in STIX format via a public TAXII server\n", "* Learn to interact with ATT&CK data all at once\n", "* Explore and identify patterns in the data retrieved\n", "* Learn more about ATT&CK data sources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. ATT&CK Python Client Installation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can install it via pip: **pip install attackcti**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Import ATT&CK API Client" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from attackcti import attack_client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Import Extra Libraries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas  # later cells call pandas.json_normalize and pandas.DataFrame by module name\n", "from pandas import *\n", "import numpy as np\n", "\n", "import altair as alt\n", "alt.renderers.enable('notebook')\n", "\n", "import itertools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Initialize ATT&CK Client Class" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "lift = attack_client()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
Getting Information About Techniques" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Getting ALL ATT&CK Techniques" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "all_techniques = lift.get_techniques(stix_format=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Showing the first technique in our list" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'external_references': [{'source_name': 'mitre-attack',\n", " 'external_id': 'T1059.008',\n", " 'url': 'https://attack.mitre.org/techniques/T1059/008'},\n", " {'source_name': 'Cisco Synful Knock Evolution',\n", " 'url': 'https://blogs.cisco.com/security/evolution-of-attacks-on-cisco-ios-devices',\n", " 'description': 'Graham Holmes. (2015, October 8). Evolution of attacks on Cisco IOS devices. Retrieved October 19, 2020.'},\n", " {'source_name': 'Cisco IOS Software Integrity Assurance - Command History',\n", " 'url': 'https://tools.cisco.com/security/center/resources/integrity_assurance.html#23',\n", " 'description': 'Cisco. (n.d.). Cisco IOS Software Integrity Assurance - Command History. Retrieved October 21, 2020.'}],\n", " 'kill_chain_phases': [{'kill_chain_name': 'mitre-attack',\n", " 'phase_name': 'execution'}],\n", " 'x_mitre_is_subtechnique': True,\n", " 'x_mitre_version': '1.0',\n", " 'id': 'attack-pattern--818302b2-d640-477b-bf88-873120ce85c4',\n", " 'technique_description': 'Adversaries may abuse scripting or built-in command line interpreters (CLI) on network devices to execute malicious command and payloads. The CLI is the primary means through which users and administrators interact with the device in order to view system information, modify device operations, or perform diagnostic and administrative functions. CLIs typically contain various permission levels required for different commands. 
\\n\\nScripting interpreters automate tasks and extend functionality beyond the command set included in the network OS. The CLI and scripting interpreter are accessible through a direct console connection, or through remote means, such as telnet or secure shell (SSH).\\n\\nAdversaries can use the network CLI to change how network devices behave and operate. The CLI may be used to manipulate traffic flows to intercept or manipulate data, modify startup configuration parameters to load malicious system software, or to disable security features or logging to avoid detection. (Citation: Cisco Synful Knock Evolution)',\n", " 'technique': 'Network Device CLI',\n", " 'created_by_ref': 'identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5',\n", " 'object_marking_refs': ['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'],\n", " 'url': 'https://attack.mitre.org/techniques/T1059/008',\n", " 'matrix': 'mitre-attack',\n", " 'technique_id': 'T1059.008',\n", " 'type': 'attack-pattern',\n", " 'tactic': ['execution'],\n", " 'modified': '2020-10-22T16:43:38.388Z',\n", " 'created': '2020-10-20T00:09:33.072Z',\n", " 'data_sources': ['Network device logs',\n", " 'Network device run-time memory',\n", " 'Network device command history',\n", " 'Network device configuration'],\n", " 'platform': ['Network'],\n", " 'technique_detection': 'Consider reviewing command history in either the console or as part of the running memory to determine if unauthorized or suspicious commands were used to modify device configuration.(Citation: Cisco IOS Software Integrity Assurance - Command History)\\n\\nConsider comparing a copy of the network device configuration against a known-good version to discover unauthorized changes to the command interpreter. 
The same process can be accomplished through a comparison of the run-time memory, though this is non-trivial and may require assistance from the vendor.',\n", " 'permissions_required': ['Administrator', 'User']}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_techniques[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normalizing the semi-structured JSON data into a flat table via **pandas.json_normalize** (named **pandas.io.json.json_normalize** in older pandas releases)\n", "* Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "techniques_normalized = pandas.json_normalize(all_techniques)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " external_references \\\n", "0 [{'source_name': 'mitre-attack', 'external_id'... \n", "\n", " kill_chain_phases x_mitre_is_subtechnique \\\n", "0 [{'kill_chain_name': 'mitre-attack', 'phase_na... True \n", "\n", " x_mitre_version id \\\n", "0 1.0 attack-pattern--818302b2-d640-477b-bf88-873120... \n", "\n", " technique_description technique \\\n", "0 Adversaries may abuse scripting or built-in co... Network Device CLI \n", "\n", " created_by_ref \\\n", "0 identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5 \n", "\n", " object_marking_refs \\\n", "0 [marking-definition--fa42a846-8d90-4e51-bc29-7... \n", "\n", " url ... remote_support \\\n", "0 https://attack.mitre.org/techniques/T1059/008 ... NaN \n", "\n", " impact_type revoked x_mitre_deprecated x_mitre_old_attack_id \\\n", "0 NaN NaN NaN NaN \n", "\n", " difficulty_explanation difficulty_for_adversary detectable_explanation \\\n", "0 NaN NaN NaN \n", "\n", " detectable_by_common_defenses tactic_type \n", "0 NaN NaN \n", "\n", "[1 rows x 37 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_normalized[0:1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Re-indexing Dataframe" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "techniques = techniques_normalized.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " matrix platform tactic \\\n", "0 mitre-attack [Network] [execution] \n", "1 mitre-attack [Network] [collection] \n", "2 mitre-attack [Network] [defense-evasion, persistence] \n", "3 mitre-attack [Network] [defense-evasion, persistence] \n", "4 mitre-attack [Network] [collection] \n", "\n", " technique technique_id \\\n", "0 Network Device CLI T1059.008 \n", "1 Network Device Configuration Dump T1602.002 \n", "2 TFTP Boot T1542.005 \n", "3 ROMMONkit T1542.004 \n", "4 SNMP (MIB Dump) T1602.001 \n", "\n", " data_sources \n", "0 [Network device logs, Network device run-time ... \n", "1 [Netflow/Enclave netflow, Network protocol ana... \n", "2 [Network device run-time memory, Network devic... \n", "3 [File monitoring, Netflow/Enclave netflow, Net... \n", "4 [Netflow/Enclave netflow, Network protocol ana... " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques.head()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A total of 1024 techniques\n" ] } ], "source": [ "print('A total of ',len(techniques),' techniques')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Removing Revoked Techniques" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "all_techniques_no_revoked = lift.remove_revoked(all_techniques)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A total of 878 techniques\n" ] } ], "source": [ "print('A total of ',len(all_techniques_no_revoked),' techniques')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. 
Extracting Revoked Techniques" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "all_techniques_revoked = lift.extract_revoked(all_techniques)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A total of 146 techniques that have been revoked\n" ] } ], "source": [ "print('A total of ',len(all_techniques_revoked),' techniques that have been revoked')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The revoked techniques are the following:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Web Session Cookie\n", "Emond\n", "Cloud Instance Metadata API\n", "Revert Cloud Instance\n", "Application Access Token\n", "Elevated Execution with Prompt\n", "Credentials from Web Browsers\n", "PowerShell Profile\n", "Parent PID Spoofing\n", "Compile After Delivery\n", "Systemd Service\n", "Runtime Data Manipulation\n", "Transmitted Data Manipulation\n", "Stored Data Manipulation\n", "Disk Content Wipe\n", "Disk Structure Wipe\n", "Domain Generation Algorithms\n", "Compiled HTML File\n", "Kernel Modules and Extensions\n", "Spearphishing Link\n", "CMSTP\n", "Credentials in Registry\n", "Control Panel Items\n", "Kerberoasting\n", "Spearphishing Attachment\n", "SIP and Trust Provider Hijacking\n", "Spearphishing via Service\n", "Sudo Caching\n", "Time Providers\n", "AppCert DLLs\n", "Dynamic Data Exchange\n", "Multi-hop Proxy\n", "Process Doppelgänging\n", "Extra Window Memory Injection\n", "Domain Fronting\n", "Mshta\n", "Hooking\n", "Image File Execution Options Injection\n", "LSASS Driver\n", "Screensaver\n", "LLMNR/NBT-NS Poisoning and Relay\n", "Password Filter DLL\n", "SSH Hijacking\n", "SID-History Injection\n", "Gatekeeper Bypass\n", "HISTCONTROL\n", "LC_LOAD_DYLIB Addition\n", "Launchctl\n", "Local Job Scheduling\n", "Private Keys\n", 
"Rc.common\n", "Space after Filename\n", "Application Shimming\n", "AppleScript\n", "Bash History\n", ".bash_profile and .bashrc\n", "Clear Command History\n", "Dylib Hijacking\n", "Hidden Window\n", "Launch Daemon\n", "Hidden Users\n", "Input Prompt\n", "Launch Agent\n", "Login Item\n", "Keychain\n", "Plist Modification\n", "Re-opened Applications\n", "Setuid and Setgid\n", "Hidden Files and Directories\n", "Startup Items\n", "Sudo\n", "Securityd Memory\n", "Trap\n", "Authentication Package\n", "Install Root Certificate\n", "Netsh Helper DLL\n", "Network Share Connection Removal\n", "Component Object Model Hijacking\n", "Regsvcs/Regasm\n", "InstallUtil\n", "Regsvr32\n", "Code Signing\n", "Component Firmware\n", "File Deletion\n", "AppInit DLLs\n", "Security Support Provider\n", "Web Shell\n", "Timestomp\n", "Pass the Ticket\n", "NTFS File Attributes\n", "Custom Command and Control Protocol\n", "Process Hollowing\n", "Disabling Security Tools\n", "Bypass User Account Control\n", "PowerShell\n", "Rundll32\n", "Windows Management Instrumentation Event Subscription\n", "Credentials in Files\n", "Multilayer Encryption\n", "Windows Admin Shares\n", "Remote Desktop Protocol\n", "Pass the Hash\n", "DLL Side-Loading\n", "Bootkit\n", "Indicator Removal from Tools\n", "Uncommonly Used Port\n", "Security Software Discovery\n", "Registry Run Keys / Startup Folder\n", "Service Registry Permissions Weakness\n", "Indicator Blocking\n", "New Service\n", "Software Packing\n", "File System Permissions Weakness\n", "Change Default File Association\n", "DLL Search Order Hijacking\n", "Service Execution\n", "Standard Cryptographic Protocol\n", "Modify Existing Service\n", "Windows Remote Management\n", "Custom Cryptographic Protocol\n", "Shortcut Modification\n", "Data Encrypted\n", "System Firmware\n", "Application Deployment Software\n", "Accessibility Features\n", "Port Monitors\n", "Binary Padding\n", "Winlogon Helper DLL\n", "Data Compressed\n", "Remotely Install Application\n", 
"Insecure Third-Party Libraries\n", "Fake Developer Accounts\n", "Device Type Discovery\n", "Detect App Analysis Environment\n", "Malicious Software Development Tools\n", "Biometric Spoofing\n", "Device Unlock Code Guessing or Brute Force\n", "Malicious Media Content\n", "URL Scheme Hijacking\n", "Abuse of iOS Enterprise App Signing Key\n", "App Delivered via Web Download\n", "App Delivered via Email Attachment\n", "Malicious or Vulnerable Built-in Device Functionality\n", "Malicious SMS Message\n", "Exploit Baseband Vulnerability\n", "Stolen Developer Credentials or Signing Keys\n" ] } ], "source": [ "for t in all_techniques_revoked:\n", " print(t['technique'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Updating our Dataframe" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "techniques_normalized = pandas.json_normalize(all_techniques_no_revoked)\n", "techniques = techniques_normalized.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10. Techniques Per Matrix\n", "Using **altair** python library we can start showing a few charts stacking the number of techniques with or without data sources.\n", "Reference: https://altair-viz.github.io/" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " matrix technique\n", "0 mitre-attack 536\n", "1 mitre-ics-attack 81\n", "2 mitre-mobile-attack 87\n", "3 mitre-pre-attack 174" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = techniques\n", "data_2 = data.groupby(['matrix'])['technique'].count()\n", "data_3 = data_2.to_frame().reset_index()\n", "data_3" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-184270221c81652fd0426411b5cc8a9f\"}, \"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"matrix\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"technique\"}, \"y\": {\"type\": \"nominal\", \"field\": \"matrix\"}}, \"height\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-184270221c81652fd0426411b5cc8a9f\": [{\"matrix\": \"mitre-attack\", \"technique\": 536}, {\"matrix\": \"mitre-ics-attack\", \"technique\": 81}, {\"matrix\": \"mitre-mobile-attack\", \"technique\": 87}, {\"matrix\": \"mitre-pre-attack\", \"technique\": 174}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"f4ebc20d-c16f-4ec6-90d8-ee18f2e853b3\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // 
IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#f4ebc20d-c16f-4ec6-90d8-ee18f2e853b3" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(data_3).mark_bar().encode(x='technique', y='matrix', color='matrix').properties(height = 200)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11. Techniques With and Without Data Sources" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Techniques\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 200}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Techniques\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Techniques\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 200}], \"data\": {\"name\": \"data-bf80216faf3e46fa0916c0fe5230113d\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-bf80216faf3e46fa0916c0fe5230113d\": [{\"Techniques\": \"Without DS\", \"Count of Techniques\": 337}, {\"Techniques\": \"With DS\", \"Count of 
Techniques\": 541}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"82e36f62-3e49-41ca-a2a1-11888ec68245\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#82e36f62-3e49-41ca-a2a1-11888ec68245" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_source_distribution = pandas.DataFrame({\n", " 'Techniques': ['Without DS','With DS'],\n", " 'Count of Techniques': [techniques['data_sources'].isna().sum(),techniques['data_sources'].notna().sum()]})\n", "bars = alt.Chart(data_source_distribution).mark_bar().encode(x='Techniques',y='Count of Techniques',color='Techniques').properties(width=200,height=300)\n", "text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')\n", "bars + text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the distribution of techniques based on ATT&CK Matrix?" 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " matrix Ind_DS technique\n", "0 mitre-attack With DS 474\n", "1 mitre-attack Without DS 62\n", "2 mitre-ics-attack With DS 67\n", "3 mitre-ics-attack Without DS 14\n", "4 mitre-mobile-attack Without DS 87\n", "5 mitre-pre-attack Without DS 174" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = techniques\n", "data['Count_DS'] = data['data_sources'].str.len()\n", "data['Ind_DS'] = np.where(data['Count_DS']>0,'With DS','Without DS')\n", "data_2 = data.groupby(['matrix','Ind_DS'])['technique'].count()\n", "data_3 = data_2.to_frame().reset_index()\n", "data_3" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-b034731fd80e42eb889ae43ae9d0d467\"}, \"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"matrix\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"technique\"}, \"y\": {\"type\": \"nominal\", \"field\": \"Ind_DS\"}}, \"height\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-b034731fd80e42eb889ae43ae9d0d467\": [{\"matrix\": \"mitre-attack\", \"Ind_DS\": \"With DS\", \"technique\": 474}, {\"matrix\": \"mitre-attack\", \"Ind_DS\": \"Without DS\", \"technique\": 62}, {\"matrix\": \"mitre-ics-attack\", \"Ind_DS\": \"With DS\", \"technique\": 67}, {\"matrix\": \"mitre-ics-attack\", \"Ind_DS\": \"Without DS\", \"technique\": 14}, {\"matrix\": \"mitre-mobile-attack\", \"Ind_DS\": \"Without DS\", \"technique\": 87}, {\"matrix\": \"mitre-pre-attack\", \"Ind_DS\": \"Without DS\", \"technique\": 174}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"c41580a3-6a3e-472f-86a6-5b5a975349cb\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " 
const target = document.createElement(\"div\");\n", " target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#c41580a3-6a3e-472f-86a6-5b5a975349cb" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(data_3).mark_bar().encode(x='technique', y='Ind_DS', color='matrix').properties(height = 200)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What are those mitre-attack techniques without data sources?" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " matrix platform \\\n", "17 mitre-attack [PRE] \n", "23 mitre-attack [PRE] \n", "25 mitre-attack [PRE] \n", "26 mitre-attack [PRE] \n", "27 mitre-attack [PRE] \n", ".. ... ... \n", "90 mitre-attack [PRE] \n", "92 mitre-attack [PRE] \n", "220 mitre-attack [Linux, macOS, Windows] \n", "260 mitre-attack [Linux] \n", "354 mitre-attack [Linux, macOS, Windows] \n", "\n", " tactic technique \\\n", "17 [resource-development] Vulnerabilities \n", "23 [reconnaissance] Spearphishing Service \n", "25 [reconnaissance] Purchase Technical Data \n", "26 [reconnaissance] Threat Intel Vendors \n", "27 [reconnaissance] Search Closed Sources \n", ".. ... ... \n", "90 [resource-development] Compromise Infrastructure \n", "92 [resource-development] Acquire Infrastructure \n", "220 [collection] Archive via Custom Method \n", "260 [credential-access] /etc/passwd and /etc/shadow \n", "354 [persistence, privilege-escalation] Boot or Logon Autostart Execution \n", "\n", " technique_id data_sources Count_DS Ind_DS \n", "17 T1588.006 NaN NaN Without DS \n", "23 T1598.001 NaN NaN Without DS \n", "25 T1597.002 NaN NaN Without DS \n", "26 T1597.001 NaN NaN Without DS \n", "27 T1597 NaN NaN Without DS \n", ".. ... ... ... ... 
\n", "90 T1584 NaN NaN Without DS \n", "92 T1583 NaN NaN Without DS \n", "220 T1560.003 NaN NaN Without DS \n", "260 T1003.008 NaN NaN Without DS \n", "354 T1547 NaN NaN Without DS \n", "\n", "[62 rows x 8 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[(data['matrix']=='mitre-attack') & (data['Ind_DS']=='Without DS')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Techniques without data sources" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "techniques_without_data_sources=techniques[techniques.data_sources.isnull()].reset_index(drop=True)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " matrix platform tactic technique \\\n", "0 mitre-attack [PRE] [resource-development] Vulnerabilities \n", "1 mitre-attack [PRE] [reconnaissance] Spearphishing Service \n", "2 mitre-attack [PRE] [reconnaissance] Purchase Technical Data \n", "3 mitre-attack [PRE] [reconnaissance] Threat Intel Vendors \n", "4 mitre-attack [PRE] [reconnaissance] Search Closed Sources \n", "\n", " technique_id data_sources Count_DS Ind_DS \n", "0 T1588.006 NaN NaN Without DS \n", "1 T1598.001 NaN NaN Without DS \n", "2 T1597.002 NaN NaN Without DS \n", "3 T1597.001 NaN NaN Without DS \n", "4 T1597 NaN NaN Without DS " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_without_data_sources.head()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 337 techniques without data sources ( 38% of 878 techniques)\n" ] } ], "source": [ "print('There are ',techniques['data_sources'].isna().sum(),' techniques without data sources (',\"{0:.0%}\".format(techniques['data_sources'].isna().sum()/len(techniques)),' of ',len(techniques),' techniques)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Techniques With Data Sources" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "techniques_with_data_sources=techniques[techniques.data_sources.notnull()].reset_index(drop=True)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>matrix</th><th>platform</th><th>tactic</th><th>technique</th><th>technique_id</th><th>data_sources</th><th>Count_DS</th><th>Ind_DS</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>mitre-attack</td><td>[Network]</td><td>[execution]</td><td>Network Device CLI</td><td>T1059.008</td><td>[Network device logs, Network device run-time ...</td><td>4.0</td><td>With DS</td></tr>\n",
" <tr><th>1</th><td>mitre-attack</td><td>[Network]</td><td>[collection]</td><td>Network Device Configuration Dump</td><td>T1602.002</td><td>[Netflow/Enclave netflow, Network protocol ana...</td><td>3.0</td><td>With DS</td></tr>\n",
" <tr><th>2</th><td>mitre-attack</td><td>[Network]</td><td>[defense-evasion, persistence]</td><td>TFTP Boot</td><td>T1542.005</td><td>[Network device run-time memory, Network devic...</td><td>5.0</td><td>With DS</td></tr>\n",
" <tr><th>3</th><td>mitre-attack</td><td>[Network]</td><td>[defense-evasion, persistence]</td><td>ROMMONkit</td><td>T1542.004</td><td>[File monitoring, Netflow/Enclave netflow, Net...</td><td>4.0</td><td>With DS</td></tr>\n",
" <tr><th>4</th><td>mitre-attack</td><td>[Network]</td><td>[collection]</td><td>SNMP (MIB Dump)</td><td>T1602.001</td><td>[Netflow/Enclave netflow, Network protocol ana...</td><td>3.0</td><td>With DS</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " matrix platform tactic \\\n", "0 mitre-attack [Network] [execution] \n", "1 mitre-attack [Network] [collection] \n", "2 mitre-attack [Network] [defense-evasion, persistence] \n", "3 mitre-attack [Network] [defense-evasion, persistence] \n", "4 mitre-attack [Network] [collection] \n", "\n", " technique technique_id \\\n", "0 Network Device CLI T1059.008 \n", "1 Network Device Configuration Dump T1602.002 \n", "2 TFTP Boot T1542.005 \n", "3 ROMMONkit T1542.004 \n", "4 SNMP (MIB Dump) T1602.001 \n", "\n", " data_sources Count_DS Ind_DS \n", "0 [Network device logs, Network device run-time ... 4.0 With DS \n", "1 [Netflow/Enclave netflow, Network protocol ana... 3.0 With DS \n", "2 [Network device run-time memory, Network devic... 5.0 With DS \n", "3 [File monitoring, Netflow/Enclave netflow, Net... 4.0 With DS \n", "4 [Netflow/Enclave netflow, Network protocol ana... 3.0 With DS " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_with_data_sources.head()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 541 techniques with data sources ( 62% of 878 techniques)\n" ] } ], "source": [ "print('There are ',techniques['data_sources'].notna().sum(),' techniques with data sources (',\"{0:.0%}\".format(techniques['data_sources'].notna().sum()/len(techniques)),' of ',len(techniques),' techniques)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12. 
Grouping Techniques With Data Sources By Matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a graph to represent the number of techniques per matrix:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"x\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"y\": {\"type\": \"nominal\", \"field\": \"Matrix\"}}, \"height\": 100, \"width\": 300}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 10, \"dy\": 0}, \"encoding\": {\"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"y\": {\"type\": \"nominal\", \"field\": \"Matrix\"}}, \"height\": 100, \"width\": 300}], \"data\": {\"name\": \"data-fb2770765a9a1c165be37278cc07fa93\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-fb2770765a9a1c165be37278cc07fa93\": [{\"Matrix\": \"mitre-attack\", \"Count of Techniques\": 474}, {\"Matrix\": \"mitre-ics-attack\", \"Count of Techniques\": 67}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"550c9a4e-6e47-4b38-b24f-ccdb98b73f04\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see 
http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#550c9a4e-6e47-4b38-b24f-ccdb98b73f04" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "matrix_distribution = pandas.DataFrame({\n", " 'Matrix': list(techniques_with_data_sources.groupby(['matrix'])['matrix'].count().keys()),\n", " 'Count of Techniques': techniques_with_data_sources.groupby(['matrix'])['matrix'].count().tolist()})\n", "bars = alt.Chart(matrix_distribution).mark_bar().encode(y='Matrix',x='Count of Techniques').properties(width=300,height=100)\n", "text = bars.mark_text(align='center',baseline='middle',dx=10,dy=0).encode(text='Count of Techniques')\n", "bars + text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most techniques with data sources (474) belong to the **mitre-attack** matrix, which is the main **Enterprise** matrix; the remaining 67 belong to the **mitre-ics-attack** matrix. Reference: https://attack.mitre.org/wiki/Main_Page " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 13. 
Grouping Techniques With Data Sources by Platform" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to split the **platform** column values because a technique might be mapped to more than one platform:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "techniques_platform=techniques_with_data_sources\n", "\n", "attributes_1 = ['platform'] # Names of the list-valued columns that we need to split\n", "\n", "for a in attributes_1:\n", " s = techniques_platform.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n", " # \"s\" is a column holding every value of the list inside each cell of the column \"a\", one value per row\n", " s.name = a\n", " # We give \"s\" the same name as \"a\".\n", " techniques_platform=techniques_platform.drop(a, axis=1).join(s).reset_index(drop=True)\n", " # We drop the column \"a\" from \"techniques_platform\", and then join \"techniques_platform\" with \"s\"\n", "\n", "# Let's re-arrange the columns from general to specific\n", "techniques_platform_2=techniques_platform.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now show techniques with data sources mapped to one platform at a time:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>matrix</th><th>platform</th><th>tactic</th><th>technique</th><th>technique_id</th><th>data_sources</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>mitre-attack</td><td>Network</td><td>[execution]</td><td>Network Device CLI</td><td>T1059.008</td><td>[Network device logs, Network device run-time ...</td></tr>\n",
" <tr><th>1</th><td>mitre-attack</td><td>Network</td><td>[collection]</td><td>Network Device Configuration Dump</td><td>T1602.002</td><td>[Netflow/Enclave netflow, Network protocol ana...</td></tr>\n",
" <tr><th>2</th><td>mitre-attack</td><td>Network</td><td>[defense-evasion, persistence]</td><td>TFTP Boot</td><td>T1542.005</td><td>[Network device run-time memory, Network devic...</td></tr>\n",
" <tr><th>3</th><td>mitre-attack</td><td>Network</td><td>[defense-evasion, persistence]</td><td>ROMMONkit</td><td>T1542.004</td><td>[File monitoring, Netflow/Enclave netflow, Net...</td></tr>\n",
" <tr><th>4</th><td>mitre-attack</td><td>Network</td><td>[collection]</td><td>SNMP (MIB Dump)</td><td>T1602.001</td><td>[Netflow/Enclave netflow, Network protocol ana...</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " matrix platform tactic \\\n", "0 mitre-attack Network [execution] \n", "1 mitre-attack Network [collection] \n", "2 mitre-attack Network [defense-evasion, persistence] \n", "3 mitre-attack Network [defense-evasion, persistence] \n", "4 mitre-attack Network [collection] \n", "\n", " technique technique_id \\\n", "0 Network Device CLI T1059.008 \n", "1 Network Device Configuration Dump T1602.002 \n", "2 TFTP Boot T1542.005 \n", "3 ROMMONkit T1542.004 \n", "4 SNMP (MIB Dump) T1602.001 \n", "\n", " data_sources \n", "0 [Network device logs, Network device run-time ... \n", "1 [Netflow/Enclave netflow, Network protocol ana... \n", "2 [Network device run-time memory, Network devic... \n", "3 [File monitoring, Netflow/Enclave netflow, Net... \n", "4 [Netflow/Enclave netflow, Network protocol ana... " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_platform_2.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a visualization to show the number of techniques grouped by platform:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Platform\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Platform\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 200}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Platform\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Platform\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 
200}], \"data\": {\"name\": \"data-94eeddf8fc5f36e972721aadcb2c794d\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-94eeddf8fc5f36e972721aadcb2c794d\": [{\"Platform\": \"AWS\", \"Count of Techniques\": 53}, {\"Platform\": \"Azure\", \"Count of Techniques\": 53}, {\"Platform\": \"Azure AD\", \"Count of Techniques\": 30}, {\"Platform\": \"Control Server\", \"Count of Techniques\": 23}, {\"Platform\": \"Data Historian\", \"Count of Techniques\": 12}, {\"Platform\": \"Engineering Workstation\", \"Count of Techniques\": 13}, {\"Platform\": \"Field Controller/RTU/PLC/IED\", \"Count of Techniques\": 38}, {\"Platform\": \"GCP\", \"Count of Techniques\": 53}, {\"Platform\": \"Human-Machine Interface\", \"Count of Techniques\": 25}, {\"Platform\": \"Input/Output Server\", \"Count of Techniques\": 6}, {\"Platform\": \"Linux\", \"Count of Techniques\": 252}, {\"Platform\": \"Network\", \"Count of Techniques\": 28}, {\"Platform\": \"Office 365\", \"Count of Techniques\": 51}, {\"Platform\": \"PRE\", \"Count of Techniques\": 14}, {\"Platform\": \"SaaS\", \"Count of Techniques\": 35}, {\"Platform\": \"Safety Instrumented System/Protection Relay\", \"Count of Techniques\": 18}, {\"Platform\": \"Windows\", \"Count of Techniques\": 435}, {\"Platform\": \"macOS\", \"Count of Techniques\": 265}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"91350139-e783-4480-84ab-a442dc283743\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see 
http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#91350139-e783-4480-84ab-a442dc283743" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "platform_distribution = pandas.DataFrame({\n", " 'Platform': list(techniques_platform_2.groupby(['platform'])['platform'].count().keys()),\n", " 'Count of Techniques': techniques_platform_2.groupby(['platform'])['platform'].count().tolist()})\n", "bars = alt.Chart(platform_distribution,height=300).mark_bar().encode(x ='Platform',y='Count of Techniques',color='Platform').properties(width=200)\n", "text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')\n", "bars + text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The bar chart above shows that Windows has the most techniques with data sources (435), followed by macOS (265) and Linux (252)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 14. 
Grouping Techniques With Data Sources by Tactic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, first we need to split the tactic column values because a technique might be mapped to more than one tactic:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "techniques_tactic=techniques_with_data_sources\n", "\n", "attributes_2 = ['tactic'] # Names of the list-valued columns that we need to split\n", "\n", "for a in attributes_2:\n", " s = techniques_tactic.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n", " # \"s\" is a column holding every value of the list inside each cell of the column \"a\", one value per row\n", " s.name = a\n", " # We give \"s\" the same name as \"a\".\n", " techniques_tactic = techniques_tactic.drop(a, axis=1).join(s).reset_index(drop=True)\n", " # We drop the column \"a\" from \"techniques_tactic\", and then join \"techniques_tactic\" with \"s\"\n", "\n", "# Let's re-arrange the columns from general to specific\n", "techniques_tactic_2=techniques_tactic.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now show techniques with data sources mapped to one tactic at a time:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>matrix</th><th>platform</th><th>tactic</th><th>technique</th><th>technique_id</th><th>data_sources</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>mitre-attack</td><td>[Network]</td><td>execution</td><td>Network Device CLI</td><td>T1059.008</td><td>[Network device logs, Network device run-time ...</td></tr>\n",
" <tr><th>1</th><td>mitre-attack</td><td>[Network]</td><td>collection</td><td>Network Device Configuration Dump</td><td>T1602.002</td><td>[Netflow/Enclave netflow, Network protocol ana...</td></tr>\n",
" <tr><th>2</th><td>mitre-attack</td><td>[Network]</td><td>defense-evasion</td><td>TFTP Boot</td><td>T1542.005</td><td>[Network device run-time memory, Network devic...</td></tr>\n",
" <tr><th>3</th><td>mitre-attack</td><td>[Network]</td><td>persistence</td><td>TFTP Boot</td><td>T1542.005</td><td>[Network device run-time memory, Network devic...</td></tr>\n",
" <tr><th>4</th><td>mitre-attack</td><td>[Network]</td><td>defense-evasion</td><td>ROMMONkit</td><td>T1542.004</td><td>[File monitoring, Netflow/Enclave netflow, Net...</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " matrix platform tactic \\\n", "0 mitre-attack [Network] execution \n", "1 mitre-attack [Network] collection \n", "2 mitre-attack [Network] defense-evasion \n", "3 mitre-attack [Network] persistence \n", "4 mitre-attack [Network] defense-evasion \n", "\n", " technique technique_id \\\n", "0 Network Device CLI T1059.008 \n", "1 Network Device Configuration Dump T1602.002 \n", "2 TFTP Boot T1542.005 \n", "3 TFTP Boot T1542.005 \n", "4 ROMMONkit T1542.004 \n", "\n", " data_sources \n", "0 [Network device logs, Network device run-time ... \n", "1 [Netflow/Enclave netflow, Network protocol ana... \n", "2 [Network device run-time memory, Network devic... \n", "3 [Network device run-time memory, Network devic... \n", "4 [File monitoring, Netflow/Enclave netflow, Net... " ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_tactic_2.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a visualization to show the number of techniques grouped by tactic:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Tactic\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Tactic\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 400}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Tactic\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Tactic\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 400}], \"data\": {\"name\": 
\"data-a36a295299fa7b623bea39cbd6dc16e5\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-a36a295299fa7b623bea39cbd6dc16e5\": [{\"Tactic\": \"command-and-control-ics\", \"Count of Techniques\": 2}, {\"Tactic\": \"lateral-movement-ics\", \"Count of Techniques\": 5}, {\"Tactic\": \"persistence-ics\", \"Count of Techniques\": 6}, {\"Tactic\": \"resource-development\", \"Count of Techniques\": 7}, {\"Tactic\": \"discovery-ics\", \"Count of Techniques\": 7}, {\"Tactic\": \"evasion-ics\", \"Count of Techniques\": 7}, {\"Tactic\": \"reconnaissance\", \"Count of Techniques\": 7}, {\"Tactic\": \"execution-ics\", \"Count of Techniques\": 8}, {\"Tactic\": \"initial-access-ics\", \"Count of Techniques\": 9}, {\"Tactic\": \"collection-ics\", \"Count of Techniques\": 11}, {\"Tactic\": \"impair-process-control\", \"Count of Techniques\": 11}, {\"Tactic\": \"inhibit-response-function\", \"Count of Techniques\": 15}, {\"Tactic\": \"exfiltration\", \"Count of Techniques\": 17}, {\"Tactic\": \"initial-access\", \"Count of Techniques\": 19}, {\"Tactic\": \"lateral-movement\", \"Count of Techniques\": 23}, {\"Tactic\": \"impact\", \"Count of Techniques\": 26}, {\"Tactic\": \"execution\", \"Count of Techniques\": 34}, {\"Tactic\": \"collection\", \"Count of Techniques\": 34}, {\"Tactic\": \"discovery\", \"Count of Techniques\": 36}, {\"Tactic\": \"command-and-control\", \"Count of Techniques\": 40}, {\"Tactic\": \"credential-access\", \"Count of Techniques\": 48}, {\"Tactic\": \"privilege-escalation\", \"Count of Techniques\": 89}, {\"Tactic\": \"persistence\", \"Count of Techniques\": 99}, {\"Tactic\": \"defense-evasion\", \"Count of Techniques\": 152}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"6bb193cd-6df5-404c-992b-3c19bd9bf3bc\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " 
target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#6bb193cd-6df5-404c-992b-3c19bd9bf3bc" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tactic_distribution = pandas.DataFrame({\n", " 'Tactic': list(techniques_tactic_2.groupby(['tactic'])['tactic'].count().keys()),\n", " 'Count of Techniques': techniques_tactic_2.groupby(['tactic'])['tactic'].count().tolist()}).sort_values(by='Count of Techniques',ascending=True)\n", "bars = alt.Chart(tactic_distribution,width=800,height=300).mark_bar().encode(x ='Tactic',y='Count of Techniques',color='Tactic').properties(width=400)\n", "text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')\n", "bars + text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Defense-evasion and Persistence are the tactics with the highest number of techniques with data sources (152 and 99, respectively)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 15. 
Grouping Techniques With Data Sources by Data Source" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to split the data source column values because a technique might be mapped to more than one data source:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "techniques_data_source=techniques_with_data_sources\n", "\n", "attributes_3 = ['data_sources'] # Names of the list-valued columns that we need to split\n", "\n", "for a in attributes_3:\n", " s = techniques_data_source.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n", " # \"s\" is a column holding every value of the list inside each cell of the column \"a\", one value per row\n", " s.name = a\n", " # We give \"s\" the same name as \"a\".\n", " techniques_data_source = techniques_data_source.drop(a, axis=1).join(s).reset_index(drop=True)\n", " # We drop the column \"a\" from \"techniques_data_source\", and then join \"techniques_data_source\" with \"s\"\n", "\n", "# Let's re-arrange the columns from general to specific\n", "techniques_data_source_2 = techniques_data_source.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)\n", "\n", "# We are going to edit some names inside the dataframe to improve consistency:\n", "techniques_data_source_3 = techniques_data_source_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now show techniques with data sources mapped to one data source at a time:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>matrix</th><th>platform</th><th>tactic</th><th>technique</th><th>technique_id</th><th>data_sources</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>mitre-attack</td><td>[Network]</td><td>[execution]</td><td>Network Device CLI</td><td>T1059.008</td><td>Network device logs</td></tr>\n",
" <tr><th>1</th><td>mitre-attack</td><td>[Network]</td><td>[execution]</td><td>Network Device CLI</td><td>T1059.008</td><td>Network device run-time memory</td></tr>\n",
" <tr><th>2</th><td>mitre-attack</td><td>[Network]</td><td>[execution]</td><td>Network Device CLI</td><td>T1059.008</td><td>Network device command history</td></tr>\n",
" <tr><th>3</th><td>mitre-attack</td><td>[Network]</td><td>[execution]</td><td>Network Device CLI</td><td>T1059.008</td><td>Network device configuration</td></tr>\n",
" <tr><th>4</th><td>mitre-attack</td><td>[Network]</td><td>[collection]</td><td>Network Device Configuration Dump</td><td>T1602.002</td><td>Netflow/Enclave netflow</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " matrix platform tactic technique \\\n", "0 mitre-attack [Network] [execution] Network Device CLI \n", "1 mitre-attack [Network] [execution] Network Device CLI \n", "2 mitre-attack [Network] [execution] Network Device CLI \n", "3 mitre-attack [Network] [execution] Network Device CLI \n", "4 mitre-attack [Network] [collection] Network Device Configuration Dump \n", "\n", " technique_id data_sources \n", "0 T1059.008 Network device logs \n", "1 T1059.008 Network device run-time memory \n", "2 T1059.008 Network device command history \n", "3 T1059.008 Network device configuration \n", "4 T1602.002 Netflow/Enclave netflow " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_data_source_3.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a visualization to show the number of techniques grouped by data sources:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Data Source\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Data Source\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 1200}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Data Source\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Data Source\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"height\": 300, \"width\": 1200}], \"data\": {\"name\": \"data-0c7cf91db0a6e6401724291cca8f060b\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": 
{\"data-0c7cf91db0a6e6401724291cca8f060b\": [{\"Data Source\": \"API monitoring\", \"Count of Techniques\": 82}, {\"Data Source\": \"AWS CloudTrail logs\", \"Count of Techniques\": 32}, {\"Data Source\": \"Access tokens\", \"Count of Techniques\": 4}, {\"Data Source\": \"Alarm History\", \"Count of Techniques\": 3}, {\"Data Source\": \"Alarm history\", \"Count of Techniques\": 9}, {\"Data Source\": \"Alarm thresholds\", \"Count of Techniques\": 1}, {\"Data Source\": \"Anti-virus\", \"Count of Techniques\": 11}, {\"Data Source\": \"Application Logs\", \"Count of Techniques\": 16}, {\"Data Source\": \"Asset management\", \"Count of Techniques\": 3}, {\"Data Source\": \"Authentication logs\", \"Count of Techniques\": 66}, {\"Data Source\": \"Azure activity logs\", \"Count of Techniques\": 32}, {\"Data Source\": \"BIOS\", \"Count of Techniques\": 5}, {\"Data Source\": \"Binary file metadata\", \"Count of Techniques\": 29}, {\"Data Source\": \"Browser extensions\", \"Count of Techniques\": 1}, {\"Data Source\": \"Component firmware\", \"Count of Techniques\": 4}, {\"Data Source\": \"Controller parameters\", \"Count of Techniques\": 1}, {\"Data Source\": \"Controller program\", \"Count of Techniques\": 7}, {\"Data Source\": \"DLL monitoring\", \"Count of Techniques\": 36}, {\"Data Source\": \"DNS records\", \"Count of Techniques\": 8}, {\"Data Source\": \"Data historian\", \"Count of Techniques\": 4}, {\"Data Source\": \"Data loss prevention\", \"Count of Techniques\": 10}, {\"Data Source\": \"Detonation chamber\", \"Count of Techniques\": 6}, {\"Data Source\": \"Digital certificate logs\", \"Count of Techniques\": 1}, {\"Data Source\": \"Digital signatures\", \"Count of Techniques\": 3}, {\"Data Source\": \"Disk forensics\", \"Count of Techniques\": 3}, {\"Data Source\": \"Domain registration\", \"Count of Techniques\": 1}, {\"Data Source\": \"EFI\", \"Count of Techniques\": 3}, {\"Data Source\": \"Email gateway\", \"Count of Techniques\": 12}, {\"Data Source\": 
\"Environment variable\", \"Count of Techniques\": 5}, {\"Data Source\": \"File Monitoring\", \"Count of Techniques\": 1}, {\"Data Source\": \"File monitoring\", \"Count of Techniques\": 196}, {\"Data Source\": \"GCP audit logs\", \"Count of Techniques\": 12}, {\"Data Source\": \"Host network interface\", \"Count of Techniques\": 7}, {\"Data Source\": \"Host network interfaces\", \"Count of Techniques\": 2}, {\"Data Source\": \"Kernel drivers\", \"Count of Techniques\": 6}, {\"Data Source\": \"Loaded DLLs\", \"Count of Techniques\": 23}, {\"Data Source\": \"MBR\", \"Count of Techniques\": 3}, {\"Data Source\": \"Mail server\", \"Count of Techniques\": 16}, {\"Data Source\": \"Malware reverse engineering\", \"Count of Techniques\": 11}, {\"Data Source\": \"Named Pipes\", \"Count of Techniques\": 1}, {\"Data Source\": \"Netflow/Enclave netflow\", \"Count of Techniques\": 74}, {\"Data Source\": \"Network device command history\", \"Count of Techniques\": 2}, {\"Data Source\": \"Network device configuration\", \"Count of Techniques\": 5}, {\"Data Source\": \"Network device logs\", \"Count of Techniques\": 24}, {\"Data Source\": \"Network device run-time memory\", \"Count of Techniques\": 4}, {\"Data Source\": \"Network intrusion detection system\", \"Count of Techniques\": 18}, {\"Data Source\": \"Network protocol analysis\", \"Count of Techniques\": 89}, {\"Data Source\": \"OAuth audit logs\", \"Count of Techniques\": 4}, {\"Data Source\": \"Office 365 account logs\", \"Count of Techniques\": 12}, {\"Data Source\": \"Office 365 audit logs\", \"Count of Techniques\": 8}, {\"Data Source\": \"Office 365 trace logs\", \"Count of Techniques\": 4}, {\"Data Source\": \"Packet capture\", \"Count of Techniques\": 118}, {\"Data Source\": \"PowerShell logs\", \"Count of Techniques\": 23}, {\"Data Source\": \"Process Monitoring\", \"Count of Techniques\": 320}, {\"Data Source\": \"Process command-line parameters\", \"Count of Techniques\": 199}, {\"Data Source\": \"Process use of 
network\", \"Count of Techniques\": 68}, {\"Data Source\": \"SSL/TLS certificates\", \"Count of Techniques\": 2}, {\"Data Source\": \"SSL/TLS inspection\", \"Count of Techniques\": 24}, {\"Data Source\": \"SSl/TLS inspection\", \"Count of Techniques\": 1}, {\"Data Source\": \"Sensor health and status\", \"Count of Techniques\": 4}, {\"Data Source\": \"Sequential Event Recorder\", \"Count of Techniques\": 1}, {\"Data Source\": \"Sequential event recorder\", \"Count of Techniques\": 14}, {\"Data Source\": \"Services\", \"Count of Techniques\": 5}, {\"Data Source\": \"Social media monitoring\", \"Count of Techniques\": 5}, {\"Data Source\": \"Stackdriver logs\", \"Count of Techniques\": 27}, {\"Data Source\": \"System calls\", \"Count of Techniques\": 10}, {\"Data Source\": \"Third-party application logs\", \"Count of Techniques\": 5}, {\"Data Source\": \"User interface\", \"Count of Techniques\": 4}, {\"Data Source\": \"VBR\", \"Count of Techniques\": 2}, {\"Data Source\": \"WMI Objects\", \"Count of Techniques\": 2}, {\"Data Source\": \"Web application firewall logs\", \"Count of Techniques\": 9}, {\"Data Source\": \"Web logs\", \"Count of Techniques\": 12}, {\"Data Source\": \"Web proxy\", \"Count of Techniques\": 11}, {\"Data Source\": \"Windows Error Reporting\", \"Count of Techniques\": 4}, {\"Data Source\": \"Windows Registry\", \"Count of Techniques\": 57}, {\"Data Source\": \"Windows error reporting\", \"Count of Techniques\": 1}, {\"Data Source\": \"Windows event logs\", \"Count of Techniques\": 51}, {\"Data Source\": \"Windows registry\", \"Count of Techniques\": 2}, {\"Data Source\": \"process use of network\", \"Count of Techniques\": 1}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"668c5615-0c95-4616-850e-c55bc0da70c2\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " target.id = id;\n", " 
target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#668c5615-0c95-4616-850e-c55bc0da70c2" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_source_distribution = pandas.DataFrame({\n", " 'Data Source': list(techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().keys()),\n", " 'Count of Techniques': techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().tolist()})\n", "bars = alt.Chart(data_source_distribution,width=800,height=300).mark_bar().encode(x ='Data Source',y='Count of Techniques',color='Data Source').properties(width=1200)\n", "text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')\n", "bars + text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few interesting things from the bar chart above:\n", "* Process Monitoring, File Monitoring, and Process Command-line parameters are the Data Sources with the highest number of techniques\n", "* There are some data source names that include string references to Windows such as PowerShell, Windows and wmi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 16. 
Most Relevant Groups Of Data Sources Per Technique" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Number Of Data Sources Per Technique" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although identifying the data sources with the highest number of techniques is a good start, they usually do not work alone. You might be collecting **Process Monitoring** already but you might be still missing a lot of context from a data perspective." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"x\": {\"type\": \"quantitative\", \"field\": \"Number of Data Sources\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"width\": 500}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"Number of Data Sources\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"width\": 500}], \"data\": {\"name\": \"data-b6f6d78cd7978454e387468282a2f262\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-b6f6d78cd7978454e387468282a2f262\": [{\"Number of Data Sources\": 1, \"Count of Techniques\": 37}, {\"Number of Data Sources\": 2, \"Count of Techniques\": 107}, {\"Number of Data Sources\": 3, \"Count of Techniques\": 125}, {\"Number of Data Sources\": 4, \"Count of Techniques\": 118}, {\"Number of Data Sources\": 5, \"Count of Techniques\": 49}, {\"Number of Data Sources\": 6, \"Count of Techniques\": 33}, {\"Number of Data Sources\": 7, \"Count of Techniques\": 14}, {\"Number of Data Sources\": 8, \"Count of Techniques\": 10}, {\"Number of Data Sources\": 9, \"Count of 
Techniques\": 5}, {\"Number of Data Sources\": 10, \"Count of Techniques\": 3}, {\"Number of Data Sources\": 11, \"Count of Techniques\": 3}, {\"Number of Data Sources\": 12, \"Count of Techniques\": 4}, {\"Number of Data Sources\": 13, \"Count of Techniques\": 1}, {\"Number of Data Sources\": 14, \"Count of Techniques\": 1}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"d80ddddd-15b9-47bb-894b-26b319632f83\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#d80ddddd-15b9-47bb-894b-26b319632f83" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_source_distribution_2 = pandas.DataFrame({\n", " 'Techniques': list(techniques_data_source_3.groupby(['technique'])['technique'].count().keys()),\n", " 'Count of Data Sources': techniques_data_source_3.groupby(['technique'])['technique'].count().tolist()})\n", "\n", "data_source_distribution_3 = pandas.DataFrame({\n", " 'Number of Data Sources': 
list(data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().keys()),\n", " 'Count of Techniques': data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().tolist()})\n", "\n", "bars = alt.Chart(data_source_distribution_3).mark_bar().encode(x ='Number of Data Sources',y='Count of Techniques').properties(width=500)\n", "text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')\n", "bars + text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The chart above shows the number of data sources listed per technique according to ATT&CK:\n", "* There are 125 techniques for which ATT&CK lists 3 data sources as context to validate their detection, the most common case\n", "* 4 techniques list 12 data sources, and a single technique lists 14, the maximum\n", "* 37 techniques list only one data source" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build every subset (combination) of each technique's data sources by defining a Python function and applying it to the data sources column:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "# https://stackoverflow.com/questions/26332412/python-recursive-function-to-display-all-subsets-of-given-set\n", "def subs(l):\n", "    res = []\n", "    for i in range(1, len(l) + 1):\n", "        for combo in itertools.combinations(l, i):\n", "            res.append(list(combo))\n", "    return res" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before applying the function, we lowercase and sort the data source names so that equivalent groups of data sources match:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "df = techniques_with_data_sources[['data_sources']]" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "for index, row in df.iterrows():\n", " row[\"data_sources\"]=[x.lower() for x in row[\"data_sources\"]]\n", " 
row[\"data_sources\"].sort()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data_sources
0[network device command history, network devic...
1[netflow/enclave netflow, network protocol ana...
2[file monitoring, network device command histo...
3[file monitoring, netflow/enclave netflow, net...
4[netflow/enclave netflow, network protocol ana...
\n", "
" ], "text/plain": [ " data_sources\n", "0 [network device command history, network devic...\n", "1 [netflow/enclave netflow, network protocol ana...\n", "2 [file monitoring, network device command histo...\n", "3 [file monitoring, netflow/enclave netflow, net...\n", "4 [netflow/enclave netflow, network protocol ana..." ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's apply the function and split the subsets column:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ ":1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df['subsets']=df['data_sources'].apply(subs)\n" ] } ], "source": [ "df['subsets']=df['data_sources'].apply(subs)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data_sourcessubsets
0[network device command history, network devic...[[network device command history], [network de...
1[netflow/enclave netflow, network protocol ana...[[netflow/enclave netflow], [network protocol ...
2[file monitoring, network device command histo...[[file monitoring], [network device command hi...
3[file monitoring, netflow/enclave netflow, net...[[file monitoring], [netflow/enclave netflow],...
4[netflow/enclave netflow, network protocol ana...[[netflow/enclave netflow], [network protocol ...
\n", "
" ], "text/plain": [ " data_sources \\\n", "0 [network device command history, network devic... \n", "1 [netflow/enclave netflow, network protocol ana... \n", "2 [file monitoring, network device command histo... \n", "3 [file monitoring, netflow/enclave netflow, net... \n", "4 [netflow/enclave netflow, network protocol ana... \n", "\n", " subsets \n", "0 [[network device command history], [network de... \n", "1 [[netflow/enclave netflow], [network protocol ... \n", "2 [[file monitoring], [network device command hi... \n", "3 [[file monitoring], [netflow/enclave netflow],... \n", "4 [[netflow/enclave netflow], [network protocol ... " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to split the subsets column values:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "techniques_with_data_sources_preview = df" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "attributes_4 = ['subsets']\n", "\n", "for a in attributes_4:\n", " s = techniques_with_data_sources_preview.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n", " s.name = a\n", " techniques_with_data_sources_preview = techniques_with_data_sources_preview.drop(a, axis=1).join(s).reset_index(drop=True)\n", " \n", "techniques_with_data_sources_subsets = techniques_with_data_sources_preview.reindex(['data_sources','subsets'], axis=1)\n" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data_sourcessubsets
0[network device command history, network devic...[network device command history]
1[network device command history, network devic...[network device configuration]
2[network device command history, network devic...[network device logs]
3[network device command history, network devic...[network device run-time memory]
4[network device command history, network devic...[network device command history, network devic...
\n", "
" ], "text/plain": [ " data_sources \\\n", "0 [network device command history, network devic... \n", "1 [network device command history, network devic... \n", "2 [network device command history, network devic... \n", "3 [network device command history, network devic... \n", "4 [network device command history, network devic... \n", "\n", " subsets \n", "0 [network device command history] \n", "1 [network device configuration] \n", "2 [network device logs] \n", "3 [network device run-time memory] \n", "4 [network device command history, network devic... " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_with_data_sources_subsets.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's add three columns to analyse the dataframe: subsets_name (the subset list joined into a string), subsets_number_elements (number of data sources per subset) and number_data_sources_per_technique (number of data sources listed for the technique):" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "techniques_with_data_sources_subsets['subsets_name']=techniques_with_data_sources_subsets['subsets'].apply(lambda x: ','.join(map(str, x)))\n", "techniques_with_data_sources_subsets['subsets_number_elements']=techniques_with_data_sources_subsets['subsets'].str.len()\n", "techniques_with_data_sources_subsets['number_data_sources_per_technique']=techniques_with_data_sources_subsets['data_sources'].str.len()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data_sourcessubsetssubsets_namesubsets_number_elementsnumber_data_sources_per_technique
0[network device command history, network devic...[network device command history]network device command history14
1[network device command history, network devic...[network device configuration]network device configuration14
2[network device command history, network devic...[network device logs]network device logs14
3[network device command history, network devic...[network device run-time memory]network device run-time memory14
4[network device command history, network devic...[network device command history, network devic...network device command history,network device ...24
\n", "
" ], "text/plain": [ " data_sources \\\n", "0 [network device command history, network devic... \n", "1 [network device command history, network devic... \n", "2 [network device command history, network devic... \n", "3 [network device command history, network devic... \n", "4 [network device command history, network devic... \n", "\n", " subsets \\\n", "0 [network device command history] \n", "1 [network device configuration] \n", "2 [network device logs] \n", "3 [network device run-time memory] \n", "4 [network device command history, network devic... \n", "\n", " subsets_name subsets_number_elements \\\n", "0 network device command history 1 \n", "1 network device configuration 1 \n", "2 network device logs 1 \n", "3 network device run-time memory 1 \n", "4 network device command history,network device ... 2 \n", "\n", " number_data_sources_per_technique \n", "0 4 \n", "1 4 \n", "2 4 \n", "3 4 \n", "4 4 " ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_with_data_sources_subsets.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As described above, we want to find groups of data sources, so we filter out all the subsets that contain only one data source:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "subsets = techniques_with_data_sources_subsets\n", "\n", "subsets_ok=subsets[subsets.subsets_number_elements != 1]" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data_sourcessubsetssubsets_namesubsets_number_elementsnumber_data_sources_per_technique
4[network device command history, network devic...[network device command history, network devic...network device command history,network device ...24
5[network device command history, network devic...[network device command history, network devic...network device command history,network device ...24
6[network device command history, network devic...[network device command history, network devic...network device command history,network device ...24
7[network device command history, network devic...[network device configuration, network device ...network device configuration,network device logs24
8[network device command history, network devic...[network device configuration, network device ...network device configuration,network device ru...24
\n", "
" ], "text/plain": [ " data_sources \\\n", "4 [network device command history, network devic... \n", "5 [network device command history, network devic... \n", "6 [network device command history, network devic... \n", "7 [network device command history, network devic... \n", "8 [network device command history, network devic... \n", "\n", " subsets \\\n", "4 [network device command history, network devic... \n", "5 [network device command history, network devic... \n", "6 [network device command history, network devic... \n", "7 [network device configuration, network device ... \n", "8 [network device configuration, network device ... \n", "\n", " subsets_name subsets_number_elements \\\n", "4 network device command history,network device ... 2 \n", "5 network device command history,network device ... 2 \n", "6 network device command history,network device ... 2 \n", "7 network device configuration,network device logs 2 \n", "8 network device configuration,network device ru... 2 \n", "\n", " number_data_sources_per_technique \n", "4 4 \n", "5 4 \n", "6 4 \n", "7 4 \n", "8 4 " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subsets_ok.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we calculate the most relevant groups of data sources (Top 15):" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "subsets_graph = subsets_ok.groupby(['subsets_name'])['subsets_name'].count().to_frame(name='subsets_count').sort_values(by='subsets_count',ascending=False)[0:15]" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subsets_count
subsets_name
process command-line parameters,process monitoring183
file monitoring,process monitoring144
file monitoring,process command-line parameters100
file monitoring,process command-line parameters,process monitoring88
network protocol analysis,packet capture76
api monitoring,process monitoring70
process monitoring,process use of network56
netflow/enclave netflow,packet capture55
process monitoring,windows registry50
packet capture,process use of network45
packet capture,process monitoring43
process command-line parameters,windows registry41
netflow/enclave netflow,network protocol analysis41
network protocol analysis,process use of network40
netflow/enclave netflow,process monitoring38
\n", "
" ], "text/plain": [ " subsets_count\n", "subsets_name \n", "process command-line parameters,process monitoring 183\n", "file monitoring,process monitoring 144\n", "file monitoring,process command-line parameters 100\n", "file monitoring,process command-line parameters... 88\n", "network protocol analysis,packet capture 76\n", "api monitoring,process monitoring 70\n", "process monitoring,process use of network 56\n", "netflow/enclave netflow,packet capture 55\n", "process monitoring,windows registry 50\n", "packet capture,process use of network 45\n", "packet capture,process monitoring 43\n", "process command-line parameters,windows registry 41\n", "netflow/enclave netflow,network protocol analysis 41\n", "network protocol analysis,process use of network 40\n", "netflow/enclave netflow,process monitoring 38" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subsets_graph" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Data Sources\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Data Sources\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"width\": 500}, {\"mark\": {\"type\": \"text\", \"align\": \"center\", \"baseline\": \"middle\", \"dx\": 0, \"dy\": -5}, \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"Data Sources\"}, \"text\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}, \"x\": {\"type\": \"nominal\", \"field\": \"Data Sources\"}, \"y\": {\"type\": \"quantitative\", \"field\": \"Count of Techniques\"}}, \"width\": 500}], \"data\": {\"name\": \"data-ef18c839539c3164e0c40c20eb1da48e\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": 
{\"data-ef18c839539c3164e0c40c20eb1da48e\": [{\"Data Sources\": \"process command-line parameters,process monitoring\", \"Count of Techniques\": 183}, {\"Data Sources\": \"file monitoring,process monitoring\", \"Count of Techniques\": 144}, {\"Data Sources\": \"file monitoring,process command-line parameters\", \"Count of Techniques\": 100}, {\"Data Sources\": \"file monitoring,process command-line parameters,process monitoring\", \"Count of Techniques\": 88}, {\"Data Sources\": \"network protocol analysis,packet capture\", \"Count of Techniques\": 76}, {\"Data Sources\": \"api monitoring,process monitoring\", \"Count of Techniques\": 70}, {\"Data Sources\": \"process monitoring,process use of network\", \"Count of Techniques\": 56}, {\"Data Sources\": \"netflow/enclave netflow,packet capture\", \"Count of Techniques\": 55}, {\"Data Sources\": \"process monitoring,windows registry\", \"Count of Techniques\": 50}, {\"Data Sources\": \"packet capture,process use of network\", \"Count of Techniques\": 45}, {\"Data Sources\": \"packet capture,process monitoring\", \"Count of Techniques\": 43}, {\"Data Sources\": \"process command-line parameters,windows registry\", \"Count of Techniques\": 41}, {\"Data Sources\": \"netflow/enclave netflow,network protocol analysis\", \"Count of Techniques\": 41}, {\"Data Sources\": \"network protocol analysis,process use of network\", \"Count of Techniques\": 40}, {\"Data Sources\": \"netflow/enclave netflow,process monitoring\", \"Count of Techniques\": 38}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"2214899b-49ff-44bd-8006-c13ea8aa10bc\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: 
firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#2214899b-49ff-44bd-8006-c13ea8aa10bc" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subsets_graph_2 = pandas.DataFrame({\n", " 'Data Sources': list(subsets_graph.index),\n", " 'Count of Techniques': subsets_graph['subsets_count'].tolist()})\n", "\n", "bars = alt.Chart(subsets_graph_2).mark_bar().encode(x ='Data Sources', y ='Count of Techniques', color='Data Sources').properties(width=500)\n", "text = bars.mark_text(align='center',baseline='middle',dx= 0,dy=-5).encode(text='Count of Techniques')\n", "bars + text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The group (Process Monitoring, Process Command-line parameters) is the group of data sources with the highest number of techniques. This pair of data sources is suggested for hunting 183 techniques." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 17. Let's Split all the Information About Techniques With Data Sources Defined: Matrix, Platform, Tactic and Data Source" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's split all the relevant columns of the dataframe:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
matrixplatformtactictechniquetechnique_iddata_sources
0mitre-attackNetworkexecutionNetwork Device CLIT1059.008Network device logs
1mitre-attackNetworkexecutionNetwork Device CLIT1059.008Network device run-time memory
2mitre-attackNetworkexecutionNetwork Device CLIT1059.008Network device command history
3mitre-attackNetworkexecutionNetwork Device CLIT1059.008Network device configuration
4mitre-attackNetworkcollectionNetwork Device Configuration DumpT1602.002Netflow/Enclave netflow
\n", "
" ], "text/plain": [ " matrix platform tactic technique \\\n", "0 mitre-attack Network execution Network Device CLI \n", "1 mitre-attack Network execution Network Device CLI \n", "2 mitre-attack Network execution Network Device CLI \n", "3 mitre-attack Network execution Network Device CLI \n", "4 mitre-attack Network collection Network Device Configuration Dump \n", "\n", " technique_id data_sources \n", "0 T1059.008 Network device logs \n", "1 T1059.008 Network device run-time memory \n", "2 T1059.008 Network device command history \n", "3 T1059.008 Network device configuration \n", "4 T1602.002 Netflow/Enclave netflow " ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_data = techniques_with_data_sources\n", "\n", "attributes = ['platform','tactic','data_sources'] # In attributes we are going to indicate the name of the columns that we need to split\n", "\n", "for a in attributes:\n", " s = techniques_data.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n", " # \"s\" is going to be a column of a frame with every value of the list inside each cell of the column \"a\"\n", " s.name = a\n", " # We name \"s\" with the same name of \"a\".\n", " techniques_data=techniques_data.drop(a, axis=1).join(s).reset_index(drop=True)\n", " # We drop the column \"a\" from \"techniques_data\", and then join \"techniques_data\" with \"s\"\n", "\n", "# Let's re-arrange the columns from general to specific\n", "techniques_data_2=techniques_data.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)\n", "\n", "# We are going to edit some names inside the dataframe to improve the consistency:\n", "techniques_data_3 = techniques_data_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])\n", "\n", "techniques_data_3.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do you remember data sources names with a 
reference to Windows? After splitting the dataframe by platforms, tactics and data sources, are there any macOS or Linux techniques that consider Windows data sources? Let's identify those rows:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "# After splitting the rows of the dataframe, some rows pair Windows data sources with platforms like Linux and macOS.\n", "# We need to identify those rows\n", "conditions = [(techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),\n", " (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),\n", " (techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),\n", " (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),\n", " (techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True),\n", " (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True)]\n", "# Each condition is a logical test that flags a Windows-specific data source on a Linux or macOS row\n", "\n", "choices = ['NO OK','NO OK','NO OK','NO OK','NO OK','NO OK']\n", "# In choices, we indicate the result when the matching logical test is true\n", "\n", "techniques_data_3['Validation'] = np.select(conditions,choices,default='OK')\n", "# We add a column \"Validation\" to \"techniques_data_3\" with the result of the logical test. The default value is \"OK\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the inconsistent data?" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " matrix platform tactic technique \\\n", "162 mitre-attack Linux defense-evasion Run Virtual Instance \n", "168 mitre-attack macOS defense-evasion Run Virtual Instance \n", "179 mitre-attack Linux defense-evasion Hidden File System \n", "181 mitre-attack macOS defense-evasion Hidden File System \n", "794 mitre-attack macOS defense-evasion Hidden Window \n", "\n", " technique_id data_sources Validation \n", "162 T1564.006 Windows Registry NO OK \n", "168 T1564.006 Windows Registry NO OK \n", "179 T1564.005 Windows Registry NO OK \n", "181 T1564.005 Windows Registry NO OK \n", "794 T1564.003 PowerShell logs NO OK " ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_analysis_data_no_ok = techniques_data_3[techniques_data_3.Validation == 'NO OK']\n", "# Finally, we are filtering all the values with NO OK\n", "\n", "techniques_analysis_data_no_ok.head()" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 136 rows with inconsistent data\n" ] } ], "source": [ "print('There are ',len(techniques_analysis_data_no_ok),' rows with inconsistent data')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the impact of this inconsistent data from a platform and data sources perspective?" 
] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "df = techniques_with_data_sources\n", "\n", "attributes = ['platform','data_sources']\n", "\n", "for a in attributes:\n", " s = df.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)\n", " s.name = a\n", " df=df.drop(a, axis=1).join(s).reset_index(drop=True)\n", " \n", "df_2=df.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)\n", "df_3 = df_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])\n", "\n", "conditions = [(df_3['data_sources'].str.contains('windows',case=False)== True),\n", " (df_3['data_sources'].str.contains('powershell',case=False)== True),\n", " (df_3['data_sources'].str.contains('wmi',case=False)== True)]\n", "\n", "choices = ['Windows','Windows','Windows']\n", "\n", "df_3['Validation'] = np.select(conditions,choices,default='Other')\n", "df_3['Num_Tech'] = 1\n", "df_4 = df_3[df_3.Validation == 'Windows']\n", "df_5 = df_4.groupby(['data_sources','platform'])['technique'].nunique()\n", "df_6 = df_5.to_frame().reset_index()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "const spec = {\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-6d4700e1646c3dceebb7655c72e7b5ac\"}, \"mark\": \"bar\", \"encoding\": {\"color\": {\"type\": \"nominal\", \"field\": \"platform\"}, \"x\": {\"type\": \"quantitative\", \"field\": \"technique\", \"stack\": \"normalize\"}, \"y\": {\"type\": \"nominal\", \"field\": \"data_sources\"}}, \"height\": 200, \"$schema\": \"https://vega.github.io/schema/vega-lite/v4.8.1.json\", \"datasets\": {\"data-6d4700e1646c3dceebb7655c72e7b5ac\": [{\"data_sources\": \"PowerShell logs\", \"platform\": \"Linux\", \"technique\": 9}, {\"data_sources\": \"PowerShell logs\", \"platform\": \"Network\", 
\"technique\": 2}, {\"data_sources\": \"PowerShell logs\", \"platform\": \"Windows\", \"technique\": 22}, {\"data_sources\": \"PowerShell logs\", \"platform\": \"macOS\", \"technique\": 13}, {\"data_sources\": \"WMI Objects\", \"platform\": \"Linux\", \"technique\": 1}, {\"data_sources\": \"WMI Objects\", \"platform\": \"Windows\", \"technique\": 2}, {\"data_sources\": \"WMI Objects\", \"platform\": \"macOS\", \"technique\": 1}, {\"data_sources\": \"Windows Error Reporting\", \"platform\": \"Linux\", \"technique\": 4}, {\"data_sources\": \"Windows Error Reporting\", \"platform\": \"Windows\", \"technique\": 4}, {\"data_sources\": \"Windows Error Reporting\", \"platform\": \"macOS\", \"technique\": 4}, {\"data_sources\": \"Windows Registry\", \"platform\": \"AWS\", \"technique\": 2}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Azure\", \"technique\": 2}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Azure AD\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Control Server\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Data Historian\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Engineering Workstation\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"GCP\", \"technique\": 2}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Human-Machine Interface\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Linux\", \"technique\": 19}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Network\", \"technique\": 3}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Office 365\", \"technique\": 5}, {\"data_sources\": \"Windows Registry\", \"platform\": \"SaaS\", \"technique\": 1}, {\"data_sources\": \"Windows Registry\", \"platform\": \"Windows\", \"technique\": 55}, {\"data_sources\": \"Windows Registry\", \"platform\": \"macOS\", \"technique\": 19}, {\"data_sources\": \"Windows error 
reporting\", \"platform\": \"Data Historian\", \"technique\": 1}, {\"data_sources\": \"Windows error reporting\", \"platform\": \"Engineering Workstation\", \"technique\": 1}, {\"data_sources\": \"Windows error reporting\", \"platform\": \"Human-Machine Interface\", \"technique\": 1}, {\"data_sources\": \"Windows error reporting\", \"platform\": \"Windows\", \"technique\": 1}, {\"data_sources\": \"Windows event logs\", \"platform\": \"AWS\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Azure\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Azure AD\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Control Server\", \"technique\": 1}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Engineering Workstation\", \"technique\": 2}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Field Controller/RTU/PLC/IED\", \"technique\": 1}, {\"data_sources\": \"Windows event logs\", \"platform\": \"GCP\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Human-Machine Interface\", \"technique\": 2}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Linux\", \"technique\": 19}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Network\", \"technique\": 2}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Office 365\", \"technique\": 3}, {\"data_sources\": \"Windows event logs\", \"platform\": \"SaaS\", \"technique\": 1}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Safety Instrumented System/Protection Relay\", \"technique\": 2}, {\"data_sources\": \"Windows event logs\", \"platform\": \"Windows\", \"technique\": 50}, {\"data_sources\": \"Windows event logs\", \"platform\": \"macOS\", \"technique\": 18}, {\"data_sources\": \"Windows registry\", \"platform\": \"Engineering Workstation\", \"technique\": 1}, {\"data_sources\": \"Windows registry\", \"platform\": \"Field Controller/RTU/PLC/IED\", 
\"technique\": 1}, {\"data_sources\": \"Windows registry\", \"platform\": \"Windows\", \"technique\": 2}]}};\n", "const opt = {};\n", "const type = \"vega-lite\";\n", "const id = \"5e119597-5160-4769-a803-0ec11b1a8ecd\";\n", "\n", "const output_area = this;\n", "\n", "require([\"nbextensions/jupyter-vega/index\"], function(vega) {\n", " const target = document.createElement(\"div\");\n", " target.id = id;\n", " target.className = \"vega-embed\";\n", "\n", " const style = document.createElement(\"style\");\n", " style.textContent = [\n", " \".vega-embed .error p {\",\n", " \" color: firebrick;\",\n", " \" font-size: 14px;\",\n", " \"}\",\n", " ].join(\"\\\\n\");\n", "\n", " // element is a jQuery wrapped DOM element inside the output area\n", " // see http://ipython.readthedocs.io/en/stable/api/generated/\\\n", " // IPython.display.html#IPython.display.Javascript.__init__\n", " element[0].appendChild(target);\n", " element[0].appendChild(style);\n", "\n", " vega.render(\"#\" + id, spec, type, opt, output_area);\n", "}, function (err) {\n", " if (err.requireType !== \"scripterror\") {\n", " throw(err);\n", " }\n", "});\n" ], "text/plain": [ "" ] }, "metadata": { "jupyter-vega": "#5e119597-5160-4769-a803-0ec11b1a8ecd" }, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(df_6).mark_bar().encode(x=alt.X('technique', stack=\"normalize\"), y='data_sources', color='platform').properties(height=200)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some techniques list Windows Error Reporting, Windows Registry, and Windows event logs as data sources while also listing platforms such as Linux and macOS. We do not need to consider these rows, because those data sources exist only in a Windows environment. 
These are the techniques that we should exclude from our database:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " technique data_sources\n", "5953 OS Credential Dumping PowerShell logs\n", "5832 Remote Services PowerShell logs\n", "2814 Clear Command History PowerShell logs\n", "2432 Credentials from Password Stores PowerShell logs\n", "4564 Peripheral Device Discovery PowerShell logs\n", "2271 Keychain PowerShell logs\n", "2259 Credentials from Web Browsers PowerShell logs\n", "2392 GUI Input Capture PowerShell logs\n", "1831 Impair Command History Logging PowerShell logs\n", "794 Hidden Window PowerShell logs\n", "1611 Hide Artifacts PowerShell logs\n", "5431 Input Capture PowerShell logs\n", "5402 Command and Scripting Interpreter PowerShell logs\n", "3206 Event Triggered Execution WMI Objects\n", "4156 Exploitation of Remote Services Windows Error Reporting\n", "4206 Exploitation for Defense Evasion Windows Error Reporting\n", "5361 Exploitation for Privilege Escalation Windows Error Reporting\n", "4241 Exploitation for Credential Access Windows Error Reporting\n", "3212 Event Triggered Execution Windows Registry\n", "5217 Software Deployment Tools Windows Registry\n", "4038 Service Stop Windows Registry\n", "4020 Inhibit System Recovery Windows Registry\n", "5426 Input Capture Windows Registry\n", "3389 Create or Modify System Process Windows Registry\n", "5827 Remote Services Windows Registry\n", "4373 Browser Extensions Windows Registry\n", "162 Run Virtual Instance Windows Registry\n", "2414 Keylogging Windows Registry\n", "1875 Impair Defenses Windows Registry\n", "2599 Masquerade Task or Service Windows Registry\n", "1857 Disable or Modify Tools Windows Registry\n", "2654 Subvert Trust Controls Windows Registry\n", "1824 Disable or Modify System Firewall Windows Registry\n", "1204 System Services Windows Registry\n", "2341 Modify Authentication Process Windows Registry\n", "2722 Unsecured Credentials Windows Registry\n", "179 Hidden File System Windows Registry\n", "2895 Abuse Elevation Control Mechanism Windows Registry\n", "5278 Indicator 
Removal on Host Windows event logs\n", "5775 Obfuscated Files or Information Windows event logs\n", "5401 Command and Scripting Interpreter Windows event logs\n", "5828 Remote Services Windows event logs\n", "5559 Scheduled Task/Job Windows event logs\n", "5427 Input Capture Windows event logs\n", "2970 Local Account Windows event logs\n", "3202 Event Triggered Execution Windows event logs\n", "4439 Create Account Windows event logs\n", "2602 Masquerade Task or Service Windows event logs\n", "2655 Subvert Trust Controls Windows event logs\n", "4078 File and Directory Permissions Modification Windows event logs\n", "2720 Unsecured Credentials Windows event logs\n", "4022 Inhibit System Recovery Windows event logs\n", "3624 System Shutdown/Reboot Windows event logs\n", "3605 Account Access Removal Windows event logs\n", "2962 Domain Account Windows event logs\n", "4909 Account Manipulation Windows event logs\n", "3388 Create or Modify System Process Windows event logs" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_analysis_data_no_ok[['technique','data_sources']].drop_duplicates().sort_values(by='data_sources',ascending=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without considering this inconsistent data, the final dataframe is:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " matrix platform tactic technique \\\n", "0 mitre-attack Network execution Network Device CLI \n", "1 mitre-attack Network execution Network Device CLI \n", "2 mitre-attack Network execution Network Device CLI \n", "3 mitre-attack Network execution Network Device CLI \n", "4 mitre-attack Network collection Network Device Configuration Dump \n", "\n", " technique_id data_sources Validation \n", "0 T1059.008 Network device logs OK \n", "1 T1059.008 Network device run-time memory OK \n", "2 T1059.008 Network device command history OK \n", "3 T1059.008 Network device configuration OK \n", "4 T1602.002 Netflow/Enclave netflow OK " ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "techniques_analysis_data_ok = techniques_data_3[techniques_data_3.Validation == 'OK']\n", "techniques_analysis_data_ok.head()" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 6650 rows of data that you can play with\n" ] } ], "source": [ "print('There are ',len(techniques_analysis_data_ok),' rows of data that you can play with')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 18. 
Getting Techniques by Data Sources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `get_techniques_by_datasources` function returns the techniques that include the specified data sources. As the calls below show, data source names can be passed in any letter case, and more than one data source can be passed at once." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "data_source = 'PROCESS MONITORING'" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "results = lift.get_techniques_by_datasources(data_source)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "320" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(results)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(results)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "results2 = lift.get_techniques_by_datasources('pRoceSS MoniTorinG','process commAnd-linE parameters')" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "336" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(results2)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AttackPattern(type='attack-pattern', id='attack-pattern--2de47683-f398-448f-b947-9abcc3e32fad', created_by_ref='identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5', created='2020-10-05T13:24:49.780Z', modified='2020-10-09T16:05:36.344Z', name='Print Processors', description='Adversaries may abuse print processors to run malicious DLLs during system boot for persistence and/or privilege escalation. 
Print processors are DLLs that are loaded by the print spooler service, spoolsv.exe, during boot. 
\\n\\nAdversaries may abuse the print spooler service by adding print processors that load malicious DLLs at startup. A print processor can be installed through the AddPrintProcessor API call with an account that has SeLoadDriverPrivilege enabled. Alternatively, a print processor can be registered to the print spooler service by adding the HKLM\\\\SYSTEM\\\\\\\\[CurrentControlSet or ControlSet001]\\\\Control\\\\Print\\\\Environments\\\\\\\\[Windows architecture: e.g., Windows x64]\\\\Print Processors\\\\\\\\[user defined]\\\\Driver Registry key that points to the DLL. For the print processor to be correctly installed, it must be located in the system print-processor directory that can be found with the GetPrintProcessorDirectory API call.(Citation: Microsoft AddPrintProcessor May 2018) After the print processors are installed, the print spooler service, which starts during boot, must be restarted in order for them to run.(Citation: ESET PipeMon May 2020) The print spooler service runs under SYSTEM level permissions, therefore print processors installed by an adversary may run under elevated privileges.', kill_chain_phases=[KillChainPhase(kill_chain_name='mitre-attack', phase_name='persistence'), KillChainPhase(kill_chain_name='mitre-attack', phase_name='privilege-escalation')], external_references=[ExternalReference(source_name='mitre-attack', url='https://attack.mitre.org/techniques/T1547/012', external_id='T1547.012'), ExternalReference(source_name='Microsoft AddPrintProcessor May 2018', description='Microsoft. (2018, May 31). AddPrintProcessor function. Retrieved October 5, 2020.', url='https://docs.microsoft.com/en-us/windows/win32/printdocs/addprintprocessor'), ExternalReference(source_name='ESET PipeMon May 2020', description='Tartare, M. et al. (2020, May 21). No “Game over” for the Winnti Group. 
Retrieved August 24, 2020.', url='https://www.welivesecurity.com/2020/05/21/no-game-over-winnti-group/')], object_marking_refs=['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'], x_mitre_contributors=['Mathieu Tartare, ESET'], x_mitre_data_sources=['Process monitoring', 'Windows Registry', 'File monitoring', 'DLL monitoring', 'API monitoring'], x_mitre_detection='Monitor process API calls to AddPrintProcessor and GetPrintProcessorDirectory. New print processor DLLs are written to the print processor directory. Also monitor Registry writes to HKLM\\\\SYSTEM\\\\ControlSet001\\\\Control\\\\Print\\\\Environments\\\\\\\\[Windows architecture]\\\\Print Processors\\\\\\\\[user defined]\\\\\\\\Driver or HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Control\\\\Print\\\\Environments\\\\\\\\[Windows architecture]\\\\Print Processors\\\\\\\\[user defined]\\\\Driver as they pertain to print processor installations.\\n\\nMonitor for abnormal DLLs that are loaded by spoolsv.exe. Print processors that do not correlate with known good software or patching may be suspicious.', x_mitre_is_subtechnique=True, x_mitre_permissions_required=['Administrator', 'SYSTEM'], x_mitre_platforms=['Windows'], x_mitre_version='1.0')" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results2[1]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }