{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Alert Investigation - Windows Process Alerts\n", "\n", "**Notebook Version:** 1.1" ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Description:\n", "This notebook is intended for triage and investigation of security alerts related to process execution. It is specifically targeted at alerts triggered by suspicious process activity on Windows hosts. \n", "\n", "**Data Sources Used**:
\n", "- Log Analytics/Azure Sentinel\n", " - SecurityAlert, \n", " - SecurityEvent\n", "- Threat Intelligence Providers (Optional)\n", " - OTX (https://otx.alienvault.com/)\n", " - VirusTotal (https://www.virustotal.com/)\n", " - XForce (https://www.ibm.com/security/xforce)" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-10-28T21:29:26.482381Z", "start_time": "2019-10-28T21:29:26.468678Z" } }, "source": [ "### Notebook Setup\n", "
\n", "If this is your first time running this notebook, please run the cells in the Setup section before proceeding to ensure that the required packages are installed correctly. Similarly, if you see any import failures (```ImportError```) in the notebook, make sure that you have run the [Setup](#setup) section first.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-10-22T23:14:41.411650Z", "start_time": "2019-10-22T23:14:41.404709Z" }, "scrolled": true }, "outputs": [], "source": [ "# Imports\n", "import sys\n", "import warnings\n", "MIN_REQ_PYTHON = (3,6)\n", "if sys.version_info < MIN_REQ_PYTHON:\n", " print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')\n", " print('or later is selected as the active kernel.')\n", " sys.exit(\"Python %s.%s or later is required.\\n\" % MIN_REQ_PYTHON)\n", "\n", "import numpy as np\n", "from IPython import get_ipython\n", "from IPython.display import display, HTML, Markdown\n", "import ipywidgets as widgets\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import networkx as nx\n", "sns.set()\n", "import pandas as pd\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 50)\n", "pd.set_option('display.max_colwidth', 100)\n", "\n", "from msticpy.nbtools.utility import md, md_warn\n", "from msticpy.nbtools import *\n", "from msticpy.sectools import *\n", "from msticpy.data.data_providers import QueryProvider\n", "import msticpy.nbtools.kql as qry\n", "import msticpy.nbtools.nbdisplay as nbdisp\n", "\n", "# Some of our dependencies (networkx) still use deprecated Matplotlib\n", "# APIs - we can't do anything about it so suppress them from view\n", "from matplotlib import MatplotlibDeprecationWarning\n", "warnings.simplefilter(\"ignore\", category=MatplotlibDeprecationWarning)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get WorkspaceId and Authenticate to Log Analytics\n", "
\n", "If you are using user/device authentication, run the following cell. \n", "- Click the 'Copy code to clipboard and authenticate' button.\n", "- This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard. \n", "- Select the text box and paste (Ctrl-V/Cmd-V) the copied value. \n", "- You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.\n", "\n", "Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:\n", "```\n", "%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)\n", "```\n", "instead of\n", "```\n", "%kql loganalytics://code().workspace(WORKSPACE_ID)\n", "```\n", "\n", "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.\n", "\n", "On successful authentication you should see a ```popup schema``` button.\n", "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-10-22T23:13:55.299804Z", "start_time": "2019-10-22T23:13:55.289803Z" } }, "outputs": [], "source": [ "#See if we have an Azure Sentinel Workspace defined in our config file, if not let the user specify Workspace and Tenant IDs\n", "from msticpy.nbtools.wsconfig import WorkspaceConfig\n", "ws_config = WorkspaceConfig()\n", "try:\n", " ws_id = ws_config['workspace_id']\n", " ten_id = ws_config['tenant_id']\n", " display(HTML(\"Workspace details collected from config file\"))\n", " config = True\n", "except:\n", " display(HTML('Please go to your Log Analytics workspace, copy the workspace ID'\n", " ' and/or tenant Id and paste here to enable connection to the workspace and querying of it..
'))\n", " ws_id = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", " prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n", " ten_id = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n", " prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n", " config = False\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-10-22T23:14:44.347500Z", "start_time": "2019-10-22T23:14:43.651068Z" } }, "outputs": [], "source": [ "# Establish a query provider for Azure Sentinel and connect to it\n", "if config is False:\n", " ws_id = ws_id.value\n", " ten_id = ten_id.value\n", "qry_prov = QueryProvider('LogAnalytics')\n", "la_connection_string = f'loganalytics://code().tenant(\"{ten_id}\").workspace(\"{ws_id}\")'\n", "qry_prov.connect(connection_str=la_connection_string)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get List of Alerts\n", "\n", "We use an alert as the starting point for this investigation. Specify a time range to search for alerts, then run the following cell to retrieve any alerts in that time window.\n", "You can change the time range and re-run the queries until you find the alerts that you want to investigate." 
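, "\n", "\n", "As a rough illustration of what the time range selection amounts to, the sketch below derives a start and end time from an origin time and a number of hours before and after it. The helper `query_window` and the sample values are hypothetical; this is not the `nbwidgets.QueryTime` implementation.\n", "\n", "
```python
from datetime import datetime, timedelta, timezone


def query_window(origin, before_hours, after_hours):
    # Hypothetical helper: return the (start, end) bounds around an origin time.
    start = origin - timedelta(hours=before_hours)
    end = origin + timedelta(hours=after_hours)
    return start, end


# Illustrative values only: look back 3 hours and forward 1 hour.
origin = datetime(2019, 10, 22, 23, 0, tzinfo=timezone.utc)
start, end = query_window(origin, before_hours=3, after_hours=1)
print(start.isoformat(), end.isoformat())
```
\n", "\n", "Widening the window returns more alerts at the cost of a longer-running query."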
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-10-22T23:14:46.856497Z", "start_time": "2019-10-22T23:14:46.817502Z" } }, "outputs": [], "source": [ "alert_q_times = nbwidgets.QueryTime(units='hour',\n", " max_before=20, max_after=1, before=3)\n", "alert_q_times.display()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-10-22T23:26:37.907238Z", "start_time": "2019-10-22T23:26:36.046087Z" }, "scrolled": false }, "outputs": [], "source": [ "alert_list = qry_prov.SecurityAlert.list_alerts(\n", " alert_q_times)\n", "alert_counts = qry_prov.SecurityAlert.list_alerts_counts(\n", " alert_q_times)\n", "\n", "if isinstance(alert_list, pd.DataFrame) and not alert_list.empty:\n", " print(len(alert_counts), ' distinct alert types')\n", " print(len(alert_list), ' distinct alerts')\n", "\n", "# Display alerts on timeline to aid in visual grouping\n", " nbdisplay.display_timeline(\n", " data=alert_list, source_columns=[\"AlertName\", 'CompromisedEntity'], title=\"Alerts over time\", height=300, color=\"red\")\n", " display(alert_counts.head(10)) # remove '.head(10)'' to see the full list grouped by AlertName\n", "else:\n", " display(Markdown('No related alerts found.'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Choose Alert to Investigate\n", "To focus the investigation select an alert from a list of retrieved alerts.\n", "\n", "As you select an alert, the main properties will be shown below the list.\n", "\n", "Use the filter box to narrow down your search to any substring in the AlertName." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-10-22T23:26:48.905935Z", "start_time": "2019-10-22T23:26:48.869967Z" }, "scrolled": false }, "outputs": [], "source": [ "get_alert = None\n", "alert_select = nbwidgets.AlertSelector(alerts=alert_list, action=nbdisp.display_alert)\n", "alert_select.display()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract properties and entities from Alert\n", "In order to pivot to data related to the selected security alert we need to identify key data points in the selected alert. This section extracts the alert information and entities into a SecurityAlert object allowing us to query the properties more reliably. \n", "\n", "Properties in this object will be used to automatically provide parameters for queries and UI elements.\n", "Subsequent queries will use properties like the host name and derived properties such as the OS family (Linux or Windows) to adapt the query. Query time selectors like the one above will also default to an origin time that matches the alert selected.\n", "\n", "The alert view below shows all of the main properties of the alert plus the extended property dictionary (if any) and JSON representations of the Entity." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-10-23T20:58:37.235238Z", "start_time": "2019-10-23T20:58:37.115240Z" }, "scrolled": false }, "outputs": [], "source": [ "# Extract entities and properties into a SecurityAlert class\n", "if alert_select is None or alert_select.selected_alert is None:\n", " raise ValueError(\"Please select an alert before executing remaining cells.\")\n", "else:\n", " security_alert = SecurityAlert(alert_select.selected_alert)\n", " \n", "nbdisplay.display_alert(security_alert, show_entities=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Entity Graph\n", "Depending on the type of alert there may be one or more entities attached as properties. Entities are key indicators that we can pivot on during our investigation, such as Host, Account, IpAddress, Process, etc. - essentially the 'nouns' of security investigation. \n", "Entities are often related to other entities - for example a process will usually have a related file entity (the process image) and an Account entity (the context in which the process was running). Endpoint alerts typically always have a host entity (which could be a physical or virtual machine). In order to more effectively understand the links between related entities we can plot them as a graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Draw the graph using Networkx/Matplotlib\n", "%matplotlib inline\n", "alertentity_graph = security_alert_graph.create_alert_graph(security_alert)\n", "nbdisp.draw_alert_entity_graph(alertentity_graph, width=15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Related Alerts\n", "For certain entities in the alert we can search for other alerts that have that entity in common. Currently this pivot supports alerts with the same Host, Account or Process. 
\n", "\n", "**Notes:**\n", "- Some alert types do not include all of these entity types.\n", "- The original alert will be included in the \"Related Alerts\" set if it occurs within the query time boundary set below.\n", "\n", "In order to more effectively identify related alerts the query time boundaries can be adjusted to encompass a longer time frame." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", "query_times = nbwidgets.QueryTime(units='day', origin_time=security_alert.TimeGenerated, \n", " max_before=28, max_after=1, before=5)\n", "query_times.display()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "if not security_alert.primary_host:\n", " print('Related alerts is not yet supported for alerts that are not host-based')\n", " related_alerts = None\n", "else:\n", " related_alerts = qry_prov.SecurityAlert.list_related_alerts(query_times, security_alert)\n", "\n", "\n", " if related_alerts is not None and not related_alerts.empty:\n", " host_alert_items = related_alerts\\\n", " .query('host_match == @True')[['AlertType', 'StartTimeUtc']]\\\n", " .groupby('AlertType').StartTimeUtc.agg('count').to_dict()\n", " acct_alert_items = related_alerts\\\n", " .query('acct_match == @True')[['AlertType', 'StartTimeUtc']]\\\n", " .groupby('AlertType').StartTimeUtc.agg('count').to_dict()\n", " proc_alert_items = related_alerts\\\n", " .query('proc_match == @True')[['AlertType', 'StartTimeUtc']]\\\n", " .groupby('AlertType').StartTimeUtc.agg('count').to_dict()\n", "\n", " def print_related_alerts(alertDict, entityType, entityName):\n", " if len(alertDict) > 0:\n", " print('Found {} different alert types related to this {} (\\'{}\\')'\n", " .format(len(alertDict), entityType, entityName))\n", " for (k,v) in alertDict.items():\n", " print(' {}, Count of alerts: {}'.format(k, v))\n", " else:\n", " print('No 
alerts for {} entity \\'{}\\''.format(entityType, entityName))\n", "\n", " print_related_alerts(host_alert_items, 'host', security_alert.hostname)\n", " print_related_alerts(acct_alert_items, 'account', \n", " security_alert.primary_account.qualified_name \n", " if security_alert.primary_account\n", " else None)\n", " print_related_alerts(proc_alert_items, 'process', \n", " security_alert.primary_process.ProcessFilePath \n", " if security_alert.primary_process\n", " else None)\n", " nbdisp.display_timeline(data=related_alerts, source_columns=['AlertName'], title='Alerts', height=100)\n", " else:\n", " display(Markdown('No related alerts found.'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Show these related alerts on a graph\n", "To see how these alerts relate to our original alert, and how these new alerts relate to each other, we can graph them." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Draw a graph of this (add to entity graph)\n", "%matplotlib notebook\n", "%matplotlib inline\n", "\n", "if related_alerts is not None and not related_alerts.empty:\n", " # security_alert_graph is the same module used to create the entity graph above\n", " rel_alert_graph = security_alert_graph.add_related_alerts(related_alerts=related_alerts,\n", " alertgraph=alertentity_graph)\n", " nbdisp.draw_alert_entity_graph(rel_alert_graph, width=15)\n", "else:\n", " display(Markdown('No related alerts found.'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Browse List of Related Alerts\n", "Once we have understood how these alerts relate to each other, we can view the details of each new, related alert." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "def disp_full_alert(alert):\n", " global related_alert\n", " related_alert = SecurityAlert(alert)\n", " nbdisplay.display_alert(related_alert, show_entities=True)\n", "\n", "if related_alerts is not None and not related_alerts.empty:\n", " related_alerts['CompromisedEntity'] = related_alerts['Computer']\n", " print('Selected alert is available as \\'related_alert\\' variable.')\n", " rel_alert_select = nbwidgets.AlertSelector(alerts=related_alerts, action=disp_full_alert)\n", " rel_alert_select.display()\n", "else:\n", " display(Markdown('No related alerts found.'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Process Tree\n", "If the alert has a process entity this section tries to retrieve the entire process tree to which that process belongs.\n", "\n", "Notes:\n", "- The alert must have a process entity\n", "- Only processes started within the query time boundary will be included\n", "- Ancestor and descendant processes are retrieved to two levels (i.e. the parent and grandparent of the alert process plus any child and grandchild processes).\n", "- Sibling processes are the processes that share the same parent as the alert process\n", "- This can be a long-running query, especially if a wide time window is used! Caveat Emptor!\n", "\n", "The source (alert) process is shown in red.\n", "\n", "What's shown for each process:\n", "- Each process line is indented according to its position in the tree hierarchy\n", "- Top line fields:\n", " - \\[relationship to source process:lev# - where # is the number of hops away from the source process\\]\n", " - Process creation date-time (UTC)\n", " - Process Image path\n", " - PID - Process Id\n", " - SubjSess - the session Id of the process spawning the new process\n", " - TargSess - the new session Id if the process is launched in another context/session. 
If 0/0x0 then the process is launched in the same session as its parent\n", "- Second line fields:\n", " - Process command line\n", " - Account - name of the account context in which the process is running" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", "query_times = nbwidgets.QueryTime(units='minute', origin_time=security_alert.origin_time)\n", "query_times.display()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from msticpy.nbtools.query_defns import DataFamily\n", "\n", "if security_alert.data_family != DataFamily.WindowsSecurity:\n", " raise ValueError('The remainder of this notebook currently only supports Windows. '\n", " 'Linux support is in development but not yet implemented.')\n", "\n", "def extract_missing_pid(security_alert):\n", " for pid_ext_name in ['Process Id', 'Suspicious Process Id']:\n", " pid = security_alert.ExtendedProperties.get(pid_ext_name, None)\n", " if pid:\n", " return pid\n", "\n", "def extract_missing_sess_id(security_alert):\n", " sess_id = security_alert.ExtendedProperties.get('Account Session Id', None)\n", " if sess_id:\n", " return sess_id\n", " for session in [e for e in security_alert.entities if\n", " e['Type'] == 'host-logon-session' or e['Type'] == 'hostlogonsession']:\n", " return session['SessionId']\n", " \n", "if (security_alert.primary_process):\n", " # Do some patching up if the process entity doesn't have a PID\n", " pid = security_alert.primary_process.ProcessId\n", " if not pid:\n", " pid = extract_missing_pid(security_alert)\n", " if pid:\n", " security_alert.primary_process.ProcessId = pid\n", " else:\n", " raise ValueError('Could not find the process Id for the alert process.')\n", " \n", " # Do the same if we can't find the account logon ID\n", " if not security_alert.get_logon_id():\n", " sess_id = 
extract_missing_sess_id(security_alert)\n", " if sess_id and security_alert.primary_account:\n", " security_alert.primary_account.LogonId = sess_id\n", " else:\n", " raise ValueError('Could not find the session Id for the alert process.')\n", " \n", " # run the query\n", " process_tree = qry_prov.WindowsSecurity.get_process_tree(query_times, security_alert)\n", "\n", " if len(process_tree) > 0:\n", " # Print out the text view of the process tree\n", " nbdisplay.display_process_tree(process_tree)\n", " else:\n", " display(Markdown('No processes were returned so cannot obtain a process tree.'\n", " '\\n\\nSkip to [Other Processes](#process_clustering) later in the'\n", " ' notebook to retrieve all processes'))\n", "else:\n", " display(Markdown('This alert has no process entity so cannot obtain a process tree.'\n", " '\\n\\nSkip to [Other Processes](#process_clustering) later in the'\n", " ' notebook to retrieve all processes'))\n", " process_tree = None\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Process Time Line\n", "As well as seeing the processes involved in a tree we want to see the chronology of this process execution. This shows each process in the process tree on a time line view.\n", "If a large number of processes are involved in this process tree it may take some time to display this time line graphic." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Show timeline of events\n", "if process_tree is not None and not process_tree.empty:\n", " nbdisplay.display_timeline(data=process_tree, alert=security_alert, \n", " title='Alert Process Session', height=250)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other Processes on Host - Clustering\n", "Sometimes you don't have a source process from which to build our investigation. Other times it's just useful to see what other process activity is occurring on the host. 
This section retrieves all processes on the host within the time bounds\n", "set in the query times widget.\n", "\n", "If you want to view the raw details of this process data display the *processes_on_host* dataframe.\n", "\n", "In order to more effectively analyze this process data we can cluster processes into distinct process clusters.\n", "To do this we process the raw event list output to extract a few features that render strings (such as commandline)into numerical values. The default below uses the following features:\n", "- commandLineTokensFull - this is a count of common delimiters in the commandline \n", " (given by this regex r'[\\s\\-\\\\/\\.,\"\\'|&:;%$()]'). The aim of this is to capture the commandline structure while ignoring variations on what is essentially the same pattern (e.g. temporary path GUIDs, target IP or host names, etc.)\n", "- pathScore - this sums the ordinal (character) value of each character in the path (so /bin/bash and /bin/bosh would have similar scores).\n", "- isSystemSession - 1 if this is a root/system session, 0 if anything else.\n", "\n", "Then we run a clustering algorithm (DBScan in this case) on the process list. The result groups similar (noisy) processes together and leaves unique process patterns as single-member clusters." 
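, "\n", "\n", "A minimal sketch of the first two features, assuming simplified stand-ins for the msticpy implementation: `count_delimiters` and `path_score` are hypothetical helpers, and the delimiter set only approximates the regex quoted above.\n", "\n", "
```python
# Delimiter characters approximating the regex quoted above. chr() is used
# for the double quote, single quote and backslash to keep the literal simple.
DELIMITERS = set('-/.,|&:;%$()') | {chr(34), chr(39), chr(92)}


def count_delimiters(command_line):
    # Hypothetical helper: count whitespace and delimiter characters.
    return sum(1 for ch in command_line if ch.isspace() or ch in DELIMITERS)


def path_score(path):
    # Hypothetical helper: sum the ordinal value of every character in the path.
    return sum(ord(ch) for ch in path)


tokens_a = count_delimiters('updatepatch host1.mydom.com')
tokens_b = count_delimiters('updatepatch host2.mydom.com')
score_bash = path_score('/bin/bash')
score_bosh = path_score('/bin/bosh')
print(tokens_a, tokens_b, score_bosh - score_bash)
```
\n", "\n", "Both `updatepatch` command lines give the same delimiter count, so they look identical on that feature even though the host names differ, while the two paths differ only by the gap between one pair of characters."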
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clustered Processes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from msticpy.sectools.eventcluster import dbcluster_events, add_process_features\n", "\n", "processes_on_host = None\n", "if security_alert.primary_host:\n", " processes_on_host = qry_prov.WindowsSecurity.list_processes_in_session(query_times, security_alert)\n", "\n", " if processes_on_host is not None and not processes_on_host.empty:\n", " feature_procs = add_process_features(input_frame=processes_on_host,\n", " path_separator=security_alert.path_separator)\n", "\n", " # you might need to play around with the max_cluster_distance parameter.\n", " # decreasing this gives more clusters.\n", " (clus_events, dbcluster, x_data) = dbcluster_events(data=feature_procs,\n", " cluster_columns=['commandlineTokensFull', \n", " 'pathScore', \n", " 'isSystemSession'],\n", " max_cluster_distance=0.0001)\n", " print('Number of input events:', len(feature_procs))\n", " print('Number of clustered events:', len(clus_events))\n", " clus_events[['ClusterSize', 'processName']][clus_events['ClusterSize'] > 1].plot.bar(x='processName', \n", " title='Process names with Cluster > 1', \n", " figsize=(12,3));\n", "if processes_on_host is None or processes_on_host.empty:\n", " display(Markdown('Unable to obtain any processes for this host. This feature'\n", " ' is currently only supported for Windows hosts.'\n", " '\\n\\nIf this is a Windows host skip to [Host Logons](#host_logons)'\n", " ' later in the notebook to examine logon events.'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variability in Command Lines and Process Names\n", "In this section we display a number of charts highlighting the variability of command lines and processes paths associated with each process. \n", "\n", "The top chart shows the variability of command line content for a given process name. 
The wider the box, the more instances were found with different command line structure. For certain processes such as cmd.exe or powershell.exe a wide variability in command lines could be expected, however with other processes this could be considered abnormal.\n", "\n", "Note, the 'structure' in this case is measured by the number of tokens or delimiters in the command line and does not look at content differences. This is done so that commonly varying instances of the same command line are grouped together.
\n", "For example `updatepatch host1.mydom.com` and `updatepatch host2.mydom.com` will be grouped together.\n", "\n", "\n", "The second graph shows processes by variation in the full path associated with the process. This does compare content so `c:\\windows\\system32\\net.exe` and `e:\\windows\\system32\\net.exe` are treated as distinct. You would normally not expect to see any variability in this chart unless you have multiple copies of an executable with the same name or an executable is trying to masquerade as another well-known binary." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Looking at the variability of commandlines and process image paths\n", "import seaborn as sns\n", "sns.set(style=\"darkgrid\")\n", "\n", "if processes_on_host is not None and not processes_on_host.empty:\n", " proc_plot = sns.catplot(y=\"processName\", x=\"commandlineTokensFull\", \n", " data=feature_procs.sort_values('processName'),\n", " kind='box', height=10)\n", " proc_plot.fig.suptitle('Variability of Commandline Tokens', x=1, y=1)\n", "\n", " proc_plot = sns.catplot(y=\"processName\", x=\"pathLogScore\", \n", " data=feature_procs.sort_values('processName'),\n", " kind='box', height=10, hue='isSystemSession')\n", " proc_plot.fig.suptitle('Variability of Path', x=1, y=1);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "if 'clus_events' in locals() and not clus_events.empty:\n", " resp = input('View the clustered data? 
y/n')\n", " if resp == 'y':\n", " display(clus_events.sort_values('TimeGenerated')[['TimeGenerated', 'LastEventTime',\n", " 'NewProcessName', 'CommandLine', \n", " 'ClusterSize', 'commandlineTokensFull',\n", " 'pathScore', 'isSystemSession']])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Look at clusters for individual process names\n", "def view_cluster(exe_name):\n", " display(clus_events[['ClusterSize', 'processName', 'CommandLine', 'ClusterId']][clus_events['processName'] == exe_name])\n", " \n", "display(Markdown('You can view the cluster members for individual processes '\n", " 'by inserting a new cell and entering: '\n", " '`>>> view_cluster(process_name)` '\n", " 'where process_name is the unqualified process binary. E.g. '\n", " '`>>> view_cluster(\\'reg.exe\\')`'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time Line of clustered processes data vs. original data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show timeline of events - clustered events\n", "if 'clus_events' in locals() and not clus_events.empty:\n", " nbdisp.display_timeline(data=clus_events, \n", " overlay_data=processes_on_host, \n", " alert=security_alert, \n", " title='Distinct Host Processes (bottom) and All Processes (top)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Contents](#toc)\n", "## Base64 Decode and Check for IOCs\n", "This section looks for Indicators of Compromise (IoC) within the data sets passed to it.\n", "\n", "The first section looks at the command line for the process related to our original alert (if any). It also looks for Base64 encoded strings within the data - this is a common way of hiding attacker intent. It attempts to decode any strings that look like Base64. Additionally, if the Base64 decode operation returns any items that look like a Base64 encoded string or file, a gzipped binary sequence, a zipped or tar archive, it will attempt to extract the contents before searching for potentially interesting items." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "process = security_alert.primary_process\n", "ioc_extractor = IoCExtract()\n", "\n", "if process:\n", " # if nothing is decoded this just returns the input string unchanged\n", " base64_dec_str, _ = base64.unpack_items(input_string=process[\"CommandLine\"])\n", " if base64_dec_str and 'IoC patterns found in process tree.\"))\n", " display(ioc_df)\n", "else:\n", " ioc_df = None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### If any Base64 encoded strings, decode and search for IoCs in the results.\n", "For simple strings the Base64 decoded output is straightforward. 
However, it is not uncommon to see nested encodings, so we also try to extract and decode these nested elements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if source_processes is not None:\n", " dec_df = base64.unpack_items(data=source_processes, column='CommandLine')\n", " \n", "if source_processes is not None and (dec_df is not None and not dec_df.empty):\n", " display(HTML(\"Decoded base 64 command lines\"))\n", " display(HTML(\"Warning - some binary patterns may be decodable as unicode strings\"))\n", " display(dec_df[['full_decoded_string', 'original_string', 'decoded_string', 'input_bytes', 'file_hashes']])\n", "\n", " ioc_dec_df = ioc_extractor.extract(data=dec_df, columns=['full_decoded_string'])\n", " if len(ioc_dec_df):\n", " display(HTML(\"IoC patterns found in base 64 decoded data\"))\n", " display(ioc_dec_df)\n", " if ioc_df is not None:\n", " # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent\n", " ioc_df = pd.concat([ioc_df, ioc_dec_df], ignore_index=True)\n", " else:\n", " ioc_df = ioc_dec_df\n", "else:\n", " print(\"No base64 encodings found.\")" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "## Threat Intelligence Lookup\n", "Now that we have identified a number of IoCs we want to check whether they are associated with known malicious activity. To do this we will query three different Threat Intelligence providers to see if we get results.\n", "\n", "We will be using:\n", "- VirusTotal https://www.virustotal.com/\n", "- AlienVault OTX https://otx.alienvault.com/\n", "- IBM X-Force https://exchange.xforce.ibmcloud.com/\n", "\n", "If you do not have an API key for any of these providers simply remove their name from the providers list in our lookup_iocs command." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tilookups = TILookup()\n", "if ioc_df is not None and not ioc_df.empty:\n", " ti_results = tilookups.lookup_iocs(data=ioc_df, obs_col='Observable', ioc_type_col='IoCType', providers=[\"OTX\", \"VirusTotal\", \"XForce\"])\n", " if not ti_results[ti_results['Severity'] > 0].empty:\n", " md(\"Positive TI Results:\", \"bold\")\n", " display(ti_results[ti_results['Severity'] > 0])\n", " else:\n", " md(\"No positive matches found in threat intelligence\")\n", "else:\n", " md(\"No IOCs to lookup\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alert command line - Occurrence on other hosts in workspace\n", "Understanding where else a command line is being run in an environment can give us a good idea of the scope of a security incident, or help us determine whether activity is malicious or expected.\n", "\n", "To get a sense of whether the alert process is something that is occurring on other hosts, run this section." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", "query_times = nbwidgets.QueryTime(units='day', before=5, max_before=20,\n", " after=1, max_after=10,\n", " origin_time=security_alert.origin_time)\n", "query_times.display()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This query needs a commandline parameter which isn't supplied\n", "# by default from the alert \n", "# - so extract and escape this from the process\n", "if not security_alert.primary_process:\n", " raise ValueError('This alert has no process entity. This section is not applicable.')\n", "\n", "proc_match_in_ws = None\n", "commandline = security_alert.primary_process.CommandLine\n", "commandline = utility.escape_windows_path(commandline)\n", "commandline = commandline.replace('\"',\"'\")\n", "process = security_alert.ExtendedProperties['process name']\n", "process = utility.escape_windows_path(process)\n", "process = process.replace('\"',\"'\")\n", "md(f\"Command Line: {commandline}\")\n", "if commandline.strip():\n", " proc_match_in_ws = qry_prov.WindowsSecurity.list_hosts_matching_commandline(start=query_times.start, end=query_times.end, process_name=process,\n", " commandline=commandline)\n", "\n", "else:\n", " md('process has empty commandline')\n", "# Check the results\n", "if proc_match_in_ws is None or proc_match_in_ws.empty:\n", " md('No processes with matching commandline found on other hosts in workspace')\n", " md(f'between {query_times.start} and {query_times.end}')\n", "else:\n", " hosts = proc_match_in_ws['Computer'].drop_duplicates().shape[0]\n", " processes = proc_match_in_ws.shape[0]\n", " md('{numprocesses} processes with matching commandline found on {numhosts} hosts in workspace'\\\n", " .format(numprocesses=processes, numhosts=hosts))\n", " # md() takes a single string, so build the message with an f-string\n", " md(f'between {query_times.start} and {query_times.end}')\n", " 
md('To examine these execute the dataframe \\'{}\\' in a new cell'.format('proc_match_in_ws'))\n", " display(proc_match_in_ws[['TimeCreatedUtc','Computer', 'NewProcessName', 'CommandLine']].head())\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If at this point you wish to investigate a particular host in detail, you can use the cells below or switch to our Host Investigation Notebooks, which provide a deep-dive capability for Windows and Linux hosts.\n", "\n", "## Host Logons\n", "This section retrieves the logon events on the host in the alert.\n", "\n", "You may want to use the query times to search over a broader range than the default." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", "query_times = nbwidgets.QueryTime(units='day', origin_time=security_alert.origin_time,\n", " before=1, after=0, max_before=20, max_after=1)\n", "query_times.display()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "If you wish to investigate a specific account in detail, you can use the cells below or switch to our Account investigation notebook. \n", "\n", "## Alert Logon Account\n", "This returns the account associated with the alert being investigated." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "logon_id = security_alert.get_logon_id()\n", "\n", "if logon_id:\n", " if logon_id in ['0x3e7', '0X3E7', '-1', -1]:\n", " print('Cannot retrieve single logon event for system logon id '\n", " '- please continue with All Host Logons below.')\n", " else:\n", " logon_event = qry.get_host_logon(provs=[query_times, security_alert])\n", " nbdisp.display_logon_data(logon_event, security_alert)\n", "else:\n", " print('No account entity in the source alert or the primary account had no logonId value set.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### All Host Logons\n", "Since the number of logon events may be large and, in the case of system logons, very repetitive, we use clustering to try to identify logons with unique characteristics.\n", "\n", "In this case we cluster on the numeric score of the account name and the logon type (e.g. interactive or service). The results of the clustered logons are shown below along with a more detailed, readable printout of the logon event information. The data here will vary depending on whether this is a Windows or Linux host." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from msticpy.sectools.eventcluster import dbcluster_events, add_process_features, _string_score\n", "\n", "if security_alert.primary_host:\n", " host_logons = qry_prov.WindowsSecurity.list_host_logons(query_times, security_alert)\n", "else:\n", " host_logons = None\n", " md(\"No data available - alert has no host entity.\")\n", " \n", "if host_logons is not None and not host_logons.empty:\n", " logon_features = host_logons.copy()\n", " logon_features['AccountNum'] = host_logons.apply(lambda x: _string_score(x.Account), axis=1)\n", " logon_features['LogonHour'] = host_logons.apply(lambda x: x.TimeGenerated.hour, axis=1)\n", "\n", " # you might need to play around with the max_cluster_distance parameter.\n", " # decreasing this gives more clusters.\n", " (clus_logons, _, _) = dbcluster_events(data=logon_features, time_column='TimeGenerated',\n", " cluster_columns=['AccountNum',\n", " 'LogonType'],\n", " max_cluster_distance=0.0001)\n", " md(f'Number of input events: {len(host_logons)}')\n", " md(f'Number of clustered events: {len(clus_logons)}')\n", " md('\\nDistinct host logon patterns:')\n", " display(clus_logons.sort_values('TimeGenerated'))\n", "else:\n", " md('No logon events found for host.')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Display logon details (clus_logons only exists if logon events were found)\n", "if host_logons is not None and not host_logons.empty:\n", " nbdisp.display_logon_data(clus_logons, security_alert)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing All Logons with Clustered results relative to Alert timeline\n", "To understand these logons in relation to the original alert we are investigating, we want to view them in a timeline." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show timeline of events - all logons + clustered logons\n", "if host_logons is not None and not host_logons.empty:\n", " nbdisp.display_timeline(data=host_logons, overlay_data=clus_logons,\n", " alert=security_alert, \n", " source_columns=['Account', 'LogonType'],\n", " title='All Host Logons')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Process Session and Logon Events in Timelines\n", "This shows the timeline of the clustered logon events with the process tree obtained earlier. This allows you to get a sense of which logon was responsible for the process tree session and whether any additional logons (e.g. creating a process as another user) might be associated with the alert timeline.\n", "\n", "*Note: you should use the pan and zoom tools to align the timelines, since the data may cover different time ranges.*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display(clus_logons.head())\n", "process_tree.head()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show timeline of events - all events\n", "if host_logons is not None and not host_logons.empty:\n", " nbdisp.display_timeline(data=clus_logons, overlay_data=process_tree, source_columns=['Account'],\n", " alert=security_alert,\n", " title='Clustered Host Logons', height=200)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Counts of Logon types by Account\n", "if host_logons is not None and not host_logons.empty:\n", " display(host_logons[['Account', 'LogonType', 'TimeGenerated']]\n", " .groupby(['Account','LogonType']).count()\n", " .rename(columns={'TimeGenerated': 'LogonCount'}))" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "## Failed Logons\n", "Failed logons can provide a valuable source of data 
for investigation, so we also want to look at failed logons during the period of our investigation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true, "scrolled": true }, "outputs": [], "source": [ "if security_alert.primary_host:\n", " failedLogons = qry_prov.WindowsSecurity.list_host_logon_failures(query_times, security_alert)\n", "else:\n", " md(\"No data available - alert has no host entity.\")\n", " failedLogons = None\n", " \n", "\n", "# Report when the query ran but returned no failed logons\n", "if failedLogons is not None and failedLogons.empty:\n", " md(f'No logon failures recorded for this host between {security_alert.StartTimeUtc} and {security_alert.EndTimeUtc}')\n", "\n", "failedLogons" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "## Appendices\n", "### Available DataFrames" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "print('List of current DataFrames in Notebook')\n", "print('-' * 50)\n", "current_vars = list(locals().keys())\n", "for var_name in current_vars:\n", " if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith('_'):\n", " print(var_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Saving Data to CSV\n", "To save the contents of a pandas DataFrame to a CSV file,\n", "use the following syntax:\n", "```\n", "host_logons.to_csv('host_logons.csv')\n", "```" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "tags": [ "todo" ] }, "source": [ "### Saving Data to Excel\n", "To save the contents of a pandas DataFrame to an Excel spreadsheet,\n", "use the following syntax:\n", "```\n", "writer = pd.ExcelWriter('myWorksheet.xlsx')\n", "my_data_frame.to_excel(writer, sheet_name='Sheet1')\n", "writer.close()\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup Cell\n", "If you have not run this Notebook before, please run this cell before running the rest of the Notebook." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\",category=DeprecationWarning)\n", "\n", "MIN_REQ_PYTHON = (3,6)\n", "if sys.version_info < MIN_REQ_PYTHON:\n", " print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')\n", " print('or later is selected as the active kernel.')\n", " sys.exit(\"Python %s.%s or later is required.\\n\" % MIN_REQ_PYTHON)\n", " \n", "# Package Installs - try to avoid if they are already installed\n", "try:\n", " import msticpy.sectools as sectools\n", " import Kqlmagic\n", " print('If you answer \"n\" this cell will exit with an error in order to avoid the pip install calls,')\n", " print('This error can safely be ignored.')\n", " resp = input('msticpy and Kqlmagic packages are already loaded. Do you want to re-install? (y/n)')\n", " if resp.strip().lower() != 'y':\n", " sys.exit('pip install aborted - you may skip this error and continue.')\n", " else:\n", " print('After installation has completed, restart the current kernel and run '\n", " 'the notebook again skipping this cell.')\n", "except ImportError:\n", " pass\n", "\n", "print('\\nPlease wait. Installing required packages. 
This may take a few minutes...')\n", "!pip install git+https://github.com/microsoft/msticpy --upgrade --user\n", "!pip install Kqlmagic --no-cache-dir --upgrade --user\n", "\n", "print('\\nTo ensure that the latest versions of the installed libraries '\n", " 'are used, please restart the current kernel and run '\n", " 'the notebook again skipping this cell.')" ] } ], "metadata": { "hide_input": false, "kernel_info": { "name": "python3" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "nteract": { "version": "0.15.0" }, "toc": { "base_numbering": 1, "nav_menu": { "height": "318.996px", "width": "320.994px" }, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "165px" }, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "position": { "height": "406.193px", "left": "1468.4px", "right": "20px", "top": "120px", "width": "456.572px" }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }