{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Stolen Szechuan Sauce - Analysis.ipynb", "private_outputs": true, "provenance": [], "collapsed_sections": [], "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "STgOOx_i9NKV" }, "source": [ "# The Case of The Stolen Szechuan Sauce\n", "\n", "This is a simple colab demonstrating one way of analyzing data from the Stolen Szechuan Sauce challenge (found [here](https://dfirmadness.com/the-stolen-szechuan-sauce/)).\n", "\n", "This colab will not go into any of the data upload. It assumes that all data is already collected and uploaded to Timesketch. To see one way of uploading the data to Timesketch, use [this colab](https://colab.research.google.com/github/google/timesketch/blob/master/notebooks/Stolen_Szechuan_Sauce_Data_Upload.ipynb)\n", "\n", "For a more generic instructions of Colab can be [found here](https://colab.research.google.com/github/google/timesketch/blob/master/notebooks/colab-timesketch-demo.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "uy3o_dS2T6hg" }, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": { "id": "VlUAyi73BJUI" }, "source": [ "If you are running this on a cloud runtime you'll need to install these dependencies:" ] }, { "cell_type": "code", "metadata": { "id": "YywEaQSTBOjH" }, "source": [ "# @markdown Only execute if not already installed and running a cloud runtime\n", "!pip install -q timesketch_api_client\n", "!pip install -q vt-py nest_asyncio pandas\n", "!pip install -q picatrix" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "fi9-7n--lXOV", "cellView": "form" }, "source": [ "# @title Import libraries\n", "# @markdown This cell will import all the libraries needed for the 
running of this colab.\n", "\n", "import re\n", "import requests\n", "\n", "import pandas as pd\n", "\n", "from timesketch_api_client import config\n", "from picatrix import notebook_init\n", "\n", "import vt\n", "import nest_asyncio # https://github.com/VirusTotal/vt-py/issues/21\n", "\n", "nest_asyncio.apply()\n", "notebook_init.init()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "MQg0I0Cl6ecu", "cellView": "form" }, "source": [ "# @title VirusTotal Configuration\n", "# @markdown In order to be able to look up domains/IPs/samples using VirusTotal we need to get an API key.\n", "# @markdown\n", "# @markdown If you don't have an API key you must sign up to [VirusTotal Community](https://www.virustotal.com/gui/join-us).\n", "# @markdown Once you have a valid VirusTotal Community account you will find your personal API key in your settings section.\n", "\n", "VT_API_KEY = '' # @param {type: \"string\"}\n", "\n", "# @markdown If you don't have the API key you will not be able to use the VirusTotal API\n", "# @markdown to look up information." ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "REUKmoy_G1p_", "cellView": "form" }, "source": [ "# @title Declare functions\n", "\n", "# @markdown This cell will define a few functions that we will use throughout\n", "# @markdown this colab. 
These functions would be better defined outside of the notebook\n", "# @markdown in an importable library, but we keep them here for now.\n", "\n", "def print_dict(my_dict, space_before=0):\n", "  \"\"\"Print the content of a dictionary.\"\"\"\n", "  max_len = max([len(x) for x in my_dict.keys()])\n", "  spaces = ' '*space_before\n", "  format_str = f'{spaces}{{key:{max_len}s}} = {{value}}'\n", "  for key, value in my_dict.items():\n", "    if isinstance(value, dict):\n", "      print(format_str.format(key=key, value=''))\n", "      print_dict(value, space_before=space_before + 8)\n", "    elif isinstance(value, list):\n", "      value_str = ', '.join(value)\n", "      print(format_str.format(key=key, value=value_str))\n", "    else:\n", "      print(format_str.format(key=key, value=value))\n", "\n", "\n", "def ip_info(address):\n", "  \"\"\"Print out information about an IP address using the VT API.\"\"\"\n", "  url = 'https://www.virustotal.com/vtapi/v2/ip-address/report'\n", "  params = {\n", "      'apikey': VT_API_KEY,\n", "      'ip': address}\n", "\n", "  response = requests.get(url, params=params)\n", "  j_obj = response.json()\n", "\n", "  def _print_stuff(part):\n", "    print('')\n", "    header = part.replace('_', ' ').capitalize()\n", "    print(f'{header}:')\n", "    for item in j_obj.get(part, []):\n", "      print_dict(item, 2)\n", "\n", "  _print_stuff('resolutions')\n", "  _print_stuff('detected_urls')\n", "  _print_stuff('detected_referrer_samples')\n", "  _print_stuff('detected_communicating_samples')\n", "  _print_stuff('detected_downloaded_samples')" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "FiPTLA-qlkWQ", "cellView": "form" }, "source": [ "# @markdown Get a copy of the Timesketch client object.\n", "# @markdown Parameters to configure the client:\n", "# @markdown + host_uri: https://demo.timesketch.org\n", "# @markdown + username: demo\n", "# @markdown + auth_mode: timesketch (username/password)\n", "# @markdown + password: demo\n", "\n", "ts_client = 
config.get_client(confirm_choices=True)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "hkQ98MnC-x3p" }, "source": [ "Now that we've got a copy of the TS client we need to get to the sketch." ] }, { "cell_type": "code", "metadata": { "id": "RN5fKCshls9L" }, "source": [ "for sketch in ts_client.list_sketches():\n", "  if not sketch.name.startswith('Szechuan'):\n", "    continue\n", "\n", "  print('We found the sketch to use')\n", "  print(f'[{sketch.id}] {sketch.name} - {sketch.description}')\n", "  break" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Bpz29qgAGLFt" }, "source": [ "OK, sketch number 6 is the one that we are after; let's set that as the active sketch. The Timesketch picatrix magics expect the active sketch to be set first; after that, the magics don't need an explicit sketch definition." ] }, { "cell_type": "code", "metadata": { "id": "pO76nz3TGZAH" }, "source": [ "%timesketch_set_active_sketch 6" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "R6kRDR2fToLC" }, "source": [ "To learn more about picatrix and how it works, use the magic `%picatrixmagics` to see what magics are available, and then use `%magic --help` or `magic_func?` to see more information about a particular magic.\n", "\n", "One such example could be:" ] }, { "cell_type": "code", "metadata": { "id": "RJheDxzhTxS2" }, "source": [ "timesketch_list_saved_searches_func?" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "2Ir9TevSLGBq" }, "source": [ "## Pre-Thoughts" ] }, { "cell_type": "markdown", "metadata": { "id": "xgwQlv1FLHzx" }, "source": [ "Timesketch analyzers can provide quite a lot of value to any analysis. They can do pretty much everything that can be achieved in a colab like this, or in the Timesketch UI, just programmatically. 
In this case, one of the very valuable analyzers is the `logon` analyzer. That analyzer will look for evidence of logons, then extract values out of the logon entries and add them to the dataset.\n", "\n", "Another potentially valuable analyzer is the browser search analyzer. To get a history of which analyzers have been run you can visit [this page](https://demo.timesketch.org/sketch/6/manage/timelines) or run the following code snippet:\n" ] }, { "cell_type": "code", "metadata": { "id": "kIjGq8dmMYJv" }, "source": [ "for status in sketch.get_analyzer_status():\n", "  print(f'Analyzer: {status[\"analyzer\"]} - status: {status[\"status\"]}')\n", "  print(f'Results: {status[\"results\"]}')\n", "  print('')" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "lLlaWPgi_o1r" }, "source": [ "From there you can get a glance at what analysis has been done on the dataset and what the results were, for instance that `login` was completed and it found several logon and logoff entries.\n", "\n", "Now we can start answering the questions.\n", "\n", "## Questions" ] }, { "cell_type": "markdown", "metadata": { "id": "kNyG9uur_1xK" }, "source": [ "### What’s the Operating System of the Server?\n", "\n", "Let's start exploring. OS information is stored in the registry. 
Let's query it:" ] }, { "cell_type": "code", "metadata": { "id": "_gDu_58uk3Go" }, "source": [ "search_query = timesketch_query_func(\n", "    'parser:\"winreg/windows_version\"',\n", "    fields='datetime,key_path,data_type,message,timestamp_desc,parser,display_name,product_name,hostname'\n", ")\n", "cur_df = search_query.table" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "hfAIKnt-EKak" }, "source": [ "cur_df[['hostname', 'product_name']]" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "mQ4RxzwnDrbu" }, "source": [ "So now we have all the data; we can read the answer from the table or do one more filtering step to get it:" ] }, { "cell_type": "code", "metadata": { "id": "x01scNWNG9Zn" }, "source": [ "cur_df[cur_df.hostname == 'CITADEL-DC01'].product_name.value_counts()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "2ksIYrrI_8Be" }, "source": [ "### What’s the Operating System of the Desktop?\n", "\n", "We can use the same data we collected before:" ] }, { "cell_type": "code", "metadata": { "id": "jDRnhdlUHG4S" }, "source": [ "cur_df[cur_df.hostname == 'DESKTOP-SDN1RPT'].product_name.value_counts()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "wrpH_DttAejS" }, "source": [ "### What was the local time of the Server?\n", "\n", "To answer that we first need to get the current control set:" ] }, { "cell_type": "code", "metadata": { "id": "XRo0pe4eJyMi" }, "source": [ "cur_df = timesketch_query_func(\n", "    'HKEY_LOCAL_MACHINE*System*Select AND hostname:\"CITADEL-DC01\"',\n", "    fields=(\n", "        'datetime,key_path,data_type,message,timestamp_desc,parser,display_name,'\n", "        'product_name,hostname,values')\n", ").table" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "oPECs_jt_T_H" }, "source": [ "Now let's look at what the 
value of this key is." ] }, { "cell_type": "code", "metadata": { "id": "1TwaMYZBJ2ce" }, "source": [ "for key, value in cur_df[['key_path', 'values']].values:\n", "  print(f'Key: {key}')\n", "  print(f'Value: {value}')" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "1taojYh2_kDX" }, "source": [ "We can parse this out a bit more if we want to, or simply read off that the current value is 1:" ] }, { "cell_type": "code", "metadata": { "id": "MaRYpTIN_on9" }, "source": [ "cur_df['current_value'] = cur_df['values'].str.extract(r'Current: \\[[A-Z_]+\\] (\\d) ')\n", "\n", "cur_df[['key_path', 'current_value']]" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "VSAo1MWHJ-vP" }, "source": [ "The current control set is set to 1." ] }, { "cell_type": "code", "metadata": { "id": "duOb8qURKJpn" }, "source": [ "cur_df = timesketch_query_func(\n", "    'TimeZoneInformation AND hostname:\"CITADEL-DC01\"',\n", "    fields='datetime,key_path,data_type,message,timestamp_desc,parser,display_name,product_name,hostname,configuration'\n", ").table\n", "cur_df" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "VTkHkocEB9Iy" }, "source": [ "Let's increase the column width in pandas; that will make it easier to read columns with longer text in them." ] }, { "cell_type": "code", "metadata": { "id": "LQIF_bVFLYLt" }, "source": [ "pd.set_option('display.max_colwidth', 400)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "3ZeK5sfCKcwi" }, "source": [ "cur_df[cur_df.key_path.str.contains('ControlSet001')][['configuration']]" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "IJMrF5txAjHm" }, "source": [ "So we need to extract what is in `TimeZoneKeyName`; there are several ways to do this. 
For now we can read the configuration field, split it into a dict, and construct a new DataFrame from these fields; that is, take a line of the form `key1: value1 key2: value2 ...` and create a data frame with `key1, key2, ...` as the column names." ] }, { "cell_type": "code", "metadata": { "id": "eBSQDEI33wdH" }, "source": [ "lines = []\n", "\n", "for value in cur_df[cur_df.key_path.str.contains('ControlSet001')]['configuration'].values:\n", "  items = value.split(':')\n", "  line_dict = {}\n", "  key = items[0]\n", "  for item in items[1:-1]:\n", "    *values, new_key = item.split()\n", "\n", "    line_dict[key] = ' '.join(values)\n", "    key = new_key\n", "\n", "  line_dict[key] = items[-1].strip()\n", "  lines.append(line_dict)\n", "\n", "time_df = pd.DataFrame(lines)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "LG_9D9mzCBrX" }, "source": [ "Let's look at the newly constructed data frame:" ] }, { "cell_type": "code", "metadata": { "id": "cgtctQ_hCEfi" }, "source": [ "time_df" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "9f4lPffKEByw" }, "source": [ "Now we've got the time zone of the server, which is `Pacific Standard Time`." ] }, { "cell_type": "markdown", "metadata": { "id": "oNZXvnc5Ajs1" }, "source": [ "### What was the initial entry vector (how did they get in)?\n", "\n", "If we assume they got in from the outside, doing some statistics on the network data might be useful. 
For that we need to do some aggregations.\n", "\n", "First, to understand what aggregators are available and how to use them, let's list the available aggregators; this produces a data frame with the names of the aggregators and the parameters they need for configuration.\n" ] }, { "cell_type": "code", "metadata": { "id": "y76Xnu7AEmO8" }, "source": [ "%timesketch_available_aggregators" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "QHok29TaExGy" }, "source": [ "Now that we know what aggregators are available, let's start with aggregating the field `Source`, and get the top 10.\n", "\n", "For that we need to use the `field_bucket` aggregator, and configure it using the parameters `field`, `limit` and `supported_charts`.\n", "\n", "The charts that are available are:\n", " + barchart\n", " + hbarchart\n", " + table\n", " + circlechart\n", " + linechart\n", "\n", "For this let's use a horizontal bar chart, `hbarchart`:" ] }, { "cell_type": "code", "metadata": { "id": "gLMtyrWtLqLj" }, "source": [ "params = {\n", "    'field': 'Source',\n", "    'limit': 10,\n", "    'supported_charts': 'hbarchart',\n", "    'chart_title': 'Top 10 Source IP',\n", "}\n", "\n", "aggregation = timesketch_run_aggregator_func(\n", "    'field_bucket', parameters=params\n", ")\n", "aggregation.chart" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "X0oGT9mk0B5D" }, "source": [ "If you are viewing this in Colab but connecting to a local runtime you may need to enable this in order to be able to view the charts:\n", "\n", "(if it doesn't work, uncomment the code that is applicable to you and then re-run the aggregation cell)" ] }, { "cell_type": "code", "metadata": { "id": "0VqBw5ovz4p5" }, "source": [ "# Note: these calls assume altair has been imported, e.g. import altair as alt.\n", "\n", "# Remove the comment and run this code if you are running in colab\n", "# but have a local Jupyter kernel running:\n", "# alt.renderers.enable('colab')\n", "\n", "# Remove this comment if you are 
running in Jupyter and the chart is not displayed\n", "# alt.renderers.enable('notebook')" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "GHNhwNsGFSUH" }, "source": [ "If you prefer to get the data frame instead of the chart you can call `aggregation.table`:" ] }, { "cell_type": "code", "metadata": { "id": "ZFnInNAnFZYB" }, "source": [ "aggregation.table" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "0EsPZ5wyFbxY" }, "source": [ "Now let's look at the `Destination` field, same as before:" ] }, { "cell_type": "code", "metadata": { "id": "WNauKP1OL1Ps" }, "source": [ "params = {\n", "    'field': 'Destination',\n", "    'limit': 10,\n", "    'supported_charts': 'hbarchart',\n", "    'chart_title': 'Top 10 Destination IP',\n", "}\n", "\n", "aggregation = timesketch_run_aggregator_func('field_bucket', parameters=params)\n", "aggregation.chart" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Dm5dT3ZyL3rU" }, "source": [ "We can clearly see that ```194.61.24.102``` sticks out, so let's try to understand what this IP did. Also note that it is not common for a system on the internet to try to connect to an intranet IP." ] }, { "cell_type": "markdown", "metadata": { "id": "w-QQPvb8MIgr" }, "source": [ "#### A Look at IP 194.61.24.102" ] }, { "cell_type": "code", "metadata": { "id": "Os8SYJiMM96R" }, "source": [ "attacker_dst = timesketch_query_func(\n", "    'Source:\"194.61.24.102\" AND data_type:\"pcap:wireshark:entry\"',\n", "    fields='datetime,message,timestamp_desc,Destination,DST port,Source,Protocol,src port').table\n", "attacker_dst.head(10)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "NQarJWFW_K4h" }, "source": [ "OK, we can see that the API says we got 40k records returned but the search actually produced 128,328 records, so let's increase our max entries..." 
] }, { "cell_type": "code", "metadata": { "id": "CwIEey96_TUA" }, "source": [ "search_obj = timesketch_query_func(\n", " 'Source:\"194.61.24.102\" AND data_type:\"pcap:wireshark:entry\"',\n", " fields='datetime,message,timestamp_desc,Destination,DST port,Source,Protocol,src port')\n", "\n", "search_obj.max_entries = 150000\n", "attacker_dst = search_obj.table\n", "attacker_dst.head(10)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "yHK4xt7mGAz1" }, "source": [ "We got a fairly large table, let's look at the size:" ] }, { "cell_type": "code", "metadata": { "id": "2_MxJxRqGE6L" }, "source": [ "attacker_dst.shape" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "XFmjxrW2GHyf" }, "source": [ "We will now need to do some aggregation on the data that we got, let's use pandas for that. For that there is a function called `groupby` where we can run aggregations.\n", "\n", "We want to group based on `DST port` and `Destination`, so we only need those two columns + one more to store the count/sum." ] }, { "cell_type": "code", "metadata": { "id": "D0uY7rFeGW4u" }, "source": [ "attacker_group = attacker_dst[['DST port','Destination', 'Protocol']].groupby(\n", " ['DST port','Destination'], as_index=False)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "RWDgHXLdG37E" }, "source": [ "Now we got a group, and to get a count, we can use the `count()` function of the group." 
] }, { "cell_type": "code", "metadata": { "id": "GZ2OEKNmG7NY" }, "source": [ "attacker_dst_mytable = attacker_group.count()\n", "attacker_dst_mytable.rename(columns={'Protocol': 'Count'}, inplace=True)\n", "attacker_dst_mytable.sort_values(by=['Count'], ascending=False)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "0TWTkbGONNkj" }, "source": [ "So we can already point out that there is a lot of traffic from this ip to ```10.42.85.10``` on port ```3389```which is used for Remote Desktop Protocol (RDP)\n", "\n", "Let's now look at the IP traffic as it was parsed by scapy" ] }, { "cell_type": "code", "metadata": { "id": "pmqx3AFHNlCc" }, "source": [ "attacker_dst = timesketch_query_func(\n", " '194.61.24.102 AND data_type:\"scapy:pcap:entry\"',\n", " fields='datetime,message,timestamp_desc,ip_flags,ip_dst,ip_src,payload,tcp_flags,tcp_seq,tcp_sport,tcp_dport,tcp_window').table" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "QCbCRWmxH7ML" }, "source": [ "Let's look at a few entries here:" ] }, { "cell_type": "code", "metadata": { "id": "esIgaBjeOwCN" }, "source": [ "attacker_dst.head(10)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "-VfhmF4eIW2Q" }, "source": [ "What we can see here is that quite a bit of the information is in the message field that we need to decode.\n", "\n", "We also see that the `evil` bit is set... we could query for that as well. Let's start there, to do an aggregation based on that." 
] }, { "cell_type": "code", "metadata": { "id": "fYpGly-JIe26" }, "source": [ "params = {\n", " 'field': 'ip_src',\n", " 'query_string': 'ip_flags:\"evil\"',\n", " 'supported_charts': 'hbarchart',\n", " 'chart_title': 'Source IPs with \"evil\" bit set',\n", "}\n", "\n", "aggregation = timesketch_run_aggregator_func('query_bucket', parameters=params)\n", "aggregation.table" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "7m7YfTJ5JZbw" }, "source": [ "We could even save this (if you have write access to the sketch, which the demo user does not have)" ] }, { "cell_type": "code", "metadata": { "id": "GVfPCwpcJdjV" }, "source": [ "name = 'Source IPs with \"evil\" bit set'\n", "aggregation.name = name\n", "aggregation.title = name\n", "aggregation.save()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "u4zTQKnpJlm1" }, "source": [ "And now we could use this in a story for instance.\n", "\n", "But let's move on and parse the message field:\n", "\n", "First let's look at a single entry. To see how it is constructed:" ] }, { "cell_type": "code", "metadata": { "id": "-l2C19JEJsSp" }, "source": [ "attacker_dst.iloc[0].message" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "ZYX6y6yPPef3" }, "source": [ "Now that we know that, let's first remove the `