{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Guided Hunting - Covid-19 Themed Threats\n", "**Notebook Version:** 1.0
\n", "**Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", "**Data Sources Required:** CommonSecurityLog, OfficeActivity, SecurityEvent
\n", " \n", "This Notebook assists defenders in hunting for Covid-19 themed attacks by identifying anomalous Covid-19 related events within your Microsoft Sentinel Workspace. This is designed to be a hunting notebook and has a high probability of returning false positives and returned data points should not be seen as detections without further investigation.\n", "\n", "**How to use:**
\n", "Run the cells in this Notebook in order, at various points in the Notebook flow you will be prompted to enter or select options relevant to the scope of your triage.
\n", "This Notebook presumes you have Microsoft Sentinel Workspace settings and Threat Intelligence providers configured in a config file. If you do not have this in place please refer https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html# to https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb and to set this up." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\r\n", "from IPython.display import display, HTML\r\n", "\r\n", "REQ_PYTHON_VER=(3, 6)\r\n", "REQ_MSTICPY_VER=(1, 0, 0)\r\n", "REQ_MP_EXTRAS = [\"ml\"]\r\n", "\r\n", "\r\n", "\r\n", "# If not using Azure Notebooks, install msticpy with\r\n", "# %pip install msticpy\r\n", "from msticpy.nbtools import nbinit\r\n", "extra_imports = [\r\n", " \"tqdm.notebook, tqdm\",\r\n", " \"whois\",\r\n", " \"dns\",\r\n", " \"tldextract\",\r\n", " \"datetime\",\r\n", " \"msticpy.nbtools.foliummap, get_map_center\",\r\n", " \"msticpy.nbtools.foliummap, get_center_ip_entities\",\r\n", " \"msticpy.sectools.ip_utils, convert_to_ip_entities\",\r\n", " \"functools, lru_cache\",\r\n", "]\r\n", "\r\n", "additional_packages = [\r\n", " \"tldextract\", \"IPWhois\", \"python-whois\"\r\n", "]\r\n", "nbinit.init_notebook(\r\n", " namespace=globals(),\r\n", " additional_packages=additional_packages,\r\n", " extra_imports=extra_imports,\r\n", ");\r\n", "\r\n", "from bokeh.plotting import figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# See if we have a Microsoft Sentinel Workspace defined in our config file.\n", "# If not, let the user specify Workspace and Tenant IDs\n", "\n", "ws_config = WorkspaceConfig()\n", "if not ws_config.config_loaded:\n", " ws_config.prompt_for_ws()\n", " \n", "qry_prov = QueryProvider(data_environment=\"AzureSentinel\")\n", "print(\"done\")\n", " \n", "ti = TILookup()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Authenticate to Microsoft Sentinel workspace\n", "qry_prov.connect(ws_config)\n", "table_index = qry_prov.schema_tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Network event investigation\n", "\n", "Select the time window in which to review network logs in (default is last 24 hours). In large environments you may need to make this time windows smaller in order to avoid query timeout." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "query_times = nbwidgets.QueryTime(units='hours',\n", " max_before=72, max_after=0, before=24)\n", "query_times.display()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "start = query_times.start\r\n", "end = query_times.end\r\n", "# Get Covid-19 related URLs from Network Logs\r\n", "url_q = f\"\"\"\r\n", "CommonSecurityLog\r\n", "| where TimeGenerated between (datetime({start})..datetime({end}))\r\n", "| extend Url = iif(isnotempty(RequestURL), RequestURL , iif(isnotempty(DestinationHostName), DestinationHostName, \"None\"))\r\n", "| where Url != \"None\" \r\n", "| distinct Url\r\n", "| where tolower(Url) matches regex(\"(?i)(covid|corona.*virus)\")\r\n", "\"\"\"\r\n", "print(\"Collecting data...\")\r\n", "url_data = qry_prov.exec_query(url_q)\r\n", "print(\"Done\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@lru_cache(maxsize=5000)\r\n", "def get_domain(url):\r\n", " try:\r\n", " _, domain,tld = tldextract.extract(url)\r\n", " return f\"{domain}.{tld}\"\r\n", " except:\r\n", " return None\r\n", "\r\n", "@lru_cache(maxsize=5000)\r\n", "def whois_url(dom):\r\n", " try:\r\n", " wis = whois.whois(dom)\r\n", " return wis['creation_date']\r\n", " except (KeyError, whois.parser.PywhoisError, UnicodeError) as e:\r\n", " return None\r\n", "\r\n", "tqdm.pandas(desc=\"Lookup progress\")\r\n", "\r\n", "if isinstance(url_data, pd.DataFrame) and not url_data.empty:\r\n", " md(\"Extracting domains\")\r\n", " url_data['domain'] = url_data['Url'].progress_apply(get_domain)\r\n", " url_data = url_data['domain'].drop_duplicates().reset_index().drop(['index'], axis=1)\r\n", " md(f\"Getting domain registration dates for {len(url_data)} unique domains\")\r\n", " md(\"This can take a while for large numbers of domains ~ 100 domains/min\")\r\n", " url_data['creation_date'] = url_data['domain'].progress_apply(whois_url)\r\n", "else:\r\n", " md(\"No matches found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pick a date to filter out domains registered before:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import datetime\n", "#Date Picker for excluding dates\n", "date_pick = widgets.DatePicker(\n", " description='Pick a Date',\n", " disabled=False,\n", " value= datetime.date(2020, 3, 1)\n", ")\n", "display(date_pick)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "start_time = datetime.datetime.combine(date_pick.value, datetime.datetime.min.time())\n", "\n", "\n", "def clean_dates(row):\n", " if type(row['creation_date']) == datetime.datetime:\n", " return row['creation_date']\n", " elif type(row['creation_date']) == list: \n", " return row['creation_date'][0]\n", " elif row['creation_date'] == \"before Aug-1996\":\n", " return datetime.datetime(1996, 8, 1, 0, 0 ,0)\n", " else:\n", " return None\n", " \n", "recent_urls_df = None\n", "if isinstance(url_data, pd.DataFrame) and not url_data.empty: \n", " url_data = url_data.mask(url_data.eq('None')).dropna() \n", " md(\"Formatting and filtering dates\")\n", " url_data['creation_date'] = url_data.progress_apply(clean_dates, axis=1)\n", " filter_mask = url_data['creation_date'] > start_time\n", " recent_urls_df = url_data[filter_mask]\n", " md(\"Recently registered domains relating to Covid-19\",\"bold\")\n", " display(recent_urls_df)\n", "else:\n", " 
md(\"No URL data to process.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Lookup domain in threat intel\n", "def lookup_dom(row):\n", " sev = []\n", " resp = ti.lookup_ioc(observable=row[\"domain\"], providers=['OTX', 'XForce', 'VirusTotal'])\n", " for response in resp[1]:\n", " sev.append(response[1].severity) \n", " if 'high' in sev:\n", " severity = \"High\"\n", " elif 'warning' in sev:\n", " severity = \"Warning\"\n", " elif 'information' in sev:\n", " severity = \"Information\"\n", " else:\n", " severity = \"None\"\n", " return severity\n", "\n", "# Lookup primary IP address of domain\n", "def get_ip(row):\n", " try:\n", " ip = dns.resolver.query(row['domain'], \"A\")\n", " return ip[0]\n", " except:\n", " return None\n", "\n", "# Lookup IP address in threat intel\n", "def lookup_ip(row):\n", " sev = []\n", " resp = ti.lookup_ioc(observable=str(row[\"ip\"]), providers=['OTX', 'XForce', 'VirusTotal'])\n", " for response in resp[1]:\n", " sev.append(response[1].severity) \n", " if 'high' in sev:\n", " severity = \"High\"\n", " elif 'warning' in sev:\n", " severity = \"Warning\"\n", " elif 'information' in sev:\n", " severity = \"Information\"\n", " else:\n", " severity = \"None\"\n", " return severity\n", "\n", "\n", "# Highlight cells based on Threat Intelligence results. \n", "def color_cells(val):\n", " if isinstance(val, str):\n", " if val.lower() == \"high\":\n", " color = 'Red'\n", " elif val.lower() == 'warning':\n", " color = 'Orange'\n", " elif val.lower() == 'information':\n", " color = 'Green'\n", " else:\n", " color = 'none'\n", " else:\n", " color = 'none'\n", " return 'background-color: %s' % color \n", "\n", "if isinstance(recent_urls_df, pd.DataFrame) and not recent_urls_df.empty:\n", " md(\"Getting IP addresses for domain\")\n", " recent_urls_df['ip'] = recent_urls_df.progress_apply(get_ip, axis=1)\n", " md(\"Looking up IP addresses in threat intelligence\")\n", " recent_urls_df['IP TI Risk'] = recent_urls_df.progress_apply(lookup_ip, axis=1)\n", " md(\"Looking up domains in threat intelligence\")\n", " recent_urls_df['Domain TI Risk'] = recent_urls_df.progress_apply(lookup_dom, axis=1)\n", " md(\"Threat Intellignce results for domains and assocaited IP addresses:\", \"bold\")\n", " display(recent_urls_df.style.applymap(color_cells).hide_index())\n", "else:\n", " md(f\"No domains registered since {start_time} were found\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Select a domain to get more details on:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if isinstance(recent_urls_df, pd.DataFrame) and not recent_urls_df.empty:\n", " doms = list(recent_urls_df['domain'])\n", " dom_picker = widgets.Dropdown(\n", " options=doms,\n", " description='Domain:',\n", " disabled=False,\n", " )\n", " display(dom_picker)\n", "else:\n", " md(f\"No domains registered since {start_time} were found\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if isinstance(recent_urls_df, pd.DataFrame) and not recent_urls_df.empty:\n", " dom_details = recent_urls_df[recent_urls_df['domain'] == dom_picker.value]\n", " md(\"Domain details:\",\"bold\")\n", " display(dom_details)\n", " resp = ti.lookup_ioc(observable=dom_details.iloc[0]['domain'], providers=['OTX', 'XForce', 'VirusTotal'])\n", " resp = ti.result_to_df(resp)\n", " md(\"Domain TI details:\",\"bold\")\n", " display(resp)\n", " ip_resp = 
ti.lookup_ioc(observable=str(dom_details.iloc[0]['ip']), providers=['OTX', 'XForce', 'VirusTotal'])\n", " ip_resp = ti.result_to_df(ip_resp)\n", " md(\"IP address TI details:\",\"bold\")\n", " display(ip_resp)\n", " dom_q = f\"\"\"\n", " CommonSecurityLog\n", " | where TimeGenerated between (datetime({start})..datetime({end}))\n", " //| extend Url = iif(isnotempty(RequestURL), RequestURL , iif(isnotempty(DestinationHostName), DestinationHostName, \"None\"))\n", " //| where Url != \"None\" and tolower(Url) contains ('{dom_picker.value}')\n", " | where RequestURL has \"{dom_picker.value.strip()}\" or DestinationHostName has \"{dom_picker.value.strip()}\"\n", " \"\"\"\n", " print(\"Getting raw events (this can take a few minutes)...\")\n", " dom_data = qry_prov.exec_query(dom_q)\n", " md(\"Raw log events:\",\"bold\")\n", " display(dom_data)\n", "else:\n", " md(f\"No domains registered since {start_time} were found\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if isinstance(recent_urls_df, pd.DataFrame) and not recent_urls_df.empty:\n", " ips = dom_data['DestinationIP'].unique()\n", " folium_map = FoliumMap()\n", " ip_entities = [] \n", " for ip in ips:\n", " ip_entities.append(convert_to_ip_entities(ip)[0])\n", " md(f'Map of destination IP addresses associated with {dom_picker.value}', 'bold')\n", " icon_props = {'color': 'orange'}\n", " location = get_map_center(ip_entities)\n", " folium_map.add_ip_cluster(ip_entities=ip_entities, location=location,\n", " **icon_props)\n", " display(folium_map.folium_map)\n", "else:\n", " md(f\"No domains registered since {start_time} were found\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Office Activity Investigation\n", "\n", "Review Covid-19 related files that have been accessed by a large part of your organization. There is a good chance that many of these are legitimate organizational documents, but some may be widely shared malicious or misleading documents.\n", "Enter the approximate number of users in your organization and the query will identify documents accessed by more than 10% of your user base.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "users = widgets.IntText(\n", " value=1000,\n", " description='No. of users:',\n", " disabled=False\n", ")\n", "display(users)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Widely accessed file query\n", "percentage_users = users.value * 0.1\n", "files_q = f\"\"\"\n", "OfficeActivity\n", "| where SourceFileName matches regex(\"(?i)(covid|corona.*virus)\")\n", "| where Operation == \"FileAccessed\"\n", "| summarize dcount(UserId) by SourceFileName, OfficeObjectId\n", "| where dcount_UserId > {percentage_users}\n", "| sort by dcount_UserId\n", "\"\"\"\n", "\n", "files = qry_prov.exec_query(files_q)\n", "if isinstance(files, pd.DataFrame) and not files.empty:\n", " display(files)\n", "else:\n", " md(\"No Covid-19 related files found.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look for Covid-19 related documents that have been uploaded by user agents that have not been widely seen in the environment over the last 30 days. This may indicate malicious users uploading documents."
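, "\n", "The query in the next cell treats any upload that does not come from a common Office client user agent as notable. If that proves too noisy, here is a minimal sketch of a stricter variant that only keeps user agents seen fewer than 10 times in the last 30 days (the threshold and the `rare_uas_threshold_q` / `rare_uas_alt` names are illustrative assumptions, not part of the original query):\n", "\n", "```python\n", "# Stricter variant: only user agents seen fewer than 10 times in 30 days count as rare\n", "rare_uas_threshold_q = \"\"\"\n", "let rare_uas = (\n", "OfficeActivity\n", "| where TimeGenerated > ago(30d)\n", "| where Operation == \"FileUploaded\"\n", "| summarize UACount = count() by UserAgent\n", "| where UACount < 10\n", "| project UserAgent);\n", "OfficeActivity\n", "| where Operation == \"FileUploaded\"\n", "| where SourceFileName matches regex(\"(?i)(covid|corona.*virus)\")\n", "| where UserAgent in~ (rare_uas)\n", "\"\"\"\n", "rare_uas_alt = qry_prov.exec_query(rare_uas_threshold_q)\n", "```"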
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rare_uas_q = \"\"\"\n", "let rare_uas = (\n", "OfficeActivity\n", "| where TimeGenerated > ago(30d)\n", "| where Operation == \"FileUploaded\"\n", "| where UserAgent !startswith \"Microsoft Office\" and UserAgent !startswith \"OneNote\" and UserAgent !startswith \"OneDrive\" and UserAgent !startswith \"Outlook\" and UserAgent !startswith \"Exchange\" \n", "| summarize count() by UserAgent\n", "| project UserAgent);\n", "OfficeActivity\n", "| where Operation == \"FileUploaded\"\n", "| where SourceFileName matches regex(\"(?i)(covid|corona.*virus)\")\n", "| where UserAgent in~ (rare_uas)\n", "\"\"\"\n", "print(\"Collecting data...\")\n", "rare_uas = None\n", "rare_uas = qry_prov.exec_query(rare_uas_q)\n", "\n", "if isinstance(rare_uas, pd.DataFrame) and not rare_uas.empty:\n", " md(\"Done\")\n", "else:\n", " md(\"No Covid-19 related activity found.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "uas_picker = None\n", "if isinstance(rare_uas, pd.DataFrame) and not rare_uas.empty:\n", " uas = rare_uas['UserAgent'].unique()\n", " uas_picker = widgets.Dropdown(\n", " options=uas,\n", " description='User Agent:',\n", " disabled=False,\n", " )\n", " display(uas_picker)\n", "else:\n", " md(f\"No rare User Agents Found\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if uas_picker:\n", " filtered_rare_uas = rare_uas[rare_uas['UserAgent']==uas_picker.value]\n", " display(filtered_rare_uas)\n", "else:\n", " filtered_rare_uas = None\n", " md(f\"No rare User Agents Found\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if isinstance(filtered_rare_uas, pd.DataFrame) and not filtered_rare_uas.empty:\n", " rare_uas_ips = filtered_rare_uas['ClientIP'].unique()\n", " folium_map = FoliumMap()\n", " ip_entities = [] \n", " for ip in rare_uas_ips:\n", " ip_entities.append(convert_to_ip_entities(ip)[0])\n", " md('Map of the source IP of file uploads from rare user agents', 'bold')\n", " icon_props = {'color': 'orange'}\n", " location = get_map_center(ip_entities)\n", " folium_map.add_ip_cluster(ip_entities=ip_entities, location=location,\n", " **icon_props)\n", " display(folium_map.folium_map)\n", "else:\n", " md(f\"No rare User Agents Found\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Host Activity Investigation\n", "Look for new processes spawned from a command line containing COVID-19 related names that may be phishing lures. We start by looking at the number of hosts observed with the command line spawning the particular process and from there can drill down into a specific command line." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "process_q = f\"\"\"\n", "SecurityEvent\n", "| where TimeGenerated between (datetime({start})..datetime({end}))\n", "| where EventID == 4688\n", "| where CommandLine matches regex(\"(?i)(covid|corona.*virus)\")\n", "| summarize dcount(Computer) by CommandLine, NewProcessName\n", "| sort by dcount_Computer\"\"\"\n", "\n", "process_data = qry_prov.exec_query(process_q)\n", "\n", "cmd_lines = None\n", "if isinstance(process_data, pd.DataFrame) and not process_data.empty:\n", " display(process_data)\n", " cmd_lines = process_data['CommandLine'].unique()\n", "else:\n", " md(\"No Covid related process data found\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select command line to look at in more detail:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if cmd_lines:\n", " cmd_line = widgets.Dropdown(\n", " options=cmd_lines,\n", " description='Command Lines:',\n", " disabled=False,\n", " )\n", " display(cmd_line)\n", "else:\n", " md(\"No Covid related process data found\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if cmd_lines:\r\n", " cmd_line_clean = cmd_line.value.replace('\\\\', \"\\\\\\\\\")\r\n", " cmd_line_q = f\"\"\"\r\n", " SecurityEvent\r\n", " | where TimeGenerated between (datetime({start})..datetime({end}))\r\n", " | where EventID == 4688\r\n", " | where CommandLine == \"{cmd_line_clean}\"\r\n", " \"\"\"\r\n", "\r\n", " cmd_line_events = qry_prov.exec_query(cmd_line_q)\r\n", "\r\n", " if isinstance(cmd_line_events, pd.DataFrame) and not cmd_line_events.empty:\r\n", " display(cmd_line_events)\r\n", " else:\r\n", " md(\"No events found\")\r\n", "else:\r\n", " md(\"No Covid related process data found\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use Sysmon data to identify files that are included in the Microsoft Covid-19 threat intelligence data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sysmon_q = f\"\"\"Event\n", "| where TimeGenerated between (datetime({start})..datetime({end}))\n", "| where Source == \"Microsoft-Windows-Sysmon\"\n", "| extend RenderedDescription = tostring(split(RenderedDescription, \":\")[0])\n", "| project TimeGenerated, Source, EventID, Computer, UserName, EventData, RenderedDescription\n", "| extend EvData = parse_xml(EventData)\n", "| extend EventDetail = EvData.DataItem.EventData.Data\n", "| extend Hashes = tostring(EventDetail[17].[\"#text\"])\n", "| where isnotempty(Hashes)\"\"\"\n", "\n", "sysmon_df = qry_prov.exec_query(sysmon_q)\n", "\n", "covid_ti = pd.read_csv(\"https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Sample%20Data/Feeds/Microsoft.Covid19.Indicators.csv\", \n", " index_col=False,\n", " names=[\"TimeGenerated\",\"Hash\",\"Hash Type\", \"TLP\", \"Service\", \"Type\", \"Source\"],\n", " parse_dates=[\"TimeGenerated\"],\n", " infer_datetime_format=True\n", " )\n", "hash_iocs = covid_ti['Hash']\n", "\n", "if isinstance(sysmon_df, pd.DataFrame) and not sysmon_df.empty:\n", " md(\"Sysmon events with hashes that appear in Microsoft Covid-19 TI feed:\")\n", " display(sysmon_df[sysmon_df['Hashes'].isin(hash_iocs)])\n", "else:\n", " md(\"No Sysmon data present\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3.8 - AzureML", "language": "python", "name": "python38-azureml" }, "language_info": { "name": "python", "version": "" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }