{ "cells": [ { "cell_type": "markdown", "source": [ "# Title: Office 365 Explorer\n", "
\n", "  Details...\n", "**Notebook Version:** 1.0
\n", "**Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", "**Required Packages**: kqlmagic, msticpy, pandas, numpy, matplotlib, seaborn, ipywidgets, ipython, scikit_learn, folium, maxminddb_geolite2
\n", "**Platforms Supported**:\n", "- Azure Notebooks Free Compute\n", "- Azure Notebooks DSVM\n", "- OS Independent\n", "\n", "**Data Sources Required**:\n", "- Log Analytics - OfficeActivity, IPLocation, Azure Network Analytics\n", "\n", "
\n", "\n", "Brings together a series of queries and visualizations to help you investigate the security status of Office 365 subscription and individual user activities.\n", "- The first section focuses on Tenant-Wide data queries and analysis\n", "- The second section allows you to focus on individial accounts and examine them for any suspicious activity.\n", "\n", "This notebook is intended to be illustrative of the types of data available in Office 365 Activity data and how to query and use them. It is not meant to be used as a prescriptive guide to how to navigate through the data. \n", "
Feel free to experiment and submit anything interesting you find to the community." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "

Table of Contents

\n", "
" ], "metadata": { "toc": true } }, { "cell_type": "markdown", "source": [ "---\n", "## Notebook initialization\n", "The next cell:\n", "- Checks for the correct Python version\n", "- Checks versions and optionally installs required packages\n", "- Imports the required packages into the notebook\n", "- Sets a number of configuration options.\n", "\n", "This should complete without errors. If you encounter errors or warnings look at the following two notebooks:\n", "- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)\n", "- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n", "\n", "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n", "- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)\n", "- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)\n", "\n", "You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. \n", "There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:\n", "- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)\n", "- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)\n" ], "metadata": {} }, { "cell_type": "code", "source": [ "from pathlib import Path\n", "from pathlib import Path\n", "from IPython.display import display, HTML\n", "\n", "REQ_PYTHON_VER = \"3.6\"\n", "REQ_MSTICPY_VER = \"1.0.0\"\n", "REQ_MP_EXTRAS = [\"ml\"]\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", " from utils.nb_check import check_versions\n", " check_versions(REQ_PYTHON_VER, REQ_MSTICPY_VER, REQ_MP_EXTRAS)\n", "\n", "# If not using Azure Notebooks, install msticpy with\n", "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "\n", "additional_packages = [\"seaborn\"]\n", "nbinit.init_notebook(\n", " namespace=globals(),\n", " additional_packages=additional_packages,\n", ");\n" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-26T23:59:19.478614Z", "start_time": "2020-06-26T23:59:17.250092Z" }, "gather": { "logged": 1617832167665 } } }, { "cell_type": "markdown", "source": [ "### Get WorkspaceId and Authenticate to Log Analytics \n", "
\n", "  Details...\n", "If you are using user/device authentication, run the following cell. \n", "- Click the 'Copy code to clipboard and authenticate' button.\n", "- This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard. \n", "- Select the text box and paste (Ctrl-V/Cmd-V) the copied value. \n", "- You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.\n", "\n", "Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:\n", "```\n", "%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)\n", "```\n", "instead of\n", "```\n", "%kql loganalytics://code().workspace(WORKSPACE_ID)\n", "```\n", "\n", "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", "On successful authentication you should see a ```popup schema``` button.\n", "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", "
" ], "metadata": { "tags": [ "remove" ] } }, { "cell_type": "code", "source": [ "# See if we have an Azure Sentinel Workspace defined in our config file.\n", "# If not, let the user specify Workspace and Tenant IDs\n", "\n", "ws_config = WorkspaceConfig()\n", "if not ws_config.config_loaded:\n", " ws_config.prompt_for_ws()\n", " \n", "qry_prov = QueryProvider(data_environment=\"AzureSentinel\")\n", "print(\"done\")" ], "outputs": [], "execution_count": null, "metadata": { "gather": { "logged": 1617832180691 } } }, { "cell_type": "code", "source": [ "# Authenticate to Azure Sentinel workspace\n", "qry_prov.connect(ws_config)\n", "table_index = qry_prov.schema_tables" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-26T23:59:54.546697Z", "start_time": "2020-06-26T23:59:27.512604Z" }, "gather": { "logged": 1617832189469 }, "tags": [ "todo" ] } }, { "cell_type": "markdown", "source": [ "### Configuration\n", "\n", "#### `msticpyconfig.yaml` configuration File\n", "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", "\n", "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "## Office 365 Activity" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Log Analytics Queries and query time window" ], "metadata": {} }, { "cell_type": "code", "source": [ "if 'OfficeActivity' not in table_index:\n", " display(Markdown('

Warning. Office Data not available.


'\n", " 'Either Office 365 data has not been imported into the workspace or'\n", " ' the OfficeActivity table is empty.
'\n", " 'This workbook is not useable with the current workspace.'))\n", "else:\n", " md('Office Activity table has records available for hunting')" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:32:09.095417Z", "start_time": "2020-06-27T00:32:09.027664Z" }, "gather": { "logged": 1617832197019 } } }, { "cell_type": "code", "source": [ "# set the origin time to the time of our alert\n", "md(\"For large O365 user bases, use short time ranges to keep the query times reasonable.\")\n", "o365_query_times = nbwidgets.QueryTime(\n", " units='hours', before=6, after=0, max_before=72, max_after=12)\n", "o365_query_times.display()" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:00:02.384265Z", "start_time": "2020-06-27T00:00:02.302204Z" }, "gather": { "logged": 1617832203033 } } }, { "cell_type": "code", "source": [ "# Queries\n", "office_ops_query = '''\n", "OfficeActivity\n", "| where TimeGenerated >= datetime({start})\n", "| where TimeGenerated <= datetime({end})\n", "| where UserType == 'Regular'\n", "'''\n", "\n", "office_ops_summary_query = '''\n", "OfficeActivity \n", "| where TimeGenerated >= datetime({start})\n", "| where TimeGenerated <= datetime({end})\n", "| where UserType == 'Regular'\n", "| extend RecordOp = strcat(RecordType, '-', Operation)\n", "| summarize OperationCount=count() by RecordType, Operation, UserId, UserAgent, ClientIP, Client_IPAddress, bin(TimeGenerated, 1h)\n", "'''\n" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:00:07.416773Z", "start_time": "2020-06-27T00:00:07.406273Z" }, "gather": { "logged": 1617838607470 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "## Tenant-wide Information" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Summary of O365 Activity Types\n", "#### Warning this query can be time consuming for large O365 subscriptions" ], "metadata": {} }, { "cell_type": "code", "source": [ "print('Getting data...', end=' ')\n", "o365_query = office_ops_summary_query.format(start = o365_query_times.start, \n", " end=o365_query_times.end)\n", "\n", "office_ops_summary_df = qry_prov.exec_query(o365_query)\n", "print('done.')\n", "\n", "clientip_valid = ~office_ops_summary_df.ClientIP.isin((\"\", \"\"))\n", "client_ipaddress_valid = ~office_ops_summary_df.Client_IPAddress.isin((\"\", \"\"))\n", "office_ops_summary_df.loc[clientip_valid, \"IPAddress\"] = office_ops_summary_df.ClientIP\n", "office_ops_summary_df.loc[client_ipaddress_valid, \"IPAddress\"] = office_ops_summary_df.Client_IPAddress\n", "\n", "(office_ops_summary_df\n", " .assign(UserId = lambda x: x.UserId.str.lower())\n", " .groupby(['RecordType', 'Operation'])\n", " .aggregate({'IPAddress': 'nunique',\n", " 'UserId': 'nunique',\n", " 'OperationCount': 'sum'}))" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:00:19.867160Z", "start_time": "2020-06-27T00:00:17.317726Z" }, "gather": { "logged": 1617839015265 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Variability of IP Address for users" ], "metadata": {} }, { "cell_type": "code", "source": [ "\n", "\n", "unique_ip_op_ua = (\n", " office_ops_summary_df.assign(UserId = lambda x: x.UserId.str.lower())\n", " .groupby(['UserId', 'Operation'])\n", " .aggregate({'IPAddress': 'nunique', 'OperationCount': 'sum'})\n", " .reset_index()\n", " .rename(columns={\"IPAddress\": \"ClientIPCount\"})\n", ")" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:00:27.023842Z", "start_time": "2020-06-27T00:00:26.990387Z" }, "gather": { "logged": 1617839025338 } } }, { "cell_type": "code", "source": [ "import math\n", "multi_ip_users = unique_ip_op_ua[unique_ip_op_ua[\"ClientIPCount\"] > 1]\n", "if len(multi_ip_users) > 0:\n", " height = max(math.log10(len(multi_ip_users.UserId.unique())) * 10, 8)\n", " aspect = 10 / height\n", " user_ip_op = sns.catplot(x=\"ClientIPCount\", y=\"UserId\", hue='Operation', data=multi_ip_users, height=height, aspect=aspect)\n", " md('Variability of IP Address Usage by user')\n", "else:\n", " md('No users with multiple IP addresses')" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:02:38.220632Z", "start_time": "2020-06-27T00:02:38.196469Z" }, "gather": { "logged": 1617839033518 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Accounts with multiple IPs and Geolocations" ], "metadata": {} }, { "cell_type": "code", "source": [ "iplocation = GeoLiteLookup()\n", "\n", "\n", "\n", "restrict_cols = ['RecordType', 'TimeGenerated', 'Operation',\n", " 'UserId', 'IPAddress', 'UserAgent']\n", "office_ops_summary = office_ops_summary_df[restrict_cols].assign(UserId = lambda x: x.UserId.str.lower())\n", "unique_ip_op_ua['ClientIPCount'] = unique_ip_op_ua['ClientIPCount']\n", "office_ops_merged = pd.merge(unique_ip_op_ua.query('ClientIPCount > 1').drop(columns='ClientIPCount'), \n", " office_ops_summary,\n", " on=['UserId', 'Operation'])\n", "\n", "if not office_ops_merged.empty:\n", " client_ips = (\n", " office_ops_merged\n", " .query('IPAddress != \"\" & IPAddress != \"\"')['IPAddress']\n", " .drop_duplicates()\n", " .tolist()\n", " )\n", " ip_entities = []\n", " for ip in client_ips:\n", " ip_entity = entities.IpAddress(Address=ip)\n", " iplocation.lookup_ip(ip_entity=ip_entity)\n", " if ip_entity.Location:\n", " ip_dict = {'Address': ip_entity.Address}\n", " ip_dict.update(ip_entity.Location.properties)\n", " ip_entities.append(pd.Series(ip_dict))\n", "\n", " ip_locs_df = pd.DataFrame(data=ip_entities)\n", " ip_locs_df\n", "\n", " office_ops_summary_ip_loc = pd.merge(office_ops_merged, \n", " ip_locs_df, left_on='IPAddress', \n", " right_on='Address', how='left')\n", "\n", " display(\n", " office_ops_summary_ip_loc\n", " .groupby(['UserId', 'CountryCode', 'City'])\n", " .aggregate({'IPAddress': 'nunique', 'OperationCount': 'sum'})\n", " .reset_index()\n", " .sort_values(\"IPAddress\", ascending=False)\n", " .query(\"IPAddress > 1\")\n", " )\n", "else:\n", " md(\"No operations with > 1 IP Address\")" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:02:48.941771Z", "start_time": "2020-06-27T00:02:44.876978Z" }, "gather": { "logged": 1617839160431 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Choose threshold to show User Logons where User has logged on from > N IP Address in period" ], "metadata": {} }, { "cell_type": "code", "source": [ "th_wgt = widgets.IntSlider(value=1, min=1, max=50, step=1, description='Set IP Count Threshold', **WIDGET_DEFAULTS)\n", "th_wgt" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:02:53.533522Z", "start_time": "2020-06-27T00:02:53.520578Z" }, "gather": { "logged": 1617832386585 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Matrix of Selected Operation Types by Location and IP" ], "metadata": {} }, { "cell_type": "code", "source": [ "print('Getting data...', end=' ')\n", "o365_query = office_ops_query.format(start=o365_query_times.start, \n", " end=o365_query_times.end)\n", "# %kql -query o365_query\n", "# office_ops_df = _kql_raw_result_.to_dataframe()\n", "office_ops_df = qry_prov.exec_query(o365_query)\n", "print('done.') \n", "\n", "clientip_valid = ~office_ops_df.ClientIP.isin((\"\", \"\"))\n", "client_ipaddress_valid = ~office_ops_df.Client_IPAddress.isin((\"\", \"\"))\n", "office_ops_df.loc[clientip_valid, \"IPAddress\"] = office_ops_df.ClientIP\n", "office_ops_df.loc[client_ipaddress_valid, \"IPAddress\"] = office_ops_df.Client_IPAddress\n", "\n", "\n", "# Get Locations for distinct IPs\n", "client_ips = office_ops_df.IPAddress.unique().tolist()\n", "\n", "ip_entities = []\n", "for ip in client_ips:\n", " ip_entity = entities.IpAddress(Address=ip)\n", " iplocation.lookup_ip(ip_entity=ip_entity)\n", " if ip_entity.Location:\n", " ip_dict = {'Address': ip_entity.Address}\n", " ip_dict.update(ip_entity.Location.properties)\n", " ip_entities.append(pd.Series(ip_dict))\n", "\n", "ip_locs_df = pd.DataFrame(data=ip_entities)\n", "\n", "# Get rid of unneeded columns\n", "restrict_cols = ['OfficeId', 'RecordType', 'TimeGenerated', 'Operation',\n", " 'OrganizationId', 'UserType', 'UserKey', 'OfficeWorkload',\n", " 'ResultStatus', 'OfficeObjectId', 'UserId', 'IPAddress',\n", " 'ActorIpAddress', 'UserAgent']\n", "office_ops_restr = office_ops_df[restrict_cols]\n", "\n", "if client_ips:\n", " # Merge main DF with IP location data\n", " office_ops_locs = pd.merge(\n", " office_ops_restr,\n", " ip_locs_df,\n", " how='right',\n", " left_on='IPAddress',\n", " right_on='Address',\n", " indicator=True\n", " )\n", "\n", " # limit_op_types = ['FileDownloaded', 'FileModified','FileUploaded',\n", " # 'MailboxLogin']\n", "\n", " # office_ops_locs = office_ops_locs[office_ops_locs.Operation.isin(limit_op_types)]\n", "\n", " # Calculate operations grouped by location and operation type\n", " cm = sns.light_palette(\"yellow\", as_cmap=True)\n", " country_by_op_count = (office_ops_locs[['Operation', 'RecordType', 'CountryCode', 'City']]\n", " .groupby(['CountryCode', 'City', 'Operation'])\n", " .count())\n", " display(country_by_op_count.unstack().fillna(0).rename(columns={'RecordType':'OperationCount'}))\n", " # .style.background_gradient(cmap=cm))\n", "\n", " # Group by Client IP, Country, Operation\n", " clientip_by_op_count = (office_ops_locs[['IPAddress', 'Operation', 'RecordType', 'CountryCode']]\n", " .groupby(['IPAddress', 'CountryCode', 'Operation'])\n", " .count())\n", "\n", " (clientip_by_op_count.unstack().fillna(0).rename(columns={'RecordType':'OperationCount'}))\n", " # .style.background_gradient(cmap=cm))\n", "else:\n", " md(\"No client IPs found\")" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:03:01.285844Z", "start_time": "2020-06-27T00:02:58.782751Z" }, "gather": { "logged": 1617839182980 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Geolocation Map of Client IPs" ], "metadata": {} }, { "cell_type": "code", "source": [ "from msticpy.nbtools.foliummap import FoliumMap\n", "folium_map = FoliumMap(zoom_start=3)\n", "\n", "def get_row_ip_loc(row):\n", " try:\n", " _, ip_entity = iplocation.lookup_ip(ip_address=row.ClientIP)\n", " return ip_entity\n", " except ValueError:\n", " return None\n", " \n", "off_ip_locs = (office_ops_df[['ClientIP']]\n", " .drop_duplicates()\n", " .apply(get_row_ip_loc, axis=1)\n", " .tolist())\n", "ip_locs = [ip_list[0] for ip_list in off_ip_locs if ip_list]\n", " \n", "display(HTML('

External IP Addresses seen in Office Activity

'))\n", "display(HTML('Numbered circles indicate multiple items - click to expand.'))\n", "\n", "\n", "icon_props = {'color': 'purple'}\n", "folium_map.add_ip_cluster(ip_entities=ip_locs,\n", " **icon_props)\n", "folium_map.center_map()\n", "folium_map.folium_map" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:03:05.569063Z", "start_time": "2020-06-27T00:03:05.057550Z" }, "gather": { "logged": 1617839189483 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Graphical Activity Timeline" ], "metadata": {} }, { "cell_type": "code", "source": [ "import warnings\n", "\n", "with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " display(Markdown(\"### Change in rate of Operation\"))\n", " plot_all = sns.relplot(\n", " data=office_ops_summary_df,\n", "\n", " x=\"TimeGenerated\",\n", " y=\"OperationCount\",\n", " kind=\"line\",\n", " aspect=2,\n", " hue=\"Operation\",\n", " )\n", " plot_all.ax.set_title(\"All operations\")\n", "\n", " plot_sp = sns.relplot(\n", " data=office_ops_summary_df.query('RecordType == \"SharePointFileOperation\"'),\n", " x=\"TimeGenerated\",\n", " y=\"OperationCount\",\n", " hue=\"Operation\",\n", " kind=\"line\",\n", " aspect=2,\n", " )\n", " plot_sp.ax.set_title(\"Sharepoint operations\")" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:03:11.829518Z", "start_time": "2020-06-27T00:03:09.986019Z" }, "gather": { "logged": 1617839922677 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Users With largest Activity Type Count" ], "metadata": {} }, { "cell_type": "code", "source": [ "with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " display(Markdown('### Identify Users/IPs with largest operation count'))\n", " office_ops = (\n", " office_ops_summary_df\n", " .assign(\n", " Account=lambda x: (x.UserId.str.extract('([^@]+)@.*', expand=False)).str.lower())\n", " .sort_values(\"OperationCount\", ascending=False)\n", " .nlargest(50, \"OperationCount\")\n", " )\n", " \n", " if len(office_ops) > 0:\n", " sns.catplot(data=office_ops, y='Account', x='OperationCount', \n", " hue='Operation', aspect=2)\n", " display(office_ops.pivot_table('OperationCount', index=['Account'], \n", " columns='Operation')) #.style.bar(color='orange', align='mid'))\n", " else:\n", " md('no categorical data to plot')" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:03:15.304316Z", "start_time": "2020-06-27T00:03:15.175812Z" }, "gather": { "logged": 1617838281058 } } }, { "cell_type": "markdown", "source": [ "### Details of user operations" ], "metadata": { "nteract": { "transient": { "deleting": false } } } }, { "cell_type": "code", "source": [ "if not ip_locs_df.empty:\n", " off_ops_df = office_ops_df[['OfficeId', 'RecordType', 'TimeGenerated', 'Operation',\n", " 'OrganizationId', 'UserType', 'UserKey', 'OfficeWorkload',\n", " 'ResultStatus', 'OfficeObjectId', 'UserId', 'IPAddress', 'UserAgent']]\n", " display(\n", " pd.merge(off_ops_df, ip_locs_df, how='left', left_on='IPAddress', right_on='Address')\n", " [[\n", " 'TimeGenerated', 'Operation', 'RecordType', 'OfficeWorkload',\n", " 'ResultStatus', 'UserId', 'IPAddress', 'UserAgent', 'CountryCode',\n", " 'CountryName', 'State', 'City', 'Longitude', 'Latitude'\n", " ]]\n", " )\n", "else:\n", " md(\"No IP address information in the data.\")\n" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:05:58.742811Z", "start_time": "2020-06-27T00:05:58.694824Z" }, "gather": { "logged": 1617838377223 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "## Office User Investigation" ], "metadata": {} }, { "cell_type": "code", "source": [ "# set the origin time to the time of our alert\n", "o365_query_times_user = nbwidgets.QueryTime(units='days',\n", " before=10, after=1, max_before=60, max_after=20, auto_display=True)\n" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:02.119250Z", "start_time": "2020-06-27T00:06:02.032121Z" }, "gather": { "logged": 1617832447977 } } }, { "cell_type": "code", "source": [ "distinct_users = office_ops_df[['UserId']].sort_values('UserId')['UserId'].str.lower().drop_duplicates().tolist()\n", "distinct_users\n", "user_select = nbwidgets.SelectItem(description='Select User Id', item_list=distinct_users, auto_display=True)\n", " # (items=distinct_users)" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:03.023848Z", "start_time": "2020-06-27T00:06:02.983335Z" }, "gather": { "logged": 1617832453165 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Activity Summary" ], "metadata": {} }, { "cell_type": "code", "source": [ "# Provides a summary view of a given account's activity\n", "# For use when investigating an account that has been identified as having associated suspect activity or been otherwise compromised. \n", "# All office activity by UserName using UI to set Time range\n", "# Tags: #Persistence, #Discovery, #Lateral Movement, #Collection\n", "\n", "user_activity_query = '''\n", "OfficeActivity\n", "| where TimeGenerated >= datetime({start})\n", "| where TimeGenerated <= datetime({end})\n", "| where UserKey has \"{user}\" or UserId has \"{user}\"\n", "'''\n", "print('Getting data...', end=' ')\n", "o365_query = user_activity_query.format(start=o365_query_times_user.start, \n", " end=o365_query_times_user.end,\n", " user=user_select.value)\n", "%kql -query o365_query\n", "user_activity_df = _kql_raw_result_.to_dataframe()\n", "print('done.')\n", "user_activity_df" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:10.366715Z", "start_time": "2020-06-27T00:06:08.663607Z" }, "gather": { "logged": 1617832462626 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Operation Breakdown for User" ], "metadata": {} }, { "cell_type": "code", "source": [ "my_df = (user_activity_df[['OfficeId', 'RecordType', 'TimeGenerated', 'Operation',\n", " 'ResultStatus', 'UserId', 'ClientIP','UserAgent']]\n", " .groupby(['Operation', 'ResultStatus', 'ClientIP'])\n", " .aggregate({'OfficeId': 'count'})\n", " .rename(columns={'OfficeId': 'OperationCount', 'ClientIP': 'IPCount'})\n", " .reset_index())\n", "sns.catplot(x='OperationCount', y=\"Operation\", hue=\"ClientIP\", jitter=False, data=my_df, aspect=2.5);\n" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:15.221357Z", "start_time": "2020-06-27T00:06:14.703434Z" }, "gather": { "logged": 1617832491132 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### IP Count for Different User Operations " ], "metadata": {} }, { "cell_type": "code", "source": [ "my_df2 = (user_activity_df[['OfficeId', 'RecordType', 'TimeGenerated', 'Operation',\n", " 'ResultStatus', 'UserId', 'ClientIP','UserAgent']]\n", " .groupby(['Operation'])\n", " .aggregate({'OfficeId': 'count', 'ClientIP': 'nunique'})\n", " .rename(columns={'OfficeId': 'OperationCount', 'ClientIP': 'IPCount'})\n", " .reset_index())\n", "sns.barplot(x='IPCount', y=\"Operation\", data=my_df2);" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:18.926471Z", "start_time": "2020-06-27T00:06:18.687117Z" }, "gather": { "logged": 1617832500181 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Activity Timeline" ], "metadata": {} }, { "cell_type": "code", "source": [ "num_ops = user_activity_df[\"Operation\"].nunique()\n", "nbdisplay.display_timeline(data=user_activity_df,\n", " title='Office Operations',\n", " source_columns=['OfficeWorkload', 'Operation', 'ClientIP', 'ResultStatus'],\n", " group_by=\"Operation\",\n", " height=25 * num_ops)" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:23.188010Z", "start_time": "2020-06-27T00:06:22.814167Z" }, "gather": { "logged": 1617832505342 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### User IP GeoMap" ], "metadata": {} }, { "cell_type": "code", "source": [ "def get_row_ip_loc(row):\n", " try:\n", " _, ip_entity = iplocation.lookup_ip(ip_address=row.ClientIP)\n", " return ip_entity\n", " except ValueError:\n", " return None\n", " \n", "from msticpy.nbtools.foliummap import FoliumMap\n", "folium_map = FoliumMap(zoom_start=3)\n", "off_ip_locs = (user_activity_df[['ClientIP']]\n", " .drop_duplicates()\n", " .apply(get_row_ip_loc, axis=1)\n", " .tolist())\n", "ip_locs = [ip_list[0] for ip_list in off_ip_locs if ip_list]\n", " \n", "display(HTML('

External IP Addresses seen in Office Activity

'))\n", "display(HTML('Numbered circles indicate multiple items - click to expand.'))\n", "\n", "\n", "icon_props = {'color': 'purple'}\n", "folium_map.add_ip_cluster(ip_entities=ip_locs,\n", " **icon_props)\n", "folium_map.center_map()\n", "display(folium_map.folium_map)" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:27.528551Z", "start_time": "2020-06-27T00:06:27.470982Z" }, "gather": { "logged": 1617832511690 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "### Check for User IPs in Azure Network Flow Data\n", "The full data is available in the Dataframe ```az_net_query_byip```" ], "metadata": {} }, { "cell_type": "code", "source": [ "if 'AzureNetworkAnalytics_CL' not in table_index:\n", " md(\"\"\"\n", " \n", "

Warning. Azure network flow data not available.


\n", " This section of the notebook is not useable with the current workspace.\n", " \"\"\"\n", " )" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:33.023802Z", "start_time": "2020-06-27T00:06:33.019298Z" }, "gather": { "logged": 1617832537099 } } }, { "cell_type": "code", "source": [ "if 'AzureNetworkAnalytics_CL' not in table_index:\n", " display(Markdown('

Warning. Azure network flow data not available.


'\n", " 'This section of the notebook is not useable with the current workspace.'))\n", " \n", "# Build the query parameters\n", "all_user_ips = user_activity_df['ClientIP'].tolist()\n", "all_user_ips = [ip for ip in all_user_ips if ip and ip != '']\n", "# Some Office IPs have dest port appended to address\n", "ipv4_ips = [ip.split(\":\")[0] for ip in all_user_ips if \".\" in ip]\n", "ipv6_ips = [ip for ip in all_user_ips if \".\" not in ip]\n", "all_ips = list(set(ipv4_ips + ipv6_ips))\n", "\n", "az_net_comms_df = (\n", " qry_prov\n", " .Network\n", " .list_azure_network_flows_by_ip(start=o365_query_times_user.start,\n", " end=o365_query_times_user.end,\n", " ip_address_list=all_ips)\n", ")\n", "net_default_cols = ['FlowStartTime', 'FlowEndTime', 'VMName', 'VMIPAddress', \n", " 'PublicIPs', 'SrcIP', 'DestIP', 'L4Protocol', 'L7Protocol',\n", " 'DestPort', 'FlowDirection', 'AllowedOutFlows', \n", " 'AllowedInFlows']\n", "\n", "# %kql -query az_net_query_byip\n", "# az_net_comms_df = _kql_raw_result_.to_dataframe()\n", "if az_net_comms_df.empty:\n", " md_warn(\"No network flow data available in AzureNetworkAnalytics_CL table\"\n", " + \"\\nRemainder of cell will not work.\")\n", " raise ValueError(\"No network flow data available in AzureNetworkAnalytics_CL table\")\n", "\n", "import warnings\n", "\n", "with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " \n", " az_net_comms_df['TotalAllowedFlows'] = az_net_comms_df['AllowedOutFlows'] + az_net_comms_df['AllowedInFlows']\n", " sns.catplot(x=\"L7Protocol\", y=\"TotalAllowedFlows\", col=\"FlowDirection\", data=az_net_comms_df)\n", " sns.relplot(x=\"FlowStartTime\", y=\"TotalAllowedFlows\", \n", " col=\"FlowDirection\", kind=\"line\", \n", " hue=\"L7Protocol\", data=az_net_comms_df).set_xticklabels(rotation=50)\n", "\n", "cols = ['VMName', 'VMIPAddress', 'PublicIPs', 'SrcIP', 'DestIP', 'L4Protocol',\n", " 'L7Protocol', 'DestPort', 'FlowDirection', 'AllExtIPs', 'TotalAllowedFlows']\n", "flow_index = az_net_comms_df[cols].copy()\n", "def get_source_ip(row):\n", " if row.FlowDirection == 'O':\n", " return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n", " else:\n", " return row.AllExtIPs if row.AllExtIPs else row.DestIP\n", " \n", "def get_dest_ip(row):\n", " if row.FlowDirection == 'O':\n", " return row.AllExtIPs if row.AllExtIPs else row.DestIP\n", " else:\n", " return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n", "\n", "flow_index['source'] = flow_index.apply(get_source_ip, axis=1)\n", "flow_index['target'] = flow_index.apply(get_dest_ip, axis=1)\n", "flow_index['value'] = flow_index['L7Protocol']\n", "\n", "cm = sns.light_palette(\"green\", as_cmap=True)\n", "with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " display(flow_index[['source', 'target', 'value', 'L7Protocol', \n", " 'FlowDirection', 'TotalAllowedFlows']]\n", " .groupby(['source', 'target', 'value', 'L7Protocol', 'FlowDirection'])\n", " .sum().unstack().style.background_gradient(cmap=cm))\n", "\n", "nbdisp.display_timeline(data=az_net_comms_df.query('AllowedOutFlows > 0'),\n", " overlay_data=az_net_comms_df.query('AllowedInFlows > 0'),\n", " title='Network Flows (out=blue, in=green)',\n", " time_column='FlowStartTime',\n", " source_columns=['FlowType', 'AllExtIPs', 'L7Protocol', 'FlowDirection'],\n", " height=300)" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:06:39.582431Z", "start_time": "2020-06-27T00:06:37.614815Z" } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "## Rare Combinations of Country/UserAgent/Operation Type\n", "The dataframe below lists combinations in the time period that had less than 3 instances. This might help you to spot relatively unusual activity." ], "metadata": {} }, { "cell_type": "code", "source": [ "\n", "from msticpy.sectools.eventcluster import (dbcluster_events, \n", " add_process_features, \n", " char_ord_score,\n", " token_count,\n", " delim_count)\n", "\n", "restrict_cols = ['OfficeId', 'RecordType', 'TimeGenerated', 'Operation',\n", " 'OrganizationId', 'UserType', 'UserKey', 'OfficeWorkload',\n", " 'ResultStatus', 'OfficeObjectId', 'UserId', 'IPAddress','UserAgent']\n", "feature_office_ops = office_ops_df[restrict_cols]\n", "feature_office_ops = ( pd.merge(feature_office_ops, \n", " ip_locs_df, how='left', \n", " left_on='IPAddress', right_on='Address')\n", " .fillna(''))\n", "\n", "# feature_office_ops = office_ops_df.copy()\n", "\n", "feature_office_ops['country_num'] = feature_office_ops.apply(lambda x: char_ord_score(x.CountryCode) if x.CountryCode else 0, axis=1)\n", "feature_office_ops['ua_tokens'] = feature_office_ops.apply(lambda x: char_ord_score(x.UserAgent), axis=1)\n", "feature_office_ops['user_num'] = feature_office_ops.apply(lambda x: char_ord_score(x.UserId), axis=1)\n", "feature_office_ops['op_num'] = feature_office_ops.apply(lambda x: char_ord_score(x.Operation), axis=1)\n", "\n", "# you might need to play around with the max_cluster_distance parameter.\n", "# decreasing this gives more clusters.\n", "(clustered_ops, dbcluster, x_data) = dbcluster_events(data=feature_office_ops,\n", " cluster_columns=['country_num',\n", " 'op_num',\n", " 'ua_tokens'],\n", " time_column='TimeGenerated',\n", " max_cluster_distance=0.0001)\n", "print('Number of input events:', len(feature_office_ops))\n", "print('Number of clustered events:', len(clustered_ops))\n", "display(Markdown('#### Rarest combinations'))\n", "display(clustered_ops[['TimeGenerated', 'RecordType',\n", " 'Operation', 'UserId', 'UserAgent', 'ClusterSize',\n", " 'OfficeObjectId', 'CountryName']]\n", " .query('ClusterSize <= 2')\n", " .sort_values('ClusterSize', ascending=True))\n", "display(Markdown('#### Most common operations'))\n", "display((clustered_ops[['RecordType', 'Operation', 'ClusterSize']]\n", " .sort_values('ClusterSize', ascending=False)\n", " .head(10)))" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-06-27T00:07:07.310221Z", "start_time": "2020-06-27T00:07:07.124636Z" }, "gather": { "logged": 1617840015884 } } }, { "cell_type": "markdown", "source": [ "[Contents](#contents)\n", "# Appendices" ], "metadata": { "hidden": true } }, { "cell_type": "markdown", "source": [ "## Available DataFrames" ], "metadata": {} }, { "cell_type": "code", "source": [ "print('List of current DataFrames in Notebook')\n", "print('-' * 50)\n", "current_vars = list(locals().keys())\n", "for var_name in current_vars:\n", " if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith('_'):\n", " print(var_name)" ], "outputs": [], "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-10-31T23:59:13.369607Z", "start_time": "2019-10-31T23:59:13.342533Z" }, "scrolled": true } }, { "cell_type": "markdown", "source": [ "## Saving Data to Excel\n", "To save the contents of a pandas DataFrame to an Excel spreadsheet\n", "use the following syntax\n", "```\n", "writer = pd.ExcelWriter('myWorksheet.xlsx')\n", "my_data_frame.to_excel(writer,'Sheet1')\n", "writer.save()\n", "```" ], "metadata": { "heading_collapsed": true, "tags": [ "todo" ] } } ], "metadata": { "toc": { "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "230.17px" }, "skip_h1_title": true, "number_sections": false, "title_cell": "Table of Contents", "toc_window_display": true, "base_numbering": 1, "toc_section_display": true, "title_sidebar": "Contents", "toc_cell": true, "nav_menu": { "height": "318.996px", "width": "320.994px" }, "sideBar": true }, "hide_input": false, "microsoft": { "host": { "AzureML": { "notebookHasBeenCompleted": true } } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } }, "kernel_info": { "name": "python38-azureml" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "position": { "height": "406.193px", "left": "1468.4px", "right": "20px", "top": "120px", "width": "456.572px" }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false }, "latex_envs": { "report_style_numbering": false, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "LaTeX_envs_menu_present": true, "autocomplete": true, "autoclose": false, "eqNumInitial": 1, "bibliofile": "biblio.bib", "latex_user_defs": false, "current_citInitial": 1, "eqLabelWithNumbers": true, "labels_anchors": false, "cite_by": "apalike", "user_envs_cfg": false }, "language_info": { "name": "python", "version": "3.8.1", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" }, "kernelspec": { "name": "python38-azureml", "language": "python", "display_name": "Python 3.8 - AzureML" }, "nteract": { "version": "nteract-front-end@1.0.0" } }, "nbformat": 4, "nbformat_minor": 4 }