{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Guided Hunting: Investigating Malicious Links shared in Teams\n",
    "\n",
    "<details>\n",
    "<summary>&nbsp;<u>Details...</u></summary>\n",
    "    **Notebook Version:** 1.0<br>\n",
    "    **Python Version:** Python 3.8 (including Python 3.8 - AzureML)<br>\n",
    "    **Required MSTICPy Version**: >=1.8.0<br>\n",
    "    **Data Sources Required**:\n",
    "        - Log Analytics/Microsoft Sentinel - DeviceEvents, CommonSecurityLog, OfficeActivity.\n",
    "        - At least on TI provider that can handle URLs\n",
    "</details>\n",
    "\n",
    "This notebook shows you how you can use the power of Microsoft 365 Defender, Microsoft Sentinel, & the Microsoft Graph in order to find and investigate malicious links shared with users via Microsoft Teams.\n",
    "\n",
    "Pre-requisites:\n",
    " - Microsoft Defender for Endpoint data ingested into Microsoft Sentinel\n",
    " - An Azure AD app registered with permissions to access Teams APIs (https://docs.microsoft.com/azure/active-directory/develop/microsoft-graph-intro)\n",
    "     - Note: the app needs to be configured with Delegate permissions to the Teams APIs of the Microsoft Graph \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup Cells"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You may need to manually install msticpy with\n",
    "# %pip install msticpy[azsentinel]\n",
    "\n",
    "import msticpy\n",
    "\n",
    "msticpy.init_notebook(\n",
    "    namespace=globals(),\n",
    "    verbosity=0,\n",
    ");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sent_provider = QueryProvider(\"MicrosoftSentinel\")\n",
    "sent_provider.connect()\n",
    "graph_provider = QueryProvider(\"SecurityGraph\", delegated_auth=True)\n",
    "graph_provider.connect()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check for SmartScreen Events\n",
    "Investigating all links shared via Teams is going to provide a very large set of data. We can start our investigation from a more limited set of data but looking for events were SmartScreen was triggered after a user opened a link from Teams. To do that we need to look at Microsoft Defender for Endpoint data for SmartScreen events where the opening process was teams.\n",
    "\n",
    "Note: You could also use the [new UrlClickEvent dataset](https://techcommunity.microsoft.com/t5/microsoft-defender-for-office/introducing-the-urlclickevents-table-in-advanced-hunting-with/ba-p/3295096) in order to look for links shared via Teams. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get smart screen events triggered by Teams:\n",
    "query = \"\"\"DeviceEvents\n",
    "| where TimeGenerated > ago(30d)\n",
    "| where ActionType == \"SmartScreenUrlWarning\"\n",
    "| join (DeviceEvents | where ActionType == \"BrowserLaunchedToOpenUrl\" | extend OpeningProcess = InitiatingProcessFileName) on DeviceId, RemoteUrl\n",
    "| extend TeamsUser = InitiatingProcessAccountUpn1\n",
    "| where OpeningProcess =~ \"teams.exe\"\n",
    "| project-reorder DeviceName, RemoteUrl, OpeningProcess, TeamsUser\"\"\"\n",
    "\n",
    "smartscreen_df = sent_provider.exec_query(query)\n",
    "smartscreen_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get Teams Membership\n",
    "\n",
    "Once we have a these events we can search scope our investigation by focussing on links shared in the Teams Channels that the users associated with these SmartScreen events are members of."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Get Teams a User is part of by querying the Graph.\n",
    "users = smartscreen_df[\"TeamsUser\"].unique()\n",
    "team_membership = []\n",
    "for user in users:\n",
    "    team_membership_df = graph_provider.exec_query(f\"/users/{user}/joinedTeams\")\n",
    "    teams_ids = [team[\"id\"] for team in team_membership_df[\"value\"].iloc[0]]\n",
    "    teams_names = [team[\"id\"] for team in team_membership_df[\"value\"].iloc[0]]\n",
    "    teams = pd.DataFrame({\"ID\": teams_ids, \"Name\": teams_names})\n",
    "    team_membership.append(teams)\n",
    "\n",
    "teams_df = pd.concat(team_membership)\n",
    "md(\"Teams to investigate:\")\n",
    "display(teams_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get Messages with URLs in\n",
    "\n",
    "Now that we have a set of Teams to investigate we can use OfficeActivity logs to find all the messages that have a URL in them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get links in those teams\n",
    "msgs_query = f\"\"\"\n",
    "    let teams = dynamic({list(teams['ID'].unique())});\n",
    "    OfficeActivity\n",
    "    | where TimeGenerated > ago(30d)\n",
    "    | where OfficeWorkload =~ \"MicrosoftTeams\"\n",
    "    | where Operation in (\"MessageCreatedHasLink\", \"MessageUpdatedHasLink\")\n",
    "    | where AADGroupId in (teams)\n",
    "    | project MessageId, AADGroupId, ChannelGuid\"\"\"\n",
    "\n",
    "msgs_df = sent_provider.exec_query(msgs_query)\n",
    "md(\"Messages containing URLs from these Teams:\")\n",
    "display(msgs_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get Message Content\n",
    "The OfficeActivity logs don't contain details of the messages themselves, just a message ID. To get the message content, and the include URLs we need to query the Microsoft Graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "graph_provider.api_ver = \"beta\"\n",
    "link_messages = []\n",
    "for item in msgs_df.iterrows():\n",
    "    df = graph_provider.exec_query(\n",
    "        f\"/teams/{item[1]['AADGroupId']}/channels/{item[1]['ChannelGuid']}/messages/{item[1]['MessageId']}\"\n",
    "    )\n",
    "    link_messages.append(df)\n",
    "\n",
    "links_df = pd.concat(link_messages)\n",
    "md(f\"{len(links_df.index)} messages found:\")\n",
    "display(links_df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Find URLs in Messages and check Threat Intelligence\n",
    "\n",
    "Its likely at this stage in the investigation we still have large number of messages to investigate. For the next stage of our investigation we are going to use MSTICPy's IoC extraction capabilities to find the URLs in the messages and then look them up in our Threat Intelligence data to see if any are known to be suspicious."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Lookup context to add to results\n",
    "def perform_lookups(row):\n",
    "    channel_details = graph_provider.exec_query(\n",
    "        f\"/teams/{row['channelIdentity.teamId']}/channels/{row['channelIdentity.channelId']}\"\n",
    "    )\n",
    "    channel_name = channel_details[\"displayName\"].iloc[0]\n",
    "    teams_details = graph_provider.exec_query(\n",
    "        f\"/teams/{row['channelIdentity.teamId']}/\"\n",
    "    )\n",
    "    teams_name = teams_details[\"displayName\"].iloc[0]\n",
    "    user = graph_provider.exec_query(f\"/users/{row['from.user.id']}/\")\n",
    "    user_name = user[\"userPrincipalName\"].iloc[0]\n",
    "    return pd.Series(\n",
    "        {\n",
    "            \"UserPrincipalName\": user_name,\n",
    "            \"ChannelName\": channel_name,\n",
    "            \"TeamName\": teams_name,\n",
    "        }\n",
    "    )\n",
    "\n",
    "\n",
    "ioc_matches = links_df.mp_ioc.extract(columns=[\"body.content\"], ioc_types=[\"url\"])\n",
    "ioc_matches[\"SourceIndex\"] = pd.to_numeric(ioc_matches[\"SourceIndex\"])\n",
    "ioc_matches[ioc_matches[\"IoCType\"] == \"url\"]\n",
    "merged_ioc_df = pd.merge(\n",
    "    left=links_df,\n",
    "    right=ioc_matches[ioc_matches[\"IoCType\"] == \"url\"],\n",
    "    how=\"right\",\n",
    "    left_index=True,\n",
    "    right_on=\"SourceIndex\",\n",
    ")\n",
    "ti = TILookup()\n",
    "ti_hits = ti.lookup_iocs(merged_ioc_df[\"Observable\"], providers=[\"XForce\", \"OTX\"])\n",
    "obs = ti_hits[ti_hits[\"Severity\"].isin([\"high\", \"warning\"])][\"Ioc\"].unique()\n",
    "merged_ioc_df[\"risky\"] = np.where(merged_ioc_df[\"Observable\"].isin(obs), True, False)\n",
    "merged_ioc_df[\"SentByUserId\"] = merged_ioc_df[\"from.user.id\"]\n",
    "merged_ioc_df[\"SentByUserName\"] = merged_ioc_df[\"from.user.displayName\"]\n",
    "merged_ioc_df[[\"SentBy\", \"PostedToChannel\", \"PostedToTeam\"]] = merged_ioc_df.apply(\n",
    "    perform_lookups, axis=1\n",
    ")\n",
    "md(f\"{len(merged_ioc_df.index)} messages with URLs present in Threat Intelligence:\")\n",
    "display(\n",
    "    merged_ioc_df[\n",
    "        [\n",
    "            \"createdDateTime\",\n",
    "            \"lastModifiedDateTime\",\n",
    "            \"SentBy\",\n",
    "            \"body.content\",\n",
    "            \"Observable\",\n",
    "            \"risky\",\n",
    "            \"PostedToTeam\",\n",
    "            \"PostedToChannel\",\n",
    "        ]\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the cells above we loook at links shared via Teams channels, however messages can also be shared in individual chats as well. <br>\n",
    "Looking at every chat message is infeasible but we can focus on chat messages sent by users who have shared malicious links in channels. <br>\n",
    "To do this we must first identify all of the chats those users are in, then get all the messages in those chats, and then look at any of those messages that have potential malicious links in them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Lookup context to add to results for chats\n",
    "def perform_chat_lookups(user):\n",
    "    user = graph_provider.exec_query(f\"/users/{user}/\")\n",
    "    user_name = user[\"userPrincipalName\"].iloc[0]\n",
    "    return user_name\n",
    "\n",
    "\n",
    "# For each user in the above results, get thier chat messages\n",
    "sending_users = merged_ioc_df[\"SentBy\"].unique()\n",
    "all_messages = []\n",
    "for user in sending_users:\n",
    "    chats_df = graph_provider.exec_query(f\"/users/{user}/chats\")\n",
    "    chats = [message[\"id\"] for message in chats_df[\"value\"].iloc[0]]\n",
    "    for chat in chats:\n",
    "        messages = graph_provider.exec_query(f\"/users/{user}/chats/{chat}/messages\")\n",
    "        all_messages.append(messages)\n",
    "chat_messages_df = pd.concat(all_messages)\n",
    "# We can now parse out the message details and extract any URLs in them.\n",
    "chat_messages_df = (\n",
    "    chat_messages_df[\"value\"]\n",
    "    .apply(pd.Series)\n",
    "    .merge(chat_messages_df, right_index=True, left_index=True)\n",
    "    .drop([\"value\"], axis=1)\n",
    "    .melt(id_vars=[\"@odata.context\", \"@odata.nextLink\"], value_name=\"value\")\n",
    "    .drop(\"variable\", axis=1)\n",
    "    .dropna()\n",
    ")\n",
    "chat_messages_df = pd.json_normalize(chat_messages_df[\"value\"])\n",
    "chat_messages_df.dropna(subset=[\"body.content\"], inplace=True)\n",
    "chat_messages_df[\"content\"] = chat_messages_df[\"body.content\"].astype(str)\n",
    "chat_ioc_matches = chat_messages_df.mp_ioc.extract(\n",
    "    columns=[\"content\"], ioc_types=[\"url\"]\n",
    ")\n",
    "chat_ioc_matches[\"SourceIndex\"] = pd.to_numeric(chat_ioc_matches[\"SourceIndex\"])\n",
    "chat_ioc_matches[chat_ioc_matches[\"IoCType\"] == \"url\"]\n",
    "merged_chat_ioc_df = pd.merge(\n",
    "    left=chat_messages_df,\n",
    "    right=chat_ioc_matches[chat_ioc_matches[\"IoCType\"] == \"url\"],\n",
    "    how=\"right\",\n",
    "    left_index=True,\n",
    "    right_on=\"SourceIndex\",\n",
    ")\n",
    "\n",
    "# We can now look up those URLs in TI and filter on these items\n",
    "chat_ti_hits = ti.lookup_iocs(\n",
    "    merged_chat_ioc_df[\"Observable\"], providers=[\"XForce\", \"OTX\"]\n",
    ")\n",
    "chat_obs = chat_ti_hits[chat_ti_hits[\"Severity\"].isin([\"high\", \"warning\"])][\n",
    "    \"Ioc\"\n",
    "].unique()\n",
    "merged_chat_ioc_df[\"risky\"] = np.where(\n",
    "    merged_chat_ioc_df[\"Observable\"].isin(chat_obs), True, False\n",
    ")\n",
    "merged_chat_ioc_df[merged_chat_ioc_df[\"risky\"] == True]\n",
    "merged_chat_ioc_df[\"SentByUserId\"] = merged_chat_ioc_df[\"from.user.id\"]\n",
    "merged_chat_ioc_df[\"SentByUserName\"] = merged_chat_ioc_df[\"from.user.displayName\"]\n",
    "merged_chat_ioc_df[\"PostedToChannel\"] = merged_chat_ioc_df[\"chatId\"]\n",
    "merged_chat_ioc_df[\"PostedToTeam\"] = \"NaN\"\n",
    "merged_chat_ioc_df[\"SentBy\"] = merged_chat_ioc_df[\"from.user.id\"].apply(\n",
    "    perform_chat_lookups\n",
    ")\n",
    "all_iocs_df = pd.concat([merged_chat_ioc_df, merged_ioc_df])\n",
    "md(\n",
    "    f\"{len(merged_chat_ioc_df.index)} chat messages with URLs present in Threat Intelligence:\"\n",
    ")\n",
    "display(\n",
    "    merged_chat_ioc_df[\n",
    "        [\n",
    "            \"createdDateTime\",\n",
    "            \"lastModifiedDateTime\",\n",
    "            \"SentBy\",\n",
    "            \"body.content\",\n",
    "            \"Observable\",\n",
    "            \"risky\",\n",
    "            \"PostedToChannel\",\n",
    "        ]\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "We can now summarize our results to see who has been posting malicious URLs and where to."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "md(\n",
    "    f\"{len(all_iocs_df['Observable'].unique())} malicious URL(s) sent by {len(all_iocs_df['SentBy'].unique())} user(s) in {len(all_iocs_df['id'].unique())} message(s), across {len(all_iocs_df['PostedToChannel'].astype(str).unique())} channel(s)\"\n",
    ")\n",
    "\n",
    "display(\"Malicious URLs sent:\")\n",
    "display(all_iocs_df.groupby([\"SentBy\", \"Observable\"]).agg({\"PostedToChannel\": list}))\n",
    "\n",
    "risky_users = all_iocs_df[all_iocs_df[\"risky\"] == True][\"SentBy\"].unique()\n",
    "risky_messages = all_iocs_df[all_iocs_df[\"SentBy\"].isin(risky_users)]\n",
    "md(\"Risky Messages:\", \"bold\")\n",
    "display(risky_messages[\"body.content\"])\n",
    "md(\"Channels with risky messages:\", \"bold\")\n",
    "display(risky_messages[\"channelIdentity.channelId\"].unique())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a graph of users, urls, messages, channels to show the connections between them\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import networkx as nx\n",
    "\n",
    "all_iocs_df.mp_plot.timeline(\n",
    "    source_columns=[\"Observable\", \"SentBy\", \"PostedToChannel\"],\n",
    "    time_column=\"createdDateTime\",\n",
    "    group_by=\"risky\",\n",
    "    title=\"Timeline of message posts\",\n",
    ")\n",
    "\n",
    "md(\"Graph of events:\", \"bold\")\n",
    "G = nx.from_pandas_edgelist(\n",
    "    all_iocs_df, \"SentBy\", \"Observable\", edge_attr=[\"PostedToChannel\", \"body.content\"]\n",
    ")\n",
    "for row in all_iocs_df[[\"Observable\", \"PostedToChannel\", \"body.content\"]].iterrows():\n",
    "    G.add_edge(row[1][\"Observable\"], str(row[1][\"PostedToChannel\"]))\n",
    "fig = plt.figure(1, figsize=(20, 20))\n",
    "nx.draw(G, with_labels=True, font_size=12)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further Investigation\n",
    "\n",
    "We can now expand our search to look for other hosts that may have visited these URLs to further expand the investigation scope. By using CommonSecurityLogs from Sentinel with MDE's DeviceNetworkInformation we can identify the hosts making these connections."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hosts that have visited these URLs\n",
    "dns_matches = ioc_extractor.extract_df(\n",
    "    all_iocs_df, columns=[\"Observable\"], ioc_types=[\"dns\"]\n",
    ")\n",
    "dns_matches[\"Observable\"].unique()\n",
    "query = f\"\"\"\n",
    "    let urls = dynamic({all_iocs_df[\"Observable\"].unique().tolist()});\n",
    "    CommonSecurityLog\n",
    "    | where TimeGenerated > ago(7d)\n",
    "    | where RequestURL in (urls)\n",
    "    | extend timekey = bin(TimeGenerated, 1h)\n",
    "    | join kind=inner (DeviceNetworkInfo\n",
    "    | where TimeGenerated > ago(7d)\n",
    "    | mv-expand IPAddresses\n",
    "    | extend device_ip = tostring(IPAddresses.IPAddress)\n",
    "    | extend timekey = bin(TimeGenerated, 1h)) on $left.SourceIP == $right.device_ip, timekey\n",
    "    | project-reorder DeviceName1, timekey\n",
    "    | summarize max(timekey) by DeviceName1\"\"\"\n",
    "\n",
    "connection_events = sent_provider.exec_query(query)\n",
    "display(connection_events)\n",
    "\n",
    "# See if URLS were seen elsewhere i.e. Office Events, Alerts and create timeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Alerts with any of the URLs - timeline\n",
    "\n",
    "alerts_query = f\"\"\"let urls = dynamic([{all_iocs_df[\"Observable\"].unique().tolist()}]);\n",
    "    SecurityAlert\n",
    "    | mv-expand todynamic(Entities)\n",
    "    | where tostring(Entities.Type) =~ \"url\"\n",
    "    | evaluate bag_unpack(Entities, \"Entities_\")\n",
    "    | where Entities_Url in (urls)\"\"\"\n",
    "\n",
    "\n",
    "alert_events = sent_provider.exec_query(alerts_query)\n",
    "display(alert_events)\n",
    "\n",
    "if not alert_events.empty:\n",
    "    alert_events.mp_plot.timeline(\n",
    "        source_columns=[\"Entities_Url\"],\n",
    "        time_column=\"TimeGenerated\",\n",
    "        group_by=\"Entities_Url\",\n",
    "        title=\"Timeline of alerts\",\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Entities for further investigation\n",
    "The following are a set of entities worthy of further investigation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "md(\"Users who sent suspicious URLs:\", \"bold\")\n",
    "display(list(risky_users))\n",
    "md(\"Hosts that accessed suspcious Urls:\", \"bold\")\n",
    "display(\n",
    "    list(ss_df[\"DeviceName\"].unique()) + list(connection_events[\"DeviceName1\"].unique())\n",
    ")\n",
    "md(\"Suspicious URLs that were shared:\", \"bold\")\n",
    "display(list(all_iocs_df[\"Observable\"].unique()))"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "739aa94d41434660cad201339437bbf6f0b217d6c8b6dd6d17fce87baec5c88f"
  },
  "kernelspec": {
   "display_name": "Python 3.8 - AzureML",
   "language": "python",
   "name": "python38-azureml"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}