{
"cells": [
{
"cell_type": "markdown",
"source": [
"## Guided Hunting - Detect potential network beaconing using Apache Spark via Azure Synapse\r\n",
"\r\n",
"__Notebook Version:__ 1.0
\r\n",
"__Python Version:__ Python 3.8 - AzureML
\r\n",
"__Required Packages:__ azureml-synapse, Msticpy, azure-storage-file-datalake
\r\n",
"__Platforms Supported:__ Azure Machine Learning Notebooks connected to Azure Synapse Workspace\r\n",
" \r\n",
"__Data Source Required:__ Yes\r\n",
"\r\n",
"__Data Source:__ CommonSecurityLogs\r\n",
"\r\n",
"__Spark Version:__ 3.1 or above\r\n",
" \r\n",
"### Description\r\n",
"In this sample guided scenario notebook, we will demonstrate how to set up continuous data pipeline to store data into azure data lake storage (ADLS) and \r\n",
"then hunt on that data at scale using distributed processing via Azure Synapse workspace connected to serverless Spark pool. \r\n",
"Once historical dataset is available in ADLS , we can start performing common hunt operations, create a baseline of normal behavior using PySpark API and also apply data transformations \r\n",
"to find anomalous behaviors such as periodic network beaconing as explained in the blog - [Detect Network beaconing via Intra-Request time delta patterns in Microsoft Sentinel - Microsoft Tech Community](https://techcommunity.microsoft.com/t5/azure-sentinel/detect-network-beaconing-via-intra-request-time-delta-patterns/ba-p/779586). \r\n",
"You can use various other spark API to perform other data transformation to understand the data better. \r\n",
"The output generated can also be further enriched to populate Geolocation information and also visualize using Msticpy capabilities to identify any anomalies. \r\n",
".
\r\n",
"*** Python modules download may be needed. ***
\r\n",
"*** Please run the cells sequentially to avoid errors. Please do not use \"run all cells\". ***
\r\n",
"\r\n",
"## Table of Contents\r\n",
"1. Warm-up\r\n",
"2. Authentication to Azure Resources\r\n",
"3. Configure Azure ML and Azure Synapse Analytics\r\n",
"4. Load the Historical and current data\r\n",
"5. Data Wrangling using Spark\r\n",
"6. Enrich the results\r\n",
"7. Conclusion\r\n",
"\r\n"
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
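{
"cell_type": "markdown",
"source": [
"Before building the full pipeline, it helps to see the intra-request time-delta heuristic from the blog post on a toy example. The sketch below is a minimal pandas illustration (the notebook later applies the same idea at scale with PySpark); the column names, sample data, and thresholds here are illustrative assumptions, not the exact CommonSecurityLog schema or the blog's production query.\r\n",
"\r\n",
"```python\r\n",
"import pandas as pd\r\n",
"\r\n",
"# Toy events: one host polls a destination every 60 seconds (beacon-like),\r\n",
"# another has irregular gaps (normal browsing). Columns are illustrative.\r\n",
"events = pd.DataFrame({\r\n",
"    'SourceIP': ['10.0.0.1'] * 5 + ['10.0.0.2'] * 3,\r\n",
"    'DestinationIP': ['203.0.113.7'] * 5 + ['198.51.100.9'] * 3,\r\n",
"    'TimeGenerated': pd.to_datetime([\r\n",
"        '2024-01-01 00:00:00', '2024-01-01 00:01:00', '2024-01-01 00:02:00',\r\n",
"        '2024-01-01 00:03:00', '2024-01-01 00:04:00',\r\n",
"        '2024-01-01 00:00:00', '2024-01-01 00:07:13', '2024-01-01 00:09:41',\r\n",
"    ]),\r\n",
"})\r\n",
"\r\n",
"events = events.sort_values(['SourceIP', 'DestinationIP', 'TimeGenerated'])\r\n",
"# Delta (in seconds) between consecutive requests within each pair\r\n",
"events['TimeDeltaSec'] = (\r\n",
"    events.groupby(['SourceIP', 'DestinationIP'])['TimeGenerated']\r\n",
"    .diff().dt.total_seconds()\r\n",
")\r\n",
"\r\n",
"# Pairs whose deltas barely vary (low jitter) and have enough samples\r\n",
"# are beaconing candidates\r\n",
"stats = (\r\n",
"    events.dropna(subset=['TimeDeltaSec'])\r\n",
"    .groupby(['SourceIP', 'DestinationIP'])['TimeDeltaSec']\r\n",
"    .agg(['mean', 'std', 'count'])\r\n",
")\r\n",
"candidates = stats[(stats['std'].fillna(0) < 5) & (stats['count'] >= 3)]\r\n",
"print(candidates)\r\n",
"```\r\n",
"\r\n",
"Real traffic needs a jitter tolerance rather than an exact-interval check; the blog post linked above discusses thresholds over the time-delta distribution for that.\r\n"
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},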
{
"cell_type": "markdown",
"source": [
"### Warm-up"
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "markdown",
"source": [
"> **Note**: Install below packages only for the first time and restart the kernel once done."
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"# Install AzureML Synapse package to use spark magics\r\n",
"import sys\r\n",
"!{sys.executable} -m pip install azureml-synapse"
],
"outputs": [],
"execution_count": null,
"metadata": {
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"gather": {
"logged": 1632406406186
}
}
},
{
"cell_type": "code",
"source": [
"# Install Azure storage datalake library to manipulate file systems\r\n",
"import sys\r\n",
"!{sys.executable} -m pip install azure-storage-file-datalake --pre"
],
"outputs": [],
"execution_count": null,
"metadata": {
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"# Install Azure storage datalake library to manipulate file systems\r\n",
"import sys\r\n",
"!{sys.executable} -m pip install msticpy"
],
"outputs": [],
"execution_count": null,
"metadata": {
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "markdown",
"source": [
"*** $\\color{red}{Note:~After~installing~the~packages,~please~restart~the~kernel.}$ ***"
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"# Load Python libraries that will be used in this notebook\r\n",
"from azure.common.client_factory import get_client_from_cli_profile\r\n",
"from azure.common.credentials import get_azure_cli_credentials\r\n",
"from azure.mgmt.resource import ResourceManagementClient\r\n",
"from azureml.core import Workspace, LinkedService, SynapseWorkspaceLinkedServiceConfiguration, Datastore\r\n",
"from azureml.core.compute import SynapseCompute, ComputeTarget\r\n",
"from datetime import timedelta, datetime\r\n",
"from azure.storage.filedatalake import DataLakeServiceClient\r\n",
"from azure.core._match_conditions import MatchConditions\r\n",
"from azure.storage.filedatalake._models import ContentSettings\r\n",
"\r\n",
"import json\r\n",
"import os, uuid, sys\r\n",
"import IPython\r\n",
"import pandas as pd\r\n",
"from ipywidgets import widgets, Layout\r\n",
"from IPython.display import display, HTML\r\n",
"from pathlib import Path\r\n",
"\r\n",
"REQ_PYTHON_VER=(3, 6)\r\n",
"REQ_MSTICPY_VER=(1, 4, 4)\r\n",
"\r\n",
"display(HTML(\"
\r\n",
"Warning: If you are storing secrets such as storage account keys in the notebook you should
\r\n",
"probably opt to store either into msticpyconfig file on the compute instance or use
\r\n",
"Read more about using KeyVault\r\n",
"in the MSTICPY docs\r\n",
"
\r\n", " MSTICPy GeoIP Providers\r\n", "
\r\n", "\r\n", "
\r\n", "