{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Azure Synapse - Configure Azure ML and Azure Synapse Analytics \n",
        "\n",
        "__Notebook Version:__ 1.0<br>\n",
        "__Python Version:__ Python 3.8 - AzureML<br>\n",
        "__Required Packages:__ No<br>\n",
        "__Platforms Supported:__  Azure Machine Learning Notebooks, Spark Version 3.1\n",
        "     \n",
        "__Data Source Required:__ No \n",
        "    \n",
        "### Description\n",
        "This notebook provides step-by-step instructions to set up Azure ML and Azure Synapse Analytics environment for your big data analytics scenarios that leverage Azure Synapse Spark engine. It covers: </br>\n",
        "Configuring your Azure Synapse workspace, \n",
        "creating a new [Azure Synapse Spark pool](https://docs.microsoft.com/azure/synapse-analytics/spark/apache-spark-overview#spark-pool-architecture),\n",
        "configuring your Azure Machine Learning workspace, and creating a new [link service](https://docs.microsoft.com/azure/machine-learning/how-to-link-synapse-ml-workspaces) to link Azure Synapse with Azure Machine Learning workspace.\n",
        "Additionally, the notebook provides the steps to export your data from a Log Analytics workspace to an [Azure Data Lake Storage gen2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction) that you can use for big data analytics.<br>\n",
        "*** Python modules download may be needed. ***<br>\n",
        "*** Please run the cells sequentially to avoid errors.  Please do not use \"run all cells\". *** <br>\n",
        "\n",
        "## Table of Contents\n",
        "1. Warm-up\n",
        "2. Authentication to Azure Resources\n",
        "3. Configure Azure Synapse Workspace\n",
        "4. Configure Azure Synapse Spark Pool\n",
        "5. Configure Azure ML Workspace and Linked Services\n",
        "6. Export Data from Azure Log Analytics to Azure Data Lake Storage Gen2\n",
        "7. Bonus"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 1. Warm-up"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Load Python libraries that will be used in this notebook\n",
        "from azure.mgmt.resource import ResourceManagementClient\n",
        "from azure.loganalytics.models import QueryBody\n",
        "from azure.mgmt.loganalytics import LogAnalyticsManagementClient\n",
        "from azure.loganalytics import LogAnalyticsDataClient\n",
        "from azureml.core import Workspace, LinkedService, SynapseWorkspaceLinkedServiceConfiguration, Datastore\n",
        "from azureml.core.compute import SynapseCompute, ComputeTarget\n",
        "from azure.identity import AzureCliCredential\n",
        "\n",
        "import time\n",
        "import json\n",
        "import os\n",
        "import pandas as pd\n",
        "import ipywidgets\n",
        "from IPython.display import display, HTML, Markdown\n",
        "from urllib.parse import urlparse"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "gather": {
          "logged": 1633641554118
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Functions will be used in this notebook\n",
        "def read_config_values(file_path):\n",
        "    \"This loads pre-generated parameters for Microsoft Sentinel Workspace\"\n",
        "    with open(file_path) as json_file:\n",
        "        if json_file:\n",
        "            json_config = json.load(json_file)\n",
        "            return (json_config[\"tenant_id\"],\n",
        "                    json_config[\"subscription_id\"],\n",
        "                    json_config[\"resource_group\"],\n",
        "                    json_config[\"workspace_id\"],\n",
        "                    json_config[\"workspace_name\"],\n",
        "                    json_config[\"user_alias\"],\n",
        "                    json_config[\"user_object_id\"])\n",
        "    return None\n",
        "\n",
        "def has_valid_token():\n",
        "    \"Check to see if there is a valid AAD token\"\n",
        "    try:\n",
        "        error = \"ERROR: Please run 'az login' to setup account.\"\n",
        "        expired = \"ERROR: AADSTS70043: The refresh token has expired or is invalid\"\n",
        "        validator = !az account get-access-token\n",
        "        \n",
        "        if any(expired in item for item in validator.get_list()):\n",
        "            return '**The refresh token has expired. <br> Please continue your login process. Then: <br> 1. If you plan to run multiple notebooks on the same compute instance today, you may restart the compute instance by clicking \"Compute\" on left menu, then select the instance, clicking \"Restart\"; <br> 2. Otherwise, you may just restart the kernel from top menu. <br> Finally, close and re-load the notebook, then re-run cells one by one from the top.**'\n",
        "        elif any(error in item for item in validator.get_list()):\n",
        "            return \"Please run 'az login' to setup account\"\n",
        "        else:\n",
        "            return None\n",
        "    except:\n",
        "        return \"Please login\"\n",
        "\n",
        "def convert_slist_to_dataframe(text, grep_text, grep_field_inx, remove_head, remove_tail):\n",
        "    try:\n",
        "        \"This function converts IPython.utils.text.SList to Pandas.dataFrame\"\n",
        "        grep_result = text.grep(grep_text,field=grep_field_inx)\n",
        "        df = pd.DataFrame(data=grep_result)\n",
        "        df[grep_field_inx] = df[grep_field_inx].str[remove_head:].str[:remove_tail]\n",
        "    except:\n",
        "        df = pd.DataFrame()\n",
        "    finally:\n",
        "        return df\n",
        "\n",
        "def process_la_result(result):\n",
        "    \"This function processes data returned from Azure LogAnalyticsDataClient, it returns pandas DataFrame.\"\n",
        "    json_result = result.as_dict()\n",
        "    cols = pd.json_normalize(json_result['tables'][0], 'columns')\n",
        "    final_result = pd.json_normalize(json_result['tables'][0], 'rows')\n",
        "    if final_result.shape[0] != 0:\n",
        "        final_result.columns = cols.name\n",
        "    return final_result\n",
        "\n",
        "def set_continuation_flag(flag):\n",
        "    \"Set continuation flag message\"\n",
        "    if flag == False:\n",
        "        print(\"continuation flag is false.\")\n",
        "    return flag\n",
        "\n",
        "def validate_input(regex, text):\n",
        "    \"User input validation\"\n",
        "    import re \n",
        "    pattern = re.compile(regex, re.I)\n",
        "    if text == None:\n",
        "        print(\"No Input found.\")\n",
        "        return False;\n",
        "    elif not re.fullmatch(pattern, text):\n",
        "        print(\"Input validation failed.\")\n",
        "        return False;\n",
        "    else:\n",
        "        return True;\n"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "gather": {
          "logged": 1633641556419
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Calling the above function to populate Microsoft Sentinel workspace parameters\n",
        "# The file, config.json, was generated by the system, however, you may modify the values, or manually set the variables\n",
        "tenant_id, subscription_id, resource_group, workspace_id, workspace_name, user_alias, user_object_id = read_config_values('config.json');\n",
        "print(\"Current Microsoft Sentinel Workspace: \" + workspace_name)"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "gather": {
          "logged": 1633641559682
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 2. Authentication to Azure Resources"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Azure CLI is used to get device code to login into Azure, you need to copy the code and open the DeviceLogin site.\n",
        "# You may add [--tenant $tenant_id] to the command\n",
        "if has_valid_token() != None:\n",
        "    message = '**The refresh token has expired. <br> Please continue your login process. Then: <br> 1. If you plan to run multiple notebooks on the same compute instance today, you may restart the compute instance by clicking \"Compute\" on left menu, then select the instance, clicking \"Restart\"; <br> 2. Otherwise, you may just restart the kernel from top menu. <br> Finally, close and re-load the notebook, then re-run cells one by one from the top.**'\n",
        "    display(Markdown(message))\n",
        "    !echo -e '\\e[42m'\n",
        "    !az login --tenant $tenant_id --use-device-code\n",
        "\n",
        "# Initializing Azure Storage and Azure Resource Python clients\n",
        "resource_client = ResourceManagementClient(AzureCliCredential(), subscription_id = subscription_id)\n",
        "\n",
        "# Set continuation_flag\n",
        "if resource_client == None:\n",
        "    continuation_flag = set_continuation_flag(False)\n",
        "else:\n",
        "    continuation_flag = set_continuation_flag(True)\n",
        "    !az account set --subscription $subscription_id\n",
        "    print('Successfully signed in.')\n"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1633641564008
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 3. Configure Azure Synapse Workspace\r\n",
        "In this section, you first select an Azure resource group, then select an Azure Synapse workspace.\r\n"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 1. Select Azure Resource Group for Synapse\n",
        "if continuation_flag:\n",
        "    group_list = resource_client.resource_groups.list()\n",
        "    synapse_group_dropdown = ipywidgets.Dropdown(options=sorted([g.name for g in group_list]), description='Groups:')\n",
        "    display(synapse_group_dropdown)"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "gather": {
          "logged": 1633628547832
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 2. Select an Azure Synapse workspace\n",
        "if continuation_flag and synapse_group_dropdown.value != None:\n",
        "    response = !az synapse workspace list --subscription $subscription_id --resource-group $synapse_group_dropdown.value\n",
        "    if response!= None:\n",
        "        name_list = convert_slist_to_dataframe(response, '\"name', 0, 13, -2)\n",
        "        if len(name_list) > 0:\n",
        "            synapse_workspace_dropdown = ipywidgets.Dropdown(options=name_list[0], description='Synapse WS:')\n",
        "            display(synapse_workspace_dropdown)\n",
        "        else:\n",
        "            print(\"No workspace found, please select one Resource Group with Synapse workspace.\")\n",
        "    else:\n",
        "        continuation_flag = False\n",
        "        print(\"Please create Azure Synapse Analytics Workspace before proceeding to next.\")\n",
        "else:\n",
        "    continuation_flag = False\n",
        "    print(\"Need to have a Azure Resource Group to proceed.\")\n"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "gather": {
          "logged": 1633628557616
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 4. Configure Azure Synapse Spark Pool\r\n",
        "In this section, you will create an Spark pool if you don't have one yet.  \r\n",
        "1. Enter a pool name, the rule for naming: must contain letters or numbers only and no special characters, must be 15 or less characters, must start with a letter, not contain reserved words, and be unique in the workspace.</br>\r\n",
        "2. Create the pool </br>\r\n",
        "3. List Spark pools for the Azure Synapse workspace"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "new_spark_pool_name = input(\"New Spark pool name:\")"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633641333970
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 1. !!PROCEED THIS ONLY WHEN YOU WANT TO: Create an Azure Synapse Spark Pool!!\n",
        "if continuation_flag and validate_input(r\"[A-Za-z0-9]{1,15}\", new_spark_pool_name):\n",
        "    !az synapse spark pool create --name $new_spark_pool_name --subscription $subscription_id \\\n",
        "    --workspace-name $synapse_workspace_dropdown.value \\\n",
        "    --resource-group $synapse_group_dropdown.value \\\n",
        "    --spark-version 3.1 --node-count 3 --node-size Small --debug\n",
        "    print('====== Task completed. ======')\n",
        "elif continuation_flag:\n",
        "    print(\"Please enter a valid Spark pool name.\")"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "gather": {
          "logged": 1633641369157
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Run the cell below to select a Spark pool that you want to use from the Spark pool list."
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 2. List Azure Synapse Spark Pool\n",
        "if continuation_flag and synapse_workspace_dropdown.value != None:\n",
        "    response_pool = !az synapse spark pool list --resource-group $synapse_group_dropdown.value --workspace-name $synapse_workspace_dropdown.value --subscription $subscription_id\n",
        "    if response_pool!= None and len(response_pool.grep(\"ERROR: AADSTS70043\")) == 0:\n",
        "        pool_list = convert_slist_to_dataframe(response_pool, '\"name', 0, 13, -2)\n",
        "        if len(pool_list) > 0:\n",
        "            spark_pool_dropdown = ipywidgets.Dropdown(options=pool_list[0], description='Spark Pools:')\n",
        "            display(spark_pool_dropdown)\n",
        "    else:\n",
        "        print(\"First make sure you have logged into the system.\")\n",
        "else:\n",
        "    continuation_flag = False\n",
        "    print(\"Need to have a Azure Spnapse Workspace to proceed.\")\n"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "gather": {
          "logged": 1633641662510
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 5. Configure Azure ML Workspace and Linked Services\r\n",
        "In this section, you will create a linked service, to link the selected Azure ML workspace to the selected Azure Synapse workspace, you need to be an owner of the selected Synapse workspace to proceed.  You then can attached a Spark pool to the linked service. </br>\r\n",
        "1. Select Azure resource group for Azure ML </br>\r\n",
        "2. Select Azure ML workspace </br>\r\n",
        "3. Get existing linked services for selected Azure ML workspace </br>\r\n",
        "4. Enter a linked service name </br>\r\n",
        "5. Create the linked service </br>\r\n",
        "6. Enter a Synapse compute name </br>\r\n",
        "7. Attach the Spark pool to the linked service </br>"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Select Azure Resource Group for Azure ML\r\n",
        "if continuation_flag:\r\n",
        "    aml_group_list = resource_client.resource_groups.list()\r\n",
        "    aml_group_dropdown = ipywidgets.Dropdown(options=sorted([g.name for g in aml_group_list]), description='Groups:')\r\n",
        "    display(aml_group_dropdown)"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642278196
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Select Azure ML Workspace\n",
        "if continuation_flag and aml_group_dropdown.value != None:\n",
        "    aml_workspace_result = Workspace.list(subscription_id=subscription_id, resource_group=aml_group_dropdown.value)\n",
        "    if aml_workspace_result != None:\n",
        "        aml_workspace_dropdown = ipywidgets.Dropdown(options=sorted(list(aml_workspace_result.keys())), description='AML WS:')\n",
        "        display(aml_workspace_dropdown)\n",
        "else:\n",
        "    continuation_flag = False\n",
        "    print(\"Need to have a Azure Resource Group to proceed.\")"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "gather": {
          "logged": 1633642285758
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Get Linked services for selected AML workspace\r\n",
        "if continuation_flag and aml_workspace_dropdown.value != None:\r\n",
        "    has_linked_service = False\r\n",
        "    aml_workspace = Workspace.get(name=aml_workspace_dropdown.value, subscription_id=subscription_id, resource_group=aml_group_dropdown.value)\r\n",
        "    aml_synapse_linked_service_list = LinkedService.list(aml_workspace)\r\n",
        "    if aml_synapse_linked_service_list != None:\r\n",
        "        for ls_name in [ls.name for ls in aml_synapse_linked_service_list]:\r\n",
        "            display(ls_name)\r\n",
        "            has_linked_service = True\r\n",
        "else:\r\n",
        "    print(\"No linked service\")\r\n",
        "    continuation_flag = False"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642290757
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "** EXECUTE THE FOLLOWING CELL ONLY WHEN YOU WANT TO: Create a new AML - Synapse linked service! ** </br>\r\n",
        "** Owner role of the Synapse workspace is required to create a linked service. **"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "linked_service_name=input('Linked service name:')"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642297847
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# !!PROCEED THIS ONLY WHEN YOU WANT TO: Create new linked service!!\r\n",
        "if continuation_flag and aml_workspace != None and synapse_workspace_dropdown.value != None and linked_service_name != None:\r\n",
        "    # Synapse Link Service Configuration\r\n",
        "    synapse_link_config = SynapseWorkspaceLinkedServiceConfiguration(subscription_id = aml_workspace.subscription_id, resource_group = synapse_group_dropdown.value, name= synapse_workspace_dropdown.value)\r\n",
        "\r\n",
        "    # Link workspaces and register Synapse workspace in Azure Machine Learning\r\n",
        "    linked_service = LinkedService.register(workspace = aml_workspace, name = linked_service_name, linked_service_config = synapse_link_config)\r\n",
        "        "
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1632356190410
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "** EXECUTE THE FOLLOWING CELL ONLY WHEN YOU WANT TO: Attach the selected Spark pool to the newly created linked service! **"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "synapse_compute_name=input('Synapse compute name:')"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642305321
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# !!PROCEED THIS ONLY WHEN YOU WANT TO: Attach the selected Spark pool to the above linked service\r\n",
        "if continuation_flag and aml_workspace != None and synapse_workspace_dropdown.value != None and linked_service != None and spark_pool_dropdown.value != None and synapse_compute_name != None:\r\n",
        "    spark_attach_config = SynapseCompute.attach_configuration(linked_service, type='SynapseSpark', pool_name=spark_pool_dropdown.value)\r\n",
        "    synapse_compute = ComputeTarget.attach(workspace = aml_workspace, name= synapse_compute_name, attach_configuration= spark_attach_config)\r\n",
        "    synapse_compute.wait_for_completion()"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1632356221430
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 6. Export Data from Azure Log Analytics to Azure Data Lake Storage Gen2\r\n",
        "In this section, you can export Microsoft Sentinel data in Log Analytics to a selected ADLS Gen2 storage. </br>\r\n",
        "1. Authenticate to Log Analytics </br>\r\n",
        "2. Select Log Analytics tables, no more than 10 tables.  This step may take a few minutes. </br>\r\n",
        "3. List existing Azure storages accounts in the selected Synapse workspace</br>\r\n",
        "4. Set target ADLS Gen2 storage as the data export destination</br>\r\n",
        "5. List all existing data export rules in the storage account</br>\r\n",
        "6. Enter data export rule name </br>\r\n",
        "7. Create a new data export rule </br>\r\n"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 1. Initialzie Azure LogAnalyticsDataClient, which is used to access Microsoft Sentinel log data in Azure Log Analytics.  \r\n",
        "# You may need to change resource_uri for various cloud environments.\r\n",
        "resource_uri = \"https://api.loganalytics.io\"\r\n",
        "la_client = get_client_from_cli_profile(LogAnalyticsManagementClient, subscription_id = subscription_id)\r\n",
        "creds, _ = get_azure_cli_credentials(resource=resource_uri)\r\n",
        "la_data_client = LogAnalyticsDataClient(creds)"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642693855
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "* In the following step, you may select no more than 10 tables for data export.  This process may take a few minutes, please be patient."
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 2. Get all tables available using Kusto query language.  If you need to know more about KQL, please check out the link provided at the introductory section.\r\n",
        "tables_result = None\r\n",
        "table_list = None\r\n",
        "all_tables_query = \"union withsource = SentinelTableName * | distinct SentinelTableName | sort by SentinelTableName asc\"\r\n",
        "if la_data_client != None:\r\n",
        "    tables_result = la_data_client.query(workspace_id, QueryBody(query=all_tables_query))\r\n",
        "\r\n",
        "if tables_result != None:\r\n",
        "    table_list = process_la_result(tables_result)\r\n",
        "    tables = sorted(table_list.SentinelTableName.tolist())\r\n",
        "    table_dropdown = ipywidgets.SelectMultiple(options=tables, row = 5, description='Tables:')\r\n",
        "    display(table_dropdown)"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642700739
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 3. List AzureBlobFS Storage URL in Synapse linked service\r\n",
        "if continuation_flag and synapse_workspace_dropdown.value != None:\r\n",
        "    synapse_linked_service_response = !az synapse linked-service list --workspace-name $synapse_workspace_dropdown.value\r\n",
        "    sls_list = convert_slist_to_dataframe(synapse_linked_service_response, '\"url', 0, 14, -1)\r\n",
        "    if len(sls_list) > 0:\r\n",
        "            synapse_linked_service_dropdown = ipywidgets.Dropdown(options=sls_list[0], description='ADLS URL:')\r\n",
        "            display(synapse_linked_service_dropdown)\r\n",
        "    else:\r\n",
        "        continuation_flag = False\r\n",
        "        print(\"Please create Azure Synapse linked service for storage before proceeding to next.\")\r\n",
        "else:\r\n",
        "    continuation_flag = False\r\n",
        "    print(\"Need to have a Azure Synapse workspace to proceed.\")\r\n",
        "                          "
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642712137
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 4. Set target ADLS Gen2 storage as data export destination\r\n",
        "if continuation_flag and synapse_linked_service_dropdown.value != None:\r\n",
        "    adls_gen2_name = urlparse(synapse_linked_service_dropdown.value).netloc.split('.')[0]\r\n",
        "    \r\n",
        "if continuation_flag and adls_gen2_name == None:\r\n",
        "    # You may set ADLS Gen2 manually here:\r\n",
        "    adls_gen2_name = \"\"\r\n",
        "\r\n",
        "if continuation_flag and synapse_group_dropdown.value != None and adls_gen2_name != None:\r\n",
        "    adls_resource_id = '/subscriptions/{0}/resourceGroups/{1}/providers/Microsoft.Storage/storageAccounts/{2}'.format(subscription_id, synapse_group_dropdown.value, adls_gen2_name)\r\n",
        "else:\r\n",
        "    continuation_flag = False\r\n",
        "    print(\"Need to have a resource group and an ADLS Gen2 account to continue.\")"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642899785
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 5. List all data export rules\r\n",
        "# Keep in mind that you cannot a destination that is already defined in a rule. Destination (resource id) must be unique across export rules in your workspace!!\r\n",
        "if continuation_flag:\r\n",
        "    export_response = !az monitor log-analytics workspace data-export list --resource-group $resource_group --workspace-name $workspace_name  \r\n",
        "    if export_response != None:\r\n",
        "        export_list = convert_slist_to_dataframe(export_response, '\"resourceId', 0, 19, -2)\r\n",
        "        if len(export_list) > 0:\r\n",
        "            data_export_dropdown = ipywidgets.Dropdown(options=export_list[0], description='Data Exports:')\r\n",
        "            display(data_export_dropdown)\r\n",
        "        else:\r\n",
        "            print(\"No data export rule was found\")\r\n",
        "    else:\r\n",
        "        print(\"No data export rule was found, you may create one in the following step.\")"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633642953956
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "** EXECUTE THE FOLLOWING CELL ONLY WHEN YOU WANT TO: Export data tables from Log Analytics to the selected Azure Data Lake Storage Gen 2! **"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "export_name=input('Export name:')"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633643007863
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 6. !!PROCEED THIS ONLY WHEN YOU WANT TO: Export data from Log Analytics to Azure Data Lake Storage Gen 2\r\n",
        "if continuation_flag and adls_resource_id != None and table_dropdown.value != None and export_name != None:\r\n",
        "    tables = \" \".join(table_dropdown.value)\r\n",
        "    !az monitor log-analytics workspace data-export create --resource-group $resource_group --workspace-name $workspace_name \\\r\n",
        "    --name $export_name --tables $tables --destination $adls_resource_id"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1628200011200
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Bonus\r\n",
        "These are optional steps."
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# List Log Analytics data export rules\r\n",
        "if continuation_flag:\r\n",
        "    export_response = !az monitor log-analytics workspace data-export list --resource-group $resource_group --workspace-name $workspace_name\r\n",
        "    if export_response != None:\r\n",
        "        export_rule_list = convert_slist_to_dataframe(export_response, '\"name', 0, 13, -2)\r\n",
        "        if len(export_rule_list) > 0:\r\n",
        "                export_rule_dropdown = ipywidgets.Dropdown(options=export_rule_list[0], description='Export Rules:')\r\n",
        "                display(export_rule_dropdown)"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1633643058766
        }
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "** EXECUTE THE FOLLOWING CELL ONLY WHEN YOU WANT TO: Delete a data export rule by name! **"
      ],
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# 2-b. Delete a Log Analytics data export rule\r\n",
        "if continuation_flag and export_rule_dropdown.value != None:\r\n",
        "    result = !az monitor log-analytics workspace data-export delete --resource-group $resource_group --workspace-name $workspace_name --name $export_rule_dropdown.value --yes\r\n",
        "    print(result)"
      ],
      "outputs": [],
      "execution_count": null,
      "metadata": {
        "collapsed": true,
        "jupyter": {
          "source_hidden": false,
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        },
        "gather": {
          "logged": 1628198253214
        }
      }
    }
  ],
  "metadata": {
    "kernel_info": {
      "name": "python38-azureml"
    },
    "kernelspec": {
      "name": "python38-azureml",
      "language": "python",
      "display_name": "Python 3.8 - AzureML"
    },
    "language_info": {
      "name": "python",
      "version": "3.8.5",
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "nbconvert_exporter": "python",
      "file_extension": ".py"
    },
    "microsoft": {
      "host": {
        "AzureML": {
          "notebookHasBeenCompleted": true
        }
      }
    },
    "nteract": {
      "version": "nteract-front-end@1.0.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}