{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure Analysis Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a demo notebook showing how to use **azure** pipeline on a signal using the `orion.analysis.analyze` function. For more information about the usage of microsoft's anomaly detection API, view their documentation [here](https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Load the data\n",
    "\n",
    "In the first step, we load the signal that we want to process.\n",
    "\n",
    "To do so, we need to import the `orion.data.load_signal` function and call it passing\n",
    "either the path to the CSV file or the name of the signal to fetch fromm the `s3 bucket`.\n",
    "\n",
    "In this case, we will be loading the `S-1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>timestamp</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1222819200</td>\n",
       "      <td>-0.366359</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1222840800</td>\n",
       "      <td>-0.394108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1222862400</td>\n",
       "      <td>0.403625</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1222884000</td>\n",
       "      <td>-0.362759</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1222905600</td>\n",
       "      <td>-0.370746</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    timestamp     value\n",
       "0  1222819200 -0.366359\n",
       "1  1222840800 -0.394108\n",
       "2  1222862400  0.403625\n",
       "3  1222884000 -0.362759\n",
       "4  1222905600 -0.370746"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from orion.data import load_signal\n",
    "\n",
    "signal_path = 'S-1'\n",
    "\n",
    "data = load_signal(signal_path)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Setup the pipeline\n",
    "\n",
    "To use `azure` pipeline, we first need two important information: `subscription_key` and `endpoint`. In order to obtain them, you must setup an Anomaly Detection resource on Azure portal, follow the steps mentioned [here](https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/quickstarts/client-libraries?pivots=programming-language-python&tabs=linux) to setup your resource instance.\n",
    "\n",
    "Once that's accomplished, update the hyperparameter dictionary specified to the values of your instance. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your subscription key and endpoint\n",
    "\n",
    "subscription_key = None\n",
    "endpoint = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "hyperparameters = {\n",
    "    \"mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate#1\": {\n",
    "        \"interval\": 21600,\n",
    "    },\n",
    "    \"orion.primitives.azure_anomaly_detector.split_sequence#1\": {\n",
    "        \"sequence_size\": 6000,\n",
    "        \"overlap_size\": 2640\n",
    "    },\n",
    "    \"orion.primitives.azure_anomaly_detector.detect_anomalies#1\": {\n",
    "        \"subscription_key\": subscription_key,\n",
    "        \"endpoint\": endpoint,\n",
    "        \"overlap_size\": 2640,\n",
    "        \"interval\": 21600,\n",
    "        \"granularity\": \"hourly\",\n",
    "        \"custom_interval\": 6\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `split_sequence` primitive takes the signal and splits it into multiple signals based on the `sequence_size` and `overlap_size`. Since the method uses a rolling window sequence approach, we use the `overlap_size` to maintain historical information when splitting the sequence.\n",
    "\n",
    "It is custom to set the `overlap_size` as the same value in both `split_sequence` and `detect_anomalies` primitives. In addition, we require the frequency of the signal to be recorded in timestamp interval, as well as convention based where `granularity` refers to the aggregation unit (e.g. hourly, minutely, etc) and `custom_interval` refers to the quantity (in this case, 6 hours)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Detect anomalies using azure pipeline\n",
    "\n",
    "Once we have the data and setup, we use the azure pipeline to analyze it and search for anomalies.\n",
    "\n",
    "In order to do so, we will have import the `orion.analysis.analyze` function and pass it\n",
    "the loaded data and the path to the pipeline JSON that we want to use.\n",
    "\n",
    "In this case, we will be using the `azure.json` pipeline from inside the `orion` folder.\n",
    "\n",
    "The output will be a ``pandas.DataFrame`` containing a table with the detected anomalies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from orion.analysis import analyze\n",
    "\n",
    "pipeline_path = 'azure'\n",
    "\n",
    "if subscription_key and endpoint:\n",
    "    anomalies = analyze(pipeline_path, data, hyperparams=hyperparameters)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}