{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using Orion on Multivariate Input\n",
    "\n",
    "In this notebook, we demonstrate how you can use multivariate time series in Orion. We will walk through the process using NASA's dataset, you can find the original data in [Telemanom](https://github.com/khundman/telemanom) github or directly from their [S3 bucket](https://s3-us-west-2.amazonaws.com/telemanom/data.zip)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Load the data\n",
    "\n",
    "In the first step, we setup the environment and load the CSV that we want to process.\n",
    "\n",
    "To do so, we need to import the `orion.data.load_signal` function and call it passing\n",
    "the path to the CSV file.\n",
    "\n",
    "In this case, we will be loading the `S-1.csv` file from inside the `data/multivariate` folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>timestamp</th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>...</th>\n",
       "      <th>15</th>\n",
       "      <th>16</th>\n",
       "      <th>17</th>\n",
       "      <th>18</th>\n",
       "      <th>19</th>\n",
       "      <th>20</th>\n",
       "      <th>21</th>\n",
       "      <th>22</th>\n",
       "      <th>23</th>\n",
       "      <th>24</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1222819200</td>\n",
       "      <td>-0.366359</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1222840800</td>\n",
       "      <td>-0.394108</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1222862400</td>\n",
       "      <td>0.403625</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1222884000</td>\n",
       "      <td>-0.362759</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1222905600</td>\n",
       "      <td>-0.370746</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 26 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    timestamp         0    1    2    3    4    5    6    7    8  ...   15  \\\n",
       "0  1222819200 -0.366359  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0   \n",
       "1  1222840800 -0.394108  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0   \n",
       "2  1222862400  0.403625  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0   \n",
       "3  1222884000 -0.362759  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0   \n",
       "4  1222905600 -0.370746  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0   \n",
       "\n",
       "    16   17   18   19   20   21   22   23   24  \n",
       "0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "3  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "4  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "\n",
       "[5 rows x 26 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from orion.data import load_signal\n",
    "\n",
    "signal_path = 'multivariate/S-1'\n",
    "\n",
    "data = load_signal(signal_path)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Detect anomalies using Orion\n",
    "\n",
    "Once we have the data, let us try to use the LSTM pipeline to analyze it and search for anomalies.\n",
    "\n",
    "In order to do so, we will import the `Orion` class from `orion.core` and pass it\n",
    "the loaded data and the path to the pipeline JSON that we want to use.\n",
    "\n",
    "In this case, we will be using the `lstm_dynamic_threshold` pipeline from inside the `orion` folder. \n",
    "\n",
    "In addition, we setup the hyperparameters to correctly identify the signal we are trying to predict. In this case, dimension `0` is the signal value and such we set `target_column` to `0`. Note that `0` refers to the location of the channel rather than the name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From /Users/sarah/opt/anaconda3/envs/orion/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "If using Keras pass *_constraint arguments to layers.\n",
      "WARNING:tensorflow:From /Users/sarah/opt/anaconda3/envs/orion/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.\n",
      "\n",
      "Train on 7919 samples, validate on 1980 samples\n",
      "Epoch 1/5\n",
      "7919/7919 [==============================] - 39s 5ms/step - loss: 0.2056 - mse: 0.2056 - val_loss: 0.2614 - val_mse: 0.2614\n",
      "Epoch 2/5\n",
      "7919/7919 [==============================] - 35s 4ms/step - loss: 0.1993 - mse: 0.1993 - val_loss: 0.2581 - val_mse: 0.2581\n",
      "Epoch 3/5\n",
      "7919/7919 [==============================] - 37s 5ms/step - loss: 0.1973 - mse: 0.1973 - val_loss: 0.2637 - val_mse: 0.2637\n",
      "Epoch 4/5\n",
      "7919/7919 [==============================] - 36s 5ms/step - loss: 0.1934 - mse: 0.1934 - val_loss: 0.2594 - val_mse: 0.2594\n",
      "Epoch 5/5\n",
      "7919/7919 [==============================] - 36s 5ms/step - loss: 0.1933 - mse: 0.1933 - val_loss: 0.2808 - val_mse: 0.2808\n",
      "9899/9899 [==============================] - 13s 1ms/step\n"
     ]
    }
   ],
   "source": [
    "from orion import Orion\n",
    "\n",
    "hyperparameters = {\n",
    "    \"mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1\": {\n",
    "        'target_column': 0 \n",
    "    },\n",
    "    'keras.Sequential.LSTMTimeSeriesRegressor#1': {\n",
    "        'epochs': 5,\n",
    "        'verbose': True\n",
    "    }\n",
    "}\n",
    "\n",
    "orion = Orion(\n",
    "    pipeline='lstm_dynamic_threshold',\n",
    "    hyperparameters=hyperparameters\n",
    ")\n",
    "\n",
    "orion.fit(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output will be a ``pandas.DataFrame`` containing a table with the detected anomalies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "9899/9899 [==============================] - 12s 1ms/step\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>start</th>\n",
       "      <th>end</th>\n",
       "      <th>severity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1228219200</td>\n",
       "      <td>1229472000</td>\n",
       "      <td>0.623775</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        start         end  severity\n",
       "0  1228219200  1229472000  0.623775"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "orion.detect(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reconstruction based pipelines, we need to specify the shape of the **input** and **target** sequences. For example, assume we are using the `lstm_autoencoder` pipeline, we set the hyperparameter values \n",
    "\n",
    "```python3\n",
    "hyperparameters = {\n",
    "    \"mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1\": {\n",
    "        'window_size': 100,\n",
    "        'target_column': 0 \n",
    "    },\n",
    "    'keras.Sequential.LSTMSeq2Seq#1': {\n",
    "        'epochs': 5,\n",
    "        'verbose': True,\n",
    "        'window_size': 100,\n",
    "        'input_shape': [100, 25],\n",
    "        'target_shape': [100, 1],\n",
    "    }\n",
    "}\n",
    "```\n",
    "\n",
    "where the shape of the input is dependent on \n",
    "\n",
    "1. `window_size` and \n",
    "2. the number of channels in the data.\n",
    "\n",
    "Similarly, the shape of the output is dependent on the `window_size`. Currently, we are focusing on multivariate input and univariate output, therefore the target shape should always be [`window_size`, 1]."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train on 7999 samples, validate on 2000 samples\n",
      "Epoch 1/5\n",
      "7999/7999 [==============================] - 17s 2ms/step - loss: 0.2017 - mse: 0.2017 - val_loss: 0.2593 - val_mse: 0.2593\n",
      "Epoch 2/5\n",
      "7999/7999 [==============================] - 16s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2604 - val_mse: 0.2604\n",
      "Epoch 3/5\n",
      "7999/7999 [==============================] - 17s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2579 - val_mse: 0.2579\n",
      "Epoch 4/5\n",
      "7999/7999 [==============================] - 16s 2ms/step - loss: 0.1985 - mse: 0.1985 - val_loss: 0.2577 - val_mse: 0.2577\n",
      "Epoch 5/5\n",
      "7999/7999 [==============================] - 17s 2ms/step - loss: 0.1984 - mse: 0.1984 - val_loss: 0.2558 - val_mse: 0.2558\n",
      "9999/9999 [==============================] - 7s 677us/step\n"
     ]
    }
   ],
   "source": [
    "hyperparameters = {\n",
    "    \"mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1\": {\n",
    "        'window_size': 150,\n",
    "        'target_column': 0 \n",
    "    },\n",
    "    'keras.Sequential.LSTMSeq2Seq#1': {\n",
    "        'epochs': 5,\n",
    "        'verbose': True,\n",
    "        'window_size': 150,\n",
    "        'input_shape': [150, 25],\n",
    "        'target_shape': [150, 1],\n",
    "    }\n",
    "}\n",
    "\n",
    "orion = Orion(\n",
    "    pipeline='lstm_autoencoder',\n",
    "    hyperparameters=hyperparameters\n",
    ")\n",
    "\n",
    "orion.fit(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TadGAN is also a reconstruction based pipeline, thus we specify the `input_shape` to be of multivariate shape as needed.\n",
    "\n",
    "```python3\n",
    "hyperparameters = {\n",
    "    'orion.primitives.tadgan.TadGAN#1': {\n",
    "        'epochs': 5,\n",
    "        'verbose': True,\n",
    "        'input_shape': [100, 25]\n",
    "    }\n",
    "}\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}