{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "5d4ce4b8",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"from orion.data import load_signal"
]
},
{
"cell_type": "markdown",
"id": "ec5c7e8c",
"metadata": {},
"source": [
"# 1. Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "da695e98",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" timestamp | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1222819200 | \n",
" -0.366359 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1222840800 | \n",
" -0.394108 | \n",
"
\n",
" \n",
" | 2 | \n",
" 1222862400 | \n",
" 0.403625 | \n",
"
\n",
" \n",
" | 3 | \n",
" 1222884000 | \n",
" -0.362759 | \n",
"
\n",
" \n",
" | 4 | \n",
" 1222905600 | \n",
" -0.370746 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" timestamp value\n",
"0 1222819200 -0.366359\n",
"1 1222840800 -0.394108\n",
"2 1222862400 0.403625\n",
"3 1222884000 -0.362759\n",
"4 1222905600 -0.370746"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"signal_name = 'S-1'\n",
"\n",
"data = load_signal(signal_name)\n",
"\n",
"data.head()"
]
},
{
"cell_type": "markdown",
"id": "96767bae",
"metadata": {},
"source": [
"# 2. Pipeline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "fa524c4c",
"metadata": {},
"outputs": [],
"source": [
"from mlblocks import MLPipeline\n",
"\n",
"pipeline_name = 'lstm_dynamic_threshold'\n",
"\n",
"pipeline = MLPipeline(pipeline_name)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5b889d9f",
"metadata": {},
"outputs": [],
"source": [
"hyperparameters = {\n",
" 'keras.Sequential.LSTMTimeSeriesRegressor#1': {\n",
" 'epochs': 5,\n",
" 'verbose': True\n",
" }\n",
"}\n",
"\n",
"pipeline.set_hyperparameters(hyperparameters)"
]
},
{
"cell_type": "markdown",
"id": "0f6eddcc",
"metadata": {},
"source": [
"## step by step execution\n",
"\n",
"MLPipelines are compose of a squence of primitives, these primitives apply tranformation and calculation operations to the data and updates the variables within the pipeline. To view the primitives used by the pipeline, we access its `primtivies` attribute. \n",
"\n",
"The `lstm_dynamic_threshold` contains 7 primitives. we will observe how the `context` (which are the variables held within the pipeline) are updated after the execution of each primitive."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ea1faf86",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['mlstars.custom.timeseries_preprocessing.time_segments_aggregate',\n",
" 'sklearn.impute.SimpleImputer',\n",
" 'sklearn.preprocessing.MinMaxScaler',\n",
" 'mlstars.custom.timeseries_preprocessing.rolling_window_sequences',\n",
" 'keras.Sequential.LSTMTimeSeriesRegressor',\n",
" 'orion.primitives.timeseries_errors.regression_errors',\n",
" 'orion.primitives.timeseries_anomalies.find_anomalies']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pipeline.primitives"
]
},
{
"cell_type": "markdown",
"id": "554f5daa",
"metadata": {},
"source": [
"### time segments aggregate\n",
"this primitive creates an equi-spaced time series by aggregating values over fixed specified interval.\n",
"\n",
"* **input**: `X` which is an n-dimensional sequence of values.\n",
"* **output**:\n",
" - `X` sequence of aggregated values, one column for each aggregation method.\n",
" - `index` sequence of index values (first index of each aggregated segment)."
]
},
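{
"cell_type": "markdown",
"id": "7f2a1c30",
"metadata": {},
"source": [
"to build intuition, here is a standalone sketch of the aggregation idea with made-up data (independent of the pipeline; the interval and the mean aggregation are assumptions): values are grouped into fixed-size time segments and each segment is reduced with an aggregation method."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f1a2b4c",
"metadata": {},
"outputs": [],
"source": [
"# standalone sketch with made-up data; interval and aggregation are assumptions\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'timestamp': [0, 100, 21600, 21700, 43200],\n",
"    'value': [1.0, 3.0, 5.0, 7.0, 9.0],\n",
"})\n",
"\n",
"interval = 21600  # 6 hours, the spacing seen in the S-1 signal above\n",
"segment = (toy['timestamp'] // interval) * interval\n",
"agg = toy.groupby(segment)['value'].mean()\n",
"\n",
"toy_index = agg.index.to_numpy()       # first index of each segment\n",
"toy_X = agg.to_numpy().reshape(-1, 1)  # one column per aggregation method\n",
"print(toy_index, toy_X.ravel())        # [    0 21600 43200] [2. 6. 9.]"
]
},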
{
"cell_type": "code",
"execution_count": 6,
"id": "3b2c89f5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['X', 'index'])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"context = pipeline.fit(data, output_=0)\n",
"context.keys()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e69c4925",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"entry at 1222819200 has value [-0.36635895]\n",
"entry at 1222840800 has value [-0.39410778]\n",
"entry at 1222862400 has value [0.4036246]\n",
"entry at 1222884000 has value [-0.36275906]\n",
"entry at 1222905600 has value [-0.37074649]\n"
]
}
],
"source": [
"for i, x in list(zip(context['index'], context['X']))[:5]:\n",
" print(\"entry at {} has value {}\".format(i, x))"
]
},
{
"cell_type": "markdown",
"id": "00be9392",
"metadata": {},
"source": [
"### SimpleImputer\n",
"this primitive is an imputation transformer for filling missing values.\n",
"* **input**: `X` which is an n-dimensional sequence of values.\n",
"* **output**: `X` which is a transformed version of X."
]
},
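{
"cell_type": "markdown",
"id": "8b3c2d41",
"metadata": {},
"source": [
"as a quick standalone illustration (made-up data, independent of the pipeline), the imputer's default strategy replaces missing values with the column mean:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c8d9e0f",
"metadata": {},
"outputs": [],
"source": [
"# standalone illustration with made-up data (not part of the pipeline)\n",
"import numpy as np\n",
"from sklearn.impute import SimpleImputer\n",
"\n",
"toy_X = np.array([[1.0], [np.nan], [3.0]])\n",
"imputer = SimpleImputer()  # default strategy='mean'\n",
"print(imputer.fit_transform(toy_X).ravel())  # [1. 2. 3.]"
]
},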
{
"cell_type": "code",
"execution_count": 8,
"id": "9658b615",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['index', 'X'])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"step = 1\n",
"\n",
"context = pipeline.fit(**context, output_=step, start_=step)\n",
"context.keys()"
]
},
{
"cell_type": "markdown",
"id": "86a369ed",
"metadata": {},
"source": [
"### MinMaxScaler\n",
"this primitive transforms features by scaling each feature to a given range.\n",
"* **input**: `X` the data used to compute the per-feature minimum and maximum used for later scaling along the features axis.\n",
"* **output**: `X` which is a transformed version of X."
]
},
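{
"cell_type": "markdown",
"id": "9c4d3e52",
"metadata": {},
"source": [
"as a standalone illustration (made-up data), scaling to the range [-1, 1] used by this pipeline maps the feature minimum to -1 and the maximum to 1:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a3b4c5d",
"metadata": {},
"outputs": [],
"source": [
"# standalone illustration with made-up data (not part of the pipeline)\n",
"import numpy as np\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"\n",
"toy_X = np.array([[0.0], [5.0], [10.0]])\n",
"scaler = MinMaxScaler(feature_range=(-1, 1))\n",
"print(scaler.fit_transform(toy_X).ravel())  # [-1.  0.  1.]"
]
},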
{
"cell_type": "code",
"execution_count": 9,
"id": "d130e641",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['index', 'X'])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"step = 2\n",
"\n",
"context = pipeline.fit(**context, output_=step, start_=step)\n",
"context.keys()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "5d15a6e8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"entry at 1222819200 has value [-0.36635895]\n",
"entry at 1222840800 has value [-0.39410778]\n",
"entry at 1222862400 has value [0.4036246]\n",
"entry at 1222884000 has value [-0.36275906]\n",
"entry at 1222905600 has value [-0.37074649]\n"
]
}
],
"source": [
"# after scaling the data between [-1, 1]\n",
"# in this example, no change is observed\n",
"# since the data was pre-handedly scaled\n",
"\n",
"for i, x in list(zip(context['index'], context['X']))[:5]:\n",
" print(\"entry at {} has value {}\".format(i, x))"
]
},
{
"cell_type": "markdown",
"id": "5b9eeac4",
"metadata": {},
"source": [
"### rolling window sequence\n",
"this primitive generates many sub-sequences of the original sequence. it uses a rolling window approach to create the sub-sequences out of time series data.\n",
"\n",
"* **input**: \n",
" - `X` n-dimensional sequence to iterate over.\n",
" - `index` array containing the index values of X.\n",
"* **output**:\n",
" - `X` input sequences.\n",
" - `y` target sequences.\n",
" - `index` first index value of each input sequence.\n",
" - `target_index` first index value of each target sequence."
]
},
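{
"cell_type": "markdown",
"id": "0d5e4f63",
"metadata": {},
"source": [
"the windowing itself can be sketched in plain numpy on a toy sequence (the window and target sizes here are made up for illustration, not the pipeline's defaults):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e7f8a9b",
"metadata": {},
"outputs": [],
"source": [
"# toy sketch of rolling window slicing; sizes are made-up for illustration\n",
"import numpy as np\n",
"\n",
"toy_seq = np.arange(6, dtype=float)  # [0, 1, 2, 3, 4, 5]\n",
"window_size, target_size = 3, 1\n",
"\n",
"toy_X, toy_y = [], []\n",
"for start in range(len(toy_seq) - window_size - target_size + 1):\n",
"    toy_X.append(toy_seq[start:start + window_size])\n",
"    toy_y.append(toy_seq[start + window_size:start + window_size + target_size])\n",
"\n",
"toy_X, toy_y = np.array(toy_X), np.array(toy_y)\n",
"print(toy_X.shape, toy_y.shape)  # (3, 3) (3, 1)"
]
},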
{
"cell_type": "code",
"execution_count": 11,
"id": "32759182",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['index', 'X', 'y', 'target_index'])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"step = 3\n",
"\n",
"context = pipeline.fit(**context, output_=step, start_=step)\n",
"context.keys()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2500967d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X shape = (9899, 250, 1)\n",
"y shape = (9899, 1)\n",
"index shape = (9899,)\n",
"target index shape = (9899,)\n"
]
}
],
"source": [
"# after slicing X into multiple sub-sequences\n",
"# we obtain a 3 dimensional matrix X where\n",
"# the shape indicates (# slices, window size, 1)\n",
"# and similarly y is (# slices, target size)\n",
"\n",
"print(\"X shape = {}\\ny shape = {}\\nindex shape = {}\\ntarget index shape = {}\".format(\n",
" context['X'].shape, context['y'].shape, context['index'].shape, context['target_index'].shape))"
]
},
{
"cell_type": "markdown",
"id": "92a03719",
"metadata": {},
"source": [
"### LSTMTimeSeriesRegressor\n",
"this is a prediction model with double stacked LSTM layers used as a time series regressor. you can read more about it in the [related paper](https://arxiv.org/pdf/1802.04431.pdf).\n",
"\n",
"* **input**: \n",
" - `X` n-dimensional array containing the input sequences for the model.\n",
" - `y` n-dimensional array containing the target sequences for the model.\n",
"* **output**: `y_hat` predicted values."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "062598c1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:From /Users/sarahalnegheimish/opt/anaconda3/envs/orion/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n",
"Instructions for updating:\n",
"If using Keras pass *_constraint arguments to layers.\n",
"WARNING:tensorflow:From /Users/sarahalnegheimish/opt/anaconda3/envs/orion/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.\n",
"\n",
"Train on 7919 samples, validate on 1980 samples\n",
"Epoch 1/5\n",
"7919/7919 [==============================] - 29s 4ms/step - loss: 0.1960 - mse: 0.1960 - val_loss: 0.2748 - val_mse: 0.2748\n",
"Epoch 2/5\n",
"7919/7919 [==============================] - 31s 4ms/step - loss: 0.1928 - mse: 0.1928 - val_loss: 0.3016 - val_mse: 0.3016\n",
"Epoch 3/5\n",
"7919/7919 [==============================] - 32s 4ms/step - loss: 0.1898 - mse: 0.1898 - val_loss: 0.3237 - val_mse: 0.3237\n",
"Epoch 4/5\n",
"7919/7919 [==============================] - 29s 4ms/step - loss: 0.1886 - mse: 0.1886 - val_loss: 0.2640 - val_mse: 0.2640\n",
"Epoch 5/5\n",
"7919/7919 [==============================] - 29s 4ms/step - loss: 0.1863 - mse: 0.1863 - val_loss: 0.4433 - val_mse: 0.4433\n",
"9899/9899 [==============================] - 11s 1ms/step\n"
]
},
{
"data": {
"text/plain": [
"dict_keys(['index', 'target_index', 'X', 'y', 'y_hat'])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"step = 4\n",
"\n",
"context = pipeline.fit(**context, output_=step, start_=step)\n",
"context.keys()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "a9007107",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"entry at 1228219200 has value [-0.3741225], predicted value [0.15370035]\n",
"entry at 1228240800 has value [1.], predicted value [0.18855423]\n",
"entry at 1228262400 has value [-0.35400432], predicted value [0.13846633]\n",
"entry at 1228284000 has value [1.], predicted value [0.11550236]\n",
"entry at 1228305600 has value [-0.38154089], predicted value [0.00080755]\n"
]
}
],
"source": [
"for i, y, y_hat in list(zip(context['target_index'], context['y'], context['y_hat']))[:5]:\n",
" print(\"entry at {} has value {}, predicted value {}\".format(i, y, y_hat))"
]
},
{
"cell_type": "markdown",
"id": "adeadad7",
"metadata": {},
"source": [
"### regression errors\n",
"\n",
"this primitive computes an array of absolute errors comparing predictions and expected output. Optionally smooth them using EWMA.\n",
"\n",
"* **input**: \n",
" - `y` ground truth.\n",
" - `y_hat` predicted values.\n",
"* **output**: `errors` array of errors."
]
},
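{
"cell_type": "markdown",
"id": "1e6f5a74",
"metadata": {},
"source": [
"the error computation can be sketched directly with made-up values (the smoothing span below is an assumption, not the primitive's default):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d5e6f70",
"metadata": {},
"outputs": [],
"source": [
"# standalone sketch: absolute errors, optionally smoothed with an EWMA\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"toy_y = np.array([1.0, 2.0, 3.0, 4.0])\n",
"toy_y_hat = np.array([1.5, 2.0, 2.0, 5.0])\n",
"\n",
"toy_errors = np.abs(toy_y - toy_y_hat)\n",
"toy_smoothed = pd.Series(toy_errors).ewm(span=2).mean().to_numpy()\n",
"print(toy_errors)  # [0.5 0.  1.  1. ]"
]
},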
{
"cell_type": "code",
"execution_count": 15,
"id": "56173e5e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['index', 'target_index', 'y_hat', 'X', 'y', 'errors'])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"step = 5\n",
"\n",
"context = pipeline.fit(**context, output_=step, start_=step)\n",
"context.keys()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "125b1fc9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"entry at 1228219200 has error value 0.528\n",
"entry at 1228240800 has error value 0.671\n",
"entry at 1228262400 has error value 0.610\n",
"entry at 1228284000 has error value 0.681\n",
"entry at 1228305600 has error value 0.619\n"
]
}
],
"source": [
"for i, e in list(zip(context['target_index'], context['errors']))[:5]:\n",
" print(\"entry at {} has error value {:.3f}\".format(i, e))"
]
},
{
"cell_type": "markdown",
"id": "a15a1d20",
"metadata": {},
"source": [
"### find anomalies\n",
"this primitive extracts anomalies from sequences of errors following the approach explained in the [related paper](https://arxiv.org/pdf/1802.04431.pdf).\n",
"\n",
"* **input**: \n",
" - `errors` array of errors.\n",
" - `target_index` array of indices of errors.\n",
"* **output**: `y` array containing start-index, end-index, score for each anomalous sequence that was found."
]
},
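{
"cell_type": "markdown",
"id": "2f7a6b85",
"metadata": {},
"source": [
"a greatly simplified sketch of the idea (using an arbitrary fixed threshold of one standard deviation above the mean, instead of the paper's dynamic thresholding) flags high errors and groups consecutive flagged indices into intervals:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b2c3d4e",
"metadata": {},
"outputs": [],
"source": [
"# simplified sketch; the real primitive uses dynamic (not fixed) thresholds\n",
"import numpy as np\n",
"\n",
"toy_errors = np.array([0.1, 0.1, 0.9, 1.0, 0.1, 0.1])\n",
"toy_index = np.array([0, 100, 200, 300, 400, 500])\n",
"\n",
"threshold = toy_errors.mean() + toy_errors.std()  # arbitrary fixed threshold\n",
"flagged = np.where(toy_errors > threshold)[0]\n",
"\n",
"intervals = []\n",
"if len(flagged):\n",
"    start = prev = flagged[0]\n",
"    for i in flagged[1:]:\n",
"        if i != prev + 1:\n",
"            intervals.append((int(toy_index[start]), int(toy_index[prev])))\n",
"            start = i\n",
"        prev = i\n",
"    intervals.append((int(toy_index[start]), int(toy_index[prev])))\n",
"print(intervals)  # [(200, 300)]"
]
},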
{
"cell_type": "code",
"execution_count": 17,
"id": "ac7ad495",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['index', 'target_index', 'y_hat', 'errors', 'X', 'y'])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"step = 6\n",
"\n",
"context = pipeline.fit(**context, output_=step, start_=step)\n",
"context.keys()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "1aebdbc1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" start | \n",
" end | \n",
" severity | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1.228219e+09 | \n",
" 1.229472e+09 | \n",
" 0.614295 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1.400134e+09 | \n",
" 1.404086e+09 | \n",
" 0.282328 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" start end severity\n",
"0 1.228219e+09 1.229472e+09 0.614295\n",
"1 1.400134e+09 1.404086e+09 0.282328"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(context['y'], columns=['start', 'end', 'severity'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}