In [1]:
import pandas as pd

from orion.data import load_signal

# 1. Data

In [2]:
signal_name = 'S-1'

data = load_signal(signal_name)

data.head()

| | timestamp | value |
|---|---|---|
| 0 | 1222819200 | -0.366359 |
| 1 | 1222840800 | -0.394108 |
| 2 | 1222862400 | 0.403625 |
| 3 | 1222884000 | -0.362759 |
| 4 | 1222905600 | -0.370746 |


# 2. Pipeline

In [3]:
from mlblocks import MLPipeline

pipeline_name = 'matrixprofile'

pipeline = MLPipeline(pipeline_name)

## step by step execution

An MLPipeline is composed of a sequence of primitives. Each primitive applies a transformation or calculation to the data and updates the variables held within the pipeline. To view the primitives used by the pipeline, we access its `primitives` attribute.

The `matrixprofile` pipeline contains 8 primitives. We will observe how the `context` (the variables held within the pipeline) is updated after the execution of each primitive.
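To make the idea of a shared context concrete, here is a toy model in plain Python. It is not how MLBlocks is implemented; `run_steps`, `scale` and `total` are hypothetical names invented for this illustration. Each step reads the variables it needs from a dictionary and returns the variables it wants to add or overwrite, which loosely mirrors running one primitive at a time.

```python
def run_steps(steps, context, start=0, stop=None):
    """Run steps[start..stop] (inclusive) and return the updated context."""
    stop = len(steps) - 1 if stop is None else stop
    context = dict(context)  # copy so the caller's dict is left untouched
    for step in steps[start:stop + 1]:
        context.update(step(context))  # each step returns new/changed variables
    return context


def scale(ctx):
    # hypothetical step: overwrites X
    return {'X': [v * 2 for v in ctx['X']]}


def total(ctx):
    # hypothetical step: adds a new variable y
    return {'y': sum(ctx['X'])}


steps = [scale, total]

# run only the first step, then resume from the second one
ctx = run_steps(steps, {'X': [1, 2, 3]}, start=0, stop=0)
after_first = dict(ctx)
ctx = run_steps(steps, ctx, start=1, stop=1)
print(after_first.keys(), ctx.keys())
```

After the first call only `X` is present; the second call adds `y`, much like the keys of `context` grow as the pipeline advances.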

In [4]:
pipeline.primitives

['mlstars.custom.timeseries_preprocessing.time_segments_aggregate',
 'sklearn.impute.SimpleImputer',
 'sklearn.preprocessing.MinMaxScaler',
 'numpy.reshape',
 'stumpy.stump',
 'orion.primitives.timeseries_preprocessing.slice_array_by_dims',
 'numpy.reshape',
 'orion.primitives.timeseries_anomalies.find_anomalies']

### time segments aggregate
This primitive creates an equi-spaced time series by aggregating values over a fixed, specified interval.

* **input**: `X` which is an n-dimensional sequence of values.
* **output**:
    - `X` sequence of aggregated values, one column for each aggregation method.
    - `index` sequence of index values (first index of each aggregated segment).
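The aggregation described above can be sketched in plain Python. This is a simplified illustration, not the primitive's code: `aggregate_segments` is a hypothetical helper, the aggregation method is assumed to be the mean, and empty segments are assumed to become NaN.

```python
import math


def aggregate_segments(timestamps, values, interval):
    """Average values over consecutive windows of `interval` time units."""
    start, end = min(timestamps), max(timestamps)
    aggregated, index = [], []
    while start <= end:
        # collect every value whose timestamp falls in [start, start + interval)
        segment = [v for t, v in zip(timestamps, values)
                   if start <= t < start + interval]
        aggregated.append(sum(segment) / len(segment) if segment else math.nan)
        index.append(start)  # first timestamp of the segment
        start += interval
    return aggregated, index


timestamps = [0, 10, 20, 70]
values = [1.0, 3.0, 5.0, 7.0]
X, index = aggregate_segments(timestamps, values, interval=20)
print(index)
print(X)
```

The segment starting at 40 contains no observations, so it is left as NaN for a later imputation step to fill. In the `S-1` sample shown earlier the timestamps are already 21600 seconds apart, which is why the aggregated values match the raw ones.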

In [5]:
context = pipeline.fit(data, output_=0)
context.keys()

dict_keys(['X', 'index'])

In [6]:
for i, x in list(zip(context['index'], context['X']))[:5]:
    print("entry at {} has value {}".format(i, x))

entry at 1222819200 has value [-0.36635895]
entry at 1222840800 has value [-0.39410778]
entry at 1222862400 has value [0.4036246]
entry at 1222884000 has value [-0.36275906]
entry at 1222905600 has value [-0.37074649]


### SimpleImputer
This primitive is an imputation transformer that fills in missing values.
* **input**: `X` which is an n-dimensional sequence of values.
* **output**: `X` which is a transformed version of X.
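As a rough picture of what imputation does, the sketch below replaces NaN entries with the mean of the observed values. Mean is scikit-learn's default strategy for `SimpleImputer`; whether this pipeline overrides it is not shown here, so treat that as an assumption. `impute_mean` is a hypothetical helper for a single column.

```python
import math


def impute_mean(values):
    """Replace NaN entries with the mean of the non-missing entries."""
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]


filled = impute_mean([1.0, math.nan, 3.0])
print(filled)
```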

In [7]:
step = 1

context = pipeline.fit(**context, output_=step, start_=step)
context.keys()

dict_keys(['index', 'X'])

### MinMaxScaler
This primitive transforms features by scaling each feature to a given range.
* **input**: `X` the data used to compute the per-feature minimum and maximum used for later scaling along the features axis.
* **output**: `X` which is a transformed version of X.

In [8]:
step = 2

context = pipeline.fit(**context, output_=step, start_=step)
context.keys()

dict_keys(['index', 'X'])

In [9]:
# after scaling the data to [0, 1]
# the values differ from those printed after step 0,
# because the raw signal was not already in this range

for i, x in list(zip(context['index'], context['X']))[:5]:
    print("entry at {} has value {}".format(i, x))

entry at 1222819200 has value [0.31682053]
entry at 1222840800 has value [0.30294611]
entry at 1222862400 has value [0.7018123]
entry at 1222884000 has value [0.31862047]
entry at 1222905600 has value [0.31462675]
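The printed values can be reproduced by hand with the usual min-max formula. The sketch below assumes the signal's minimum is -1 and its maximum is 1, and that the target range is [0, 1]; these numbers are inferred from the outputs above, not read from the pipeline configuration. `min_max_scale` is a hypothetical helper.

```python
def min_max_scale(values, data_min, data_max, feature_range=(0.0, 1.0)):
    """Linearly map [data_min, data_max] onto feature_range."""
    low, high = feature_range
    scale = (high - low) / (data_max - data_min)
    return [low + (v - data_min) * scale for v in values]


# values printed after the time_segments_aggregate step
raw = [-0.36635895, -0.39410778, 0.4036246, -0.36275906, -0.37074649]

# assumed extremes of the full signal (inferred, see the note above)
scaled = min_max_scale(raw, data_min=-1.0, data_max=1.0)
for value in scaled:
    print(round(value, 6))
```

Under these assumptions each value `x` becomes `(x + 1) / 2`, which agrees with the five scaled entries shown above to within rounding.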


### reshape

This primitive flattens the array.
* **input**: `X` n-dimensional values.
* **output**: `X` which is a flat version of X.

In [10]:
step = 3

context = pipeline.fit(**context, output_=step, start_=step)
context.keys()

dict_keys(['index', 'X'])

In [11]:
context['X'].shape

(10149,)

### stump

This primitive computes the matrix profile of `X`.
* **input**: `X` n-dimensional values.
* **output**: `y` which is the matrix profile of X.

In [12]:
step = 4

context = pipeline.fit(**context, output_=step, start_=step)
context.keys()

dict_keys(['index', 'X', 'y'])

In [13]:
context['y'].shape

(10050, 4)
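The shape above follows from how a matrix profile is built: a series of length n yields n - m + 1 sliding windows of length m, so 10149 points and 10050 rows are consistent with a window of 100. The four columns returned by `stumpy.stump` hold the profile distance, the nearest-neighbor index, and the left and right neighbor indices. The brute-force sketch below illustrates the concept only. It is far slower than STUMPY and is not its algorithm; `naive_matrix_profile` is a hypothetical helper, and the exclusion zone of `ceil(m / 4)` is an assumption made for this example.

```python
import math


def znorm(window):
    """Z-normalize one window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def naive_matrix_profile(series, m):
    """Return one (distance, neighbor_index) pair per window of length m."""
    n_windows = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n_windows)]
    exclusion = math.ceil(m / 4)  # assumed zone that skips trivial self-matches
    profile = []
    for i in range(n_windows):
        best_dist, best_idx = math.inf, -1
        for j in range(n_windows):
            if abs(i - j) <= exclusion:
                continue
            dist = math.dist(windows[i], windows[j])
            if dist < best_dist:
                best_dist, best_idx = dist, j
        profile.append((best_dist, best_idx))
    return profile


# a repeating pattern with a single spike injected at position 13
series = [0.0, 1.0, 0.0, -1.0] * 6
series[13] = 5.0

profile = naive_matrix_profile(series, m=4)
distances = [dist for dist, _ in profile]
most_unusual = distances.index(max(distances))
print(len(profile), most_unusual)
```

Windows that repeat elsewhere in the series get a distance of zero, while windows overlapping the spike have no close match. This is why a large matrix profile value is a natural anomaly signal.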

### slice array by dims

This primitive extracts the distance to the nearest neighbor from the matrix profile.
* **input**: `y` n-dimensional values.
* **output**: `y` which is the distance array in y.

In [14]:
step = 5

context = pipeline.fit(**context, output_=step, start_=step)
context.keys()

dict_keys(['index', 'X', 'y'])

In [15]:
context['y'].shape

(10050, 1)

### reshape

This primitive flattens the array.
* **input**: `y` n-dimensional values.
* **output**: `errors` which is a flat version of y.
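Taken together, the slice and reshape steps turn a four-column matrix profile into a flat list of distances. The sketch below shows the shape changes with nested lists and made-up numbers, assuming the distance sits in the first column; with NumPy the same effect could be obtained with `y[:, :1]` followed by `reshape(-1)`.

```python
# made-up matrix profile rows: distance, neighbor, left neighbor, right neighbor
profile = [
    [1.5, 7, -1, 7],
    [0.2, 5, 0, 5],
    [3.1, 2, 1, -1],
]

# slice: keep only the first column, still two-dimensional (3 rows, 1 column)
sliced = [row[0:1] for row in profile]

# reshape: flatten to one dimension (3 values)
errors = [value for row in sliced for value in row]

print(sliced)
print(errors)
```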

In [16]:
step = 6

context = pipeline.fit(**context, output_=step, start_=step)
context.keys()

dict_keys(['index', 'X', 'y', 'errors'])

### find anomalies

This primitive extracts anomalies from sequences of errors, following the approach explained in the [related paper](https://arxiv.org/pdf/1802.04431.pdf).

* **input**: 
    - `errors` array of errors.
    - `index` array of indices of errors.
* **output**: `y` array containing the start index, end index, and score of each anomalous sequence that was found.
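To show what the output format looks like, here is a deliberately simplified stand-in: a fixed threshold of mean plus k standard deviations, with contiguous points above it merged into `(start, end, score)` triples. This is not the dynamic thresholding procedure from the paper or the primitive's implementation; `threshold_anomalies` and its parameter `k` are hypothetical, and the score is assumed to be the largest error in the interval.

```python
import statistics


def threshold_anomalies(errors, index, k=2.0):
    """Group consecutive errors above mean + k * std into intervals."""
    threshold = statistics.fmean(errors) + k * statistics.pstdev(errors)
    intervals, start = [], None
    for pos, err in enumerate(errors):
        if err > threshold and start is None:
            start = pos  # an anomalous run begins here
        elif err <= threshold and start is not None:
            # the run ended at the previous position
            intervals.append((index[start], index[pos - 1],
                              max(errors[start:pos])))
            start = None
    if start is not None:  # the run reaches the end of the series
        intervals.append((index[start], index[-1], max(errors[start:])))
    return intervals


errors = [0.1] * 20
errors[8], errors[9] = 5.0, 4.0
index = [1000 + 10 * i for i in range(20)]

found = threshold_anomalies(errors, index)
print(found)
```

The two adjacent high errors are reported as a single interval, expressed in the units of `index` rather than as array positions.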

In [17]:
step = 7

context = pipeline.fit(**context, output_=step, start_=step)
context.keys()

dict_keys(['index', 'errors', 'X', 'y'])

In [18]:
pd.DataFrame(context['y'], columns=['start', 'end', 'severity'])

| | start | end | severity |
|---|---|---|---|
| 0 | 1310386000.0 | 1312826000.0 | 0.198253 |
| 1 | 1398125000.0 | 1401408000.0 | 1.728175 |
