<style>div.container { width: 100% }</style>
<img style="float:left;  vertical-align:text-bottom;" height="65" width="172" src="../assets/PyViz_logo_wm_line.png" />
<div style="float:right; vertical-align:text-bottom;"><h2>Tutorial 09. Operations and Pipelines</h2></div>

When interactively exploring a dataset you often end up interleaving visualization and analysis code. In HoloViews your visualization and your data are one and the same, so analysis and data transformations can be applied directly to the visualizable data. For that purpose HoloViews provides operations, which can be used to implement any analysis or data transformation you might want to do. Operations take a HoloViews Element and return another Element of either the same type or a new type, depending on the operation. We'll illustrate operations and pipelines using a variety of libraries:

<div style="margin: 10px">
<a href="http://holoviews.org"><img style="margin:8px; display:inline; object-fit:scale-down; max-height:150px" src="../assets/holoviews.png"/></a>
<a href="http://bokeh.pydata.org"><img style="margin:8px; display:inline; object-fit:scale-down; max-height:150px" src="../assets/bokeh.png"/></a>
<a href="http://datashader.org"><img style="margin:8px; display:inline; object-fit:scale-down; max-height:150px" src="../assets/datashader.png"/></a>
<a href="http://ioam.github.io/param"><img style="margin:8px; display:inline; object-fit:scale-down; max-height:150px" src="../assets/param.png"/></a><br><br>
<a href="http://pandas.pydata.org"><img style="margin:8px; display:inline; object-fit:scale-down; max-height:140px" src="../assets/pandas.png"/></a>
<a href="http://matplotlib.org"><img style="margin:8px; display:inline; object-fit:scale-down; max-height:150px" src="../assets/matplotlib_wm.png"/></a>
<a href="http://numpy.org"><img style="margin:8px; display:inline; object-fit:scale-down; max-height:150px" src="../assets/numpy.png"/></a>
</div>

Because operations understand HoloViews objects, you can apply them to large collections of data held in HoloMap and DynamicMap containers. When an operation is applied to a DynamicMap it is evaluated lazily, only when a result is requested. This lets us chain multiple operations into a data analysis, processing, and visualization pipeline, e.g. to drive a dashboard.

Pipelines built using DynamicMap and HoloViews operations are also useful for caching intermediate results and just-in-time computations, because they lazily (re)compute just the part of the pipeline that has changed.
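
The idea of recomputing only what has changed can be illustrated without HoloViews at all. The following is a toy sketch in plain Python, not how HoloViews is implemented: the `LazyStage` class and the `load`, `smooth`, and `peak` functions are hypothetical names invented for this example. Each stage caches its result per key and per parameter setting, so changing a parameter in the middle of the chain reruns that stage and everything downstream of it, but nothing upstream.

```python
class LazyStage:
    """Hypothetical stand-in for one lazily evaluated pipeline step."""

    def __init__(self, func, source=None, **params):
        self.func = func
        self.source = source   # upstream LazyStage, or None for the first step
        self.params = params
        self.calls = 0         # counts how often func actually ran
        self._cache = {}

    def _signature(self):
        # Parameters of this stage and of every stage upstream of it
        upstream = self.source._signature() if self.source is not None else ()
        return upstream + (tuple(sorted(self.params.items())),)

    def __getitem__(self, key):
        cache_key = (key, self._signature())
        if cache_key not in self._cache:
            # Pull the input from upstream only when this stage must recompute
            arg = self.source[key] if self.source is not None else key
            self._cache[cache_key] = self.func(arg, **self.params)
            self.calls += 1
        return self._cache[cache_key]


def load(symbol):
    # Stand-in for an expensive data load
    return [float(len(symbol) + i) for i in range(10)]


def smooth(values, window):
    # Trailing moving average over up to `window` values
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def peak(values):
    return max(values)


prices = LazyStage(load)
smoothed = LazyStage(smooth, prices, window=3)
highest = LazyStage(peak, smoothed)

highest['AAPL']                 # first request runs all three steps
smoothed.params['window'] = 5   # change a parameter mid-pipeline
highest['AAPL']                 # reruns smooth and peak, but not load
```

After the two requests, `prices.calls` is still 1 while `smoothed.calls` and `highest.calls` are both 2, which is the behaviour the paragraph above describes for the part of the pipeline that has changed.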

In [None]:
import time
import param
import pandas as pd
from bokeh.sampledata import stocks
import holoviews as hv
from holoviews import opts

from holoviews.operation.timeseries import rolling, rolling_outlier_std
from holoviews.operation.datashader import datashade, dynspread

hv.extension('bokeh')

opts.defaults(opts.Curve(width=600, framewise=True))

# Declare some data

In this example we'll work with a timeseries that stands in for stock-price data.  We'll define a small function to load the stock data and define a ``DynamicMap`` that will generate a timeseries for each stock symbol:

In [None]:
def load_symbol(symbol, **kwargs):
    df = pd.DataFrame(getattr(stocks, symbol))
    df['date'] = df.date.astype('datetime64[ns]')
    return hv.Curve(df, ('date', 'Date'), ('adj_close', 'Adjusted Close'))

stock_symbols = ['AAPL', 'FB', 'GOOG', 'IBM', 'MSFT']
dmap = hv.DynamicMap(load_symbol, kdims='Symbol').redim.values(Symbol=stock_symbols)


We will start by visualizing this data as-is:

In [None]:
dmap

## Applying an operation

Now let's start applying some operations to this data. HoloViews ships with two ready-to-use timeseries operations: the ``rolling`` operation, which applies a function over a rolling window, and the ``rolling_outlier_std`` operation, which computes outlier points in a timeseries. Specifically, ``rolling_outlier_std`` excludes points that lie fewer than ``sigma`` standard deviations from the rolling mean and returns the remaining points as outliers. This is just one example; you can easily write your own operations that do whatever you like.
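
To make the outlier rule concrete, here is a rough pandas sketch of the kind of computation such an operation performs. It is an illustration under stated assumptions rather than the HoloViews source: the function name `rolling_outliers`, the centred windows, the default values, and the synthetic data are all choices made for this example.

```python
import numpy as np
import pandas as pd


def rolling_outliers(series, window=30, sigma=2.0):
    """Return points more than `sigma` rolling standard deviations from the rolling mean.

    Assumption: centred windows with min_periods=1, chosen for this sketch.
    """
    trend = series.rolling(window, center=True, min_periods=1).mean()
    residual = series - trend
    spread = residual.rolling(window, center=True, min_periods=1).std()
    # Comparisons against NaN are False, so undefined spreads flag nothing
    return series[residual.abs() > sigma * spread]


# Synthetic noise with one obvious spike at position 100
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=200))
values.iloc[100] = 10.0

outliers = rolling_outliers(values, window=30, sigma=2.0)
```

The spike at position 100 ends up in `outliers`. Raising `sigma` makes the rule stricter, so fewer points are returned.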

In [None]:
smoothed = rolling(dmap, rolling_window=30)
outliers = rolling_outlier_std(dmap, rolling_window=30)
smoothed * outliers.opts(color='red')

As you can see, the operations transform the ``Curve`` element into a smoothed version and a set of ``Scatter`` points containing the outliers, both computed with a ``rolling_window`` of 30. Since we applied the operations to a ``DynamicMap``, they are lazy and only compute a result when it is requested.

In [None]:
# Exercise: Apply the rolling and rolling_outlier_std operations changing the rolling_window and sigma parameters

## Linking operations to streams

Instead of supplying the parameter values for each operation explicitly as a scalar value, we can also define a ``Stream`` that will let us update our visualization dynamically. By supplying a ``Stream`` with a ``rolling_window`` parameter to both operations, we can now generate our own events on the stream and watch our visualization update each time.
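
The pattern behind this is that a stream holds named parameter values and pushes them to whatever depends on them each time an event is generated. The following toy sketch in plain Python shows that pattern only; it is not the HoloViews `Stream` class, and `MiniStream`, `moving_average`, and `redraw` are hypothetical names invented for this example.

```python
class MiniStream:
    """Hypothetical toy stream: holds named parameters and notifies subscribers."""

    def __init__(self, **params):
        self.params = dict(params)
        self.subscribers = []

    def event(self, **updates):
        # Only parameters declared up front may be updated
        unknown = set(updates) - set(self.params)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        self.params.update(updates)
        for callback in self.subscribers:
            callback(**self.params)


def moving_average(values, rolling_window):
    # Trailing moving average over up to `rolling_window` values
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - rolling_window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


prices = [1.0, 2.0, 6.0, 4.0, 5.0]
history = []


def redraw(rolling_window):
    # Stand-in for a plot refresh: recompute and record the result
    history.append(moving_average(prices, rolling_window))


stream = MiniStream(rolling_window=1)
stream.subscribers.append(redraw)
stream.event(rolling_window=2)
```

Each call to `stream.event` reruns every subscriber with the current parameter values, which is why one stream can keep several dependent computations in step.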

In [None]:
rolling_stream = hv.streams.Stream.define('rolling', rolling_window=5)
stream = rolling_stream()

rolled_dmap = rolling(dmap, streams=[stream])
outlier_dmap = rolling_outlier_std(dmap, streams=[stream])
rolled_dmap * outlier_dmap

In [None]:
for i in range(20, 200, 20):
    time.sleep(0.2)
    stream.event(rolling_window=i)

In [None]:
# Exercise: Create a stream to control the sigma value and add it to the outlier operation,
#           then vary the sigma value and observe the effect

## Defining operations

Defining custom operations is also straightforward. For instance, let's define an ``Operation`` to compute the residual between two overlaid ``Curve`` Elements. All we need to do is subclass the ``Operation`` base class and define a ``_process`` method, which takes the ``Element`` or ``Overlay`` as input and returns a new ``Element``. The residual operation can then be used to subtract the y-values of the second Curve from those of the first Curve.

In [None]:
from holoviews.operation import Operation

class residual(Operation):
    """
    Subtracts two curves from one another.
    """
    
    label = param.String(default='Residual', doc="""
        Defines the label of the returned Element.""")
    
    def _process(self, element, key=None):
        # Get first and second Element in overlay
        el1, el2 = element.get(0), element.get(1)
        
        # Get x-values and y-values of curves
        xvals  = el1.dimension_values(0)
        yvals1 = el1.dimension_values(1)
        yvals2 = el2.dimension_values(1)
        
        # Return new Element with subtracted y-values
        # and new label
        return el1.clone((xvals, yvals1-yvals2),
                         vdims=[self.p.label])

To see what that looks like in action let's try it out by comparing the smoothed and original Curve.

In [None]:
residual_dmap = residual(rolled_dmap * dmap)
residual_dmap

Since the stream we created is linked to one of the inputs of ``residual_dmap``, changing the stream values triggers updates both in the plot above and in our new residual plot.

In [None]:
for i in range(20, 200, 20):
    time.sleep(0.2)
    stream.event(rolling_window=i)

## Chaining operations

Of course, since operations simply transform an Element in some way, operations can easily be chained. As a simple example, we will take the ``rolled_dmap`` and apply the ``datashade`` and ``dynspread`` operations to it to construct a datashaded version of the plot. As you'll be able to see, this concise specification defines a complex analysis pipeline that gets reapplied whenever you change the Symbol or interact with the plot, i.e. whenever the data needs to be updated.

In [None]:
rolled = dynspread(datashade(rolled_dmap))
overlay = rolled.opts(width=600, height=400, framewise=True) * outlier_dmap
(overlay + residual_dmap).cols(1)

## Visualizing the pipeline

To understand what is going on we will write a small utility that traverses the output we just displayed above and visualizes each processing step leading up to it.

In [None]:
def traverse(obj, key, items=None):
    items = [] if items is None else items
    for inp in obj.callback.inputs[:1]:
        if isinstance(inp.callback, hv.core.OperationCallable):
            label = inp.callback.operation.name
        else:
            label = 'price'
        if inp.last:
            items.append(inp[key].relabel(label))
        if isinstance(inp, hv.DynamicMap):
            traverse(inp, key, items)
    return list(hv.core.util.unique_iterator(items))[:-1]

layout = hv.Layout(traverse(overlay, 'AAPL'))
layout.opts(
    opts.Curve(width=250, height=200),
    opts.RGB(width=250, height=200)).cols(4)

Reading from right to left, the original price timeseries is first smoothed with a rolling window, then datashaded, then each pixel is spread to cover a larger area. As you can see, arbitrarily many standard or custom operations can be chained to capture even very complex workflows, which are then replayed dynamically whenever an interaction requires it.

# Onwards

Next we will look at how we can handle [large datasets](./10_Working_with_Large_Datasets.ipynb).