<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#msticpy---Event-Timeline" data-toc-modified-id="msticpy---Event-Timeline-1">msticpy - Event Timeline</a></span></li><li><span><a href="#Discrete-Event-Timelines" data-toc-modified-id="Discrete-Event-Timelines-2">Discrete Event Timelines</a></span><ul class="toc-item"><li><span><a href="#Plotting-a-simple-timeline" data-toc-modified-id="Plotting-a-simple-timeline-2.1">Plotting a simple timeline</a></span></li><li><span><a href="#More-Advanced-Timelines" data-toc-modified-id="More-Advanced-Timelines-2.2">More Advanced Timelines</a></span><ul class="toc-item"><li><span><a href="#Grouping-Series-From-a-Single-DataFrame" data-toc-modified-id="Grouping-Series-From-a-Single-DataFrame-2.2.1">Grouping Series From a Single DataFrame</a></span></li></ul></li><li><span><a href="#Displaying-a-reference-line" data-toc-modified-id="Displaying-a-reference-line-2.3">Displaying a reference line</a></span></li><li><span><a href="#Plotting-series-from-different-data-sets" data-toc-modified-id="Plotting-series-from-different-data-sets-2.4">Plotting series from different data sets</a></span></li></ul></li><li><span><a href="#Plotting-Series-with-Scalar-Values" data-toc-modified-id="Plotting-Series-with-Scalar-Values-3">Plotting Series with Scalar Values</a></span><ul class="toc-item"><li><span><a href="#Documentation-for-display_timeline_values" data-toc-modified-id="Documentation-for-display_timeline_values-3.1">Documentation for display_timeline_values</a></span></li></ul></li><li><span><a href="#Exporting-Plots-as-PNGs" data-toc-modified-id="Exporting-Plots-as-PNGs-4">Exporting Plots as PNGs</a></span></li></ul></div>

# msticpy - Event Timeline

This notebook demonstrates the use of the timeline displays built using the [Bokeh library](https://bokeh.pydata.org).

There are two display types:
- Discrete event series - this plots multiple series of events as discrete glyphs
- Event value series - this plots a scalar value of the events using glyphs, bars or a traditional line graph (or some combination of these).

In [1]:
# Imports
import sys
import warnings

from msticpy.common.utility import check_py_version
MIN_REQ_PYTHON = (3,6)
check_py_version(MIN_REQ_PYTHON)

from IPython import get_ipython
from IPython.display import display, HTML, Markdown
import ipywidgets as widgets

import pandas as pd
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 100)

from msticpy.nbtools import *
from msticpy.sectools import *

WIDGET_DEFAULTS = {'layout': widgets.Layout(width='95%'),
                   'style': {'description_width': 'initial'}}


# Discrete Event Timelines

## Plotting a simple timeline
`nbdisplay.display_timeline`
```
Display a timeline of events.

Parameters
----------
data : Union[dict, pd.DataFrame]
    Either
    dict of data sets to plot on the timeline with the following structure.

    Key: str
        Name of data set to be displayed in legend
    Value: dict
        containing

        data: pd.DataFrame
            Data to plot
        time_column: str, optional
            Name of the timestamp column
            (defaults to `time_column` function parameter)
        source_columns: list[str], optional
            List of source columns to use in tooltips
            (defaults to `source_columns` function parameter)
        color: str, optional
            Color of datapoints for this data
            (defaults to autogenerating colors)

    Or
    DataFrame as a single data set or grouped into individual
    plot series using the `group_by` parameter
time_column : str, optional
    Name of the timestamp column
    (the default is 'TimeGenerated')
source_columns : list, optional
    List of default source columns to use in tooltips
    (the default is None)
```

In [2]:
processes_on_host = pd.read_csv('data/processes_on_host.csv',
                                parse_dates=["TimeGenerated"],
                                infer_datetime_format=True)

# At a minimum we need to pass a dataframe with data
nbdisplay.display_timeline(processes_on_host)

The Bokeh graph is interactive and has the following features:
- Tooltip display for each event marker as you hover over it
- Toolbar with the following tools (most are toggles enabling or disabling the tool):
  - Panning 
  - Select zoom
  - Mouse wheel zoom
  - Reset to default view
  - Save image to PNG
  - Hover tool
  
Additionally an interactive timeline navigation bar is displayed below the main graph. You can change the timespan shown on the main graph by dragging or resizing the selected area on this navigation bar.

**Note**: 
- the tooltips work on the Windows process data shown above because of a legacy fallback built into the code.
  Usually you need to specify the `source_columns` parameter explicitly to have
  the hover tooltips populated correctly.
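
As a rough sketch of the point above, the snippet below checks which of the columns you plan to pass as `source_columns` actually exist in a DataFrame. The sample data and the `missing_columns` helper are made up for illustration and are not part of msticpy.

```python
import pandas as pd

# Tiny stand-in for process event data; the values are invented for illustration.
events = pd.DataFrame(
    {
        "TimeGenerated": pd.to_datetime(
            ["2019-02-10 10:00:00", "2019-02-10 10:05:00"]
        ),
        "Account": ["alice", "bob"],
        "NewProcessName": ["cmd.exe", "powershell.exe"],
    }
)


def missing_columns(data, columns):
    """Hypothetical helper: return the requested columns absent from `data`."""
    return [col for col in columns if col not in data.columns]


wanted = ["NewProcessName", "ParentProcessName"]
missing = missing_columns(events, wanted)
# Keep only the columns that can actually be shown in a tooltip.
usable = [col for col in wanted if col not in missing]

print("missing:", missing)
print("usable:", usable)
```

You could then pass `usable` as the `source_columns` argument of `nbdisplay.display_timeline`.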

## More Advanced Timelines
`display_timeline` also takes a number of optional parameters that give you more flexibility to show multiple data series and change the way the graph appears.
```
Other Parameters
----------------
title : str, optional
    Title to display (the default is None)
alert : SecurityAlert, optional
    Add a reference line/label using the alert time (the default is None)
ref_event : Any, optional
    Add a reference line/label using the alert time (the default is None)
ref_time : datetime, optional
    Add a reference line/label using `ref_time` (the default is None)
group_by : str
    (where `data` is a DataFrame)
    The column to group timelines on
sort_by : str
    (where `data` is a DataFrame)
    The column to order timelines on
legend: str, optional
    left, right or inline
    (the default is None/no legend)
yaxis : bool, optional
    Whether to show the yaxis and labels (default is False)
range_tool : bool, optional
    Show the range slider tool (default is True)
height : int, optional
    The height of the plot figure
    (the default is auto-calculated height)
width : int, optional
    The width of the plot figure (the default is 900)
color : str
    Default series color (default is "navy")
```


### Grouping Series From a Single DataFrame


In [3]:
nbdisplay.display_timeline(processes_on_host,
                           group_by="Account",
                           source_columns=["NewProcessName",
                                           "ParentProcessName"],
                           legend="left");

We can use the `group_by` parameter to specify a column on which to split the data into individually plotted series.

Specifying a legend lets us see the value of each series group. The legend is interactive - click on a series name to
hide/show the data. The legend can be placed inside the chart (`legend="inline"`) or to the left or right.

Alternatively we can enable the y-axis (`yaxis=True`), although this is not guaranteed to show all values of the groups.

**Note**: 
- the tooltips work on the Windows process data shown above because of a legacy fallback built into the code. Usually you need to specify the `source_columns` parameter explicitly to have the hover tooltips populated correctly.
- the trailing semicolon just stops Jupyter from showing the return value of the function. It isn't mandatory.
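
Conceptually, grouping splits the DataFrame into one series per distinct value of the chosen column. The following sketch illustrates that idea with plain pandas on made-up logon data; it is an assumption about the general approach, not msticpy's internal code.

```python
import pandas as pd

# Invented logon events; only the column names mirror the examples in this notebook.
logons = pd.DataFrame(
    {
        "TimeGenerated": pd.to_datetime(
            [
                "2019-02-10 09:00:00",
                "2019-02-10 09:30:00",
                "2019-02-10 10:15:00",
                "2019-02-10 11:00:00",
            ]
        ),
        "Account": ["alice", "bob", "alice", "carol"],
    }
)

# One entry per distinct Account value, each holding that account's timestamps.
series = {
    name: group["TimeGenerated"].tolist()
    for name, group in logons.groupby("Account")
}
counts = {name: len(times) for name, times in series.items()}

print(counts)
```

Each key in `series` corresponds to what would be one row of markers (and one legend entry) on a grouped timeline.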

In [4]:
nbdisplay.display_timeline(processes_on_host,
                           group_by="Account",
                           source_columns=["NewProcessName",
                                           "ParentProcessName"],
                           legend="none",
                           yaxis=True, ygrid=True);

In [5]:
host_logons = pd.read_csv('data/host_logons.csv',
                          parse_dates=["TimeGenerated"], 
                          infer_datetime_format=True)

nbdisplay.display_timeline(host_logons,
                           title="Logons by Account name",
                           group_by="Account",
                           source_columns=["Account", "TargetLogonId", "LogonType"],
                           legend="left",
                           height=200);

nbdisplay.display_timeline(host_logons,
                           title="Logons by logon type",
                           group_by="LogonType",
                           source_columns=["Account", "TargetLogonId", "LogonType"],
                           legend="left",
                           height=200,
                           range_tool=False,
                           ygrid=True);

## Displaying a reference line
If you have a single item (e.g. an alert) that you want to show as a reference point on the graph, you can pass a datetime value or any object that has a `TimeGenerated` or `StartTimeUtc` property.

If the object has neither of these properties, pass the relevant time value from the object as the `ref_time` parameter.
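
The lookup described above can be sketched as follows. `resolve_ref_time` is a hypothetical helper written for this illustration (msticpy's own logic may differ), and the `created` attribute on the custom event is likewise invented.

```python
from datetime import datetime
from types import SimpleNamespace


def resolve_ref_time(ref):
    """Hypothetical helper: derive a reference time from `ref`."""
    if isinstance(ref, datetime):
        return ref
    # Try the two property names mentioned in the text, in order.
    for attr in ("TimeGenerated", "StartTimeUtc"):
        value = getattr(ref, attr, None)
        if value is not None:
            return value
    raise ValueError("No TimeGenerated or StartTimeUtc; pass ref_time explicitly")


alert = SimpleNamespace(StartTimeUtc=datetime(2019, 2, 10, 12, 0))
custom_event = SimpleNamespace(created=datetime(2019, 2, 11, 8, 30))

alert_time = resolve_ref_time(alert)
try:
    custom_time = resolve_ref_time(custom_event)
except ValueError:
    # Neither property is present, so take the time value from the object directly.
    custom_time = custom_event.created

print(alert_time, custom_time)
```

In the second case you would supply `custom_time` through the `ref_time` parameter rather than passing the object itself.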

In [6]:
fake_alert = processes_on_host.sample().iloc[0]

nbdisplay.display_timeline(host_logons,
                           title="Logons with marker",
                           group_by="LogonType",
                           source_columns=["Account", "TargetLogonId", "LogonType"],
                           alert=fake_alert,
                           legend="left");

## Plotting series from different data sets
When you want to plot data sets with different schemas on the same plot, it is difficult to put them in a single DataFrame.
Instead, assemble the different data sets into a dictionary and pass that to the `display_timeline` function.

The dictionary has this format:

Key: str
    Name of data set to be displayed in legend
    
Value: dict, the value holds the settings for each data series:

    data: pd.DataFrame
        Data to plot
    time_column: str, optional
        Name of the timestamp column
        (defaults to `time_column` function parameter)
    source_columns: list[str], optional
        List of source columns to use in tooltips
        (defaults to `source_columns` function parameter)
    color: str, optional
        Color of datapoints for this data
        (defaults to autogenerating colors)


In [7]:
procs_and_logons = {
    "Processes" : {"data": processes_on_host, "source_columns": ["NewProcessName", "Account"]},
    "Logons": {"data": host_logons, "source_columns": ["Account", "TargetLogonId", "LogonType"]}
}

nbdisplay.display_timeline(data=procs_and_logons,
                           title="Logons and Processes",
                           legend="left", yaxis=False);
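
The cell above relies on both data sets sharing the default `TimeGenerated` column. The sketch below builds a dictionary that also uses the optional `time_column` and `color` keys described earlier, then runs a simple pre-flight check on it. The sample data and the `check_timeline_dict` helper are assumptions made for illustration and are not part of msticpy.

```python
import pandas as pd

# Two invented data sets with different schemas and timestamp column names.
processes = pd.DataFrame(
    {
        "TimeGenerated": pd.to_datetime(["2019-02-10 10:00:00"]),
        "NewProcessName": ["cmd.exe"],
    }
)
flows = pd.DataFrame(
    {
        "FlowStartTime": pd.to_datetime(["2019-02-10 10:02:00"]),
        "L7Protocol": ["http"],
    }
)

timeline_data = {
    "Processes": {
        "data": processes,
        "source_columns": ["NewProcessName"],
        "color": "navy",
    },
    "Flows": {
        "data": flows,
        "time_column": "FlowStartTime",  # overrides the function-level default
        "source_columns": ["L7Protocol"],
        "color": "green",
    },
}


def check_timeline_dict(data_sets, default_time_column="TimeGenerated"):
    """Hypothetical check: list columns referenced but missing from each data set."""
    problems = []
    for name, settings in data_sets.items():
        frame = settings["data"]
        time_col = settings.get("time_column", default_time_column)
        if time_col not in frame.columns:
            problems.append(f"{name}: missing time column {time_col!r}")
        for col in settings.get("source_columns", []):
            if col not in frame.columns:
                problems.append(f"{name}: missing source column {col!r}")
    return problems


problems = check_timeline_dict(timeline_data)
print(problems or "dictionary looks consistent")
```

If the check returns no problems, `timeline_data` can be passed as the `data` argument in the same way as `procs_and_logons`.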

# Plotting Series with Scalar Values
Often you may want to see a scalar value plotted with the series. 

The example below uses `display_timeline_values` to plot network flow data using the total flows recorded between a pair of IP addresses.

Most parameters are the same as for `display_timeline`, but there is an additional mandatory `y` parameter that names the column to plot on the y (vertical) axis.
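
Since `y` names a column of scalar values, it can help to confirm that the column is numeric before plotting, especially when the data comes from a CSV file. The following is a minimal sketch with invented data, assuming that non-numeric entries should simply be dropped; adapt that choice to your own data.

```python
import pandas as pd

# Invented flow data in which the value column arrived as strings.
flows = pd.DataFrame(
    {
        "FlowStartTime": pd.to_datetime(
            ["2019-02-12 10:00:00", "2019-02-12 11:00:00", "2019-02-12 12:00:00"]
        ),
        "TotalAllowedFlows": ["3", "7", "n/a"],
    }
)

# Convert to numbers; anything that cannot be parsed becomes NaN.
flows["TotalAllowedFlows"] = pd.to_numeric(
    flows["TotalAllowedFlows"], errors="coerce"
)
# Drop the rows that have no usable value to plot.
plottable = flows.dropna(subset=["TotalAllowedFlows"])
is_numeric = pd.api.types.is_numeric_dtype(plottable["TotalAllowedFlows"])

print(is_numeric, len(plottable))
```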

In [8]:
az_net_flows_df = pd.read_csv('data/az_net_flows.csv',
                          parse_dates=["TimeGenerated", "FlowStartTime", "FlowEndTime"], 
                          infer_datetime_format=True)

flow_plot = nbdisplay.display_timeline_values(data=az_net_flows_df,
                                  group_by="L7Protocol",
                                  source_columns=["FlowType", 
                                                  "AllExtIPs", 
                                                  "L7Protocol", 
                                                  "FlowDirection", 
                                                  "TotalAllowedFlows"],
                                  time_column="FlowStartTime",
                                  y="TotalAllowedFlows",
                                  legend="right",
                                  height=500);

By default the plot uses vertical bars to show the values, but you can use any combination of vbar, circle and line through the `kind` parameter. Specify the plot types as a list of strings (all lowercase).

**Notes**
- including "circle" in the plot kinds makes it easier to see the hover value
- the line plot can be misleading: it draws lines between adjacent data points of the same series, implying a gradual change in the plotted value even though there may be no data between those points. For this reason vbar often gives a more accurate view.
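
To make the second note concrete, the sketch below computes the value that a straight line between two recorded points would suggest at a time where nothing was recorded. The data points and the `line_value_at` helper are invented for illustration.

```python
from datetime import datetime

# Two recorded data points, four hours apart, with nothing in between.
points = [
    (datetime(2019, 2, 12, 10, 0), 10),
    (datetime(2019, 2, 12, 14, 0), 50),
]


def line_value_at(two_points, when):
    """Value that a straight line between the two points shows at `when`."""
    (t0, v0), (t1, v1) = two_points
    fraction = (when - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return v0 + fraction * (v1 - v0)


midpoint = datetime(2019, 2, 12, 12, 0)
implied = line_value_at(points, midpoint)
recorded = dict(points).get(midpoint)  # None: no event exists at this time

print("line suggests:", implied, "recorded:", recorded)
```

The line suggests a value at the midpoint even though no event was recorded there, which is the effect the note warns about.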

In [9]:
flow_plot = nbdisplay.display_timeline_values(data=az_net_flows_df,
                                              group_by="L7Protocol",
                                              source_columns=["FlowType", 
                                                              "AllExtIPs", 
                                                              "L7Protocol", 
                                                              "FlowDirection", 
                                                              "TotalAllowedFlows"],
                                              time_column="FlowStartTime",
                                              y="TotalAllowedFlows",
                                              legend="right",
                                              height=500,
                                              kind=["vbar", "circle"]
                                            );

In [10]:
nbdisplay.display_timeline_values(data=az_net_flows_df[az_net_flows_df["L7Protocol"] == "http"],
                                  group_by="L7Protocol",
                                  title="Line plot can be misleading",                    
                                  source_columns=["FlowType", 
                                                  "AllExtIPs", 
                                                  "L7Protocol", 
                                                  "FlowDirection", 
                                                  "TotalAllowedFlows"],
                                  time_column="FlowStartTime",
                                  y="TotalAllowedFlows",
                                  legend="right",
                                  height=300,
                                  kind=["line", "circle"],
                                  range_tool=False
                                );
nbdisplay.display_timeline_values(data=az_net_flows_df[az_net_flows_df["L7Protocol"] == "http"],
                                  group_by="L7Protocol",
                                  title="Vbar and circle show zero gaps in data",                    
                                  source_columns=["FlowType", 
                                                  "AllExtIPs", 
                                                  "L7Protocol", 
                                                  "FlowDirection", 
                                                  "TotalAllowedFlows"],
                                  time_column="FlowStartTime",
                                  y="TotalAllowedFlows",
                                  legend="right",
                                  height=300,
                                  kind=["vbar", "circle"],
                                  range_tool=False
                                );

## Documentation for display_timeline_values
```
nbdisplay.display_timeline_values(
    data: pandas.core.frame.DataFrame,
    y: str,
    time_column: str = 'TimeGenerated',
    source_columns: list = None,
    **kwargs,
) -> figure

Display a timeline of events.

Parameters
----------
data : pd.DataFrame
    DataFrame as a single data set or grouped into individual
    plot series using the `group_by` parameter
time_column : str, optional
    Name of the timestamp column
    (the default is 'TimeGenerated')
y : str
    The column name holding the value to plot vertically
source_columns : list, optional
    List of default source columns to use in tooltips
    (the default is None)

Other Parameters
----------------
x : str, optional
    alias of `time_column`
title : str, optional
    Title to display (the default is None)
ref_event : Any, optional
    Add a reference line/label using the alert time (the default is None)
ref_time : datetime, optional
    Add a reference line/label using `ref_time` (the default is None)
group_by : str
    (where `data` is a DataFrame)
    The column to group timelines on
sort_by : str
    (where `data` is a DataFrame)
    The column to order timelines on
legend: str, optional
    left, right or inline
    (the default is None/no legend)
yaxis : bool, optional
    Whether to show the yaxis and labels
range_tool : bool, optional
    Show the range slider tool (default is True)
height : int, optional
    The height of the plot figure
    (the default is auto-calculated height)
width : int, optional
    The width of the plot figure (the default is 900)
color : str
    Default series color (default is "navy"). This is overridden by
    automatic color assignments if plotting a grouped chart
kind : Union[str, List[str]], optional
    one or more glyph types to plot.
    Supported types are "circle", "line" and "vbar" (default is "vbar")

Returns
-------
figure
    The bokeh plot figure.
```

# Exporting Plots as PNGs
To use the `bokeh.io` image export functions you need selenium, phantomjs and pillow installed:

`conda install -c bokeh selenium phantomjs pillow`

or

`pip install selenium pillow`
`npm install -g phantomjs-prebuilt`

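For phantomjs see https://phantomjs.org/download.html. These instructions reflect the Bokeh version in use when this notebook was written; Bokeh 2.0 and later use geckodriver or chromedriver with selenium instead of phantomjs, so check the Bokeh export documentation for your version.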
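
Before attempting an export, you may want to confirm that the prerequisites listed above can be found. The sketch below uses only the Python standard library; `export_prerequisites` is a hypothetical helper, and it assumes that phantomjs is expected on the system `PATH`.

```python
import importlib.util
import shutil


def export_prerequisites():
    """Hypothetical helper: report which export prerequisites can be located."""
    return {
        # Python packages: look for an importable module without importing it.
        "selenium": importlib.util.find_spec("selenium") is not None,
        "pillow": importlib.util.find_spec("PIL") is not None,  # pillow imports as PIL
        # External executable: look for it on the PATH.
        "phantomjs": shutil.which("phantomjs") is not None,
    }


status = export_prerequisites()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```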

Once the prerequisites are installed, create a plot and save the return value to a variable.
Then export the plot using the `export_png` function.

```python
from bokeh.io import export_png
from IPython.display import Image

# Create a plot
flow_plot = nbdisplay.display_timeline_values(data=az_net_flows_df,
                                              group_by="L7Protocol",
                                              source_columns=["FlowType", 
                                                              "AllExtIPs", 
                                                              "L7Protocol", 
                                                              "FlowDirection", 
                                                              "TotalAllowedFlows"],
                                              time_column="FlowStartTime",
                                              y="TotalAllowedFlows",
                                              legend="right", 
                                              height=500,
                                              kind=["vbar", "circle"]
                                            );

# Export 
file_name = "plot.png"
export_png(flow_plot, filename=file_name)

# Read it and show it
display(Markdown(f"## Here is our saved plot: {file_name}"))
Image(filename=file_name)
```