# _TriScale_ - Seasonal Components


> This notebook is intended for **live tutorial** sessions about _TriScale._  
Here is the [self-study version](tutorial_seasonal-comp.ipynb).

To get started, we need to import a few Python modules. All the _TriScale_-specific functions are part of one module called `triscale`.

In [None]:
import os
from pathlib import Path
import datetime

import pandas as pd
import numpy as np

import triscale

## Analysis of Glossy data (low-power wireless)

We consider performance data from [Glossy](https://ieeexplore.ieee.org/document/5779066), collected on [the FlockLab testbed](http://flocklab.ethz.ch/), which serves as the experiment environment.

[Glossy](https://ieeexplore.ieee.org/document/5779066) is a low-power wireless protocol based on synchronous transmissions and a flooding strategy. One important tuning parameter of Glossy is the number of times $N$ that each node transmits each packet.

The literature reports that a larger $N$ yields better reliability; that is, a larger packet reception ratio (PRR). We performed a short experimental study to validate this observation.
More specifically, we tested two values:
- $N=1$
- $N=2$

> For a more extensive description of the data collection and analysis, you can check the [complete case study notebook](casestudy_glossy.ipynb) or the [_TriScale_ paper](https://doi.org/10.5281/zenodo.3464273) itself.

In a nutshell, the dataset contains:
- one metric: the median **packet reception ratio** between all nodes, or **PRR** for short
- measured **24 times** per day, scheduled randomly
- collected over **three weeks**
- using both $N=1$ and $N=2$

In [None]:
# Load the PRR results from the test
glossy = pd.read_csv('ExampleData/metrics_glossy.csv', index_col=0, parse_dates=True)

# Display a random sample
glossy.sample(5)

> **Note.** To avoid bias during the analysis, the dataset has been "anonymized"; that is, we randomly replaced the value of $N$ with a letter ($A$ or $B$).

The FlockLab testbed is located in an office building, where we expect more wireless interference during the day than during the night. Thus, for a fair comparison, the time span of a series of runs should be at least one day (24 hours).

Let us select two days and compare the PRR of $A$ and $B$ on those days. The KPI definition is given below.  

In [None]:
# Days considered for the data analysis
day_A = '2019-08-24'
day_B = '2019-08-26'

# Filtering the dataset for the data of interest
data_A = glossy.loc[day_A].PRR_A.dropna().values
data_B = glossy.loc[day_B].PRR_B.dropna().values

# KPI definition
KPI = {'name': 'PRR',
       'unit': r'\%',  # raw string: same value, avoids the invalid-escape warning
       'percentile': 50,
       'confidence': 95,
       'class': 'one-sided',
       'bounds': [0,100],
       'bound': 'lower'}
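
The `glossy.loc[day_A]` selection above relies on pandas partial string indexing: on a `DatetimeIndex`, a date string selects every row of that day. Here is a minimal sketch on synthetic data. The hourly index, the values, and the name `toy` are made up for illustration; only the column name `PRR_A` mirrors the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Glossy data: three days of hourly values
index = pd.date_range('2019-08-23', periods=72, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({'PRR_A': np.linspace(70, 90, 72)}, index=index)

# Two missing values on the second day (e.g., runs where A was not tested)
toy.iloc[[24, 30], 0] = np.nan

# A date string selects all rows of that day; dropna() discards missing runs
one_day = toy.loc['2019-08-24']
values = one_day.PRR_A.dropna().values

print(len(one_day), len(values))
```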

Use the `triscale.analysis_kpi()` function to compute the KPI value for each group. 

- Which group seems to perform best?
- What confidence do you have in this result?

In [None]:
test_A, KPI_A = triscale.analysis_kpi(data_A, KPI)
test_B, KPI_B = triscale.analysis_kpi(data_B, KPI)

print('KPI group A: {}'.format(KPI_A))
print('KPI group B: {}'.format(KPI_B))

$A$ seems to perform better than $B$. 

But, even if the KPI has been defined with a high level of confidence, it **does not** mean that the experimental conditions during the two days were actually comparable... 

As a matter of fact, $A$ corresponds to $N=1$, which is highly unlikely to perform better than $N=2$...
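
To build intuition for what a KPI with `'percentile': 50`, `'confidence': 95`, and `'bound': 'lower'` expresses, the sketch below computes a textbook non-parametric bound from order statistics: the largest rank $k$ such that the $k$-th smallest of $n$ samples lies below the true percentile with the requested confidence. It assumes independent, identically distributed samples. The helper `lower_bound_rank` is hypothetical and not part of the `triscale` module, whose implementation may differ.

```python
from math import comb

def lower_bound_rank(n, percentile=50, confidence=95):
    """Largest 1-based rank k such that the k-th smallest of n i.i.d. samples
    is below the true percentile with the given confidence (None if no such k).
    Hypothetical helper for illustration only."""
    p = percentile / 100
    best = None
    below = 0.0  # P(fewer than k samples fall below the true percentile)
    for k in range(1, n + 1):
        below += comb(n, k - 1) * p ** (k - 1) * (1 - p) ** (n - k + 1)
        if 1 - below >= confidence / 100:
            best = k      # the k-th smallest sample is still a valid bound
        else:
            break         # the confidence only decreases as k grows
    return best

print(lower_bound_rank(24))  # one day of runs (24 samples)
print(lower_bound_rank(4))   # too few samples for a 95% bound on the median
```

With 24 samples, the 8th smallest value is a valid 95% lower bound on the median; with only 4 samples, no such bound exists. Note that the confidence level says nothing about whether the days being compared had similar conditions.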

## What about seasonality? 

In the previous analysis, we (randomly?) picked some days for each group. But what
do we know about the possible correlation between those two days? 
- Maybe we got unlucky on the day $B$ was tested?
- Or maybe we omitted some hidden factor?

To investigate that, we can look at the [wireless link quality data for FlockLab](https://doi.org/10.5281/zenodo.3354717), which is collected by the FlockLab maintainers and made publicly available. They ran the link quality tests every two hours, resulting in 12 measurement points per day.

In this tutorial, we look at the data from August 2019, which has a large overlap with
our data collection period. Let's load this dataset and have a look...

In [None]:
link_quality = pd.read_csv('ExampleData/flocklab_link_quality.csv', index_col=0, parse_dates=True)
link_quality.head()

The dataset is simple: every two hours, we have one value representing the "average 
link quality" on the testbed (the computation that led to this average is irrelevant here).

_TriScale_'s `network_profiling()` generates an autocorrelation plot based on such data, as illustrated below.

In [None]:
link_quality_bounds = [0,100]
link_quality_name = 'PRR [%]'
fig_theil, fig_autocorr = triscale.network_profiling(
    link_quality, 
    link_quality_bounds, 
    link_quality_name,
)
fig_autocorr.show()

One can clearly see from the autocorrelation plot that the average link quality on FlockLab has strong seasonal components. The **first peak at lag 12 (i.e., 24h)** reveals the daily seasonal component.

But there is also **a second main peak at lag 84**, which corresponds to one week.
Indeed, there is less interference on weekends than on weekdays, which creates a weekly seasonal component.

Due to this weekly component, it becomes problematic (i.e., potentially wrong) to
compare results from different time periods that span less than a week.
In other words, the time span of a series of runs must be at least one week long
for the series to be fairly comparable.
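
To see why seasonal patterns show up as peaks at multiples of their period, here is a small sketch on synthetic data (not the FlockLab measurements). It assumes one sample every two hours, a sinusoidal day/night cycle, and a constant offset on weekends. The helper `autocorr` is a plain sample autocorrelation written for this example; it is not what `network_profiling()` necessarily computes.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

samples_per_day = 12                     # one sample every two hours
t = np.arange(28 * samples_per_day)      # four weeks of synthetic samples
day = t // samples_per_day

daily = np.sin(2 * np.pi * t / samples_per_day)  # day/night cycle
weekend = (day % 7 >= 5).astype(float)           # different level on weekends
signal = 80 + 5 * daily + 10 * weekend

# Compare half a day, one day, half a week, and one week
acf = {lag: autocorr(signal, lag) for lag in (6, 12, 42, 84)}
for lag, value in acf.items():
    print(f'lag {lag:2d} ({2 * lag:3d} h): {value:+.2f}')
```

The correlation is clearly larger at lag 12 (one day) than at lag 6, and larger at lag 84 (one week) than at lag 42, which mirrors the two peaks discussed above.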

Let us quickly check which days of the week we picked for our first analysis...

In [None]:
def weekday(date_str):
    '''Return the weekday name of a date
    given as a 'YYYY-MM-DD' string
    '''
    year, month, day = (int(x) for x in date_str.split('-'))
    ans = datetime.date(year, month, day)
    return ans.strftime("%A")

print('Data from group A was from a {}.'.format(weekday(day_A)))
print('Data from group B was from a {}.'.format(weekday(day_B)))

Bingo! $B$ was tested on a weekday, while $A$ was tested on a weekend...

> **Takeaway.** The day of the week was a "hidden" factor in our first analysis. Neglecting it led to wrong conclusions. 
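
The same check can be done directly with pandas, which is convenient when the dates already live in a `DatetimeIndex`. This is only an alternative sketch of the weekday lookup; the variable names are chosen for this example.

```python
import pandas as pd

# The two days picked for the first analysis
days = pd.to_datetime(['2019-08-24', '2019-08-26'])

# day_name() gives the weekday name; dayofweek counts from Monday=0 to Sunday=6
names = list(days.day_name())
is_weekend = [bool(d >= 5) for d in days.dayofweek]

for day, name, weekend in zip(days, names, is_weekend):
    print(f'{day.date()}: {name} (weekend: {weekend})')
```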

## Your turn: time to practice

Let us now use the entire Glossy dataset and analyse it as one series (with a span of three weeks).

In [None]:
data_A = glossy.PRR_A.dropna().values
data_B = glossy.PRR_B.dropna().values

Use the `triscale.analysis_kpi()` function again to compute the KPI value for each group.
- Which group seems to perform best now?
- What about independence? Do you think the results are trustworthy?

In [None]:
########## YOUR CODE HERE ###########
# ...
#####################################

### Solution

<details>
  <summary><br/>Click here to show the solution.</summary>
  
```python
>>> triscale.analysis_kpi(data_A, KPI)
(False, 80.0)
>>> triscale.analysis_kpi(data_B, KPI)
(False, 88.0)
```
Now we do obtain the expected result: $N=2$ (group $B$) performs better than $N=1$.
Note, however, that the independence test fails. This is due to the ordering of the tests:
we scheduled the tests randomly within each day, not over the three-week time span.
Therefore, the data are affected by the (strong) weekly correlation in the environment.
    
We can observe this correlation by plotting the data and/or its autocorrelation function:
```python
>>> plots=['series','autocorr']
>>> triscale.analysis_kpi(data_A, KPI, plots)
```
> **Note.** Contrary to the `link_quality` dataset, the Glossy data (`data_A` and `data_B`) is not ordered in time (which is intentional, since we try to discard the time factor from the analysis). Therefore, it is difficult to interpret the location of the peaks in the autocorrelation plot; mainly, we can observe that there is a strong correlation structure in the dataset. 
    
We can emulate what properly randomized run epochs would have looked like by shuffling the data.
    
```python
>>> import random
>>> random.shuffle(data_A)
>>> to_plot=['autocorr']
>>> triscale.analysis_kpi(data_A, KPI, to_plot)
```  
    
As you can see, the correlation structure flattens significantly. In some cases, the independence test might even pass... But keep in mind that this is only an artifact! To make a strong statement, the run epochs should have been truly randomized.
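
The effect of shuffling can be reproduced on synthetic data. The sketch below assumes 24 runs per day over three weeks, with a made-up weekend offset and Gaussian noise; `autocorr` is a plain sample autocorrelation written for this example, not a `triscale` function.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(42)          # fixed seed for reproducibility
runs_per_day = 24
day = np.arange(21 * runs_per_day) // runs_per_day

# Synthetic series with a weekly pattern, kept in chronological order
ordered = 80 + 10 * (day % 7 >= 5) + rng.normal(0, 2, day.size)

# Shuffling changes the order of the values, not the values themselves
shuffled = rng.permutation(ordered)

lag = 7 * runs_per_day                   # one week
print('ordered :', round(autocorr(ordered, lag), 2))
print('shuffled:', round(autocorr(shuffled, lag), 2))
print('same median:', np.median(ordered) == np.median(shuffled))
```

The weekly correlation disappears after shuffling, while the median (and any other percentile) is unchanged: shuffling hides the correlation structure without changing how the data was actually collected.
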
</details>

--- 
[Back to main repository](.)