{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "372eb989-c3fb-45dc-877a-a304d989eb10",
   "metadata": {},
   "source": [
    "# Report interpretation\n",
    "In this tutorial we cover each section of the default popmon report step-by-step.\n",
    "This tutorial uses access of the datastore discussed in the advanced tutorial, but it's okay to ignore that since our aim is to understand the plots and interpret the results.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a050efd-6dcd-48d3-b0ab-c632e1af0f0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "import popmon\n",
    "from popmon import resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0375a382-eb68-49dd-aaa4-151012f0a8e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\n",
    "    resources.data(\"flight_delays.csv.gz\"), index_col=0, parse_dates=[\"DATE\"]\n",
    ")\n",
    "report = df.pm_stability_report(time_axis=\"DATE\", time_width=\"1w\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ae84d7e-3614-47da-91a9-ade872e99fdc",
   "metadata": {},
   "source": [
    "The datastore holds all values that are computed for the report. The following keys are available:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e3a21c6-15d9-4a93-8774-fda59535aca8",
   "metadata": {},
   "outputs": [],
   "source": [
    "list(report.datastore.keys())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ba65cc0-4a0c-43ea-908c-eea0e500c77a",
   "metadata": {},
   "source": [
    "The plots that are generated are stored in \"report_sections\". Each section has three keys:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "baa3ba92-ef67-4ab4-932c-90728a2c0752",
   "metadata": {},
   "outputs": [],
   "source": [
    "list(report.datastore[\"report_sections\"][0].keys())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d0a97dd-a2e0-4683-9b78-083975ac3c11",
   "metadata": {},
   "source": [
    "For each of the sections we will inspect the diagrams:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18192a69-cd4c-4690-a38e-bcda73e3b083",
   "metadata": {},
   "outputs": [],
   "source": [
    "[section[\"section_title\"] for section in report.datastore[\"report_sections\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "820f8077-fca8-498d-995c-b2ca7f3777bb",
   "metadata": {},
   "source": [
    "The diagrams are currently stored as Base64 encoded images. Traffic lights and alerts are tables. We will use the following helper functions to display them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a1e7150-d2f3-4d69-b8f1-1a03f5d455aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import HTML\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "def show_image(plot):\n",
    "    display(HTML(f'<img src=\"data:image/jpeg;base64, {plot[\"plot\"]}\" />'))\n",
    "    text = f'<strong>{plot[\"name\"]}</strong>'\n",
    "    if plot[\"description\"]:\n",
    "        text += f': {plot[\"description\"]}'\n",
    "    display(HTML(text))\n",
    "\n",
    "\n",
    "def show_table(plot):\n",
    "    style = \"\"\"table.overview{\n",
    "        margin: 25px;\n",
    "    }\n",
    "    table.overview tbody td.metric{\n",
    "        white-space: nowrap;\n",
    "    }\n",
    "    table.overview tbody td.cell{\n",
    "       border: 1px solid #333333;\n",
    "       text-align: center;\n",
    "    }\n",
    "    table.overview td.cell-green{\n",
    "        background: green;\n",
    "    }\n",
    "    table.overview td.cell-yellow{\n",
    "        background: yellow;\n",
    "    }\n",
    "    table.overview td.cell-red{\n",
    "        background: red;\n",
    "    }\n",
    "    table.overview tfoot td {\n",
    "        padding-top: 5px;\n",
    "        text-align: center;\n",
    "    }\n",
    "    table.overview tfoot td span{\n",
    "        -ms-writing-mode: tb-rl;\n",
    "        -webkit-writing-mode: vertical-rl;\n",
    "        writing-mode: vertical-rl;\n",
    "        transform: rotate(180deg);\n",
    "        white-space: nowrap;\n",
    "\n",
    "        font-size: 14px;\n",
    "        font-weight: 300;\n",
    "    }\n",
    "    \"\"\"\n",
    "    display(HTML(f\"<style>{style}</style>\"))\n",
    "    display(HTML(plot[\"plot\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1aee210f-e291-4682-9ca4-0c5b8c41b9c9",
   "metadata": {},
   "source": [
    "## Histograms"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69cd7f7d-dc46-415d-a5a7-fba906ddee53",
   "metadata": {},
   "source": [
    "### Categorical feature"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d85a60c8-1166-4317-89d0-84912ae7092b",
   "metadata": {},
   "source": [
    "In the figure below you see three histograms overlayed. The `histogram_ref` (purple) is the reference histogram. Recall this can be complete dataset, or a reference (training)set if you are monitoring the data coming into a model. The `histogram` (orange) histogram is then the current batch of data (e.g. a week). The `histogram_prev1` is them the previous batch (e.g. last week).\n",
    "\n",
    "On the x-axis, the value of the feature is displayed, in this case the airline. The y-axis contains the bin probability (the normalized counts).\n",
    "\n",
    "This example shows that all three histograms lie closely together, with the largest difference in the last bin (WN)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0cc4526b-179a-4947-8e04-c0b613e0e827",
   "metadata": {},
   "outputs": [],
   "source": [
    "# First section, First Feature, First plot\n",
    "show_image(report.datastore[\"report_sections\"][1][\"features\"][0][\"plots\"][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce42ad0a-b171-4573-8ad8-1ba4f61bfb65",
   "metadata": {},
   "source": [
    "### Continuous feature"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "448be619-96e5-46f3-a737-c0f9086a5fa7",
   "metadata": {},
   "source": [
    "The interpretation for continuous features is much different, apart from the x-axis being binned continuous values.\n",
    "\n",
    "Here we observe relatively larges differences around many values of AIR TIME.\n",
    "At ~130 (the previous histogram is much higher than the current) and at ~50 (the current is much higher than the previous). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dfcafce5-38a9-48e5-bd65-9023442db8a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "show_image(report.datastore[\"report_sections\"][1][\"features\"][1][\"plots\"][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9859dc07",
   "metadata": {},
   "source": [
    "## Heatmaps"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a88b4c0",
   "metadata": {},
   "source": [
    "### Categorical feature"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4c66d21",
   "metadata": {},
   "source": [
    "In the figure below you see heatmap for the AIRLINE feature. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d4b5a6c",
   "metadata": {},
   "source": [
    "On the x-axis, the time bins are displayed. The y-axis displays the values of feature displayed, in this case AIRLINE."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95afb533",
   "metadata": {},
   "source": [
    "The heatmap plot is particularly useful to observe change in a feature over a large window time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86fe5b39",
   "metadata": {},
   "outputs": [],
   "source": [
    "show_image(report.datastore[\"report_sections\"][1][\"features\"][0][\"plots\"][2])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17ce65a3-bf76-4e02-bfb1-2ff98a7db4fc",
   "metadata": {},
   "source": [
    "## Traffic Lights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "778ab162-253b-49ba-89ec-f7c1aee54121",
   "metadata": {},
   "source": [
    "The traffic light overview shows the highest traffic light per batch. That is: Red > Yellow > Green.\n",
    "\n",
    "There are two kinds of patterns that are important to observe in the traffic lights overview table in general:\n",
    "\n",
    "1. The column is mostly red/yellow. This indicates that most thresholds are crossed. This is probably the batch you would like to have a look at.\n",
    "2. The row is mostly red/yellow. This indicates that the comparison is overly sensitive. It can be ignored, removed, or the (dynamic) bounds could be increased.\n",
    "\n",
    "The traffic light overview is particularly useful as way to prioritize which of the diagrams to look at first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0b351158-a3bb-43d5-b125-6a7319351d5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "show_table(report.datastore[\"report_sections\"][2][\"features\"][0][\"plots\"][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "926ea074-afb2-42fe-aa01-32288da17b49",
   "metadata": {},
   "source": [
    "## Alerts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "403ade43-160a-40a8-9245-bd318ab85ae7",
   "metadata": {},
   "source": [
    "The alerts overview table provides insight in how many traffic light bounds are crossed for each batch for each feature."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "047a5c1a-d85a-4aed-940d-ff74db05528e",
   "metadata": {},
   "outputs": [],
   "source": [
    "show_table(report.datastore[\"report_sections\"][3][\"features\"][0][\"plots\"][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b4f67a8-87b9-4079-ba6f-a9fe45fee852",
   "metadata": {},
   "source": [
    "## Comparisons"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a3b1bc8-b79b-4640-be6e-033339cc4f2a",
   "metadata": {},
   "source": [
    "The comparisons diagrams compare statistics with a reference. In this case, the reference is the preceding time slot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "062d977c-0739-4bb2-abfe-af20ab8366a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "show_image(report.datastore[\"report_sections\"][4][\"features\"][0][\"plots\"][\"ref\"][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d906f5d3-18f4-46a9-bbd2-45ebd6de3e4f",
   "metadata": {},
   "source": [
    "## Profiles"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa3280b5-1e4c-4eb6-b35f-154d86ef0617",
   "metadata": {},
   "source": [
    "The diagrams in the profiles section track a certain statistic. As we can see from the name and description of the diagram, we are looking at the number of entries. The traffic light bounds are included in the plot. The last bin contains far less results. This is probably because the data stops at the end of 2015, while the bin spans over two years."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69242502-a7a8-4372-80a8-684ff6276f22",
   "metadata": {},
   "outputs": [],
   "source": [
    "show_image(report.datastore[\"report_sections\"][5][\"features\"][0][\"plots\"][0])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  },
  "toc-autonumbering": false
 },
 "nbformat": 4,
 "nbformat_minor": 5
}