{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# `popmon` introductory notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook contains examples of how to generate `popmon` reports from a pandas DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# (optional) Adjust the Jupyter notebook style for easier navigation of the reports\n", "from IPython.display import HTML, display\n", "\n", "# Wider notebook\n", "display(HTML(\"\"))\n", "# Cells are higher by default\n", "display(HTML(\"\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set up `popmon` and load our dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install popmon (if not installed yet) in the current environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "\n", "!\"{sys.executable}\" -m pip install -q popmon" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import pandas and popmon, load an example dataset provided by popmon, and show the first few rows." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "import popmon\n", "from popmon import resources\n", "from popmon.config import Settings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(resources.data(\"test.csv.gz\"), parse_dates=[\"date\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reporting given a pandas.DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "report = df.pm_stability_report(\n", " # Use the 'date' column as our time axis\n", " time_axis=\"date\",\n", " # Create batches for every two weeks of data\n", " time_width=\"2w\",\n", " # Select a subset of features\n", " features=[\"date:age\", \"date:isActive\", \"date:eyeColor\"],\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "report" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regenerate the report\n", "You can change the report parameters without rerunning the computational part of the pipeline by using the `regenerate` method. For example, setting the `extended_report` flag to `False` generates a short (limited) report. To configure which statistics are shown, set the `show_stats` argument accordingly.\n", "\n", "Another option is to change the `plot_hist_n` parameter, which controls the number of histograms displayed per feature." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "report_settings = Settings()\n", "report_settings.report.extended_report = False\n", "report_settings.report.section.histograms.plot_hist_n = 6\n", "\n", "report.regenerate(settings=report_settings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reporting given histograms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "popmon can also generate the report directly from histograms.\n", "First, we generate the histograms (we could also load pre-generated histograms from a pickle or JSON file)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "hists = df.pm_make_histograms(\n", " time_axis=\"date\",\n", " time_width=\"2w\",\n", " features=[\"date:age\", \"date:gender\", \"date:isActive\"],\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list(hists.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then generate the report from the histograms:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "report = popmon.stability_report(hists)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "report" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 4 }