{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
\n", ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Getting_Started)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Getting_Started) to leverage the power of whylogs and WhyLabs together!*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Started" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/basic/Getting_Started_with_UDFs.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we'll explore the basics of logging data with whylogs and a user-defined function (UDF)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing whylogs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install whylogs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading a Pandas DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before showing how we can log data, we first need the data itself. 
Let's create a simple Pandas DataFrame:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "data = {\n", "    \"animal\": [\"cat\", \"hawk\", \"clam\", \"cat\", \"mongoose\", \"octopus\"],\n", "    \"legs\": [4, 2, 1, 4, 4, 8],\n", "    \"weight\": [4.3, 1.8, 1.3, 4.1, 5.4, 3.2],\n", "}\n", "\n", "df = pd.DataFrame(data)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Defining a simple metric UDF\n", "Here we use a metric UDF that targets the named column `animal` to show how to add features to a DataFrame for custom monitoring. The UDF models some custom logic that decides whether an animal has a cool name. It is a toy example that performs a binary classification by checking whether the name is longer than 4 characters, but a UDF could also return a score computed from the values in a column." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import whylogs as why\n", "from whylogs.experimental.core.udf_schema import udf_schema\n", "from whylogs.experimental.core.metrics.udf_metric import register_metric_udf\n", "\n", "\n", "# Register the function below as a metric UDF on the `animal` column\n", "@register_metric_udf(col_name=\"animal\")\n", "def has_cool_animal_name(text):\n", "    if len(text) > 4:  # long names are cool\n", "        return 1\n", "    else:\n", "        return 0\n", "\n", "# Build a schema that includes the registered UDFs\n", "custom_schema = udf_schema()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Profiling with whylogs + UDFs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To profile your data, pass the UDF schema defined earlier to whylogs' `log` call. This attaches a feature named `animal.has_cool_animal_name`, which you can then see in WhyLabs."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import whylogs as why\n", "\n", "why.init()\n", "\n", "results = why.log(df, name=\"udf_demo\", schema=custom_schema)\n", "results.view().to_pandas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going Further with UDFs\n", "Unlike metric UDFs, **dataset UDFs** can take multiple columns as input. A dataset UDF creates a new column in your pandas DataFrame, which is then profiled along with your inputs." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from whylogs.experimental.core.udf_schema import register_dataset_udf\n", "import pandas as pd\n", "\n", "@register_dataset_udf([\"legs\", \"weight\"])\n", "def weight_per_leg(data: pd.DataFrame) -> pd.Series:\n", "    return data[\"weight\"] / data[\"legs\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "custom_schema2 = udf_schema()\n", "results = why.log(df, schema=custom_schema2)\n", "results.view().to_pandas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more details on the different kinds of UDFs (for example, to calculate a score based on multiple columns), see this example:\n", "* https://github.com/whylabs/whylogs/blob/mainline/python/examples/experimental/whylogs_UDF_examples.ipynb" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "5dd5901cadfd4b29c2aaf95ecd29c0c3b10829ad94dcfe59437dbee391154aea" } } }, "nbformat": 4, "nbformat_minor": 2 }