{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
\n",
">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Constraints_Suite)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Constraints_Suite) to leverage the power of whylogs and WhyLabs together!*"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Simple Constraints - Examples and Usage"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/basic/Constraints_Suite.ipynb)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"> This is a `whylogs v1` example. For the analog feature in `v0`, please refer to [this example](https://github.com/whylabs/whylogs/blob/maintenance/0.7.x/examples/Constraints_Suite.ipynb)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, we'll show how to define a number of simple constraints and examples on how to use them. For the basics on how to build your own set of constraints, see the example - [Data Validation with Metric Constraints](https://whylogs.readthedocs.io/en/stable/examples/advanced/Metric_Constraints.html).\n",
"\n",
"The constraints are listed according to the metric namespace used when defining them. For each category, we will create helper functions for simple and popular constraints. Each helper function has a brief explanation in its docstring. After defining the helper functions, we'll show a simple example on how to build the constraints out of the functions and visualize them as a report with the visualization module."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note: The constraints shown here are still experimental and subject to further changes. Stay tuned for upgrades!"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Completeness Constraints\n",
"\n",
"| constraint | parameters | semantic | metric |\n",
"|------------------------------|---------------------|-------------------------------------------------------|--------|\n",
"| no_missing_values | column name | Checks that are no missing values in the column | Counts |\n",
"| null_values_below_number | column name, number | Number of null values must be below given number. | Counts |\n",
"| null_percentage_below_number | column name, number | Percentage of null values must be below given number. | Counts |"
]
},
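{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The completeness constraints in the table above could be applied roughly as follows. This is a minimal sketch rather than a definitive recipe: the toy DataFrame is hypothetical, and the code assumes that the three factories can be imported from `whylogs.core.constraints.factories` with the parameters listed in the table, and that `null_percentage_below_number` takes its threshold as a fraction between 0 and 1.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"import whylogs as why\n",
"from whylogs.core.constraints import ConstraintsBuilder\n",
"from whylogs.core.constraints.factories import (\n",
"    no_missing_values,\n",
"    null_percentage_below_number,\n",
"    null_values_below_number,\n",
")\n",
"\n",
"# Hypothetical toy data: \"weight\" contains one missing value.\n",
"df = pd.DataFrame({\"legs\": [4, 2, 0], \"weight\": [4.3, None, 1.3]})\n",
"profile_view = why.log(df).view()\n",
"\n",
"builder = ConstraintsBuilder(dataset_profile_view=profile_view)\n",
"builder.add_constraint(no_missing_values(column_name=\"legs\"))\n",
"builder.add_constraint(null_values_below_number(column_name=\"weight\", number=2))\n",
"# Assumption: the threshold is a fraction (0.5 means 50%).\n",
"builder.add_constraint(null_percentage_below_number(column_name=\"weight\", number=0.5))\n",
"constraints = builder.build()\n",
"\n",
"# Each report entry describes one constraint and whether it passed.\n",
"report = constraints.generate_constraints_report()\n",
"for entry in report:\n",
"    print(entry)\n",
"```\n",
"\n",
"Running the sketch requires whylogs to be installed, as covered in the installation section of this notebook."
]
},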
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Consistency Constraints\n",
"\n",
"| constraint | parameters | semantic | metric |\n",
"|-----------------------------------|----------------------------|----------------------------------------------------------------------------------------|----------------|\n",
"| greater_than_number | column name | Minimum value of given column must be above defined number. | Distribution |\n",
"| smaller_than_number | column name, number | Maximum value of given column must be below defined number. | Distribution |\n",
"| is_in_range | column name, lower, upper | Checks that all of column's values are in defined range (inclusive). | Distribution |\n",
"| is_non_negative | column name | Checks if a column is non negative. | Distribution |\n",
"| n_most_common_items_in_set | column name, reference set | Checks if the top n most common items appear in the dataset. | Frequent Items |\n",
"| frequent_strings_in_reference_set | column name, reference set | Checks if a set of variables appear in the frequent strings for a string column. | Frequent Items |\n",
"| count_below_number | column name, number | Checks if elements in a column are below given number. | Counts |\n",
"| distinct_number_in_range | column name, lower, upper | Checks if number of distinct categories is between lower and upper values (inclusive). | Cardinality |\n",
"| column_is_nullable_integral | column name | Check if column contains only records of specific datatype. | Types |\n",
"| column_is_nullable_boolean | column name | Check if column contains only records of specific datatype. | Types |\n",
"| column_is_nullable_fractional | column name | Check if column contains only records of specific datatype. | Types |\n",
"| column_is_nullable_object | column name | Check if column contains only records of specific datatype. | Types |\n",
"| column_is_nullable_string | column name | Check if column contains only records of specific datatype. | Types |"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Condition Count Constraints\n",
"\n",
"Please refer to the example [Metric Constraints with Condition Count Metrics](https://github.com/whylabs/whylogs/blob/mainline/python/examples/advanced/Metric_Constraints_with_Condition_Count_Metrics.ipynb) for examples on how to use these constraints.\n",
"\n",
"| constraint | parameters | semantic | metric |\n",
"|------------------------|-------------------------------------|--------------------------------------------------------------------------------------|--------------|\n",
"| condition_meets | column name, condition_name | Fails if condition not met at least once. | Condition Count |\n",
"| condition_never_meets | column name, condition_name | Fails if condition is met at least once | Condition Count |\n",
"| condition_count_below | column name, condition_name, max_count | Fails if condition is met more than max count | Condition Count |\n",
"\n",
"## Statistics Constraints\n",
"\n",
"| constraint | parameters | semantic | metric |\n",
"|------------------------|-------------------------------------|--------------------------------------------------------------------------------------|--------------|\n",
"| mean_between_range | column name, lower, upper | Mean must be between range defined by lower and upper bounds. | Distribution |\n",
"| stddev_between_range | column name, lower, upper | Standard deviarion must be between range defined by lower and upper bounds. | Distribution |\n",
"| quantile_between_range | column name, quantile, lower, upper | Q-th quantile value must be withing the range defined by lower and upper boundaries. | Distribution |"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents\n",
"\n",
"- [Installing and Importing Modules](#pre)\n",
"- [Distribution Metrics Constraints](#distribution)\n",
"- [Frequent Items/Frequent Strings Metrics Constraints](#frequent)\n",
"- [Counters Constraints](#counts)\n",
"- [Cardinality Constraints](#card)\n",
"- [Types Constraints](#types)\n",
"- [Combined Metrics Constraints](#comb)\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installing whylogs and importing modules \n",
"\n",
"If you haven't already, install whylogs:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Note: you may need to restart the kernel to use updated packages.\n",
"%pip install 'whylogs[viz]'"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, let's import the helper functions needed to define the constraints:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from whylogs.core.constraints import ConstraintsBuilder\n",
"from whylogs.core.constraints.factories import (\n",
" greater_than_number,\n",
" is_in_range,\n",
" is_non_negative,\n",
" mean_between_range,\n",
" smaller_than_number,\n",
" stddev_between_range,\n",
" quantile_between_range\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Examples - Distribution Metrics Constraints"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import whylogs as why\n",
"import pandas as pd\n",
"data = {\n",
" \"animal\": [\"cat\", \"hawk\", \"snake\", \"cat\", \"mosquito\"],\n",
" \"legs\": [4, 2, 0, 4, 6],\n",
" \"weight\": [4.3, 1.8, 1.3, 4.1, 5.5e-6],\n",
"}\n",
"\n",
"results = why.log(pd.DataFrame(data))\n",
"profile_view = results.view()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"