{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
},
"id": "BeXzH8r34sbM"
},
"source": [
### 🚩 *Create">
">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*\n",
">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Profiles)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Profiles) to leverage the power of whylogs and WhyLabs together!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
},
"id": "lOMfJZre4sbQ"
},
"source": [
"# Writing profiles - Local/S3\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_Profiles.ipynb)\n",
"\n",
"Hello there! If you've come to this tutorial, perhaps you are wondering what can you do after you generated your first (or maybe not the first) profile. Well, a good practice is to store these profiles as *lightweight* files, which is one of the cool features `whylogs` brings to the table.\n",
"\n",
"Here we will check different flavors of writing, and you can check which one of these will meet your current needs. Shall we?"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
},
"id": "EHAazI3n4sbR"
},
"source": [
"## Installing whylogs\n",
"\n",
"Let's first install whylogs, if you don't have it installed already:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"id": "J5VspSS14sbR",
"outputId": "3e544285-0d2b-4572-8e8d-f92015f5547a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: whylogs in /Users/murilomendonca/Documents/repos/whylogs/python/.venv/lib/python3.9/site-packages (1.1.7)\r\n",
"Requirement already satisfied: protobuf>=3.19.4 in /Users/murilomendonca/Documents/repos/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (4.21.6)\r\n",
"Requirement already satisfied: typing-extensions>=3.10 in /Users/murilomendonca/Documents/repos/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (4.3.0)\r\n",
"Requirement already satisfied: whylogs-sketching>=3.4.1.dev3 in /Users/murilomendonca/Documents/repos/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (3.4.1.dev3)\r\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"# Note: you may need to restart the kernel to use updated packages.\n",
"import shutil\n",
"%pip install whylogs"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
},
"id": "Hmn-M9nU4sbT"
},
"source": [
"## Creating simple profiles\n",
"\n",
"In order for us to get started, let's take a very simple example dataset and profile it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"id": "scsGpGsS4sbU"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"data = {\n",
" \"col_1\": [1.0, 2.2, 0.1, 1.2],\n",
" \"col_2\": [\"some\", \"text\", \"column\", \"example\"],\n",
" \"col_3\": [4, 2, 3, 5]\n",
"}\n",
"\n",
"df = pd.DataFrame(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
},
"id": "LAINFxFe4sbU",
"outputId": "37399d7e-70d8-4849-b687-5bac3dea4dd3"
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>col_1</th><th>col_2</th><th>col_3</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1.0</td><td>some</td><td>4</td></tr>\n",
"<tr><th>1</th><td>2.2</td><td>text</td><td>2</td></tr>\n",
"<tr><th>2</th><td>0.1</td><td>column</td><td>3</td></tr>\n",
"<tr><th>3</th><td>1.2</td><td>example</td><td>5</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>cardinality/est</th><th>cardinality/lower_1</th><th>cardinality/upper_1</th><th>counts/n</th><th>counts/null</th><th>distribution/max</th><th>distribution/mean</th><th>distribution/median</th><th>distribution/min</th><th>distribution/n</th><th>...</th><th>distribution/stddev</th><th>type</th><th>types/boolean</th><th>types/fractional</th><th>types/integral</th><th>types/object</th><th>types/string</th><th>frequent_items/frequent_strings</th><th>ints/max</th><th>ints/min</th></tr>\n",
"<tr><th>column</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>col_1</th><td>4.0</td><td>4.0</td><td>4.0002</td><td>4</td><td>0</td><td>2.2</td><td>1.125</td><td>1.2</td><td>0.1</td><td>4</td><td>...</td><td>0.861684</td><td>SummaryType.COLUMN</td><td>0</td><td>4</td><td>0</td><td>0</td><td>0</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>col_2</th><td>4.0</td><td>4.0</td><td>4.0002</td><td>4</td><td>0</td><td>NaN</td><td>0.000</td><td>NaN</td><td>NaN</td><td>0</td><td>...</td><td>0.000000</td><td>SummaryType.COLUMN</td><td>0</td><td>0</td><td>0</td><td>0</td><td>4</td><td>[FrequentItem(value='some', est=1, upper=1, lo...</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>col_3</th><td>4.0</td><td>4.0</td><td>4.0002</td><td>4</td><td>0</td><td>5.0</td><td>3.500</td><td>4.0</td><td>2.0</td><td>4</td><td>...</td><td>1.290994</td><td>SummaryType.COLUMN</td><td>0</td><td>0</td><td>4</td><td>0</td><td>0</td><td>[FrequentItem(value='2', est=1, upper=1, lower...</td><td>5.0</td><td>2.0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>3 rows × 28 columns</p>\n",
"