{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Writing profiles\n", "\n", "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/1.0.x/python/examples/integrations/Writing_Profiles.ipynb)\n", "\n", "Hello there! If you've come to this tutorial, perhaps you are wondering what you can do after you have generated your first (or maybe not your first) profile. Well, a good practice is to store these profiles as *lightweight* files, which is one of the cool features `whylogs` brings to the table.\n", "\n", "Here we will look at different flavors of writing, so you can pick the one that meets your current needs. Shall we?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating simple profiles\n", "\n", "To get started, let's take a very simple example dataset and profile it." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "data = {\n", "    \"col_1\": [1.0, 2.2, 0.1, 1.2],\n", "    \"col_2\": [\"some\", \"text\", \"column\", \"example\"],\n", "    \"col_3\": [4, 2, 3, 5]\n", "}\n", "\n", "df = pd.DataFrame(data)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | col_1 | col_2 | col_3 |
---|---|---|---|
0 | 1.0 | some | 4 |
1 | 2.2 | text | 2 |
2 | 0.1 | column | 3 |
3 | 1.2 | example | 5 | \n", "
\n", "column | counts/n | counts/null | types/integral | types/fractional | types/boolean | types/string | types/object | cardinality/est | cardinality/upper_1 | cardinality/lower_1 | ... | distribution/n | distribution/max | distribution/min | distribution/q_10 | distribution/q_25 | distribution/median | distribution/q_75 | distribution/q_90 | ints/max | ints/min |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
col_2 | 4 | 0 | 0 | 0 | 0 | 4 | 0 | 4.0 | 4.0002 | 4.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
col_1 | 4 | 0 | 0 | 4 | 0 | 0 | 0 | 4.0 | 4.0002 | 4.0 | ... | 4.0 | 2.2 | 0.1 | 0.1 | 1.0 | 1.2 | 2.2 | 2.2 | NaN | NaN |
col_3 | 4 | 0 | 4 | 0 | 0 | 0 | 0 | 4.0 | 4.0002 | 4.0 | ... | 4.0 | 5.0 | 2.0 | 2.0 | 3.0 | 4.0 | 5.0 | 5.0 | 5.0 | 2.0 | \n", "
3 rows × 24 columns
\n", "