{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Started" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/1.0.x/python/examples/basic/Getting_Started.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "whylogs provides a standard to log any kind of data.\n", "\n", "This example shows how to log data with whylogs, generating statistical summaries called *profiles*. These profiles can be used in a number of ways, like:\n", "\n", "* Data Visualization\n", "* Data Validation\n", "* Tracking changes in your datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we'll explore the basics of logging data with whylogs:\n", "- Installing whylogs\n", "- Profiling data\n", "- Interacting with the profile\n", "- Writing/Reading profiles to/from disk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing whylogs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "whylogs is made available as a Python package. You can get the latest version from PyPI with `pip install whylogs`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "!pip install -q whylogs --pre" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading a Pandas DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before showing how we can log data, we first need the data itself. 
Let's create a simple Pandas DataFrame:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "data = {\n", " \"animal\": [\"cat\", \"hawk\", \"snake\", \"cat\"],\n", " \"legs\": [4, 2, 0, 4],\n", " \"weight\": [4.3, 1.8, 1.3, 4.1],\n", "}\n", "\n", "df = pd.DataFrame(data)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Profiling with whylogs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To obtain a profile of your data, you can simply use whylogs' `log` call and retrieve the profile from the result with `profile()`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import whylogs as why\n", "\n", "results = why.log(df)\n", "profile = results.profile()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyzing Profiles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you're done logging the data, you can generate a `Profile View` and inspect it in a Pandas DataFrame format:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| column | counts/n | counts/null | types/integral | types/fractional | types/boolean | types/string | types/object | cardinality/est | cardinality/upper_1 | cardinality/lower_1 | ... | distribution/n | distribution/max | distribution/min | distribution/q_10 | distribution/q_25 | distribution/median | distribution/q_75 | distribution/q_90 | ints/max | ints/min |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| animal | 8 | 0 | 0 | 0 | 0 | 8 | 0 | 6.0 | 6.00030 | 6.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| weight | 8 | 0 | 0 | 8 | 0 | 0 | 0 | 7.0 | 7.00035 | 7.0 | ... | 8.0 | 30.1 | 1.3 | 1.3 | 4.1 | 4.3 | 14.3 | 30.1 | NaN | NaN |
| legs | 8 | 0 | 8 | 0 | 0 | 0 | 0 | 3.0 | 3.00015 | 3.0 | ... | 8.0 | 4.0 | 0.0 | 0.0 | 2.0 | 4.0 | 4.0 | 4.0 | 4.0 | 0.0 |
3 rows × 24 columns
\n", "