{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Converting Profiles from whylogs v0 to v1" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "If you have profiles generated with whylogs v0 (Python or Java) and want to work with them in whylogs v1, you can use the provided converters.\n", "\n", "Once you convert the profiles to v1, you can use them just as you would any other v1 whylogs profile.\n", "\n", "This short example is divided into two parts:\n", "\n", "- Download a sample v0 profile and write it to disk\n", "- Read the v0 profile and convert it to a v1 Profile View\n", "\n", "Let's get to it!" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Installing whylogs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note: you may need to restart the kernel to use updated packages.\n", "%pip install whylogs" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Downloading v0 profile\n", "\n", "First, we need a sample v0 profile to demonstrate the conversion.\n", "\n", "We'll download one from a public S3 bucket and write it to disk." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Download a sample v0 profile from a URL and write it to disk\n", "from urllib.request import urlopen\n", "url = \"https://whylabs-public.s3.us-west-2.amazonaws.com/whylogs_examples/dataset_profile_v0.bin\"\n", "profile_name = \"dataset_profile_v0.bin\"\n", "\n", "# Download from URL\n", "with urlopen(url) as file:\n", "    content = file.read()\n", "\n", "# Save to file\n", "with open(profile_name, 'wb') as download:\n", "    download.write(content)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Convert serialized v0 profile to v1 profile view\n", "\n", "The converter reads the serialized v0 profile and converts it to a v1 profile view.\n", "\n", "Because the result is a profile view, you can use it for tasks such as visualization, analysis, and merging. However, you can't use it to continue logging new data.\n", "\n", "To perform the conversion, we'll use the `read_v0_to_view` utility from `whylogs.migration.converters`.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For this example to work you need to have a recent version of whylogs (tested with 1.1.22), you are currently running: whylogs==1.1.19\n", "Reading v0 file from disk: dataset_profile_v0.bin\n", "Converted: dataset_profile_v0.bin to a v1 DatasetProfileView\n" ] }, { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr><th></th><th>cardinality/est</th><th>cardinality/lower_1</th><th>cardinality/upper_1</th><th>counts/inf</th><th>counts/n</th><th>counts/nan</th><th>counts/null</th><th>distribution/max</th><th>distribution/mean</th><th>distribution/median</th><th>...</th><th>distribution/stddev</th><th>frequent_items/frequent_strings</th><th>ints/max</th><th>ints/min</th><th>type</th><th>types/boolean</th><th>types/fractional</th><th>types/integral</th><th>types/object</th><th>types/string</th></tr>\n",
 "    <tr><th>column</th><th colspan=\"21\"></th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>animal</th><td>3.0</td><td>3.0</td><td>3.00015</td><td>0</td><td>4</td><td>0</td><td>0</td><td>NaN</td><td>0.0</td><td>NaN</td><td>...</td><td>0.000000</td><td>[FrequentItem(value='cat', est=2, upper=2, low...</td><td>0</td><td>0</td><td>SummaryType.COLUMN</td><td>0</td><td>0</td><td>0</td><td>0</td><td>4</td></tr>\n",
 "    <tr><th>legs</th><td>3.0</td><td>3.0</td><td>3.00015</td><td>0</td><td>4</td><td>0</td><td>0</td><td>4.0</td><td>2.5</td><td>4.0</td><td>...</td><td>1.914854</td><td>[FrequentItem(value='4', est=2, upper=2, lower...</td><td>0</td><td>0</td><td>SummaryType.COLUMN</td><td>0</td><td>0</td><td>4</td><td>0</td><td>0</td></tr>\n",
 "    <tr><th>weight</th><td>3.0</td><td>3.0</td><td>3.00015</td><td>0</td><td>4</td><td>0</td><td>0</td><td>4.3</td><td>3.4</td><td>4.1</td><td>...</td><td>1.389244</td><td>[FrequentItem(value='1.8', est=1, upper=1, low...</td><td>0</td><td>0</td><td>SummaryType.COLUMN</td><td>0</td><td>3</td><td>0</td><td>0</td><td>0</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "<p>3 rows × 30 columns</p>\n",
 "</div>
" ], "text/plain": [ " cardinality/est cardinality/lower_1 cardinality/upper_1 counts/inf \\\n", "column \n", "animal 3.0 3.0 3.00015 0 \n", "legs 3.0 3.0 3.00015 0 \n", "weight 3.0 3.0 3.00015 0 \n", "\n", " counts/n counts/nan counts/null distribution/max \\\n", "column \n", "animal 4 0 0 NaN \n", "legs 4 0 0 4.0 \n", "weight 4 0 0 4.3 \n", "\n", " distribution/mean distribution/median ... distribution/stddev \\\n", "column ... \n", "animal 0.0 NaN ... 0.000000 \n", "legs 2.5 4.0 ... 1.914854 \n", "weight 3.4 4.1 ... 1.389244 \n", "\n", " frequent_items/frequent_strings ints/max ints/min \\\n", "column \n", "animal [FrequentItem(value='cat', est=2, upper=2, low... 0 0 \n", "legs [FrequentItem(value='4', est=2, upper=2, lower... 0 0 \n", "weight [FrequentItem(value='1.8', est=1, upper=1, low... 0 0 \n", "\n", " type types/boolean types/fractional types/integral \\\n", "column \n", "animal SummaryType.COLUMN 0 0 0 \n", "legs SummaryType.COLUMN 0 0 4 \n", "weight SummaryType.COLUMN 0 3 0 \n", "\n", " types/object types/string \n", "column \n", "animal 0 4 \n", "legs 0 0 \n", "weight 0 0 \n", "\n", "[3 rows x 30 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from whylogs.migration.converters import (\n", " read_v0_to_view\n", ")\n", "from whylogs.core import DatasetProfileView\n", "\n", "import whylogs as why\n", "\n", "print(f\"For this example to work you need to have a recent version of whylogs (tested with 1.1.22), you are currently running: whylogs=={why.__version__}\")\n", "\n", "profile_file_path_v0 = \"dataset_profile_v0.bin\"\n", "\n", "print(f\"Reading v0 file from disk: {profile_file_path_v0}\")\n", "view: DatasetProfileView = read_v0_to_view(profile_file_path_v0)\n", "print(f\"Converted: {profile_file_path_v0} to a v1 DatasetProfileView\")\n", "view.to_pandas()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "And there you have it!\n", "\n", "You can now use the 
profile for tasks such as:\n", "\n", "- [Visualization](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/basic/Notebook_Profile_Visualizer.ipynb)\n", "- [Data Validation](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/basic/Constraints_Suite.ipynb)\n", "- [Merging](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/basic/Merging_Profiles.ipynb)\n", "- [Writing to WhyLabs](https://nbviewer.org/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_to_WhyLabs.ipynb)\n", "- etc.\n", "\n", "Head to our [examples page](https://github.com/whylabs/whylogs/tree/mainline/python/examples) to see more!" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "5dd5901cadfd4b29c2aaf95ecd29c0c3b10829ad94dcfe59437dbee391154aea" } } }, "nbformat": 4, "nbformat_minor": 2 }