{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
\n", ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Feature_Weights_to_WhyLabs)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Feature_Weights_to_WhyLabs) to leverage the power of whylogs and WhyLabs together!*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Feature Importance with WhyLabs" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_Feature_Weights_to_WhyLabs.ipynb)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Feature Importance consists of a class of techniques that indicates the relative importance of each feature for a given prediction model. This is done by calculating and assigning a score, or weight, to each feature. Tracking the feature weights of a model can be useful for different reasons, such as:\n", "\n", "- Data Insights: By determining which features are more or less important to your task, a domain expert can make informed decisions, such as gathering more data or adding/removing different features.\n", "- Model Interpretability: Feature weights can provide insights into which features your model considers more or less relevant, improving the model's interpretability.\n", "- Model Upgrading: Feature weights can be the basis of further feature selection processes, removing unnecessary features, improving the maintainability and performance of your model.\n", "\n", "WhyLabs supports feature weight tracking and visualization at your monitoring dashboard. The most straightforward way to send and receive feature weights to WhyLabs is through whylogs, and that's exactly what we'll show in this example.\n", "\n", "In this example, we will:\n", "\n", "- Calculate simple feature weights with Linear Regression\n", "- Send feature weights to your dashboard at WhyLabs\n", "- Get latest version of feature weights from your dashboard" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Installing dependencies" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "First, let's make sure you have the required packages installed:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: whylogs in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (1.3.1)\n", "Requirement already satisfied: platformdirs<4.0.0,>=3.5.0 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (3.10.0)\n", "Requirement already satisfied: protobuf>=3.19.4 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (4.24.2)\n", "Requirement already satisfied: requests<3.0,>=2.27 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (2.31.0)\n", "Requirement already satisfied: types-requests<3.0.0.0,>=2.30.0.0 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (2.31.0.2)\n", "Requirement already satisfied: typing-extensions>=3.10 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (4.7.1)\n", "Requirement already satisfied: whylabs-client<0.6.0,>=0.5.5 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (0.5.5)\n", "Requirement already satisfied: whylogs-sketching>=3.4.1.dev3 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from whylogs) (3.4.1.dev3)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from requests<3.0,>=2.27->whylogs) (3.2.0)\n", "Requirement already satisfied: idna<4,>=2.5 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from requests<3.0,>=2.27->whylogs) (3.4)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from requests<3.0,>=2.27->whylogs) (1.26.16)\n", "Requirement already satisfied: certifi>=2017.4.17 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from requests<3.0,>=2.27->whylogs) (2023.7.22)\n", "Requirement already satisfied: types-urllib3 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from types-requests<3.0.0.0,>=2.30.0.0->whylogs) (1.26.25.14)\n", "Requirement already satisfied: python-dateutil in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from whylabs-client<0.6.0,>=0.5.5->whylogs) (2.8.2)\n", "Requirement already satisfied: six>=1.5 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from python-dateutil->whylabs-client<0.6.0,>=0.5.5->whylogs) (1.16.0)\n", "\u001b[33mDEPRECATION: feast 0.22.4 has a non-standard dependency specifier googleapis-common-protos<2,>=1.52.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of feast or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mDEPRECATION: feast 0.22.4 has a non-standard dependency specifier PyYAML<7,>=5.4.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of feast or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mDEPRECATION: feast 0.22.4 has a non-standard dependency specifier dask<2022.02.0,>=2021.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of feast or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n", "Collecting scikit-learn==1.0.2\n", " Downloading scikit_learn-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.4 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m26.4/26.4 MB\u001b[0m \u001b[31m3.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n", "\u001b[?25hRequirement already satisfied: numpy>=1.14.6 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from scikit-learn==1.0.2) (1.25.2)\n", "Requirement already satisfied: scipy>=1.1.0 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from scikit-learn==1.0.2) (1.9.3)\n", "Requirement already satisfied: joblib>=0.11 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from scikit-learn==1.0.2) (1.3.2)\n", "Requirement already satisfied: threadpoolctl>=2.0.0 in /home/anthony/workspace/whylogs/python/.venv/lib/python3.9/site-packages (from scikit-learn==1.0.2) (3.2.0)\n", "\u001b[33mDEPRECATION: feast 0.22.4 has a non-standard dependency specifier googleapis-common-protos<2,>=1.52.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of feast or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mDEPRECATION: feast 0.22.4 has a non-standard dependency specifier PyYAML<7,>=5.4.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of feast or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mDEPRECATION: feast 0.22.4 has a non-standard dependency specifier dask<2022.02.0,>=2021.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of feast or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", "\u001b[0mInstalling collected packages: scikit-learn\n", " Attempting uninstall: scikit-learn\n", " Found existing installation: scikit-learn 1.3.0\n", " Uninstalling scikit-learn-1.3.0:\n", " Successfully uninstalled scikit-learn-1.3.0\n", "Successfully installed scikit-learn-1.0.2\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "# Note: you may need to restart the kernel to use updated packages.\n", "%pip install whylogs\n", "%pip install scikit-learn==1.0.2" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating Feature Weights\n", "\n", "There are several different ways of calculating feature importance. One such way is calculating the model coefficients of a linear regression model, which can be interpreted as a feature importance score. That's the method we'll use for this demonstration. We'll create 5 informative features and 5 random ones.\n", "\n", "The code below was based on the article [How to Calculate Feature Importance With Python](https://machinelearningmastery.com/calculate-feature-importance-with-python/), by Jason Brownlee. I definitely recommend it if you are interested in other ways of calculating feature importance." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Feature_0': 3.4778530780712388e-15,\n", " 'Feature_1': 12.444827855389764,\n", " 'Feature_2': -3.108624468950438e-14,\n", " 'Feature_3': -1.9095836023552692e-14,\n", " 'Feature_4': 93.32225450776932,\n", " 'Feature_5': 86.50810998606799,\n", " 'Feature_6': 26.74606669803453,\n", " 'Feature_7': 3.285346398262155,\n", " 'Feature_8': -2.531308496145357e-14,\n", " 'Feature_9': 1.9539925233402755e-14}" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.datasets import make_regression\n", "from sklearn.linear_model import LinearRegression\n", "# define dataset\n", "X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)\n", "# define the model\n", "model = LinearRegression()\n", "# fit the model\n", "model.fit(X, y)\n", "# get importance\n", "importance = model.coef_\n", "# summarize feature importance\n", "\n", "weights = {\"Feature_{}\".format(key): value for (key, value) in enumerate(importance)}\n", "\n", "weights" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We end up with a dictionary with the features as keys and the respective scores as values. This is an example of *global* feature importance, as opposed to *local* feature importance, which would show the contribution of features for a specific prediction. Currently, WhyLabs and whylogs support only global feature importance. Therefore, this is the structure we'll use to later send the Feature Weights to WhyLabs." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## βœ”οΈ Setting the Environment Variables\n", "\n", "In order to send our profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.\n", "\n", "We will need three pieces of information:\n", "\n", "- API token\n", "- Organization ID\n", "- Dataset ID (or model-id)\n", "\n", "First, grab a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Feature_Weights_to_WhyLabs) if you haven't already. You can follow along with the examples if you wish, but if you’re interested in only following this demonstration, you can go ahead and skip the quick start instructions.\n", "\n", "After that, you’ll be prompted to create an API token. Once you create it, copy and store it locally. The second important information here is your org ID. Take note of it as well. After you get your API Token and Org ID, you can go to https://hub.whylabsapp.com/models to see your projects dashboard. You can create a new project and take note of it's ID (if it's a model project it will look like `model-xxxx`)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Initializing session with config /home/anthony/.config/whylogs/config.ini\n", "\n", "βœ… Using session type: WHYLABS\n", " β€· org id: org-JpsdM6\n", " β€· api key: l70gARzBVZ\n", " β€· default dataset: model-66\n", "\n", "In production, you should pass the api key as an environment variable WHYLABS_API_KEY, the org id as WHYLABS_DEFAULT_ORG_ID, and the default dataset id as WHYLABS_DEFAULT_DATASET_ID.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import whylogs as why\n", "\n", "why.init()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Writing Feature Weights to WhyLabs" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Once the feature weights are calculated, sending them to WhyLabs is very simple.\n", "\n", "We first need to wrap the dictionary into a `FeatureWeights` object:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from whylogs.core.feature_weights import FeatureWeights\n", "feature_weights = FeatureWeights(weights)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "And then use the `WhyLabsWriter` to write it, provided your environment variables are properly set: " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True, '200')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from whylogs.api.writer.whylabs import WhyLabsWriter\n", "result = feature_weights.writer(\"whylabs\").write()\n", "\n", "result" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Another way of doing the exact same thing is by instantiating the Writer itself, and then calling `write`:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True, '200')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "WhyLabsWriter().write(feature_weights)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Feature Weights from WhyLabs\n", "\n", "You can also get the feature weights from WhyLabs with `get_feature_weights()`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Feature_4': 93.32225450776932, 'Feature_5': 86.50810998606799, 'Feature_6': 26.74606669803453, 'Feature_1': 12.444827855389764, 'Feature_7': 3.285346398262155, 'Feature_2': -3.108624468950438e-14, 'Feature_8': -2.531308496145357e-14, 'Feature_9': 1.9539925233402755e-14, 'Feature_3': -1.9095836023552692e-14, 'Feature_0': 3.4778530780712388e-15}\n", "{'author': 'system', 'updated_timestamp': 1694207704547, 'version': 2}\n" ] } ], "source": [ "result = WhyLabsWriter().get_feature_weights()\n", "\n", "print(result.weights)\n", "print(result.metadata)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the result will contain the set of weights in `result.weights`, along with additional metadata in `result.metadata`.\n", "\n", "If you write multiple set of weights to the same model at WhyLabs, the content will be overwritten. When using `get_feature_weights()`, you'll get the latest version, that is, the last set of weights you sent. You're able to see which version it is in the metadata, along with the timestamp of creation." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Reference" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "- [Understanding Feature Importance and How to Implement it in Python](https://towardsdatascience.com/understanding-feature-importance-and-how-to-implement-it-in-python-ff0287b20285)\n", "- [How to Calculate Feature Importance With Python](https://machinelearningmastery.com/calculate-feature-importance-with-python/)\n", "- [Feature importance β€” what’s in a name?](https://medium.com/bigdatarepublic/feature-importance-whats-in-a-name-79532e59eea3)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.10 ('.venv': venv)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "5dd5901cadfd4b29c2aaf95ecd29c0c3b10829ad94dcfe59437dbee391154aea" } } }, "nbformat": 4, "nbformat_minor": 2 }