{ "cells": [ { "cell_type": "markdown", "id": "587f4596", "metadata": {}, "source": [ "# Plan house improvements using causal analysis" ] }, { "cell_type": "markdown", "id": "df6550b0", "metadata": {}, "source": [ "This notebook demonstrates the use of the Responsible AI Toolbox to make renovation decisions from historic apartments pricing data. It walks through the API calls necessary to create a widget with causal inferencing insights, then guides a visual analysis of the data." ] }, { "cell_type": "markdown", "id": "b20f25c3", "metadata": {}, "source": [ "## Launch Responsible AI Toolbox" ] }, { "cell_type": "markdown", "id": "a122e129", "metadata": {}, "source": [ "The following section examines the code necessary to create the dataset. It then generates insights using the `responsibleai` API that can be visually analyzed." ] }, { "cell_type": "code", "execution_count": null, "id": "ebae7586", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "import zipfile" ] }, { "cell_type": "markdown", "id": "927997ce", "metadata": {}, "source": [ "First, load the apartment dataset and specify the different types of features. Then, clean it and put it into a DataFrame with named columns. After loading and cleaning the data, split the datapoints into training and test sets. Assemble separate datasets for the full sample and the test data." ] }, { "cell_type": "code", "execution_count": null, "id": "99f4447e", "metadata": {}, "outputs": [], "source": [ "from raiutils.dataset import fetch_dataset\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.impute import SimpleImputer\n", "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n", "from sklearn.compose import ColumnTransformer\n", "\n", "def split_label(dataset, target_feature):\n", " X = dataset.drop([target_feature], axis=1)\n", " y = dataset[[target_feature]]\n", " return X, y\n", "\n", "def clean_data(X, y, target_feature):\n", " features = X.columns.values.tolist()\n", " classes = y[target_feature].unique().tolist()\n", " pipe_cfg = {\n", " 'num_cols': X.dtypes[X.dtypes == 'int64'].index.values.tolist(),\n", " 'cat_cols': X.dtypes[X.dtypes == 'object'].index.values.tolist(),\n", " }\n", " num_pipe = Pipeline([\n", " ('num_imputer', SimpleImputer(strategy='median')),\n", " ('num_scaler', StandardScaler())\n", " ])\n", " cat_pipe = Pipeline([\n", " ('cat_imputer', SimpleImputer(strategy='constant', fill_value='?')),\n", " ('cat_encoder', OneHotEncoder(handle_unknown='ignore', sparse=False))\n", " ])\n", " feat_pipe = ColumnTransformer([\n", " ('num_pipe', num_pipe, pipe_cfg['num_cols']),\n", " ('cat_pipe', cat_pipe, pipe_cfg['cat_cols'])\n", " ])\n", " X = feat_pipe.fit_transform(X)\n", " print(pipe_cfg['cat_cols'])\n", " return X, feat_pipe, features, classes\n", "\n", "target_feature = 'SalePriceK'\n", "categorical_features = []\n", "\n", "outdirname = 'responsibleai.12.28.21'\n", "zipfilename = outdirname + '.zip'\n", "\n", "fetch_dataset('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)\n", "\n", "with zipfile.ZipFile(zipfilename, 'r') as unzip:\n", " unzip.extractall('.')\n", "\n", "all_data = pd.read_csv('apartments-train.csv')\n", "all_data = all_data.drop(['Sold_HigherThan_Median','SalePrice'], axis=1)\n", "X, y = split_label(all_data, target_feature)\n", "X_train_original, X_test_original, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=7)\n", "\n", "X_train, feat_pipe, features, classes = clean_data(X_train_original, y_train, target_feature)\n", "y_train = y_train[target_feature].to_numpy()\n", "\n", "X_test = feat_pipe.transform(X_test_original)\n", "y_test = y_test[target_feature].to_numpy()\n", "\n", "train_data = X_train_original.copy()\n", "train_data[target_feature] = y_train\n", "\n", "test_data = X_test_original.copy()\n", "test_data[target_feature] = y_test" ] }, { "cell_type": "markdown", "id": "69e3860a", "metadata": {}, "source": [ "### Creat Data Insights" ] }, { "cell_type": "code", "execution_count": null, "id": "516b2fa9", "metadata": {}, "outputs": [], "source": [ "from raiwidgets import ResponsibleAIDashboard\n", "from responsibleai import RAIInsights" ] }, { "cell_type": "markdown", "id": "7c8b0634", "metadata": {}, "source": [ "To use Responsible AI Dashboard, initialize a RAIInsights object upon which different components can be loaded.\n", "\n", "RAIInsights accepts the model, the full dataset, the test dataset, the target feature string, the task type string, and a list of strings of categorical feature names as its arguments.\n", "\n", "Here you can pass None as the model object as we only require historic data to make causal decisions." ] }, { "cell_type": "code", "execution_count": null, "id": "c941835b", "metadata": {}, "outputs": [], "source": [ "rai_insights = RAIInsights(None, train_data, test_data, target_feature, 'regression',\n", " categorical_features=categorical_features, \n", " classes=['Less than median', 'More than median'])" ] }, { "cell_type": "markdown", "id": "ab1068ee", "metadata": {}, "source": [ "Add the components of the toolbox that are focused on decision-making." ] }, { "cell_type": "code", "execution_count": null, "id": "b8587d53", "metadata": {}, "outputs": [], "source": [ "# Queue Responsible AI insights with causal insights\n", "rai_insights.causal.add(treatment_features=['OverallCond', 'OverallQual', 'Fireplaces', 'GarageCars', 'ScreenPorch'])" ] }, { "cell_type": "markdown", "id": "ec82a856", "metadata": {}, "source": [ "Once all the desired components have been loaded, compute insights on the test set." ] }, { "cell_type": "code", "execution_count": null, "id": "127ec383", "metadata": {}, "outputs": [], "source": [ "# Compute insights\n", "rai_insights.compute()" ] }, { "cell_type": "markdown", "id": "83b5112c", "metadata": {}, "source": [ "Finally, visualize and explore the model insights. Use the resulting widget or follow the link to view this in a new tab." ] }, { "cell_type": "code", "execution_count": null, "id": "a73ad853", "metadata": {}, "outputs": [], "source": [ "ResponsibleAIDashboard(rai_insights)" ] }, { "cell_type": "markdown", "id": "3a6d610b", "metadata": {}, "source": [ "See this [developer blog](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/responsible-ai-dashboard-a-one-stop-shop-for-operationalizing/ba-p/3030944) (Decision Making Flow section) to learn more about this use case and how to use the dashboard to debug your housing price prediction model." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 5 }