{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "587f4596",
   "metadata": {},
   "source": [
    "# Plan house improvements using causal analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df6550b0",
   "metadata": {},
   "source": [
    "This notebook demonstrates the use of the Responsible AI Toolbox to make renovation decisions from historic apartments pricing data. It walks through the API calls necessary to create a widget with causal inferencing insights, then guides a visual analysis of the data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b20f25c3",
   "metadata": {},
   "source": [
    "## Launch Responsible AI Toolbox"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a122e129",
   "metadata": {},
   "source": [
    "The following section examines the code necessary to create the dataset. It then generates insights using the `responsibleai` API that can be visually analyzed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ebae7586",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from sklearn.model_selection import train_test_split\n",
    "import zipfile"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "927997ce",
   "metadata": {},
   "source": [
    "First, load the apartment dataset and specify the different types of features. Then, clean it and put it into a DataFrame with named columns. After loading and cleaning the data, split the datapoints into training and test sets. Assemble separate datasets for the full sample and the test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99f4447e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from raiutils.dataset import fetch_dataset\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
    "from sklearn.compose import ColumnTransformer\n",
    "\n",
    "def split_label(dataset, target_feature):\n",
    "    X = dataset.drop([target_feature], axis=1)\n",
    "    y = dataset[[target_feature]]\n",
    "    return X, y\n",
    "\n",
    "def clean_data(X, y, target_feature):\n",
    "    features = X.columns.values.tolist()\n",
    "    classes = y[target_feature].unique().tolist()\n",
    "    pipe_cfg = {\n",
    "        'num_cols': X.dtypes[X.dtypes == 'int64'].index.values.tolist(),\n",
    "        'cat_cols': X.dtypes[X.dtypes == 'object'].index.values.tolist(),\n",
    "    }\n",
    "    num_pipe = Pipeline([\n",
    "        ('num_imputer', SimpleImputer(strategy='median')),\n",
    "        ('num_scaler', StandardScaler())\n",
    "    ])\n",
    "    cat_pipe = Pipeline([\n",
    "        ('cat_imputer', SimpleImputer(strategy='constant', fill_value='?')),\n",
    "        ('cat_encoder', OneHotEncoder(handle_unknown='ignore', sparse=False))\n",
    "    ])\n",
    "    feat_pipe = ColumnTransformer([\n",
    "        ('num_pipe', num_pipe, pipe_cfg['num_cols']),\n",
    "        ('cat_pipe', cat_pipe, pipe_cfg['cat_cols'])\n",
    "    ])\n",
    "    X = feat_pipe.fit_transform(X)\n",
    "    print(pipe_cfg['cat_cols'])\n",
    "    return X, feat_pipe, features, classes\n",
    "\n",
    "target_feature = 'SalePriceK'\n",
    "categorical_features = []\n",
    "\n",
    "outdirname = 'responsibleai.12.28.21'\n",
    "zipfilename = outdirname + '.zip'\n",
    "\n",
    "fetch_dataset('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)\n",
    "\n",
    "with zipfile.ZipFile(zipfilename, 'r') as unzip:\n",
    "    unzip.extractall('.')\n",
    "\n",
    "all_data = pd.read_csv('apartments-train.csv')\n",
    "all_data = all_data.drop(['Sold_HigherThan_Median','SalePrice'], axis=1)\n",
    "X, y = split_label(all_data, target_feature)\n",
    "X_train_original, X_test_original, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=7)\n",
    "\n",
    "X_train, feat_pipe, features, classes = clean_data(X_train_original, y_train, target_feature)\n",
    "y_train = y_train[target_feature].to_numpy()\n",
    "\n",
    "X_test = feat_pipe.transform(X_test_original)\n",
    "y_test = y_test[target_feature].to_numpy()\n",
    "\n",
    "train_data = X_train_original.copy()\n",
    "train_data[target_feature] = y_train\n",
    "\n",
    "test_data = X_test_original.copy()\n",
    "test_data[target_feature] = y_test"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69e3860a",
   "metadata": {},
   "source": [
    "### Creat Data Insights"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "516b2fa9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from raiwidgets import ResponsibleAIDashboard\n",
    "from responsibleai import RAIInsights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c8b0634",
   "metadata": {},
   "source": [
    "To use Responsible AI Dashboard, initialize a RAIInsights object upon which different components can be loaded.\n",
    "\n",
    "RAIInsights accepts the model, the full dataset, the test dataset, the target feature string, the task type string, and a list of strings of categorical feature names as its arguments.\n",
    "\n",
    "Here you can pass None as the model object as we only require historic data to make causal decisions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c941835b",
   "metadata": {},
   "outputs": [],
   "source": [
    "rai_insights = RAIInsights(None, train_data, test_data, target_feature, 'regression',\n",
    "                             categorical_features=categorical_features, \n",
    "                             classes=['Less than median', 'More than median'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab1068ee",
   "metadata": {},
   "source": [
    "Add the components of the toolbox that are focused on decision-making."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8587d53",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Queue Responsible AI insights with causal insights\n",
    "rai_insights.causal.add(treatment_features=['OverallCond', 'OverallQual', 'Fireplaces', 'GarageCars', 'ScreenPorch'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec82a856",
   "metadata": {},
   "source": [
    "Once all the desired components have been loaded, compute insights on the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "127ec383",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute insights\n",
    "rai_insights.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83b5112c",
   "metadata": {},
   "source": [
    "Finally, visualize and explore the model insights. Use the resulting widget or follow the link to view this in a new tab."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a73ad853",
   "metadata": {},
   "outputs": [],
   "source": [
    "ResponsibleAIDashboard(rai_insights)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a6d610b",
   "metadata": {},
   "source": [
    "See this [developer blog](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/responsible-ai-dashboard-a-one-stop-shop-for-operationalizing/ba-p/3030944) (Decision Making Flow section) to learn more about this use case and how to use the dashboard to debug your housing price prediction model."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}