{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AI Automation for AI Fairness\n",
    "\n",
    "When AI models contribute to high-impact decisions such as whether or\n",
    "not someone gets a loan, we want them to be fair.\n",
    "Unfortunately, in current practice, AI models are often optimized\n",
    "primarily for accuracy, with little consideration for fairness.  This\n",
    "blog post gives a hands-on example for how AI Automation can help build AI\n",
    "models that are both accurate and fair.\n",
    "This blog post is written for data scientists who have some familiarity\n",
    "with Python. No prior knowledge of AI Automation or AI Fairness is\n",
    "required, we will introduce the relevant concepts as we get to them.\n",
    "\n",
    "Bias in data leads to bias in models. AI models are increasingly\n",
    "consulted for consequential decisions about people, in domains\n",
    "including credit loans, hiring and retention, penal justice, medical,\n",
    "and more. Often, the model is trained from past decisions made by\n",
    "humans. If the decisions used for training were discriminatory, then\n",
    "your trained model will be too, unless you are careful. Being careful\n",
    "about bias is something you should do as a data scientist.\n",
    "Fortunately, you do not have to grapple with this issue alone.  You\n",
    "can consult others about ethics. You can also ask yourself how your AI\n",
    "model may affect your (or your institution's) reputation. And\n",
    "ultimately, you must follow applicable laws and regulations.\n",
    "\n",
    "_AI Fairness_ can be measured via several metrics, and you need to\n",
    "select the appropriate metrics based on the circumstances.  For\n",
    "illustration purposes, this blog post uses one particular fairness\n",
    "metric called _disparate impact_. Disparate impact is defined as the\n",
    "ratio of the rate of favorable outcome for the unprivileged group to\n",
    "that of the privileged group. To make this definition more concrete,\n",
    "consider the case where a favorable outcome means getting a loan, the\n",
    "unprivileged group is women, and the privileged group is men.  Then if\n",
    "your AI model were to let women get a loan in 30% of the cases and men\n",
    "in 60% of the cases, the disparate impact would be 30% / 60% = 0.5,\n",
    "indicating a gender bias towards men.  The ideal value for disparate\n",
    "impact is 1, and you could define fairness for this metric as a band\n",
    "around 1, e.g., from 0.8 to 1.2.\n",
    "\n",
    "To get the best performance out of your AI model, you must experiment\n",
    "with its configuration. This means searching a high-dimensional space\n",
    "where some options are categorical, some are continuous, and some are\n",
    "even conditional. No configuration is optimal for all domains let\n",
    "alone all metrics, and searching them all by hand is impossible. In\n",
    "fact, in a high-dimensional space, even exhaustively enumerating all\n",
    "the valid combinations soon becomes impractical.  Fortunately, you can\n",
    "use tools to automate the search, thus making you more productive at\n",
    "finding good models quickly. These productivity and quality\n",
    "improvements become compounded when you have to do the search over.\n",
    "\n",
    "_AI Automation_ is a technology that assists data scientists in\n",
    "building AI models by automating some of the tedious steps. One AI\n",
    "automation technique is _algorithm selection_ , which automatically\n",
    "chooses among alternative algorithms for a particular task. Another AI\n",
    "automation technique is _hyperparameter tuning_ , which automatically\n",
    "configures the arguments of AI algorithms. You can use AI automation\n",
    "to optimize for a variety of metrics.  This blog post shows you how to use AI\n",
    "automation to optimize for both accuracy and for fairness as measured\n",
    "by disparate impact.\n",
    "\n",
    "This blog post is generated from a [Jupyter](https://jupyter.org/)\n",
    "notebook that uses the following open-source Python libraries. \n",
    "[AIF360](https://aif360.mybluemix.net/) \n",
    "is a collection of fairness metrics and bias mitigation algorithms.\n",
    "The [pandas](https://pandas.pydata.org/),\n",
    "[scikit-learn](https://scikit-learn.org/), and\n",
    "[XGBoost](https://xgboost.ai/) libraries support\n",
    "data analysis and machine learning with data structures and a\n",
    "comprehensive collection of AI algorithms.\n",
    "The [hyperopt](http://hyperopt.github.io/hyperopt/) library\n",
    "implements both algorithm selection and hyperparameter tuning for\n",
    "AI automation.\n",
    "And [Lale](https://github.com/IBM/lale) is a library for\n",
    "semi-automated data science; this blog post uses Lale as the backbone\n",
    "for putting the other libraries together.\n",
    "\n",
    "Our starting point is a dataset and a task. For illustration\n",
    "purposes, we picked [credit-g](https://www.openml.org/d/31), also\n",
    "known as the German Credit dataset. Each row describes a person\n",
    "using several features that may help evaluate them as a potential\n",
    "loan applicant. The task is to classify people into either\n",
    "good or bad credit risks. We load a version of the dataset prepared\n",
    "for AIF360."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lale.lib.aif360 import fetch_creditg_df\n",
    "(train_X, train_y), (test_X, test_y) = fetch_creditg_df()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the dataset looks like, we can use off-the-shelf\n",
    "functionality from pandas for inspecting the shape and the first few\n",
    "rows.  The creditg dataset has a single label column, `class`, to be\n",
    "predicted as the outcome.\n",
    "A _protected attribute_ is a feature that partitions the population\n",
    "into groups whose outcome should have parity. The credit-g dataset has\n",
    "two protected attribute columns, `sex` and `age`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "train_X.shape (670, 58)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>class</th>\n",
       "      <th>checking_status_0&lt;=X&lt;200</th>\n",
       "      <th>checking_status_&lt;0</th>\n",
       "      <th>checking_status_&gt;=200</th>\n",
       "      <th>checking_status_no checking</th>\n",
       "      <th>credit_history_all paid</th>\n",
       "      <th>credit_history_critical/other existing credit</th>\n",
       "      <th>credit_history_delayed previously</th>\n",
       "      <th>credit_history_existing paid</th>\n",
       "      <th>credit_history_no credits/all paid</th>\n",
       "      <th>purpose_business</th>\n",
       "      <th>purpose_domestic appliance</th>\n",
       "      <th>purpose_education</th>\n",
       "      <th>purpose_furniture/equipment</th>\n",
       "      <th>purpose_new car</th>\n",
       "      <th>purpose_other</th>\n",
       "      <th>purpose_radio/tv</th>\n",
       "      <th>purpose_repairs</th>\n",
       "      <th>purpose_retraining</th>\n",
       "      <th>purpose_used car</th>\n",
       "      <th>savings_status_100&lt;=X&lt;500</th>\n",
       "      <th>savings_status_500&lt;=X&lt;1000</th>\n",
       "      <th>savings_status_&lt;100</th>\n",
       "      <th>savings_status_&gt;=1000</th>\n",
       "      <th>savings_status_no known savings</th>\n",
       "      <th>employment_1&lt;=X&lt;4</th>\n",
       "      <th>employment_4&lt;=X&lt;7</th>\n",
       "      <th>employment_&lt;1</th>\n",
       "      <th>employment_&gt;=7</th>\n",
       "      <th>employment_unemployed</th>\n",
       "      <th>other_parties_co applicant</th>\n",
       "      <th>other_parties_guarantor</th>\n",
       "      <th>other_parties_none</th>\n",
       "      <th>property_magnitude_car</th>\n",
       "      <th>property_magnitude_life insurance</th>\n",
       "      <th>property_magnitude_no known property</th>\n",
       "      <th>property_magnitude_real estate</th>\n",
       "      <th>other_payment_plans_bank</th>\n",
       "      <th>other_payment_plans_none</th>\n",
       "      <th>other_payment_plans_stores</th>\n",
       "      <th>housing_for free</th>\n",
       "      <th>housing_own</th>\n",
       "      <th>housing_rent</th>\n",
       "      <th>job_high qualif/self emp/mgmt</th>\n",
       "      <th>job_skilled</th>\n",
       "      <th>job_unemp/unskilled non res</th>\n",
       "      <th>job_unskilled resident</th>\n",
       "      <th>own_telephone_none</th>\n",
       "      <th>own_telephone_yes</th>\n",
       "      <th>foreign_worker_no</th>\n",
       "      <th>foreign_worker_yes</th>\n",
       "      <th>duration</th>\n",
       "      <th>credit_amount</th>\n",
       "      <th>installment_commitment</th>\n",
       "      <th>residence_since</th>\n",
       "      <th>age</th>\n",
       "      <th>existing_credits</th>\n",
       "      <th>num_dependents</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>863</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>27.0</td>\n",
       "      <td>4526.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>748</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>5248.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>64</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>3181.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>798</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>717.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>1262.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>440</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>1884.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>124</th>\n",
       "      <td>0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>1924.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>650</th>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>7476.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>409</th>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>939.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>752</th>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>841.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     class  checking_status_0<=X<200  checking_status_<0  \\\n",
       "863      1                       0.0                 0.0   \n",
       "748      1                       0.0                 0.0   \n",
       "64       1                       0.0                 0.0   \n",
       "798      1                       0.0                 0.0   \n",
       "52       1                       0.0                 0.0   \n",
       "440      1                       0.0                 0.0   \n",
       "124      0                       1.0                 0.0   \n",
       "650      1                       0.0                 1.0   \n",
       "409      0                       0.0                 0.0   \n",
       "752      1                       1.0                 0.0   \n",
       "\n",
       "     checking_status_>=200  checking_status_no checking  \\\n",
       "863                    0.0                          1.0   \n",
       "748                    0.0                          1.0   \n",
       "64                     0.0                          1.0   \n",
       "798                    0.0                          1.0   \n",
       "52                     0.0                          1.0   \n",
       "440                    0.0                          1.0   \n",
       "124                    0.0                          0.0   \n",
       "650                    0.0                          0.0   \n",
       "409                    1.0                          0.0   \n",
       "752                    0.0                          0.0   \n",
       "\n",
       "     credit_history_all paid  credit_history_critical/other existing credit  \\\n",
       "863                      0.0                                            1.0   \n",
       "748                      0.0                                            0.0   \n",
       "64                       0.0                                            0.0   \n",
       "798                      0.0                                            0.0   \n",
       "52                       0.0                                            0.0   \n",
       "440                      0.0                                            0.0   \n",
       "124                      0.0                                            0.0   \n",
       "650                      0.0                                            0.0   \n",
       "409                      0.0                                            1.0   \n",
       "752                      0.0                                            0.0   \n",
       "\n",
       "     credit_history_delayed previously  credit_history_existing paid  \\\n",
       "863                                0.0                           0.0   \n",
       "748                                0.0                           1.0   \n",
       "64                                 0.0                           1.0   \n",
       "798                                1.0                           0.0   \n",
       "52                                 0.0                           1.0   \n",
       "440                                0.0                           1.0   \n",
       "124                                0.0                           1.0   \n",
       "650                                0.0                           1.0   \n",
       "409                                0.0                           0.0   \n",
       "752                                0.0                           1.0   \n",
       "\n",
       "     credit_history_no credits/all paid  purpose_business  \\\n",
       "863                                 0.0               0.0   \n",
       "748                                 0.0               0.0   \n",
       "64                                  0.0               0.0   \n",
       "798                                 0.0               0.0   \n",
       "52                                  0.0               0.0   \n",
       "440                                 0.0               0.0   \n",
       "124                                 0.0               0.0   \n",
       "650                                 0.0               0.0   \n",
       "409                                 0.0               0.0   \n",
       "752                                 0.0               1.0   \n",
       "\n",
       "     purpose_domestic appliance  purpose_education  \\\n",
       "863                         0.0                0.0   \n",
       "748                         0.0                0.0   \n",
       "64                          0.0                0.0   \n",
       "798                         0.0                0.0   \n",
       "52                          0.0                0.0   \n",
       "440                         0.0                0.0   \n",
       "124                         0.0                0.0   \n",
       "650                         0.0                1.0   \n",
       "409                         0.0                0.0   \n",
       "752                         0.0                0.0   \n",
       "\n",
       "     purpose_furniture/equipment  purpose_new car  purpose_other  \\\n",
       "863                          1.0              0.0            0.0   \n",
       "748                          0.0              0.0            0.0   \n",
       "64                           0.0              0.0            0.0   \n",
       "798                          0.0              1.0            0.0   \n",
       "52                           0.0              0.0            0.0   \n",
       "440                          0.0              1.0            0.0   \n",
       "124                          1.0              0.0            0.0   \n",
       "650                          0.0              0.0            0.0   \n",
       "409                          0.0              1.0            0.0   \n",
       "752                          0.0              0.0            0.0   \n",
       "\n",
       "     purpose_radio/tv  purpose_repairs  purpose_retraining  purpose_used car  \\\n",
       "863               0.0              0.0                 0.0               0.0   \n",
       "748               0.0              0.0                 0.0               1.0   \n",
       "64                1.0              0.0                 0.0               0.0   \n",
       "798               0.0              0.0                 0.0               0.0   \n",
       "52                1.0              0.0                 0.0               0.0   \n",
       "440               0.0              0.0                 0.0               0.0   \n",
       "124               0.0              0.0                 0.0               0.0   \n",
       "650               0.0              0.0                 0.0               0.0   \n",
       "409               0.0              0.0                 0.0               0.0   \n",
       "752               0.0              0.0                 0.0               0.0   \n",
       "\n",
       "     savings_status_100<=X<500  savings_status_500<=X<1000  \\\n",
       "863                        0.0                         0.0   \n",
       "748                        0.0                         0.0   \n",
       "64                         0.0                         0.0   \n",
       "798                        0.0                         0.0   \n",
       "52                         0.0                         0.0   \n",
       "440                        0.0                         0.0   \n",
       "124                        0.0                         0.0   \n",
       "650                        0.0                         0.0   \n",
       "409                        0.0                         1.0   \n",
       "752                        1.0                         0.0   \n",
       "\n",
       "     savings_status_<100  savings_status_>=1000  \\\n",
       "863                  0.0                    1.0   \n",
       "748                  0.0                    0.0   \n",
       "64                   1.0                    0.0   \n",
       "798                  0.0                    0.0   \n",
       "52                   1.0                    0.0   \n",
       "440                  1.0                    0.0   \n",
       "124                  0.0                    0.0   \n",
       "650                  1.0                    0.0   \n",
       "409                  0.0                    0.0   \n",
       "752                  0.0                    0.0   \n",
       "\n",
       "     savings_status_no known savings  employment_1<=X<4  employment_4<=X<7  \\\n",
       "863                              0.0                0.0                0.0   \n",
       "748                              1.0                1.0                0.0   \n",
       "64                               0.0                0.0                0.0   \n",
       "798                              1.0                0.0                0.0   \n",
       "52                               0.0                1.0                0.0   \n",
       "440                              0.0                0.0                0.0   \n",
       "124                              1.0                0.0                0.0   \n",
       "650                              0.0                0.0                1.0   \n",
       "409                              0.0                0.0                1.0   \n",
       "752                              0.0                0.0                1.0   \n",
       "\n",
       "     employment_<1  employment_>=7  employment_unemployed  \\\n",
       "863            1.0             0.0                    0.0   \n",
       "748            0.0             0.0                    0.0   \n",
       "64             1.0             0.0                    0.0   \n",
       "798            0.0             1.0                    0.0   \n",
       "52             0.0             0.0                    0.0   \n",
       "440            0.0             1.0                    0.0   \n",
       "124            1.0             0.0                    0.0   \n",
       "650            0.0             0.0                    0.0   \n",
       "409            0.0             0.0                    0.0   \n",
       "752            0.0             0.0                    0.0   \n",
       "\n",
       "     other_parties_co applicant  other_parties_guarantor  other_parties_none  \\\n",
       "863                         0.0                      0.0                 1.0   \n",
       "748                         0.0                      0.0                 1.0   \n",
       "64                          0.0                      0.0                 1.0   \n",
       "798                         0.0                      0.0                 1.0   \n",
       "52                          0.0                      0.0                 1.0   \n",
       "440                         0.0                      0.0                 1.0   \n",
       "124                         0.0                      0.0                 1.0   \n",
       "650                         0.0                      0.0                 1.0   \n",
       "409                         0.0                      0.0                 1.0   \n",
       "752                         0.0                      0.0                 1.0   \n",
       "\n",
       "     property_magnitude_car  property_magnitude_life insurance  \\\n",
       "863                     0.0                                0.0   \n",
       "748                     1.0                                0.0   \n",
       "64                      0.0                                1.0   \n",
       "798                     1.0                                0.0   \n",
       "52                      1.0                                0.0   \n",
       "440                     1.0                                0.0   \n",
       "124                     0.0                                0.0   \n",
       "650                     0.0                                0.0   \n",
       "409                     0.0                                0.0   \n",
       "752                     0.0                                0.0   \n",
       "\n",
       "     property_magnitude_no known property  property_magnitude_real estate  \\\n",
       "863                                   0.0                             1.0   \n",
       "748                                   0.0                             0.0   \n",
       "64                                    0.0                             0.0   \n",
       "798                                   0.0                             0.0   \n",
       "52                                    0.0                             0.0   \n",
       "440                                   0.0                             0.0   \n",
       "124                                   0.0                             1.0   \n",
       "650                                   1.0                             0.0   \n",
       "409                                   0.0                             1.0   \n",
       "752                                   0.0                             1.0   \n",
       "\n",
       "     other_payment_plans_bank  other_payment_plans_none  \\\n",
       "863                       0.0                       0.0   \n",
       "748                       0.0                       1.0   \n",
       "64                        0.0                       1.0   \n",
       "798                       0.0                       1.0   \n",
       "52                        0.0                       1.0   \n",
       "440                       0.0                       1.0   \n",
       "124                       0.0                       1.0   \n",
       "650                       0.0                       1.0   \n",
       "409                       0.0                       1.0   \n",
       "752                       0.0                       1.0   \n",
       "\n",
       "     other_payment_plans_stores  housing_for free  housing_own  housing_rent  \\\n",
       "863                         1.0               0.0          1.0           0.0   \n",
       "748                         0.0               0.0          1.0           0.0   \n",
       "64                          0.0               0.0          1.0           0.0   \n",
       "798                         0.0               0.0          1.0           0.0   \n",
       "52                          0.0               0.0          1.0           0.0   \n",
       "440                         0.0               0.0          1.0           0.0   \n",
       "124                         0.0               0.0          0.0           1.0   \n",
       "650                         0.0               1.0          0.0           0.0   \n",
       "409                         0.0               0.0          1.0           0.0   \n",
       "752                         0.0               0.0          0.0           1.0   \n",
       "\n",
       "     job_high qualif/self emp/mgmt  job_skilled  job_unemp/unskilled non res  \\\n",
       "863                            0.0          0.0                          0.0   \n",
       "748                            0.0          1.0                          0.0   \n",
       "64                             0.0          1.0                          0.0   \n",
       "798                            0.0          1.0                          0.0   \n",
       "52                             0.0          1.0                          0.0   \n",
       "440                            1.0          0.0                          0.0   \n",
       "124                            0.0          1.0                          0.0   \n",
       "650                            1.0          0.0                          0.0   \n",
       "409                            0.0          1.0                          0.0   \n",
       "752                            0.0          0.0                          0.0   \n",
       "\n",
       "     job_unskilled resident  own_telephone_none  own_telephone_yes  \\\n",
       "863                     1.0                 0.0                1.0   \n",
       "748                     0.0                 1.0                0.0   \n",
       "64                      0.0                 0.0                1.0   \n",
       "798                     0.0                 0.0                1.0   \n",
       "52                      0.0                 1.0                0.0   \n",
       "440                     0.0                 0.0                1.0   \n",
       "124                     0.0                 1.0                0.0   \n",
       "650                     0.0                 0.0                1.0   \n",
       "409                     0.0                 0.0                1.0   \n",
       "752                     1.0                 1.0                0.0   \n",
       "\n",
       "     foreign_worker_no  foreign_worker_yes  duration  credit_amount  \\\n",
       "863                0.0                 1.0      27.0         4526.0   \n",
       "748                0.0                 1.0      21.0         5248.0   \n",
       "64                 0.0                 1.0      24.0         3181.0   \n",
       "798                0.0                 1.0      24.0          717.0   \n",
       "52                 0.0                 1.0      12.0         1262.0   \n",
       "440                0.0                 1.0      12.0         1884.0   \n",
       "124                0.0                 1.0      18.0         1924.0   \n",
       "650                0.0                 1.0      48.0         7476.0   \n",
       "409                0.0                 1.0      12.0          939.0   \n",
       "752                0.0                 1.0      12.0          841.0   \n",
       "\n",
       "     installment_commitment  residence_since  age  existing_credits  \\\n",
       "863                     4.0              2.0  1.0               2.0   \n",
       "748                     1.0              3.0  1.0               1.0   \n",
       "64                      4.0              4.0  1.0               1.0   \n",
       "798                     4.0              4.0  1.0               2.0   \n",
       "52                      3.0              2.0  0.0               1.0   \n",
       "440                     4.0              4.0  1.0               1.0   \n",
       "124                     4.0              3.0  1.0               1.0   \n",
       "650                     4.0              1.0  1.0               1.0   \n",
       "409                     4.0              2.0  1.0               3.0   \n",
       "752                     2.0              4.0  0.0               1.0   \n",
       "\n",
       "     num_dependents  sex  \n",
       "863             2.0  1.0  \n",
       "748             1.0  1.0  \n",
       "64              1.0  0.0  \n",
       "798             1.0  1.0  \n",
       "52              1.0  1.0  \n",
       "440             1.0  1.0  \n",
       "124             1.0  0.0  \n",
       "650             1.0  1.0  \n",
       "409             1.0  1.0  \n",
       "752             1.0  0.0  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "print(f'train_X.shape {train_X.shape}')\n",
    "pd.options.display.max_columns = None\n",
    "pd.concat([train_y.head(10), train_X.head(10)], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we look at how to train a classifier that is optimized for both\n",
    "accuracy and disparate impact, we will set a baseline by training a\n",
    "pipeline that is only optimized for accuracy. For this purpose, we\n",
    "import a few algorithms from scikit-learn: a dimensionality reduction\n",
    "transformer (PCA) and three classifiers (logistic regression, gradient\n",
    "boosting, and a support vector machine)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.decomposition import PCA\n",
    "from sklearn.linear_model import LogisticRegression as LR\n",
    "from sklearn.ensemble import GradientBoostingClassifier as GBC\n",
    "from sklearn.svm import LinearSVC as SVM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To use AI Automation, we need to define a _search space_,\n",
    "which is a set of possible machine learning pipelines and\n",
    "their associated hyperparameters. The following code\n",
    "uses Lale to define a search space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"184pt\" height=\"185pt\"\n",
       " viewBox=\"0.00 0.00 184.00 185.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 181)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-181 180,-181 180,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\"><title>cluster:choice_0</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice_0 = pca | no_op\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"8,-47 8,-169 78,-169 78,-47 8,-47\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<g id=\"clust2\" class=\"cluster\"><title>cluster:choice_1</title>\n",
       "<g id=\"a_clust2\"><a xlink:title=\"choice_1 = lr | gbc | svm\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"98,-8 98,-169 168,-169 168,-8 98,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"43\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- lr -->\n",
       "<g id=\"node3\" class=\"node\"><title>lr</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html\" xlink:title=\"lr = LR\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">LR</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;lr -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>pca&#45;&gt;lr</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M77.7296,-120C83.6523,-120 89.838,-120 95.8241,-120\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"88.0002,-123.5 98,-120 87.9998,-116.5 88.0002,-123.5\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node2\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"43\" cy=\"-75\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-66.2\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- gbc -->\n",
       "<g id=\"node4\" class=\"node\"><title>gbc</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.gradient_boosting_classifier.html\" xlink:title=\"gbc = GBC\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-74.2\" font-family=\"Times,serif\" font-size=\"11.00\">GBC</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- svm -->\n",
       "<g id=\"node5\" class=\"node\"><title>svm</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.linear_svc.html\" xlink:title=\"svm = SVM\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-34\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-31.2\" font-family=\"Times,serif\" font-size=\"11.00\">SVM</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5dcc5b3a20>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from lale.lib.lale import NoOp\n",
    "import lale\n",
    "lale.wrap_imported_operators()\n",
    "planned_orig = (PCA | NoOp) >> (LR | GBC | SVM)\n",
    "planned_orig.visualize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The call to `wrap_imported_operators` augments the algorithms\n",
    "that were imported from scikit-learn with metadata about\n",
    "their hyperparameters. The Lale combinator `|` indicates\n",
    "algorithmic choice. For example, `(PCA | NoOp)` indicates that\n",
    "it is up to the AI Automation to decide whether to apply a PCA\n",
    "transformer or whether to use a no-op transformer that leaves\n",
    "the data unchanged. Note that the `PCA` itself is not configured\n",
    "with concrete hyperparameters, since those will be left for the\n",
    "AI automation to choose instead. Finally, the Lale combinator\n",
    "`>>` pipes the output from the transformer into the input to the\n",
    "classifier, which is itself a choice between `(LR | GBC | SVM)`.\n",
    "The search space is encapsulated in the object `planned_orig`.\n",
    "\n",
    "We will use hyperopt to select the algorithms and to tune their\n",
    "hyperparameters. Lale provides a `Hyperopt` that\n",
    "turns a search space such as the one specified above into an\n",
    "optimization problem for hyperopt. After 10 trials, we get back\n",
    "the model that performed best for the default optimization\n",
    "objective, which is accuracy."
   ]
  },
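  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of how the two choices combine, the sketch\n",
    "below enumerates the pipeline shapes spanned by the search space in\n",
    "plain Python. It does not use Lale: the lists `transformers` and\n",
    "`classifiers` are hypothetical stand-ins for the operators, and\n",
    "hyperparameters are ignored entirely.\n",
    "\n",
    "```python\n",
    "from itertools import product\n",
    "\n",
    "# Hypothetical stand-ins for the operators in the two choices.\n",
    "transformers = ['PCA', 'NoOp']\n",
    "classifiers = ['LR', 'GBC', 'SVM']\n",
    "\n",
    "# Each pipeline shape pairs one transformer with one classifier.\n",
    "shapes = [f'{t} >> {c}' for t, c in product(transformers, classifiers)]\n",
    "for shape in shapes:\n",
    "    print(shape)\n",
    "print(f'{len(shapes)} pipeline shapes')\n",
    "```\n",
    "\n",
    "Each shape still carries its own hyperparameters, so the optimizer\n",
    "has to pick both a shape and concrete hyperparameter values."
   ]
  },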
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|███████| 10/10 [00:52<00:00,  5.24s/trial, best loss: -0.7492526158445442]\n",
      "1 out of 10 trials failed, call summary() for details.\n",
      "Run with verbose=True to see per-trial exceptions.\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"44pt\"\n",
       " viewBox=\"0.00 0.00 152.00 44.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 40)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-40 148,-40 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA(svd_solver=&#39;full&#39;, whiten=True)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-15.2\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- lr -->\n",
       "<g id=\"node2\" class=\"node\"><title>lr</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html\" xlink:title=\"lr = LR(dual=True, intercept_scaling=0.3725797779832578, max_iter=802, multi_class=&#39;ovr&#39;, solver=&#39;liblinear&#39;, tol=0.008013565529132195)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-15.2\" font-family=\"Times,serif\" font-size=\"11.00\">LR</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;lr -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>pca&#45;&gt;lr</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-18C62.3932,-18 71.3106,-18 79.8241,-18\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-21.5001 89.919,-18 79.919,-14.5001 79.919,-21.5001\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5de3976470>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from lale.lib.lale import Hyperopt\n",
    "best_estimator = planned_orig.auto_configure(\n",
    "    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=10)\n",
    "best_estimator.visualize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As shown by the visualization, the search found a pipeline\n",
    "with a PCA transformer and an LR classifier.\n",
    "Inspecting the hyperparameters reveals which values\n",
    "worked best for the 10 trials on the dataset at hand."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "pca = PCA(svd_solver=\"full\", whiten=True)\n",
       "lr = LR(\n",
       "    dual=True,\n",
       "    intercept_scaling=0.3725797779832578,\n",
       "    max_iter=802,\n",
       "    multi_class=\"ovr\",\n",
       "    solver=\"liblinear\",\n",
       "    tol=0.008013565529132195,\n",
       ")\n",
       "pipeline = pca >> lr\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "best_estimator.pretty_print(ipython_display=True, show_imports=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the accuracy score metric from scikit-learn to measure\n",
    "how well the pipeline accomplishes the objective for which it\n",
    "was trained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 74.5%\n"
     ]
    }
   ],
   "source": [
    "import sklearn.metrics\n",
    "accuracy_scorer = sklearn.metrics.make_scorer(sklearn.metrics.accuracy_score)\n",
    "print(f'accuracy {accuracy_scorer(best_estimator, test_X, test_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The accuracy is close to the state of the art for this dataset.\n",
    "However, we would like our model to be not just accurate but also fair.\n",
    "As discussed before, we will use disparate impact as the fairness metric.\n",
    "To configure the metric, we need some fairness-related metadata.\n",
    "The favorable and unfavorable labels are values of the target `class`\n",
    "column that indicate whether the loan was granted or denied.\n",
    "For illustrative purposes, we will only look at one of the protected\n",
    "attributes, `sex`, which in this dataset is encoded as 0 for female\n",
    "(unprivileged group) and 1 for male (privileged group)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "fairness_info = {\n",
    "    'favorable_label': 1,\n",
    "    'unfavorable_label': 0,\n",
    "    'protected_attribute_names': ['sex'],\n",
    "    'unprivileged_groups': [{'sex': 0}],\n",
    "    'privileged_groups': [{'sex': 1}]}"
   ]
  },
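  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the metric concrete, the following minimal sketch computes\n",
    "disparate impact by hand with NumPy, using the same 0/1 encoding as\n",
    "the metadata above. The function `disparate_impact_by_hand` and the\n",
    "toy arrays are hypothetical illustrations, not part of AIF360 or\n",
    "Lale, and the numbers are made up rather than taken from the dataset.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def disparate_impact_by_hand(y_pred, protected, favorable_label=1,\n",
    "                             unprivileged=0, privileged=1):\n",
    "    # Ratio of favorable-outcome rates: unprivileged / privileged.\n",
    "    y_pred = np.asarray(y_pred)\n",
    "    protected = np.asarray(protected)\n",
    "    favorable = (y_pred == favorable_label)\n",
    "    rate_unpriv = favorable[protected == unprivileged].mean()\n",
    "    rate_priv = favorable[protected == privileged].mean()\n",
    "    return rate_unpriv / rate_priv\n",
    "\n",
    "# Toy data (made up): 4 unprivileged and 4 privileged applicants.\n",
    "toy_sex = [0, 0, 0, 0, 1, 1, 1, 1]\n",
    "toy_pred = [1, 1, 0, 0, 1, 1, 1, 0]\n",
    "toy_di = disparate_impact_by_hand(toy_pred, toy_sex)\n",
    "print(f'disparate impact {toy_di:.2f}')\n",
    "```\n",
    "\n",
    "In this toy example the favorable rate is 2/4 for the unprivileged\n",
    "group and 3/4 for the privileged group, so the ratio is 2/3. A value\n",
    "of 1.0 would mean both groups receive the favorable outcome at the\n",
    "same rate."
   ]
  },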
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use the disparate impact metric from AIF360, wrapped for\n",
    "compatibility with scikit-learn."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "disparate impact 0.91\n"
     ]
    }
   ],
   "source": [
    "import lale.lib.aif360\n",
    "disparate_impact_scorer = lale.lib.aif360.disparate_impact(**fairness_info)\n",
    "print(f'disparate impact {disparate_impact_scorer(best_estimator, test_X, test_y):.2f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The disparate impact for this model is 0.91, which falls short of\n",
    "the ideal value for this metric, which is 1.0. We would prefer a\n",
    "model that is much fairer. The AIF360 toolkit provides several\n",
    "algorithms for mitigating fairness problems. One of them is\n",
    "`DisparateImpactRemover`, which modifies the features that are\n",
    "not the protected attribute in such a way that it is hard to\n",
    "predict the protected attribute from them. We use a Lale version\n",
    "of `DisparateImpactRemover` that wraps the corresponding AIF360\n",
    "algorithm for AI Automation. This algorithm has a hyperparameter\n",
    "`repair_level` that we will tune with hyperparameter optimization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'description': 'Repair amount from 0 = none to 1 = full.',\n",
       " 'type': 'number',\n",
       " 'minimum': 0,\n",
       " 'maximum': 1,\n",
       " 'default': 1}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from lale.lib.aif360 import DisparateImpactRemover\n",
    "DisparateImpactRemover.hyperparam_schema('repair_level')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We compose the bias mitigation algorithm in a pipeline with\n",
    "a projection operator that strips out the protected\n",
    "attribute, followed by a choice of classifiers as before.\n",
    "In the visualization, light blue indicates trainable operators\n",
    "and dark blue indicates that automation must make a choice before\n",
    "the operators can be trained. Compared to the earlier pipeline,\n",
    "we omit the PCA, to make it easier for the estimator to disregard\n",
    "features that cause poor disparate impact."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"283pt\" height=\"185pt\"\n",
       " viewBox=\"0.00 0.00 283.20 185.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 181)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-181 279.196,-181 279.196,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\"><title>cluster:choice</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice = lr | gbc | svm\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"197.196,-8 197.196,-169 267.196,-169 267.196,-8 197.196,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"232.196\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- disparate_impact_remover -->\n",
       "<g id=\"node1\" class=\"node\"><title>disparate_impact_remover</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.aif360.disparate_impact_remover.html\" xlink:title=\"disparate_impact_remover = DisparateImpactRemover(sensitive_attribute=&#39;sex&#39;)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"39.598\" cy=\"-120\" rx=\"39.6962\" ry=\"28.0702\"/>\n",
       "<text text-anchor=\"middle\" x=\"39.598\" y=\"-129.2\" font-family=\"Times,serif\" font-size=\"11.00\">Disparate&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"39.598\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">Impact&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"39.598\" y=\"-105.2\" font-family=\"Times,serif\" font-size=\"11.00\">Remover</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project -->\n",
       "<g id=\"node2\" class=\"node\"><title>project</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project = Project(drop_columns=[&#39;sex&#39;])\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"142.196\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"142.196\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- disparate_impact_remover&#45;&gt;project -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>disparate_impact_remover&#45;&gt;project</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M79.2041,-120C87.6193,-120 96.4967,-120 104.822,-120\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"104.995,-123.5 114.995,-120 104.995,-116.5 104.995,-123.5\"/>\n",
       "</g>\n",
       "<!-- lr -->\n",
       "<g id=\"node3\" class=\"node\"><title>lr</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html\" xlink:title=\"lr = LR\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"232.196\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"232.196\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">LR</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project&#45;&gt;lr -->\n",
       "<g id=\"edge2\" class=\"edge\"><title>project&#45;&gt;lr</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M169.599,-120C177.589,-120 186.507,-120 195.02,-120\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"187.196,-123.5 197.196,-120 187.196,-116.5 187.196,-123.5\"/>\n",
       "</g>\n",
       "<!-- gbc -->\n",
       "<g id=\"node4\" class=\"node\"><title>gbc</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.gradient_boosting_classifier.html\" xlink:title=\"gbc = GBC\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"232.196\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"232.196\" y=\"-74.2\" font-family=\"Times,serif\" font-size=\"11.00\">GBC</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- svm -->\n",
       "<g id=\"node5\" class=\"node\"><title>svm</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.linear_svc.html\" xlink:title=\"svm = SVM\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"232.196\" cy=\"-34\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"232.196\" y=\"-31.2\" font-family=\"Times,serif\" font-size=\"11.00\">SVM</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5dce199358>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from lale.lib.lale import Project\n",
    "dimr = DisparateImpactRemover(sensitive_attribute='sex')\n",
    "proj = Project(drop_columns=['sex'])\n",
    "planned_fairer = dimr >> proj >> (LR | GBC | SVM)\n",
    "planned_fairer.visualize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unlike accuracy, which is a metric that can be computed from\n",
    "predicted labels alone, fairness metrics such as disparate\n",
    "impact need to look not just at labels but also at features.\n",
    "For instance, disparate impact is defined by comparing outcomes\n",
    "between a privileged group and an unprivileged group, so it\n",
    "needs to check the protected attribute to determine group\n",
    "membership for the sample person at hand. To use Lale's Hyperopt in this\n",
    "case, we define a custom scorer that takes disparate impact into account\n",
    "along with accuracy. Hyperopt minimizes `(best_score - score_returned_by_the_scorer)`, where `best_score`\n",
    "is an argument to Hyperopt and `score_returned_by_the_scorer` is the value returned by the scorer for each\n",
    "evaluation point. In the custom scorer defined below, if the disparate impact falls outside a margin of 10%\n",
    "around its ideal value of 1, the score is -99, a deliberately low value. Otherwise, the score is the accuracy,\n",
    "where higher is better."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
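    "def combined_scorer(estimator, X, y):\n",
    "    # Score by accuracy, but reject pipelines that are not fair enough.\n",
    "    accuracy = accuracy_scorer(estimator, X, y)\n",
    "    disparate_impact = disparate_impact_scorer(estimator, X, y)\n",
    "    # Accept only a disparate impact within 10% of the ideal value 1.0;\n",
    "    # the chained comparison also rejects NaN.\n",
    "    if 0.9 <= disparate_impact <= 1.1:\n",
    "        return accuracy\n",
    "    return -99  # large penalty so Hyperopt steers away from this point"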
   ]
  },
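  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the scorer above amounts to a piecewise function.\n",
    "The symbols $a$ (accuracy) and $d$ (disparate impact) are shorthand\n",
    "introduced here for illustration.\n",
    "\n",
    "$$\n",
    "\\mathrm{score}(a, d) = \\begin{cases} a & \\text{if } 0.9 \\le d \\le 1.1 \\\\ -99 & \\text{otherwise} \\end{cases}\n",
    "$$\n",
    "\n",
    "Because accuracy never exceeds 1, any pipeline inside the margin\n",
    "scores higher than every pipeline outside it, so the penalty acts\n",
    "as a hard constraint rather than a trade-off."
   ]
  },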
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we have all the pieces in place to use AI Automation\n",
    "on our `planned_fairer` pipeline for both accuracy and\n",
    "disparate impact."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|███████| 10/10 [01:51<00:00, 11.16s/trial, best loss: 0.25519832372410856]\n"
     ]
    }
   ],
   "source": [
    "trained_fairer = planned_fairer.auto_configure(\n",
    "    train_X, train_y, optimizer=Hyperopt, cv=3, verbose=True,\n",
    "    max_evals=10, scoring=combined_scorer, best_score=1.0)"
   ]
  },
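  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sanity check on the progress bar above, we can recover\n",
    "the best score from the reported loss. This sketch assumes that the\n",
    "loss is exactly `best_score` minus the scorer's value, and that the\n",
    "value is averaged over the three cross-validation folds (the\n",
    "averaging is an assumption, not something stated in this post).\n",
    "Writing $b$ for `best_score`:\n",
    "\n",
    "$$\n",
    "\\mathrm{score} = b - \\mathrm{loss} \\approx 1.0 - 0.2552 = 0.7448\n",
    "$$\n",
    "\n",
    "Under those assumptions, the best pipeline reached a cross-validated\n",
    "accuracy of about 74.5% on the training data. A single fold scoring\n",
    "-99 would have pushed the average far below zero, so a loss this small\n",
    "suggests that the pipeline stayed within the disparate impact margin\n",
    "on every fold."
   ]
  },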
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As with any trained model, we can evaluate and visualize the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 75.5%\n",
      "disparate impact 1.00\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"267pt\" height=\"65pt\"\n",
       " viewBox=\"0.00 0.00 267.20 64.57\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 60.5685)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-60.5685 263.196,-60.5685 263.196,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- disparate_impact_remover -->\n",
       "<g id=\"node1\" class=\"node\"><title>disparate_impact_remover</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.aif360.disparate_impact_remover.html\" xlink:title=\"disparate_impact_remover = DisparateImpactRemover(sensitive_attribute=&#39;sex&#39;, repair_level=0.8726303533099419)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"39.598\" cy=\"-28.2843\" rx=\"39.6962\" ry=\"28.0702\"/>\n",
       "<text text-anchor=\"middle\" x=\"39.598\" y=\"-37.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Disparate&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"39.598\" y=\"-25.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Impact&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"39.598\" y=\"-13.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Remover</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project -->\n",
       "<g id=\"node2\" class=\"node\"><title>project</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project = Project(drop_columns=[&#39;sex&#39;])\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"142.196\" cy=\"-28.2843\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"142.196\" y=\"-25.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- disparate_impact_remover&#45;&gt;project -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>disparate_impact_remover&#45;&gt;project</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M79.2041,-28.2843C87.6193,-28.2843 96.4967,-28.2843 104.822,-28.2843\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"104.995,-31.7844 114.995,-28.2843 104.995,-24.7844 104.995,-31.7844\"/>\n",
       "</g>\n",
       "<!-- gbc -->\n",
       "<g id=\"node3\" class=\"node\"><title>gbc</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.gradient_boosting_classifier.html\" xlink:title=\"gbc = GBC(max_depth=4, max_features=0.10560431550816061, min_samples_leaf=0.03215642315369574, min_samples_split=0.4905516813945378, n_estimators=53, presort=&#39;auto&#39;)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"232.196\" cy=\"-28.2843\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"232.196\" y=\"-25.4843\" font-family=\"Times,serif\" font-size=\"11.00\">GBC</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project&#45;&gt;gbc -->\n",
       "<g id=\"edge2\" class=\"edge\"><title>project&#45;&gt;gbc</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M169.599,-28.2843C177.589,-28.2843 186.507,-28.2843 195.02,-28.2843\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"195.115,-31.7844 205.115,-28.2843 195.115,-24.7844 195.115,-31.7844\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5de374a390>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(f'accuracy {accuracy_scorer(trained_fairer, test_X, test_y):.1%}')\n",
    "print(f'disparate impact {disparate_impact_scorer(trained_fairer, test_X, test_y):.2f}')\n",
    "trained_fairer.visualize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the result demonstrates, the best model found by AI Automation\n",
    "has similar accuracy to, and better disparate impact than, the one we saw\n",
    "before. AI Automation has also tuned the repair level and\n",
    "has picked and tuned a classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "disparate_impact_remover = DisparateImpactRemover(\n",
       "    sensitive_attribute=\"sex\", repair_level=0.8726303533099419\n",
       ")\n",
       "project = Project(drop_columns=[\"sex\"])\n",
       "gbc = GBC(\n",
       "    max_depth=4,\n",
       "    max_features=0.10560431550816061,\n",
       "    min_samples_leaf=0.03215642315369574,\n",
       "    min_samples_split=0.4905516813945378,\n",
       "    n_estimators=53,\n",
       "    presort=\"auto\",\n",
       ")\n",
       "pipeline = disparate_impact_remover >> project >> gbc\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "trained_fairer.pretty_print(ipython_display=True, show_imports=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These results may vary by dataset and search space.\n",
    "\n",
    "In summary, this blog post showed you how to use AI Automation\n",
    "from Lale, while incorporating a fairness mitigation technique\n",
    "into the pipeline and a fairness metric into the objective.\n",
    "Of course, this blog post only scratches the surface of what can\n",
    "be done with AI Automation and AI Fairness. We encourage you to\n",
    "check out the open-source projects Lale and AIF360 and use them\n",
    "to build your own fair and accurate models!\n",
    "\n",
    "- Lale: https://github.com/IBM/lale\n",
    "- AIF360: https://aif360.mybluemix.net/\n",
    "\n",
    "The following notebook showcases how Lale exposes a few more\n",
    "operators and metrics from AIF360:\n",
    "https://nbviewer.jupyter.org/github/IBM/lale/blob/master/examples/demo_aif360_more.ipynb"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}