{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lale and its Impact on the Data Science Workflow\n",
    "\n",
    "Guillaume Baudart, Martin Hirzel, Kiran Kate, Pari Ram, and Avi Shinnar\n",
    "\n",
    "27 March 2020\n",
    "\n",
    "Examples, documentation, code: https://github.com/ibm/lale\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/docs/img/lale_logo.jpg\" alt=\"logo\" width=\"140px\" align=\"left\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Value Proposition\n",
    "\n",
    "- **target user**: data scientist familiar with Python and scikit-learn\n",
    "- **scope**: data preparation and machine learning (including some DL)\n",
    "- **value**: consistent API for both manual machine learning and auto-ML\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-three-values.png\" style=\"width:350px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install --quiet lale"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Enable data schema validation for this notebook\n",
    "from lale.settings import set_disable_data_schema_validation\n",
    "set_disable_data_schema_validation(False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape train_X_all (522910, 54), test_X (58102, 54)\n"
     ]
    }
   ],
   "source": [
    "import lale.datasets\n",
    "# Load the covtype dataset as dataframes, holding out 10% of the rows as the test set\n",
    "(train_X_all, train_y_all), (test_X, test_y) = lale.datasets.covtype_df(test_size=0.1)\n",
    "print(f'shape train_X_all {train_X_all.shape}, test_X {test_X.shape}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape train_X (52291, 54), other_X (470619, 54)\n"
     ]
    }
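   ],
   "source": [
    "import sklearn.model_selection\n",
    "# Subsample: keep 10% of the training rows as train_X and set the remaining 90% aside as other_X\n",
    "train_X, other_X, train_y, other_y = sklearn.model_selection.train_test_split(\n",
    "    train_X_all, train_y_all, test_size=0.9)\n",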
    "print(f'shape train_X {train_X.shape}, other_X {other_X.shape}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y</th>\n",
       "      <th>Elevation</th>\n",
       "      <th>Aspect</th>\n",
       "      <th>Slope</th>\n",
       "      <th>Horizontal_Distance_To_Hydrology</th>\n",
       "      <th>Vertical_Distance_To_Hydrology</th>\n",
       "      <th>Horizontal_Distance_To_Roadways</th>\n",
       "      <th>Hillshade_9am</th>\n",
       "      <th>Hillshade_Noon</th>\n",
       "      <th>Hillshade_3pm</th>\n",
       "      <th>Horizontal_Distance_To_Fire_Points</th>\n",
       "      <th>Wilderness_Area1</th>\n",
       "      <th>Wilderness_Area2</th>\n",
       "      <th>Wilderness_Area3</th>\n",
       "      <th>Wilderness_Area4</th>\n",
       "      <th>Soil_Type1</th>\n",
       "      <th>Soil_Type2</th>\n",
       "      <th>Soil_Type3</th>\n",
       "      <th>Soil_Type4</th>\n",
       "      <th>Soil_Type5</th>\n",
       "      <th>Soil_Type6</th>\n",
       "      <th>Soil_Type7</th>\n",
       "      <th>Soil_Type8</th>\n",
       "      <th>Soil_Type9</th>\n",
       "      <th>Soil_Type10</th>\n",
       "      <th>Soil_Type11</th>\n",
       "      <th>Soil_Type12</th>\n",
       "      <th>Soil_Type13</th>\n",
       "      <th>Soil_Type14</th>\n",
       "      <th>Soil_Type15</th>\n",
       "      <th>Soil_Type16</th>\n",
       "      <th>Soil_Type17</th>\n",
       "      <th>Soil_Type18</th>\n",
       "      <th>Soil_Type19</th>\n",
       "      <th>Soil_Type20</th>\n",
       "      <th>Soil_Type21</th>\n",
       "      <th>Soil_Type22</th>\n",
       "      <th>Soil_Type23</th>\n",
       "      <th>Soil_Type24</th>\n",
       "      <th>Soil_Type25</th>\n",
       "      <th>Soil_Type26</th>\n",
       "      <th>Soil_Type27</th>\n",
       "      <th>Soil_Type28</th>\n",
       "      <th>Soil_Type29</th>\n",
       "      <th>Soil_Type30</th>\n",
       "      <th>Soil_Type31</th>\n",
       "      <th>Soil_Type32</th>\n",
       "      <th>Soil_Type33</th>\n",
       "      <th>Soil_Type34</th>\n",
       "      <th>Soil_Type35</th>\n",
       "      <th>Soil_Type36</th>\n",
       "      <th>Soil_Type37</th>\n",
       "      <th>Soil_Type38</th>\n",
       "      <th>Soil_Type39</th>\n",
       "      <th>Soil_Type40</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>484665</th>\n",
       "      <td>3</td>\n",
       "      <td>2277.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>228.0</td>\n",
       "      <td>145.0</td>\n",
       "      <td>1045.0</td>\n",
       "      <td>207.0</td>\n",
       "      <td>157.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>1516.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>451137</th>\n",
       "      <td>1</td>\n",
       "      <td>3273.0</td>\n",
       "      <td>296.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>371.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>1740.0</td>\n",
       "      <td>153.0</td>\n",
       "      <td>227.0</td>\n",
       "      <td>212.0</td>\n",
       "      <td>808.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239309</th>\n",
       "      <td>1</td>\n",
       "      <td>3062.0</td>\n",
       "      <td>298.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>408.0</td>\n",
       "      <td>78.0</td>\n",
       "      <td>2445.0</td>\n",
       "      <td>184.0</td>\n",
       "      <td>235.0</td>\n",
       "      <td>191.0</td>\n",
       "      <td>1041.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>406901</th>\n",
       "      <td>2</td>\n",
       "      <td>3195.0</td>\n",
       "      <td>42.0</td>\n",
       "      <td>19.0</td>\n",
       "      <td>376.0</td>\n",
       "      <td>72.0</td>\n",
       "      <td>3873.0</td>\n",
       "      <td>220.0</td>\n",
       "      <td>196.0</td>\n",
       "      <td>105.0</td>\n",
       "      <td>2935.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>379632</th>\n",
       "      <td>2</td>\n",
       "      <td>3003.0</td>\n",
       "      <td>310.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>182.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>2573.0</td>\n",
       "      <td>181.0</td>\n",
       "      <td>230.0</td>\n",
       "      <td>189.0</td>\n",
       "      <td>2408.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>510084</th>\n",
       "      <td>1</td>\n",
       "      <td>2898.0</td>\n",
       "      <td>47.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>-3.0</td>\n",
       "      <td>1865.0</td>\n",
       "      <td>224.0</td>\n",
       "      <td>219.0</td>\n",
       "      <td>129.0</td>\n",
       "      <td>1022.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>96001</th>\n",
       "      <td>2</td>\n",
       "      <td>2221.0</td>\n",
       "      <td>338.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>242.0</td>\n",
       "      <td>72.0</td>\n",
       "      <td>437.0</td>\n",
       "      <td>168.0</td>\n",
       "      <td>204.0</td>\n",
       "      <td>172.0</td>\n",
       "      <td>342.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39684</th>\n",
       "      <td>1</td>\n",
       "      <td>3289.0</td>\n",
       "      <td>322.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>285.0</td>\n",
       "      <td>60.0</td>\n",
       "      <td>4012.0</td>\n",
       "      <td>172.0</td>\n",
       "      <td>219.0</td>\n",
       "      <td>186.0</td>\n",
       "      <td>1291.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>227535</th>\n",
       "      <td>2</td>\n",
       "      <td>2890.0</td>\n",
       "      <td>272.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>376.0</td>\n",
       "      <td>43.0</td>\n",
       "      <td>2296.0</td>\n",
       "      <td>204.0</td>\n",
       "      <td>242.0</td>\n",
       "      <td>176.0</td>\n",
       "      <td>2460.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85578</th>\n",
       "      <td>1</td>\n",
       "      <td>3340.0</td>\n",
       "      <td>204.0</td>\n",
       "      <td>16.0</td>\n",
       "      <td>510.0</td>\n",
       "      <td>134.0</td>\n",
       "      <td>1851.0</td>\n",
       "      <td>210.0</td>\n",
       "      <td>253.0</td>\n",
       "      <td>174.0</td>\n",
       "      <td>1426.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        y  Elevation  Aspect  Slope  Horizontal_Distance_To_Hydrology  \\\n",
       "484665  3     2277.0    41.0   31.0                             228.0   \n",
       "451137  1     3273.0   296.0   22.0                             371.0   \n",
       "239309  1     3062.0   298.0   13.0                             408.0   \n",
       "406901  2     3195.0    42.0   19.0                             376.0   \n",
       "379632  2     3003.0   310.0   14.0                             182.0   \n",
       "510084  1     2898.0    47.0   10.0                              30.0   \n",
       "96001   2     2221.0   338.0   22.0                             242.0   \n",
       "39684   1     3289.0   322.0   18.0                             285.0   \n",
       "227535  2     2890.0   272.0    6.0                             376.0   \n",
       "85578   1     3340.0   204.0   16.0                             510.0   \n",
       "\n",
       "        Vertical_Distance_To_Hydrology  Horizontal_Distance_To_Roadways  \\\n",
       "484665                           145.0                           1045.0   \n",
       "451137                            45.0                           1740.0   \n",
       "239309                            78.0                           2445.0   \n",
       "406901                            72.0                           3873.0   \n",
       "379632                            30.0                           2573.0   \n",
       "510084                            -3.0                           1865.0   \n",
       "96001                             72.0                            437.0   \n",
       "39684                             60.0                           4012.0   \n",
       "227535                            43.0                           2296.0   \n",
       "85578                            134.0                           1851.0   \n",
       "\n",
       "        Hillshade_9am  Hillshade_Noon  Hillshade_3pm  \\\n",
       "484665          207.0           157.0           65.0   \n",
       "451137          153.0           227.0          212.0   \n",
       "239309          184.0           235.0          191.0   \n",
       "406901          220.0           196.0          105.0   \n",
       "379632          181.0           230.0          189.0   \n",
       "510084          224.0           219.0          129.0   \n",
       "96001           168.0           204.0          172.0   \n",
       "39684           172.0           219.0          186.0   \n",
       "227535          204.0           242.0          176.0   \n",
       "85578           210.0           253.0          174.0   \n",
       "\n",
       "        Horizontal_Distance_To_Fire_Points  Wilderness_Area1  \\\n",
       "484665                              1516.0               0.0   \n",
       "451137                               808.0               0.0   \n",
       "239309                              1041.0               1.0   \n",
       "406901                              2935.0               1.0   \n",
       "379632                              2408.0               0.0   \n",
       "510084                              1022.0               1.0   \n",
       "96001                                342.0               0.0   \n",
       "39684                               1291.0               1.0   \n",
       "227535                              2460.0               1.0   \n",
       "85578                               1426.0               0.0   \n",
       "\n",
       "        Wilderness_Area2  Wilderness_Area3  Wilderness_Area4  Soil_Type1  \\\n",
       "484665               0.0               0.0               1.0         0.0   \n",
       "451137               0.0               1.0               0.0         0.0   \n",
       "239309               0.0               0.0               0.0         0.0   \n",
       "406901               0.0               0.0               0.0         0.0   \n",
       "379632               0.0               1.0               0.0         0.0   \n",
       "510084               0.0               0.0               0.0         0.0   \n",
       "96001                0.0               0.0               1.0         0.0   \n",
       "39684                0.0               0.0               0.0         0.0   \n",
       "227535               0.0               0.0               0.0         0.0   \n",
       "85578                0.0               1.0               0.0         0.0   \n",
       "\n",
       "        Soil_Type2  Soil_Type3  Soil_Type4  Soil_Type5  Soil_Type6  \\\n",
       "484665         0.0         0.0         0.0         0.0         0.0   \n",
       "451137         0.0         0.0         0.0         0.0         0.0   \n",
       "239309         0.0         0.0         0.0         0.0         0.0   \n",
       "406901         0.0         0.0         0.0         0.0         0.0   \n",
       "379632         0.0         0.0         0.0         0.0         0.0   \n",
       "510084         0.0         0.0         0.0         0.0         0.0   \n",
       "96001          0.0         0.0         0.0         0.0         0.0   \n",
       "39684          0.0         0.0         0.0         0.0         0.0   \n",
       "227535         0.0         0.0         0.0         0.0         0.0   \n",
       "85578          0.0         0.0         0.0         0.0         0.0   \n",
       "\n",
       "        Soil_Type7  Soil_Type8  Soil_Type9  Soil_Type10  Soil_Type11  \\\n",
       "484665         0.0         0.0         0.0          1.0          0.0   \n",
       "451137         0.0         0.0         0.0          0.0          0.0   \n",
       "239309         0.0         0.0         0.0          0.0          0.0   \n",
       "406901         0.0         0.0         0.0          0.0          0.0   \n",
       "379632         0.0         0.0         0.0          0.0          0.0   \n",
       "510084         0.0         0.0         0.0          0.0          0.0   \n",
       "96001          0.0         0.0         0.0          1.0          0.0   \n",
       "39684          0.0         0.0         0.0          0.0          0.0   \n",
       "227535         0.0         0.0         0.0          0.0          0.0   \n",
       "85578          0.0         0.0         0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type12  Soil_Type13  Soil_Type14  Soil_Type15  Soil_Type16  \\\n",
       "484665          0.0          0.0          0.0          0.0          0.0   \n",
       "451137          0.0          0.0          0.0          0.0          0.0   \n",
       "239309          0.0          0.0          0.0          0.0          0.0   \n",
       "406901          0.0          0.0          0.0          0.0          0.0   \n",
       "379632          0.0          0.0          0.0          0.0          0.0   \n",
       "510084          0.0          0.0          0.0          0.0          0.0   \n",
       "96001           0.0          0.0          0.0          0.0          0.0   \n",
       "39684           0.0          0.0          0.0          0.0          0.0   \n",
       "227535          0.0          0.0          0.0          0.0          0.0   \n",
       "85578           0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type17  Soil_Type18  Soil_Type19  Soil_Type20  Soil_Type21  \\\n",
       "484665          0.0          0.0          0.0          0.0          0.0   \n",
       "451137          0.0          0.0          0.0          0.0          0.0   \n",
       "239309          0.0          0.0          0.0          0.0          0.0   \n",
       "406901          0.0          0.0          0.0          0.0          0.0   \n",
       "379632          0.0          0.0          0.0          0.0          0.0   \n",
       "510084          0.0          0.0          0.0          1.0          0.0   \n",
       "96001           0.0          0.0          0.0          0.0          0.0   \n",
       "39684           0.0          0.0          0.0          0.0          0.0   \n",
       "227535          0.0          0.0          0.0          0.0          0.0   \n",
       "85578           0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type22  Soil_Type23  Soil_Type24  Soil_Type25  Soil_Type26  \\\n",
       "484665          0.0          0.0          0.0          0.0          0.0   \n",
       "451137          0.0          0.0          0.0          0.0          0.0   \n",
       "239309          0.0          0.0          0.0          0.0          0.0   \n",
       "406901          0.0          0.0          0.0          0.0          0.0   \n",
       "379632          0.0          0.0          0.0          0.0          0.0   \n",
       "510084          0.0          0.0          0.0          0.0          0.0   \n",
       "96001           0.0          0.0          0.0          0.0          0.0   \n",
       "39684           0.0          0.0          0.0          0.0          0.0   \n",
       "227535          0.0          0.0          0.0          0.0          0.0   \n",
       "85578           0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type27  Soil_Type28  Soil_Type29  Soil_Type30  Soil_Type31  \\\n",
       "484665          0.0          0.0          0.0          0.0          0.0   \n",
       "451137          0.0          0.0          0.0          0.0          1.0   \n",
       "239309          0.0          0.0          1.0          0.0          0.0   \n",
       "406901          0.0          0.0          0.0          1.0          0.0   \n",
       "379632          0.0          0.0          0.0          0.0          0.0   \n",
       "510084          0.0          0.0          0.0          0.0          0.0   \n",
       "96001           0.0          0.0          0.0          0.0          0.0   \n",
       "39684           0.0          0.0          0.0          0.0          0.0   \n",
       "227535          0.0          0.0          1.0          0.0          0.0   \n",
       "85578           0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type32  Soil_Type33  Soil_Type34  Soil_Type35  Soil_Type36  \\\n",
       "484665          0.0          0.0          0.0          0.0          0.0   \n",
       "451137          0.0          0.0          0.0          0.0          0.0   \n",
       "239309          0.0          0.0          0.0          0.0          0.0   \n",
       "406901          0.0          0.0          0.0          0.0          0.0   \n",
       "379632          0.0          1.0          0.0          0.0          0.0   \n",
       "510084          0.0          0.0          0.0          0.0          0.0   \n",
       "96001           0.0          0.0          0.0          0.0          0.0   \n",
       "39684           0.0          0.0          0.0          0.0          0.0   \n",
       "227535          0.0          0.0          0.0          0.0          0.0   \n",
       "85578           1.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type37  Soil_Type38  Soil_Type39  Soil_Type40  \n",
       "484665          0.0          0.0          0.0          0.0  \n",
       "451137          0.0          0.0          0.0          0.0  \n",
       "239309          0.0          0.0          0.0          0.0  \n",
       "406901          0.0          0.0          0.0          0.0  \n",
       "379632          0.0          0.0          0.0          0.0  \n",
       "510084          0.0          0.0          0.0          0.0  \n",
       "96001           0.0          0.0          0.0          0.0  \n",
       "39684           0.0          0.0          1.0          0.0  \n",
       "227535          0.0          0.0          0.0          0.0  \n",
       "85578           0.0          0.0          0.0          0.0  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.concat([pd.DataFrame({'y': train_y}, index=train_X.index),\n",
    "           train_X], axis=1).tail(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Manual Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.decomposition import PCA\n",
    "from xgboost import XGBClassifier as XGBoost\n",
    "# Wrap the classes imported above as Lale operators (enables >> and schemas)\n",
    "lale.wrap_imported_operators()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"48pt\"\n",
       " viewBox=\"0.00 0.00 152.00 47.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 43.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-43.598 148,-43.598 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA(n_components=6)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.xgboost.xgb_classifier.html\" xlink:title=\"xg_boost = XGBoost(n_estimators=3)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>pca&#45;&gt;xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f4818fe9e10>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# >> is the pipe combinator: the output of PCA feeds into XGBoost\n",
    "manual_trainable = PCA(n_components=6) >> XGBoost(n_estimators=3)\n",
    "manual_trainable.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.39 s, sys: 953 ms, total: 3.34 s\n",
      "Wall time: 2.05 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "manual_trained = manual_trainable.fit(train_X, train_y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 67.1%\n"
     ]
    }
   ],
   "source": [
    "import sklearn.metrics\n",
    "manual_y = manual_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, manual_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hyperparameter Tuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'description': 'Number of trees to fit.',\n",
       " 'type': 'integer',\n",
       " 'default': 100,\n",
       " 'minimumForOptimizer': 50,\n",
       " 'maximumForOptimizer': 1000}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "XGBoost.hyperparam_schema('n_estimators')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\n"
     ]
    }
   ],
   "source": [
    "print(PCA.documentation_url())"
   ]
  },
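  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lale.lib.lale import Hyperopt\n",
    "import lale.schemas as schemas\n",
    "\n",
    "# Narrow the search ranges: PCA can keep at most all 54 input columns, and\n",
    "# n_estimators drops from its default search range of 50..1000 to 1..10\n",
    "CustomPCA = PCA.customize_schema(n_components=schemas.Int(minimum=2, maximum=54))\n",
    "CustomXGBoost = XGBoost.customize_schema(n_estimators=schemas.Int(minimum=1, maximum=10))\n",
    "\n",
    "# The operators are left unconfigured so that Hyperopt can choose their hyperparameters\n",
    "hpo_planned = CustomPCA >> CustomXGBoost\n",
    "hpo_trainable = Hyperopt(estimator=hpo_planned, max_evals=10, cv=3)"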
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|███████| 10/10 [02:15<00:00, 13.53s/trial, best loss: -0.7727907776451675]\n",
      "CPU times: user 2min 53s, sys: 19.3 s, total: 3min 13s\n",
      "Wall time: 2min 30s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "hpo_trained = hpo_trainable.fit(train_X, train_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### --- Excursion: Types as Search Spaces ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1208-loops.png\" style=\"width:700px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 77.7%\n"
     ]
    }
   ],
   "source": [
    "hpo_y = hpo_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, hpo_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspecting Automation Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"174pt\" height=\"65pt\"\n",
       " viewBox=\"0.00 0.00 174.11 64.57\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 60.5685)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-60.5685 170.108,-60.5685 170.108,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- custom_pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>custom_pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"custom_pca = CustomPCA(n_components=43, svd_solver=&#39;full&#39;, whiten=True)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"32.5269\" cy=\"-28.2843\" rx=\"32.5538\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"32.5269\" y=\"-31.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Custom&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"32.5269\" y=\"-19.4843\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- custom_xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>custom_xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.xgboost.xgb_classifier.html\" xlink:title=\"custom_xg_boost = CustomXGBoost(gamma=0.42208258595069725, learning_rate=0.6558019595096513, max_depth=5, min_child_weight=13, n_estimators=9, reg_alpha=0.3590229319214039, reg_lambda=0.7978279409450941, subsample=0.6209085649172931)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"133.581\" cy=\"-28.2843\" rx=\"32.5538\" ry=\"28.0702\"/>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-37.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Custom&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-25.4843\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-13.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- custom_pca&#45;&gt;custom_xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>custom_pca&#45;&gt;custom_xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M65.1405,-28.2843C73.2715,-28.2843 82.1469,-28.2843 90.7095,-28.2843\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"90.9278,-31.7844 100.928,-28.2843 90.9277,-24.7844 90.9278,-31.7844\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f47e4b2ee48>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline().visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "from sklearn.decomposition import PCA as CustomPCA\n",
       "from xgboost import XGBClassifier as CustomXGBoost\n",
       "import lale\n",
       "\n",
       "lale.wrap_imported_operators()\n",
       "custom_pca = CustomPCA.customize_schema(\n",
       "    n_components={\"type\": \"integer\", \"minimum\": 2, \"maximum\": 54}\n",
       ")(n_components=43, svd_solver=\"full\", whiten=True)\n",
       "custom_xg_boost = CustomXGBoost.customize_schema(\n",
       "    n_estimators={\"type\": \"integer\", \"minimum\": 1, \"maximum\": 10}\n",
       ")(\n",
       "    gamma=0.42208258595069725,\n",
       "    learning_rate=0.6558019595096513,\n",
       "    max_depth=5,\n",
       "    min_child_weight=13,\n",
       "    n_estimators=9,\n",
       "    reg_alpha=0.3590229319214039,\n",
       "    reg_lambda=0.7978279409450941,\n",
       "    subsample=0.6209085649172931,\n",
       ")\n",
       "pipeline = custom_pca >> custom_xg_boost\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline().pretty_print(ipython_display=True, customize_schema=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tid</th>\n",
       "      <th>loss</th>\n",
       "      <th>time</th>\n",
       "      <th>log_loss</th>\n",
       "      <th>status</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>p0</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.684229</td>\n",
       "      <td>2.293911</td>\n",
       "      <td>1.161776</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p1</th>\n",
       "      <td>1</td>\n",
       "      <td>-0.708057</td>\n",
       "      <td>3.347494</td>\n",
       "      <td>0.950058</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p2</th>\n",
       "      <td>2</td>\n",
       "      <td>-0.631983</td>\n",
       "      <td>3.356443</td>\n",
       "      <td>1.123108</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p3</th>\n",
       "      <td>3</td>\n",
       "      <td>-0.699050</td>\n",
       "      <td>2.606100</td>\n",
       "      <td>1.168528</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p4</th>\n",
       "      <td>4</td>\n",
       "      <td>-0.717428</td>\n",
       "      <td>5.158346</td>\n",
       "      <td>0.690650</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p5</th>\n",
       "      <td>5</td>\n",
       "      <td>-0.759653</td>\n",
       "      <td>7.138689</td>\n",
       "      <td>0.655658</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p6</th>\n",
       "      <td>6</td>\n",
       "      <td>-0.707598</td>\n",
       "      <td>3.555126</td>\n",
       "      <td>0.942210</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p7</th>\n",
       "      <td>7</td>\n",
       "      <td>-0.772791</td>\n",
       "      <td>10.981915</td>\n",
       "      <td>0.555780</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p8</th>\n",
       "      <td>8</td>\n",
       "      <td>-0.653057</td>\n",
       "      <td>2.016587</td>\n",
       "      <td>0.845659</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p9</th>\n",
       "      <td>9</td>\n",
       "      <td>-0.620853</td>\n",
       "      <td>2.155818</td>\n",
       "      <td>1.817853</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      tid      loss       time  log_loss status\n",
       "name                                           \n",
       "p0      0 -0.684229   2.293911  1.161776     ok\n",
       "p1      1 -0.708057   3.347494  0.950058     ok\n",
       "p2      2 -0.631983   3.356443  1.123108     ok\n",
       "p3      3 -0.699050   2.606100  1.168528     ok\n",
       "p4      4 -0.717428   5.158346  0.690650     ok\n",
       "p5      5 -0.759653   7.138689  0.655658     ok\n",
       "p6      6 -0.707598   3.555126  0.942210     ok\n",
       "p7      7 -0.772791  10.981915  0.555780     ok\n",
       "p8      8 -0.653057   2.016587  0.845659     ok\n",
       "p9      9 -0.620853   2.155818  1.817853     ok"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hpo_trained.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p9\n"
     ]
    }
   ],
   "source": [
    "# Hyperopt minimizes the loss, so the largest loss marks the worst trial;\n",
    "# idxmax returns the index label (the trial name) on every pandas version\n",
    "worst_name = hpo_trained.summary().loss.idxmax()\n",
    "print(worst_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"174pt\" height=\"65pt\"\n",
       " viewBox=\"0.00 0.00 174.11 64.57\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 60.5685)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-60.5685 170.108,-60.5685 170.108,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- custom_pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>custom_pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"custom_pca = CustomPCA(n_components=20, svd_solver=&#39;full&#39;, whiten=True)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"32.5269\" cy=\"-28.2843\" rx=\"32.5538\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"32.5269\" y=\"-31.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Custom&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"32.5269\" y=\"-19.4843\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- custom_xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>custom_xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.xgboost.xgb_classifier.html\" xlink:title=\"custom_xg_boost = CustomXGBoost(gamma=0.37068548766270437, learning_rate=0.02005982973762002, max_depth=2, min_child_weight=9, n_estimators=5, reg_alpha=0.8716519284632148, reg_lambda=0.7305593001592293, subsample=0.9559232064468288)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"133.581\" cy=\"-28.2843\" rx=\"32.5538\" ry=\"28.0702\"/>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-37.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Custom&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-25.4843\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-13.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- custom_pca&#45;&gt;custom_xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>custom_pca&#45;&gt;custom_xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M65.1405,-28.2843C73.2715,-28.2843 82.1469,-28.2843 90.7095,-28.2843\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"90.9278,-31.7844 100.928,-28.2843 90.9277,-24.7844 90.9278,-31.7844\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f48790f2dd8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "custom_pca = CustomPCA.customize_schema(\n",
       "    n_components={\"type\": \"integer\", \"minimum\": 2, \"maximum\": 54}\n",
       ")(n_components=20, svd_solver=\"full\", whiten=True)\n",
       "custom_xg_boost = CustomXGBoost.customize_schema(\n",
       "    n_estimators={\"type\": \"integer\", \"minimum\": 1, \"maximum\": 10}\n",
       ")(\n",
       "    gamma=0.37068548766270437,\n",
       "    learning_rate=0.02005982973762002,\n",
       "    max_depth=2,\n",
       "    min_child_weight=9,\n",
       "    n_estimators=5,\n",
       "    reg_alpha=0.8716519284632148,\n",
       "    reg_lambda=0.7305593001592293,\n",
       "    subsample=0.9559232064468288,\n",
       ")\n",
       "pipeline = custom_pca >> custom_xg_boost\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline(worst_name).visualize()\n",
    "hpo_trained.get_pipeline(worst_name).pretty_print(\n",
    "    ipython_display=True, show_imports=False, customize_schema=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Combined Algorithm Selection and Hyperparameter Tuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"184pt\" height=\"185pt\"\n",
       " viewBox=\"0.00 0.00 184.00 185.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 181)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-181 180,-181 180,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\"><title>cluster:choice_0</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice_0 = norm | no_op\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"8,-47 8,-169 78,-169 78,-47 8,-47\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<g id=\"clust2\" class=\"cluster\"><title>cluster:choice_1</title>\n",
       "<g id=\"a_clust2\"><a xlink:title=\"choice_1 = tree | lr | knn\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"98,-8 98,-169 168,-169 168,-8 98,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm -->\n",
       "<g id=\"node1\" class=\"node\"><title>norm</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.normalizer.html\" xlink:title=\"norm = Norm\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"43\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">Norm</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- tree -->\n",
       "<g id=\"node3\" class=\"node\"><title>tree</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.decision_tree_classifier.html\" xlink:title=\"tree = Tree\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">Tree</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm&#45;&gt;tree -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>norm&#45;&gt;tree</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M77.7296,-120C83.6523,-120 89.838,-120 95.8241,-120\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"88.0002,-123.5 98,-120 87.9998,-116.5 88.0002,-123.5\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node2\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"43\" cy=\"-75\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-66.2\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- lr -->\n",
       "<g id=\"node4\" class=\"node\"><title>lr</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html\" xlink:title=\"lr = LR(solver=&#39;liblinear&#39;)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"133\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-74.2\" font-family=\"Times,serif\" font-size=\"11.00\">LR</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node5\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.k_neighbors_classifier.html\" xlink:title=\"knn = KNN\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-34\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-31.2\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f487925c470>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.preprocessing import Normalizer as Norm\n",
    "from sklearn.linear_model import LogisticRegression as LR\n",
    "from sklearn.tree import DecisionTreeClassifier as Tree\n",
    "from sklearn.neighbors import KNeighborsClassifier as KNN\n",
    "from lale.lib.lale import NoOp\n",
    "lale.wrap_imported_operators()\n",
    "\n",
    "KNN = KNN.customize_schema(n_neighbors=schemas.Int(minimum=1, maximum=10))\n",
    "# | is the choice combinator: the optimizer picks one operator at each step\n",
    "transp_planned = (Norm | NoOp) >> (Tree | LR(solver='liblinear') | KNN)\n",
    "transp_planned.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|█████████| 3/3 [01:24<00:00, 28.25s/trial, best loss: -0.8390927596840342]\n",
      "CPU times: user 1min 26s, sys: 953 ms, total: 1min 27s\n",
      "Wall time: 1min 26s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# auto_configure runs the optimizer and returns the best pipeline found, already trained\n",
    "transp_trained = transp_planned.auto_configure(\n",
    "    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### --- Excursion: Bindings as Lifecycle ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-bindings.png\" style=\"width:450px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "knn = KNN.customize_schema(\n",
       "    n_neighbors={\"type\": \"integer\", \"minimum\": 1, \"maximum\": 10}\n",
       ")(algorithm=\"ball_tree\", metric=\"manhattan\", n_neighbors=9)\n",
       "pipeline = NoOp() >> knn\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"48pt\"\n",
       " viewBox=\"0.00 0.00 152.00 47.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 43.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-43.598 148,-43.598 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node1\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node2\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.k_neighbors_classifier.html\" xlink:title=\"knn = KNN(algorithm=&#39;ball_tree&#39;, metric=&#39;manhattan&#39;, n_neighbors=9)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;knn -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>no_op&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f47e4aa4518>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "transp_trained.pretty_print(\n",
    "    ipython_display=True, show_imports=False, customize_schema=True)\n",
    "transp_trained.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 86.6%\n",
      "CPU times: user 51.5 s, sys: 31.2 ms, total: 51.5 s\n",
      "Wall time: 52 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "transp_y = transp_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, transp_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Non-Linear Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'description': 'Features of forest covertypes dataset (classification).',\n",
       " 'documentation_url': 'https://scikit-learn.org/0.20/datasets/index.html#forest-covertypes',\n",
       " 'type': 'array',\n",
       " 'items': {'type': 'array',\n",
       "  'minItems': 54,\n",
       "  'maxItems': 54,\n",
       "  'items': [{'description': 'Elevation', 'type': 'integer'},\n",
       "   {'description': 'Aspect', 'type': 'integer'},\n",
       "   {'description': 'Slope', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Hydrology', 'type': 'integer'},\n",
       "   {'description': 'Vertical_Distance_To_Hydrology', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Roadways', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_9am', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_Noon', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_3pm', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Fire_Points', 'type': 'integer'},\n",
       "   {'description': 'Wilderness_Area1', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area2', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area3', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area4', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type1', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type2', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type3', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type4', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type5', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type6', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type7', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type8', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type9', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type10', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type11', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type12', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type13', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type14', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type15', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type16', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type17', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type18', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type19', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type20', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type21', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type22', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type23', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type24', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type25', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type26', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type27', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type28', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type29', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type30', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type31', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type32', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type33', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type34', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type35', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type36', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type37', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type38', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type39', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type40', 'enum': [0, 1]}]},\n",
       " 'minItems': 58102,\n",
       " 'maxItems': 58102}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_X.json_schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Wilderness_Area1', 'Wilderness_Area2', 'Wilderness_Area3', 'Wilderness_Area4', 'Soil_Type1', 'Soil_Type2', 'Soil_Type3', 'Soil_Type4', 'Soil_Type5', 'Soil_Type6', 'Soil_Type7', 'Soil_Type8', 'Soil_Type9', 'Soil_Type10', 'Soil_Type11', 'Soil_Type12', 'Soil_Type13', 'Soil_Type14', 'Soil_Type15', 'Soil_Type16', 'Soil_Type17', 'Soil_Type18', 'Soil_Type19', 'Soil_Type20', 'Soil_Type21', 'Soil_Type22', 'Soil_Type23', 'Soil_Type24', 'Soil_Type25', 'Soil_Type26', 'Soil_Type27', 'Soil_Type28', 'Soil_Type29', 'Soil_Type30', 'Soil_Type31', 'Soil_Type32', 'Soil_Type33', 'Soil_Type34', 'Soil_Type35', 'Soil_Type36', 'Soil_Type37', 'Soil_Type38', 'Soil_Type39', 'Soil_Type40']\n"
     ]
    }
   ],
   "source": [
    "from lale.lib.lale import categorical\n",
    "print(categorical(max_values=2)(test_X))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"332pt\" height=\"186pt\"\n",
       " viewBox=\"0.00 0.00 332.00 185.80\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 181.799)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-181.799 328,-181.799 328,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\"><title>cluster:choice</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice = norm | no_op\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"82,-8 82,-130 152,-130 152,-8 82,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-114.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0 -->\n",
       "<g id=\"node1\" class=\"node\"><title>project_0</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_0 = Project(columns=lale.lib.lale.categorical(max_values=2))\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-158\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-155.2\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel -->\n",
       "<g id=\"node2\" class=\"node\"><title>feat_sel</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.select_k_best.html\" xlink:title=\"feat_sel = FeatSel\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"117\" cy=\"-158\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-161.2\" font-family=\"Times,serif\" font-size=\"11.00\">Feat&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-149.2\" font-family=\"Times,serif\" font-size=\"11.00\">Sel</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0&#45;&gt;feat_sel -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>project_0&#45;&gt;feat_sel</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-158C62.3932,-158 71.3106,-158 79.8241,-158\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-161.5 89.919,-158 79.919,-154.5 79.919,-161.5\"/>\n",
       "</g>\n",
       "<!-- concat -->\n",
       "<g id=\"node6\" class=\"node\"><title>concat</title>\n",
       "<g id=\"a_node6\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.rasl.concat_features.html\" xlink:title=\"concat = Concat\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"207\" cy=\"-119\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-116.2\" font-family=\"Times,serif\" font-size=\"11.00\">Concat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel&#45;&gt;concat -->\n",
       "<g id=\"edge3\" class=\"edge\"><title>feat_sel&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M140.662,-147.957C150.982,-143.383 163.367,-137.894 174.567,-132.93\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"176.298,-135.992 184.023,-128.74 173.462,-129.592 176.298,-135.992\"/>\n",
       "</g>\n",
       "<!-- project_1 -->\n",
       "<g id=\"node3\" class=\"node\"><title>project_1</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_1 = Project(drop_columns=lale.lib.lale.categorical(max_values=2))\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-81\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm -->\n",
       "<g id=\"node4\" class=\"node\"><title>norm</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.normalizer.html\" xlink:title=\"norm = Norm\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"117\" cy=\"-81\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">Norm</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_1&#45;&gt;norm -->\n",
       "<g id=\"edge2\" class=\"edge\"><title>project_1&#45;&gt;norm</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-81C62.3932,-81 71.3106,-81 79.8241,-81\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"72.0002,-84.5005 82,-81 71.9998,-77.5005 72.0002,-84.5005\"/>\n",
       "</g>\n",
       "<!-- norm&#45;&gt;concat -->\n",
       "<g id=\"edge4\" class=\"edge\"><title>norm&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M151.589,-95.5043C159.149,-98.7687 167.143,-102.221 174.609,-105.445\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"173.233,-108.663 183.801,-109.414 176.008,-102.236 173.233,-108.663\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node5\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-36\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-39.2\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-27.2\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node7\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node7\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.k_neighbors_classifier.html\" xlink:title=\"knn = KNN\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"297\" cy=\"-119\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"297\" y=\"-116.2\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- concat&#45;&gt;knn -->\n",
       "<g id=\"edge5\" class=\"edge\"><title>concat&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M234.403,-119C242.393,-119 251.311,-119 259.824,-119\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"259.919,-122.5 269.919,-119 259.919,-115.5 259.919,-122.5\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f47e4b2e0f0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from lale.lib.lale import Project\n",
    "from lale.lib.lale import ConcatFeatures as Concat\n",
    "from sklearn.feature_selection import SelectKBest as FeatSel\n",
    "lale.wrap_imported_operators()\n",
    "\n",
    "binary_prep = Project(columns=categorical(max_values=2)) >> FeatSel\n",
    "other_prep = Project(drop_columns=categorical(max_values=2)) >> (Norm | NoOp)\n",
    "nonlin_planned = (binary_prep & other_prep) >> Concat >> KNN\n",
    "nonlin_planned.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|█████████| 3/3 [02:17<00:00, 45.88s/trial, best loss: -0.8620412868709595]\n",
      "CPU times: user 2min 18s, sys: 359 ms, total: 2min 19s\n",
      "Wall time: 2min 21s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "nonlin_trained = nonlin_planned.auto_configure(\n",
    "    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=3, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### --- Excursion: Combinators ---\n",
    "\n",
    "| Lale feature            | Name | Description  | Scikit-learn feature                |\n",
    "| ----------------------- | ---- | ------------ | ----------------------------------- |\n",
    "| >> or `make_pipeline`   | pipe | feed to next | `make_pipeline`                     |\n",
    "| & or `make_union`       | and  | run both     | `make_union` or `ColumnTransformer` |\n",
    "| &#x7c; or `make_choice` | or   | choose one   | N/A (specific to given AutoML tool) |\n",
    "\n",
    "### --- Excursion: Interoperability ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-interop.png\" style=\"width:550px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"332pt\" height=\"95pt\"\n",
       " viewBox=\"0.00 0.00 332.00 94.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 90.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-90.598 328,-90.598 328,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- project_0 -->\n",
       "<g id=\"node1\" class=\"node\"><title>project_0</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_0 = Project(columns=lale.lib.lale.categorical(max_values=2))\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-66.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-63.999\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel -->\n",
       "<g id=\"node2\" class=\"node\"><title>feat_sel</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.select_k_best.html\" xlink:title=\"feat_sel = FeatSel(k=8)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-66.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-69.999\" font-family=\"Times,serif\" font-size=\"11.00\">Feat&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-57.999\" font-family=\"Times,serif\" font-size=\"11.00\">Sel</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0&#45;&gt;feat_sel -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>project_0&#45;&gt;feat_sel</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-66.799C62.3932,-66.799 71.3106,-66.799 79.8241,-66.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-70.2991 89.919,-66.799 79.919,-63.2991 79.919,-70.2991\"/>\n",
       "</g>\n",
       "<!-- concat -->\n",
       "<g id=\"node5\" class=\"node\"><title>concat</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.rasl.concat_features.html\" xlink:title=\"concat = Concat()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"207\" cy=\"-42.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-39.999\" font-family=\"Times,serif\" font-size=\"11.00\">Concat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel&#45;&gt;concat -->\n",
       "<g id=\"edge3\" class=\"edge\"><title>feat_sel&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M142.513,-60.1137C151.572,-57.6431 162.012,-54.7958 171.775,-52.1331\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"172.892,-55.4564 181.619,-49.4485 171.05,-48.703 172.892,-55.4564\"/>\n",
       "</g>\n",
       "<!-- project_1 -->\n",
       "<g id=\"node3\" class=\"node\"><title>project_1</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_1 = Project(drop_columns=lale.lib.lale.categorical(max_values=2))\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node4\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_1&#45;&gt;no_op -->\n",
       "<g id=\"edge2\" class=\"edge\"><title>project_1&#45;&gt;no_op</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;concat -->\n",
       "<g id=\"edge4\" class=\"edge\"><title>no_op&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M142.982,-26.3283C151.958,-28.6743 162.241,-31.362 171.861,-33.8763\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"171.004,-37.2697 181.564,-36.4123 172.774,-30.4972 171.004,-37.2697\"/>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node6\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node6\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.k_neighbors_classifier.html\" xlink:title=\"knn = KNN(algorithm=&#39;kd_tree&#39;, n_neighbors=7, weights=&#39;distance&#39;)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"297\" cy=\"-42.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"297\" y=\"-39.999\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- concat&#45;&gt;knn -->\n",
       "<g id=\"edge5\" class=\"edge\"><title>concat&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M234.403,-42.799C242.393,-42.799 251.311,-42.799 259.824,-42.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"259.919,-46.2991 269.919,-42.799 259.919,-39.2991 259.919,-46.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f47e07c8748>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "project_0 = Project(columns=lale.lib.lale.categorical(max_values=2))\n",
       "feat_sel = FeatSel(k=8)\n",
       "pipeline_0 = make_pipeline(project_0, feat_sel)\n",
       "project_1 = Project(drop_columns=lale.lib.lale.categorical(max_values=2))\n",
       "pipeline_1 = make_pipeline(project_1, NoOp())\n",
       "union = make_union(pipeline_0, pipeline_1)\n",
       "knn = KNN(algorithm=\"kd_tree\", n_neighbors=7, weights=\"distance\")\n",
       "pipeline = make_pipeline(union, knn)\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "nonlin_trained.visualize()\n",
    "nonlin_trained.pretty_print(ipython_display=True, show_imports=False, combinators=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 88.6%\n",
      "CPU times: user 4.31 s, sys: 46.9 ms, total: 4.36 s\n",
      "Wall time: 4.44 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "nonlin_y = nonlin_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, nonlin_y):.1%}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Wilderness_Area1</th>\n",
       "      <th>Wilderness_Area4</th>\n",
       "      <th>Soil_Type2</th>\n",
       "      <th>Soil_Type3</th>\n",
       "      <th>Soil_Type4</th>\n",
       "      <th>Soil_Type10</th>\n",
       "      <th>Soil_Type38</th>\n",
       "      <th>Soil_Type39</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Wilderness_Area1  Wilderness_Area4  Soil_Type2  Soil_Type3  Soil_Type4  \\\n",
       "0               1.0               0.0         0.0         0.0         0.0   \n",
       "1               0.0               0.0         0.0         0.0         0.0   \n",
       "2               0.0               0.0         0.0         0.0         0.0   \n",
       "3               1.0               0.0         0.0         0.0         0.0   \n",
       "4               1.0               0.0         0.0         0.0         0.0   \n",
       "5               0.0               1.0         0.0         0.0         0.0   \n",
       "6               1.0               0.0         0.0         0.0         0.0   \n",
       "7               1.0               0.0         0.0         0.0         0.0   \n",
       "8               1.0               0.0         0.0         0.0         0.0   \n",
       "9               1.0               0.0         0.0         0.0         0.0   \n",
       "\n",
       "   Soil_Type10  Soil_Type38  Soil_Type39  \n",
       "0          0.0          1.0          0.0  \n",
       "1          0.0          0.0          0.0  \n",
       "2          0.0          0.0          0.0  \n",
       "3          0.0          0.0          0.0  \n",
       "4          0.0          0.0          0.0  \n",
       "5          1.0          0.0          0.0  \n",
       "6          0.0          0.0          0.0  \n",
       "7          0.0          1.0          0.0  \n",
       "8          0.0          0.0          0.0  \n",
       "9          0.0          0.0          0.0  "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
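    "# Keep only columns with at most 2 distinct values (binary), then select 8 of them.\n",
    "binary_prep_trainable = Project(columns=categorical(max_values=2)) >> FeatSel(k=8)\n",
    "binary_prep_trained = binary_prep_trainable.fit(train_X, train_y)\n",
    "# Preview the transformed features for the first 10 test rows.\n",
    "binary_prep_trained.transform(test_X.head(10))"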
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "- code and documentation: https://github.com/ibm/lale\n",
    "- more examples: https://github.com/IBM/lale/tree/master/examples/\n",
    "- frequently asked questions: https://github.com/IBM/lale/blob/master/docs/faq.rst\n",
    "- arXiv paper: https://arxiv.org/pdf/1906.03957.pdf\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-summary.png\" style=\"width:350px\" align=\"left\">"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}