{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lale and its Impact on the Data Science Workflow\n",
    "\n",
    "Guillaume Baudart, Martin Hirzel, Kiran Kate, Pari Ram, and Avi Shinnar\n",
    "\n",
    "27 March 2020\n",
    "\n",
    "Examples, documentation, code: https://github.com/ibm/lale\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/docs/img/lale_logo.jpg\" alt=\"logo\" width=\"140px\" align=\"left\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Value Proposition\n",
    "\n",
    "- **target user**: data scientist familiar with Python and scikit-learn\n",
    "- **scope**: data preparation and machine learning (including some deep learning)\n",
    "- **value**: consistent API for both manual machine learning and auto-ML\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-three-values.png\" style=\"width:350px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: lale in /home/hirzel/python3.6venv/lib/python3.6/site-packages (0.3.5)\n",
      "Requirement already satisfied: lightgbm in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (2.2.3)\n",
      "Requirement already satisfied: astunparse in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (1.6.2)\n",
      "Requirement already satisfied: hyperopt==0.2.3 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (0.2.3)\n",
      "Requirement already satisfied: pandas<=0.25.3 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (0.25.0)\n",
      "Requirement already satisfied: xgboost in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (0.90)\n",
      "Requirement already satisfied: jsonsubschema in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (0.0.0)\n",
      "Requirement already satisfied: jsonschema in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (3.2.0)\n",
      "Requirement already satisfied: h5py in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (2.9.0)\n",
      "Requirement already satisfied: scikit-learn==0.20.3 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (0.20.3)\n",
      "Requirement already satisfied: scipy in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (1.3.0)\n",
      "Requirement already satisfied: numpy in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (1.17.0)\n",
      "Requirement already satisfied: graphviz in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (0.11.1)\n",
      "Requirement already satisfied: decorator in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from lale) (4.4.0)\n",
      "Requirement already satisfied: six<2.0,>=1.6.1 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from astunparse->lale) (1.12.0)\n",
      "Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from astunparse->lale) (0.33.4)\n",
      "Requirement already satisfied: future in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from hyperopt==0.2.3->lale) (0.17.1)\n",
      "Requirement already satisfied: cloudpickle in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from hyperopt==0.2.3->lale) (1.3.0)\n",
      "Requirement already satisfied: tqdm in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from hyperopt==0.2.3->lale) (4.32.2)\n",
      "Requirement already satisfied: networkx==2.2 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from hyperopt==0.2.3->lale) (2.2)\n",
      "Requirement already satisfied: pytz>=2017.2 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from pandas<=0.25.3->lale) (2019.1)\n",
      "Requirement already satisfied: python-dateutil>=2.6.1 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from pandas<=0.25.3->lale) (2.8.0)\n",
      "Requirement already satisfied: python-intervals in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from jsonsubschema->lale) (1.8.0)\n",
      "Requirement already satisfied: greenery in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from jsonsubschema->lale) (3.1)\n",
      "Requirement already satisfied: pyrsistent>=0.14.0 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from jsonschema->lale) (0.15.7)\n",
      "Requirement already satisfied: setuptools in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from jsonschema->lale) (41.0.1)\n",
      "Requirement already satisfied: attrs>=17.4.0 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from jsonschema->lale) (19.1.0)\n",
      "Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from jsonschema->lale) (1.3.0)\n",
      "Requirement already satisfied: zipp>=0.5 in /home/hirzel/python3.6venv/lib/python3.6/site-packages (from importlib-metadata; python_version < \"3.8\"->jsonschema->lale) (0.5.2)\n"
     ]
    }
   ],
   "source": [
    "!pip install lale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape train_X_all (522910, 54), test_X (58102, 54)\n"
     ]
    }
   ],
   "source": [
    "import lale.datasets\n",
    "# Load the covtype (forest cover type) dataset, holding out 10% of the rows as the test set.\n",
    "(train_X_all, train_y_all), (test_X, test_y) = lale.datasets.covtype_df(test_size=0.1)\n",
    "print(f'shape train_X_all {train_X_all.shape}, test_X {test_X.shape}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape train_X (52291, 54), other_X (470619, 54)\n"
     ]
    }
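   ],
   "source": [
    "import sklearn.model_selection\n",
    "# Keep a random 10% of the training rows in train_X; the other 90% go to other_X.\n",
    "# No random_state is passed, so the sampled rows differ from run to run.\n",
    "train_X, other_X, train_y, other_y = sklearn.model_selection.train_test_split(\n",
    "    train_X_all, train_y_all, test_size=0.9)\n",
    "print(f'shape train_X {train_X.shape}, other_X {other_X.shape}')"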
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y</th>\n",
       "      <th>Elevation</th>\n",
       "      <th>Aspect</th>\n",
       "      <th>Slope</th>\n",
       "      <th>Horizontal_Distance_To_Hydrology</th>\n",
       "      <th>Vertical_Distance_To_Hydrology</th>\n",
       "      <th>Horizontal_Distance_To_Roadways</th>\n",
       "      <th>Hillshade_9am</th>\n",
       "      <th>Hillshade_Noon</th>\n",
       "      <th>Hillshade_3pm</th>\n",
       "      <th>Horizontal_Distance_To_Fire_Points</th>\n",
       "      <th>Wilderness_Area1</th>\n",
       "      <th>Wilderness_Area2</th>\n",
       "      <th>Wilderness_Area3</th>\n",
       "      <th>Wilderness_Area4</th>\n",
       "      <th>Soil_Type1</th>\n",
       "      <th>Soil_Type2</th>\n",
       "      <th>Soil_Type3</th>\n",
       "      <th>Soil_Type4</th>\n",
       "      <th>Soil_Type5</th>\n",
       "      <th>Soil_Type6</th>\n",
       "      <th>Soil_Type7</th>\n",
       "      <th>Soil_Type8</th>\n",
       "      <th>Soil_Type9</th>\n",
       "      <th>Soil_Type10</th>\n",
       "      <th>Soil_Type11</th>\n",
       "      <th>Soil_Type12</th>\n",
       "      <th>Soil_Type13</th>\n",
       "      <th>Soil_Type14</th>\n",
       "      <th>Soil_Type15</th>\n",
       "      <th>Soil_Type16</th>\n",
       "      <th>Soil_Type17</th>\n",
       "      <th>Soil_Type18</th>\n",
       "      <th>Soil_Type19</th>\n",
       "      <th>Soil_Type20</th>\n",
       "      <th>Soil_Type21</th>\n",
       "      <th>Soil_Type22</th>\n",
       "      <th>Soil_Type23</th>\n",
       "      <th>Soil_Type24</th>\n",
       "      <th>Soil_Type25</th>\n",
       "      <th>Soil_Type26</th>\n",
       "      <th>Soil_Type27</th>\n",
       "      <th>Soil_Type28</th>\n",
       "      <th>Soil_Type29</th>\n",
       "      <th>Soil_Type30</th>\n",
       "      <th>Soil_Type31</th>\n",
       "      <th>Soil_Type32</th>\n",
       "      <th>Soil_Type33</th>\n",
       "      <th>Soil_Type34</th>\n",
       "      <th>Soil_Type35</th>\n",
       "      <th>Soil_Type36</th>\n",
       "      <th>Soil_Type37</th>\n",
       "      <th>Soil_Type38</th>\n",
       "      <th>Soil_Type39</th>\n",
       "      <th>Soil_Type40</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>325384</th>\n",
       "      <td>2</td>\n",
       "      <td>3064.0</td>\n",
       "      <td>86.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>702.0</td>\n",
       "      <td>259.0</td>\n",
       "      <td>721.0</td>\n",
       "      <td>247.0</td>\n",
       "      <td>189.0</td>\n",
       "      <td>56.0</td>\n",
       "      <td>1714.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>442177</th>\n",
       "      <td>1</td>\n",
       "      <td>3277.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>454.0</td>\n",
       "      <td>70.0</td>\n",
       "      <td>1570.0</td>\n",
       "      <td>215.0</td>\n",
       "      <td>206.0</td>\n",
       "      <td>124.0</td>\n",
       "      <td>2754.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>185316</th>\n",
       "      <td>2</td>\n",
       "      <td>3138.0</td>\n",
       "      <td>257.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>228.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>5649.0</td>\n",
       "      <td>185.0</td>\n",
       "      <td>248.0</td>\n",
       "      <td>200.0</td>\n",
       "      <td>3051.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>189541</th>\n",
       "      <td>3</td>\n",
       "      <td>2317.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>42.0</td>\n",
       "      <td>644.0</td>\n",
       "      <td>231.0</td>\n",
       "      <td>240.0</td>\n",
       "      <td>141.0</td>\n",
       "      <td>781.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>428374</th>\n",
       "      <td>2</td>\n",
       "      <td>2970.0</td>\n",
       "      <td>47.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>319.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>1919.0</td>\n",
       "      <td>220.0</td>\n",
       "      <td>178.0</td>\n",
       "      <td>80.0</td>\n",
       "      <td>3060.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>234638</th>\n",
       "      <td>1</td>\n",
       "      <td>3278.0</td>\n",
       "      <td>335.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>360.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>5763.0</td>\n",
       "      <td>209.0</td>\n",
       "      <td>233.0</td>\n",
       "      <td>163.0</td>\n",
       "      <td>646.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172207</th>\n",
       "      <td>1</td>\n",
       "      <td>3175.0</td>\n",
       "      <td>343.0</td>\n",
       "      <td>17.0</td>\n",
       "      <td>162.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>4395.0</td>\n",
       "      <td>183.0</td>\n",
       "      <td>212.0</td>\n",
       "      <td>166.0</td>\n",
       "      <td>2965.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>240801</th>\n",
       "      <td>1</td>\n",
       "      <td>3355.0</td>\n",
       "      <td>346.0</td>\n",
       "      <td>16.0</td>\n",
       "      <td>180.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>1922.0</td>\n",
       "      <td>188.0</td>\n",
       "      <td>213.0</td>\n",
       "      <td>163.0</td>\n",
       "      <td>4906.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>435277</th>\n",
       "      <td>1</td>\n",
       "      <td>3154.0</td>\n",
       "      <td>316.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>339.0</td>\n",
       "      <td>122.0</td>\n",
       "      <td>2688.0</td>\n",
       "      <td>143.0</td>\n",
       "      <td>209.0</td>\n",
       "      <td>201.0</td>\n",
       "      <td>2720.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>297100</th>\n",
       "      <td>7</td>\n",
       "      <td>3344.0</td>\n",
       "      <td>313.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>4317.0</td>\n",
       "      <td>163.0</td>\n",
       "      <td>221.0</td>\n",
       "      <td>196.0</td>\n",
       "      <td>4092.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        y  Elevation  Aspect  Slope  Horizontal_Distance_To_Hydrology  \\\n",
       "325384  2     3064.0    86.0   25.0                             702.0   \n",
       "442177  1     3277.0    31.0   15.0                             454.0   \n",
       "185316  2     3138.0   257.0   14.0                             228.0   \n",
       "189541  3     2317.0   150.0    8.0                             150.0   \n",
       "428374  2     2970.0    47.0   25.0                             319.0   \n",
       "234638  1     3278.0   335.0    5.0                             360.0   \n",
       "172207  1     3175.0   343.0   17.0                             162.0   \n",
       "240801  1     3355.0   346.0   16.0                             180.0   \n",
       "435277  1     3154.0   316.0   26.0                             339.0   \n",
       "297100  7     3344.0   313.0   20.0                               0.0   \n",
       "\n",
       "        Vertical_Distance_To_Hydrology  Horizontal_Distance_To_Roadways  \\\n",
       "325384                           259.0                            721.0   \n",
       "442177                            70.0                           1570.0   \n",
       "185316                            30.0                           5649.0   \n",
       "189541                            42.0                            644.0   \n",
       "428374                           100.0                           1919.0   \n",
       "234638                            35.0                           5763.0   \n",
       "172207                             3.0                           4395.0   \n",
       "240801                             6.0                           1922.0   \n",
       "435277                           122.0                           2688.0   \n",
       "297100                             0.0                           4317.0   \n",
       "\n",
       "        Hillshade_9am  Hillshade_Noon  Hillshade_3pm  \\\n",
       "325384          247.0           189.0           56.0   \n",
       "442177          215.0           206.0          124.0   \n",
       "185316          185.0           248.0          200.0   \n",
       "189541          231.0           240.0          141.0   \n",
       "428374          220.0           178.0           80.0   \n",
       "234638          209.0           233.0          163.0   \n",
       "172207          183.0           212.0          166.0   \n",
       "240801          188.0           213.0          163.0   \n",
       "435277          143.0           209.0          201.0   \n",
       "297100          163.0           221.0          196.0   \n",
       "\n",
       "        Horizontal_Distance_To_Fire_Points  Wilderness_Area1  \\\n",
       "325384                              1714.0               1.0   \n",
       "442177                              2754.0               0.0   \n",
       "185316                              3051.0               1.0   \n",
       "189541                               781.0               0.0   \n",
       "428374                              3060.0               0.0   \n",
       "234638                               646.0               1.0   \n",
       "172207                              2965.0               1.0   \n",
       "240801                              4906.0               0.0   \n",
       "435277                              2720.0               1.0   \n",
       "297100                              4092.0               1.0   \n",
       "\n",
       "        Wilderness_Area2  Wilderness_Area3  Wilderness_Area4  Soil_Type1  \\\n",
       "325384               0.0               0.0               0.0         0.0   \n",
       "442177               0.0               1.0               0.0         0.0   \n",
       "185316               0.0               0.0               0.0         0.0   \n",
       "189541               0.0               0.0               1.0         0.0   \n",
       "428374               0.0               1.0               0.0         0.0   \n",
       "234638               0.0               0.0               0.0         0.0   \n",
       "172207               0.0               0.0               0.0         0.0   \n",
       "240801               1.0               0.0               0.0         0.0   \n",
       "435277               0.0               0.0               0.0         0.0   \n",
       "297100               0.0               0.0               0.0         0.0   \n",
       "\n",
       "        Soil_Type2  Soil_Type3  Soil_Type4  Soil_Type5  Soil_Type6  \\\n",
       "325384         0.0         0.0         0.0         0.0         0.0   \n",
       "442177         0.0         0.0         0.0         0.0         0.0   \n",
       "185316         0.0         0.0         0.0         0.0         0.0   \n",
       "189541         0.0         0.0         1.0         0.0         0.0   \n",
       "428374         0.0         0.0         0.0         0.0         0.0   \n",
       "234638         0.0         0.0         0.0         0.0         0.0   \n",
       "172207         0.0         0.0         0.0         0.0         0.0   \n",
       "240801         0.0         0.0         0.0         0.0         0.0   \n",
       "435277         0.0         0.0         0.0         0.0         0.0   \n",
       "297100         0.0         0.0         0.0         0.0         0.0   \n",
       "\n",
       "        Soil_Type7  Soil_Type8  Soil_Type9  Soil_Type10  Soil_Type11  \\\n",
       "325384         0.0         0.0         0.0          0.0          0.0   \n",
       "442177         0.0         0.0         0.0          0.0          0.0   \n",
       "185316         0.0         0.0         0.0          0.0          0.0   \n",
       "189541         0.0         0.0         0.0          0.0          0.0   \n",
       "428374         0.0         0.0         0.0          0.0          0.0   \n",
       "234638         0.0         0.0         0.0          0.0          0.0   \n",
       "172207         0.0         0.0         0.0          0.0          0.0   \n",
       "240801         0.0         0.0         0.0          0.0          0.0   \n",
       "435277         0.0         0.0         0.0          0.0          0.0   \n",
       "297100         0.0         0.0         0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type12  Soil_Type13  Soil_Type14  Soil_Type15  Soil_Type16  \\\n",
       "325384          0.0          0.0          0.0          0.0          0.0   \n",
       "442177          0.0          0.0          0.0          0.0          0.0   \n",
       "185316          0.0          0.0          0.0          0.0          0.0   \n",
       "189541          0.0          0.0          0.0          0.0          0.0   \n",
       "428374          0.0          0.0          0.0          0.0          0.0   \n",
       "234638          0.0          0.0          0.0          0.0          0.0   \n",
       "172207          0.0          0.0          0.0          0.0          0.0   \n",
       "240801          0.0          0.0          0.0          0.0          0.0   \n",
       "435277          0.0          0.0          0.0          0.0          0.0   \n",
       "297100          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type17  Soil_Type18  Soil_Type19  Soil_Type20  Soil_Type21  \\\n",
       "325384          0.0          0.0          0.0          0.0          0.0   \n",
       "442177          0.0          0.0          0.0          0.0          0.0   \n",
       "185316          0.0          0.0          0.0          0.0          0.0   \n",
       "189541          0.0          0.0          0.0          0.0          0.0   \n",
       "428374          0.0          0.0          0.0          0.0          0.0   \n",
       "234638          0.0          0.0          0.0          0.0          0.0   \n",
       "172207          0.0          0.0          0.0          0.0          0.0   \n",
       "240801          0.0          0.0          0.0          0.0          0.0   \n",
       "435277          0.0          0.0          0.0          0.0          0.0   \n",
       "297100          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type22  Soil_Type23  Soil_Type24  Soil_Type25  Soil_Type26  \\\n",
       "325384          0.0          0.0          0.0          0.0          0.0   \n",
       "442177          0.0          0.0          0.0          0.0          0.0   \n",
       "185316          1.0          0.0          0.0          0.0          0.0   \n",
       "189541          0.0          0.0          0.0          0.0          0.0   \n",
       "428374          0.0          0.0          0.0          0.0          0.0   \n",
       "234638          1.0          0.0          0.0          0.0          0.0   \n",
       "172207          0.0          1.0          0.0          0.0          0.0   \n",
       "240801          0.0          0.0          0.0          0.0          0.0   \n",
       "435277          0.0          0.0          0.0          0.0          0.0   \n",
       "297100          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type27  Soil_Type28  Soil_Type29  Soil_Type30  Soil_Type31  \\\n",
       "325384          0.0          0.0          0.0          1.0          0.0   \n",
       "442177          0.0          0.0          0.0          0.0          0.0   \n",
       "185316          0.0          0.0          0.0          0.0          0.0   \n",
       "189541          0.0          0.0          0.0          0.0          0.0   \n",
       "428374          0.0          0.0          0.0          0.0          0.0   \n",
       "234638          0.0          0.0          0.0          0.0          0.0   \n",
       "172207          0.0          0.0          0.0          0.0          0.0   \n",
       "240801          0.0          0.0          0.0          0.0          0.0   \n",
       "435277          0.0          0.0          1.0          0.0          0.0   \n",
       "297100          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type32  Soil_Type33  Soil_Type34  Soil_Type35  Soil_Type36  \\\n",
       "325384          0.0          0.0          0.0          0.0          0.0   \n",
       "442177          1.0          0.0          0.0          0.0          0.0   \n",
       "185316          0.0          0.0          0.0          0.0          0.0   \n",
       "189541          0.0          0.0          0.0          0.0          0.0   \n",
       "428374          1.0          0.0          0.0          0.0          0.0   \n",
       "234638          0.0          0.0          0.0          0.0          0.0   \n",
       "172207          0.0          0.0          0.0          0.0          0.0   \n",
       "240801          0.0          0.0          0.0          0.0          0.0   \n",
       "435277          0.0          0.0          0.0          0.0          0.0   \n",
       "297100          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type37  Soil_Type38  Soil_Type39  Soil_Type40  \n",
       "325384          0.0          0.0          0.0          0.0  \n",
       "442177          0.0          0.0          0.0          0.0  \n",
       "185316          0.0          0.0          0.0          0.0  \n",
       "189541          0.0          0.0          0.0          0.0  \n",
       "428374          0.0          0.0          0.0          0.0  \n",
       "234638          0.0          0.0          0.0          0.0  \n",
       "172207          0.0          0.0          0.0          0.0  \n",
       "240801          0.0          0.0          1.0          0.0  \n",
       "435277          0.0          0.0          0.0          0.0  \n",
       "297100          0.0          1.0          0.0          0.0  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "# Show every column, then preview the last ten training rows with the label first.\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.concat([pd.DataFrame({'y': train_y}, index=train_X.index),\n",
    "           train_X], axis=1).tail(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Manual Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.decomposition import PCA\n",
    "from xgboost import XGBClassifier as XGBoost\n",
    "# Wrap the classes imported above as Lale operators.\n",
    "lale.wrap_imported_operators()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"48pt\"\n",
       " viewBox=\"0.00 0.00 152.00 47.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 43.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-43.598 148,-43.598 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA(n_components=6)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn\" xlink:title=\"xg_boost = XGBoost(n_estimators=3)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>pca&#45;&gt;xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5577e88710>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# `>>` pipes the PCA output into XGBoost; both hyperparameters are set by hand.\n",
    "manual_trainable = PCA(n_components=6) >> XGBoost(n_estimators=3)\n",
    "manual_trainable.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.34 s, sys: 1.2 s, total: 3.55 s\n",
      "Wall time: 2.05 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "manual_trained = manual_trainable.fit(train_X, train_y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 64.5%\n"
     ]
    }
   ],
   "source": [
    "import sklearn.metrics\n",
    "manual_y = manual_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, manual_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hyperparameter Tuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'description': 'Number of trees to fit.',\n",
       " 'type': 'integer',\n",
       " 'default': 100,\n",
       " 'minimumForOptimizer': 10,\n",
       " 'maximumForOptimizer': 1500}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "XGBoost.hyperparam_schema('n_estimators')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\n"
     ]
    }
   ],
   "source": [
    "print(PCA.documentation_url())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lale.lib.lale import Hyperopt\n",
    "import lale.schemas as schemas\n",
    "\n",
    "# Restrict the ranges the optimizer may search for these two hyperparameters.\n",
    "CustomPCA = PCA.customize_schema(n_components=schemas.Int(min=2, max=54))\n",
    "CustomXGBoost = XGBoost.customize_schema(n_estimators=schemas.Int(min=1, max=10))\n",
    "\n",
    "# The planned pipeline leaves hyperparameters free; Hyperopt runs 10 trials with 3-fold CV.\n",
    "hpo_planned = CustomPCA >> CustomXGBoost\n",
    "hpo_trainable = Hyperopt(estimator=hpo_planned, max_evals=10, cv=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|███████| 10/10 [01:20<00:00,  6.64s/trial, best loss: -0.7885106540569516]\n",
      "CPU times: user 1min 50s, sys: 22.2 s, total: 2min 12s\n",
      "Wall time: 1min 28s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "hpo_trained = hpo_trainable.fit(train_X, train_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### --- Excursions: Types as Search Spaces ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1208-loops.png\" style=\"width:700px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 80.1%\n"
     ]
    }
   ],
   "source": [
    "hpo_y = hpo_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, hpo_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspecting Automation Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"48pt\"\n",
       " viewBox=\"0.00 0.00 152.00 47.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 43.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-43.598 148,-43.598 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA(n_components=39, svd_solver=&#39;full&#39;)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn\" xlink:title=\"xg_boost = XGBoost(colsample_bylevel=0.6016063807304212, colsample_bytree=0.7763972782064467, learning_rate=0.16389357351003786, max_depth=10, min_child_weight=5, n_estimators=5, reg_alpha=0.10485915855270356, reg_lambda=0.9268502695024392, subsample=0.4503841871781402)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>pca&#45;&gt;xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5573fb5588>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline().visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "from lale.lib.sklearn import PCA\n",
       "from lale.lib.xgboost.xgb_classifier import XGBoost\n",
       "import lale\n",
       "lale.wrap_imported_operators()\n",
       "\n",
       "pca = PCA(n_components=39, svd_solver='full')\n",
       "xg_boost = XGBoost(colsample_bylevel=0.6016063807304212, colsample_bytree=0.7763972782064467, learning_rate=0.16389357351003786, max_depth=10, min_child_weight=5, n_estimators=5, reg_alpha=0.10485915855270356, reg_lambda=0.9268502695024392, subsample=0.4503841871781402)\n",
       "pipeline = pca >> xg_boost\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline().pretty_print(ipython_display=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tid</th>\n",
       "      <th>loss</th>\n",
       "      <th>time</th>\n",
       "      <th>log_loss</th>\n",
       "      <th>status</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>p0</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.667916</td>\n",
       "      <td>1.532263</td>\n",
       "      <td>1.250336</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p1</th>\n",
       "      <td>1</td>\n",
       "      <td>-0.635559</td>\n",
       "      <td>1.395001</td>\n",
       "      <td>1.120280</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p2</th>\n",
       "      <td>2</td>\n",
       "      <td>-0.670229</td>\n",
       "      <td>2.745617</td>\n",
       "      <td>1.087269</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p3</th>\n",
       "      <td>3</td>\n",
       "      <td>-0.788511</td>\n",
       "      <td>5.876360</td>\n",
       "      <td>1.049096</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p4</th>\n",
       "      <td>4</td>\n",
       "      <td>-0.718938</td>\n",
       "      <td>3.725537</td>\n",
       "      <td>0.661428</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p5</th>\n",
       "      <td>5</td>\n",
       "      <td>-0.482052</td>\n",
       "      <td>1.952195</td>\n",
       "      <td>1.241045</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p6</th>\n",
       "      <td>6</td>\n",
       "      <td>-0.482052</td>\n",
       "      <td>1.209477</td>\n",
       "      <td>1.338511</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p7</th>\n",
       "      <td>7</td>\n",
       "      <td>-0.669484</td>\n",
       "      <td>2.106700</td>\n",
       "      <td>0.844174</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p8</th>\n",
       "      <td>8</td>\n",
       "      <td>-0.632346</td>\n",
       "      <td>1.612136</td>\n",
       "      <td>0.925707</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p9</th>\n",
       "      <td>9</td>\n",
       "      <td>-0.622306</td>\n",
       "      <td>1.474229</td>\n",
       "      <td>1.882534</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      tid      loss      time  log_loss status\n",
       "name                                          \n",
       "p0      0 -0.667916  1.532263  1.250336     ok\n",
       "p1      1 -0.635559  1.395001  1.120280     ok\n",
       "p2      2 -0.670229  2.745617  1.087269     ok\n",
       "p3      3 -0.788511  5.876360  1.049096     ok\n",
       "p4      4 -0.718938  3.725537  0.661428     ok\n",
       "p5      5 -0.482052  1.952195  1.241045     ok\n",
       "p6      6 -0.482052  1.209477  1.338511     ok\n",
       "p7      7 -0.669484  2.106700  0.844174     ok\n",
       "p8      8 -0.632346  1.612136  0.925707     ok\n",
       "p9      9 -0.622306  1.474229  1.882534     ok"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hpo_trained.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p5\n"
     ]
    }
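   ],
   "source": [
    "# idxmax returns the index label (trial name) of the largest, i.e. worst, loss;\n",
    "# argmax returns an integer position in current pandas versions.\n",
    "worst_name = hpo_trained.summary().loss.idxmax()\n",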
    "print(worst_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"48pt\"\n",
       " viewBox=\"0.00 0.00 152.00 47.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 43.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-43.598 148,-43.598 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA(n_components=48, svd_solver=&#39;full&#39;, whiten=True)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn\" xlink:title=\"xg_boost = XGBoost(booster=&#39;gblinear&#39;, colsample_bylevel=0.41777546097517426, colsample_bytree=0.6852556915729863, learning_rate=0.4299362917360751, max_depth=15, min_child_weight=18, n_estimators=7, reg_alpha=0.5266202371276923, reg_lambda=0.494226267796831, subsample=0...)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>pca&#45;&gt;xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5577dabb70>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "pca = PCA(n_components=48, svd_solver='full', whiten=True)\n",
       "xg_boost = XGBoost(booster='gblinear', colsample_bylevel=0.41777546097517426, colsample_bytree=0.6852556915729863, learning_rate=0.4299362917360751, max_depth=15, min_child_weight=18, n_estimators=7, reg_alpha=0.5266202371276923, reg_lambda=0.494226267796831, subsample=0.8015579071911012)\n",
       "pipeline = pca >> xg_boost\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline(worst_name).visualize()\n",
    "hpo_trained.get_pipeline(worst_name).pretty_print(ipython_display=True, show_imports=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Combined Algorithm Selection and Hyperparameter Tuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"184pt\" height=\"185pt\"\n",
       " viewBox=\"0.00 0.00 184.00 185.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 181)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-181 180,-181 180,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\"><title>cluster:choice_0</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice_0 = norm | no_op\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"8,-47 8,-169 78,-169 78,-47 8,-47\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<g id=\"clust2\" class=\"cluster\"><title>cluster:choice_1</title>\n",
       "<g id=\"a_clust2\"><a xlink:title=\"choice_1 = tree | lr | knn\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"98,-8 98,-169 168,-169 168,-8 98,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm -->\n",
       "<g id=\"node1\" class=\"node\"><title>norm</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html\" xlink:title=\"norm = Norm\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"43\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">Norm</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- tree -->\n",
       "<g id=\"node3\" class=\"node\"><title>tree</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html\" xlink:title=\"tree = Tree\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">Tree</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm&#45;&gt;tree -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>norm&#45;&gt;tree</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M77.7296,-120C83.6523,-120 89.838,-120 95.8241,-120\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"88.0002,-123.5 98,-120 87.9998,-116.5 88.0002,-123.5\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node2\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"43\" cy=\"-75\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-66.2\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- lr -->\n",
       "<g id=\"node4\" class=\"node\"><title>lr</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html\" xlink:title=\"lr = LR(dual=True)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"133\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-74.2\" font-family=\"Times,serif\" font-size=\"11.00\">LR</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node5\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html\" xlink:title=\"knn = KNN\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-34\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-31.2\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5577dabb70>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.preprocessing import Normalizer as Norm\n",
    "from sklearn.linear_model import LogisticRegression as LR\n",
    "from sklearn.tree import DecisionTreeClassifier as Tree\n",
    "from sklearn.neighbors import KNeighborsClassifier as KNN\n",
    "from lale.lib.lale import NoOp\n",
    "lale.wrap_imported_operators()\n",
    "\n",
    "KNN = KNN.customize_schema(n_neighbors=schemas.Int(min=1, max=10))\n",
    "transp_planned = (Norm | NoOp) >> (Tree | LR(dual=True) | KNN)\n",
    "transp_planned.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|█████████| 3/3 [01:48<00:00, 32.59s/trial, best loss: -0.8376392446578157]\n",
      "CPU times: user 1min 50s, sys: 1.12 s, total: 1min 51s\n",
      "Wall time: 1min 49s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "transp_trained = transp_planned.auto_configure(\n",
    "    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ---  Excursion: Bindings as Lifecycle ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-bindings.png\" style=\"width:450px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "knn = KNN(algorithm='ball_tree', metric='manhattan', n_neighbors=9)\n",
       "pipeline = NoOp() >> knn\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"48pt\"\n",
       " viewBox=\"0.00 0.00 152.00 47.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 43.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-43.598 148,-43.598 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node1\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node2\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html\" xlink:title=\"knn = KNN(algorithm=&#39;ball_tree&#39;, metric=&#39;manhattan&#39;, n_neighbors=9)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;knn -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>no_op&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f55733a56a0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "transp_trained.pretty_print(ipython_display=True, show_imports=False)\n",
    "transp_trained.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 86.6%\n",
      "CPU times: user 50.6 s, sys: 15.6 ms, total: 50.6 s\n",
      "Wall time: 50.7 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "transp_y = transp_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, transp_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Non-Linear Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'description': 'Features of forest covertypes dataset (classification).',\n",
       " 'documentation_url': 'https://scikit-learn.org/0.20/datasets/index.html#forest-covertypes',\n",
       " 'type': 'array',\n",
       " 'items': {'type': 'array',\n",
       "  'minItems': 54,\n",
       "  'maxItems': 54,\n",
       "  'items': [{'description': 'Elevation', 'type': 'integer'},\n",
       "   {'description': 'Aspect', 'type': 'integer'},\n",
       "   {'description': 'Slope', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Hydrology', 'type': 'integer'},\n",
       "   {'description': 'Vertical_Distance_To_Hydrology', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Roadways', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_9am', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_Noon', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_3pm', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Fire_Points', 'type': 'integer'},\n",
       "   {'description': 'Wilderness_Area1', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area2', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area3', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area4', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type1', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type2', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type3', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type4', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type5', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type6', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type7', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type8', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type9', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type10', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type11', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type12', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type13', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type14', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type15', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type16', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type17', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type18', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type19', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type20', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type21', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type22', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type23', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type24', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type25', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type26', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type27', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type28', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type29', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type30', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type31', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type32', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type33', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type34', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type35', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type36', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type37', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type38', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type39', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type40', 'enum': [0, 1]}]},\n",
       " 'minItems': 58102,\n",
       " 'maxItems': 58102}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_X.json_schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "other columns: Elevation, Aspect, Slope, Horizontal_Distance_To_Hydrology, Vertical_Distance_To_Hydrology, Horizontal_Distance_To_Roadways, Hillshade_9am, Hillshade_Noon, Hillshade_3pm, Horizontal_Distance_To_Fire_Points\n"
     ]
    }
   ],
   "source": [
    "area_columns = [f'Wilderness_Area{i}' for i in range(1, 5)]\n",
    "soil_columns = [f'Soil_Type{i}' for i in range(1, 41)]\n",
    "binary_columns = area_columns + soil_columns\n",
    "other_columns = [c for c in train_X.columns if c not in binary_columns]\n",
    "print(f'other columns: {\", \".join(other_columns)}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"332pt\" height=\"186pt\"\n",
       " viewBox=\"0.00 0.00 332.00 185.80\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 181.799)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-181.799 328,-181.799 328,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\"><title>cluster:choice</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice = norm | no_op\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"82,-8 82,-130 152,-130 152,-8 82,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-114.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0 -->\n",
       "<g id=\"node1\" class=\"node\"><title>project_0</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_0 = Project(columns=[&#39;Wilderness_Area1&#39;, &#39;Wilderness_Area2&#39;, &#39;Wilderness_Area3&#39;, &#39;Wilderness_Area4&#39;, &#39;Soil_Type1&#39;, &#39;Soil_Type2&#39;, &#39;Soil_Type3&#39;, &#39;Soil_Type4&#39;, &#39;Soil_Type5&#39;, &#39;Soil_Type6&#39;, &#39;Soil_Type7&#39;, &#39;Soil_Type8&#39;, &#39;Soil_Type9&#39;, &#39;Soil_Type10&#39;, &#39;Soil_Type11&#39;, &#39;Soil_T...)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-158\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-155.2\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel -->\n",
       "<g id=\"node2\" class=\"node\"><title>feat_sel</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html\" xlink:title=\"feat_sel = FeatSel\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"117\" cy=\"-158\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-161.2\" font-family=\"Times,serif\" font-size=\"11.00\">Feat&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-149.2\" font-family=\"Times,serif\" font-size=\"11.00\">Sel</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0&#45;&gt;feat_sel -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>project_0&#45;&gt;feat_sel</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-158C62.3932,-158 71.3106,-158 79.8241,-158\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-161.5 89.919,-158 79.919,-154.5 79.919,-161.5\"/>\n",
       "</g>\n",
       "<!-- concat -->\n",
       "<g id=\"node6\" class=\"node\"><title>concat</title>\n",
       "<g id=\"a_node6\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.concat_features.html\" xlink:title=\"concat = Concat\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"207\" cy=\"-119\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-116.2\" font-family=\"Times,serif\" font-size=\"11.00\">Concat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel&#45;&gt;concat -->\n",
       "<g id=\"edge3\" class=\"edge\"><title>feat_sel&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M140.662,-147.957C150.982,-143.383 163.367,-137.894 174.567,-132.93\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"176.298,-135.992 184.023,-128.74 173.462,-129.592 176.298,-135.992\"/>\n",
       "</g>\n",
       "<!-- project_1 -->\n",
       "<g id=\"node3\" class=\"node\"><title>project_1</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_1 = Project(columns=[&#39;Elevation&#39;, &#39;Aspect&#39;, &#39;Slope&#39;, &#39;Horizontal_Distance_To_Hydrology&#39;, &#39;Vertical_Distance_To_Hydrology&#39;, &#39;Horizontal_Distance_To_Roadways&#39;, &#39;Hillshade_9am&#39;, &#39;Hillshade_Noon&#39;, &#39;Hillshade_3pm&#39;, &#39;Horizontal_Distance_To_Fire_Points&#39;])\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-81\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm -->\n",
       "<g id=\"node4\" class=\"node\"><title>norm</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html\" xlink:title=\"norm = Norm\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"117\" cy=\"-81\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">Norm</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_1&#45;&gt;norm -->\n",
       "<g id=\"edge2\" class=\"edge\"><title>project_1&#45;&gt;norm</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-81C62.3932,-81 71.3106,-81 79.8241,-81\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"72.0002,-84.5005 82,-81 71.9998,-77.5005 72.0002,-84.5005\"/>\n",
       "</g>\n",
       "<!-- norm&#45;&gt;concat -->\n",
       "<g id=\"edge4\" class=\"edge\"><title>norm&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M151.589,-95.5043C159.149,-98.7687 167.143,-102.221 174.609,-105.445\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"173.233,-108.663 183.801,-109.414 176.008,-102.236 173.233,-108.663\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node5\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-36\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-39.2\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-27.2\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node7\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node7\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html\" xlink:title=\"knn = KNN\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"297\" cy=\"-119\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"297\" y=\"-116.2\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- concat&#45;&gt;knn -->\n",
       "<g id=\"edge5\" class=\"edge\"><title>concat&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M234.403,-119C242.393,-119 251.311,-119 259.824,-119\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"259.919,-122.5 269.919,-119 259.919,-115.5 259.919,-122.5\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5573ce4400>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from lale.lib.lale import Project\n",
    "from lale.lib.lale import ConcatFeatures as Concat\n",
    "from sklearn.feature_selection import SelectKBest as FeatSel\n",
    "lale.wrap_imported_operators()\n",
    "\n",
    "binary_prep = Project(columns=binary_columns) >> FeatSel\n",
    "other_prep = Project(columns=other_columns) >> (Norm | NoOp)\n",
    "nonlin_planned = (binary_prep & other_prep) >> Concat >> KNN\n",
    "nonlin_planned.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|█████████| 3/3 [02:08<00:00, 34.88s/trial, best loss: -0.8584651324517477]\n",
      "CPU times: user 2min 9s, sys: 62.5 ms, total: 2min 9s\n",
      "Wall time: 2min 10s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "nonlin_trained = nonlin_planned.auto_configure(\n",
    "    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### --- Excursion: Combinators ---\n",
    "\n",
    "| Lale feature            | Name | Description  | Scikit-learn feature                |\n",
    "| ----------------------- | ---- | ------------ | ----------------------------------- |\n",
    "| >> or `make_pipeline`   | pipe | feed to next | `make_pipeline`                     |\n",
    "| & or `make_union`       | and  | run both     | `make_union` or `ColumnTransformer` |\n",
    "| &#x7c; or `make_choice` | or   | choose one   | N/A (specific to given AutoML tool) |\n",
    "\n",
    "### --- Excursion: Interoperability ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-interop.png\" style=\"width:550px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"332pt\" height=\"95pt\"\n",
       " viewBox=\"0.00 0.00 332.00 94.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 90.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-90.598 328,-90.598 328,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- project_0 -->\n",
       "<g id=\"node1\" class=\"node\"><title>project_0</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_0 = Project(columns=[&#39;Wilderness_Area1&#39;, &#39;Wilderness_Area2&#39;, &#39;Wilderness_Area3&#39;, &#39;Wilderness_Area4&#39;, &#39;Soil_Type1&#39;, &#39;Soil_Type2&#39;, &#39;Soil_Type3&#39;, &#39;Soil_Type4&#39;, &#39;Soil_Type5&#39;, &#39;Soil_Type6&#39;, &#39;Soil_Type7&#39;, &#39;Soil_Type8&#39;, &#39;Soil_Type9&#39;, &#39;Soil_Type10&#39;, &#39;Soil_Type11&#39;, &#39;Soil_T...)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-66.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-63.999\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel -->\n",
       "<g id=\"node2\" class=\"node\"><title>feat_sel</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html\" xlink:title=\"feat_sel = FeatSel(k=8)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-66.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-69.999\" font-family=\"Times,serif\" font-size=\"11.00\">Feat&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-57.999\" font-family=\"Times,serif\" font-size=\"11.00\">Sel</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0&#45;&gt;feat_sel -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>project_0&#45;&gt;feat_sel</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-66.799C62.3932,-66.799 71.3106,-66.799 79.8241,-66.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-70.2991 89.919,-66.799 79.919,-63.2991 79.919,-70.2991\"/>\n",
       "</g>\n",
       "<!-- concat -->\n",
       "<g id=\"node5\" class=\"node\"><title>concat</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.concat_features.html\" xlink:title=\"concat = Concat()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"207\" cy=\"-42.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-39.999\" font-family=\"Times,serif\" font-size=\"11.00\">Concat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel&#45;&gt;concat -->\n",
       "<g id=\"edge3\" class=\"edge\"><title>feat_sel&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M142.513,-60.1137C151.572,-57.6431 162.012,-54.7958 171.775,-52.1331\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"172.892,-55.4564 181.619,-49.4485 171.05,-48.703 172.892,-55.4564\"/>\n",
       "</g>\n",
       "<!-- project_1 -->\n",
       "<g id=\"node3\" class=\"node\"><title>project_1</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_1 = Project(columns=[&#39;Elevation&#39;, &#39;Aspect&#39;, &#39;Slope&#39;, &#39;Horizontal_Distance_To_Hydrology&#39;, &#39;Vertical_Distance_To_Hydrology&#39;, &#39;Horizontal_Distance_To_Roadways&#39;, &#39;Hillshade_9am&#39;, &#39;Hillshade_Noon&#39;, &#39;Hillshade_3pm&#39;, &#39;Horizontal_Distance_To_Fire_Points&#39;])\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node4\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_1&#45;&gt;no_op -->\n",
       "<g id=\"edge2\" class=\"edge\"><title>project_1&#45;&gt;no_op</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;concat -->\n",
       "<g id=\"edge4\" class=\"edge\"><title>no_op&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M142.982,-26.3283C151.958,-28.6743 162.241,-31.362 171.861,-33.8763\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"171.004,-37.2697 181.564,-36.4123 172.774,-30.4972 171.004,-37.2697\"/>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node6\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node6\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html\" xlink:title=\"knn = KNN(algorithm=&#39;kd_tree&#39;, n_neighbors=7, weights=&#39;distance&#39;)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"297\" cy=\"-42.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"297\" y=\"-39.999\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- concat&#45;&gt;knn -->\n",
       "<g id=\"edge5\" class=\"edge\"><title>concat&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M234.403,-42.799C242.393,-42.799 251.311,-42.799 259.824,-42.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"259.919,-46.2991 269.919,-42.799 259.919,-39.2991 259.919,-46.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f5577e52e10>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "project_0 = Project(columns=['Wilderness_Area1', 'Wilderness_Area2', 'Wilderness_Area3', 'Wilderness_Area4', 'Soil_Type1', 'Soil_Type2', 'Soil_Type3', 'Soil_Type4', 'Soil_Type5', 'Soil_Type6', 'Soil_Type7', 'Soil_Type8', 'Soil_Type9', 'Soil_Type10', 'Soil_Type11', 'Soil_Type12', 'Soil_Type13', 'Soil_Type14', 'Soil_Type15', 'Soil_Type16', 'Soil_Type17', 'Soil_Type18', 'Soil_Type19', 'Soil_Type20', 'Soil_Type21', 'Soil_Type22', 'Soil_Type23', 'Soil_Type24', 'Soil_Type25', 'Soil_Type26', 'Soil_Type27', 'Soil_Type28', 'Soil_Type29', 'Soil_Type30', 'Soil_Type31', 'Soil_Type32', 'Soil_Type33', 'Soil_Type34', 'Soil_Type35', 'Soil_Type36', 'Soil_Type37', 'Soil_Type38', 'Soil_Type39', 'Soil_Type40'])\n",
       "feat_sel = FeatSel(k=8)\n",
       "pipeline_0 = make_pipeline(project_0, feat_sel)\n",
       "project_1 = Project(columns=['Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology', 'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways', 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm', 'Horizontal_Distance_To_Fire_Points'])\n",
       "pipeline_1 = make_pipeline(project_1, NoOp())\n",
       "union = make_union(pipeline_0, pipeline_1)\n",
       "knn = KNN(algorithm='kd_tree', n_neighbors=7, weights='distance')\n",
       "pipeline = make_pipeline(union, knn)\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "nonlin_trained.visualize()\n",
    "nonlin_trained.pretty_print(ipython_display=True, show_imports=False, combinators=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 88.6%\n",
      "CPU times: user 4.12 s, sys: 93.8 ms, total: 4.22 s\n",
      "Wall time: 4.19 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "nonlin_y = nonlin_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, nonlin_y):.1%}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Wilderness_Area1</th>\n",
       "      <th>Wilderness_Area4</th>\n",
       "      <th>Soil_Type2</th>\n",
       "      <th>Soil_Type3</th>\n",
       "      <th>Soil_Type4</th>\n",
       "      <th>Soil_Type10</th>\n",
       "      <th>Soil_Type38</th>\n",
       "      <th>Soil_Type39</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Wilderness_Area1  Wilderness_Area4  Soil_Type2  Soil_Type3  Soil_Type4  \\\n",
       "0               1.0               0.0         0.0         0.0         0.0   \n",
       "1               0.0               0.0         0.0         0.0         0.0   \n",
       "2               0.0               0.0         0.0         0.0         0.0   \n",
       "3               1.0               0.0         0.0         0.0         0.0   \n",
       "4               1.0               0.0         0.0         0.0         0.0   \n",
       "5               0.0               1.0         0.0         0.0         0.0   \n",
       "6               1.0               0.0         0.0         0.0         0.0   \n",
       "7               1.0               0.0         0.0         0.0         0.0   \n",
       "8               1.0               0.0         0.0         0.0         0.0   \n",
       "9               1.0               0.0         0.0         0.0         0.0   \n",
       "\n",
       "   Soil_Type10  Soil_Type38  Soil_Type39  \n",
       "0          0.0          1.0          0.0  \n",
       "1          0.0          0.0          0.0  \n",
       "2          0.0          0.0          0.0  \n",
       "3          0.0          0.0          0.0  \n",
       "4          0.0          0.0          0.0  \n",
       "5          1.0          0.0          0.0  \n",
       "6          0.0          0.0          0.0  \n",
       "7          0.0          1.0          0.0  \n",
       "8          0.0          0.0          0.0  \n",
       "9          0.0          0.0          0.0  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
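   ],
   "source": [
    "# Project onto the binary columns, then keep k=8 of them via feature selection.\n",
    "binary_prep_trainable = Project(columns=binary_columns) >> FeatSel(k=8)\n",
    "binary_prep_trained = binary_prep_trainable.fit(train_X, train_y)\n",
    "# Preview the selected features for the first 10 test rows.\n",
    "binary_prep_trained.transform(test_X.head(10))"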
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "- code and documentation: https://github.com/ibm/lale\n",
    "- more examples: https://nbviewer.jupyter.org/github/IBM/lale/tree/master/examples/\n",
    "- frequently asked questions: https://github.com/IBM/lale/blob/master/docs/faq.rst\n",
    "- arXiv paper: https://arxiv.org/pdf/1906.03957.pdf\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-summary.png\" style=\"width:350px\" align=\"left\">"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}