{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lale and its Impact on the Data Science Workflow\n",
    "\n",
    "Guillaume Baudart, Martin Hirzel, Kiran Kate, Pari Ram, and Avi Shinnar\n",
    "\n",
    "27 March 2020\n",
    "\n",
    "Examples, documentation, code: https://github.com/ibm/lale\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/docs/img/lale_logo.jpg\" alt=\"logo\" width=\"140px\" align=\"left\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Value Proposition\n",
    "\n",
    "- **target user**: data scientist familiar with Python and scikit-learn\n",
    "- **scope**: data preparation and machine learning (including some DL)\n",
    "- **value**: consistent API for both manual machine learning and auto-ML\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-three-values.png\" style=\"width:350px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install --quiet lale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape train_X_all (522910, 54), test_X (58102, 54)\n"
     ]
    }
   ],
   "source": [
    "import lale.datasets\n",
    "(train_X_all, train_y_all), (test_X, test_y) = lale.datasets.covtype_df(test_size=0.1)\n",
    "print(f'shape train_X_all {train_X_all.shape}, test_X {test_X.shape}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape train_X (52291, 54), other_X (470619, 54)\n"
     ]
    }
   ],
   "source": [
    "import sklearn.model_selection\n",
    "train_X, other_X, train_y, other_y = sklearn.model_selection.train_test_split(\n",
    "    train_X_all, train_y_all, test_size=0.9)\n",
    "print(f'shape train_X {train_X.shape}, other_X {other_X.shape}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>y</th>\n",
       "      <th>Elevation</th>\n",
       "      <th>Aspect</th>\n",
       "      <th>Slope</th>\n",
       "      <th>Horizontal_Distance_To_Hydrology</th>\n",
       "      <th>Vertical_Distance_To_Hydrology</th>\n",
       "      <th>Horizontal_Distance_To_Roadways</th>\n",
       "      <th>Hillshade_9am</th>\n",
       "      <th>Hillshade_Noon</th>\n",
       "      <th>Hillshade_3pm</th>\n",
       "      <th>Horizontal_Distance_To_Fire_Points</th>\n",
       "      <th>Wilderness_Area1</th>\n",
       "      <th>Wilderness_Area2</th>\n",
       "      <th>Wilderness_Area3</th>\n",
       "      <th>Wilderness_Area4</th>\n",
       "      <th>Soil_Type1</th>\n",
       "      <th>Soil_Type2</th>\n",
       "      <th>Soil_Type3</th>\n",
       "      <th>Soil_Type4</th>\n",
       "      <th>Soil_Type5</th>\n",
       "      <th>Soil_Type6</th>\n",
       "      <th>Soil_Type7</th>\n",
       "      <th>Soil_Type8</th>\n",
       "      <th>Soil_Type9</th>\n",
       "      <th>Soil_Type10</th>\n",
       "      <th>Soil_Type11</th>\n",
       "      <th>Soil_Type12</th>\n",
       "      <th>Soil_Type13</th>\n",
       "      <th>Soil_Type14</th>\n",
       "      <th>Soil_Type15</th>\n",
       "      <th>Soil_Type16</th>\n",
       "      <th>Soil_Type17</th>\n",
       "      <th>Soil_Type18</th>\n",
       "      <th>Soil_Type19</th>\n",
       "      <th>Soil_Type20</th>\n",
       "      <th>Soil_Type21</th>\n",
       "      <th>Soil_Type22</th>\n",
       "      <th>Soil_Type23</th>\n",
       "      <th>Soil_Type24</th>\n",
       "      <th>Soil_Type25</th>\n",
       "      <th>Soil_Type26</th>\n",
       "      <th>Soil_Type27</th>\n",
       "      <th>Soil_Type28</th>\n",
       "      <th>Soil_Type29</th>\n",
       "      <th>Soil_Type30</th>\n",
       "      <th>Soil_Type31</th>\n",
       "      <th>Soil_Type32</th>\n",
       "      <th>Soil_Type33</th>\n",
       "      <th>Soil_Type34</th>\n",
       "      <th>Soil_Type35</th>\n",
       "      <th>Soil_Type36</th>\n",
       "      <th>Soil_Type37</th>\n",
       "      <th>Soil_Type38</th>\n",
       "      <th>Soil_Type39</th>\n",
       "      <th>Soil_Type40</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>274665</th>\n",
       "      <td>3</td>\n",
       "      <td>2354.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>285.0</td>\n",
       "      <td>80.0</td>\n",
       "      <td>277.0</td>\n",
       "      <td>250.0</td>\n",
       "      <td>220.0</td>\n",
       "      <td>86.0</td>\n",
       "      <td>874.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>120210</th>\n",
       "      <td>2</td>\n",
       "      <td>2985.0</td>\n",
       "      <td>91.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>886.0</td>\n",
       "      <td>187.0</td>\n",
       "      <td>3180.0</td>\n",
       "      <td>244.0</td>\n",
       "      <td>209.0</td>\n",
       "      <td>88.0</td>\n",
       "      <td>828.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>111775</th>\n",
       "      <td>2</td>\n",
       "      <td>3142.0</td>\n",
       "      <td>88.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>684.0</td>\n",
       "      <td>-52.0</td>\n",
       "      <td>551.0</td>\n",
       "      <td>245.0</td>\n",
       "      <td>204.0</td>\n",
       "      <td>80.0</td>\n",
       "      <td>1082.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>400567</th>\n",
       "      <td>3</td>\n",
       "      <td>2493.0</td>\n",
       "      <td>108.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>182.0</td>\n",
       "      <td>34.0</td>\n",
       "      <td>666.0</td>\n",
       "      <td>243.0</td>\n",
       "      <td>223.0</td>\n",
       "      <td>107.0</td>\n",
       "      <td>1294.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>224682</th>\n",
       "      <td>2</td>\n",
       "      <td>2796.0</td>\n",
       "      <td>352.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>594.0</td>\n",
       "      <td>84.0</td>\n",
       "      <td>2955.0</td>\n",
       "      <td>205.0</td>\n",
       "      <td>225.0</td>\n",
       "      <td>158.0</td>\n",
       "      <td>1471.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>424723</th>\n",
       "      <td>1</td>\n",
       "      <td>3126.0</td>\n",
       "      <td>197.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>5344.0</td>\n",
       "      <td>216.0</td>\n",
       "      <td>251.0</td>\n",
       "      <td>166.0</td>\n",
       "      <td>1148.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>445777</th>\n",
       "      <td>1</td>\n",
       "      <td>2981.0</td>\n",
       "      <td>333.0</td>\n",
       "      <td>16.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>2704.0</td>\n",
       "      <td>182.0</td>\n",
       "      <td>218.0</td>\n",
       "      <td>175.0</td>\n",
       "      <td>655.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>388163</th>\n",
       "      <td>1</td>\n",
       "      <td>3380.0</td>\n",
       "      <td>219.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>395.0</td>\n",
       "      <td>88.0</td>\n",
       "      <td>2895.0</td>\n",
       "      <td>213.0</td>\n",
       "      <td>246.0</td>\n",
       "      <td>169.0</td>\n",
       "      <td>1224.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>522588</th>\n",
       "      <td>7</td>\n",
       "      <td>3397.0</td>\n",
       "      <td>113.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>706.0</td>\n",
       "      <td>240.0</td>\n",
       "      <td>1507.0</td>\n",
       "      <td>245.0</td>\n",
       "      <td>223.0</td>\n",
       "      <td>103.0</td>\n",
       "      <td>1040.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>128441</th>\n",
       "      <td>2</td>\n",
       "      <td>2831.0</td>\n",
       "      <td>155.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>27.0</td>\n",
       "      <td>4235.0</td>\n",
       "      <td>239.0</td>\n",
       "      <td>236.0</td>\n",
       "      <td>116.0</td>\n",
       "      <td>5071.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        y  Elevation  Aspect  Slope  Horizontal_Distance_To_Hydrology  \\\n",
       "274665  3     2354.0   130.0   23.0                             285.0   \n",
       "120210  2     2985.0    91.0   18.0                             886.0   \n",
       "111775  2     3142.0    88.0   20.0                             684.0   \n",
       "400567  3     2493.0   108.0   14.0                             182.0   \n",
       "224682  2     2796.0   352.0    9.0                             594.0   \n",
       "424723  1     3126.0   197.0   13.0                              85.0   \n",
       "445777  1     2981.0   333.0   16.0                             150.0   \n",
       "388163  1     3380.0   219.0    6.0                             395.0   \n",
       "522588  7     3397.0   113.0   15.0                             706.0   \n",
       "128441  2     2831.0   155.0   21.0                              85.0   \n",
       "\n",
       "        Vertical_Distance_To_Hydrology  Horizontal_Distance_To_Roadways  \\\n",
       "274665                            80.0                            277.0   \n",
       "120210                           187.0                           3180.0   \n",
       "111775                           -52.0                            551.0   \n",
       "400567                            34.0                            666.0   \n",
       "224682                            84.0                           2955.0   \n",
       "424723                            10.0                           5344.0   \n",
       "445777                            14.0                           2704.0   \n",
       "388163                            88.0                           2895.0   \n",
       "522588                           240.0                           1507.0   \n",
       "128441                            27.0                           4235.0   \n",
       "\n",
       "        Hillshade_9am  Hillshade_Noon  Hillshade_3pm  \\\n",
       "274665          250.0           220.0           86.0   \n",
       "120210          244.0           209.0           88.0   \n",
       "111775          245.0           204.0           80.0   \n",
       "400567          243.0           223.0          107.0   \n",
       "224682          205.0           225.0          158.0   \n",
       "424723          216.0           251.0          166.0   \n",
       "445777          182.0           218.0          175.0   \n",
       "388163          213.0           246.0          169.0   \n",
       "522588          245.0           223.0          103.0   \n",
       "128441          239.0           236.0          116.0   \n",
       "\n",
       "        Horizontal_Distance_To_Fire_Points  Wilderness_Area1  \\\n",
       "274665                               874.0               0.0   \n",
       "120210                               828.0               0.0   \n",
       "111775                              1082.0               0.0   \n",
       "400567                              1294.0               0.0   \n",
       "224682                              1471.0               0.0   \n",
       "424723                              1148.0               1.0   \n",
       "445777                               655.0               0.0   \n",
       "388163                              1224.0               0.0   \n",
       "522588                              1040.0               0.0   \n",
       "128441                              5071.0               1.0   \n",
       "\n",
       "        Wilderness_Area2  Wilderness_Area3  Wilderness_Area4  Soil_Type1  \\\n",
       "274665               0.0               1.0               0.0         0.0   \n",
       "120210               0.0               1.0               0.0         0.0   \n",
       "111775               1.0               0.0               0.0         0.0   \n",
       "400567               0.0               0.0               1.0         0.0   \n",
       "224682               0.0               1.0               0.0         0.0   \n",
       "424723               0.0               0.0               0.0         0.0   \n",
       "445777               0.0               1.0               0.0         0.0   \n",
       "388163               0.0               1.0               0.0         0.0   \n",
       "522588               0.0               1.0               0.0         0.0   \n",
       "128441               0.0               0.0               0.0         0.0   \n",
       "\n",
       "        Soil_Type2  Soil_Type3  Soil_Type4  Soil_Type5  Soil_Type6  \\\n",
       "274665         0.0         1.0         0.0         0.0         0.0   \n",
       "120210         0.0         0.0         0.0         0.0         0.0   \n",
       "111775         0.0         0.0         0.0         0.0         0.0   \n",
       "400567         0.0         0.0         0.0         0.0         1.0   \n",
       "224682         0.0         0.0         0.0         0.0         0.0   \n",
       "424723         0.0         0.0         0.0         0.0         0.0   \n",
       "445777         0.0         0.0         0.0         0.0         0.0   \n",
       "388163         0.0         0.0         0.0         0.0         0.0   \n",
       "522588         0.0         0.0         0.0         0.0         0.0   \n",
       "128441         0.0         0.0         0.0         0.0         0.0   \n",
       "\n",
       "        Soil_Type7  Soil_Type8  Soil_Type9  Soil_Type10  Soil_Type11  \\\n",
       "274665         0.0         0.0         0.0          0.0          0.0   \n",
       "120210         0.0         0.0         0.0          1.0          0.0   \n",
       "111775         0.0         0.0         0.0          0.0          0.0   \n",
       "400567         0.0         0.0         0.0          0.0          0.0   \n",
       "224682         0.0         0.0         0.0          0.0          0.0   \n",
       "424723         0.0         0.0         0.0          0.0          0.0   \n",
       "445777         0.0         0.0         0.0          0.0          0.0   \n",
       "388163         0.0         0.0         0.0          0.0          0.0   \n",
       "522588         0.0         0.0         0.0          0.0          0.0   \n",
       "128441         0.0         0.0         0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type12  Soil_Type13  Soil_Type14  Soil_Type15  Soil_Type16  \\\n",
       "274665          0.0          0.0          0.0          0.0          0.0   \n",
       "120210          0.0          0.0          0.0          0.0          0.0   \n",
       "111775          0.0          0.0          0.0          0.0          0.0   \n",
       "400567          0.0          0.0          0.0          0.0          0.0   \n",
       "224682          0.0          0.0          0.0          0.0          0.0   \n",
       "424723          0.0          0.0          0.0          0.0          0.0   \n",
       "445777          0.0          0.0          0.0          0.0          0.0   \n",
       "388163          0.0          0.0          0.0          0.0          0.0   \n",
       "522588          0.0          0.0          0.0          0.0          0.0   \n",
       "128441          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type17  Soil_Type18  Soil_Type19  Soil_Type20  Soil_Type21  \\\n",
       "274665          0.0          0.0          0.0          0.0          0.0   \n",
       "120210          0.0          0.0          0.0          0.0          0.0   \n",
       "111775          0.0          0.0          1.0          0.0          0.0   \n",
       "400567          0.0          0.0          0.0          0.0          0.0   \n",
       "224682          0.0          0.0          0.0          0.0          0.0   \n",
       "424723          0.0          0.0          0.0          0.0          0.0   \n",
       "445777          0.0          0.0          0.0          0.0          0.0   \n",
       "388163          0.0          0.0          0.0          0.0          0.0   \n",
       "522588          0.0          0.0          0.0          0.0          0.0   \n",
       "128441          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type22  Soil_Type23  Soil_Type24  Soil_Type25  Soil_Type26  \\\n",
       "274665          0.0          0.0          0.0          0.0          0.0   \n",
       "120210          0.0          0.0          0.0          0.0          0.0   \n",
       "111775          0.0          0.0          0.0          0.0          0.0   \n",
       "400567          0.0          0.0          0.0          0.0          0.0   \n",
       "224682          0.0          0.0          0.0          0.0          0.0   \n",
       "424723          0.0          0.0          0.0          0.0          0.0   \n",
       "445777          0.0          0.0          0.0          0.0          0.0   \n",
       "388163          0.0          0.0          0.0          0.0          0.0   \n",
       "522588          0.0          0.0          0.0          0.0          0.0   \n",
       "128441          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type27  Soil_Type28  Soil_Type29  Soil_Type30  Soil_Type31  \\\n",
       "274665          0.0          0.0          0.0          0.0          0.0   \n",
       "120210          0.0          0.0          0.0          0.0          0.0   \n",
       "111775          0.0          0.0          0.0          0.0          0.0   \n",
       "400567          0.0          0.0          0.0          0.0          0.0   \n",
       "224682          0.0          0.0          0.0          0.0          0.0   \n",
       "424723          0.0          0.0          1.0          0.0          0.0   \n",
       "445777          0.0          0.0          0.0          0.0          0.0   \n",
       "388163          0.0          0.0          0.0          0.0          0.0   \n",
       "522588          0.0          0.0          0.0          0.0          0.0   \n",
       "128441          0.0          0.0          0.0          1.0          0.0   \n",
       "\n",
       "        Soil_Type32  Soil_Type33  Soil_Type34  Soil_Type35  Soil_Type36  \\\n",
       "274665          0.0          0.0          0.0          0.0          0.0   \n",
       "120210          0.0          0.0          0.0          0.0          0.0   \n",
       "111775          0.0          0.0          0.0          0.0          0.0   \n",
       "400567          0.0          0.0          0.0          0.0          0.0   \n",
       "224682          1.0          0.0          0.0          0.0          0.0   \n",
       "424723          0.0          0.0          0.0          0.0          0.0   \n",
       "445777          1.0          0.0          0.0          0.0          0.0   \n",
       "388163          1.0          0.0          0.0          0.0          0.0   \n",
       "522588          0.0          0.0          0.0          0.0          0.0   \n",
       "128441          0.0          0.0          0.0          0.0          0.0   \n",
       "\n",
       "        Soil_Type37  Soil_Type38  Soil_Type39  Soil_Type40  \n",
       "274665          0.0          0.0          0.0          0.0  \n",
       "120210          0.0          0.0          0.0          0.0  \n",
       "111775          0.0          0.0          0.0          0.0  \n",
       "400567          0.0          0.0          0.0          0.0  \n",
       "224682          0.0          0.0          0.0          0.0  \n",
       "424723          0.0          0.0          0.0          0.0  \n",
       "445777          0.0          0.0          0.0          0.0  \n",
       "388163          0.0          0.0          0.0          0.0  \n",
       "522588          0.0          0.0          0.0          1.0  \n",
       "128441          0.0          0.0          0.0          0.0  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.concat([pd.DataFrame({'y': train_y}, index=train_X.index),\n",
    "           train_X], axis=1).tail(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Manual Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.decomposition import PCA\n",
    "from xgboost import XGBClassifier as XGBoost\n",
    "lale.wrap_imported_operators()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"48pt\"\n",
       " viewBox=\"0.00 0.00 152.00 47.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 43.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-43.598 148,-43.598 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA(n_components=6)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.xgboost.xgb_classifier.html\" xlink:title=\"xg_boost = XGBoost(n_estimators=3)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>pca&#45;&gt;xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f4771a31668>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# The >> combinator pipes the output of PCA into XGBoost.\n",
    "manual_trainable = PCA(n_components=6) >> XGBoost(n_estimators=3)\n",
    "manual_trainable.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 3.56 s, sys: 672 ms, total: 4.23 s\n",
      "Wall time: 3.88 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "manual_trained = manual_trainable.fit(train_X, train_y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 75.5%\n"
     ]
    }
   ],
   "source": [
    "import sklearn.metrics\n",
    "manual_y = manual_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, manual_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hyperparameter Tuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'description': 'Number of trees to fit.',\n",
       " 'type': 'integer',\n",
       " 'default': 1000,\n",
       " 'minimumForOptimizer': 500,\n",
       " 'maximumForOptimizer': 1500}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "XGBoost.hyperparam_schema('n_estimators')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\n"
     ]
    }
   ],
   "source": [
    "print(PCA.documentation_url())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lale.lib.lale import Hyperopt\n",
    "import lale.schemas as schemas\n",
    "\n",
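    "# Customize the search ranges: 2 to 54 PCA components (the data has 54 columns)\n",
    "# and 1 to 10 trees rather than the default optimizer range of 500 to 1500.\n",
    "CustomPCA = PCA.customize_schema(n_components=schemas.Int(min=2, max=54))\n",
    "CustomXGBoost = XGBoost.customize_schema(n_estimators=schemas.Int(min=1, max=10))\n",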
    "\n",
    "hpo_planned = CustomPCA >> CustomXGBoost\n",
    "hpo_trainable = Hyperopt(estimator=hpo_planned, max_evals=10, cv=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|███████| 10/10 [04:22<00:00, 26.22s/trial, best loss: -0.8287659271127307]\n",
      "CPU times: user 4min 57s, sys: 20 s, total: 5min 17s\n",
      "Wall time: 4min 53s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "hpo_trained = hpo_trainable.fit(train_X, train_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### --- Excursion: Types as Search Spaces ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1208-loops.png\" style=\"width:700px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 84.2%\n"
     ]
    }
   ],
   "source": [
    "hpo_y = hpo_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, hpo_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspecting Automation Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"174pt\" height=\"65pt\"\n",
       " viewBox=\"0.00 0.00 174.11 64.57\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 60.5685)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-60.5685 170.108,-60.5685 170.108,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- custom_pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>custom_pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"custom_pca = CustomPCA(n_components=43, svd_solver=&#39;full&#39;, whiten=True)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"32.5269\" cy=\"-28.2843\" rx=\"32.5538\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"32.5269\" y=\"-31.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Custom&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"32.5269\" y=\"-19.4843\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- custom_xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>custom_xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.xgboost.xgb_classifier.html\" xlink:title=\"custom_xg_boost = CustomXGBoost(gamma=0.42208258595069725, learning_rate=0.6558019595096513, max_depth=13, min_child_weight=13, n_estimators=9, reg_alpha=0.3590229319214039, reg_lambda=0.7978279409450941, subsample=0.6209085649172931)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"133.581\" cy=\"-28.2843\" rx=\"32.5538\" ry=\"28.0702\"/>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-37.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Custom&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-25.4843\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-13.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- custom_pca&#45;&gt;custom_xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>custom_pca&#45;&gt;custom_xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M65.1405,-28.2843C73.2715,-28.2843 82.1469,-28.2843 90.7095,-28.2843\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"90.9278,-31.7844 100.928,-28.2843 90.9277,-24.7844 90.9278,-31.7844\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f47f36c4ef0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline().visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "from sklearn.decomposition import PCA as CustomPCA\n",
       "from xgboost import XGBClassifier as CustomXGBoost\n",
       "import lale\n",
       "\n",
       "lale.wrap_imported_operators()\n",
       "custom_pca = CustomPCA(n_components=43, svd_solver=\"full\", whiten=True)\n",
       "custom_xg_boost = CustomXGBoost(\n",
       "    gamma=0.42208258595069725,\n",
       "    learning_rate=0.6558019595096513,\n",
       "    max_depth=13,\n",
       "    min_child_weight=13,\n",
       "    n_estimators=9,\n",
       "    reg_alpha=0.3590229319214039,\n",
       "    reg_lambda=0.7978279409450941,\n",
       "    subsample=0.6209085649172931,\n",
       ")\n",
       "pipeline = custom_pca >> custom_xg_boost\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline().pretty_print(ipython_display=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tid</th>\n",
       "      <th>loss</th>\n",
       "      <th>time</th>\n",
       "      <th>log_loss</th>\n",
       "      <th>status</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>p0</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.754298</td>\n",
       "      <td>4.080399</td>\n",
       "      <td>1.039077</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p1</th>\n",
       "      <td>1</td>\n",
       "      <td>-0.774493</td>\n",
       "      <td>7.493949</td>\n",
       "      <td>0.799467</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p2</th>\n",
       "      <td>2</td>\n",
       "      <td>-0.725306</td>\n",
       "      <td>6.744288</td>\n",
       "      <td>0.948600</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p3</th>\n",
       "      <td>3</td>\n",
       "      <td>-0.783175</td>\n",
       "      <td>4.715054</td>\n",
       "      <td>1.036146</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p4</th>\n",
       "      <td>4</td>\n",
       "      <td>-0.759672</td>\n",
       "      <td>8.948971</td>\n",
       "      <td>0.576866</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p5</th>\n",
       "      <td>5</td>\n",
       "      <td>-0.823029</td>\n",
       "      <td>11.589523</td>\n",
       "      <td>0.514666</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p6</th>\n",
       "      <td>6</td>\n",
       "      <td>-0.783404</td>\n",
       "      <td>12.232503</td>\n",
       "      <td>0.765154</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p7</th>\n",
       "      <td>7</td>\n",
       "      <td>-0.828766</td>\n",
       "      <td>20.878259</td>\n",
       "      <td>0.435281</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p8</th>\n",
       "      <td>8</td>\n",
       "      <td>-0.724561</td>\n",
       "      <td>4.045507</td>\n",
       "      <td>0.669205</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p9</th>\n",
       "      <td>9</td>\n",
       "      <td>-0.731828</td>\n",
       "      <td>4.792484</td>\n",
       "      <td>1.780335</td>\n",
       "      <td>ok</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      tid      loss       time  log_loss status\n",
       "name                                           \n",
       "p0      0 -0.754298   4.080399  1.039077     ok\n",
       "p1      1 -0.774493   7.493949  0.799467     ok\n",
       "p2      2 -0.725306   6.744288  0.948600     ok\n",
       "p3      3 -0.783175   4.715054  1.036146     ok\n",
       "p4      4 -0.759672   8.948971  0.576866     ok\n",
       "p5      5 -0.823029  11.589523  0.514666     ok\n",
       "p6      6 -0.783404  12.232503  0.765154     ok\n",
       "p7      7 -0.828766  20.878259  0.435281     ok\n",
       "p8      8 -0.724561   4.045507  0.669205     ok\n",
       "p9      9 -0.731828   4.792484  1.780335     ok"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hpo_trained.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p8\n"
     ]
    }
   ],
   "source": [
    "# The loss is negated accuracy, so the largest loss marks the worst trial.\n",
    "# idxmax returns the index label (trial name) directly.\n",
    "worst_name = hpo_trained.summary().loss.idxmax()\n",
    "print(worst_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"174pt\" height=\"65pt\"\n",
       " viewBox=\"0.00 0.00 174.11 64.57\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 60.5685)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-60.5685 170.108,-60.5685 170.108,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- custom_pca -->\n",
       "<g id=\"node1\" class=\"node\"><title>custom_pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"custom_pca = CustomPCA(n_components=19, svd_solver=&#39;full&#39;)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"32.5269\" cy=\"-28.2843\" rx=\"32.5538\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"32.5269\" y=\"-31.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Custom&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"32.5269\" y=\"-19.4843\" font-family=\"Times,serif\" font-size=\"11.00\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- custom_xg_boost -->\n",
       "<g id=\"node2\" class=\"node\"><title>custom_xg_boost</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.xgboost.xgb_classifier.html\" xlink:title=\"custom_xg_boost = CustomXGBoost(gamma=0.025801085053521078, learning_rate=0.5793622466253201, max_depth=3, min_child_weight=8, n_estimators=9, reg_alpha=0.49646670359671663, reg_lambda=0.9280083037935846, subsample=0.5479690370134093)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"133.581\" cy=\"-28.2843\" rx=\"32.5538\" ry=\"28.0702\"/>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-37.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Custom&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-25.4843\" font-family=\"Times,serif\" font-size=\"11.00\">XG&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"133.581\" y=\"-13.4843\" font-family=\"Times,serif\" font-size=\"11.00\">Boost</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- custom_pca&#45;&gt;custom_xg_boost -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>custom_pca&#45;&gt;custom_xg_boost</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M65.1405,-28.2843C73.2715,-28.2843 82.1469,-28.2843 90.7095,-28.2843\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"90.9278,-31.7844 100.928,-28.2843 90.9277,-24.7844 90.9278,-31.7844\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f473d39c9e8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "custom_pca = CustomPCA(n_components=19, svd_solver=\"full\")\n",
       "custom_xg_boost = CustomXGBoost(\n",
       "    gamma=0.025801085053521078,\n",
       "    learning_rate=0.5793622466253201,\n",
       "    max_depth=3,\n",
       "    min_child_weight=8,\n",
       "    n_estimators=9,\n",
       "    reg_alpha=0.49646670359671663,\n",
       "    reg_lambda=0.9280083037935846,\n",
       "    subsample=0.5479690370134093,\n",
       ")\n",
       "pipeline = custom_pca >> custom_xg_boost\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hpo_trained.get_pipeline(worst_name).visualize()\n",
    "hpo_trained.get_pipeline(worst_name).pretty_print(ipython_display=True, show_imports=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Combined Algorithm Selection and Hyperparameter Tuning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"184pt\" height=\"185pt\"\n",
       " viewBox=\"0.00 0.00 184.00 185.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 181)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-181 180,-181 180,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\"><title>cluster:choice_0</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice_0 = norm | no_op\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"8,-47 8,-169 78,-169 78,-47 8,-47\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<g id=\"clust2\" class=\"cluster\"><title>cluster:choice_1</title>\n",
       "<g id=\"a_clust2\"><a xlink:title=\"choice_1 = tree | lr | knn\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"98,-8 98,-169 168,-169 168,-8 98,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-153.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm -->\n",
       "<g id=\"node1\" class=\"node\"><title>norm</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.normalizer.html\" xlink:title=\"norm = Norm\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"43\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">Norm</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- tree -->\n",
       "<g id=\"node3\" class=\"node\"><title>tree</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.decision_tree_classifier.html\" xlink:title=\"tree = Tree\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-120\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-117.2\" font-family=\"Times,serif\" font-size=\"11.00\">Tree</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm&#45;&gt;tree -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>norm&#45;&gt;tree</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M77.7296,-120C83.6523,-120 89.838,-120 95.8241,-120\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"88.0002,-123.5 98,-120 87.9998,-116.5 88.0002,-123.5\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node2\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"43\" cy=\"-75\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"43\" y=\"-66.2\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- lr -->\n",
       "<g id=\"node4\" class=\"node\"><title>lr</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html\" xlink:title=\"lr = LR(solver=&#39;liblinear&#39;)\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"133\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-74.2\" font-family=\"Times,serif\" font-size=\"11.00\">LR</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node5\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.k_neighbors_classifier.html\" xlink:title=\"knn = KNN\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"133\" cy=\"-34\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"133\" y=\"-31.2\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f4771aa1780>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.preprocessing import Normalizer as Norm\n",
    "from sklearn.linear_model import LogisticRegression as LR\n",
    "from sklearn.tree import DecisionTreeClassifier as Tree\n",
    "from sklearn.neighbors import KNeighborsClassifier as KNN\n",
    "from lale.lib.lale import NoOp\n",
    "lale.wrap_imported_operators()\n",
    "\n",
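    "# Restrict the search range for n_neighbors to 1 through 10.\n",
    "KNN = KNN.customize_schema(n_neighbors=schemas.Int(min=1, max=10))\n",
    "# The | combinator is a choice: the optimizer picks one operator from each group.\n",
    "transp_planned = (Norm | NoOp) >> (Tree | LR(solver='liblinear') | KNN)\n",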
    "transp_planned.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|█████████| 3/3 [01:25<00:00, 28.55s/trial, best loss: -0.8412346112501562]\n",
      "CPU times: user 1min 27s, sys: 1.34 s, total: 1min 28s\n",
      "Wall time: 1min 27s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "transp_trained = transp_planned.auto_configure(\n",
    "    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### --- Excursion: Bindings as Lifecycle ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-bindings.png\" style=\"width:450px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "knn = KNN(algorithm=\"ball_tree\", metric=\"manhattan\", n_neighbors=9)\n",
       "pipeline = NoOp() >> knn\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"152pt\" height=\"48pt\"\n",
       " viewBox=\"0.00 0.00 152.00 47.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 43.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-43.598 148,-43.598 148,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node1\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node2\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.k_neighbors_classifier.html\" xlink:title=\"knn = KNN(algorithm=&#39;ball_tree&#39;, metric=&#39;manhattan&#39;, n_neighbors=9)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;knn -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>no_op&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f4771a612b0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "transp_trained.pretty_print(ipython_display=True, show_imports=False)\n",
    "transp_trained.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 86.6%\n",
      "CPU times: user 52.4 s, sys: 78.1 ms, total: 52.5 s\n",
      "Wall time: 53 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "transp_y = transp_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, transp_y):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Non-Linear Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'description': 'Features of forest covertypes dataset (classification).',\n",
       " 'documentation_url': 'https://scikit-learn.org/0.20/datasets/index.html#forest-covertypes',\n",
       " 'type': 'array',\n",
       " 'items': {'type': 'array',\n",
       "  'minItems': 54,\n",
       "  'maxItems': 54,\n",
       "  'items': [{'description': 'Elevation', 'type': 'integer'},\n",
       "   {'description': 'Aspect', 'type': 'integer'},\n",
       "   {'description': 'Slope', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Hydrology', 'type': 'integer'},\n",
       "   {'description': 'Vertical_Distance_To_Hydrology', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Roadways', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_9am', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_Noon', 'type': 'integer'},\n",
       "   {'description': 'Hillshade_3pm', 'type': 'integer'},\n",
       "   {'description': 'Horizontal_Distance_To_Fire_Points', 'type': 'integer'},\n",
       "   {'description': 'Wilderness_Area1', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area2', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area3', 'enum': [0, 1]},\n",
       "   {'description': 'Wilderness_Area4', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type1', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type2', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type3', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type4', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type5', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type6', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type7', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type8', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type9', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type10', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type11', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type12', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type13', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type14', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type15', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type16', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type17', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type18', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type19', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type20', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type21', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type22', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type23', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type24', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type25', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type26', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type27', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type28', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type29', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type30', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type31', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type32', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type33', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type34', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type35', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type36', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type37', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type38', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type39', 'enum': [0, 1]},\n",
       "   {'description': 'Soil_Type40', 'enum': [0, 1]}]},\n",
       " 'minItems': 58102,\n",
       " 'maxItems': 58102}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_X.json_schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Wilderness_Area1', 'Wilderness_Area2', 'Wilderness_Area3', 'Wilderness_Area4', 'Soil_Type1', 'Soil_Type2', 'Soil_Type3', 'Soil_Type4', 'Soil_Type5', 'Soil_Type6', 'Soil_Type7', 'Soil_Type8', 'Soil_Type9', 'Soil_Type10', 'Soil_Type11', 'Soil_Type12', 'Soil_Type13', 'Soil_Type14', 'Soil_Type15', 'Soil_Type16', 'Soil_Type17', 'Soil_Type18', 'Soil_Type19', 'Soil_Type20', 'Soil_Type21', 'Soil_Type22', 'Soil_Type23', 'Soil_Type24', 'Soil_Type25', 'Soil_Type26', 'Soil_Type27', 'Soil_Type28', 'Soil_Type29', 'Soil_Type30', 'Soil_Type31', 'Soil_Type32', 'Soil_Type33', 'Soil_Type34', 'Soil_Type35', 'Soil_Type36', 'Soil_Type37', 'Soil_Type38', 'Soil_Type39', 'Soil_Type40']\n"
     ]
    }
   ],
   "source": [
    "from lale.lib.lale import categorical\n",
    "print(categorical(max_values=2)(test_X))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"332pt\" height=\"186pt\"\n",
       " viewBox=\"0.00 0.00 332.00 185.80\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 181.799)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-181.799 328,-181.799 328,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\"><title>cluster:choice</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice = norm | no_op\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"black\" points=\"82,-8 82,-130 152,-130 152,-8 82,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-114.8\" font-family=\"Times,serif\" font-size=\"14.00\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0 -->\n",
       "<g id=\"node1\" class=\"node\"><title>project_0</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_0 = Project(columns=lale.lib.lale.categorical(max_values=2))\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-158\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-155.2\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel -->\n",
       "<g id=\"node2\" class=\"node\"><title>feat_sel</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.select_k_best.html\" xlink:title=\"feat_sel = FeatSel\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"117\" cy=\"-158\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-161.2\" font-family=\"Times,serif\" font-size=\"11.00\">Feat&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-149.2\" font-family=\"Times,serif\" font-size=\"11.00\">Sel</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0&#45;&gt;feat_sel -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>project_0&#45;&gt;feat_sel</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-158C62.3932,-158 71.3106,-158 79.8241,-158\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-161.5 89.919,-158 79.919,-154.5 79.919,-161.5\"/>\n",
       "</g>\n",
       "<!-- concat -->\n",
       "<g id=\"node6\" class=\"node\"><title>concat</title>\n",
       "<g id=\"a_node6\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.concat_features.html\" xlink:title=\"concat = Concat\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"207\" cy=\"-119\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-116.2\" font-family=\"Times,serif\" font-size=\"11.00\">Concat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel&#45;&gt;concat -->\n",
       "<g id=\"edge3\" class=\"edge\"><title>feat_sel&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M140.662,-147.957C150.982,-143.383 163.367,-137.894 174.567,-132.93\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"176.298,-135.992 184.023,-128.74 173.462,-129.592 176.298,-135.992\"/>\n",
       "</g>\n",
       "<!-- project_1 -->\n",
       "<g id=\"node3\" class=\"node\"><title>project_1</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_1 = Project(drop_columns=lale.lib.lale.categorical(max_values=2))\">\n",
       "<ellipse fill=\"#b0e2ff\" stroke=\"black\" cx=\"27\" cy=\"-81\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- norm -->\n",
       "<g id=\"node4\" class=\"node\"><title>norm</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.normalizer.html\" xlink:title=\"norm = Norm\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"117\" cy=\"-81\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-78.2\" font-family=\"Times,serif\" font-size=\"11.00\">Norm</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_1&#45;&gt;norm -->\n",
       "<g id=\"edge2\" class=\"edge\"><title>project_1&#45;&gt;norm</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-81C62.3932,-81 71.3106,-81 79.8241,-81\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"72.0002,-84.5005 82,-81 71.9998,-77.5005 72.0002,-84.5005\"/>\n",
       "</g>\n",
       "<!-- norm&#45;&gt;concat -->\n",
       "<g id=\"edge4\" class=\"edge\"><title>norm&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M151.589,-95.5043C159.149,-98.7687 167.143,-102.221 174.609,-105.445\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"173.233,-108.663 183.801,-109.414 176.008,-102.236 173.233,-108.663\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node5\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-36\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-39.2\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-27.2\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node7\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node7\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.k_neighbors_classifier.html\" xlink:title=\"knn = KNN\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"black\" cx=\"297\" cy=\"-119\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"297\" y=\"-116.2\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- concat&#45;&gt;knn -->\n",
       "<g id=\"edge5\" class=\"edge\"><title>concat&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M234.403,-119C242.393,-119 251.311,-119 259.824,-119\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"259.919,-122.5 269.919,-119 259.919,-115.5 259.919,-122.5\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f4737aa6978>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from lale.lib.lale import Project\n",
    "from lale.lib.lale import ConcatFeatures as Concat\n",
    "from sklearn.feature_selection import SelectKBest as FeatSel\n",
    "lale.wrap_imported_operators()\n",
    "\n",
    "binary_prep = Project(columns=categorical(max_values=2)) >> FeatSel\n",
    "other_prep = Project(drop_columns=categorical(max_values=2)) >> (Norm | NoOp)\n",
    "nonlin_planned = (binary_prep & other_prep) >> Concat >> KNN\n",
    "nonlin_planned.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|█████████| 3/3 [02:30<00:00, 50.20s/trial, best loss: -0.8618882829755578]\n",
      "CPU times: user 2min 32s, sys: 344 ms, total: 2min 33s\n",
      "Wall time: 2min 35s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "nonlin_trained = nonlin_planned.auto_configure(\n",
    "    train_X, train_y, optimizer=Hyperopt, cv=3, max_evals=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### --- Excursion: Combinators ---\n",
    "\n",
    "| Lale feature            | Name | Description  | Scikit-learn feature                |\n",
    "| ----------------------- | ---- | ------------ | ----------------------------------- |\n",
    "| >> or `make_pipeline`   | pipe | feed to next | `make_pipeline`                     |\n",
    "| & or `make_union`       | and  | run both     | `make_union` or `ColumnTransformer` |\n",
    "| &#x7c; or `make_choice` | or   | choose one   | N/A (specific to given AutoML tool) |\n",
    "\n",
    "### --- Excursion: Interoperability ---\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-interop.png\" style=\"width:550px\" align=\"left\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.38.0 (20140413.2041)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"332pt\" height=\"95pt\"\n",
       " viewBox=\"0.00 0.00 332.00 94.60\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 90.598)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-90.598 328,-90.598 328,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- project_0 -->\n",
       "<g id=\"node1\" class=\"node\"><title>project_0</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_0 = Project(columns=lale.lib.lale.categorical(max_values=2))\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-66.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-63.999\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel -->\n",
       "<g id=\"node2\" class=\"node\"><title>feat_sel</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.select_k_best.html\" xlink:title=\"feat_sel = FeatSel(k=8)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-66.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-69.999\" font-family=\"Times,serif\" font-size=\"11.00\">Feat&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-57.999\" font-family=\"Times,serif\" font-size=\"11.00\">Sel</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_0&#45;&gt;feat_sel -->\n",
       "<g id=\"edge1\" class=\"edge\"><title>project_0&#45;&gt;feat_sel</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-66.799C62.3932,-66.799 71.3106,-66.799 79.8241,-66.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-70.2991 89.919,-66.799 79.919,-63.2991 79.919,-70.2991\"/>\n",
       "</g>\n",
       "<!-- concat -->\n",
       "<g id=\"node5\" class=\"node\"><title>concat</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.concat_features.html\" xlink:title=\"concat = Concat()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"207\" cy=\"-42.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-39.999\" font-family=\"Times,serif\" font-size=\"11.00\">Concat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- feat_sel&#45;&gt;concat -->\n",
       "<g id=\"edge3\" class=\"edge\"><title>feat_sel&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M142.513,-60.1137C151.572,-57.6431 162.012,-54.7958 171.775,-52.1331\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"172.892,-55.4564 181.619,-49.4485 171.05,-48.703 172.892,-55.4564\"/>\n",
       "</g>\n",
       "<!-- project_1 -->\n",
       "<g id=\"node3\" class=\"node\"><title>project_1</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.project.html\" xlink:title=\"project_1 = Project(drop_columns=lale.lib.lale.categorical(max_values=2))\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"27\" cy=\"-19.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-16.999\" font-family=\"Times,serif\" font-size=\"11.00\">Project</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node4\" class=\"node\"><title>no_op</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp()\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"117\" cy=\"-19.799\" rx=\"27\" ry=\"19.6\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-22.999\" font-family=\"Times,serif\" font-size=\"11.00\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-10.999\" font-family=\"Times,serif\" font-size=\"11.00\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- project_1&#45;&gt;no_op -->\n",
       "<g id=\"edge2\" class=\"edge\"><title>project_1&#45;&gt;no_op</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M54.4029,-19.799C62.3932,-19.799 71.3106,-19.799 79.8241,-19.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"79.919,-23.2991 89.919,-19.799 79.919,-16.2991 79.919,-23.2991\"/>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;concat -->\n",
       "<g id=\"edge4\" class=\"edge\"><title>no_op&#45;&gt;concat</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M142.982,-26.3283C151.958,-28.6743 162.241,-31.362 171.861,-33.8763\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"171.004,-37.2697 181.564,-36.4123 172.774,-30.4972 171.004,-37.2697\"/>\n",
       "</g>\n",
       "<!-- knn -->\n",
       "<g id=\"node6\" class=\"node\"><title>knn</title>\n",
       "<g id=\"a_node6\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.k_neighbors_classifier.html\" xlink:title=\"knn = KNN(algorithm=&#39;kd_tree&#39;, n_neighbors=7, weights=&#39;distance&#39;)\">\n",
       "<ellipse fill=\"white\" stroke=\"black\" cx=\"297\" cy=\"-42.799\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"297\" y=\"-39.999\" font-family=\"Times,serif\" font-size=\"11.00\">KNN</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- concat&#45;&gt;knn -->\n",
       "<g id=\"edge5\" class=\"edge\"><title>concat&#45;&gt;knn</title>\n",
       "<path fill=\"none\" stroke=\"black\" d=\"M234.403,-42.799C242.393,-42.799 251.311,-42.799 259.824,-42.799\"/>\n",
       "<polygon fill=\"black\" stroke=\"black\" points=\"259.919,-46.2991 269.919,-42.799 259.919,-39.2991 259.919,-46.2991\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x7f47f04cb4a8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "```python\n",
       "project_0 = Project(columns=lale.lib.lale.categorical(max_values=2))\n",
       "feat_sel = FeatSel(k=8)\n",
       "pipeline_0 = make_pipeline(project_0, feat_sel)\n",
       "project_1 = Project(drop_columns=lale.lib.lale.categorical(max_values=2))\n",
       "pipeline_1 = make_pipeline(project_1, NoOp())\n",
       "union = make_union(pipeline_0, pipeline_1)\n",
       "knn = KNN(algorithm=\"kd_tree\", n_neighbors=7, weights=\"distance\")\n",
       "pipeline = make_pipeline(union, knn)\n",
       "```"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "nonlin_trained.visualize()\n",
    "nonlin_trained.pretty_print(ipython_display=True, show_imports=False, combinators=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 88.6%\n",
      "CPU times: user 5.02 s, sys: 78.1 ms, total: 5.09 s\n",
      "Wall time: 5.13 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "nonlin_y = nonlin_trained.predict(test_X)\n",
    "print(f'accuracy {sklearn.metrics.accuracy_score(test_y, nonlin_y):.1%}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Wilderness_Area1</th>\n",
       "      <th>Wilderness_Area4</th>\n",
       "      <th>Soil_Type2</th>\n",
       "      <th>Soil_Type3</th>\n",
       "      <th>Soil_Type4</th>\n",
       "      <th>Soil_Type10</th>\n",
       "      <th>Soil_Type38</th>\n",
       "      <th>Soil_Type39</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Wilderness_Area1  Wilderness_Area4  Soil_Type2  Soil_Type3  Soil_Type4  \\\n",
       "0               1.0               0.0         0.0         0.0         0.0   \n",
       "1               0.0               0.0         0.0         0.0         0.0   \n",
       "2               0.0               0.0         0.0         0.0         0.0   \n",
       "3               1.0               0.0         0.0         0.0         0.0   \n",
       "4               1.0               0.0         0.0         0.0         0.0   \n",
       "5               0.0               1.0         0.0         0.0         0.0   \n",
       "6               1.0               0.0         0.0         0.0         0.0   \n",
       "7               1.0               0.0         0.0         0.0         0.0   \n",
       "8               1.0               0.0         0.0         0.0         0.0   \n",
       "9               1.0               0.0         0.0         0.0         0.0   \n",
       "\n",
       "   Soil_Type10  Soil_Type38  Soil_Type39  \n",
       "0          0.0          1.0          0.0  \n",
       "1          0.0          0.0          0.0  \n",
       "2          0.0          0.0          0.0  \n",
       "3          0.0          0.0          0.0  \n",
       "4          0.0          0.0          0.0  \n",
       "5          1.0          0.0          0.0  \n",
       "6          0.0          0.0          0.0  \n",
       "7          0.0          1.0          0.0  \n",
       "8          0.0          0.0          0.0  \n",
       "9          0.0          0.0          0.0  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "binary_prep_trainable = Project(columns=categorical(max_values=2)) >> FeatSel(k=8)\n",
    "binary_prep_trained = binary_prep_trainable.fit(train_X, train_y)\n",
    "binary_prep_trained.transform(test_X.head(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "- code and documentation: https://github.com/ibm/lale\n",
    "- more examples: https://nbviewer.jupyter.org/github/IBM/lale/tree/master/examples/\n",
    "- frequently asked questions: https://github.com/IBM/lale/blob/master/docs/faq.rst\n",
    "- arXiv paper: https://arxiv.org/pdf/1906.03957.pdf\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/IBM/lale/d8ecf5c46e46653eb65a9eecc14dc991372cc162/examples/img/2019-1105-summary.png\" style=\"width:350px\" align=\"left\">"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}