{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Demo Import from Sklearn with Schemas from Lale\n",
    "\n",
    "This notebook shows how to use Lale directly with sklearn operators.\n",
    "The function `lale.wrap_imported_operators()` will automatically wrap\n",
    "known sklearn operators into Lale operators."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usability\n",
    "\n",
    "To make Lale easy to learn and use, its APIs imitate those of\n",
    "[sklearn](https://scikit-learn.org/), with init, fit, and predict,\n",
    "and with pipelines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "truth  [6, 9, 3, 7, 2, 1, 5, 2, 5, 2, 1, 9, 4, 0, 4, 2, 3, 7, 8, 8]\n"
     ]
    }
   ],
   "source": [
    "import sklearn.datasets\n",
    "import sklearn.model_selection\n",
    "digits = sklearn.datasets.load_digits()\n",
    "X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(\n",
    "    digits.data, digits.target, test_size=0.2, random_state=42)\n",
    "print(f'truth  {y_test.tolist()[:20]}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "actual [6, 9, 3, 7, 2, 2, 5, 2, 5, 2, 1, 4, 4, 0, 4, 2, 3, 7, 8, 8]\n"
     ]
    }
   ],
   "source": [
    "import lale\n",
    "from sklearn.linear_model import LogisticRegression as LR\n",
    "lale.wrap_imported_operators()\n",
    "\n",
    "trainable_lr = LR(LR.solver.lbfgs, C=0.0001)\n",
    "trained_lr = trainable_lr.fit(X_train, y_train)\n",
    "predictions = trained_lr.predict(X_test)\n",
    "print(f'actual {predictions.tolist()[:20]}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 91.7%\n"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics import accuracy_score\n",
    "print(f'accuracy {accuracy_score(y_test, predictions):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Correctness\n",
    "\n",
    "Lale uses [JSON Schema](https://json-schema.org/) to check for valid\n",
    "hyperparameters. These schemas enable not just validation but also\n",
    "interactive documentation. Thanks to using a single source of truth, the\n",
    "documentation is correct by construction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Invalid configuration for LR(solver='adam', C=0.01) due to invalid value solver=adam.\n",
      "Schema of argument solver: {\n",
      "    'description': 'Algorithm for optimization problem.',\n",
      "    'enum': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],\n",
      "    'default': 'liblinear'}\n",
      "Value: adam\n"
     ]
    }
   ],
   "source": [
    "from jsonschema import ValidationError\n",
    "try:\n",
    "    lale_lr = LR(solver='adam', C=0.01)\n",
    "except ValidationError as e:\n",
    "    print(e.message)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'description': 'Inverse regularization strength. Smaller values specify stronger regularization.',\n",
       " 'type': 'number',\n",
       " 'distribution': 'loguniform',\n",
       " 'minimum': 0.0,\n",
       " 'exclusiveMinimum': True,\n",
       " 'default': 1.0,\n",
       " 'minimumForOptimizer': 0.03125,\n",
       " 'maximumForOptimizer': 32768}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "LR.hyperparam_schema('C')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'solver': 'liblinear',\n",
       " 'penalty': 'l2',\n",
       " 'dual': False,\n",
       " 'C': 1.0,\n",
       " 'tol': 0.0001,\n",
       " 'fit_intercept': True,\n",
       " 'intercept_scaling': 1.0,\n",
       " 'class_weight': None,\n",
       " 'random_state': None,\n",
       " 'max_iter': 100,\n",
       " 'multi_class': 'ovr',\n",
       " 'verbose': 0,\n",
       " 'warm_start': False,\n",
       " 'n_jobs': None}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "LR.hyperparam_defaults()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Automation\n",
    "\n",
    "Lale includes a compiler that converts types (expressed as JSON\n",
    "Schema) to optimizer search spaces. It currently has back-ends for\n",
    "[hyperopt](http://hyperopt.github.io/hyperopt/),\n",
    "[GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html), and\n",
    "[SMAC](http://www.automl.org/smac/).\n",
    "We are also actively working towards various other forms of AI\n",
    "automation using various other tools."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|██████████| 10/10 [00:02<00:00,  4.48trial/s, best loss: -0.9777777777777777]\n",
      "best hyperparams {'C': 1685.8724563983353, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'multi_class': 'auto', 'penalty': 'l2', 'solver': 'sag', 'tol': 0.026043867770646253}\n",
      "\n",
      "accuracy 97.8%\n"
     ]
    }
   ],
   "source": [
    "from lale.search.op2hp import hyperopt_search_space\n",
    "from hyperopt import STATUS_OK, Trials, fmin, tpe, space_eval\n",
    "import lale.helpers\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "def objective(hyperparams):\n",
    "    trainable = LR(**lale.helpers.dict_without(hyperparams, 'name'))\n",
    "    trained = trainable.fit(X_train, y_train)\n",
    "    predictions = trained.predict(X_test)\n",
    "    return {'loss': -accuracy_score(y_test, predictions), 'status': STATUS_OK}\n",
    "\n",
    "search_space = hyperopt_search_space(LR)\n",
    "\n",
    "trials = Trials()\n",
    "fmin(objective, search_space, algo=tpe.suggest, max_evals=10, trials=trials)\n",
    "best_hps = space_eval(search_space, trials.argmin)\n",
    "print(f'best hyperparams {lale.helpers.dict_without(best_hps, \"name\")}\\n')\n",
    "print(f'accuracy {-min(trials.losses()):.1%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Composition\n",
    "\n",
    "Lale supports composite models, which resemble sklearn pipelines but are\n",
    "more expressive.\n",
    "\n",
    "| Symbol | Name | Description  | Sklearn feature |\n",
    "| ------ | ---- | ------------ | --------------- |\n",
    "| >>     | pipe | Feed to next | `make_pipeline` |\n",
    "| &      | and  | Run both     | `make_union`, includes concat |\n",
    "| &#x7c; | or   | Choose one   | (missing) |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"258pt\" height=\"141pt\"\n",
       " viewBox=\"0.00 0.00 258.00 141.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 137)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-137 254,-137 254,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\">\n",
       "<title>cluster:choice</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice = lr | svc\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"#000000\" points=\"172,-8 172,-125 242,-125 242,-8 172,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-109.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"#000000\" cx=\"27\" cy=\"-99\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-95.7\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- cat -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>cat</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.concat_features.html\" xlink:title=\"cat = Cat\">\n",
       "<ellipse fill=\"#ffffff\" stroke=\"#000000\" cx=\"117\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-73.7\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">Cat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;cat -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>pca&#45;&gt;cat</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M52.5497,-92.7545C61.5751,-90.5483 71.8933,-88.0261 81.5886,-85.6561\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"82.5036,-89.0356 91.3865,-83.2611 80.8414,-82.2358 82.5036,-89.0356\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>no_op</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"#ffffff\" stroke=\"#000000\" cx=\"27\" cy=\"-56\" rx=\"27\" ry=\"18.2703\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-58.2\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-47.2\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;cat -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>no_op&#45;&gt;cat</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M52.5497,-61.9616C61.5751,-64.0675 71.8933,-66.4751 81.5886,-68.7373\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"80.8527,-72.1596 91.3865,-71.0235 82.4434,-65.3427 80.8527,-72.1596\"/>\n",
       "</g>\n",
       "<!-- lr -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>lr</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html\" xlink:title=\"lr = LR\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"#000000\" cx=\"207\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-73.7\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">LR</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- cat&#45;&gt;lr -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>cat&#45;&gt;lr</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M144.003,-77C152.0277,-77 160.9665,-77 169.5309,-77\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"162.0001,-80.5004 172,-77 161.9999,-73.5004 162.0001,-80.5004\"/>\n",
       "</g>\n",
       "<!-- svc -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>svc</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html\" xlink:title=\"svc = SVC\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"#000000\" cx=\"207\" cy=\"-34\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-30.7\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">SVC</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x13681f128>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.decomposition import PCA\n",
    "from sklearn.svm import SVC\n",
    "from lale.lib.lale import ConcatFeatures as Cat\n",
    "from lale.lib.lale import NoOp\n",
    "lale.wrap_imported_operators()\n",
    "\n",
    "optimizable = (PCA & NoOp) >> Cat >> (LR | SVC)\n",
    "optimizable.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"258pt\" height=\"141pt\"\n",
       " viewBox=\"0.00 0.00 258.00 141.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 137)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-137 254,-137 254,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<g id=\"clust1\" class=\"cluster\">\n",
       "<title>cluster:choice</title>\n",
       "<g id=\"a_clust1\"><a xlink:title=\"choice = lr | svc\">\n",
       "<polygon fill=\"#7ec0ee\" stroke=\"#000000\" points=\"172,-8 172,-125 242,-125 242,-8 172,-8\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-109.8\" font-family=\"Times,serif\" font-size=\"14.00\" fill=\"#000000\">Choice</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"#000000\" cx=\"27\" cy=\"-99\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-95.7\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- cat -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>cat</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.concat_features.html\" xlink:title=\"cat = Cat()\">\n",
       "<ellipse fill=\"#ffffff\" stroke=\"#000000\" cx=\"117\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-73.7\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">Cat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;cat -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>pca&#45;&gt;cat</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M52.5497,-92.7545C61.5751,-90.5483 71.8933,-88.0261 81.5886,-85.6561\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"82.5036,-89.0356 91.3865,-83.2611 80.8414,-82.2358 82.5036,-89.0356\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>no_op</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp\">\n",
       "<ellipse fill=\"#ffffff\" stroke=\"#000000\" cx=\"27\" cy=\"-56\" rx=\"27\" ry=\"18.2703\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-58.2\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-47.2\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;cat -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>no_op&#45;&gt;cat</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M52.5497,-61.9616C61.5751,-64.0675 71.8933,-66.4751 81.5886,-68.7373\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"80.8527,-72.1596 91.3865,-71.0235 82.4434,-65.3427 80.8527,-72.1596\"/>\n",
       "</g>\n",
       "<!-- lr -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>lr</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.logistic_regression.html\" xlink:title=\"lr = LR\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"#000000\" cx=\"207\" cy=\"-77\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-73.7\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">LR</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- cat&#45;&gt;lr -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>cat&#45;&gt;lr</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M144.003,-77C152.0277,-77 160.9665,-77 169.5309,-77\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"162.0001,-80.5004 172,-77 161.9999,-73.5004 162.0001,-80.5004\"/>\n",
       "</g>\n",
       "<!-- svc -->\n",
       "<g id=\"node5\" class=\"node\">\n",
       "<title>svc</title>\n",
       "<g id=\"a_node5\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html\" xlink:title=\"svc = SVC\">\n",
       "<ellipse fill=\"#7ec0ee\" stroke=\"#000000\" cx=\"207\" cy=\"-34\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-30.7\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">SVC</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x1316524a8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from lale.operators import make_pipeline, make_union, make_choice\n",
    "optimizable = make_pipeline(make_union(PCA, NoOp), make_choice(LR, SVC))\n",
    "optimizable.visualize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|██████████| 10/10 [00:31<00:00,  3.18s/trial, best loss: -0.9902556438206324]\n"
     ]
    }
   ],
   "source": [
    "import lale.lib.lale.hyperopt\n",
    "Optimizer = lale.lib.lale.hyperopt.Hyperopt\n",
    "trained = optimizable.auto_configure(X_train, y_train, optimizer=Optimizer, max_evals=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy 98.9%\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n",
       "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
       " \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
       "<!-- Generated by graphviz version 2.40.1 (20161225.0304)\n",
       " -->\n",
       "<!-- Title: cluster:(root) Pages: 1 -->\n",
       "<svg width=\"242pt\" height=\"87pt\"\n",
       " viewBox=\"0.00 0.00 242.00 87.38\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 83.3848)\">\n",
       "<title>cluster:(root)</title>\n",
       "<g id=\"a_graph0\"><a xlink:title=\"(root) = ...\">\n",
       "<polygon fill=\"#ffffff\" stroke=\"transparent\" points=\"-4,4 -4,-83.3848 238,-83.3848 238,4 -4,4\"/>\n",
       "</a>\n",
       "</g>\n",
       "<!-- pca -->\n",
       "<g id=\"node1\" class=\"node\">\n",
       "<title>pca</title>\n",
       "<g id=\"a_node1\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.sklearn.pca.html\" xlink:title=\"pca = PCA(svd_solver=&#39;full&#39;)\">\n",
       "<ellipse fill=\"#ffffff\" stroke=\"#000000\" cx=\"27\" cy=\"-61.3848\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-58.0848\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">PCA</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- cat -->\n",
       "<g id=\"node3\" class=\"node\">\n",
       "<title>cat</title>\n",
       "<g id=\"a_node3\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.concat_features.html\" xlink:title=\"cat = Cat()\">\n",
       "<ellipse fill=\"#ffffff\" stroke=\"#000000\" cx=\"117\" cy=\"-39.3848\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"117\" y=\"-36.0848\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">Cat</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- pca&#45;&gt;cat -->\n",
       "<g id=\"edge1\" class=\"edge\">\n",
       "<title>pca&#45;&gt;cat</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M52.5497,-55.1393C61.5751,-52.9331 71.8933,-50.4109 81.5886,-48.0409\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"82.5036,-51.4204 91.3865,-45.6459 80.8414,-44.6206 82.5036,-51.4204\"/>\n",
       "</g>\n",
       "<!-- no_op -->\n",
       "<g id=\"node2\" class=\"node\">\n",
       "<title>no_op</title>\n",
       "<g id=\"a_node2\"><a xlink:href=\"https://lale.readthedocs.io/en/latest/modules/lale.lib.lale.no_op.html\" xlink:title=\"no_op = NoOp()\">\n",
       "<ellipse fill=\"#ffffff\" stroke=\"#000000\" cx=\"27\" cy=\"-18.3848\" rx=\"27\" ry=\"18.2703\"/>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-20.5848\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">No&#45;</text>\n",
       "<text text-anchor=\"middle\" x=\"27\" y=\"-9.5848\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">Op</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- no_op&#45;&gt;cat -->\n",
       "<g id=\"edge2\" class=\"edge\">\n",
       "<title>no_op&#45;&gt;cat</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M52.5497,-24.3464C61.5751,-26.4523 71.8933,-28.8599 81.5886,-31.1221\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"80.8527,-34.5444 91.3865,-33.4083 82.4434,-27.7275 80.8527,-34.5444\"/>\n",
       "</g>\n",
       "<!-- svc -->\n",
       "<g id=\"node4\" class=\"node\">\n",
       "<title>svc</title>\n",
       "<g id=\"a_node4\"><a xlink:href=\"https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html\" xlink:title=\"svc = SVC(gamma=&#39;scale&#39;, kernel=&#39;poly&#39;, probability=True, tol=0.005980474794650505)\">\n",
       "<ellipse fill=\"#ffffff\" stroke=\"#000000\" cx=\"207\" cy=\"-39.3848\" rx=\"27\" ry=\"18\"/>\n",
       "<text text-anchor=\"middle\" x=\"207\" y=\"-36.0848\" font-family=\"Times,serif\" font-size=\"11.00\" fill=\"#000000\">SVC</text>\n",
       "</a>\n",
       "</g>\n",
       "</g>\n",
       "<!-- cat&#45;&gt;svc -->\n",
       "<g id=\"edge3\" class=\"edge\">\n",
       "<title>cat&#45;&gt;svc</title>\n",
       "<path fill=\"none\" stroke=\"#000000\" d=\"M144.003,-39.3848C152.0277,-39.3848 160.9665,-39.3848 169.5309,-39.3848\"/>\n",
       "<polygon fill=\"#000000\" stroke=\"#000000\" points=\"169.7051,-42.8849 179.705,-39.3848 169.705,-35.8849 169.7051,-42.8849\"/>\n",
       "</g>\n",
       "</g>\n",
       "</svg>\n"
      ],
      "text/plain": [
       "<graphviz.dot.Digraph at 0x136fb6160>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "predictions = trained.predict(X_test)\n",
    "print(f'accuracy {accuracy_score(y_test, predictions):.1%}')\n",
    "trained.visualize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Input and Output Schemas\n",
    "\n",
    "Besides schemas for hyperparameter, Lale also provides operator tags\n",
    "and schemas for input and output data of operators."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'pre': ['~categoricals'],\n",
       " 'op': ['estimator', 'classifier', 'interpretable'],\n",
       " 'post': ['probabilities']}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "LR.get_tags()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'$schema': 'http://json-schema.org/draft-04/schema#',\n",
       " 'type': 'object',\n",
       " 'required': ['X', 'y'],\n",
       " 'additionalProperties': False,\n",
       " 'properties': {'X': {'description': 'Features; the outer array is over samples.',\n",
       "   'type': 'array',\n",
       "   'items': {'type': 'array', 'items': {'type': 'number'}}},\n",
       "  'y': {'description': 'Target class labels; the array is over samples.',\n",
       "   'anyOf': [{'type': 'array', 'items': {'type': 'number'}},\n",
       "    {'type': 'array', 'items': {'type': 'string'}}]}}}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "LR.get_schema('input_fit')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'$schema': 'http://json-schema.org/draft-04/schema#',\n",
       " 'description': 'Predicted class label per sample.',\n",
       " 'anyOf': [{'type': 'array', 'items': {'type': 'number'}},\n",
       "  {'type': 'array', 'items': {'type': 'string'}}]}"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "LR.get_schema('output_predict')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}