{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lale: Auto-ML and Types for Scikit-learn\n", "\n", "This notebook is an introductory guide to\n", "[Lale](https://github.com/ibm/lale) for scikit-learn users.\n", "[Scikit-learn](https://scikit-learn.org) is a popular, easy-to-use,\n", "and comprehensive data science library for Python. This notebook aims\n", "to show how Lale can make scikit-learn even better in two areas:\n", "auto-ML and type checking. First, if you do not want to manually\n", "select all algorithms or tune all hyperparameters, you can leave it to\n", "Lale to do that for you automatically. Second, when you pass\n", "hyperparameters or datasets to scikit-learn, Lale checks that these\n", "are type-correct. For both auto-ML and type checking, Lale uses a\n", "single source of truth: machine-readable schemas associated with\n", "scikit-learn compatible transformers and estimators. Rather than\n", "invent a new schema specification language, Lale uses [JSON\n", "Schema](https://json-schema.org/understanding-json-schema/), because\n", "it is popular, widely supported, and makes it easy to store or send\n", "hyperparameters as JSON objects. Furthermore, by using the same\n", "schemas both for auto-ML and for type checking, Lale ensures that\n", "auto-ML is consistent with type checking while also reducing the\n", "maintenance burden to a single set of schemas.\n", "\n", "Lale is an open-source Python library, and you can install it by running\n", "`pip install lale`. See\n", "[installation](https://github.com/IBM/lale/blob/master/docs/installation.rst)\n", "for further instructions. Lale uses the term *operator* to refer to\n", "what scikit-learn calls a machine-learning transformer or estimator.\n", "Lale provides schemas for 144\n", "[operators](https://github.com/IBM/lale/tree/master/lale/lib). 
Most of\n", "these operators come from scikit-learn itself, but there are also\n", "operators from other frameworks such as XGBoost or PyTorch.\n", "If Lale does not yet support your favorite operator, you can add it\n", "yourself by following this\n", "[guide](https://nbviewer.jupyter.org/github/IBM/lale/blob/master/examples/docs_new_operators.ipynb).\n", "If you do add a new operator, please consider contributing it back to\n", "Lale!\n", "\n", "The rest of this notebook first demonstrates auto-ML, then reveals\n", "some of the schemas that make that possible, and finally demonstrates\n", "how to use the very same schemas for type checking." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Auto-ML with Lale\n", "\n", "Lale serves as an interface for two auto-ML tasks: hyperparameter tuning\n", "and algorithm selection. Rather than provide new implementations for\n", "these tasks, Lale reuses existing implementations. The next few cells\n", "demonstrate how to use Hyperopt and GridSearchCV from Lale. Lale also\n", "supports additional optimizers, not shown in this notebook. In all\n", "cases, the syntax for specifying the search space is the same.\n", "\n", "### 1.1 Hyperparameter Tuning with Lale and Hyperopt\n", "\n", "Let's start by looking at hyperparameter tuning, which is an important\n", "subtask of auto-ML. To demonstrate it, we first need a dataset.\n", "Therefore, we load the California Housing dataset and display the\n", "first few rows to get a feeling for the data. Lale can process both\n", "pandas dataframes and NumPy ndarrays; here we use dataframes." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | target | \n", "
---|---|---|---|---|---|---|---|---|---|
0 | 3.2596 | 33.0 | 5.017657 | 1.006421 | 2300.0 | 3.691814 | 32.71 | -117.03 | 1.030 | \n", "
1 | 3.8125 | 49.0 | 4.473545 | 1.041005 | 1314.0 | 1.738095 | 33.77 | -118.16 | 3.821 | \n", "
2 | 4.1563 | 4.0 | 5.645833 | 0.985119 | 915.0 | 2.723214 | 34.66 | -120.48 | 1.726 | \n", "
3 | 1.9425 | 36.0 | 4.002817 | 1.033803 | 1418.0 | 3.994366 | 32.69 | -117.11 | 0.934 | \n", "
4 | 3.5542 | 43.0 | 6.268421 | 1.134211 | 874.0 | 2.300000 | 36.78 | -119.80 | 0.965 | \n", "