{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "_This notebook contains code and comments from Section 5.3 of the book [Ensemble Methods for Machine Learning](https://www.manning.com/books/ensemble-methods-for-machine-learning). Please see the book for additional details on this topic. This notebook and code are released under the [MIT license](https://github.com/gkunapuli/ensemble-methods-notebooks/blob/master/LICENSE)._\n", "\n", "## 5.3 LightGBM: A Framework for Gradient Boosting \n", "\n", "[LightGBM](https://lightgbm.readthedocs.io/en/latest/), or Light Gradient Boosting Machine, is an open-source gradient boosting framework that was originally developed and released by Microsoft. \n", "At its core, LightGBM is essentially a histogram-based gradient boosting approach. However, it also has several modeling and algorithmic features that enable it to handle large-scale data. In particular, LightGBM offers the following advantages:\n", "\n", "* Algorithmic speedups such as gradient-based one-side sampling and exclusive feature bundling that result in faster training and lower memory usage; these are described in more detail in Section 5.3.1;\n", "* Support for a large number of loss functions for classification, regression and ranking as well as application-specific custom loss functions (Section 5.3.2);\n", "* Support for parallel and GPU learning, which enables it to handle large-scale data sets (parallel/GPU-based machine learning is out of scope for this book).\n", "\n", "\n", "### 5.3.2 Gradient Boosting with LightGBM\n", "LightGBM is available for various platforms including Windows, Linux and macOS, and can either be built from source or installed using tools such as ``pip``. See the LightGBM documentation for [installation instructions](https://lightgbm.readthedocs.io/en/latest/). Its usage syntax is quite similar to ``scikit-learn``’s. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Continuing with the breast cancer data set from Section 5.2.3, we can learn a gradient boosting model using LightGBM as follows:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_breast_cancer\n", "from sklearn.model_selection import train_test_split\n", "X, y = load_breast_cancer(return_X_y=True)\n", "Xtrn, Xtst, ytrn, ytst = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train a gradient boosting classifier using LightGBM" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LGBMClassifier(max_depth=1, n_estimators=20)
LGBMClassifier(early_stopping=5, max_depth=1, n_estimators=50)