{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## one_hot: One-Hot encoding function for class label arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function that performs one-hot encoding for class labels." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> from mlxtend.preprocessing import one_hot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Typical supervised machine learning algorithms for classification assume that the class labels are *nominal* (a special case of *categorical* where no order is implied). A typical example of a nominal feature is \"color\", since we can't say (in most applications) that \"orange > blue > red\".\n", "\n", "The `one_hot` function provides a simple interface to convert class label integers into a so-called one-hot array, where each unique label is represented as a column in the new array.\n", "\n", "For example, let's assume we have 5 data points from 3 different classes: 0, 1, and 2.\n", "\n", "    y = [0,  # sample 1, class 0\n", "         1,  # sample 2, class 1\n", "         0,  # sample 3, class 0\n", "         2,  # sample 4, class 2\n", "         2]  # sample 5, class 2\n", "\n", "After one-hot encoding, we obtain the following array (note that the index position of the \"1\" in each row denotes the class label of that sample):\n", "\n", "    y = [[1, 0, 0],  # sample 1, class 0\n", "         [0, 1, 0],  # sample 2, class 1\n", "         [1, 0, 0],  # sample 3, class 0\n", "         [0, 0, 1],  # sample 4, class 2\n", "         [0, 0, 1]]  # sample 5, class 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1 - Defaults" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 0., 0.],\n", " [ 0., 1., 0.],\n", " [ 0., 0., 1.],\n", " [ 0., 1., 0.],\n", " [ 0., 0., 1.]])" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" 
} ], "source": [ "from mlxtend.preprocessing import one_hot\n", "import numpy as np\n", "\n", "y = np.array([0, 1, 2, 1, 2])\n", "one_hot(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 2 - Python Lists" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 0., 0.],\n", " [ 0., 1., 0.],\n", " [ 0., 0., 1.],\n", " [ 0., 1., 0.],\n", " [ 0., 0., 1.]])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from mlxtend.preprocessing import one_hot\n", "\n", "y = [0, 1, 2, 1, 2]\n", "one_hot(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 3 - Integer Arrays" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [0, 1, 0],\n", " [0, 0, 1],\n", " [0, 1, 0],\n", " [0, 0, 1]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from mlxtend.preprocessing import one_hot\n", "\n", "y = [0, 1, 2, 1, 2]\n", "one_hot(y, dtype='int')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 4 - Arbitrary Numbers of Class Labels" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", " [ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],\n", " [ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],\n", " [ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],\n", " [ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from mlxtend.preprocessing import one_hot\n", "\n", "y = [0, 1, 2, 1, 2]\n", "one_hot(y, num_labels=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## API" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "## one_hot\n", 
"\n", "*one_hot(y, num_labels='auto', dtype='float')*\n", "\n", "One-hot encoding of class labels\n", "\n", "**Parameters**\n", "\n", "- `y` : array-like, shape = [n_classlabels]\n", "\n", " Python list or numpy array consisting of class labels.\n", "\n", "- `num_labels` : int or 'auto'\n", "\n", " Number of unique labels in the class label array. Infers the number\n", " of unique labels from the input array if set to 'auto'.\n", "\n", "- `dtype` : str\n", "\n", " NumPy array type (float, float32, float64) of the output array.\n", "\n", "**Returns**\n", "\n", "- `ary` : numpy.ndarray, shape = [n_classlabels]\n", "\n", " One-hot encoded array, where each sample is represented as\n", " a row vector in the returned array.\n", "\n", "**Examples**\n", "\n", "For usage examples, please see\n", " [https://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/](https://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/)\n", "\n", "\n" ] } ], "source": [ "with open('../../api_modules/mlxtend.preprocessing/one_hot.md', 'r') as f:\n", " print(f.read())" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }