{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
**Every time we try to select the best features :)**
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Image on web](https://i.giphy.com/media/5yLgoceFO3BdJW1zvFu/giphy.webp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, the informal intro is coming to an end. It's time to look at some formal theory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Introduction to SequentialFeatureSelector" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please install the following libraries if they are not already on your system." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#!pip install pandas\n", "#!pip install mlxtend\n", "#!pip install scikit-learn\n", "#!pip install matplotlib" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "from mlxtend.feature_selection import SequentialFeatureSelector\n", "from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs\n", "from sklearn.decomposition import PCA\n", "from sklearn.feature_selection import RFE\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.model_selection import KFold, cross_val_score, train_test_split\n", "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mlxtend's SequentialFeatureSelector implements a greedy search algorithm that reduces an initial d-dimensional feature space to a k-dimensional feature subspace, where k < d." 
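] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
To make the word *greedy* more concrete, here is a minimal pure-Python sketch of forward selection. It is not Mlxtend's implementation: the names `greedy_forward_selection`, `toy_score` and `USEFULNESS` are hypothetical and exist only for this illustration, and in practice the score would typically be a cross-validated model metric rather than a hand-written function.

```python
def greedy_forward_selection(n_features, k, score):
    # Start from the empty subset and add one feature per iteration.
    selected = []
    remaining = list(range(n_features))
    while len(selected) < k:
        # Pick the candidate whose addition gives the highest score.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy criterion (an assumption for illustration only):
# each feature has a fixed usefulness, and features 0 and 1 are redundant.
USEFULNESS = [0.50, 0.45, 0.30, 0.05]


def toy_score(subset):
    total = sum(USEFULNESS[f] for f in subset)
    if 0 in subset and 1 in subset:
        total -= 0.40  # penalty: the second redundant feature adds little
    return total


print(greedy_forward_selection(4, 2, toy_score))  # prints [0, 2]
```

On this toy criterion the sketch picks feature 0 first and then feature 2, because feature 1 is penalized as redundant with feature 0. Each step only looks one feature ahead, so the result is not guaranteed to be the globally best subset.
" 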
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are four flavors of sequential feature algorithms (SFAs) available via the *SequentialFeatureSelector*:\n", "\n", "* Sequential Forward Selection (SFS)\n", "* Sequential Backward Selection (SBS)\n", "* Sequential Forward Floating Selection (SFFS)\n", "* Sequential Backward Floating Selection (SBFS)\n", "\n", "The \"forward\" algorithm starts with an empty feature subset and, on each iteration, adds the one feature that maximizes the quality metric. In contrast, the \"backward\" algorithm starts with the full feature set and, on each iteration, removes the one feature whose removal maximizes the quality of the model.\n", "\n", "The floating variants, SFFS and SBFS, can be considered extensions of the simpler SFS and SBS algorithms. The floating algorithms have an additional exclusion or inclusion step that removes features after they were included (or re-adds features after they were excluded), so that a larger number of feature subset combinations can be sampled.\n", "\n", "Let's take a look at each of them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sequential Forward Selection (SFS)\n", "\n", "\n", "### Input: \n", "\n", "$Y = \\{y_1, y_2, ..., y_d\\}$\n", "\n", "* The SFS algorithm takes the whole d-dimensional feature set as input.\n", "\n", "### Output: \n", "\n", "$X_k = \\{x_j \\mid j = 1, 2, ..., k;\\ x_j \\in Y\\}$, where $k = (0, 1, 2, ..., d)$\n", "\n", "* SFS returns a subset of features; the number of selected features is k, where k < d.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
**How we will choose features after this tutorial :)**
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, RFE, PCA and SBS solve slightly different tasks. It's important to know how and when to apply each tool. Even more important is to keep an inquiring mind \n", "and test the craziest hypotheses :)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial we learned something new about feature selection and saw how SequentialFeatureSelector from the Mlxtend library works: it makes it easy to select among newly generated features and can boost a model's quality. Then we compared it with other feature selection and dimensionality reduction algorithms.