{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "1eece162-2c5e-40f6-830b-9392fca6d7e7", "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython xgboost" ] }, { "cell_type": "markdown", "id": "6aac3a13", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "id": "ee615ac5", "metadata": {}, "source": [ "# XGBoost + k-fold CV + Feature Importance\n", "\n", "Many machine learning competitions have been won with a single algorithm: **XGBoost**. It is one of the most popular machine learning algorithms today, and it works well for both regression and classification tasks.\n", "\n", "In this notebook, we will discuss XGBoost and develop a simple baseline XGBoost model with Python." 
] }, { "cell_type": "markdown", "id": "67b9311a", "metadata": {}, "source": [ "## Introduction to XGBoost Algorithm\n", "\n", "- **XGBoost** stands for **Extreme Gradient Boosting.**\n", "\n", "- It is a performant machine learning library based on the paper [Greedy Function Approximation: A Gradient Boosting Machine, by Friedman](https://statweb.stanford.edu/~jhf/ftp/trebst.pdf)\n", "\n", "- It is an open source machine learning library providing a high-performance implementation of gradient boosted decision trees.\n", "\n", "- Its core is written in C++, and it is often faster than other ensemble classifiers.\n", "\n", "- It belongs to a family of boosting algorithms and uses the **gradient boosting (GBM)** framework at its core. \n", "\n", "- XGBoost implements a [Gradient Boosting algorithm](https://en.wikipedia.org/wiki/Gradient_boosting) based on decision trees.\n", "\n", "- So, to understand XGBoost completely, we need to understand the **gradient boosting algorithm** (discussed later).\n", "\n", "- Please follow the links below for a more in-depth discussion of XGBoost.\n", "\n", " [XGBoost Official Documentation](https://xgboost.readthedocs.io/en/latest/)\n", "\n", " [XGBoost from Wikipedia](https://en.wikipedia.org/wiki/XGBoost)" ] }, { "cell_type": "markdown", "id": "c231cf7e", "metadata": {}, "source": [ "### Evolution of tree-based algorithms\n", "\n", "- Tree-based algorithms have evolved over the years.\n", "\n", "- XGBoost belongs to a family of tree-based algorithms.\n", "\n", "- Please see the chart below for the evolution of tree-based algorithms over the years." 
] }, { "cell_type": "markdown", "id": "26e06d03", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/ml-advanced/xgboost_k_fold_cv_feature_importance/Evolution_of_tree_based_algorithms.jpg\n", "---\n", "name: 'Evolution of tree-based algorithms'\n", "width: 90%\n", "---\n", "Evolution of tree-based algorithms\n", ":::" ] }, { "cell_type": "markdown", "id": "e3c22fe6", "metadata": {}, "source": [ "### Main features of XGBoost\n", "\n", "- The primary reasons to use this algorithm are its **accuracy**, **efficiency** and **feasibility**. \n", "\n", "- It includes both a linear model solver and a [tree learning](https://en.wikipedia.org/wiki/Decision_tree_learning) algorithm, and it can run computations in parallel on a single machine. \n", "\n", "- It also has extra features for doing cross validation and computing feature importance. \n", "\n", "- Given below are some of the main features of the model:\n", "\n", " - **Sparsity** : It accepts sparse input for both the tree booster and the linear booster.\n", " \n", " - **Customization** : It supports customized objective and evaluation functions.\n", " \n", " - **DMatrix** : It provides an optimized data structure that improves performance and efficiency." ] }, { "cell_type": "markdown", "id": "12731c15", "metadata": {}, "source": [ "## Bagging Vs Boosting\n", "\n", "- To understand bagging and boosting, we first need to understand ensemble methods.\n", "\n", "- Ensemble methods combine several decision trees to produce better predictive performance than a single decision tree. \n", "\n", "- The main principle behind an ensemble model is that a group of weak learners come together to form a strong learner.\n", "\n", "- Now, we will talk about two techniques for building ensembles of decision trees. 
These are as follows:\n", "\n", " - Bagging\n", "\n", " - Boosting\n", " \n", "- You can follow the link below for a more in-depth discussion of bagging and boosting.\n", "\n", " [Bagging vs Boosting](https://www.kaggle.com/prashant111/bagging-vs-boosting)" ] }, { "cell_type": "markdown", "id": "88a724d2", "metadata": {}, "source": [ "### Bagging\n", "\n", "- Bagging (short for bootstrap aggregation) is a simple and very powerful ensemble method. \n", "\n", "- Bagging is the application of the bootstrap procedure to a high-variance machine learning algorithm, typically decision trees.\n", "\n", "- The idea behind bagging is to combine the results of multiple models (for instance, all decision trees) to get a generalized result. \n", "\n", "- Each model is trained on a subset (bag) sampled from the training data, and together the bags give a fair idea of the distribution of the complete set. A bag may be smaller than the original set.\n", "\n", "- In bagging, each model runs independently, and the outputs are aggregated at the end without preference for any model.\n", "\n", "- Bagging can be depicted with the following diagram:" ] }, { "cell_type": "markdown", "id": "4ce0d0e4", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/ml-advanced/xgboost_k_fold_cv_feature_importance/Process_of_bagging.webp\n", "---\n", "name: 'Process of Bagging'\n", "width: 90%\n", "---\n", "Process of Bagging\n", ":::" ] }, { "cell_type": "markdown", "id": "f0aa400b", "metadata": {}, "source": [ "### Boosting\n", "\n", "- Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. The succeeding models are dependent on the previous model.\n", "\n", "- In this technique, learners are trained sequentially, with early learners fitting simple models to the data, whose errors are then analyzed. 
In other words, we fit consecutive trees (random sample) and at every step, the goal is to reduce the net error of the prior trees.\n", "\n", "- When an input is misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly. Combining the whole set at the end converts the weak learners into a better-performing model.\n", "\n", "- Boosting is another ensemble technique to create a collection of models. \n", "\n", "- Boosting can be depicted with the following diagram:" ] }, { "cell_type": "markdown", "id": "aaa6c4f9", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/ml-advanced/xgboost_k_fold_cv_feature_importance/Process_of_boosting.webp\n", "---\n", "name: 'Process of Boosting'\n", "width: 90%\n", "---\n", "Process of Boosting\n", ":::" ] }, { "cell_type": "markdown", "id": "5ba8fdef", "metadata": {}, "source": [ "## XGBoost algorithm intuition\n", "\n", "- XGBoost is a powerful and very fast machine learning library. It is commonly used to win professional-level competitions. \n", "\n", "- It can be an intimidating algorithm, especially because of the large number of parameters that XGBoost provides.\n", "\n", "- Also, there is some confusion regarding gradient boosting, gradient boosted trees and XGBoost.\n", "\n", "- So, in this section, we will discuss **gradient boosting**, **gradient boosted trees** and **XGBoost**. The purpose of this section is to clarify these concepts." 
] }, { "cell_type": "markdown", "id": "84f19877", "metadata": {}, "source": [ "### Gradient boosting\n", "\n", "- Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.\n", "\n", "- It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.\n", "\n", "- The objective of any supervised learning algorithm is to define a loss function and minimize it. The same is true for the gradient boosting algorithm. Here, we use the squared error (the MSE up to a constant factor $1/n$) as the loss function, defined as follows:\n", "\n", "$$ Loss = MSE = \\sum(y_i-y_i^p)^2 $$" ] }, { "cell_type": "markdown", "id": "fb4491af-58b5-45cf-bb0c-2f34b23436a9", "metadata": {}, "source": [ "- where $y_i$ is the $i$-th target value, $y_i^p$ is the $i$-th prediction, and $L(y_i,y_i^p)$ is the loss function\n", "\n", "- We want predictions for which the loss function (MSE) is at its minimum. \n", "\n", "- By using gradient descent and updating our predictions based on a learning rate, we can find the values where MSE is minimum.\n", "\n", "- It can be depicted as follows:\n", "$$ y_i^p = y_i^p - a * \\frac{\\partial \\sum(y_i - y_i^p)^2}{\\partial y_i^p} $$" ] }, { "cell_type": "markdown", "id": "67f518c4-68c1-4460-9241-dd3d94383315", "metadata": {}, "source": [ "- which becomes, $y_i^p = y_i^p + a * 2 * (y_i - y_i^p)$, because only the $i$-th term of the sum depends on $y_i^p$ and its derivative is $-2(y_i - y_i^p)$\n", "\n", "- where, $a$ is the learning rate and $(y_i-y_i^p)$ is the $i$-th residual\n", "\n", "- So, we are basically updating the predictions such that the residuals are close to 0 (or minimum) and predicted values are sufficiently close to actual values.\n", "\n", "- Now, we train our second model on the gradient of the loss with respect to the predictions of the first model. In this way, the second model corrects the mistakes of the first model. 
\n", "\n", "- This is the core of gradient boosting, and what allows many simple models to compensate for each other’s weaknesses to better fit the data.\n", "\n", "- Gradient boosting is an iterative procedure. So, we will repeat the above process over and over again. Each time, we fit a new model to the gradient of the error of the updated sum of models.\n", "\n", "- So, gradient boosting is a method for optimizing the ensemble function F (the sum of the fitted models), but it does not constrain the base model h that is fitted at each step (since nothing about the optimization of h is defined). This means that any base model h can be used to construct F." ] }, { "cell_type": "markdown", "id": "29035db5", "metadata": {}, "source": [ "### Gradient Boosted Trees\n", "\n", "- Gradient boosted trees consider the special case where the simple model h is a decision tree. \n", "\n", "- It can be depicted with the following diagram which is taken from XGBoost’s documentation." ] }, { "cell_type": "markdown", "id": "7cb6cf42", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/ml-advanced/xgboost_k_fold_cv_feature_importance/Gradient_boosted_trees.png\n", "---\n", "name: 'Gradient Boosted Trees'\n", "width: 90%\n", "---\n", "Gradient Boosted Trees\n", ":::" ] }, { "cell_type": "markdown", "id": "eee8a270", "metadata": {}, "source": [ "- In this case, there are 2 kinds of parameters P - **the weights at each leaf w** and **the number of leaves T in each tree** (so that in the above example, T=3 and w=[2, 0.1, -1]).\n", "\n", "- When building a decision tree, a challenge is to decide how to split a current leaf. For instance, in the above image, how could we add another layer to the (age > 15) leaf? \n", "\n", "- A ‘greedy’ way to do this is to consider every possible split on the remaining features (so, gender and occupation), and calculate the new loss for each split. We could then pick the tree which most reduces our loss." 
] }, { "cell_type": "markdown", "id": "cb08b5a2", "metadata": {}, "source": [ ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/ml-advanced/xgboost_k_fold_cv_feature_importance/New_tree_minimizing_loss.png\n", "---\n", "name: 'New Tree minimizing loss'\n", "width: 90%\n", "---\n", "New Tree minimizing loss\n", ":::" ] }, { "cell_type": "markdown", "id": "78394939", "metadata": {}, "source": [ "- In addition to finding the new tree structures, the weights at each leaf need to be calculated as well, such that the loss is minimized. Since the tree structure is now fixed, this can be done analytically by setting the derivative of the loss with respect to each weight to 0.\n", "\n", "- After derivation, we get the following result.\n", "\n", "$$ w_j = -\\frac{\\sum_{i\\in I_j} \\frac{\\partial loss}{\\partial (\\hat{y}=0)}}{\\sum_{i\\in I_j}(\\frac{\\partial^2 loss}{\\partial (\\hat{y}=0)^2})+ \\lambda} $$\n", "\n", "- Where $I_j$ is a set containing all the instances ((x, y) datapoints) at a leaf, and $w_j$ is the weight at leaf j. \n", "\n", "- This looks more intimidating than it is; for some intuition, if we consider $loss = MSE =(y-\\hat{y})^2$, then the first and second derivatives at $\\hat{y} =0$ are $-2y$ and $2$, which yields\n", "\n", "$$ w_j = \\frac{\\sum_{i\\in I_j} 2y}{\\sum_{i\\in I_j} 2 + \\lambda} $$\n", "\n", "- Here, the weights effectively become the average of the true labels at each leaf (with some regularization from the λ constant)." ] }, { "cell_type": "markdown", "id": "4ce7cce9", "metadata": {}, "source": [ "### Extreme gradient boosting (XGBoost)\n", "\n", "- XGBoost is one of the fastest implementations of gradient boosted trees. It does this by tackling one of the major inefficiencies of gradient boosted trees. \n", "\n", "- Consider the case where there are thousands of features, and therefore thousands of possible splits. 
Now, if we evaluate the potential loss for every possible split when creating a new branch, we have thousands of potential splits and losses to compute.\n", "\n", "- XGBoost tackles this inefficiency by looking at the distribution of features across all data points in a leaf and using this information to reduce the search space of possible feature splits.\n", "\n", "- Although XGBoost implements a few regularization tricks, this speedup is by far the most useful feature of the library, allowing many hyperparameter settings to be investigated quickly. \n", "\n", "- This is helpful because there are many hyperparameters to tune, most of which are designed to limit overfitting." ] }, { "cell_type": "markdown", "id": "aa0d5b4e", "metadata": {}, "source": [ "## Implementing XGBoost in Python" ] }, { "cell_type": "markdown", "id": "6adb96b8", "metadata": {}, "source": [ "### Load libraries" ] }, { "cell_type": "code", "execution_count": 23, "id": "2e72408c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "ef1f470a", "metadata": {}, "source": [ "### Read dataset" ] }, { "cell_type": "code", "execution_count": 24, "id": "2daa4d06", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "data = 'https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/wholesale_customers_data.csv'\n", "\n", "df = pd.read_csv(data)" ] }, { "cell_type": "markdown", "id": "ea7344e7", "metadata": {}, "source": [ "### EDA" ] }, { "cell_type": "markdown", "id": "d821e8a7", "metadata": {}, "source": [ "#### Shape of dataset\n", "\n", "- We will start off by checking the shape of the dataset." 
] }, { "cell_type": "code", "execution_count": 25, "id": "227f8764", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "(440, 8)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "markdown", "id": "979ad8eb", "metadata": {}, "source": [ "We can see that there are 440 instances and 8 attributes in the dataset." ] }, { "cell_type": "markdown", "id": "675a1194", "metadata": {}, "source": [ "#### Preview dataset" ] }, { "cell_type": "code", "execution_count": 26, "id": "a39f72fe", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "output-scoll" ] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>Channel</th><th>Region</th><th>Fresh</th><th>Milk</th><th>Grocery</th><th>Frozen</th><th>Detergents_Paper</th><th>Delicassen</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>2</td><td>3</td><td>12669</td><td>9656</td><td>7561</td><td>214</td><td>2674</td><td>1338</td></tr>\n",
" <tr><th>1</th><td>2</td><td>3</td><td>7057</td><td>9810</td><td>9568</td><td>1762</td><td>3293</td><td>1776</td></tr>\n",
" <tr><th>2</th><td>2</td><td>3</td><td>6353</td><td>8808</td><td>7684</td><td>2405</td><td>3516</td><td>7844</td></tr>\n",
" <tr><th>3</th><td>1</td><td>3</td><td>13265</td><td>1196</td><td>4221</td><td>6404</td><td>507</td><td>1788</td></tr>\n",
" <tr><th>4</th><td>2</td><td>3</td><td>22615</td><td>5410</td><td>7198</td><td>3915</td><td>1777</td><td>5185</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Channel Region Fresh Milk Grocery Frozen Detergents_Paper Delicassen\n", "0 2 3 12669 9656 7561 214 2674 1338\n", "1 2 3 7057 9810 9568 1762 3293 1776\n", "2 2 3 6353 8808 7684 2405 3516 7844\n", "3 1 3 13265 1196 4221 6404 507 1788\n", "4 2 3 22615 5410 7198 3915 1777 5185" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "134476d8", "metadata": {}, "source": [ "- We can see that the `Channel` variable takes the values 1 and 2. \n", "\n", "- These two values classify the customers from two different channels as\n", " - 1 for Horeca (Hotel/Restaurant/Café) customers and \n", " - 2 for Retail channel (nominal) customers." ] }, { "cell_type": "markdown", "id": "0fd48237", "metadata": {}, "source": [ "#### Summary of dataset" ] }, { "cell_type": "code", "execution_count": 27, "id": "85730528", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 440 entries, 0 to 439\n", "Data columns (total 8 columns):\n", " # Column Non-Null Count Dtype\n", "--- ------ -------------- -----\n", " 0 Channel 440 non-null int64\n", " 1 Region 440 non-null int64\n", " 2 Fresh 440 non-null int64\n", " 3 Milk 440 non-null int64\n", " 4 Grocery 440 non-null int64\n", " 5 Frozen 440 non-null int64\n", " 6 Detergents_Paper 440 non-null int64\n", " 7 Delicassen 440 non-null int64\n", "dtypes: int64(8)\n", "memory usage: 27.6 KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "id": "2bf588c9", "metadata": {}, "source": [ "#### Summary statistics of dataset" ] }, { "cell_type": "code", "execution_count": 28, "id": "c335dacb", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "output-scoll" ] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>Channel</th><th>Region</th><th>Fresh</th><th>Milk</th><th>Grocery</th><th>Frozen</th><th>Detergents_Paper</th><th>Delicassen</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>count</th><td>440.000000</td><td>440.000000</td><td>440.000000</td><td>440.000000</td><td>440.000000</td><td>440.000000</td><td>440.000000</td><td>440.000000</td></tr>\n",
" <tr><th>mean</th><td>1.322727</td><td>2.543182</td><td>12000.297727</td><td>5796.265909</td><td>7951.277273</td><td>3071.931818</td><td>2881.493182</td><td>1524.870455</td></tr>\n",
" <tr><th>std</th><td>0.468052</td><td>0.774272</td><td>12647.328865</td><td>7380.377175</td><td>9503.162829</td><td>4854.673333</td><td>4767.854448</td><td>2820.105937</td></tr>\n",
" <tr><th>min</th><td>1.000000</td><td>1.000000</td><td>3.000000</td><td>55.000000</td><td>3.000000</td><td>25.000000</td><td>3.000000</td><td>3.000000</td></tr>\n",
" <tr><th>25%</th><td>1.000000</td><td>2.000000</td><td>3127.750000</td><td>1533.000000</td><td>2153.000000</td><td>742.250000</td><td>256.750000</td><td>408.250000</td></tr>\n",
" <tr><th>50%</th><td>1.000000</td><td>3.000000</td><td>8504.000000</td><td>3627.000000</td><td>4755.500000</td><td>1526.000000</td><td>816.500000</td><td>965.500000</td></tr>\n",
" <tr><th>75%</th><td>2.000000</td><td>3.000000</td><td>16933.750000</td><td>7190.250000</td><td>10655.750000</td><td>3554.250000</td><td>3922.000000</td><td>1820.250000</td></tr>\n",
" <tr><th>max</th><td>2.000000</td><td>3.000000</td><td>112151.000000</td><td>73498.000000</td><td>92780.000000</td><td>60869.000000</td><td>40827.000000</td><td>47943.000000</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Channel Region Fresh Milk Grocery \\\n", "count 440.000000 440.000000 440.000000 440.000000 440.000000 \n", "mean 1.322727 2.543182 12000.297727 5796.265909 7951.277273 \n", "std 0.468052 0.774272 12647.328865 7380.377175 9503.162829 \n", "min 1.000000 1.000000 3.000000 55.000000 3.000000 \n", "25% 1.000000 2.000000 3127.750000 1533.000000 2153.000000 \n", "50% 1.000000 3.000000 8504.000000 3627.000000 4755.500000 \n", "75% 2.000000 3.000000 16933.750000 7190.250000 10655.750000 \n", "max 2.000000 3.000000 112151.000000 73498.000000 92780.000000 \n", "\n", " Frozen Detergents_Paper Delicassen \n", "count 440.000000 440.000000 440.000000 \n", "mean 3071.931818 2881.493182 1524.870455 \n", "std 4854.673333 4767.854448 2820.105937 \n", "min 25.000000 3.000000 3.000000 \n", "25% 742.250000 256.750000 408.250000 \n", "50% 1526.000000 816.500000 965.500000 \n", "75% 3554.250000 3922.000000 1820.250000 \n", "max 60869.000000 40827.000000 47943.000000 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "id": "6f5d9583", "metadata": {}, "source": [ "#### Check for missing values" ] }, { "cell_type": "code", "execution_count": 29, "id": "b4fc95a9", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "Channel 0\n", "Region 0\n", "Fresh 0\n", "Milk 0\n", "Grocery 0\n", "Frozen 0\n", "Detergents_Paper 0\n", "Delicassen 0\n", "dtype: int64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isnull().sum()" ] }, { "cell_type": "markdown", "id": "10e4c602", "metadata": {}, "source": [ "We can see that there are no missing values in the dataset." 
] }, { "cell_type": "markdown", "id": "3097837b", "metadata": {}, "source": [ "### Declare feature vector and target variable" ] }, { "cell_type": "code", "execution_count": 30, "id": "dea502bb", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "X = df.drop('Channel', axis=1)\n", "\n", "y = df['Channel']" ] }, { "cell_type": "markdown", "id": "96218c72", "metadata": {}, "source": [ "- Now, let's take a look at feature vector(X) and target variable(y)." ] }, { "cell_type": "code", "execution_count": 31, "id": "be252613", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "output-scoll" ] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>Region</th><th>Fresh</th><th>Milk</th><th>Grocery</th><th>Frozen</th><th>Detergents_Paper</th><th>Delicassen</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>3</td><td>12669</td><td>9656</td><td>7561</td><td>214</td><td>2674</td><td>1338</td></tr>\n",
" <tr><th>1</th><td>3</td><td>7057</td><td>9810</td><td>9568</td><td>1762</td><td>3293</td><td>1776</td></tr>\n",
" <tr><th>2</th><td>3</td><td>6353</td><td>8808</td><td>7684</td><td>2405</td><td>3516</td><td>7844</td></tr>\n",
" <tr><th>3</th><td>3</td><td>13265</td><td>1196</td><td>4221</td><td>6404</td><td>507</td><td>1788</td></tr>\n",
" <tr><th>4</th><td>3</td><td>22615</td><td>5410</td><td>7198</td><td>3915</td><td>1777</td><td>5185</td></tr>\n",
" </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Region Fresh Milk Grocery Frozen Detergents_Paper Delicassen\n", "0 3 12669 9656 7561 214 2674 1338\n", "1 3 7057 9810 9568 1762 3293 1776\n", "2 3 6353 8808 7684 2405 3516 7844\n", "3 3 13265 1196 4221 6404 507 1788\n", "4 3 22615 5410 7198 3915 1777 5185" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.head()" ] }, { "cell_type": "code", "execution_count": 32, "id": "22de8390", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "0 2\n", "1 2\n", "2 2\n", "3 1\n", "4 2\n", "Name: Channel, dtype: int64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.head()" ] }, { "cell_type": "markdown", "id": "2aac2c79", "metadata": {}, "source": [ "- We can see that the y label contains the values 1 and 2. \n", "\n", "- The `binary:logistic` objective expects labels of 0 and 1, so we need to convert them. \n", "\n", "- We will do it as follows:" ] }, { "cell_type": "code", "execution_count": 33, "id": "90dca21d", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "# convert labels into binary values: 2 -> 0, 1 stays 1\n", "\n", "y = y.map({2: 0, 1: 1})" ] }, { "cell_type": "code", "execution_count": 34, "id": "4eb0aca6", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 0\n", "2 0\n", "3 1\n", "4 0\n", "Name: Channel, dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# again preview the y label\n", "\n", "y.head()" ] }, { "cell_type": "markdown", "id": "36b19174", "metadata": {}, "source": [ "- Now, we will convert the dataset into **DMatrix**, an optimized data structure that XGBoost supports and that gives it its acclaimed performance and efficiency gains. 
\n", "\n", "- We will do it as follows:" ] }, { "cell_type": "code", "execution_count": 35, "id": "eba5aa92", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "# import XGBoost\n", "import xgboost as xgb\n", "\n", "# define data_dmatrix\n", "data_dmatrix = xgb.DMatrix(data=X, label=y)" ] }, { "cell_type": "markdown", "id": "65c7fd50", "metadata": {}, "source": [ "### Split data into separate training and test set" ] }, { "cell_type": "code", "execution_count": 36, "id": "f58da809", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "# split X and y into training and testing sets\n", "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)" ] }, { "cell_type": "markdown", "id": "82284170", "metadata": {}, "source": [ "### Train the XGBoost Classifier\n", "\n", "- In order to train the XGBoost classifier, we need to know the different parameters that XGBoost provides.\n", "\n", "- We will discuss these parameters in later sections.\n", "\n", "- Now, it's time to train the XGBoost classifier.\n", "\n", "- We will proceed as follows:" ] }, { "cell_type": "code", "execution_count": 37, "id": "cdb622dd", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/html": [ "
XGBClassifier(alpha=10, base_score=0.5, booster='gbtree', callbacks=None,\n",
       "              colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,\n",
       "              early_stopping_rounds=None, enable_categorical=False,\n",
       "              eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise',\n",
       "              importance_type=None, interaction_constraints='',\n",
       "              learning_rate=1.0, max_bin=256, max_cat_to_onehot=4,\n",
       "              max_delta_step=0, max_depth=4, max_leaves=0, min_child_weight=1,\n",
       "              missing=nan, monotone_constraints='()', n_estimators=100,\n",
       "              n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0,\n",
       "              reg_alpha=10, ...)
" ], "text/plain": [ "XGBClassifier(alpha=10, base_score=0.5, booster='gbtree', callbacks=None,\n", " colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,\n", " early_stopping_rounds=None, enable_categorical=False,\n", " eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise',\n", " importance_type=None, interaction_constraints='',\n", " learning_rate=1.0, max_bin=256, max_cat_to_onehot=4,\n", " max_delta_step=0, max_depth=4, max_leaves=0, min_child_weight=1,\n", " missing=nan, monotone_constraints='()', n_estimators=100,\n", " n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0,\n", " reg_alpha=10, ...)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# import XGBClassifier\n", "from xgboost import XGBClassifier\n", "\n", "\n", "# declare parameters\n", "params = {\n", " 'objective':'binary:logistic',\n", " 'max_depth': 4,\n", " 'alpha': 10,\n", " 'learning_rate': 1.0,\n", " 'n_estimators':100\n", " } \n", " \n", " \n", "# instantiate the classifier \n", "xgb_clf = XGBClassifier(**params)\n", "\n", "\n", "# fit the classifier to the training data\n", "xgb_clf.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "id": "00962183", "metadata": {}, "source": [ "we can view the parameters of the xgb trained model as follows:" ] }, { "cell_type": "code", "execution_count": 38, "id": "94bbf482", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "XGBClassifier(alpha=10, base_score=0.5, booster='gbtree', callbacks=None,\n", " colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1,\n", " early_stopping_rounds=None, enable_categorical=False,\n", " eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise',\n", " importance_type=None, interaction_constraints='',\n", " learning_rate=1.0, max_bin=256, max_cat_to_onehot=4,\n", " max_delta_step=0, max_depth=4, max_leaves=0, min_child_weight=1,\n", " 
missing=nan, monotone_constraints='()', n_estimators=100,\n", " n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0,\n", " reg_alpha=10, ...)\n" ] } ], "source": [ "print(xgb_clf)" ] }, { "cell_type": "markdown", "id": "f9436e76", "metadata": {}, "source": [ "### Make predictions with XGBoost classifier" ] }, { "cell_type": "code", "execution_count": 39, "id": "6130b295", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "# make predictions on test data\n", "\n", "y_pred = xgb_clf.predict(X_test)" ] }, { "cell_type": "markdown", "id": "e0198a7e", "metadata": {}, "source": [ "### Check accuracy score" ] }, { "cell_type": "code", "execution_count": 40, "id": "19fc41bc", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "XGBoost model accuracy score: 0.8864\n" ] } ], "source": [ "# compute and print accuracy score\n", "\n", "from sklearn.metrics import accuracy_score\n", "\n", "print('XGBoost model accuracy score: {0:0.4f}'.format(accuracy_score(y_test, y_pred)))" ] }, { "cell_type": "markdown", "id": "1c39a1ae", "metadata": {}, "source": [ "We can see that XGBoost obtains an accuracy score of 88.64%." ] }, { "cell_type": "markdown", "id": "4b37d43b", "metadata": {}, "source": [ "## k-fold Cross Validation using XGBoost\n", "\n", "- To build more robust models with XGBoost, it is good practice to perform k-fold cross validation. \n", "\n", "- In this way, we ensure that every entry in the original training dataset is used for both training and validation. \n", "\n", "- Also, each entry is used for validation just once. \n", "\n", "- XGBoost supports k-fold cross validation using the cv() method. 
\n", "\n", "- In this method, we will specify several parameters which are as follows:\n", "\n", " - **nfold** - This parameter specifies the number of cross-validation folds we want to build.\n", "\n", " - **num_boost_round** - It denotes the number of trees we build.\n", "\n", " - **metrics** - It specifies the performance evaluation metrics to be watched during CV.\n", "\n", " - **as_pandas** - It is used to return the results in a pandas DataFrame.\n", "\n", " - **early_stopping_rounds** - This parameter stops training of the model early if the hold-out metric does not improve for a given number of rounds.\n", "\n", " - **seed** - This parameter is used for reproducibility of results.\n", "\n", "We can use these parameters to build a k-fold cross-validation model by calling XGBoost's cv() method." ] }, { "cell_type": "code", "execution_count": 41, "id": "691934f2", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "from xgboost import cv\n", "\n", "params = {\"objective\":\"binary:logistic\",'colsample_bytree': 0.3,'learning_rate': 0.1,\n", " 'max_depth': 5, 'alpha': 10}\n", "\n", "xgb_cv = cv(dtrain=data_dmatrix, params=params, nfold=3,\n", " num_boost_round=50, early_stopping_rounds=10, metrics=\"auc\", as_pandas=True, seed=123)" ] }, { "cell_type": "markdown", "id": "b535e1d3", "metadata": {}, "source": [ "- **xgb_cv** contains the train and test AUC metrics for each boosting round. \n", "\n", "- Let's preview **xgb_cv**." ] }, { "cell_type": "code", "execution_count": 42, "id": "e57098d9", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "output-scoll" ] }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>train-auc-mean</th><th>train-auc-std</th><th>test-auc-mean</th><th>test-auc-std</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0.914999</td><td>0.009704</td><td>0.880965</td><td>0.021050</td></tr>\n",
"    <tr><th>1</th><td>0.934374</td><td>0.013263</td><td>0.923562</td><td>0.022810</td></tr>\n",
"    <tr><th>2</th><td>0.936252</td><td>0.013723</td><td>0.924433</td><td>0.025777</td></tr>\n",
"    <tr><th>3</th><td>0.943878</td><td>0.009032</td><td>0.927152</td><td>0.022228</td></tr>\n",
"    <tr><th>4</th><td>0.957881</td><td>0.008845</td><td>0.935191</td><td>0.016437</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " train-auc-mean train-auc-std test-auc-mean test-auc-std\n", "0 0.914999 0.009704 0.880965 0.021050\n", "1 0.934374 0.013263 0.923562 0.022810\n", "2 0.936252 0.013723 0.924433 0.025777\n", "3 0.943878 0.009032 0.927152 0.022228\n", "4 0.957881 0.008845 0.935191 0.016437" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xgb_cv.head()" ] }, { "cell_type": "markdown", "id": "024ec036", "metadata": {}, "source": [ "## Feature importance with XGBoost\n", "\n", "- XGBoost provides a way to examine the importance of each feature in the original dataset within the model. \n", "\n", "- It involves counting the number of times each feature is split on across all boosting trees in the model. \n", "\n", "- Then we visualize the result as a bar graph, with the features ordered according to how many times they appear.\n", "\n", "- We will proceed as follows:" ] }, { "cell_type": "code", "execution_count": 43, "id": "3ff9cf79", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAsYAAAHHCAYAAAC83J6NAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAABMXklEQVR4nO3deZzO5f7H8dc9i8EwE2IsjZ0o2dORRKIhKdWhtDBpKkW2ckoqa03rKaW0nWiTfpV0yk60oFMK0U6WFtpOYsgY5vv7o4f7NA1lpjG3Ma/n43E/mvv6Xt/7+7muuePtmuv+TigIggBJkiSpmIuKdAGSJEnSocBgLEmSJGEwliRJkgCDsSRJkgQYjCVJkiTAYCxJkiQBBmNJkiQJMBhLkiRJgMFYkiRJAgzGkqR8mDx5MqFQiPXr10e6FEkqMAZjSToAe4Pgvh7XX3/9QbnmkiVLGDVqFFu2bDkor1+c7dixg1GjRrFo0aJIlyLpEBIT6QIkqSgZM2YMtWrVytHWqFGjg3KtJUuWMHr0aFJTUzniiCMOyjXy6+KLL+b8888nLi4u0qXky44dOxg9ejQA7du3j2wxkg4ZBmNJyoMuXbrQsmXLSJfxl2zfvp34+Pi/9BrR0dFER0cXUEWFJzs7m127dkW6DEmHKLdSSFIBmjVrFm3btiU+Pp6yZcvStWtXPvzwwxx9PvjgA1JTU6lduzYlS5akcuXK9O3blx9//DHcZ9SoUQwbNgyAWrVqhbdtrF+/nvXr1xMKhZg8eXKu64dCIUaNGpXjdUKhEB999BEXXHAB5cqV46STTgoff/rpp2nRogWlSpWifPnynH/++Xz55Zd/Os597TGuWbMmZ5xxBosWLaJly5aUKlWK4447LrxdYdq0aRx33HGULFmSFi1asHz58hyvmZqaSpkyZfjiiy9ISUkhPj6eqlWrMmbMGIIgyNF3+/btXHPNNSQnJxMXF8fRRx/NXXfdlatfKBRiwIABPPPMMxx77LHExcXx0EMPUbFiRQBGjx4dntu983Yg35/fzu2aNWvCq/qJiYlccskl7NixI9ecPf3007Rq1YrSpUtTrlw5Tj75ZObOnZujz4G8fyQdPK4YS1Ie/Pzzz/zwww852o488kgAnnrqKfr06UNKSgq33347O3bsYOLEiZx00kksX76cmjVrAjBv3jy++OILLrnkEipXrsyHH37II488wocffsjbb79NKBTinHPO4bPPPuPZZ5/lnnvuCV+jYsWKfP/993muu0ePHtSrV49bb701HB5vueUWbrrpJnr27ElaWhrff/89999/PyeffDLLly/P1/aNNWvWcMEFF3DFFVdw0UUXcdddd9GtWzceeughbrjhBq666ioA0tPT6dmzJ59++ilRUf9bo9mzZw+dO3fmb3/7G3fccQezZ89m5MiR7N69mzFjxgAQBAFnnnkmCxcu5NJLL6Vp06bMmTOHYcOG8fXXX3PPPffkqOm1117j//7v/xgwYABHHnkkTZo0YeLEiVx55ZWcffbZnHPOOQA0btwYOLDvz2/17NmTWrVqkZ6ezvvvv89jjz1GpUqVuP3228N9Ro8ezahRozjxxBMZM2YMJUqU4D//+Q+vvfYap512GnDg7x9JB1EgSfpTkyZNCoB9PoIgCLZt2xYcccQRwWWXXZbjvM2bNweJiYk52nfs2JHr9Z999tkACN54441w25133hkAwbp163L0XbduXQAEkyZNyvU6QDBy5Mjw85EjRwZA0KtXrxz91q9fH0RHRwe33HJLjvZVq1YFMTExudr3Nx+/ra1GjRoBECxZsiTcNmfOnAAISpUqFWzYsCHc/vDDDwdAsHDhwnBbnz59AiC4+uqrw23Z2dlB165dgxIlSgTff/99EARBMH369AAIxo0bl6Omv//970EoFArWrFmTYz6ioqKCDz/8MEff77//Ptdc7XWg35+9c9u3b98cfc8+++ygQoUK4eeff/55EBUVFZx99tnBnj17cvT
Nzs4OgiBv7x9JB49bKSQpDx544AHmzZuX4wG/rjJu2bKFXr168cMPP4Qf0dHRnHDCCSxcuDD8GqVKlQp/vXPnTn744Qf+9re/AfD+++8flLr79euX4/m0adPIzs6mZ8+eOeqtXLky9erVy1FvXhxzzDG0bt06/PyEE04AoEOHDlSvXj1X+xdffJHrNQYMGBD+eu9WiF27djF//nwAZs6cSXR0NAMHDsxx3jXXXEMQBMyaNStHe7t27TjmmGMOeAx5/f78fm7btm3Ljz/+yNatWwGYPn062dnZ3HzzzTlWx/eOD/L2/pF08LiVQpLyoFWrVvv88N3nn38O/BoA9yUhISH89X//+19Gjx7N1KlT+e6773L0+/nnnwuw2v/5/Z00Pv/8c4IgoF69evvsHxsbm6/r/Db8AiQmJgKQnJy8z/affvopR3tUVBS1a9fO0Va/fn2A8H7mDRs2ULVqVcqWLZujX8OGDcPHf+v3Y/8zef3+/H7M5cqVA34dW0JCAmvXriUqKuoPw3le3j+SDh6DsSQVgOzsbODXfaKVK1fOdTwm5n9/3Pbs2ZMlS5YwbNgwmjZtSpkyZcjOzqZz587h1/kjv9/juteePXv2e85vV0H31hsKhZg1a9Y+7y5RpkyZP61jX/Z3p4r9tQe/+7DcwfD7sf+ZvH5/CmJseXn/SDp4/D9NkgpAnTp1AKhUqRIdO3bcb7+ffvqJBQsWMHr0aG6++eZw+94Vw9/aXwDeuyL5+1/88fuV0j+rNwgCatWqFV6RPRRkZ2fzxRdf5Kjps88+Awh/+KxGjRrMnz+fbdu25Vg1/uSTT8LH/8z+5jYv358DVadOHbKzs/noo49o2rTpfvvAn79/JB1c7jGWpAKQkpJCQkICt956K1lZWbmO772TxN7Vxd+vJt577725ztl7r+HfB+CEhASOPPJI3njjjRztDz744AHXe8455xAdHc3o0aNz1RIEQa5bkxWmCRMm5KhlwoQJxMbGcuqppwJw+umns2fPnhz9AO655x5CoRBdunT502uULl0ayD23efn+HKju3bsTFRXFmDFjcq04773Ogb5/JB1crhhLUgFISEhg4sSJXHzxxTRv3pzzzz+fihUrsnHjRmbMmEGbNm2YMGECCQkJnHzyydxxxx1kZWVRrVo15s6dy7p163K9ZosWLQAYMWIE559/PrGxsXTr1o34+HjS0tK47bbbSEtLo2XLlrzxxhvhldUDUadOHcaNG8fw4cNZv3493bt3p2zZsqxbt46XXnqJyy+/nGuvvbbA5udAlSxZktmzZ9OnTx9OOOEEZs2axYwZM7jhhhvC9x7u1q0bp5xyCiNGjGD9+vU0adKEuXPn8vLLLzN48ODw6usfKVWqFMcccwzPPfcc9evXp3z58jRq1IhGjRod8PfnQNWtW5cRI0YwduxY2rZtyznnnENcXBzvvvsuVatWJT09/YDfP5IOsgjdDUOSipS9tyd79913/7DfwoULg5SUlCAxMTEoWbJkUKdOnSA1NTVYtmxZuM9XX30VnH322cERRxwRJCYmBj169Ai++eabfd4+bOzYsUG1atWCqKioHLdH27FjR3DppZcGiYmJQdmyZYOePXsG33333X5v17b3Vme/9+KLLwYnnXRSEB8fH8THxwcNGjQI+vfvH3z66acHNB+/v11b165dc/UFgv79++do23vLuTvvvDPc1qdPnyA+Pj5Yu3ZtcNpppwWlS5cOkpKSgpEjR+a6zdm2bduCIUOGBFWrVg1iY2ODevXqBXfeeWf49md/dO29lixZErRo0SIoUaJEjnk70O/P/uZ2X3MTBEHw+OOPB82aNQvi4uKCcuXKBe3atQvmzZuXo8+BvH8kHTyhICiETz5IkvQnUlNTeeGFF8jIyIh0KZKKKfcYS5IkSRiMJUmSJMBgLEmSJAHgHmNJkiQJV4wlSZIkwGAsSZIkAf6CDx0isrOz+eabbyhbtux+f1WrJEk6tARBwLZt26hatSp
RUUV/vdVgrEPCN998Q3JycqTLkCRJ+fDll19y1FFHRbqMv8xgrENC2bJlAVi3bh3ly5ePcDWRkZWVxdy5cznttNOIjY2NdDmFrriPH5wDx1+8xw/OQVEc/9atW0lOTg7/PV7UGYx1SNi7faJs2bIkJCREuJrIyMrKonTp0iQkJBSZPxALUnEfPzgHjr94jx+cg6I8/sNlG2TR3wwiSZIkFQCDsSRJkoTBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAiIl0AdJvnZC+gN0x8ZEuIyLiogPuaAWNRs0hc08o0uUUuuI+fnAOHH/xHj84B3vHL6hZsyYbNmzI1X7VVVfxwAMP7POc559/nptuuon169dTr149br/9dk4//fQ8XdcVY0mSJB1S3n33XTZt2hR+zJs3D4AePXrss/+SJUvo1asXl156KcuXL6d79+50796d1atX5+m6EQ3GqamphEIhQqEQsbGxJCUl0alTJx5//HGys7MP+HVGjRpF06ZND16hB0nNmjW59957C+S11q9fH57LUChEhQoVOO2001i+fHmBvL4kSVJhqVixIpUrVw4/Xn31VerUqUO7du322X/8+PF07tyZYcOG0bBhQ8aOHUvz5s2ZMGFCnq4b8RXjzp07s2nTJtavX8+sWbM45ZRTGDRoEGeccQa7d+8u1Fp27dpVqNc7GObPn8+mTZuYM2cOGRkZdOnShS1btkSsnsNhTiVJUuTs2rWLp59+mr59+xIK7XuLzdKlS+nYsWOOtpSUFJYuXZqna0U8GMfFxVG5cmWqVatG8+bNueGGG3j55ZeZNWsWkydPBmDLli2kpaVRsWJFEhIS6NChAytXrgRg8uTJjB49mpUrV4ZXSw/kPPjfSvNjjz1GrVq1KFmyJACffPIJJ510EiVLluSYY45h/vz5hEIhpk+fHj73yy+/pGfPnhxxxBGUL1+es846i/Xr14ePp6am0r17d+666y6qVKlChQoV6N+/P1lZWQC0b9+eDRs2MGTIkHDdABs2bKBbt26UK1eO+Ph4jj32WGbOnHnA81mhQgUqV65My5Ytueuuu/j222/5z3/+w9q1aznrrLNISkqiTJkyHH/88cyfPz/HuTVr1mTs2LH06tWL+Ph4qlWrlmsfT37nVJIkKT+mT5/Oli1bSE1N3W+fzZs3k5SUlKMtKSmJzZs35+lah+SH7zp06ECTJk2YNm0aaWlp9OjRg1KlSjFr1iwSExN5+OGHOfXUU/nss88477zzWL16NbNnzw4HvcTERIA/PK98+fIArFmzhhdffJFp06YRHR3Nnj176N69O9WrV+c///kP27Zt45prrslRX1ZWFikpKbRu3Zo333yTmJgYxo0bR+fOnfnggw8oUaIEAAsXLqRKlSosXLiQNWvWcN5559G0aVMuu+wypk2bRpMmTbj88su57LLLwq/dv39/du3axRtvvEF8fDwfffQRZcqUydc8lipVCvj1X1oZGRmcfvrp3HLLLcTFxfHkk0/SrVs3Pv30U6pXrx4+58477+SGG25g9OjRzJkzh0GDBlG/fn06deqU7zndl8zMTDIzM8PPt27dCkBcVEB0dJCv8RZ1cVF
Bjv8WN8V9/OAcOP7iPX5wDvaOe+8iWlFQGLX+61//okuXLlStWvWgX+uQDMYADRo04IMPPuCtt97inXfe4bvvviMuLg6Au+66i+nTp/PCCy9w+eWXU6ZMGWJiYqhcuXL4/AM5D34NjU8++SQVK1YEYPbs2axdu5ZFixaFX++WW24JB0OA5557juzsbB577LHwSu+kSZM44ogjWLRoEaeddhoA5cqVY8KECURHR9OgQQO6du3KggULuOyyyyhfvjzR0dGULVs2R90bN27k3HPP5bjjjgOgdu3a+Zq/LVu2MHbsWMqUKUOrVq1ISkqiSZMm4eNjx47lpZde4t///jcDBgwIt7dp04brr78egPr167N48WLuueceOnXqlO853Zf09HRGjx6dq/3GZtmULr0nX2M+XIxteeD76w9HxX384Bw4/uI9fnAO9n7QrCjYsWPHQX39DRs2MH/+fKZNm/aH/SpXrsy3336bo+3bb7/NkbEOxCEbjIMgIBQKsXLlSjIyMqhQoUKO47/88gtr167d7/kHel6NGjVyBLhPP/2U5OTkHBPZqlXOe6esXLmSNWvWULZs2RztO3fuzPHaxx57bI4V0ypVqrBq1ao/GjYDBw7kyiuvZO7cuXTs2JFzzz2Xxo0b/+E5v3XiiScSFRXF9u3bqV27Ns899xxJSUlkZGQwatQoZsyYwaZNm9i9eze//PILGzduzHF+69atcz3f+wHB/M7pvgwfPpyhQ4eGn2/dupXk5GTGLY9id+y+V5kPd3FRAWNbZnPTsigys4vhbYqK+fjBOXD8xXv84BzsHX+nTp2IjY2NdDkHZO9PfA+WSZMmUalSJbp27fqH/Vq3bs2CBQsYPHhwuG3evHm5cs2fOWSD8ccff0ytWrXIyMigSpUqLFq0KFefI444Yr/nH+h58fF5v2duRkYGLVq04Jlnnsl17LeB8Pdv6lAo9Kd320hLSyMlJYUZM2Ywd+5c0tPTufvuu7n66qsPqLbnnnuOY445hgoVKuQY57XXXsu8efO46667qFu3LqVKleLvf/97nj4cV5BzGhcXF151/q3M7BC7i+G9K38rMztULO/fuVdxHz84B46/eI8fnIPY2NgiE4wPZp3Z2dlMmjSJPn36EBOTM7L27t2batWqkZ6eDsCgQYNo164dd999N127dmXq1KksW7aMRx55JE/XPCSD8WuvvcaqVasYMmQIRx11FJs3byYmJoaaNWvus3+JEiXYsyfnj9+bN2/+p+fty9FHH82XX37Jt99+G97E/e677+Z67eeee45KlSqRkJCQp7H9Wd0AycnJ9OvXj379+jF8+HAeffTRAw7GycnJ1KlTJ1f74sWLSU1N5eyzzwZ+Dbm//bDgXm+//Xau5w0bNgTyP6eSJEl5NX/+fDZu3Ejfvn1zHdu4cSNRUf+7h8SJJ57IlClTuPHGG7nhhhuoV68e06dPp1GjRnm6ZsTvSpGZmcnmzZv5+uuvef/997n11ls566yzOOOMM+jduzcdO3akdevWdO/enblz57J+/XqWLFnCiBEjWLZsGfDr3RTWrVvHihUr+OGHH8jMzDyg8/alU6dO1KlThz59+vDBBx+wePFibrzxRoDwfuILL7yQI488krPOOos333yTdevWsWjRIgYOHMhXX311wGOvWbMmb7zxBl9//TU//PADAIMHD2bOnDmsW7eO999/n4ULF4aD6V9Rr149pk2bxooVK1i5ciUXXHDBPlevFy9ezB133MFnn33GAw88wPPPP8+gQYMA8j2nkiRJeXXaaacRBAH169fPdWzRokXhu5Dt1aNHDz799FMyMzNZvXp1nn/rHRwCwXj27NlUqVKFmjVr0rlzZxYuXMh9993Hyy+/THR0NKFQiJkzZ3LyySdzySWXUL9+fc4//3w2bNgQXtE999xz6dy5M6eccgoVK1bk2WefPaDz9iU6Oprp06eTkZHB8ccfT1paGiNGjAAI33qsdOnSvPHGG1SvXp1zzjm
Hhg0bcumll7Jz5848rSCPGTOG9evXU6dOnfAWjD179tC/f38aNmxI586dqV+/Pg8++GB+pzfsn//8J+XKlePEE0+kW7dupKSk0Lx581z9rrnmGpYtW0azZs0YN24c//znP0lJSQHI95xKkiQVBaEgCIrnPVHyYPHixZx00kmsWbNmn9sUDhc1a9Zk8ODBOTauF5atW7eSmJjIDz/8kOvDfcVFVlYWM2fO5PTTTy8ye8sKUnEfPzgHjr94jx+cg6I4/r1/f//8889/aXvpoeKQ3GMcaS+99BJlypShXr16rFmzhkGDBtGmTZvDOhRLkiQVdxHfSnEo2rZtG/3796dBgwakpqZy/PHH8/LLL0e0pn79+lGmTJl9Pvr16xfR2iRJkg4HrhjvQ+/evendu3eky8hhzJgxXHvttfs8VlA/utjXXSokSZKKC4NxEVGpUiUqVaoU6TIkSZIOW26lkCRJkjAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRIAMZEuQPqtE9IXsDsmPtJlRERcdMAdraDRqDlk7glFupxCV9zHD86B4y/e4wfnYO/4FTmuGEuSJOmQUrNmTUKhUK5H//7993vO888/T4MGDShZsiTHHXccM2fOzPN1DcZFUGpq6j7fLGvWrIl0aZIkSX/Zu+++y6ZNm8KPefPmAdCjR4999l+yZAm9evXi0ksvZfny5XTv3p3u3buzevXqPF3XYFxEde7cOccbZtOmTdSqVStHn127dkWoOkmSpPyrWLEilStXDj9effVV6tSpQ7t27fbZf/z48XTu3Jlhw4bRsGFDxo4dS/PmzZkwYUKermswLqLi4uJyvGEqV67MqaeeyoABAxg8eDBHHnkkKSkpALz++uu0atWKuLg4qlSpwvXXX8/u3bsBWL9+/T5Xn9u3bx++1ltvvUXbtm0pVaoUycnJDBw4kO3bt4eP16xZk1tvvZW+fftStmxZqlevziOPPFKo8yFJkg5Pu3bt4umnn6Zv376EQvvee7506VI6duyYoy0lJYWlS5fm6Vp++O4w88QTT3DllVeyePFiAL7++mtOP/10UlNTefLJJ/nkk0+47LLLKFmyJKNGjSI5OZlNmzaFz9+8eTMdO3bk5JNPBmDt2rV07tyZcePG8fjjj/P9998zYMAABgwYwKRJk8Ln3X333YwdO5YbbriBF154gSuvvJJ27dpx9NFH77POzMxMMjMzw8+3bt0KQFxUQHR0UODzUhTERQU5/lvcFPfxg3Pg+Iv3+ME52DvurKysCFdy4Aqj1unTp7NlyxZSU1P322fz5s0kJSXlaEtKSmLz5s15ulYoCILi+e4rwlJTU3n66acpWbJkuK1Lly58//33bN26lffffz/cPmLECF588UU+/vjj8L+yHnzwQa677jp+/vlnoqL+90ODnTt30r59eypWrMjLL79MVFQUaWlpREdH8/DDD4f7vfXWW7Rr147t27dTsmRJatasSdu2bXnqqacACIKAypUrM3r0aPr167fPMYwaNYrRo0fnap8yZQqlS5f+axMkSZIKxY4dO7jgggv4+eefSUhIOCjXSElJoUSJErzyyiv77VOiRAmeeOIJevXqFW578MEHGT16NN9+++0BX8sV4yLqlFNOYeLEieHn8fHx9OrVixYtWuTo9/HHH9O6des
cP3po06YNGRkZfPXVV1SvXj3c3rdvX7Zt28a8efPCgXnlypV88MEHPPPMM+F+QRCQnZ3NunXraNiwIQCNGzcOHw+FQlSuXJnvvvtuv/UPHz6coUOHhp9v3bqV5ORkxi2PYndsdF6n47AQFxUwtmU2Ny2LIjO7GN6mqJiPH5wDx1+8xw/Owd7xd+rUidjY2EiXc0D2/sT3YNmwYQPz589n2rRpf9ivcuXKuQLwt99+S+XKlfN0PYNxERUfH0/dunX32Z4f48aNY86cObzzzjuULVs23J6RkcEVV1zBwIEDc53z21D9+/+BQ6EQ2dnZ+71eXFwccXFxudozs0PsLob3rvytzOxQsbx/517FffzgHDj+4j1+cA5iY2OLTDA+2HVOmjSJSpUq0bVr1z/s17p1axYsWMDgwYPDbfPmzaN169Z5up7B+DDXsGFDXnzxRYIgCK8aL168mLJly3LUUUcB8OKLLzJmzBhmzZpFnTp1cpzfvHlzPvroo32GcEmSpIMlOzubSZMm0adPH2JickbW3r17U61aNdLT0wEYNGgQ7dq14+6776Zr165MnTqVZcuW5flmAN6V4jB31VVX8eWXX3L11VfzySef8PLLLzNy5EiGDh1KVFQUq1evpnfv3lx33XUce+yxbN68mc2bN/Pf//4XgOuuu44lS5YwYMAAVqxYweeff87LL7/MgAEDIjwySZJ0OJs/fz4bN26kb9++uY5t3Lgxx80DTjzxRKZMmcIjjzxCkyZNeOGFF5g+fTqNGjXK0zVdMT7MVatWjZkzZzJs2DCaNGlC+fLlufTSS7nxxhsBWLZsGTt27GDcuHGMGzcufF67du1YtGgRjRs35vXXX2fEiBG0bduWIAioU6cO5513XqSGJEmSioHTTjuN/d0jYtGiRbnaevTosd9fAHKgvCuFDglbt24lMTGRH374gQoVKkS6nIjIyspi5syZnH766UVmb1lBKu7jB+fA8Rfv8YNzUBTHv/fv74N5V4rC5FYKSZIkCYOxJEmSBBiMJUmSJMBgLEmSJAEGY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEGY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEGY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEFGIy3bNlSUC8lSZIkFbp8BePbb7+d5557Lvy8Z8+eVKhQgWrVqrFy5coCK06SJEkqLPkKxg899BDJyckAzJs3j3nz5jFr1iy6dOnCsGHDCrRASZIkqTDE5OekzZs3h4Pxq6++Ss+ePTnttNOoWbMmJ5xwQoEWKEmSJBWGfK0YlytXji+//BKA2bNn07FjRwCCIGDPnj0FV50kSZJUSPK1YnzOOedwwQUXUK9ePX788Ue6dOkCwPLly6lbt26BFihJkiQVhnwF43vuuYeaNWvy5Zdfcscdd1CmTBkANm3axFVXXVWgBUqSJEmFIV/BODY2lmuvvTZX+5AhQ/5yQZIkSVIk5Ps+xk899RQnnXQSVatWZcOGDQDce++9vPzyywVWnCRJklRY8hWMJ06cyNChQ+nSpQtbtmwJf+DuiCOO4N577y3I+iRJkqRCka9gfP/99/Poo48yYsQIoqOjw+0tW7Zk1apVBVacJEmSVFjyFYzXrVtHs2bNcrXHxcWxffv2v1yUJEmSVNjyFYxr1arFihUrcrXPnj2bhg0b/tWaJEmSpEKXr7tSDB06lP79+7Nz506CIOCdd97h2WefJT09nccee6yga5QkSZIOunwF47S0NEqVKsWNN97Ijh07uOCCC6hatSrjx4/n/PPPL+gaJUmSpIMuz8F49+7dTJkyhZSUFC688EJ27NhBRkYGlSpVOhj1SZIkSYUiz3uMY2Ji6NevHzt37gSgdOnShmJJkiQVefn68F2
rVq1Yvnx5QdciSZIkRUy+9hhfddVVXHPNNXz11Ve0aNGC+Pj4HMcbN25cIMVJkiRJhSVfwXjvB+wGDhwYbguFQgRBQCgUCv8mPEmSJKmoyFcwXrduXUHXIUmSJEVUvoJxjRo1CroOCYAT0hewOyb+zzsehuKiA+5oBY1GzSFzTyjS5RS64j5+cA4c/6/jF4wZM4Zx48blaDv66KP55JNP9nvO888/z0033cT69eupV68et99+O6effvrBLlWHmXwF4yeffPIPj/fu3TtfxejQ1b59e5o2bcq9994LQM2aNRk8eDCDBw8Gft1K89JLL9G9e/eI1ShJOnwce+yxzJ8/P/w8Jmb/kWXJkiX06tWL9PR0zjjjDKZMmUL37t15//33adSoUWGUq8NEvoLxoEGDcjzPyspix44dlChRgtKlSxuMi4jU1FSeeOIJrrjiCh566KEcx/r378+DDz5Inz59mDx5MtOmTSM2NjZClUqSipuYmBgqV658QH3Hjx9P586dGTZsGABjx45l3rx5TJgwIdffb9Ifydft2n766accj4yMDD799FNOOukknn322YKuUQdRcnIyU6dO5Zdffgm37dy5kylTplC9evVwW/ny5SlbtmwkSpQkFUOff/45VatWpXbt2lx44YVs3Lhxv32XLl1Kx44dc7SlpKSwdOnSg12mDjP5Csb7Uq9ePW677bZcq8k6tDVv3pzk5GSmTZsWbps2bRrVq1enWbNm4bb27duHt00ciJEjR1KlShU++OCDgixXklQMtGrVismTJzN79mwmTpzIunXraNu2Ldu2bdtn/82bN5OUlJSjLSkpic2bNxdGuTqM5GsrxX5fLCaGb775piBfUoWgb9++TJo0iQsvvBCAxx9/nEsuuYRFixbl+bWCIGDgwIG8+uqrvPnmm9StW3ef/TIzM8nMzAw/37p1KwBxUQHR0UHeB3EYiIsKcvy3uCnu4wfnwPH/Ou6srKwIVxI5e8d+6qmnhrfvNWzYkObNm1O3bl2effZZLrnkkn2eu3v37hxzt/fWsUVpPvfWWhRrPlzkKxj/+9//zvE8CAI2bdrEhAkTaNOmTYEUpsJz0UUXMXz4cDZs2ADA4sWLmTp1ap6D8e7du7noootYvnw5b731FtWqVdtv3/T0dEaPHp2r/cZm2ZQuXbzvgz22ZXakS4io4j5+cA6K+/jnzZsX6RIibl9zUKlSJebOnZtrZRggMTGRRYsWkZCQEG5bvHgxpUuXZubMmQe11oOhKL0HduzYEekSClS+gvHv7zwQCoWoWLEiHTp04O677y6IulSIKlasSNeuXZk8eTJBENC1a1eOPPLIPL/OkCFDiIuL4+233/7T84cPH87QoUPDz7du3UpycjLjlkexOzY6z9c+HMRFBYxtmc1Ny6LIzC6Gt6oq5uMH58Dx/zr+Tp06FdsPO2dlZTFv3rxcc5CRkcGPP/5ImzZt9nkLtvbt27N58+Ycx2677TY6depUpG7Ztr/xH8r2/sT3cJGvYJydXbz/NX846tu3LwMGDADggQceyNdrdOrUiWeffZY5c+aEt2XsT1xcHHFxcbnaM7ND7C6G9y/9rczsULG8h+texX384BwU9/HHxsYWmVB0sNx4442cddZZ1KhRg2+++YaRI0cSHR3NRRddRGxsLL1796ZatWqkp6cDvy7MtGvXjvvuu4+uXbsydepU3nvvPR599NEiOZdF6T1QVOo8UPn68N2YMWP2uXT+yy+/MGbMmL9clApf586d2bVrF1lZWaSkpOTrNc4880ymTJlCWloaU6dOLeAKJUnFxVdffUWvXr04+uij6dmzJxUqVODtt9+mYsWKAGzcuJFNmzaF+5944olMmTKFRx55hCZNmvDCCy8wffp072GsPMvXivHo0aPp168fpUuXztG+Y8cORo8ezc0331wgxanwREdH8/HHH4e/zq+zzz6bp556iosvvpiYmBj+/ve/F1SJkqRi4plnnvnDlch9fQamR48
e9OjR4yBWpeIgX8E4CAJCodw/5lq5ciXly5f/y0UpMn77oYW/4u9//zvZ2dlcfPHFREVFcc455xTI60qSJB1MeQrG5cqVIxQKEQqFqF+/fo5wvGfPHjIyMujXr1+BF6mDY/LkyX94fPr06eGvf/+v8/Xr1+d4HgQ5b6/Us2dPevbsmeea/jP8VCpUqJDn8w4HWVlZzJw5k9WjUg67PVsHoriPH5wDx59VJO+gIB1O8hSM7733XoIgoG/fvowePZrExMTwsRIlSlCzZk1at25d4EVKkiRJB1uegnGfPn0AqFWrFieeeGKx/Be9JEmSDk/52mPcrl278Nc7d+5k165dOY4X1F5VSZIkqbDk63ZtO3bsYMCAAVSqVIn4+HjKlSuX4yFJkiQVNfkKxsOGDeO1115j4sSJxMXF8dhjjzF69GiqVq3Kk08+WdA1SpIkSQddvrZSvPLKKzz55JO0b9+eSy65hLZt21K3bl1q1KjBM88886e/9UySJEk61ORrxfi///0vtWvXBn7dT/zf//4XgJNOOok33nij4KqTJEmSCkm+gnHt2rVZt24dAA0aNOD//u//gF9Xko844ogCK06SJEkqLPkKxpdccgkrV64E4Prrr+eBBx6gZMmSDBkyhGHDhhVogZIkSVJhyNce4yFDhoS/7tixI5988gnvvfcedevWpXHjxgVWnCRJklRY8hWMf2vnzp3UqFGDGjVqFEQ9kiRJUkTkayvFnj17GDt2LNWqVaNMmTJ88cUXANx0003861//KtACJUmSpMKQr2B8yy23MHnyZO644w5KlCgRbm/UqBGPPfZYgRUnSZIkFZZ8BeMnn3ySRx55hAsvvJDo6Ohwe5MmTfjkk08KrDhJkiSpsOQrGH/99dfUrVs3V3t2djZZWVl/uShJkiSpsOUrGB9zzDG8+eabudpfeOEFmjVr9peLkiRJkgpbvu5KcfPNN9OnTx++/vprsrOzmTZtGp9++ilPPvkkr776akHXKEmSJB10eVox/uKLLwiCgLPOOotXXnmF+fPnEx8fz80338zHH3/MK6+8QqdOnQ5WrZIkSdJBk6cV43r16rFp0yYqVapE27ZtKV++PKtWrSIpKelg1SdJkiQVijytGAdBkOP5rFmz2L59e4EWJEmSJEVCvj58t9fvg7IkSZJUVOUpGIdCIUKhUK42SZIkqajL0x7jIAhITU0lLi4OgJ07d9KvXz/i4+Nz9Js2bVrBVShJkiQVgjwF4z59+uR4ftFFFxVoMZIkSVKk5CkYT5o06WDVIUmSJEXUX/rwnSRJknS4MBhLkiRJGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIgJtIFSL91QvoCdsfER7qMiIiLDrijFTQaNYfMPaFIl1Poivv44X9zIEmKDFeMJUmHlIcffpjGjRuTkJBAQkICrVu3ZtasWX94zvPPP0+DBg0oWbIkxx13HDNnziykaiUdTgzGh6DNmzczaNAg6tatS8mSJUlKSqJNmzZMnDiRHTt2RLo8STqoqlWrxm233cZ7773HsmXL6NChA2eddRYffvjhPvsvWbKEXr16cemll7J8+XK6d+9O9+7dWb16dSFXLqmocyvFIeaLL76gTZs2HHHEEdx6660cd9xxxMXFsWrVKh555BGqVavGmWeemeu8rKwsYmNjC73eXbt2UaJEiUK/rqTD1xlnnJHjz7NbbrmFiRMn8vbbb3Psscfm6j9+/Hg6d+7MsGHDABg7dizz5s1jwoQJPPTQQ4VWt6SizxXjQ8xVV11FTEwMy5Yto2fPnjRs2JDatWtz1llnMWPGDLp16wZAKBRi4sSJnHnmmcTHx3PLLbcAMHHiROrUqUO
JEiU4+uijeeqpp3K8/pYtW7jiiitISkqiZMmSNGrUiFdffTV8/K233qJt27aUKlWK5ORkBg4cyPbt28PHa9asydixY+nduzcJCQlcfvnldOjQgQEDBuS4zvfff0+JEiVYsGDBwZoqScXAnj17mDp1Ktu3b6d169b77LN06VI6duyYoy0lJYWlS5cWRomSDiMG40PIjz/+yNy5c+nfvz/x8fv+AFoo9L8PJY0aNYqzzz6bVatW0bdvX1566SUGDRrENddcw+rVq7niiiu45JJLWLhwIQDZ2dl06dKFxYsX8/TTT/PRRx9x2223ER0dDcDatWvp3Lkz5557Lh988AHPPfccb731Vq7Qe9ddd9GkSROWL1/OTTfdRFpaGlOmTCEzMzPc5+mnn6ZatWp06NChoKdJUjGwatUqypQpQ1xcHP369eOll17imGOO2WffzZs3k5SUlKMtKSmJzZs3F0apkg4jbqU4hKxZs4YgCDj66KNztB955JHs3LkTgP79+3P77bcDcMEFF3DJJZeE+/Xq1YvU1FSuuuoqAIYOHcrbb7/NXXfdxSmnnML8+fN55513+Pjjj6lfvz4AtWvXDp+fnp7OhRdeyODBgwGoV68e9913H+3atWPixImULFkSgA4dOnDNNdeEz6tWrRoDBgzg5ZdfpmfPngBMnjyZ1NTUHEH+tzIzM3ME6a1btwIQFxUQHR3kceYOD3FRQY7/FjfFffzwv7FnZWVFuJLI2DvurKwsateuzbvvvsvWrVt58cUX6dOnD/Pnz99vON69e3eOeduzZ0+O1ywKfjv+4qq4z0FRHH9RqvVAGIyLgHfeeYfs7GwuvPDCHGGyZcuWOfp9/PHHXH755Tna2rRpw/jx4wFYsWIFRx11VDgU/97KlSv54IMPeOaZZ8JtQRCQnZ3NunXraNiw4T6vW7JkSS6++GIef/xxevbsyfvvv8/q1av597//vd8xpaenM3r06FztNzbLpnTpPfs9rzgY2zI70iVEVHEfP8C8efMiXUJE/X78bdq0Yc6cOfzjH/8I/8P/txITE1m0aBEJCQnhtsWLF1O6dOkieXeK4v79B+egKI3/cLspgMH4EFK3bl1CoRCffvppjva9q7qlSpXK0b6/7Rb78/vzfy8jI4MrrriCgQMH5jpWvXr1P7xuWloaTZs25auvvmLSpEl06NCBGjVq7Pdaw4cPZ+jQoeHnW7duJTk5mXHLo9gdG30gwznsxEUFjG2ZzU3LosjMLn738S3u44f/zUGnTp0i8mHaSMvKymLevHn7HP+9995LUlISp59+eq7z2rdvz+bNm3Mcu+222+jUqdM++x+q/mj8xUVxn4OiOP69P/E9XBiMDyEVKlSgU6dOTJgwgauvvjrPwbdhw4YsXryYPn36hNsWL14c/tFj48aN+eqrr/jss8/2uWrcvHlzPvroI+rWrZvn2o877jhatmzJo48+ypQpU5gwYcIf9o+LiyMuLi5Xe2Z2iN3F9Jc77JWZHSq2v+ACHD9AbGxskflL8WAYNWoUZ5xxBtWrV2fbtm1MmTKF119/nTlz5hAbG0vv3r2pVq0a6enpAAwZMoR27dpx33330bVrV6ZOncp7773Ho48+WiTnsbh//8E5KErjLyp1HiiD8SHmwQcfpE2bNrRs2ZJRo0bRuHFjoqKiePfdd/nkk09o0aLFfs8dNmwYPXv2pFmzZnTs2JFXXnmFadOmMX/+fADatWvHySefzLnnnss///lP6tatyyeffEIoFKJz585cd911/O1vf2PAgAGkpaURHx/PRx99FL7t0Z9JS0tjwIABxMfHc/bZZxfYnEgqXr7//nt69+7Npk2bSExMpHHjxsyZM4dOnToBsHHjRqKi/vfZ8RNPPJEpU6Zw4403csMNN1CvXj2mT59Oo0aNIjUESUWUwfgQU6dOHZYvX86tt97K8OHD+eqrr4iLi+OYY47h2muv3ef+ur26d+/O+PHjueuuuxg0aBC1atVi0qRJtG/
fPtznxRdf5Nprr6VXr15s376dunXrcttttwG/rii//vrrjBgxgrZt2xIEAXXq1OG88847oNp79erF4MGD6dWrV/iDepKUV4888sgfrkItWrQoV1uPHj3o0aPHQaxKUnFgMD4EValShfvvv5/7779/v32CYN+f3L/yyiu58sor93te+fLlefzxx/d7/Pjjj2fu3Ln7Pb5+/fr9Hvvhhx/YuXMnl1566X77SJIkHaoMxvrLsrKy+PHHH7nxxhv529/+RvPmzfP9Wv8ZfioVKlQowOqKjqysLGbOnMnqUSmH3Z6tA1Hcxw//mwNJUmT4Cz70ly1evJgqVarw7rvv+utXJUlSkeWKsf6y9u3b73drhyRJUlHhirEkSZKEwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEGY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEGY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEGY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEGY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEGY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJABiIl2A9FsnpC9gd0x8pMuIiLjogDtaRbqKQ8Mbb7zBnXfeyXvvvcemTZt46aWX6N69+x+es2jRIoYOHcqHH35IcnIyN954I6mpqYVSryTp8OCK8SFi1KhRNG3aNPw8NTX1T4OAdLjavn07TZo04YEHHjig/uvWraNr166ccsoprFixgsGDB5OWlsacOXMOcqWSpMOJK8Z/UWpqKk888QQAMTExlC9fnsaNG9OrVy9SU1OJisrfvz3Gjx9PEAQFWapUZHTp0oUuXboccP+HHnqIWrVqcffddwPQsGFD3nrrLe655x5SUlIOVpmSpMOMK8YFoHPnzmzatIn169cza9YsTjnlFAYNGsQZZ5zB7t278/WaiYmJHHHEEQVbqHSYWrp0KR07dszRlpKSwtKlSyNUkSSpKDIYF4C4uDgqV65MtWrVaN68OTfccAMvv/wys2bNYvLkyQBs2bKFtLQ0KlasSEJCAh06dGDlypX7fc3fb6XIzs7mjjvuoG7dusTFxVG9enVuueWW8PHrrruO+vXrU7p0aWrXrs1NN91EVlZW+PjKlSs55ZRTKFu2LAkJCbRo0YJly5YBsGHDBrp160a5cuWIj4/n2GOPZebMmeFzV69eTZcuXShTpgxJSUlcfPHF/PDDD+Hj7du3Z+DAgfzjH/+gfPnyVK5cmVGjRv3FWZUO3ObNm0lKSsrRlpSUxNatW/nll18iVJUkqahxK8VB0qFDB5o0acK0adNIS0ujR48elCpVilmzZpGYmMjDDz/MqaeeymeffUb58uX/9PWGDx/Oo48+yj333MNJJ53Epk2b+OSTT8LHy5Yty+TJk6latSqrVq3isssuo2zZsvzjH/8A4MILL6RZs2ZMnDiR6OhoVqxYQWxsLAD9+/dn165dvPHGG8THx/PRRx9RpkwZ4NdA36FDB9LS0rjnnnv45ZdfuO666+jZsyevvfZa+PpPPPEEQ4cO5T//+Q9Lly4lNTWVNm3a0KlTp32OJzMzk8zMzPDzrVu3AhAXFRAdXTy3kMRF/Tru3/6DpjjZO+59jX/37t1/OC9BELBnz54cffb+tCYrK4uYmKLxR90fzUFx4PiL9/jBOSiK4y9KtR6IovG3RRHVoEEDPvjgA9566y3eeecdvvvuO+Li4gC46667mD59Oi+88AKXX375H77Otm3bGD9+PBMmTKBPnz4A1KlTh5NOOinc58Ybbwx/XbNmTa699lqmTp0aDsYbN25k2LB
hNGjQAIB69eqF+2/cuJFzzz2X4447DoDatWuHj02YMIFmzZpx6623htsef/xxkpOT+eyzz6hfvz4AjRs3ZuTIkeHXnjBhAgsWLNhvME5PT2f06NG52m9slk3p0nv+cD4Od/PmzYt0CRG1r/G/99574X/I7UuJEiX4z3/+k+MnHQsWLKB06dIsXLjwoNR5MPkecPzFXXGfg6I0/h07dkS6hAJlMD6IgiAgFAqxcuVKMjIyqFChQo7jv/zyC2vXrv3T1/n444/JzMzk1FNP3W+f5557jvvuu4+1a9eSkZHB7t27SUhICB8fOnQoaWlpPPXUU3Ts2JEePXpQp04dAAYOHMiVV17J3Llz6dixI+eeey6NGzcGft2CsXDhwvAK8m+tXbs2RzD+rSpVqvDdd9/tt97hw4czdOjQ8POtW7eSnJzMuOVR7I6N/tM5ORzFRQWMbZlNp06d/jAEHq6ysrKYN2/ePsffokULTj/99P2e++abbzJ79uwcfZ599llOOumkPzzvUPNHc1AcOP7iPX5wDori+Pf+xPdwYTA+iD7++GNq1apFRkYGVapUYdGiRbn6HMgH7EqVKvWHx5cuXcqFF17I6NGjSUlJITExkalTp4Y/oQ+/3g7uggsuYMaMGcyaNYuRI0cydepUzj77bNLS0khJSWHGjBnMnTuX9PR07r77bq6++moyMjLo1q0bt99+e67rVqlSJfz17/8HDoVCZGdn77fmuLi48Or5b2Vmh9i9J/SH4z3cxcbGFpk/EA+G2NhYMjMzWbNmTbjtyy+/5MMPP6R8+fJUr16d4cOH8/XXX/Pkk08Cv24HmjhxIiNGjKBv37689tprvPDCC8yYMaNIzqXvAcdfnMcPzkFRGn9RqfNAGYwPktdee41Vq1YxZMgQjjrqKDZv3kxMTAw1a9bM82vVq1ePUqVKsWDBAtLS0nIdX7JkCTVq1GDEiBHhtg0bNuTqV79+ferXr8+QIUPo1asXkyZN4uyzzwYgOTmZfv360a9fv/B+5quvvprmzZvz4osvUrNmzSKzT1NF37JlyzjllFPCz/f+dKFPnz5MnjyZTZs2sXHjxvDxWrVqMWPGDIYMGcL48eM56qijeOyxx7xVmyQpT0w6BSAzM5PNmzezZ88evv32W2bPnk16ejpnnHEGvXv3JioqitatW9O9e3fuuOMO6tevzzfffMOMGTM4++yzadmy5R++fsmSJbnuuuv4xz/+QYkSJWjTpg3ff/89H374IZdeein16tVj48aNTJ06leOPP54ZM2bw0ksvhc//5ZdfGDZsGH//+9+pVasWX331Fe+++y7nnnsuAIMHD6ZLly7Ur1+fn376iYULF9KwYUPg15W4Rx99lF69eoXvOrFmzRqmTp3KY489RnR08dz2oIOrffv2f3gf7713e/n9OcuXLz+IVUmSDncG4wIwe/ZsqlSpQkxMDOXKlaNJkybcd9999OnTJ/wLPmbOnMmIESO45JJL+P7776lcuTInn3xyrltM7c9NN91ETEwMN998M9988w1VqlShX79+AJx55pkMGTKEAQMGkJmZSdeuXbnpppvCt0yLjo7mxx9/pHfv3nz77bcceeSRnHPOOeEPv+3Zs4f+/fvz1VdfkZCQQOfOnbnnnnsAqFq1KosXL+a6667jtNNOIzMzkxo1atC5c+d8//ISSZKkQ1Eo8Ner6RCwdetWEhMT+eGHH3J9SLG4yMrKYubMmZx++umH3Z6tA1Hcxw/OgeMv3uMH56Aojn/v398///xzjg/9F1Uu+UmSJEkYjCVJkiTAYCxJkiQBBmNJkiQJMBhLkiRJgMFYkiRJAgzGkiRJEmAwliRJkgCDsSRJkgQYjCVJkiTAYCxJkiQBBmNJkiQJMBhLkiRJgMFYkiRJAgzGkiRJEmAwliRJkgCDsSRJkgQYjCVJkiTAYCxJkiQBBmNJkiQJMBhLkiRJgMFYkiRJAgzGkiRJEmAwliRJkgCDsSRJkgQYjCVJkiTAYCx
JkiQBBmNJkiQJMBhLkiRJgMFYkiRJAgzGkiRJEmAwliRJkgCDsSRJkgQYjCVJkiTAYCxJkiQBBmNJkiQJMBhLkiRJgMFYkiRJAgzGkiRJEmAwliRJkgCDsSRJkgQYjCVJkiTAYCxJkiQBBmNJkiQJMBhLkiRJgMFYkiRJAgzGkiRJEmAwliRJkgCDsSRJkgQYjCVJkiTAYCxJkiQBBmNJkiQJMBhLkiRJgMFYkiRJAiAm0gVIAEEQALBt2zZiY2MjXE1kZGVlsWPHDrZu3Vos56C4jx+cA8dfvMcPzkFRHP/WrVuB//09XtQZjHVI+PHHHwGoVatWhCuRJEl5tW3bNhITEyNdxl9mMNYhoXz58gBs3LjxsPgfKz+2bt1KcnIyX375JQkJCZEup9AV9/GDc+D4i/f4wTkoiuMPgoBt27ZRtWrVSJdSIAzGOiRERf263T0xMbHI/GFwsCQkJBTrOSju4wfnwPEX7/GDc1DUxn84LWj54TtJkiQJg7EkSZIEGIx1iIiLi2PkyJHExcVFupSIKe5zUNzHD86B4y/e4wfnoLiP/1AQCg6X+2tIkiRJf4ErxpIkSRIGY0mSJAkwGEuSJEmAwViSJEkCDMY6RDzwwAPUrFmTkiVLcsIJJ/DOO+9EuqRC88Ybb9CtWzeqVq1KKBRi+vTpkS6pUKWnp3P88cdTtmxZKlWqRPfu3fn0008jXVahmThxIo0bNw7f0L9169bMmjUr0mVFzG233UYoFGLw4MGRLqXQjBo1ilAolOPRoEGDSJdVqL7++msuuugiKlSoQKlSpTjuuONYtmxZpMsqNDVr1sz1HgiFQvTv3z/SpRU7BmNF3HPPPcfQoUMZOXIk77//Pk2aNCElJYXvvvsu0qUViu3bt9OkSRMeeOCBSJcSEa+//jr9+/fn7bffZt68eWRlZXHaaaexffv2SJdWKI466ihuu+023nvvPZYtW0aHDh0466yz+PDDDyNdWqF79913efjhh2ncuHGkSyl0xx57LJs2bQo/3nrrrUiXVGh++ukn2rRpQ2xsLLNmzeKjjz7i7rvvply5cpEurdC8++67Ob7/8+bNA6BHjx4Rrqz48XZtirgTTjiB448/ngkTJgCQnZ1NcnIyV199Nddff32EqytcoVCIl156ie7du0e6lIj5/vvvqVSpEq+//jonn3xypMuJiPLly3PnnXdy6aWXRrqUQpORkUHz5s158MEHGTduHE2bNuXee++NdFmFYtSoUUyfPp0VK1ZEupSIuP7661m8eDFvvvlmpEs5ZAwePJhXX32Vzz//nFAoFOlyihVXjBVRu3bt4r333qNjx47htqioKDp27MjSpUsjWJki5eeffwZ+DYfFzZ49e5g6dSrbt2+ndevWkS6nUPXv35+uXbvm+LOgOPn888+pWrUqtWvX5sILL2Tjxo2RLqnQ/Pvf/6Zly5b06NGDSpUq0axZMx599NFIlxUxu3bt4umnn6Zv376G4ggwGCuifvjhB/bs2UNSUlKO9qSkJDZv3hyhqhQp2dnZDB48mDZt2tCoUaNIl1NoVq1aRZkyZYiLi6Nfv3689NJLHHPMMZEuq9BMnTqV999/n/T09EiXEhEnnHACkydPZvbs2UycOJF169bRtm1btm3bFunSCsUXX3zBxIkTqVevHnPmzOHKK69k4MCBPPHEE5EuLSKmT5/Oli1bSE1NjXQpxVJMpAuQpL369+/P6tWri9X+SoCjjz6aFStW8PPPP/PCCy/Qp08fXn/99WIRjr/88ksGDRrEvHnzKFmyZKTLiYguXbqEv27cuDEnnHACNWrU4P/+7/+KxXaa7OxsWrZsya233gpAs2bNWL16NQ899BB9+vSJcHWF71//+hddunShatWqkS6lWHLFWBF15JFHEh0dzbfffpuj/dtvv6Vy5coRqkqRMGDAAF599VUWLlzIUUcdFelyClWJEiWoW7cuLVq0ID09nSZNmjB+/PhIl1Uo3nvvPb777juaN29
OTEwMMTExvP7669x3333ExMSwZ8+eSJdY6I444gjq16/PmjVrIl1KoahSpUqufwQ2bNiwWG0n2WvDhg3Mnz+ftLS0SJdSbBmMFVElSpSgRYsWLFiwINyWnZ3NggULit0ey+IqCAIGDBjASy+9xGuvvUatWrUiXVLEZWdnk5mZGekyCsWpp57KqlWrWLFiRfjRsmVLLrzwQlasWEF0dHSkSyx0GRkZrF27lipVqkS6lELRpk2bXLdo/Oyzz6hRo0aEKoqcSZMmUalSJbp27RrpUoott1Io4oYOHUqfPn1o2bIlrVq14t5772X79u1ccsklkS6tUGRkZORYGVq3bh0rVqygfPnyVK9ePYKVFY7+/fszZcoUXn75ZcqWLRveW56YmEipUqUiXN3BN3z4cLp06UL16tXZtm0bU6ZMYdGiRcyZMyfSpRWKsmXL5tpPHh8fT4UKFYrNPvNrr72Wbt26UaNGDb755htGjhxJdHQ0vXr1inRphWLIkCGceOKJ3HrrrfTs2ZN33nmHRx55hEceeSTSpRWq7OxsJk2aRJ8+fYiJMZ5FTCAdAu6///6gevXqQYkSJYJWrVoFb7/9dqRLKjQLFy4MgFyPPn36RLq0QrGvsQPBpEmTIl1aoejbt29Qo0aNoESJEkHFihWDU089NZg7d26ky4qodu3aBYMGDYp0GYXmvPPOC6pUqRKUKFEiqFatWnDeeecFa9asiXRZheqVV14JGjVqFMTFxQUNGjQIHnnkkUiXVOjmzJkTAMGnn34a6VKKNe9jLEmSJOEeY0mSJAkwGEuSJEmAwViSJEkCDMaSJEkSYDCWJEmSAIOxJEmSBBiMJUmSJMBgLEmSJAEGY0kSkJqaSigUyvX47a8rl6TDnb+MW5IEQOfOnZk0aVKOtooVK0aompyysrKIjY2NdBmSDnOuGEuSAIiLi6Ny5co5HtHR0fvsu2HDBrp160a5cuWIj4/n2GOPZebMmeHjH374IWeccQYJCQmULVuWtm3bsnbtWgCys7MZM2YMRx11FHFxcTRt2pTZs2eHz12/fj2hUIjnnnuOdu3aUbJkSZ555hkAHnvsMRo2bEjJkiVp0KABDz744EGcEUnFjSvGkqQ869+/P7t27eKNN94gPj6ejz76iDJlygDw9ddfc/LJJ9O+fXtee+01EhISWLx4Mbt37wZg/Pjx3H333Tz88MM0a9aMxx9/nDPPPJMPP/yQevXqha9x/fXXc/fdd9OsWbNwOL755puZMGECzZo1Y/ny5Vx22WXEx8fTp0+fiMyDpMNLKAiCINJFSJIiKzU1laeffpqSJUuG27p06cLzzz+/z/6NGzfm3HPPZeTIkbmO3XDDDUydOpVPP/10n9sfqlWrRv/+/bnhhhvCba1ateL444/ngQceYP369dSqVYt7772XQYMGhfvUrVuXsWPH0qtXr3DbuHHjmDlzJkuWLMnXuCXpt1wxliQBcMoppzBx4sTw8/j4+P32HThwIFdeeSVz586lY8eOnHvuuTRu3BiAFStW0LZt232G4q1bt/LNN9/Qpk2bHO1t2rRh5cqVOdpatmwZ/nr79u2sXbuWSy+9lMsuuyzcvnv3bhITE/M2UEnaD4OxJAn4NQjXrVv3gPqmpaWRkpLCjBkzmDt3Lunp6dx9991cffXVlCpVqsDq2SsjIwOARx99lBNOOCFHv/3tg5akvPLDd5KkfElOTqZfv35MmzaNa665hkcffRT4dZvFm2++SVZWVq5zEhISqFq1KosXL87RvnjxYo455pj9XispKYmqVavyxRdfULdu3RyPWrVqFezAJBVbrhhLkvJs8ODBdOnShfr16/PTTz+xcOFCGjZsCMCAAQO4//77Of/88xk+fDiJiYm8/fbbtGrViqOPPpphw4YxcuRI6tSpQ9OmTZk0aRIrVqwI33lif0aPHs3AgQNJTEykc+fOZGZmsmzZMn766SeGDh1aGMOWdJgzGEuS8mzPnj3079+fr776ioSEBDp
37sw999wDQIUKFXjttdcYNmwY7dq1Izo6mqZNm4b3FQ8cOJCff/6Za665hu+++45jjjmGf//73znuSLEvaWlplC5dmjvvvJNhw4YRHx/Pcccdx+DBgw/2cCUVE96VQpIkScI9xpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkwGAsSZIkAQZjSZIkCTAYS5IkSYDBWJIkSQIMxpIkSRJgMJYkSZIAg7EkSZIEGIwlSZIkAP4fogjmhYTZL/UAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "xgb.plot_importance(xgb_clf)\n", "plt.figure(figsize = (16, 12))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "73b3cf51", "metadata": {}, "source": [ "- We can see that the feature `Delicassesn` has been given the highest importance score among all the features. \n", "\n", "- Based upon this importance score, we can select the features with highest importance score and discard the redundant ones.\n", "\n", "- Thus XGBoost also gives us a way to do feature selection." ] }, { "cell_type": "markdown", "id": "9d151011", "metadata": {}, "source": [ "## Results and conclusion\n", "\n", "- In this kernel, we implement XGBoost with Python and Scikit-Learn to classify the customers from two different channels as Horeca (Hotel/Retail/Café) customers or Retail channel (nominal) customers.\n", "\n", "- The y labels contain values as 1 and 2. We have converted them into 0 and 1 for further analysis.\n", "\n", "- We have trained the XGBoost classifier and found the accuracy score to be 88.64%.\n", "\n", "- We have performed k-fold cross-validation with XGBoost.\n", "\n", "- We have find the most important feature in XGBoost. We did it using the plot_importance() function in XGBoost that helps us to achieve this task." ] }, { "cell_type": "markdown", "id": "b4ddd446", "metadata": {}, "source": [ "## Your turn! 🚀\n", "\n", "TBD" ] }, { "cell_type": "markdown", "id": "4a2062a5", "metadata": {}, "source": [ "## Acknowledgments\n", "\n", "Thanks to [Prashant Banerjee](https://www.kaggle.com/prashant111) for creating the open-source course [XGBoost + k-fold CV + Feature Importance](https://www.kaggle.com/code/prashant111/xgboost-k-fold-cv-feature-importance). It inspires the majority of the content in this chapter." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }