{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 11 - Ensemble Methods - Continuation\n",
"\n",
"\n",
"by [Alejandro Correa Bahnsen](albahnsen.com/)\n",
"\n",
"version 0.2, May 2016\n",
"\n",
"## Part of the class [Machine Learning for Risk Management](https://github.com/albahnsen/ML_RiskManagement)\n",
"\n",
"\n",
"This notebook is licensed under a [Creative Commons Attribution-ShareAlike 3.0 Unported License](http://creativecommons.org/licenses/by-sa/3.0/deed.en_US). Special thanks goes to [Kevin Markham](https://github.com/justmarkham)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why are we learning about ensembling?\n",
"\n",
"- Very popular method for improving the predictive performance of machine learning models\n",
"- Provides a foundation for understanding more sophisticated models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part 1: Combination of classifiers - Majority Voting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" The most typical form of an ensemble is made by combining $T$ different base classifiers.\n",
" Each base classifier $M(\\mathcal{S}_j)$ is trained by applying algorithm $M$ to a random subset \n",
" $\\mathcal{S}_j$ of the training set $\\mathcal{S}$. \n",
" For simplicity we define $M_j \\equiv M(\\mathcal{S}_j)$ for $j=1,\\dots,T$, and \n",
" $\\mathcal{M}=\\{M_j\\}_{j=1}^{T}$ a set of base classifiers.\n",
" Then, these models are combined using majority voting to create the ensemble $H$ as follows\n",
" $$\n",
" f_{mv}(\\mathcal{S},\\mathcal{M}) = max_{c \\in \\{0,1\\}} \\sum_{j=1}^T \n",
" \\mathbf{1}_c(M_j(\\mathcal{S})).\n",
" $$\n"
]
},
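{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the formula above (using a made-up vote matrix, not the churn data used below), majority voting simply counts the base classifiers' votes per sample and keeps the class with the most votes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Toy sketch of majority voting; the votes matrix is made up for illustration\n",
"import numpy as np\n",
"\n",
"# rows: T=5 base classifiers, columns: 4 samples\n",
"votes = np.array([[1, 0, 1, 0],\n",
"                  [1, 0, 0, 0],\n",
"                  [0, 0, 1, 0],\n",
"                  [1, 1, 1, 0],\n",
"                  [1, 0, 1, 1]])\n",
"\n",
"# class 1 wins a sample when it gets at least half of the T votes\n",
"(votes.sum(axis=0) >= votes.shape[0] / 2).astype(int)"
]
},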
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# read in and prepare the chrun data\n",
"# Download the dataset\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"data = pd.read_csv('../datasets/churn.csv')\n",
"\n",
"# Create X and y\n",
"\n",
"# Select only the numeric features\n",
"X = data.iloc[:, [1,2,6,7,8,9,10]].astype(np.float)\n",
"# Convert bools to floats\n",
"X = X.join((data.iloc[:, [4,5]] == 'no').astype(np.float))\n",
"\n",
"y = (data.iloc[:, -1] == 'True.').astype(np.int)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n",
" \n",
" \n",
" | \n",
" Account Length | \n",
" Area Code | \n",
" VMail Message | \n",
" Day Mins | \n",
" Day Calls | \n",
" Day Charge | \n",
" Eve Mins | \n",
" Int'l Plan | \n",
" VMail Plan | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 128.0 | \n",
" 415.0 | \n",
" 25.0 | \n",
" 265.1 | \n",
" 110.0 | \n",
" 45.07 | \n",
" 197.4 | \n",
" 1.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 107.0 | \n",
" 415.0 | \n",
" 26.0 | \n",
" 161.6 | \n",
" 123.0 | \n",
" 27.47 | \n",
" 195.5 | \n",
" 1.0 | \n",
" 0.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 137.0 | \n",
" 415.0 | \n",
" 0.0 | \n",
" 243.4 | \n",
" 114.0 | \n",
" 41.38 | \n",
" 121.2 | \n",
" 1.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 84.0 | \n",
" 408.0 | \n",
" 0.0 | \n",
" 299.4 | \n",
" 71.0 | \n",
" 50.90 | \n",
" 61.9 | \n",
" 0.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 75.0 | \n",
" 415.0 | \n",
" 0.0 | \n",
" 166.7 | \n",
" 113.0 | \n",
" 28.34 | \n",
" 148.3 | \n",
" 0.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Account Length Area Code VMail Message Day Mins Day Calls Day Charge \\\n",
"0 128.0 415.0 25.0 265.1 110.0 45.07 \n",
"1 107.0 415.0 26.0 161.6 123.0 27.47 \n",
"2 137.0 415.0 0.0 243.4 114.0 41.38 \n",
"3 84.0 408.0 0.0 299.4 71.0 50.90 \n",
"4 75.0 415.0 0.0 166.7 113.0 28.34 \n",
"\n",
" Eve Mins Int'l Plan VMail Plan \n",
"0 197.4 1.0 0.0 \n",
"1 195.5 1.0 0.0 \n",
"2 121.2 1.0 1.0 \n",
"3 61.9 0.0 1.0 \n",
"4 148.3 0.0 1.0 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" count | \n",
" percentage | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2850 | \n",
" 0.855086 | \n",
"
\n",
" \n",
" 1 | \n",
" 483 | \n",
" 0.144914 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" count percentage\n",
"0 2850 0.855086\n",
"1 483 0.144914"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y.value_counts().to_frame('count').assign(percentage = lambda x: x/x.sum())"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.cross_validation import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create 100 decision trees"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"n_estimators = 100\n",
"# set a seed for reproducibility\n",
"np.random.seed(123)\n",
"\n",
"n_samples = X_train.shape[0]\n",
"\n",
"# create bootstrap samples (will be used to select rows from the DataFrame)\n",
"samples = [np.random.choice(a=n_samples, size=n_samples, replace=True) for _ in range(n_estimators)]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"np.random.seed(123) \n",
"seeds = np.random.randint(1, 10000, size=n_estimators)\n",
"\n",
"trees = {}\n",
"for i in range(n_estimators):\n",
" trees[i] = DecisionTreeClassifier(max_features=\"sqrt\", max_depth=None, random_state=seeds[i])\n",
" trees[i].fit(X_train.iloc[samples[i]], y_train.iloc[samples[i]])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
" 5 | \n",
" 6 | \n",
" 7 | \n",
" 8 | \n",
" 9 | \n",
" ... | \n",
" 90 | \n",
" 91 | \n",
" 92 | \n",
" 93 | \n",
" 94 | \n",
" 95 | \n",
" 96 | \n",
" 97 | \n",
" 98 | \n",
" 99 | \n",
"
\n",
" \n",
" \n",
" \n",
" 438 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" ... | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 2674 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1345 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1957 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" ... | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 2148 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 100 columns
\n",
"
"
],
"text/plain": [
" 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 \\\n",
"438 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 \n",
"2674 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n",
"1345 0 0 0 1 0 0 0 0 0 1 ... 0 0 0 1 1 0 0 \n",
"1957 0 0 0 0 0 0 0 0 0 1 ... 1 0 1 0 0 0 0 \n",
"2148 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 \n",
"\n",
" 97 98 99 \n",
"438 0 0 0 \n",
"2674 0 0 0 \n",
"1345 1 1 0 \n",
"1957 0 1 0 \n",
"2148 0 1 0 \n",
"\n",
"[5 rows x 100 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Predict \n",
"y_pred_df = pd.DataFrame(index=X_test.index, columns=list(range(n_estimators)))\n",
"for i in range(n_estimators):\n",
" y_pred_df.ix[:, i] = trees[i].predict(X_test)\n",
"\n",
"y_pred_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Predict using majority voting"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"438 2\n",
"2674 5\n",
"1345 35\n",
"1957 17\n",
"2148 3\n",
"3106 4\n",
"1786 22\n",
"321 6\n",
"3082 10\n",
"2240 5\n",
"dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_pred_df.sum(axis=1)[:10]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.52459016393442637"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_pred = (y_pred_df.sum(axis=1) >= (n_estimators / 2)).astype(np.int)\n",
"\n",
"from sklearn import metrics\n",
"metrics.f1_score(y_pred, y_test)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.89454545454545453"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"metrics.accuracy_score(y_pred, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using majority voting with sklearn"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.ensemble import BaggingClassifier\n",
"clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=100, bootstrap=True,\n",
" random_state=42, n_jobs=-1, oob_score=True)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.53600000000000003, 0.89454545454545453)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf.fit(X_train, y_train)\n",
"y_pred = clf.predict(X_test)\n",
"metrics.f1_score(y_pred, y_test), metrics.accuracy_score(y_pred, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part 2: Combination of classifiers - Weighted Voting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The majority voting approach gives the same weight to each classfier regardless of the performance of each one. Why not take into account the oob performance of each classifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, in the traditional approach, a \n",
"similar comparison of the votes of the base classifiers is made, but giving a weight $\\alpha_j$ \n",
"to each classifier $M_j$ during the voting phase\n",
"$$\n",
" f_{wv}(\\mathcal{S},\\mathcal{M}, \\alpha)\n",
" =\\max_{c \\in \\{0,1\\}} \\sum_{j=1}^T \\alpha_j \\mathbf{1}_c(M_j(\\mathcal{S})),\n",
"$$\n",
"where $\\alpha=\\{\\alpha_j\\}_{j=1}^T$.\n",
"The calculation of $\\alpha_j$ is related to the performance of each classifier $M_j$.\n",
"It is usually defined as the normalized misclassification error $\\epsilon$ of the base \n",
"classifier $M_j$ in the out of bag set $\\mathcal{S}_j^{oob}=\\mathcal{S}-\\mathcal{S}_j$\n",
"\\begin{equation}\n",
" \\alpha_j=\\frac{1-\\epsilon(M_j(\\mathcal{S}_j^{oob}))}{\\sum_{j_1=1}^T \n",
" 1-\\epsilon(M_{j_1}(\\mathcal{S}_{j_1}^{oob}))}.\n",
"\\end{equation}"
]
},
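{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch with made-up numbers (the real oob errors are estimated below), the weighted vote replaces the plain vote count with an $\\alpha$-weighted sum:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Toy sketch of weighted voting; votes and errors are made up for illustration\n",
"import numpy as np\n",
"\n",
"# rows: T=3 base classifiers, columns: 4 samples\n",
"votes = np.array([[1, 0, 1, 0],\n",
"                  [1, 0, 0, 0],\n",
"                  [0, 1, 1, 0]])\n",
"errors = np.array([0.1, 0.3, 0.2])          # hypothetical oob errors\n",
"alpha = (1 - errors) / (1 - errors).sum()   # normalized weights, sum to 1\n",
"\n",
"# weighted fraction of votes for class 1; predict 1 when it reaches 0.5\n",
"weighted = (alpha[:, np.newaxis] * votes).sum(axis=0)\n",
"(weighted >= 0.5).astype(int)"
]
},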
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select each oob sample"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"samples_oob = []\n",
"# show the \"out-of-bag\" observations for each sample\n",
"for sample in samples:\n",
" samples_oob.append(sorted(set(range(n_samples)) - set(sample)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Estimate the oob error of each classifier"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"errors = np.zeros(n_estimators)\n",
"\n",
"for i in range(n_estimators):\n",
" y_pred_ = trees[i].predict(X_train.iloc[samples_oob[i]])\n",
" errors[i] = 1 - metrics.accuracy_score(y_train.iloc[samples_oob[i]], y_pred_)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAbYAAAEjCAYAAABeoiSAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtYVNX+P/D3MOAFjUDkooh6EkRUBKMMxQuSiX7LwLyl\neaMML2RZmoiXUx6tFEM7WeoptNS8FGqRWZjHQAG5WAhxfAqxFBNjIBAzMoGZ+f3hj4mRgZmBGWbv\nPe/X8/Q8uWfN7LU/M3t91lp77Y2sqqpKDSIiIomwsXQFiIiITImJjYiIJIWJjYiIJIWJjYiIJIWJ\njYiIJIWJjYiIJIWJjUiEysrKsHDhQgwcOBDOzs7o0qULfvnlF0tXy2BXrlyBk5MTJkyYYOmqkATZ\nWroCZD4FBQV47733cObMGZSWlkImk6FHjx4YOXIkFi5ciH/84x/Nvj89PR27d+9GdnY2ysrK0K5d\nO/Tq1QtjxozBggUL4ObmpvM9uhqrjh07olevXhg3bhxeeOEFODo6muw4rdHChQuRkpKC8ePH46mn\nnoKNjQ3uvfdeS1fLYpycnNCzZ0/k5+dbuiokADLeoC1N69evx+bNmyGXyzFq1CgMGDAAKpUKubm5\nyMzMhFwux4YNG/DMM880em9tbS2WLFmC/fv3o0OHDnj44YfRt29f/PXXX8jMzEReXh46deqEHTt2\n4LHHHtN6b31i69mzJ2bMmAEAUKvVqKiowMmTJ3Hp0iV4e3sjNTUV9vb2bRILqamtrYW7uzu8vLyQ\nnZ1t6eq0yJUrV+Dv74/hw4fj6NGjrf48JjZqiCM2CXrzzTcRHx8PT09PHDhwAAMGDNB6PT09HbNm\nzcLLL78MR0dHTJo0Sev1pUuXYv/+/fDz88O+ffvg6emp9fqnn36KhQsX4umnn0ZSUhKGDh3aqA49\ne/ZETEyM1ra6ujo88sgjyM/PR1JSEqZPn26iI7YupaWlUKlUcHV1tXRViASJ19gk5pdffsHGjRth\nZ2eHgwcPNkpqADB8+HD85z//gVqtRkxMDP7880/Nazk5Odi7dy8cHR1x6NChRkkNACZOnIj169ej\ntrYWL774osF1s7W1RXBwMADgt99+M+q4Tp8+jSeffBJeXl5wdXWFn58fli1bhrKyskZlH330UTg5\nOaG4uBjbt2/HsGHD4O7ujpkzZwIA9u3bBycnJ2zcuBHffvstJk+ejN69e6NLly74/fffNZ+TlpaG\nqVOn4r777oObmxsCAgIQGxuLioqKRvtcuHAhnJyckJGRgYMHDyI0NBQeHh4YOXKkQcdXUFCAuXPn\nom/fvnB1dcXAgQOxePFiFBcXa5UbNGgQBg0aBJlMhvT0dDg5OcHJyQnR0dEG7efSpUtYvHgx/Pz8\n4ObmBi8vL8ycOVPnSKe0tBQbN27EuHHj4OPjA1dXV/j6+mLevHn48ccfm9zHuXPn8PTTT6N///5w\ndXWFj48PHn/8cRw4cEBn+crKSrzwwgvo168f3NzcMHToUOzbt8+g46mPgUwm01y30xUTJycn+Pv7\n4/fff8eKFSvg5+eHrl27YseOHZoyt2/fxtatWxESEoIePXrAw8MDo0ePxgcffNDk/vPz8/H000/D\n19cXrq6u6NevH+bPn49Lly4ZVH8yD47YJGbv3r2oq6vDE088gf79+zdZbuzYsRg8eDDy8vK0Rk+7\ndu2CTCbD3Llzmx0RREZGIj4+HhcuXEBGRoYmYTWnrq4O6enpAIDAwECDj+mtt97C2rVr0aVLF4wd\nOxZubm44f/48du7cieTkZJw4cQLdunXTlJfJZJDJZFi+fDlycnIwduxYhIWFoXPnzlqvZ2dnIz4+\nHsOHD8ecOXOgUCggl8sBAHv27MGSJUtgb2+P8PBwuLu7Izs7Gzt27MCxY8dw/Phxnft8++23kZaW\nhvHjxyMkJAQ1NTV6j+/EiROYNWsWVCoVJkyY
gH/84x/43//+h48++ghffPEFjh49ioEDBwIAFi1a\nhCtXrmD79u1a071+fn5693Pq1Ck89dRTqKmpQVhYGPr06YNr167hiy++wH//+18cOHAAo0eP1pQ/\nc+YM3n77bQwfPhyPP/44OnfujJ9++glHjx7FV199heTk5Eb73bNnD1566SXI5XKMGzcO3t7eqKio\nQH5+Pnbs2NFolH7jxg2EhYWhXbt2CA8PR01NDT777DM899xzkMvlePLJJ5s9pp49e2LFihXYsGED\n7r33XixatAhqtVpnTGpqavD444/j999/x9ixY9GxY0d0794dAPDHH38gPDwc586dw6BBgzRx/eab\nb/DSSy/h22+/xbvvvqv1eZ988gmio6PRvn17jB8/Hh4eHvj5559x5MgRJCcn49ixY5rvjdoWr7FJ\nTHh4ONLS0vDWW29h9uzZzZZdt24dNm/ejNmzZ+Pf//43AGDw4MEoLi7Gp59+ilGjRjX7/meffRaH\nDx/GqlWrsHTpUgDa19jqGzG1Wo3KykqcPHkSv/76K55//nnExsYadDwZGRmYMGEChgwZgsTERNxz\nzz2a1z755BPMnz8fjz/+OHbv3q3Z/thjjyEjIwPdu3fH8ePH0aNHD63P3L9/P6KjoyGTyXTGqaSk\nBPfffz/atWuHkydPom/fvprXXn/9dWzatAlhYWE4ePCgZvuiRYtw4MABdOrUCcePH9c5Utblzz//\nhJ+fH27cuIGkpCStDsJHH32ExYsXo3///sjIyNBsb8n1qd9//x0BAQGwsbHBV199BW9vb81rRUVF\nCA0NxT333IP8/HzY2dkBACoqKtChQwd06tRJ67POnz+PsLAwDB06FImJiZrthYWFGD58ODp16oTk\n5GT069dP633Xrl3TJJL6Y5DJZJg9eza2bNkCmUym+Zzg4GB4e3sjMzPToOPTd42tflQ3evRo7Nu3\nDx06dNB6ffHixdi3bx/Wrl2LxYsXa7bX1tZi5syZOHHiBA4cOICwsDAAd0a+Q4cOhYeHB7788kut\nhVQZGRkIDw+Hn58fUlJSDKo/mRanIiVGoVAAQKPGXBcPDw8Ad6ac7n5//Wv63q9Wq7XeX++XX35B\nXFwc4uLisGnTJuzcuROXL1/GiBEjMHbsWIOOBYBmqmjz5s1aSQ0Apk6dikGDBuHLL79EdXW11msy\nmQwvvPBCs3Hw8/PTmfwPHDiA2tpazJs3TyupAXeuP3br1g1ff/21JlYN9zl37lyDkxoAHDt2DJWV\nlQgPD2806p05cyb8/f3xww8/4NtvvzX4M3U5cOAAqqqqsHz5cq2kBgDe3t6YPXs2SktLcerUKc12\nZ2fnRkkNAAYMGIARI0YgPT0dSqVSsz0hIQFKpRLLli1rlNQAaJJaQ/b29li/fr0mqQGAj48PHnro\nIRQWFmpNk5vCunXrGiW1qqoqHDx4EIMGDdJKagBgZ2eHf/7zn1Cr1fj444812xMSElBTU4PXXnut\n0erg4OBgjB8/Hvn5+bhw4YJJ60+G4VQkmUVwcLDWaKKqqgrZ2dlYvnw5xo8fj/3792PMmDF6Pycn\nJwe2trb4/PPP8fnnnzd6vaamBkqlEhcvXoS/v7/Wa/fff3+zn93UdOj3338PABgxYkSj19q3b4+g\noCB89tln+P777/HII48Ytc+75efnQyaT6dwXAISEhOD7779Hfn4+HnjgAaM+u6GcnBwAwP/+9z9s\n2LCh0esXL16EWq1GYWGh1vdy/Phx7Nq1C/n5+aioqEBdXZ3mNZlMhoqKCs2U9XfffQcABn2v9e67\n7z7NFHFD9R2rqqoqk62e7dChg87p+e+++w51dXWwsbHRGZva2loAd0aS9erjmZGRgby8vEbvKS8v\n17zn7s4RmR8Tm8S4urriwoULuHr1qt6yJSUlAAB3d3et91+5cgUlJSXw8vLS+36ZTKb1/qY4Ojoi\nLCwMHTp0
QEREBFauXGlQA1hZWQmlUom4uLgmy8hkskYjtvpjaU5Tr9cvIGnq9foe+o0bN4zeZ0v2\npVarde7LGJWVlVCr1fjoo4+aLHN3HLdv346VK1fCyckJo0ePRo8ePdCxY0fIZDJ88cUXOH/+PG7f\nvq0pX1/Hhtce9Wnq3jtb2ztNU8MRYWt17dpV5/bKykoAQF5ens4kBdyJTcPRY/177r7udjddv0sy\nPyY2iQkKCkJaWhpSU1P1XmNLTU2FTCZDUFCQ1vuvXLmClJSUZq+xKZVKzUKQhu/Xp36UdPHiRdy8\nebPR9OLdHBwcUFtbiytXrhi8j3oNp7eMed3BwQEAdK64BP6erq0vZ8w+W7IvmUymc1/G7kcmk+HU\nqVMGLTRRKpXYuHEj3N3dcfr0abi4uGi9npOTg/Pnz2ttq09Sv/76qyBvFtf3fUdFRekcsTX3nkuX\nLgnyWK0dr7FJzFNPPQVbW1scO3YMP/zwQ5PlTpw4gdzcXDg7OyM8PFyzfc6cOVCr1dizZ0+TjS0A\nfPjhhygtLYWPj49BKyLrVVVVaf5fpVLpLf/ggw/ijz/+aNSImpO/vz/UajXS0tIavVZTU6O5Kfru\nqU9T7wuA5ppXQEBAq/bz4IMPQq1W48yZMwaVr6iowI0bNzBkyJBGSa26ulrnIo36qdITJ060qq4t\nYWNj0+LR3QMPPAAbGxuDF6oAd+IJwOB4UttiYpOYXr16YdmyZaipqcG0adN0JoT09HRERUVBJpNh\nw4YNWtcwhg4dihkzZuD69euYMmWKzucPJiUlYfXq1bCzs8PmzZuNqt8777wDABg4cKBBPd3o6Gio\n1WosWbIE165da/T67du3kZWVZVQd9Jk6dSratWuHnTt3oqioSOu1+Ph4XLt2TXPbQWs9+uij6NKl\nC5KSkho1kvv27UNeXh58fX1bdX0NuLMQxdHREZs2bcLZs2d1lsnKytJcQ3NxcYG9vT3y8vK0ptPq\n6uoQExOj816+Z555BnK5HPHx8To7Vbq+P1Pp0qULKioqtKZGDeXs7Ixp06ahoKAAGzZs0Jkgr127\npvVbiIqKgp2dHVavXt3oNwLcGfE21Vkh8+NUpATFxMTg9u3beOuttzBq1CiEhIRoHql17tw5ZGRk\nwM7ODm+++Wajp44AwJYtW6BSqfDxxx9jyJAhWo/UysrKwrlz53DPPfcgISFB51NHgDvLuRtO61y/\nfh05OTnIy8uDvb09Nm3aZNCxjBgxAuvWrcOrr76KwMBAPPLII+jduzdu3bqFq1ev4syZM+jVqxdO\nnz7dsmDp4OnpiY0bN2Lp0qUYPXo0IiIi4ObmhuzsbGRkZKBHjx6Ij483yb7s7e2xbds2zJkzBxER\nEXj88cfRu3dvFBQU4MSJE3ByctK6ibilHB0dsWfPHsycORNjx47FyJEj0a9fP9jZ2aGkpATffvst\nSkpKcPnyZc205fz58/HWW29h2LBh+L//+z/U1tYiLS0NVVVVmlWRDfn4+CA+Ph4vvfQSQkJCNPex\nXb9+Hd9//z1qamq0Vl2a0ujRo3Ho0CE88cQTGDZsGNq3b4+BAwdi3LhxBr0/Li4Oly5dQlxcHD7+\n+GMMGzYMbm5uUCgUuHjxIs6ePYvXX39ds6LUy8sL27dvR3R0NIYOHYqHH34YXl5eUCqVKCkpQXZ2\nNmpqanD58mWzHC81j4lNov75z38iIiIC77//PjIyMpCZmQmZTAYPDw9ERUVhwYIFTT4EuV27dti+\nfTtmzJiBPXv2ICsrC//97381D0F+8cUXsWDBgiYXPMhkMs1y/4af2a1bN8yZMweLFy9Gnz59DD6W\n5557DkOHDsWOHTuQmZmJ48ePo3PnznB3d8e0adMwceJEnXVoTv0N1U2ZO3cu+vTpg61bt2puJ+jW\nrRsWLFiApUuXNrkQoSXCwsLw9ddfY/PmzTh9+jQ+//xzuLi44KmnnsKyZc
vQq1cvo+uvy4gRI3Dm\nzBm88847OHnyJM6ePQtbW1u4ubnhoYcewqOPPqp1LW/16tXo2rUr9u7di927d8PBwQGjR4/GqlWr\n8Prrr+vc/+zZszFgwABs3boVWVlZSE5ORpcuXeDj44OoqCijjsGY49uwYQPkcjlSU1ORnZ0NlUqF\n6dOnayW25j6vc+fO+OKLL7B371588sknOHbsGP766y907doVvXv3xtq1axv9zp544gkMHDgQ7777\nLk6dOoVTp06hQ4cOcHd3x9ixY7Wm+Klt8QZtIiKSFF5jIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFi\nIyIiSWFiIyIiSWFikyBdT0KgpjFehmOsDMdYWQ4TGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxER\nSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoT\nGxERSYrBiS0hIQH+/v5wd3dHSEgIMjMzmyybnp6OGTNmoF+/fujevTuCg4Px0UcfaZU5evQonnji\nCXh5ecHT0xNjxozBV1991fIjISIigoGJ7ciRI4iNjcWyZcuQlpaGIUOGYMqUKSgpKdFZPicnBwMG\nDMCePXuQmZmJZ555BkuWLMHhw4c1ZTIyMjBq1CgkJiYiLS0NjzzyCGbOnImsrCzTHBkREVklWVVV\nlVpfoTFjxsDPzw9btmzRbAsMDERERATWrFlj0I4iIyOhUqmwe/fuJss8/PDDGDZsGNatW2fQZ5Ju\nRUVF8Pb2tnQ1RIPxMhxjZTjGynL0jthqa2uRl5eHkJAQre2hoaHIzs42eEc3b96Eo6Njs2X++OMP\nvWWIiIiaozexVVRUQKlUwtXVVWu7i4sLysrKDNpJcnIyTp8+jcjIyCbLvP/++/j1118xbdo0gz6T\niIhIF1tz7yArKwtRUVGIi4tDQECAzjJJSUl49dVX8cEHH6BHjx7mrhIREUmY3sTm7OwMuVzeaHRW\nXl7eaBR3t8zMTEybNg2rVq3C3LlzdZZJSkrCwoUL8Z///Adjx47VW+GioiK9ZYhxMhbjZTjGynCM\nlX7muA6pN7HZ2dkhICAAqampCA8P12xPSUlBREREk+/LyMjAk08+iZUrV2L+/Pk6y3z66aeIjo7G\n9u3bMWHCBIMqzIux+vGitXEYL8MxVoZjrCzHoKnI6OhoLFiwAIMHD0ZQUBB27twJhUKhuWa2du1a\n5ObmIikpCQCQlpaGJ598EvPmzcOkSZM0oz25XA5nZ2cAwOHDh7FgwQKsX78eQ4cO1ZRp164dF5AQ\nEVGLGZTYJk6ciOvXryM+Ph4KhQK+vr5ITEyEh4cHAEChUKC4uFhT/sCBA7h16xa2bt2KrVu3arZ7\nenoiPz8fAPDBBx9AqVQiNjYWsbGxmjLBwcE4evSoSQ6OiIisj0H3sZG4cArEOIyX4RgrwzFWlsNn\nRRIRkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRER\nkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaTYWroC1DylEsjPl+Pn\nn2W47z41AgKUsGF3hIioSUxsApefL0dYWCfU1spgZ6dGcnI1AgOVlq4WEZFgse8vcD//LENtrQwA\nUFsrw6VLMgvXiIhI2JjYBO6++9Sws1MDAOzs1LjvPrWFa0REJGycihS4gAAlkpOrcenS39fYiIio\naaJPbFJfXGFjAwQGKhEYaOmaEBGJg+gTGxdXEBFRQ6If23BxBRERNST6xMbFFURE1JDopyK5uIKI\niBoSfWLj4goiImpI9FORREREDYl+xEZEliX1W25IfJjYiKhVeMsNCQ37VUTUKrzlhoSGiY2IWoW3\n3JDQcCqSiFqFt9yQ0DCxEVGr8JYbEh
omNiIiapbYVr4ysRERUbPEtvLV4JybkJAAf39/uLu7IyQk\nBJmZmU2WTU9Px4wZM9CvXz90794dwcHB+Oijj3SWCwkJgbu7OwYPHowPPvigZUdBRERmI7aVrwYl\ntiNHjiA2NhbLli1DWloahgwZgilTpqCkpERn+ZycHAwYMAB79uxBZmYmnnnmGSxZsgSHDx/WlCku\nLsa0adMQFBSEtLQ0vPjii1i+fDmOHj1qmiMjIiKTENvKV1lVVZXeGo4ZMwZ+fn7YsmWLZltgYCAi\nIiKwZs0ag3YUGRkJlUqF3bt3AwBeeeUVHDt2DN9++62mzPPPP4/CwkIcP37c2OOgBoqKiuDt7W3p\naogG42U4xspwUoqVSgWcOyfXWvkq5GtseqtWW1uLvLw8hISEaG0PDQ1Fdna2wTu6efMmHB0dNf8+\ne/YsRo8erVXm4Ycfxrlz56BUCnfulojI2tSvfJ08uQ733y/spAYYkNgqKiqgVCrh6uqqtd3FxQVl\nZWUG7SQ5ORmnT59GZGSkZltZWZnOz6yrq0NFRYVBn0skBEolkJsrx6FDtsjNlUOlsnSNiKyb2VdF\nZmVlISoqCnFxcQgICGj15xUVFZmgVtLX2jjJZLYoK/NAcbEdevWqhZtbCVSqOhPVTnhaE6/y8l4I\nD++qWTH22We/wdW12IS1ExZrPAdbej5YY6yMZY7pWr2JzdnZGXK5vNHorLy8vNGI626ZmZmYNm0a\nVq1ahblz52q95urqqvMzbW1t4ezs3ORnSmXO2pxMMbefmytHeHjD5b0dBb28tzVaG6/8fFutFWO/\n/mqP4GBp/k5bEisx3AOlr44tOR+kdI1NbPT+vOzs7BAQEIDU1FSt7SkpKQgKCmryfRkZGZg6dSpi\nY2Mxf/78Rq8PGTKk0Wd+8803GDx4MORyuWG1J7MR2/JeSxLbirG2Vn8P1Lx5nRAW1gnnzgnv/NZX\nR54P4mJQvyk6Ohr79+/Hnj17cOHCBcTExEChUGiuma1duxbh4eGa8mlpaZg6dSqefvppTJo0CWVl\nZSgrK9O6dhYZGYlff/0VsbGxuHDhAvbs2YODBw9i8eLFJj5Eagk21oarf1ZiQkI1jh+vFuWzEs15\nnVAMSUFfHXk+iItB19gmTpyI69evIz4+HgqFAr6+vkhMTISHhwcAQKFQoLj472sKBw4cwK1bt7B1\n61Zs3bpVs93T0xP5+fkAgF69euGTTz7BypUr8cEHH8Dd3R1xcXF47LHHTHl81EJ8sK3hpPCsRHM+\nWaI+KdR/thCTgr468nwQF4PuYyNx4dy+cRgv4NAhW8yb10nz74SEakye3HhxREtiJYZ7oMxRR/6u\nLIfPiiQis46qhDKibW6BiFDqSKbBxEZEVjHVJrYH+VLLMbERkVWMWHQtEJHy8Vozgc10ExGZB1c2\nWg+O2IioETHcVG0sa5hupTuY2JogxRObyFBSvB5lDdOtdAcTWxOkeGITGYrXo0jMOAZpghielkBk\nDGOeLsLrUSRmHLE1QQxPSyBxstQ0tzGzEFK4HsXLCdaLia0JUjixSZgsNc1tzPSiFK5H8XKC9ZJ0\nYmtNj02MJ3b98RYWeuHmTTl7qAJlqetXYpyFaM05zOuE1kvSic3aemzWdrxiZakEI8ZZiNb8psWY\nyFuDU69/k3Ris7Yem7Udr1hZKsGIcRaiNb9pMSby1hBqx9YSCVfSic3aemzWdrxiJcYEYymt+U1b\nW5yF2rG1RMKVdGKzth5b/fFeuFAHHx9byR8vNU8KU1PWdg63hik7tqb87Vgi4Uo6sQm1x2auBqf+\neB0cLvLvQJFgp6aMIdRzWIhM2Qkw5W/HEjNJkk5slqIvcUmhwbF2YhgNCXVqiszDlJ0AU/52LDHq\nZmIzA32Jiw2O+ImhcyLGa65i6DBYA1P+diwx6mZiMwN9iUuMDQ5pE0PnRIzXp8TQYbAGYvztNMTE\nZg
b6EpfYfzRtSag9eDF0TsR4fUoMHQZrIMbfTkNMbGagL3GJ/UfTloTag2fnxDzE0GEg4WNiMwMm\nLtMRag+e37F5sMNApsDERoLGHrx1YYeBTEFyiU2o12SEwJSxaas4m7IHz99GywnlAdv8DskQkkts\nQr0mIwSmjE1bxfnuHnz9H8tsScPWVnWWYuMrlPNKKPUgYRP56dYY//J100wZG0vFub5hmzevE8LC\nOuHcObnB722rOremjkIllPNKKPUgYZNcYuOftG+aKWNjrjjXj8gOHbJFbq4cKpX2661p2NrqtyHF\nxlco55VQ6kHCJrmpyNZck5HiFFJDprxeZa7Va/qmmlqzmKStVtxJccGLUB6wzVWTZAhZVVWV+M86\nE8nNlcb8fVFRkWgfgnzokC3mzeuk+XdCQjUmT67T/FulAs6dk2s1bK3tfJg6Xuaoo1CI+bfV1hgr\ny5HciK01hHrPlDXRN9oxZjm4vhG4uVb6ccm64aQ+S2IsxsM0mNgakOIUkti05Z/ekMIKO7E3hFL4\nDkyJ8TANJrYGxDh/r6thE7O2/NMbUhihi70hNOV3IPYkD0jjNykETGwNiHEKSVfD5uBg6VoJg74R\nuBRG6GJvCE35HYg9yQPS+E0KARObyOlq2Pz9LVwpgdA3AhfKSr/WEGJDaMzIyZSzJGJP8oA4Z42E\niIlN5FrTsElh6qY5+kbg9a87OFwU7eo1ITaExoycTDlLIsQkbywxzhoJERObyOlq2H76ybD3SmHq\nRur0dT6E2BBaauQkxCRPlsHE1kJCGe0017Dpq6MUpm6kToydD0uNnISY5M1JKG2QEBkchoSEBPj7\n+8Pd3R0hISHIzMxssuzt27exaNEiBAcHw8XFBRMmTNBZLjExESNGjED37t3h4+ODqKgolJWVGX8U\nLaTv8U3NEcPzAPXVkY8nEj4xPp6rfuSUkFCN48erRTlyak3b0FbE0AZZikGJ7ciRI4iNjcWyZcuQ\nlpaGIUOGYMqUKSgpKdFZXqlUomPHjpg/fz7CwsJ0lsnKysKCBQvw1FNPISsrC/v378eFCxcQFRXV\n8qMxkhgeqNsa+urIBqhttKaOYux81I+cJk+uw/33Nx5FmOo7M+d3L4akIYY2yFIMmorctm0bZs6c\niVmzZgEA4uLicPLkSezatQtr1qxpVN7e3h7x8fEAgIKCAty4caNRmbNnz8LDwwMLFiwAAPTs2RPP\nPvssVqxY0eKDMVZrpuLMNd1iyukFUz7FQ6jEMFXXmjpK8bqRqb4zc373Ypim13d+t9VUpTH7aas6\n6U1stbXvjfGQAAAU+klEQVS1yMvLw+LFi7W2h4aGIjs7u8U7DgoKwvr165GcnIxx48ahoqICR44c\nwdixY5t9X1s2/M2x1EOAjSHFRvFuYmiAWlNHc3Y+LHWNxlTfmTm/ezGssNR3frdVp8+Y/bRVnfQm\ntoqKCiiVSri6umptd3FxwalTp1q84wcffBAJCQmIiorCrVu3UFdXh9DQUGzbtq3Z9wml4TdXg2PK\nk1UKIzJ9xNAACbWOlhrtmioe5oyrGDqF+s7vtur0GbOftqqTxVZF/vjjj4iJicHy5csRGhoKhUKB\n1atX44UXXsCOHTuafF9hYZ1WYC5cqIODw8UW18PBAZobmg1dJm9O3bv3gp2dveZk7d79TxQVFRv9\nOUVFRWaonfDce68tPvvMA1eu2KFXr1rce28Jiorq9L/xLuaMl6nqaGqFhV4tOpdaGytTxcPccTVF\n22DJ89BUbYkp96OrLNDO5HXSm9icnZ0hl8sbrVYsLy9vNIozxpYtWxAYGIjnnnsOANC/f3/Ex8dj\n/PjxeOWVV9CtWzed7/PxsdXqpfn42Ar+5lpjpnz69MFdPcV2sLEx7vis7c9l9OkDBAcDd06Qfxj9\n/raIV2vraA43b8qNPpdMFStTxUOIca1n6fPQFG2Jqfejq6w56E1s
dnZ2CAgIQGpqKsLDwzXbU1JS\nEBER0eId37p1C3K59kojGxsbyGQyqJpZ3iSGKYK7WepJDETNEeO5RIZrq7bEmP20VZ0MmoqMjo7G\nggULMHjwYAQFBWHnzp1QKBSIjIwEAKxduxa5ublISkrSvKewsBC3b99GZWUlqqurUVBQAADw8/MD\nAIwbNw5LlizBrl27EBoaitLSUqxcuRIBAQHw8PBosi5ibPjFsMCBrI8YzyUiQxiU2CZOnIjr168j\nPj4eCoUCvr6+SExM1CQghUKB4mLtOdUpU6bg6tWrmn+PHDkSMpkMlZWVAIAZM2aguroaCQkJWLNm\nDe69916MGDECr776arN1Uakgurvrhbp4gAzHpzxoYzxIyGRVVVWiamW/+04uuPuU9FGpgHPn5FpT\nPuZsBCwxty/mhs6QeOXmCv9+ubZQHyvGQz9LX2OzZqJ7VqQYp/GsYcpHDDdKt4ZQppOF0oEQSjyI\ndBFdYpPiNJ5QGqvWkHpDJ5TpZKF0IIQSD7GTwrkvRKJLbEJdudWaH6hQGqvWkHpDJ5QVhELpQAgl\nHmInlHNfaglWdIlNqMFuzQ9UKI1Va0i9oRPKdLJQOhBCiYfYGXvumysBmTLBCiFJii6xCZUQH6jc\nltjQtQ2pdyCsjbHnvrlGeKbsXAthFMrEZiJCfKAytZwQep26sAMhLcae++aa3TFl51oIM1BWldjM\n2VgJ8YHKQiXUpNGQEHqdJH3Gnvvmmt0xZedaCDNQVpXYzNlYSSE5tVXCEUPSEEKvk+hu5prdMWX7\nJYQZKKtKbGysmmeqhKMvQYrhexBCr5PobmLoQAuhjlaV2NhYNc9UCUdfghTD9yCEXicRtYxVJTYp\nNFbmnC40VcLRlyDF8D0IoddJ4lR/jhYWeuHmTbkgryFLnVUlNik0Vua8PmWqhKMvQUrheyBqihiu\nIUudVSU2KTDn9SlTJRwxjMiIzEUM15CljolNZMRwfYojMrJmYjhHpY6JTWQ4GiISlruvew8adOcc\nvXChDj4+toI8R8VwL2lrMLGJDEdDRK1nyoa9qWtqDg4XBfv32KR+HZCJjUhApN6TFgpTNuxivKYm\nxjobg4mNSECk3pMWClM27GK8pibGOhuDiY0sjqOUv0m9Jy0UpmzYjb3uLYTfu9Sv1TOxkcW11aO8\nxEDqPWmhMGXDbux1byGMyqV+rZ6JjSyurR7lJQZS70kLhSUbdo7KzY+JjSyurR7lJQZS70kTR+Vt\ngYmNLK6tHuVFJAQclZsfExtZHB/lRdaEo3LzY2IjyWCDQUQAILI1Y0RERM1jYiMiIknhVCQRERlM\nDPeLMrERERlBDA27OYnhflEmNiIiI4ihYTcnMdwvakX9jNZRKoHcXDkOHbJFbq4cKpWla9S2rP34\nierpatitSf39ogAEe78oR2wGsvZemrUfP1E9a38QgBjuF2ViM5AYht/mZO3HT1RPDA27OYnhflEm\nNgNZey/N2o+fqJ4YGnZrx8RmIGvvpVn78ROReDCxGcjae2nWfvxEJB5cFUlERJJicGJLSEiAv78/\n3N3dERISgszMzCbL3r59G4sWLUJwcDBcXFwwYcIEneVqa2vx2muvwd/fH25ubvDz88N7771n/FEQ\nEW/JIPr/DJqKPHLkCGJjY7F582YEBQXh/fffx5QpU5CdnQ0PD49G5ZVKJTp27Ij58+fj66+/xo0b\nN3R+bmRkJEpLS/H222/jvvvuQ3l5OW7dutW6IyKyUrwlg+gOgxLbtm3bMHPmTMyaNQsAEBcXh5Mn\nT2LXrl1Ys2ZNo/L29vaIj48HABQUFOhMbN988w3S0tKQl5cHJycnAICnp2eLD4TI2vGWDKI79E5F\n1tbWIi8vDyEhIVrbQ0NDkZ2d3eIdf/nll7j//vvxzjvvYMCAAQgMDERMTAyqq6tb/JlE1kwMT4Qg\nagt6R2wVFRVQKpVwdXXV2u7i
4oJTp061eMeXL19GZmYm2rVrh7179+LGjRt4+eWXoVAo8OGHH7b4\nc4msFW/JILrDYsv9VSoVbGxssHPnTnTu3BkAsGnTJkyaNAm//fYbunbtqvN9RUVFbVlN0RJSnGQy\nW5SVeaC42A69etXCza0EKlWdpaulRUjxag0HB8Df/87///STefYhlVi1BSHFSqjnobe3t8k/U29i\nc3Z2hlwuR1lZmdb28vLyRqM4Y7i5uaFbt26apAYAffv2hVqtxtWrV5tMbOYIgtQUFRUJKk65uXKE\nhzdc1NBRUIsahBYvIWOsDCe0WAn9PDQlvdfY7OzsEBAQgNTUVK3tKSkpCAoKavGOg4KCUFpaij//\n/FOz7eLFi5DJZFxEIjHW/jR0IiGwpvPQoPvYoqOjsX//fuzZswcXLlxATEwMFAoFIiMjAQBr165F\neHi41nsKCwvx/fffo7KyEtXV1SgoKEBBQYHm9cmTJ8PJyQnR0dH48ccfkZWVhdjYWERERMDZ2dmE\nh0iWxkUNRJZnTeehQdfYJk6ciOvXryM+Ph4KhQK+vr5ITEzU3MOmUChQXFys9Z4pU6bg6tWrmn+P\nHDkSMpkMlZWVAIBOnTohKSkJy5cvx8MPPwxHR0c8+uijeOWVV0x1bCQQXNRAZHnWdB7KqqqqpJu2\nrZTQ5vaFjvEyHGNlOMbKcvisSCIikhQ+3Z/IgpTKO4/C+vnnv6eHbNjdJGoVJjYiC+LzHYlMj31D\nIguypiXYRG2FiY3IgqxpCTZRW+FUJJEFWdMSbKK2wsRGZEE2NkBgoJJ/XobIhJjYiIgEiqtmW4aJ\njYhIoLhqtmWY+4mIzEipvPNk/UOHbJGbK4dKZfh7uWq2ZThiIyIyo9aMuupXzda/l6tmDcPERkRk\nRrpGXYYuFuKq2ZZhYiMiMqPWjLq4arZlmNiIiMyIo662x8RGRGRGHHW1Pa6KJCIiSWFiIyIiSWFi\nIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSeGzIolw\n549B5ufL8fPPfz+o1obdPiJRYmIjQuv+GCQRCQv7pETQ/ccgiUicmNiI8PcfgwRg9B+DJCJh4VQk\nEfjHIImkhImNCPxjkERSwqlIIiKSFCY2IiKSFCY2IiKSFCY2IiKSFIMTW0JCAvz9/eHu7o6QkBBk\nZmY2Wfb27dtYtGgRgoOD4eLiggkTJjT72ZmZmejatSuGDRtmeM2JiIh0MCixHTlyBLGxsVi2bBnS\n0tIwZMgQTJkyBSUlJTrLK5VKdOzYEfPnz0dYWFizn11VVYWFCxciJCTE6MoTERHdzaDEtm3bNsyc\nOROzZs2Ct7c34uLi4Obmhl27duksb29vj/j4eMyePRvdunVr9rMXL16MGTNm4IEHHjC+9kRERHfR\nm9hqa2uRl5fXaEQVGhqK7OzsVu08ISEBv/32G15++eVWfQ4REVE9vYmtoqICSqUSrq6uWttdXFxQ\nVlbW4h2fP38emzZtwnvvvQeZjM/lIyIi07DIqsiamho888wzWLduHTw9PQEAajWfzUdERK2n95Fa\nzs7OkMvljUZn5eXljUZxhiotLUVhYSGio6OxaNEiAIBKpYJarYaLiwsSExObXExSVFTUon1aG8bJ\nOIyX4RgrwzFW+nl7e5v8M/UmNjs7OwQEBCA1NRXh4eGa7SkpKYiIiGjRTrt3797odoGEhASkpqZi\n3759mlGcLuYIgtQUFRUxTkZgvAzHWBmOsbIcgx6CHB0djQULFmDw4MEICgrCzp07oVAoEBkZCQBY\nu3YtcnNzkZSUpHlPYWEhbt++jcrKSlRXV6OgoAAA4OfnB1tbW/Tr109rH127dkW7du3g4+NjqmMj\nIiIrZFBimzhxIq5fv474+HgoFAr4+voiMTERHh4eAACFQoHi4mKt90yZMgVXr17V/HvkyJGQyW
So\nrKw0YfWJiIi0yaqqqrhqQ2I4BWIcxstwjJXhGCvL4bMiiYhIUpjYiIhIUpjYiIhIUpjYiIhIUpjY\niIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhI\nUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUmRVVVVqS1eCiIjIVDhiIyIiSWFiIyIiSWFiIyIiSWFi\nIyIiSWFiIyIiSRFFYktISIC/vz/c3d0REhKCzMxMS1fJ4jZv3ozQ0FD07NkTXl5eePLJJ/HDDz80\nKvfGG2/A19cX3bp1w2OPPYYff/zRArUVjs2bN8PJyQnLly/X2s44/U2hUGDhwoXw8vKCu7s7hg4d\nijNnzmiVYbwAlUqF9evXa9omf39/rF+/HiqVSqucNcbqzJkzmD59Ovr37w8nJyccOHCgURl9camp\nqcHLL7+MPn36wMPDA9OnT8e1a9cM2r/gE9uRI0cQGxuLZcuWIS0tDUOGDMGUKVNQUlJi6apZ1Jkz\nZ/Dss8/i66+/xtGjR2Fra4uIiAhUVVVpyrz11lvYvn07Nm3ahJSUFLi4uGDixImorq62YM0t5+zZ\ns9i9ezcGDhyotZ1x+tuNGzcQFhYGmUyGQ4cOIScnBxs3boSLi4umDON1x5YtW7Br1y5s2rQJZ8+e\nxcaNG7Fz505s3rxZU8ZaY1VdXY0BAwZgw4YNsLe3b/S6IXFZsWIFjh07hl27duGrr77CzZs3MW3a\nNKjV+u9QE/x9bGPGjIGfnx+2bNmi2RYYGIiIiAisWbPGgjUTlurqavTs2RP79+9HWFgYAKBfv36Y\nP38+XnzxRQDAX3/9BW9vb6xfvx5z5syxZHXb3I0bNxASEoKtW7diw4YN6N+/P+Li4gAwTg3961//\nQmZmJr766qsmyzBed0ybNg3Ozs7Ytm2bZtvChQtx/fp1HDx4EABjBQA9evTApk2bMH36dM02fXH5\n/fff4eXlhe3bt2PSpEkAgJKSEvj5+eHw4cMYPXp0s/sU9IittrYWeXl5CAkJ0doeGhqK7Oxsy1RK\noG7evAmVSgVHR0cAwOXLl6FQKLR+AB06dMCwYcOsMnZLlizBxIkTMXz4cK3tjJO2L7/8EoGBgXj6\n6afh7e2NESNG4P3339e8znj9bejQoUhLS0NRUREA4Mcff0RaWpqmY8lY6WZIXM6dO4e6ujqtMh4e\nHvDx8TEodramr7bpVFRUQKlUwtXVVWu7i4sLTp06ZaFaCdOKFSvg7++PIUOGAADKysogk8m0ppCA\nO7ErLS21RBUtZvfu3bh8+TJ27tzZ6DXGSVt9nBYtWoQXX3wRBQUFWL58OWQyGebNm8d4NbBkyRL8\n8ccfeOihhyCXy6FUKrF06VJERkYC4G+rKYbEpby8HHK5HF26dGlUpqysTO8+BJ3YyDArV65ETk4O\nkpOTIZPJLF0dQbl48SLWrVuH48ePw8ZG0BMUgqBSqRAYGKiZ5vfz88NPP/2EhIQEzJs3z8K1E5bD\nhw/j4MGD2LVrF3x8fFBQUICYmBj06tULM2fOtHT1rJqgz3RnZ2fI5fJGGbq8vLzRKM5axcbG4tNP\nP8XRo0fRs2dPzXZXV1eo1WqUl5drlbe22OXk5KCyshIPPfQQunbtiq5duyIjIwMJCQlwcXFBly5d\nGKcG3Nzc0LdvX61tffv2xdWrVwHwd9XQK6+8gueffx4RERHw9fXF1KlTER0drVkPwFjpZkhcXF1d\noVQqUVlZ2WSZ5gg6sdnZ2SEgIACpqala21NSUhAUFGSZSglITEyMJqn16dNH67XevXvDzc0NKSkp\nmm1//fUXMjMzrSp2jz32GM6cOYP09HTNf4MHD8bkyZORnp4OLy8vxqmBoKAgzTWjekVFRfD09ATA\n31VDf/75Z6NZABsbG81yf8ZKN0PiEhAQAFtbW60yJSUlKC
wsNCh28hUrVrxq8pqb0D333IM33ngD\nbm5u6NixI+Li4pCVlYV33nkHDg4Olq6exSxbtgwff/wxPvzwQ3h4eKC6ulqzVLZdu3YAAKVSiS1b\ntsDLywtKpRKrVq1CWVkZtmzZoikjde3bt9eM1Or/S0xMhKenp2aVFuP0N09PT8TFxcHGxgbdunXD\nqVOnsH79eixduhSDBw8GwHjVKywsxMcffwwvLy/Y2dnh9OnTWL9+PSZPnqxZ9GCtsaqurkZhYSEU\nCgX27t2LAQMGwMHBAbW1tXBwcNAbl/bt26O0tBQJCQkYMGAAbty4gZdeegmOjo549dVX9V5yEfxy\nfwDYtWsX/v3vf0OhUMDX1xdvvPGGVfd4AMDJyUnnlxsTE4OYmBjNvzdu3IgPP/wQVVVVCAwMxJtv\nvol+/fq1ZVUFZ8KECfD19dUs9wcYp4ZOnDiBtWvX4qeffkKPHj0QFRWFZ599VqsM43Wn8X7ttdfw\nxRdf4LfffoObmxsmTZqE5cuXayUta4xVeno6JkyY0KiNmj59Ot59910A+uNSW1uL1atX49ChQ/jr\nr78watQovPnmm+jevbve/YsisRERERlK0NfYiIiIjMXERkREksLERkREksLERkREksLERkREksLE\nRkREksLERkREksLERkREksLERkREkvL/ALrDGlp1jC3nAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('fivethirtyeight')\n",
"\n",
"plt.scatter(range(n_estimators), errors)\n",
"plt.xlim([0, n_estimators])\n",
"plt.title('OOB error of each tree')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Estimate $\\alpha$"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"alpha = (1 - errors) / (1 - errors).sum()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"weighted_sum_1 = ((y_pred_df) * alpha).sum(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"438 0.019993\n",
"2674 0.050009\n",
"1345 0.350236\n",
"1957 0.170230\n",
"2148 0.030047\n",
"3106 0.040100\n",
"1786 0.219819\n",
"321 0.059707\n",
"3082 0.100178\n",
"2240 0.050128\n",
"1910 0.180194\n",
"2124 0.190111\n",
"2351 0.049877\n",
"1736 0.950014\n",
"879 0.039378\n",
"785 0.219632\n",
"2684 0.010104\n",
"787 0.710568\n",
"170 0.220390\n",
"1720 0.020166\n",
"dtype: float64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weighted_sum_1.head(20)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.52674897119341557, 0.8954545454545455)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_pred = (weighted_sum_1 >= 0.5).astype(int)\n",
"\n",
"metrics.f1_score(y_test, y_pred), metrics.accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using Weighted voting with sklearn"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.53600000000000003, 0.89454545454545453)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=100, bootstrap=True,\n",
" random_state=42, n_jobs=-1, oob_score=True)\n",
"clf.fit(X_train, y_train)\n",
"y_pred = clf.predict(X_test)\n",
"metrics.f1_score(y_test, y_pred), metrics.accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"errors = np.zeros(clf.n_estimators)\n",
"y_pred_all_ = np.zeros((X_test.shape[0], clf.n_estimators))\n",
"\n",
"for i in range(clf.n_estimators):\n",
" oob_sample = ~clf.estimators_samples_[i]\n",
" y_pred_ = clf.estimators_[i].predict(X_train.values[oob_sample])\n",
" errors[i] = 1 - metrics.accuracy_score(y_train.values[oob_sample], y_pred_)  # OOB error = 1 - accuracy\n",
" y_pred_all_[:, i] = clf.estimators_[i].predict(X_test)\n",
" \n",
"alpha = (1 - errors) / (1 - errors).sum()\n",
"y_pred = (np.sum(y_pred_all_ * alpha, axis=1) >= 0.5).astype(int)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.55335968379446643, 0.89727272727272722)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"metrics.f1_score(y_test, y_pred), metrics.accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part 3: Combination of classifiers - Stacking"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The stacking method consists of combining the different base classifiers by learning a \n",
"second-level algorithm on top of them. In this framework, once the base \n",
"classifiers are constructed using the training set $\\mathcal{S}$, a new set is constructed \n",
"where the outputs of the base classifiers become the features, while the \n",
"class labels are kept unchanged.\n",
"\n",
"Even though there is no restriction on which algorithm can be used as a second level learner, \n",
"it is common to use a linear model, such as \n",
"$$\n",
" f_s(\\mathcal{S},\\mathcal{M},\\beta) =\n",
" g \\left( \\sum_{j=1}^T \\beta_j M_j(\\mathcal{S}) \\right),\n",
"$$\n",
"where $\\beta=\\{\\beta_j\\}_{j=1}^T$, and $g(\\cdot)$ is the sign function \n",
"$g(z)=\\operatorname{sign}(z)$ in the case of a linear regression, or the sigmoid function, defined \n",
"as $g(z)=1/(1+e^{-z})$, in the case of a logistic regression."
]
},
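{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal, self-contained sketch of this idea (on a synthetic dataset, not the churn data; all names here are illustrative):\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"X, y = make_classification(n_samples=500, random_state=0)\n",
"\n",
"# Train T base classifiers on bootstrap samples of the training set\n",
"T = 10\n",
"rng = np.random.RandomState(42)\n",
"base = []\n",
"for _ in range(T):\n",
"    idx = rng.randint(0, X.shape[0], X.shape[0])\n",
"    base.append(DecisionTreeClassifier().fit(X[idx], y[idx]))\n",
"\n",
"# Second-level features: one column per base classifier's predictions\n",
"Z = np.column_stack([m.predict(X) for m in base])\n",
"\n",
"# The second-level (meta) learner estimates the weights beta_j\n",
"meta = LogisticRegression().fit(Z, y)\n",
"print(meta.coef_.shape)  # (1, 10): one weight per base classifier\n",
"```"
]
},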
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's first build a new training set consisting of the outputs of every base classifier"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X_train_2 = pd.DataFrame(index=X_train.index, columns=list(range(n_estimators)))\n",
"\n",
"for i in range(n_estimators):\n",
" X_train_2[i] = trees[i].predict(X_train)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 \\\n",
"2360 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n",
"1412 0 0 1 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 \n",
"1404 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 1 0 0 \n",
"626 1 1 0 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 \n",
"347 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n",
"\n",
" 97 98 99 \n",
"2360 0 0 0 \n",
"1412 0 0 0 \n",
"1404 0 0 0 \n",
"626 1 1 1 \n",
"347 0 0 0 \n",
"\n",
"[5 rows x 100 columns]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train_2.head()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegressionCV"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,\n",
" fit_intercept=True, intercept_scaling=1.0, max_iter=100,\n",
" multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,\n",
" refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lr = LogisticRegressionCV()\n",
"lr.fit(X_train_2, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.10093102, 0.1042197 , 0.09431205, 0.09652843, 0.09709429,\n",
" 0.09902616, 0.11100235, 0.09662288, 0.09340919, 0.09112994,\n",
" 0.10012606, 0.09821902, 0.09383543, 0.09553507, 0.09147579,\n",
" 0.09649564, 0.08965686, 0.09196857, 0.09684012, 0.09020758,\n",
" 0.09839592, 0.09513808, 0.1044603 , 0.10028703, 0.09671603,\n",
" 0.09725639, 0.10912207, 0.10590827, 0.10275491, 0.10275279,\n",
" 0.10607316, 0.09803225, 0.10319411, 0.0926599 , 0.09702325,\n",
" 0.09524124, 0.088848 , 0.09960894, 0.09053403, 0.09010282,\n",
" 0.0990557 , 0.0987997 , 0.10538386, 0.09584352, 0.09633964,\n",
" 0.09001206, 0.09181887, 0.08995095, 0.10130986, 0.10827168,\n",
" 0.10064992, 0.09771002, 0.08922346, 0.10078438, 0.10173442,\n",
" 0.1052274 , 0.09743252, 0.09597317, 0.08932798, 0.10033609,\n",
" 0.10346122, 0.10145004, 0.09017084, 0.10348697, 0.09335995,\n",
" 0.09795824, 0.10166729, 0.09306547, 0.09538575, 0.10997592,\n",
" 0.09352845, 0.09860336, 0.1059772 , 0.09583408, 0.09823145,\n",
" 0.09995048, 0.10224689, 0.10065135, 0.10208938, 0.11257989,\n",
" 0.09956423, 0.11515946, 0.09798322, 0.10092449, 0.10150098,\n",
" 0.10275192, 0.09180693, 0.0990442 , 0.10016612, 0.10145948,\n",
" 0.09848122, 0.10322931, 0.09913907, 0.08925477, 0.09950337,\n",
" 0.10277594, 0.09249331, 0.0954106 , 0.1053263 , 0.09849884]])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lr.coef_"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Evaluate the stacked model on the base classifiers' test-set predictions\n",
"y_pred = lr.predict(y_pred_df)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.53658536585365846, 0.89636363636363636)"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"metrics.f1_score(y_test, y_pred), metrics.accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using sklearn"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.56250000000000011, 0.89818181818181819)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_pred_all_ = np.zeros((X_test.shape[0], clf.n_estimators))\n",
"X_train_3 = np.zeros((X_train.shape[0], clf.n_estimators))\n",
"\n",
"for i in range(clf.n_estimators):\n",
"\n",
" X_train_3[:, i] = clf.estimators_[i].predict(X_train)\n",
" y_pred_all_[:, i] = clf.estimators_[i].predict(X_test)\n",
" \n",
"lr = LogisticRegressionCV()\n",
"lr.fit(X_train_3, y_train)\n",
"\n",
"y_pred = lr.predict(y_pred_all_)\n",
"metrics.f1_score(y_test, y_pred), metrics.accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare with using only a single decision tree:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.44510385756676557, 0.82999999999999996)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dt = DecisionTreeClassifier()\n",
"dt.fit(X_train, y_train)\n",
"y_pred = dt.predict(X_test)\n",
"metrics.f1_score(y_test, y_pred), metrics.accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part 4: Boosting\n",
"\n",
"While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight (some boosting algorithms actually decrease the weight of repeatedly misclassified examples, e.g., boost by majority and BrownBoost). Thus, future weak learners focus more on the examples that previous weak learners misclassified. (Wikipedia)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"![](images/OurMethodv81.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adaboost\n",
"\n",
"AdaBoost (adaptive boosting) is an ensemble learning algorithm that can be used for classification or regression. Although AdaBoost is more resistant to overfitting than many machine learning algorithms, it is often sensitive to noisy data and outliers.\n",
"\n",
"AdaBoost is called adaptive because it uses multiple iterations to generate a single composite strong learner. AdaBoost creates the strong learner (a classifier that is well-correlated with the true classifier) by iteratively adding weak learners (classifiers that are only slightly correlated with the true classifier). During each round of training, a new weak learner is added to the ensemble and the weighting vector is adjusted to focus on examples that were misclassified in previous rounds. The result is a classifier with higher accuracy than any of the weak learners alone."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Algorithm:\n",
"\n",
"* Initialize all weights ($w_i$) to 1 / n_samples\n",
"* Train a classifier $h_t$ using weights\n",
"* Estimate training error $e_t$\n",
"* Set $\\alpha_t = \\log\\left(\\frac{1-e_t}{e_t}\\right)$\n",
"* Update weights \n",
"$$w_i^{t+1} = w_i^{t}e^{\\left(\\alpha_t \\mathbf{I}\\left(y_i \\ne h_t(x_t)\\right)\\right)}$$\n",
"* Repeat while $e_t < 0.5$ and $t < T$"
]
},
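{
"cell_type": "markdown",
"metadata": {},
"source": [
"The steps above can be sketched as follows: a minimal binary AdaBoost on a synthetic dataset, with decision stumps as the weak learners (an illustrative sketch, assuming labels in {0, 1}; not the exact implementation used with the churn data):\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"X, y = make_classification(n_samples=500, random_state=0)\n",
"n = X.shape[0]\n",
"\n",
"weights = np.ones(n) / n  # initialize all weights to 1 / n_samples\n",
"learners, alphas = [], []\n",
"\n",
"for t in range(10):\n",
"    h = DecisionTreeClassifier(max_depth=1)  # decision stump as weak learner\n",
"    h.fit(X, y, sample_weight=weights)\n",
"    miss = (h.predict(X) != y).astype(float)\n",
"    e = weights @ miss  # weighted training error\n",
"    if e >= 0.5:\n",
"        break\n",
"    alpha = np.log((1 - e) / e)\n",
"    weights = weights * np.exp(alpha * miss)  # up-weight misclassified examples\n",
"    weights = weights / weights.sum()\n",
"    learners.append(h)\n",
"    alphas.append(alpha)\n",
"\n",
"# Weighted majority vote: class 1 wins when it collects over half the alpha mass\n",
"votes = sum(a * (h.predict(X) == 1) for h, a in zip(learners, alphas))\n",
"y_pred = (votes >= 0.5 * sum(alphas)).astype(int)\n",
"print(accuracy_score(y, y_pred))\n",
"```"
]
},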
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.51051051051051044, 0.85181818181818181)"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"metrics.f1_score(y_test.values, y_pred), metrics.accuracy_score(y_test.values, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using sklearn"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.ensemble import AdaBoostClassifier"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,\n",
" learning_rate=1.0, n_estimators=50, random_state=None)"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf = AdaBoostClassifier()\n",
"clf"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.29107981220657275, 0.86272727272727268)"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf.fit(X_train, y_train)\n",
"y_pred = clf.predict(X_test)\n",
"metrics.f1_score(y_test.values, y_pred), metrics.accuracy_score(y_test.values, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gradient Boosting\n",
"\n",
"Gradient boosting builds the ensemble in stages: each new tree is fit to the negative gradient of the loss with respect to the current ensemble's predictions, so every stage corrects the errors left by the previous ones."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"GradientBoostingClassifier(init=None, learning_rate=0.1, loss='deviance',\n",
" max_depth=3, max_features=None, max_leaf_nodes=None,\n",
" min_samples_leaf=1, min_samples_split=2,\n",
" min_weight_fraction_leaf=0.0, n_estimators=100,\n",
" presort='auto', random_state=None, subsample=1.0, verbose=0,\n",
" warm_start=False)"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.ensemble import GradientBoostingClassifier\n",
"\n",
"clf = GradientBoostingClassifier()\n",
"clf"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.52892561983471076, 0.89636363636363636)"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clf.fit(X_train, y_train)\n",
"y_pred = clf.predict(X_test)\n",
"metrics.f1_score(y_test.values, y_pred), metrics.accuracy_score(y_test.values, y_pred)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}