{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 11 - Ensemble Methods - Continuation\n", "\n", "\n", "by [Alejandro Correa Bahnsen](albahnsen.com/)\n", "\n", "version 0.2, May 2016\n", "\n", "## Part of the class [Machine Learning for Risk Management](https://github.com/albahnsen/ML_RiskManagement)\n", "\n", "\n", "This notebook is licensed under a [Creative Commons Attribution-ShareAlike 3.0 Unported License](http://creativecommons.org/licenses/by-sa/3.0/deed.en_US). Special thanks goes to [Kevin Markham](https://github.com/justmarkham)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why are we learning about ensembling?\n", "\n", "- Very popular method for improving the predictive performance of machine learning models\n", "- Provides a foundation for understanding more sophisticated models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 1: Combination of classifiers - Majority Voting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " The most typical form of an ensemble is made by combining $T$ different base classifiers.\n", " Each base classifier $M(\\mathcal{S}_j)$ is trained by applying algorithm $M$ to a random subset \n", " $\\mathcal{S}_j$ of the training set $\\mathcal{S}$. 
\n", " For simplicity we define $M_j \\equiv M(\\mathcal{S}_j)$ for $j=1,\\dots,T$, and \n", " $\\mathcal{M}=\\{M_j\\}_{j=1}^{T}$ a set of base classifiers.\n", " Then, these models are combined using majority voting to create the ensemble $H$ as follows\n", " $$\n", " f_{mv}(\\mathcal{S},\\mathcal{M}) = max_{c \\in \\{0,1\\}} \\sum_{j=1}^T \n", " \\mathbf{1}_c(M_j(\\mathcal{S})).\n", " $$\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# read in and prepare the chrun data\n", "# Download the dataset\n", "import pandas as pd\n", "import numpy as np\n", "\n", "data = pd.read_csv('../datasets/churn.csv')\n", "\n", "# Create X and y\n", "\n", "# Select only the numeric features\n", "X = data.iloc[:, [1,2,6,7,8,9,10]].astype(np.float)\n", "# Convert bools to floats\n", "X = X.join((data.iloc[:, [4,5]] == 'no').astype(np.float))\n", "\n", "y = (data.iloc[:, -1] == 'True.').astype(np.int)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Account LengthArea CodeVMail MessageDay MinsDay CallsDay ChargeEve MinsInt'l PlanVMail Plan
0128.0415.025.0265.1110.045.07197.41.00.0
1107.0415.026.0161.6123.027.47195.51.00.0
2137.0415.00.0243.4114.041.38121.21.01.0
384.0408.00.0299.471.050.9061.90.01.0
475.0415.00.0166.7113.028.34148.30.01.0
\n", "
" ], "text/plain": [ " Account Length Area Code VMail Message Day Mins Day Calls Day Charge \\\n", "0 128.0 415.0 25.0 265.1 110.0 45.07 \n", "1 107.0 415.0 26.0 161.6 123.0 27.47 \n", "2 137.0 415.0 0.0 243.4 114.0 41.38 \n", "3 84.0 408.0 0.0 299.4 71.0 50.90 \n", "4 75.0 415.0 0.0 166.7 113.0 28.34 \n", "\n", " Eve Mins Int'l Plan VMail Plan \n", "0 197.4 1.0 0.0 \n", "1 195.5 1.0 0.0 \n", "2 121.2 1.0 1.0 \n", "3 61.9 0.0 1.0 \n", "4 148.3 0.0 1.0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countpercentage
028500.855086
14830.144914
\n", "
" ], "text/plain": [ " count percentage\n", "0 2850 0.855086\n", "1 483 0.144914" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.value_counts().to_frame('count').assign(percentage = lambda x: x/x.sum())" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.cross_validation import train_test_split\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create 100 decision trees" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "n_estimators = 100\n", "# set a seed for reproducibility\n", "np.random.seed(123)\n", "\n", "n_samples = X_train.shape[0]\n", "\n", "# create bootstrap samples (will be used to select rows from the DataFrame)\n", "samples = [np.random.choice(a=n_samples, size=n_samples, replace=True) for _ in range(n_estimators)]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.tree import DecisionTreeClassifier\n", "\n", "np.random.seed(123) \n", "seeds = np.random.randint(1, 10000, size=n_estimators)\n", "\n", "trees = {}\n", "for i in range(n_estimators):\n", " trees[i] = DecisionTreeClassifier(max_features=\"sqrt\", max_depth=None, random_state=seeds[i])\n", " trees[i].fit(X_train.iloc[samples[i]], y_train.iloc[samples[i]])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456789...90919293949596979899
4380000000000...1000000000
26740000000000...0000000000
13450001000001...0001100110
19570000000001...1010000010
21480000000000...0000010010
\n", "

5 rows × 100 columns

\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 \\\n", "438 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 \n", "2674 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "1345 0 0 0 1 0 0 0 0 0 1 ... 0 0 0 1 1 0 0 \n", "1957 0 0 0 0 0 0 0 0 0 1 ... 1 0 1 0 0 0 0 \n", "2148 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 \n", "\n", " 97 98 99 \n", "438 0 0 0 \n", "2674 0 0 0 \n", "1345 1 1 0 \n", "1957 0 1 0 \n", "2148 0 1 0 \n", "\n", "[5 rows x 100 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Predict \n", "y_pred_df = pd.DataFrame(index=X_test.index, columns=list(range(n_estimators)))\n", "for i in range(n_estimators):\n", " y_pred_df.ix[:, i] = trees[i].predict(X_test)\n", "\n", "y_pred_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Predict using majority voting" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "438 2\n", "2674 5\n", "1345 35\n", "1957 17\n", "2148 3\n", "3106 4\n", "1786 22\n", "321 6\n", "3082 10\n", "2240 5\n", "dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred_df.sum(axis=1)[:10]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.52459016393442637" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred = (y_pred_df.sum(axis=1) >= (n_estimators / 2)).astype(np.int)\n", "\n", "from sklearn import metrics\n", "metrics.f1_score(y_pred, y_test)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.89454545454545453" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics.accuracy_score(y_pred, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using majority voting 
with sklearn" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.ensemble import BaggingClassifier\n", "clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=100, bootstrap=True,\n", " random_state=42, n_jobs=-1, oob_score=True)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.53600000000000003, 0.89454545454545453)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf.fit(X_train, y_train)\n", "y_pred = clf.predict(X_test)\n", "metrics.f1_score(y_pred, y_test), metrics.accuracy_score(y_pred, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 2: Combination of classifiers - Weighted Voting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The majority voting approach gives the same weight to each classfier regardless of the performance of each one. 
Why not take into account the out-of-bag (oob) performance of each classifier?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the traditional weighted voting approach, a \n", "similar comparison of the votes of the base classifiers is made, but a weight $\\alpha_j$ is given \n", "to each classifier $M_j$ during the voting phase\n", "$$\n", " f_{wv}(\\mathcal{S},\\mathcal{M}, \\alpha)\n", " =\\max_{c \\in \\{0,1\\}} \\sum_{j=1}^T \\alpha_j \\mathbf{1}_c(M_j(\\mathcal{S})),\n", "$$\n", "where $\\alpha=\\{\\alpha_j\\}_{j=1}^T$.\n", "The calculation of $\\alpha_j$ is related to the performance of each classifier $M_j$.\n", "It is usually defined in terms of the misclassification error $\\epsilon$ of the base \n", "classifier $M_j$ on the out-of-bag set $\\mathcal{S}_j^{oob}=\\mathcal{S}-\\mathcal{S}_j$, normalized so that the weights sum to one\n", "\\begin{equation}\n", " \\alpha_j=\\frac{1-\\epsilon(M_j(\\mathcal{S}_j^{oob}))}{\\sum_{j_1=1}^T \n", " \\left(1-\\epsilon(M_{j_1}(\\mathcal{S}_{j_1}^{oob}))\\right)}.\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select each out-of-bag (oob) sample" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [ "samples_oob = []\n", "# show the \"out-of-bag\" observations for each sample\n", "for sample in samples:\n", " samples_oob.append(sorted(set(range(n_samples)) - set(sample)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Estimate the oob error of each classifier" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "errors = np.zeros(n_estimators)\n", "\n", "for i in range(n_estimators):\n", " y_pred_ = trees[i].predict(X_train.iloc[samples_oob[i]])\n", " errors[i] = 1 - metrics.accuracy_score(y_train.iloc[samples_oob[i]], y_pred_)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": 
"execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAbYAAAEjCAYAAABeoiSAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtYVNX+P/D3MOAFjUDkooh6EkRUBKMMxQuSiX7LwLyl\neaMML2RZmoiXUx6tFEM7WeoptNS8FGqRWZjHQAG5WAhxfAqxFBNjIBAzMoGZ+f3hj4mRgZmBGWbv\nPe/X8/Q8uWfN7LU/M3t91lp77Y2sqqpKDSIiIomwsXQFiIiITImJjYiIJIWJjYiIJIWJjYiIJIWJ\njYiIJIWJjYiIJIWJjUiEysrKsHDhQgwcOBDOzs7o0qULfvnlF0tXy2BXrlyBk5MTJkyYYOmqkATZ\nWroCZD4FBQV47733cObMGZSWlkImk6FHjx4YOXIkFi5ciH/84x/Nvj89PR27d+9GdnY2ysrK0K5d\nO/Tq1QtjxozBggUL4ObmpvM9uhqrjh07olevXhg3bhxeeOEFODo6muw4rdHChQuRkpKC8ePH46mn\nnoKNjQ3uvfdeS1fLYpycnNCzZ0/k5+dbuiokADLeoC1N69evx+bNmyGXyzFq1CgMGDAAKpUKubm5\nyMzMhFwux4YNG/DMM880em9tbS2WLFmC/fv3o0OHDnj44YfRt29f/PXXX8jMzEReXh46deqEHTt2\n4LHHHtN6b31i69mzJ2bMmAEAUKvVqKiowMmTJ3Hp0iV4e3sjNTUV9vb2bRILqamtrYW7uzu8vLyQ\nnZ1t6eq0yJUrV+Dv74/hw4fj6NGjrf48JjZqiCM2CXrzzTcRHx8PT09PHDhwAAMGDNB6PT09HbNm\nzcLLL78MR0dHTJo0Sev1pUuXYv/+/fDz88O+ffvg6emp9fqnn36KhQsX4umnn0ZSUhKGDh3aqA49\ne/ZETEyM1ra6ujo88sgjyM/PR1JSEqZPn26iI7YupaWlUKlUcHV1tXRViASJ19gk5pdffsHGjRth\nZ2eHgwcPNkpqADB8+HD85z//gVqtRkxMDP7880/Nazk5Odi7dy8cHR1x6NChRkkNACZOnIj169ej\ntrYWL774osF1s7W1RXBwMADgt99+M+q4Tp8+jSeffBJeXl5wdXWFn58fli1bhrKyskZlH330UTg5\nOaG4uBjbt2/HsGHD4O7ujpkzZwIA9u3bBycnJ2zcuBHffvstJk+ejN69e6NLly74/fffNZ+TlpaG\nqVOn4r777oObmxsCAgIQGxuLioqKRvtcuHAhnJyckJGRgYMHDyI0NBQeHh4YOXKkQcdXUFCAuXPn\nom/fvnB1dcXAgQOxePFiFBcXa5UbNGgQBg0aBJlMhvT0dDg5OcHJyQnR0dEG7efSpUtYvHgx/Pz8\n4ObmBi8vL8ycOVPnSKe0tBQbN27EuHHj4OPjA1dXV/j6+mLevHn48ccfm9zHuXPn8PTTT6N///5w\ndXWFj48PHn/8cRw4cEBn+crKSrzwwgvo168f3NzcMHToUOzbt8+g46mPgUwm01y30xUTJycn+Pv7\n4/fff8eKFSvg5+eHrl27YseOHZoyt2/fxtatWxESEoIePXrAw8MDo0ePxgcffNDk/vPz8/H000/D\n19cXrq6u6NevH+bPn49Lly4ZVH8yD47YJGbv3r2oq6vDE088gf79+zdZbuzYsRg8eDDy8vK0Rk+7\ndu2CTCbD3Llzmx0RREZGIj4+HhcuXEBGRoYmYTWnrq4O6enpAIDAwECDj+mtt97C2rVr0aVLF4wd\nOxZubm44f/48du7cieTkZJw4cQLdunXTlJfJZJDJZFi+fDlycnIwduxYhIWFoXPnzlqvZ2dnIz4+\nHsOHD8ecOXOgUCggl8sBAHv27MGSJUtgb2+P8PBwuLu7Izs7Gzt27MCxY8dw/Phxnft8++23kZaW\nhvHj
xyMkJAQ1NTV6j+/EiROYNWsWVCoVJkyYgH/84x/43//+h48++ghffPEFjh49ioEDBwIAFi1a\nhCtXrmD79u1a071+fn5693Pq1Ck89dRTqKmpQVhYGPr06YNr167hiy++wH//+18cOHAAo0eP1pQ/\nc+YM3n77bQwfPhyPP/44OnfujJ9++glHjx7FV199heTk5Eb73bNnD1566SXI5XKMGzcO3t7eqKio\nQH5+Pnbs2NFolH7jxg2EhYWhXbt2CA8PR01NDT777DM899xzkMvlePLJJ5s9pp49e2LFihXYsGED\n7r33XixatAhqtVpnTGpqavD444/j999/x9ixY9GxY0d0794dAPDHH38gPDwc586dw6BBgzRx/eab\nb/DSSy/h22+/xbvvvqv1eZ988gmio6PRvn17jB8/Hh4eHvj5559x5MgRJCcn49ixY5rvjdoWr7FJ\nTHh4ONLS0vDWW29h9uzZzZZdt24dNm/ejNmzZ+Pf//43AGDw4MEoLi7Gp59+ilGjRjX7/meffRaH\nDx/GqlWrsHTpUgDa19jqGzG1Wo3KykqcPHkSv/76K55//nnExsYadDwZGRmYMGEChgwZgsTERNxz\nzz2a1z755BPMnz8fjz/+OHbv3q3Z/thjjyEjIwPdu3fH8ePH0aNHD63P3L9/P6KjoyGTyXTGqaSk\nBPfffz/atWuHkydPom/fvprXXn/9dWzatAlhYWE4ePCgZvuiRYtw4MABdOrUCcePH9c5Utblzz//\nhJ+fH27cuIGkpCStDsJHH32ExYsXo3///sjIyNBsb8n1qd9//x0BAQGwsbHBV199BW9vb81rRUVF\nCA0NxT333IP8/HzY2dkBACoqKtChQwd06tRJ67POnz+PsLAwDB06FImJiZrthYWFGD58ODp16oTk\n5GT069dP633Xrl3TJJL6Y5DJZJg9eza2bNkCmUym+Zzg4GB4e3sjMzPToOPTd42tflQ3evRo7Nu3\nDx06dNB6ffHixdi3bx/Wrl2LxYsXa7bX1tZi5syZOHHiBA4cOICwsDAAd0a+Q4cOhYeHB7788kut\nhVQZGRkIDw+Hn58fUlJSDKo/mRanIiVGoVAAQKPGXBcPDw8Ad6ac7n5//Wv63q9Wq7XeX++XX35B\nXFwc4uLisGnTJuzcuROXL1/GiBEjMHbsWIOOBYBmqmjz5s1aSQ0Apk6dikGDBuHLL79EdXW11msy\nmQwvvPBCs3Hw8/PTmfwPHDiA2tpazJs3TyupAXeuP3br1g1ff/21JlYN9zl37lyDkxoAHDt2DJWV\nlQgPD2806p05cyb8/f3xww8/4NtvvzX4M3U5cOAAqqqqsHz5cq2kBgDe3t6YPXs2SktLcerUKc12\nZ2fnRkkNAAYMGIARI0YgPT0dSqVSsz0hIQFKpRLLli1rlNQAaJJaQ/b29li/fr0mqQGAj48PHnro\nIRQWFmpNk5vCunXrGiW1qqoqHDx4EIMGDdJKagBgZ2eHf/7zn1Cr1fj444812xMSElBTU4PXXnut\n0erg4OBgjB8/Hvn5+bhw4YJJ60+G4VQkmUVwcLDWaKKqqgrZ2dlYvnw5xo8fj/3792PMmDF6Pycn\nJwe2trb4/PPP8fnnnzd6vaamBkqlEhcvXoS/v7/Wa/fff3+zn93UdOj3338PABgxYkSj19q3b4+g\noCB89tln+P777/HII48Ytc+75efnQyaT6dwXAISEhOD7779Hfn4+HnjgAaM+u6GcnBwAwP/+9z9s\n2LCh0esXL16EWq1GYWGh1vdy/Phx7Nq1C/n5+aioqEBdXZ3mNZlMhoqKCs2U9XfffQcABn2v9e67\n7z7NFHFD9R2rqqoqk62e7dChg87p+e+++w51dXWwsbHRGZva2loAd0aS9erjmZGRgby8vEbvKS8v\n17zn7s4RmR8Tm8S4urriwoULuHr1qt6yJSUlAAB3d3et91+5cgUlJS
Xw8vLS+36ZTKb1/qY4Ojoi\nLCwMHTp0QEREBFauXGlQA1hZWQmlUom4uLgmy8hkskYjtvpjaU5Tr9cvIGnq9foe+o0bN4zeZ0v2\npVarde7LGJWVlVCr1fjoo4+aLHN3HLdv346VK1fCyckJo0ePRo8ePdCxY0fIZDJ88cUXOH/+PG7f\nvq0pX1/Hhtce9Wnq3jtb2ztNU8MRYWt17dpV5/bKykoAQF5ens4kBdyJTcPRY/177r7udjddv0sy\nPyY2iQkKCkJaWhpSU1P1XmNLTU2FTCZDUFCQ1vuvXLmClJSUZq+xKZVKzUKQhu/Xp36UdPHiRdy8\nebPR9OLdHBwcUFtbiytXrhi8j3oNp7eMed3BwQEAdK64BP6erq0vZ8w+W7IvmUymc1/G7kcmk+HU\nqVMGLTRRKpXYuHEj3N3dcfr0abi4uGi9npOTg/Pnz2ttq09Sv/76qyBvFtf3fUdFRekcsTX3nkuX\nLgnyWK0dr7FJzFNPPQVbW1scO3YMP/zwQ5PlTpw4gdzcXDg7OyM8PFyzfc6cOVCr1dizZ0+TjS0A\nfPjhhygtLYWPj49BKyLrVVVVaf5fpVLpLf/ggw/ijz/+aNSImpO/vz/UajXS0tIavVZTU6O5Kfru\nqU9T7wuA5ppXQEBAq/bz4IMPQq1W48yZMwaVr6iowI0bNzBkyJBGSa26ulrnIo36qdITJ060qq4t\nYWNj0+LR3QMPPAAbGxuDF6oAd+IJwOB4UttiYpOYXr16YdmyZaipqcG0adN0JoT09HRERUVBJpNh\nw4YNWtcwhg4dihkzZuD69euYMmWKzucPJiUlYfXq1bCzs8PmzZuNqt8777wDABg4cKBBPd3o6Gio\n1WosWbIE165da/T67du3kZWVZVQd9Jk6dSratWuHnTt3oqioSOu1+Ph4XLt2TXPbQWs9+uij6NKl\nC5KSkho1kvv27UNeXh58fX1bdX0NuLMQxdHREZs2bcLZs2d1lsnKytJcQ3NxcYG9vT3y8vK0ptPq\n6uoQExOj816+Z555BnK5HPHx8To7Vbq+P1Pp0qULKioqtKZGDeXs7Ixp06ahoKAAGzZs0Jkgr127\npvVbiIqKgp2dHVavXt3oNwLcGfE21Vkh8+NUpATFxMTg9u3beOuttzBq1CiEhIRoHql17tw5ZGRk\nwM7ODm+++Wajp44AwJYtW6BSqfDxxx9jyJAhWo/UysrKwrlz53DPPfcgISFB51NHgDvLuRtO61y/\nfh05OTnIy8uDvb09Nm3aZNCxjBgxAuvWrcOrr76KwMBAPPLII+jduzdu3bqFq1ev4syZM+jVqxdO\nnz7dsmDp4OnpiY0bN2Lp0qUYPXo0IiIi4ObmhuzsbGRkZKBHjx6Ij483yb7s7e2xbds2zJkzBxER\nEXj88cfRu3dvFBQU4MSJE3ByctK6ibilHB0dsWfPHsycORNjx47FyJEj0a9fP9jZ2aGkpATffvst\nSkpKcPnyZc205fz58/HWW29h2LBh+L//+z/U1tYiLS0NVVVVmlWRDfn4+CA+Ph4vvfQSQkJCNPex\nXb9+Hd9//z1qamq0Vl2a0ujRo3Ho0CE88cQTGDZsGNq3b4+BAwdi3LhxBr0/Li4Oly5dQlxcHD7+\n+GMMGzYMbm5uUCgUuHjxIs6ePYvXX39ds6LUy8sL27dvR3R0NIYOHYqHH34YXl5eUCqVKCkpQXZ2\nNmpqanD58mWzHC81j4lNov75z38iIiIC77//PjIyMpCZmQmZTAYPDw9ERUVhwYIFTT4EuV27dti+\nfTtmzJiBPXv2ICsrC//97381D0F+8cUXsWDBgiYXPMhkMs1y/4af2a1bN8yZMweLFy9Gnz59DD6W\n5557DkOHDsWOHTuQmZmJ48ePo3PnznB3d8e0adMwceJEnXVoTv0N1U2ZO3cu+vTpg61bt2puJ+jW\nrRsWLFiApUuXNrkQoSXCwsLw9d
dfY/PmzTh9+jQ+//xzuLi44KmnnsKyZcvQq1cvo+uvy4gRI3Dm\nzBm88847OHnyJM6ePQtbW1u4ubnhoYcewqOPPqp1LW/16tXo2rUr9u7di927d8PBwQGjR4/GqlWr\n8Prrr+vc/+zZszFgwABs3boVWVlZSE5ORpcuXeDj44OoqCijjsGY49uwYQPkcjlSU1ORnZ0NlUqF\n6dOnayW25j6vc+fO+OKLL7B371588sknOHbsGP766y907doVvXv3xtq1axv9zp544gkMHDgQ7777\nLk6dOoVTp06hQ4cOcHd3x9ixY7Wm+Klt8QZtIiKSFF5jIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFi\nIyIiSWFiIyIiSWFikyBdT0KgpjFehmOsDMdYWQ4TGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxER\nSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoTGxERSQoT\nGxERSYrBiS0hIQH+/v5wd3dHSEgIMjMzmyybnp6OGTNmoF+/fujevTuCg4Px0UcfaZU5evQonnji\nCXh5ecHT0xNjxozBV1991fIjISIigoGJ7ciRI4iNjcWyZcuQlpaGIUOGYMqUKSgpKdFZPicnBwMG\nDMCePXuQmZmJZ555BkuWLMHhw4c1ZTIyMjBq1CgkJiYiLS0NjzzyCGbOnImsrCzTHBkREVklWVVV\nlVpfoTFjxsDPzw9btmzRbAsMDERERATWrFlj0I4iIyOhUqmwe/fuJss8/PDDGDZsGNatW2fQZ5Ju\nRUVF8Pb2tnQ1RIPxMhxjZTjGynL0jthqa2uRl5eHkJAQre2hoaHIzs42eEc3b96Eo6Njs2X++OMP\nvWWIiIiaozexVVRUQKlUwtXVVWu7i4sLysrKDNpJcnIyTp8+jcjIyCbLvP/++/j1118xbdo0gz6T\niIhIF1tz7yArKwtRUVGIi4tDQECAzjJJSUl49dVX8cEHH6BHjx7mrhIREUmY3sTm7OwMuVzeaHRW\nXl7eaBR3t8zMTEybNg2rVq3C3LlzdZZJSkrCwoUL8Z///Adjx47VW+GioiK9ZYhxMhbjZTjGynCM\nlX7muA6pN7HZ2dkhICAAqampCA8P12xPSUlBREREk+/LyMjAk08+iZUrV2L+/Pk6y3z66aeIjo7G\n9u3bMWHCBIMqzIux+vGitXEYL8MxVoZjrCzHoKnI6OhoLFiwAIMHD0ZQUBB27twJhUKhuWa2du1a\n5ObmIikpCQCQlpaGJ598EvPmzcOkSZM0oz25XA5nZ2cAwOHDh7FgwQKsX78eQ4cO1ZRp164dF5AQ\nEVGLGZTYJk6ciOvXryM+Ph4KhQK+vr5ITEyEh4cHAEChUKC4uFhT/sCBA7h16xa2bt2KrVu3arZ7\nenoiPz8fAPDBBx9AqVQiNjYWsbGxmjLBwcE4evSoSQ6OiIisj0H3sZG4cArEOIyX4RgrwzFWlsNn\nRRIRkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRER\nkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaQwsRERkaTYWroC1DylEsjPl+Pn\nn2W47z41AgKUsGF3hIioSUxsApefL0dYWCfU1spgZ6dGcnI1AgOVlq4WEZFgse8vcD//LENtrQwA\nUFsrw6VLMgvXiIhI2JjYBO6++9Sws1MDAOzs1LjvPrWFa0REJGycihS4gAAlkpOrcenS39fYiIio\naaJPbFJfXGFjAwQGKhEYaOmaEBGJg+gTGxdXEBFRQ6If23BxBRERNST6xMbFFURE1JDopyK5uIKI\niBoSfWLj4goiImpI9FORREREDYl+xEZEliX1W25IfJjYiKhVeMsNCQ37VUTUKrzlhoSGiY2IWoW3
\n3JDQcCqSiFqFt9yQ0DCxEVGr8JYbEhomNiIiapbYVr4ysRERUbPEtvLV4JybkJAAf39/uLu7IyQk\nBJmZmU2WTU9Px4wZM9CvXz90794dwcHB+Oijj3SWCwkJgbu7OwYPHowPPvigZUdBRERmI7aVrwYl\ntiNHjiA2NhbLli1DWloahgwZgilTpqCkpERn+ZycHAwYMAB79uxBZmYmnnnmGSxZsgSHDx/WlCku\nLsa0adMQFBSEtLQ0vPjii1i+fDmOHj1qmiMjIiKTENvKV1lVVZXeGo4ZMwZ+fn7YsmWLZltgYCAi\nIiKwZs0ag3YUGRkJlUqF3bt3AwBeeeUVHDt2DN9++62mzPPPP4/CwkIcP37c2OOgBoqKiuDt7W3p\naogG42U4xspwUoqVSgWcOyfXWvkq5GtseqtWW1uLvLw8hISEaG0PDQ1Fdna2wTu6efMmHB0dNf8+\ne/YsRo8erVXm4Ycfxrlz56BUCnfulojI2tSvfJ08uQ733y/spAYYkNgqKiqgVCrh6uqqtd3FxQVl\nZWUG7SQ5ORmnT59GZGSkZltZWZnOz6yrq0NFRYVBn0skBEolkJsrx6FDtsjNlUOlsnSNiKyb2VdF\nZmVlISoqCnFxcQgICGj15xUVFZmgVtLX2jjJZLYoK/NAcbEdevWqhZtbCVSqOhPVTnhaE6/y8l4I\nD++qWTH22We/wdW12IS1ExZrPAdbej5YY6yMZY7pWr2JzdnZGXK5vNHorLy8vNGI626ZmZmYNm0a\nVq1ahblz52q95urqqvMzbW1t4ezs3ORnSmXO2pxMMbefmytHeHjD5b0dBb28tzVaG6/8fFutFWO/\n/mqP4GBp/k5bEisx3AOlr44tOR+kdI1NbPT+vOzs7BAQEIDU1FSt7SkpKQgKCmryfRkZGZg6dSpi\nY2Mxf/78Rq8PGTKk0Wd+8803GDx4MORyuWG1J7MR2/JeSxLbirG2Vn8P1Lx5nRAW1gnnzgnv/NZX\nR54P4mJQvyk6Ohr79+/Hnj17cOHCBcTExEChUGiuma1duxbh4eGa8mlpaZg6dSqefvppTJo0CWVl\nZSgrK9O6dhYZGYlff/0VsbGxuHDhAvbs2YODBw9i8eLFJj5Eagk21oarf1ZiQkI1jh+vFuWzEs15\nnVAMSUFfHXk+iItB19gmTpyI69evIz4+HgqFAr6+vkhMTISHhwcAQKFQoLj472sKBw4cwK1bt7B1\n61Zs3bpVs93T0xP5+fkAgF69euGTTz7BypUr8cEHH8Dd3R1xcXF47LHHTHl81EJ8sK3hpPCsRHM+\nWaI+KdR/thCTgr468nwQF4PuYyNx4dy+cRgv4NAhW8yb10nz74SEakye3HhxREtiJYZ7oMxRR/6u\nLIfPiiQis46qhDKibW6BiFDqSKbBxEZEVjHVJrYH+VLLMbERkVWMWHQtEJHy8Vozgc10ExGZB1c2\nWg+O2IioETHcVG0sa5hupTuY2JogxRObyFBSvB5lDdOtdAcTWxOkeGITGYrXo0jMOAZpghielkBk\nDGOeLsLrUSRmHLE1QQxPSyBxstQ0tzGzEFK4HsXLCdaLia0JUjixSZgsNc1tzPSiFK5H8XKC9ZJ0\nYmtNj02MJ3b98RYWeuHmTTl7qAJlqetXYpyFaM05zOuE1kvSic3aemzWdrxiZakEI8ZZiNb8psWY\nyFuDU69/k3Ris7Yem7Udr1hZKsGIcRaiNb9pMSby1hBqx9YSCVfSic3aemzWdrxiJcYEYymt+U1b\nW5yF2rG1RMKVdGKzth5b/fFeuFAHHx9byR8vNU8KU1PWdg63hik7tqb87Vgi4Uo6sQm1x2auBqf+\neB0cLvLvQJFgp6aMIdRzWIhM2Qkw5W/HEjNJkk5slqIvcUmhwbF2YhgNCXVqiszDlJ0AU/52LDHq\nZmIzA32Jiw2O+ImhcyLGa65i6DBYA1P+diwx6mZiMwN9iUuM
DQ5pE0PnRIzXp8TQYbAGYvztNMTE\nZgb6EpfYfzRtSag9eDF0TsR4fUoMHQZrIMbfTkNMbGagL3GJ/UfTloTag2fnxDzE0GEg4WNiMwMm\nLtMRag+e37F5sMNApsDERoLGHrx1YYeBTEFyiU2o12SEwJSxaas4m7IHz99GywnlAdv8DskQkkts\nQr0mIwSmjE1bxfnuHnz9H8tsScPWVnWWYuMrlPNKKPUgYRP56dYY//J100wZG0vFub5hmzevE8LC\nOuHcObnB722rOremjkIllPNKKPUgYZNcYuOftG+aKWNjrjjXj8gOHbJFbq4cKpX2661p2NrqtyHF\nxlco55VQ6kHCJrmpyNZck5HiFFJDprxeZa7Va/qmmlqzmKStVtxJccGLUB6wzVWTZAhZVVWV+M86\nE8nNlcb8fVFRkWgfgnzokC3mzeuk+XdCQjUmT67T/FulAs6dk2s1bK3tfJg6Xuaoo1CI+bfV1hgr\ny5HciK01hHrPlDXRN9oxZjm4vhG4uVb6ccm64aQ+S2IsxsM0mNgakOIUkti05Z/ekMIKO7E3hFL4\nDkyJ8TANJrYGxDh/r6thE7O2/NMbUhihi70hNOV3IPYkD0jjNykETGwNiHEKSVfD5uBg6VoJg74R\nuBRG6GJvCE35HYg9yQPS+E0KARObyOlq2Pz9LVwpgdA3AhfKSr/WEGJDaMzIyZSzJGJP8oA4Z42E\niIlN5FrTsElh6qY5+kbg9a87OFwU7eo1ITaExoycTDlLIsQkbywxzhoJERObyOlq2H76ybD3SmHq\nRur0dT6E2BBaauQkxCRPlsHE1kJCGe0017Dpq6MUpm6kToydD0uNnISY5M1JKG2QEBkchoSEBPj7\n+8Pd3R0hISHIzMxssuzt27exaNEiBAcHw8XFBRMmTNBZLjExESNGjED37t3h4+ODqKgolJWVGX8U\nLaTv8U3NEcPzAPXVkY8nEj4xPp6rfuSUkFCN48erRTlyak3b0FbE0AZZikGJ7ciRI4iNjcWyZcuQ\nlpaGIUOGYMqUKSgpKdFZXqlUomPHjpg/fz7CwsJ0lsnKysKCBQvw1FNPISsrC/v378eFCxcQFRXV\n8qMxkhgeqNsa+urIBqhttKaOYux81I+cJk+uw/33Nx5FmOo7M+d3L4akIYY2yFIMmorctm0bZs6c\niVmzZgEA4uLicPLkSezatQtr1qxpVN7e3h7x8fEAgIKCAty4caNRmbNnz8LDwwMLFiwAAPTs2RPP\nPvssVqxY0eKDMVZrpuLMNd1iyukFUz7FQ6jEMFXXmjpK8bqRqb4zc373Ypim13d+t9VUpTH7aas6\n6U1stbXvjfGQAAAU+klEQVS1yMvLw+LFi7W2h4aGIjs7u8U7DgoKwvr165GcnIxx48ahoqICR44c\nwdixY5t9X1s2/M2x1EOAjSHFRvFuYmiAWlNHc3Y+LHWNxlTfmTm/ezGssNR3frdVp8+Y/bRVnfQm\ntoqKCiiVSri6umptd3FxwalTp1q84wcffBAJCQmIiorCrVu3UFdXh9DQUGzbtq3Z9wml4TdXg2PK\nk1UKIzJ9xNAACbWOlhrtmioe5oyrGDqF+s7vtur0GbOftqqTxVZF/vjjj4iJicHy5csRGhoKhUKB\n1atX44UXXsCOHTuafF9hYZ1WYC5cqIODw8UW18PBAZobmg1dJm9O3bv3gp2dveZk7d79TxQVFRv9\nOUVFRWaonfDce68tPvvMA1eu2KFXr1rce28Jiorq9L/xLuaMl6nqaGqFhV4tOpdaGytTxcPccTVF\n22DJ89BUbYkp96OrLNDO5HXSm9icnZ0hl8sbrVYsLy9vNIozxpYtWxAYGIjnnnsOANC/f3/Ex8dj\n/PjxeOWVV9CtWzed7/PxsdXqpfn42Ar+5lpjpnz69MFdPcV2sLEx7vis7c9l9OkDBAcDd06Qfxj9\n/raIV2vraA43b8qNPpdM
FStTxUOIca1n6fPQFG2Jqfejq6w56E1sdnZ2CAgIQGpqKsLDwzXbU1JS\nEBER0eId37p1C3K59kojGxsbyGQyqJpZ3iSGKYK7WepJDETNEeO5RIZrq7bEmP20VZ0MmoqMjo7G\nggULMHjwYAQFBWHnzp1QKBSIjIwEAKxduxa5ublISkrSvKewsBC3b99GZWUlqqurUVBQAADw8/MD\nAIwbNw5LlizBrl27EBoaitLSUqxcuRIBAQHw8PBosi5ibPjFsMCBrI8YzyUiQxiU2CZOnIjr168j\nPj4eCoUCvr6+SExM1CQghUKB4mLtOdUpU6bg6tWrmn+PHDkSMpkMlZWVAIAZM2aguroaCQkJWLNm\nDe69916MGDECr776arN1Uakgurvrhbp4gAzHpzxoYzxIyGRVVVWiamW/+04uuPuU9FGpgHPn5FpT\nPuZsBCwxty/mhs6QeOXmCv9+ubZQHyvGQz9LX2OzZqJ7VqQYp/GsYcpHDDdKt4ZQppOF0oEQSjyI\ndBFdYpPiNJ5QGqvWkHpDJ5TpZKF0IIQSD7GTwrkvRKJLbEJdudWaH6hQGqvWkHpDJ5QVhELpQAgl\nHmInlHNfaglWdIlNqMFuzQ9UKI1Va0i9oRPKdLJQOhBCiYfYGXvumysBmTLBCiFJii6xCZUQH6jc\nltjQtQ2pdyCsjbHnvrlGeKbsXAthFMrEZiJCfKAytZwQep26sAMhLcae++aa3TFl51oIM1BWldjM\n2VgJ8YHKQiXUpNGQEHqdJH3Gnvvmmt0xZedaCDNQVpXYzNlYSSE5tVXCEUPSEEKvk+hu5prdMWX7\nJYQZKKtKbGysmmeqhKMvQYrhexBCr5PobmLoQAuhjlaV2NhYNc9UCUdfghTD9yCEXicRtYxVJTYp\nNFbmnC40VcLRlyDF8D0IoddJ4lR/jhYWeuHmTbkgryFLnVUlNik0Vua8PmWqhKMvQUrheyBqihiu\nIUudVSU2KTDn9SlTJRwxjMiIzEUM15CljolNZMRwfYojMrJmYjhHpY6JTWQ4GiISlruvew8adOcc\nvXChDj4+toI8R8VwL2lrMLGJDEdDRK1nyoa9qWtqDg4XBfv32KR+HZCJjUhApN6TFgpTNuxivKYm\nxjobg4mNSECk3pMWClM27GK8pibGOhuDiY0sjqOUv0m9Jy0UpmzYjb3uLYTfu9Sv1TOxkcW11aO8\nxEDqPWmhMGXDbux1byGMyqV+rZ6JjSyurR7lJQZS70kLhSUbdo7KzY+JjSyurR7lJQZS70kTR+Vt\ngYmNLK6tHuVFJAQclZsfExtZHB/lRdaEo3LzY2IjyWCDQUQAILI1Y0RERM1jYiMiIknhVCQRERlM\nDPeLMrERERlBDA27OYnhflEmNiIiI4ihYTcnMdwvakX9jNZRKoHcXDkOHbJFbq4cKpWla9S2rP34\nierpatitSf39ogAEe78oR2wGsvZemrUfP1E9a38QgBjuF2ViM5AYht/mZO3HT1RPDA27OYnhflEm\nNgNZey/N2o+fqJ4YGnZrx8RmIGvvpVn78ROReDCxGcjae2nWfvxEJB5cFUlERJJicGJLSEiAv78/\n3N3dERISgszMzCbL3r59G4sWLUJwcDBcXFwwYcIEneVqa2vx2muvwd/fH25ubvDz88N7771n/FEQ\nEW/JIPr/DJqKPHLkCGJjY7F582YEBQXh/fffx5QpU5CdnQ0PD49G5ZVKJTp27Ij58+fj66+/xo0b\nN3R+bmRkJEpLS/H222/jvvvuQ3l5OW7dutW6IyKyUrwlg+gOgxLbtm3bMHPmTMyaNQsAEBcXh5Mn\nT2LXrl1Ys2ZNo/L29vaIj48HABQUFOhMbN988w3S0tKQl5cHJycnAICnp2eLD4TI2vGWDKI79E5F\n1tbWIi8vDyEhIVrbQ0NDkZ2d3eIdf/nll7j//vvxzjvvYMCAAQgMDERMTAyqq6tb/JlE1k
wMT4Qg\nagt6R2wVFRVQKpVwdXXV2u7i4oJTp061eMeXL19GZmYm2rVrh7179+LGjRt4+eWXoVAo8OGHH7b4\nc4msFW/JILrDYsv9VSoVbGxssHPnTnTu3BkAsGnTJkyaNAm//fYbunbtqvN9RUVFbVlN0RJSnGQy\nW5SVeaC42A69etXCza0EKlWdpaulRUjxag0HB8Df/87///STefYhlVi1BSHFSqjnobe3t8k/U29i\nc3Z2hlwuR1lZmdb28vLyRqM4Y7i5uaFbt26apAYAffv2hVqtxtWrV5tMbOYIgtQUFRUJKk65uXKE\nhzdc1NBRUIsahBYvIWOsDCe0WAn9PDQlvdfY7OzsEBAQgNTUVK3tKSkpCAoKavGOg4KCUFpaij//\n/FOz7eLFi5DJZFxEIjHW/jR0IiGwpvPQoPvYoqOjsX//fuzZswcXLlxATEwMFAoFIiMjAQBr165F\neHi41nsKCwvx/fffo7KyEtXV1SgoKEBBQYHm9cmTJ8PJyQnR0dH48ccfkZWVhdjYWERERMDZ2dmE\nh0iWxkUNRJZnTeehQdfYJk6ciOvXryM+Ph4KhQK+vr5ITEzU3MOmUChQXFys9Z4pU6bg6tWrmn+P\nHDkSMpkMlZWVAIBOnTohKSkJy5cvx8MPPwxHR0c8+uijeOWVV0x1bCQQXNRAZHnWdB7KqqqqpJu2\nrZTQ5vaFjvEyHGNlOMbKcvisSCIikhQ+3Z/IgpTKO4/C+vnnv6eHbNjdJGoVJjYiC+LzHYlMj31D\nIguypiXYRG2FiY3IgqxpCTZRW+FUJJEFWdMSbKK2wsRGZEE2NkBgoJJ/XobIhJjYiIgEiqtmW4aJ\njYhIoLhqtmWY+4mIzEipvPNk/UOHbJGbK4dKZfh7uWq2ZThiIyIyo9aMuupXzda/l6tmDcPERkRk\nRrpGXYYuFuKq2ZZhYiMiMqPWjLq4arZlmNiIiMyIo662x8RGRGRGHHW1Pa6KJCIiSWFiIyIiSWFi\nIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSWFiIyIiSeGzIolw\n549B5ufL8fPPfz+o1obdPiJRYmIjQuv+GCQRCQv7pETQ/ccgiUicmNiI8PcfgwRg9B+DJCJh4VQk\nEfjHIImkhImNCPxjkERSwqlIIiKSFCY2IiKSFCY2IiKSFCY2IiKSFIMTW0JCAvz9/eHu7o6QkBBk\nZmY2Wfb27dtYtGgRgoOD4eLiggkTJjT72ZmZmejatSuGDRtmeM2JiIh0MCixHTlyBLGxsVi2bBnS\n0tIwZMgQTJkyBSUlJTrLK5VKdOzYEfPnz0dYWFizn11VVYWFCxciJCTE6MoTERHdzaDEtm3bNsyc\nOROzZs2Ct7c34uLi4Obmhl27duksb29vj/j4eMyePRvdunVr9rMXL16MGTNm4IEHHjC+9kRERHfR\nm9hqa2uRl5fXaEQVGhqK7OzsVu08ISEBv/32G15++eVWfQ4REVE9vYmtoqICSqUSrq6uWttdXFxQ\nVlbW4h2fP38emzZtwnvvvQeZjM/lIyIi07DIqsiamho888wzWLduHTw9PQEAajWfzUdERK2n95Fa\nzs7OkMvljUZn5eXljUZxhiotLUVhYSGio6OxaNEiAIBKpYJarYaLiwsSExObXExSVFTUon1aG8bJ\nOIyX4RgrwzFW+nl7e5v8M/UmNjs7OwQEBCA1NRXh4eGa7SkpKYiIiGjRTrt3797odoGEhASkpqZi\n3759mlGcLuYIgtQUFRUxTkZgvAzHWBmOsbIcgx6CHB0djQULFmDw4MEICgrCzp07oVAoEBkZCQBY\nu3YtcnNzkZSUpHlPYWEhbt++jcrKSlRXV6OgoAAA4OfnB1tbW/Tr109rH127dkW7du3g4+NjqmMj\nIiIrZFBimzhxIq5fv474+HgoFAr4+voiMTERHh4eAA
CFQoHi4mKt90yZMgVXr17V/HvkyJGQyWSo\nrKw0YfWJiIi0yaqqqrhqQ2I4BWIcxstwjJXhGCvL4bMiiYhIUpjYiIhIUpjYiIhIUpjYiIhIUpjY\niIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhI\nUpjYiIhIUpjYiIhIUpjYiIhIUpjYiIhIUmRVVVVqS1eCiIjIVDhiIyIiSWFiIyIiSWFiIyIiSWFi\nIyIiSWFiIyIiSRFFYktISIC/vz/c3d0REhKCzMxMS1fJ4jZv3ozQ0FD07NkTXl5eePLJJ/HDDz80\nKvfGG2/A19cX3bp1w2OPPYYff/zRArUVjs2bN8PJyQnLly/X2s44/U2hUGDhwoXw8vKCu7s7hg4d\nijNnzmiVYbwAlUqF9evXa9omf39/rF+/HiqVSqucNcbqzJkzmD59Ovr37w8nJyccOHCgURl9camp\nqcHLL7+MPn36wMPDA9OnT8e1a9cM2r/gE9uRI0cQGxuLZcuWIS0tDUOGDMGUKVNQUlJi6apZ1Jkz\nZ/Dss8/i66+/xtGjR2Fra4uIiAhUVVVpyrz11lvYvn07Nm3ahJSUFLi4uGDixImorq62YM0t5+zZ\ns9i9ezcGDhyotZ1x+tuNGzcQFhYGmUyGQ4cOIScnBxs3boSLi4umDON1x5YtW7Br1y5s2rQJZ8+e\nxcaNG7Fz505s3rxZU8ZaY1VdXY0BAwZgw4YNsLe3b/S6IXFZsWIFjh07hl27duGrr77CzZs3MW3a\nNKjV+u9QE/x9bGPGjIGfnx+2bNmi2RYYGIiIiAisWbPGgjUTlurqavTs2RP79+9HWFgYAKBfv36Y\nP38+XnzxRQDAX3/9BW9vb6xfvx5z5syxZHXb3I0bNxASEoKtW7diw4YN6N+/P+Li4gAwTg3961//\nQmZmJr766qsmyzBed0ybNg3Ozs7Ytm2bZtvChQtx/fp1HDx4EABjBQA9evTApk2bMH36dM02fXH5\n/fff4eXlhe3bt2PSpEkAgJKSEvj5+eHw4cMYPXp0s/sU9IittrYWeXl5CAkJ0doeGhqK7Oxsy1RK\noG7evAmVSgVHR0cAwOXLl6FQKLR+AB06dMCwYcOsMnZLlizBxIkTMXz4cK3tjJO2L7/8EoGBgXj6\n6afh7e2NESNG4P3339e8znj9bejQoUhLS0NRUREA4Mcff0RaWpqmY8lY6WZIXM6dO4e6ujqtMh4e\nHvDx8TEodramr7bpVFRUQKlUwtXVVWu7i4sLTp06ZaFaCdOKFSvg7++PIUOGAADKysogk8m0ppCA\nO7ErLS21RBUtZvfu3bh8+TJ27tzZ6DXGSVt9nBYtWoQXX3wRBQUFWL58OWQyGebNm8d4NbBkyRL8\n8ccfeOihhyCXy6FUKrF06VJERkYC4G+rKYbEpby8HHK5HF26dGlUpqysTO8+BJ3YyDArV65ETk4O\nkpOTIZPJLF0dQbl48SLWrVuH48ePw8ZG0BMUgqBSqRAYGKiZ5vfz88NPP/2EhIQEzJs3z8K1E5bD\nhw/j4MGD2LVrF3x8fFBQUICYmBj06tULM2fOtHT1rJqgz3RnZ2fI5fJGGbq8vLzRKM5axcbG4tNP\nP8XRo0fRs2dPzXZXV1eo1WqUl5drlbe22OXk5KCyshIPPfQQunbtiq5duyIjIwMJCQlwcXFBly5d\nGKcG3Nzc0LdvX61tffv2xdWrVwHwd9XQK6+8gueffx4RERHw9fXF1KlTER0drVkPwFjpZkhcXF1d\noVQqUVlZ2WSZ5gg6sdnZ2SEgIACpqala21NSUhAUFGSZSglITEyMJqn16dNH67XevXvDzc0NKSkp\nmm1//fUXMjMzrSp2jz32GM6cOYP09HTNf4MHD8bkyZORnp4OLy8vxqmBoKAgzTWjekVFRfD09ATA\n31VDf/75Z6NZAB
sbG81yf8ZKN0PiEhAQAFtbW60yJSUlKCwsNCh28hUrVrxq8pqb0D333IM33ngD\nbm5u6NixI+Li4pCVlYV33nkHDg4Olq6exSxbtgwff/wxPvzwQ3h4eKC6ulqzVLZdu3YAAKVSiS1b\ntsDLywtKpRKrVq1CWVkZtmzZoikjde3bt9eM1Or/S0xMhKenp2aVFuP0N09PT8TFxcHGxgbdunXD\nqVOnsH79eixduhSDBw8GwHjVKywsxMcffwwvLy/Y2dnh9OnTWL9+PSZPnqxZ9GCtsaqurkZhYSEU\nCgX27t2LAQMGwMHBAbW1tXBwcNAbl/bt26O0tBQJCQkYMGAAbty4gZdeegmOjo549dVX9V5yEfxy\nfwDYtWsX/v3vf0OhUMDX1xdvvPGGVfd4AMDJyUnnlxsTE4OYmBjNvzdu3IgPP/wQVVVVCAwMxJtv\nvol+/fq1ZVUFZ8KECfD19dUs9wcYp4ZOnDiBtWvX4qeffkKPHj0QFRWFZ599VqsM43Wn8X7ttdfw\nxRdf4LfffoObmxsmTZqE5cuXayUta4xVeno6JkyY0KiNmj59Ot59910A+uNSW1uL1atX49ChQ/jr\nr78watQovPnmm+jevbve/YsisRERERlK0NfYiIiIjMXERkREksLERkREksLERkREksLERkREksLE\nRkREksLERkREksLERkREksLERkREkvL/ALrDGlp1jC3nAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "\n", "plt.scatter(range(n_estimators), errors)\n", "plt.xlim([0, n_estimators])\n", "plt.title('OOB error of each tree')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Estimate $\\alpha$" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "alpha = (1 - errors) / (1 - errors).sum()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "weighted_sum_1 = ((y_pred_df) * alpha).sum(axis=1)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "438 0.019993\n", "2674 0.050009\n", "1345 0.350236\n", "1957 0.170230\n", "2148 0.030047\n", "3106 0.040100\n", "1786 0.219819\n", "321 0.059707\n", "3082 0.100178\n", "2240 0.050128\n", "1910 0.180194\n", "2124 0.190111\n", "2351 0.049877\n", "1736 0.950014\n", "879 0.039378\n", "785 0.219632\n", "2684 0.010104\n", "787 0.710568\n", "170 0.220390\n", "1720 0.020166\n", "dtype: float64" ] }, 
"execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weighted_sum_1.head(20)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.52674897119341557, 0.8954545454545455)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred = (weighted_sum_1 >= 0.5).astype(np.int)\n", "\n", "metrics.f1_score(y_pred, y_test), metrics.accuracy_score(y_pred, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Weighted voting with sklearn" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.53600000000000003, 0.89454545454545453)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(), n_estimators=100, bootstrap=True,\n", " random_state=42, n_jobs=-1, oob_score=True)\n", "clf.fit(X_train, y_train)\n", "y_pred = clf.predict(X_test)\n", "metrics.f1_score(y_pred, y_test), metrics.accuracy_score(y_pred, y_test)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [], "source": [ "errors = np.zeros(clf.n_estimators)\n", "y_pred_all_ = np.zeros((X_test.shape[0], clf.n_estimators))\n", "\n", "for i in range(clf.n_estimators):\n", " oob_sample = ~clf.estimators_samples_[i]\n", " y_pred_ = clf.estimators_[i].predict(X_train.values[oob_sample])\n", " errors[i] = metrics.accuracy_score(y_pred_, y_train.values[oob_sample])\n", " y_pred_all_[:, i] = clf.estimators_[i].predict(X_test)\n", " \n", "alpha = (1 - errors) / (1 - errors).sum()\n", "y_pred = (np.sum(y_pred_all_ * alpha, axis=1) >= 0.5).astype(np.int)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.55335968379446643, 0.89727272727272722)" ] }, 
"execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics.f1_score(y_pred, y_test), metrics.accuracy_score(y_pred, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 3: Combination of classifiers - Stacking" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The staking method consists in combining the different base classifiers by learning a \n", "second level algorithm on top of them. In this framework, once the base \n", "classifiers are constructed using the training set $\\mathcal{S}$, a new set is constructed \n", "where the output of the base classifiers are now considered as the features while keeping the \n", "class labels.\n", "\n", "Even though there is no restriction on which algorithm can be used as a second level learner, \n", "it is common to use a linear model, such as \n", "$$\n", " f_s(\\mathcal{S},\\mathcal{M},\\beta) =\n", " g \\left( \\sum_{j=1}^T \\beta_j M_j(\\mathcal{S}) \\right),\n", "$$\n", "where $\\beta=\\{\\beta_j\\}_{j=1}^T$, and $g(\\cdot)$ is the sign function \n", "$g(z)=sign(z)$ in the case of a linear regression or the sigmoid function, defined \n", "as $g(z)=1/(1+e^{-z})$, in the case of a logistic regression. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets first get a new training set consisting of the output of every classifier" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X_train_2 = pd.DataFrame(index=X_train.index, columns=list(range(n_estimators)))\n", "\n", "for i in range(n_estimators):\n", " X_train_2[i] = trees[i].predict(X_train)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456789...90919293949596979899
23600000000000...0000000000
14120010000000...1000000000
14040000000000...0000100000
6261101111111...1111111111
3470000000000...0000000000
\n", "

5 rows × 100 columns

\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 \\\n", "2360 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "1412 0 0 1 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 \n", "1404 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 1 0 0 \n", "626 1 1 0 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 \n", "347 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 \n", "\n", " 97 98 99 \n", "2360 0 0 0 \n", "1412 0 0 0 \n", "1404 0 0 0 \n", "626 1 1 1 \n", "347 0 0 0 \n", "\n", "[5 rows x 100 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train_2.head()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegressionCV" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,\n", " fit_intercept=True, intercept_scaling=1.0, max_iter=100,\n", " multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,\n", " refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lr = LogisticRegressionCV()\n", "lr.fit(X_train_2, y_train)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.10093102, 0.1042197 , 0.09431205, 0.09652843, 0.09709429,\n", " 0.09902616, 0.11100235, 0.09662288, 0.09340919, 0.09112994,\n", " 0.10012606, 0.09821902, 0.09383543, 0.09553507, 0.09147579,\n", " 0.09649564, 0.08965686, 0.09196857, 0.09684012, 0.09020758,\n", " 0.09839592, 0.09513808, 0.1044603 , 0.10028703, 0.09671603,\n", " 0.09725639, 0.10912207, 0.10590827, 0.10275491, 0.10275279,\n", " 0.10607316, 0.09803225, 0.10319411, 0.0926599 , 0.09702325,\n", " 0.09524124, 0.088848 , 0.09960894, 0.09053403, 0.09010282,\n", " 0.0990557 , 0.0987997 , 0.10538386, 
0.09584352, 0.09633964,\n", " 0.09001206, 0.09181887, 0.08995095, 0.10130986, 0.10827168,\n", " 0.10064992, 0.09771002, 0.08922346, 0.10078438, 0.10173442,\n", " 0.1052274 , 0.09743252, 0.09597317, 0.08932798, 0.10033609,\n", " 0.10346122, 0.10145004, 0.09017084, 0.10348697, 0.09335995,\n", " 0.09795824, 0.10166729, 0.09306547, 0.09538575, 0.10997592,\n", " 0.09352845, 0.09860336, 0.1059772 , 0.09583408, 0.09823145,\n", " 0.09995048, 0.10224689, 0.10065135, 0.10208938, 0.11257989,\n", " 0.09956423, 0.11515946, 0.09798322, 0.10092449, 0.10150098,\n", " 0.10275192, 0.09180693, 0.0990442 , 0.10016612, 0.10145948,\n", " 0.09848122, 0.10322931, 0.09913907, 0.08925477, 0.09950337,\n", " 0.10277594, 0.09249331, 0.0954106 , 0.1053263 , 0.09849884]])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lr.coef_" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y_pred = lr.predict(y_pred_df)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.53658536585365846, 0.89636363636363636)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics.f1_score(y_pred, y_test), metrics.accuracy_score(y_pred, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using sklearn" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.56250000000000011, 0.89818181818181819)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred_all_ = np.zeros((X_test.shape[0], clf.n_estimators))\n", "X_train_3 = np.zeros((X_train.shape[0], clf.n_estimators))\n", "\n", "for i in range(clf.n_estimators):\n", "\n", " X_train_3[:, i] = clf.estimators_[i].predict(X_train)\n", " y_pred_all_[:, i] = clf.estimators_[i].predict(X_test)\n", " \n", 
"lr = LogisticRegressionCV()\n", "lr.fit(X_train_3, y_train)\n", "\n", "y_pred = lr.predict(y_pred_all_)\n", "metrics.f1_score(y_pred, y_test), metrics.accuracy_score(y_pred, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "vs using only one dt" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.44510385756676557, 0.82999999999999996)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dt = DecisionTreeClassifier()\n", "dt.fit(X_train, y_train)\n", "y_pred = dt.predict(X_test)\n", "metrics.f1_score(y_pred, y_test), metrics.accuracy_score(y_pred, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4: Boosting\n", "\n", "While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight (some boosting algorithms actually decrease the weight of repeatedly misclassified examples, e.g., boost by majority and BrownBoost). Thus, future weak learners focus more on the examples that previous weak learners misclassified. (Wikipedia)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "![](images/OurMethodv81.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adaboost\n", "\n", "AdaBoost (adaptive boosting) is an ensemble learning algorithm that can be used for classification or regression. 
Although AdaBoost is more resistant to overfitting than many machine learning algorithms, it is often sensitive to noisy data and outliers.\n", "\n", "AdaBoost is called adaptive because it uses multiple iterations to generate a single composite strong learner. AdaBoost creates the strong learner (a classifier that is well-correlated to the true classifier) by iteratively adding weak learners (a classifier that is only slightly correlated to the true classifier). During each round of training, a new weak learner is added to the ensemble and a weighting vector is adjusted to focus on examples that were misclassified in previous rounds. The result is a classifier that has higher accuracy than the weak learners’ classifiers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Algorithm:\n", "\n", "* Initialize all weights ($w_i$) to 1 / n_samples\n", "* Train a classifier $h_t$ using weights\n", "* Estimate training error $e_t$\n", "* set $alpha_t = log\\left(\\frac{1-e_t}{e_t}\\right)$\n", "* Update weights \n", "$$w_i^{t+1} = w_i^{t}e^{\\left(\\alpha_t \\mathbf{I}\\left(y_i \\ne h_t(x_t)\\right)\\right)}$$\n", "* Repeat while $e_t<0.5$ and $t= 1).astype(np.int)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.51051051051051044, 0.85181818181818181)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics.f1_score(y_pred, y_test.values), metrics.accuracy_score(y_pred, y_test.values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using sklearn" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.ensemble import AdaBoostClassifier" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,\n", " 
learning_rate=1.0, n_estimators=50, random_state=None)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf = AdaBoostClassifier()\n", "clf" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.29107981220657275, 0.86272727272727268)" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf.fit(X_train, y_train)\n", "y_pred = clf.predict(X_test)\n", "metrics.f1_score(y_pred, y_test.values), metrics.accuracy_score(y_pred, y_test.values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gradient Boosting" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "GradientBoostingClassifier(init=None, learning_rate=0.1, loss='deviance',\n", " max_depth=3, max_features=None, max_leaf_nodes=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=100,\n", " presort='auto', random_state=None, subsample=1.0, verbose=0,\n", " warm_start=False)" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.ensemble import GradientBoostingClassifier\n", "\n", "clf = GradientBoostingClassifier()\n", "clf" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.52892561983471076, 0.89636363636363636)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf.fit(X_train, y_train)\n", "y_pred = clf.predict(X_test)\n", "metrics.f1_score(y_pred, y_test.values), metrics.accuracy_score(y_pred, y_test.values)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": 
"text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }