{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cross-validation for parameter tuning, model selection, and feature selection ([video #7](https://www.youtube.com/watch?v=6dbrR-WymjI&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=7))\n", "\n", "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n", "\n", "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Agenda\n", "\n", "- What is the drawback of using the **train/test split** procedure for model evaluation?\n", "- How does **K-fold cross-validation** overcome this limitation?\n", "- How can cross-validation be used for selecting **tuning parameters**, choosing between **models**, and selecting **features**?\n", "- What are some possible **improvements** to cross-validation?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Review of model evaluation procedures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Motivation:** Need a way to choose between Machine Learning models\n", "\n", "- Goal is to estimate likely performance of a model on **out-of-sample data**\n", "\n", "**Initial idea:** Train and test on the same data\n", "\n", "- But, maximizing **training accuracy** rewards overly complex models which **overfit** the training data\n", "\n", "**Alternative idea:** Train/test split\n", "\n", "- Split the dataset into two pieces, so that the model can be trained and tested on **different data**\n", "- **Testing accuracy** is a better estimate than training accuracy of out-of-sample performance\n", "- But, it provides a **high variance** estimate since changing which observations happen to be in the testing set can significantly change testing accuracy" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# added empty cell so that the cell numbering matches the video" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_iris\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn import metrics" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# read in the iris data\n", "iris = load_iris()\n", "\n", "# create X (features) and y (response)\n", "X = iris.data\n", "y = iris.target" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9736842105263158\n" ] } ], "source": [ "# use train/test split with different random_state values\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)\n", "\n", "# check classification accuracy of KNN with K=5\n", "knn = KNeighborsClassifier(n_neighbors=5)\n", "knn.fit(X_train, y_train)\n", "y_pred = knn.predict(X_test)\n", "print(metrics.accuracy_score(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** What if we created a bunch of train/test splits, calculated the testing accuracy for each, and averaged the results together?\n", "\n", "**Answer:** That's the essense of cross-validation!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Steps for K-fold cross-validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Split the dataset into K **equal** partitions (or \"folds\").\n", "2. Use fold 1 as the **testing set** and the union of the other folds as the **training set**.\n", "3. Calculate **testing accuracy**.\n", "4. Repeat steps 2 and 3 K times, using a **different fold** as the testing set each time.\n", "5. Use the **average testing accuracy** as the estimate of out-of-sample accuracy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Diagram of **5-fold cross-validation:**\n", "\n", "![5-fold cross-validation](images/07_cross_validation_diagram.png)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# added empty cell so that the cell numbering matches the video" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# added empty cell so that the cell numbering matches the video" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration Training set observations Testing set observations\n", " 1 [ 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [0 1 2 3 4] \n", " 2 [ 0 1 2 3 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [5 6 7 8 9] \n", " 3 [ 0 1 2 3 4 5 6 7 8 9 15 16 17 18 19 20 21 22 23 24] [10 11 12 13 14] \n", " 4 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 22 23 24] [15 16 17 18 19] \n", " 5 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] [20 21 22 23 24] \n" ] } ], "source": [ "# simulate splitting a dataset of 25 observations into 5 folds\n", "from sklearn.model_selection import KFold\n", "kf = KFold(n_splits=5, shuffle=False).split(range(25))\n", "\n", "# print the contents of each training and testing set\n", "print('{} {:^61} {}'.format('Iteration', 'Training set observations', 'Testing set observations'))\n", "for iteration, data in enumerate(kf, start=1):\n", " print('{:^9} {} {:^25}'.format(iteration, data[0], str(data[1])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Dataset contains **25 observations** (numbered 0 through 24)\n", "- 5-fold cross-validation, thus it runs for **5 iterations**\n", "- For each iteration, every observation is either in the training set or the testing set, **but not both**\n", "- Every observation is in the testing set **exactly once**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing cross-validation to train/test split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Advantages of **cross-validation:**\n", "\n", "- More accurate estimate of out-of-sample accuracy\n", "- More \"efficient\" use of data (every observation is used for both training and testing)\n", "\n", "Advantages of **train/test split:**\n", "\n", "- Runs K times faster than K-fold cross-validation\n", "- Simpler to examine the detailed results of the testing process" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross-validation recommendations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. K can be any number, but **K=10** is generally recommended\n", "2. For classification problems, **stratified sampling** is recommended for creating the folds\n", " - Each response class should be represented with equal proportions in each of the K folds\n", " - scikit-learn's `cross_val_score` function does this by default" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross-validation example: parameter tuning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Select the best tuning parameters (aka \"hyperparameters\") for KNN on the iris dataset" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import cross_val_score" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 0.93333333 1. 1. 0.86666667 0.93333333\n", " 0.93333333 1. 1. 1. ]\n" ] } ], "source": [ "# 10-fold cross-validation with K=5 for KNN (the n_neighbors parameter)\n", "knn = KNeighborsClassifier(n_neighbors=5)\n", "scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')\n", "print(scores)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9666666666666668\n" ] } ], "source": [ "# use average accuracy as an estimate of out-of-sample accuracy\n", "print(scores.mean())" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.96, 0.9533333333333334, 0.9666666666666666, 0.9666666666666666, 0.9666666666666668, 0.9666666666666668, 0.9666666666666668, 0.9666666666666668, 0.9733333333333334, 0.9666666666666668, 0.9666666666666668, 0.9733333333333334, 0.9800000000000001, 0.9733333333333334, 0.9733333333333334, 0.9733333333333334, 0.9733333333333334, 0.9800000000000001, 0.9733333333333334, 0.9800000000000001, 0.9666666666666666, 0.9666666666666666, 0.9733333333333334, 0.96, 0.9666666666666666, 0.96, 0.9666666666666666, 0.9533333333333334, 0.9533333333333334, 0.9533333333333334]\n" ] } ], "source": [ "# search for an optimal value of K for KNN\n", "k_range = list(range(1, 31))\n", "k_scores = []\n", "for k in k_range:\n", " knn = KNeighborsClassifier(n_neighbors=k)\n", " scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')\n", " k_scores.append(scores.mean())\n", "print(k_scores)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Cross-Validated Accuracy')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEGCAYAAABy53LJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAA/Z0lEQVR4nO3deXhb93Xg/e8huIAiCUALRVAibMm2vMiWSCeqm6Rp1mZP68Sdtnbaxs00TTwTZ5tpO560nSTt276ZNkszTRq/Tuu8dieJmybxxO14stRt4i5pbDkGLNmWbVm2BUoktQLgvuHMH/deCoIA8AIESAI4n+fhQ/Li3osfRBGHv+0cUVWMMcYYv1rWugHGGGPqiwUOY4wxZbHAYYwxpiwWOIwxxpTFAocxxpiytK51A1bDli1bdMeOHWvdDGOMqSuPPPLIKVXtzT/eFIFjx44d7N+/f62bYYwxdUVEXih03IaqjDHGlMUChzHGmLJY4DDGGFMWCxzGGGPKYoHDGGNMWWoaOETkjSLylIgcFpHbCjy+UUTuFZHHROQhEbkm57EPi8jjInJQRL4qIkH3+CYR+Z6IPON+3ljL12CMMeZ8NQscIhIAPg+8CdgN3CQiu/NO+wgQV9W9wDuBz7rXbgc+AOxT1WuAAHCje81twAOqugt4wP3eGGPMKqllj+M64LCqHlHVOeAe4Pq8c3bjvPmjqoeAHSLS5z7WCnSKSCuwATjuHr8euMv9+i7gbTV7BaYpPT02zr8cPrXWzaiabFb564ePMjW3UNV7fu3hJNNzi1W7p6kftQwc24FkzvfD7rFcCeAGABG5DrgYGFDVY8AngaPACJBW1e+61/Sp6giA+3lroScXkfeIyH4R2X/y5MkqvSTTDP7o/id531d+TKPUqnnk6Fn+yzcO8M0fH6vaPR96/gy//Y3H+O4To1W7p6kftQwcUuBY/m/iJ4CNIhIH3g88Ciy48xbXAzuBbUCXiPxKOU+uqneo6j5V3dfbe8GOeWMKUlUSyRSpqXmePz211s2pivjRlPM5marePd17HU/NVO2epn7UMnAMA7Gc7wc4N9wEgKpmVPVdqjqEM8fRCzwH/AzwnKqeVNV54JvAy9zLxkSkH8D9fKKGr8E0maNnpjg7NQ9AoopvtGspPpwCqvt6vHuNZSxwNKNaBo6HgV0islNE2nEmt+/LPUFEIu5jAO8GHlTVDM4Q1UtEZIOICPBa4En3vPuAm92vbwa+VcPXYJpM7l/l1fwLfS15b/KHT04wPjNf1XuOpKercj9TX2oWOFR1AbgV+A7Om/7XVPVxEblFRG5xT7sKeFxEDuGsvvqge+2PgK8DPwYOuO28w73mE8DrROQZ4HXu98ZURTyZItjWwosv3tgQgePUxCzDZ6f56V1bUIUDw+kV3/NEZobjaaenMZqZXfH9TP2paXZcVb0fuD/v2O05X/8Q2FXk2o8CHy1w/DROD8SYqkskU+zZHuZFF23kS//yPHMLWdpb63efrNcz+NWXXMw/PXOK+HCKl122ZUX39ALqJVu6GEvbUFUzqt/fCGOqbH4xy8HjGQYHIgzGIswtZjk0mlnrZq1IIpmiReDlu7awc0tXVeY5EsMpAi3Ca67cysmJWRazjbH6zPhngcMY16GRceYWsgzGnMAB9T/P8WgyxeV9PWxob2VwIFyV1xNPprgy2sOOLV0sZpVTEzZc1WwscBjj8lYfDcUibAsH6e3pqOvA4S0tvvaiCOC8rrHMLKMrGF7KZpXHkmmGYhGioSAAIzZc1XQscBjjih9NsbmrnYGNnYgIgwORug4cz52aJDOzwOBABCCnF3W24nseOTXB+OwCg7EI0bATOFYSiEx9ssBhjCsxnGIoFsFZAQ7XXhThyMlJ0tPVWcK62hJeD8rtcezeFqItIMSTla+s8q4dikXoc3sctpej+VjgMAbIzMzz7MmJpb/KgaW/1KuxhHUtJJJpNrQH2LW1B4CO1gC7+0MrmiBPJFN0d7RyaW83m7vaaQsIoxY4mo4FDmNwgoMq5wWOPQNhYGVDO2vp0WSKa7aHCbScy/4zGIvw2HCq4pVQcXe5cqBFaGkRtvYEbUluE7LAYQznVk8NusECINzZxqW9XSsa2lkrswuLPHk8w7U5gRCcIabJuUWePTlR9j1n5hd5ciRzXnCNhoPW42hCFjiMwRmC2bmli8iG9vOOD8acCfJ6y5T75Mg4c4vZ897kgRUtM35iJMNCVhnKDRyhoE2ONyELHKbpqSrxZOq83oZnKBbh1MTsUoqNeuHNY+QHjp2bu+gJtlYUOLwsu7mBoy/k9DjqLbCalbHAYZreaGaGE+Oz570herxj9ZYpN5FM0dvTwTZ3yaynpUUYikUqej2J4RTRUHBpGS5ANNzB1Nwi47PVKxJl1j8LHKbpFfvrHODKaIj21pa628/h9KDOLS3ONTgQ4dDoODPz5VXvSyRTDMbO75UtLcmtsx6ZWRkLHKbpPZpM0RYQdm8LXfBYe2sLV28L1VXgSE/Nc+TU5NKO8XxDsQiLWeXgMf+T/mcn53j+9NQFwbU/3AlgE+RNxgKHaXqJZIrd/SE6WgMFHx8ciHBgOM3CYnaVW1aZx46lgHP7UPLtjXnLjFO+75nISceSy9KONCcLHKapLWaVA8PpgsNUnqFYhOn5RZ45Uf4S1rXgTWLvKTDZD7C1J8j2SGdZgSOeTCECe7aff8+toQ7AhqqajQUO09QOn5hgcm6x4MS4p94myBPDKS7t7SLc2Vb0nKFYZKkX4eueyRSX9XbTEzz/nsG2ABs3tNlQVZOxwGGaWqmJcc/FmzcQ7myri3mOpaXFJV4PwGAsTPLMNKd9pERXVRLD6aLBtS8UtHxVTcYCh2lqjyZT9ARb2bm5q+g5IrK0EXC9O5aa5tTEXMkeFJyb//DT60iemebM5FzRYGS7x5uPBQ7T1BJJJyNuS8uFy1ZzDcUiPD02ztTc+t6vkMjJXlvKnoEwLYKvdCrxIhPjnv6w7R5vNhY4TNOanlvkqbHxoquPcg3FwmR1/WfKjSfP0t7awpXRC5cW59rQ3srlfT2+5m0SyRQdrS1cEe0p+HhfKMipiTnmFupj1ZlZOQscpmkdPJ5mMS/3UjHlDO2spUQyzdXbnE2Ly7n2ImeCfLl0IXE3y25boPA9vSW5J8at19EsLHCYpuX9tb03VnjZaq7N3R3ENnUuDQWtRwuLWQ4cS/vqQYETDFNT87xweqroOfOLWQ4eKz4xDtAXtoJOzcYCh2lajyZTbI90srUnuPzJsO5LyT49NsH0fOmlxbm8ye5SvainRseZXbgwy24ur8cxml5+hZZpDBY4TNPyJsb9GopFOJaaXrdDMsV2dxdzeV8PG9oDPOpuGCzEC5T5dT1y9Ye93ePTvp7X1D8LHKYpnZqYZfjs9AVJ+0o5txFwfQ5XxY+mCHe2cfHmDb7OD7QI12wPl+xxJJIpNnW1M7Cxs+g54c42OlpbbKiqiVjgME1paeOfz/kAgKu3OSVT1+sO8sSws/GvUEbcYoZiER4/nim6IsqrU1LqniLi7uWwoapmYYHDNKVEMkWLFM/nVEhne4Aroz3rcmXV5OwCT4+NlzX0Bk7gmFvIcmg0c8Fj4zPzHD45wVBs47L36QtZ7fFmYoHDNKX4cNod428t6zpvB3k2u74q3h04liarzn6TcgyWyMN14FgaVXwN50VDtnu8mdQ0cIjIG0XkKRE5LCK3FXh8o4jcKyKPichDInKNe/wKEYnnfGRE5EPuYx8TkWM5j725lq/BNB5VJZFMFa1XUcpQLML4zALPnZ6sfsNWoJKhN4Bt4SC9PR08WiBwxMu4p5d2xErINoeaBQ4RCQCfB94E7AZuEpHdead9BIir6l7gncBnAVT1KVUdUtUh4MXAFHBvznWf8R5X1ftr9RpMY3r+9BTp6fmy32Rh/WbKTQyniG3qZHN3R1nXiQiDA4VLySaSKXZs3sDGrvZl7xMNBZlbyHJ2ar6s5zf1qZY9juuAw6p6RFXngHuA6/PO2Q08AKCqh4AdItKXd85rgWdV9YUattU0ET8ZcYu5tLebrvbAutvPET+aqigQgjO89ezJSTIz57/pJ5Kl65Tk8uqQW86q5lDLwLEdSOZ8P+wey5UAbgAQkeuAi4GBvHNuBL6ad+xWd3jrThEpOHMnIu8Rkf0isv/kyZOVvgbTgOLJFBvaA1zeVzj3UimBFmFvkb/Q18qJzAzH0zNlT4x7vMnvx3KWGY+mZxjN+L/nUu1xm+doCssGDhH5pIhcXcG9C63fyx8A/QSwUUTiwPuBR4Gl9KMi0g78HPA3Odd8AbgUGAJGgE8VenJVvUNV96nqvt7e3gqabxqVl3spsExG3GIGYxGeGMkwM79Y5ZZVxuv9VBo4vJVluavF4mX2ypZ6HBY4moKfHsch4A4R+ZGI3CIifpdtDAOxnO8HgOO5J6hqRlXf5c5lvBPoBZ7LOeVNwI9VdSznmjFVXVTVLPBFnCExY3yZXVjkieOZit9kwRnamV9Unhy5cAnrWkgMpwi0CFdvK29FlSfc2cYlvV3n7SCPJ1O0BYTd/aWz7Hq29nQgYkNVzWLZwKGqf6GqP4Xzxr4DeExEviIir17m0oeBXSKy0+053Ajcl3uCiETcxwDeDTyoqrm/jTeRN0wlIv05374dOLjcazDGc2hknLnF7AoDhzO0s16GqxLJNFdGe+hsD1R8jyF3mbG3KiqRTHFVf4hgm797tgVa2NLdYYGjSfia43BXSF3pfpzCmZv4TyJyT7FrVHUBuBX4DvAk8DVVfdzttdzinnYV8LiIHMLpXXww5zk3AK8Dvpl36z8WkQMi8hjwauDDfl6DMXBuOKaSiXFPNBykL9SxLibIs1lnafFKXg84gePUxCwj6RkWs1pWll2P7eVoHsvufhKRT+PMMzwA/JGqPuQ+9N9F5KlS17pLZe/PO3Z7ztc/BHYVuXYK2Fzg+K8u12ZjiokfTdHb08G2sL+MuMUMxSIk1kFRpyOnJhmfXVhRDwrOzY/Ekyku29rNxOxC2cGoLxRk+GzxFO2mcfjpcRwE9qrqe3OChsfmF0xdiQ87y1bLyedUyGAswnOnJklNzVWpZZVJrHBi3HNlNER7oIVEMlXxZHs03GE9jibhJ3CcBdq8b9x5ibcBqOra/8lljE/p6XmOnJwsOy1HIUNLFQHX9lcgnkzR1R7g0t7uFd2nvbWF3dtCxJMpEskUPcFWLtnSVdY9oqEgqan5dbPazNSOn8Dx0dwAoaop4KM1a5ExNfLYUr2K5ZP2LWfPQBiRtZ8gTwyn2DsQqXhpca6hWIQDx9I88sJZBgcitJR5z76QbQJsFn4CR6FzyssMZ8w64L3Jl5MRt5ieYBuX9Xav6QT5zPwiT45kVjwx7hmKRZiaW+TQ6HhZdUo8/WGnZocNVzU+P4Fjv4h8WkQuFZFLROQzwCO1bpgx1RZPprikt4twZ9vyJ/swGHN2kK9VYr8nRjLML2pVht7g/JVmlaQviYadPFm2e7zx+ek5vB/4PeCvcXaDfxd4Xy0bZRqTqqJK2UMg1XrueDLNKy7fUrV7DsUifP2RYQ4eyyztnF5NP3z2tNuOlQ+9AezYvIFwZxvp6fmKJttXc6gqm1VEWPEiB1OZZQOHqk4CF6REN6Zc//3bT/HDZ0/xrVtfvurPfTw9w6mJ2YoTARbipWX/2c/9c9XuWa5oKFi1oCUiXHtRhGfGJtgaKv+ePcE2utoDqzJU9bvfOshIapovvcsWdq4FP/s4eoHfBq4Glv43qepratgu04C+/9QJDo2Oc2Zyjk0+UnVXU9xNp7HSZau5dveH+Nw7rl3TVOLXbPOXEsSvP7j+GsZnFpY/sYi+cHBVhqq+f+gEc4tW+2Ot+Bmq+jLOMNVbgVuAmwFLN2vKMjXnlDYFZyXQq6/YuqrPnxhO0R5o4SqfuZf8EBHeundb1e63HsQ2bVjR9f3hICM1HqrysgGLwPxilraAFTJdbX7+xTer6l8C86r6A1X998BLatwu02AOHsvgVVtdiyWs8WSK3dtCtLfam0wtrUbtcW/vjCqcGJ+t6XOZwvz8Fnn98BEReYuIXMuFNTOMKSmePAs4Y/KrvYR1YTHLgeF0VYepTGHRUJAT47M1rcnu/V8C2zOyVvwEjv/HTaX+n4HfBP4CSyxoypRIpolt6uQVl29Z9SWsz5yYYHp+0QLHKoiGgyxklVOTtesJJJJpNriZgC1wrI2SgcPNirtLVdOqelBVX62qL1bV+0pdZ0y+eNLJETUYi3B2ap6jZ1YvGd5KSsWa8ixVAkzXJnB42YC9OTLbbLg2SgYOVV3EyYxrTMVOjM9wLDXNUCxyXhbW1RJPpgh3trFj88omfs3yom7gGElP1+T+XjbgV17RS3tri202XCN+hqr+VUQ+JyI/LSIv8j5q3jLTMLxa1kOxCJf39RBsayGRXL3kgHG3XoVtFqu9/nBta497vcdrYxGn/ocNVa0JP8txX+Z+/v2cYwrYPg7jSzx5rrRpW6CFa7aFz5vgrKXJWWcZ8Ot3963K8zW7zd0dBFqkZkNI8WSK7o5WLuntJhq2wlFrxc/O8eVKxBpTUmI4dV5p06FYhLv/7YVVWYN/8FiarMKQu8vb1FagRdja08FojeY4nGzAYQItsiYr9IzDz87x/1bouKr+fqHjxuTyJjPfOnhuo9xgLMLcPz/HU6PjXLO9Ogn6ilkqFVvFVCOmtL5QbXaPe9mA3/3TlwDOCq7Rx2dQVRuGXGV+/tybzPlYxKkNvqOGbTIN5LnTk2Rmzi9t6n396Cr8tRhPpoht6mRzd0fNn8s4oqFgTSbHvWzA3h8BfaEgcwtZUmuY8qVZLRs4VPVTOR9/CLwK2F7zlpmGUKi06cDGTjZ3ta/KDvJEMm29jVUWDQcZy1R/qGppYtwddvRWcNk8x+qrZIB5A3BJtRtiGlOh0qYiwmAsUvPx6dxlwGb1RMNBJmYXmJitPFliIfFkimgouLRXxKv/YYFj9S0bOETkgIg85n48DjwFfLb2TTONIJEsXNp0KBbh2ZMTZGZqN8yQyFkGbFZPtEZ1ORLJ1HmVCaNexUFbkrvq/PQ43gr8rPvxemCbqn6upq0yDWF2YZEnipQ2HYxFUIWDw7Xbz5HIWQZsVs/S7vEq9gRSU3M8f3rqvKJVW3s6ELHAsRb8BI5+4IyqvqCqx4CgiPxkjdtlGsATx4uXNh10637XcoI8MZziir5zy4DN6vAKS1UzvXp8KW3Muf9LbYEWNnd12O7xNeAncHwBmMj5fso9ZkxJ5ybGLyxtGtnQzs4tXTWbIM9mlXgyZfs31kC0Bj2ORDKNCOzJW74dDXfYHMca8BM4RHNSmapqFn87zk2TiydT9IU6ipY2HRwIE69RptznTk8yPrPAkK2oWnWd7QHCnW1VHUKKJ8+ya2s3PcG2845b2pG14SdwHBGRD4hIm/vxQeBIrRtm6l9imRoYQ7EIJ8Zna/IX41KpWOtxrIloqHrpQFSVxHDhZdWWdmRt+Akct+DkqzoGDAM/Cbynlo0y9S81NcdzpyZLpjL3HqvFcFVi+MJlwGb1VLP2+PDZac5MzhX8vxQNBUlNzTMzv1iV5zL++NkAeEJVb1TVrarap6rvUNUTfm4uIm8UkadE5LCI3Fbg8Y0icq+71PchEbnGPX6FiMRzPjIi8iH3sU0i8j0Recb9fOEAullzXnnPUkNFV/WHaAtITSbIE8kUe9ycRmb1RUMdVRtCerTAJlJPLVZwmeX52cdxl4hEcr7fKCJ3+rguAHweJ0XJbuAmEdmdd9pHgLiq7gXeibs/RFWfUtUhVR0CXowzIX+ve81twAOqugt4wP3erDOJZMqZzBwovhQ22BZgd3+o6j2OmXlnGXChSXmzOqKhICcnZplfzK74Xolkio7WFq6I9lz4POHa7BkxpfkZqtqrqinvG1U9C1zr47rrgMOqekRV54B7gOvzztmN8+aPqh4CdohIfv7r1wLPquoL7vfXA3e5X98FvM1HW8wqSyRTXNZ74WRmvsFYhAPDaRarWKP6yZHiy4DN6oiGO1GFk+MrTz2SSKa4Znu4YCZlSzuyNvwEjpbc4SAR2YS/VVXbgWTO98NcmOMqAdzg3vc64GJgIO+cG4Gv5nzfp6ojAO7nrYWeXETeIyL7RWT/yZMnfTTXVIuqLhVPWs7gQITJuUUOn5hY9ly/rFTs2qtWOpD5xSwHjhVfZNFnPY414SdwfAqnCuAfiMgfAP8K/ImP6woNLuf/WfkJYKOIxIH3A48CSwluRKQdp3Tt3/h4vvOfSPUOVd2nqvt6e3vLvdyswPDZaU5PzvlK9eGteqrmcJW3DLjfTUlhVt+52uMre0N/anSc2YVs0T8Cejpa6WoPWI9jlfmZHL8b+HlgDDgB3OAeW84wEMv5fgA4nnfvjKq+y53LeCfQCzyXc8qbgB+r6ljOsTER6QdwP/uaqDerJ15iMjPfzs1d9ARbqzpBXmzpplk91RpC8v4vXVvk/5KIVHUFl/HHV3ZcVX3CzU91P3CDiBz0cdnDwC4R2en2HG4E7ss9QUQi7mMA7wYeVNVMzik3cf4wFe49bna/vhn4lp/XYFZPqcnMfC0twlAsUrUeh7cM2PZvrK1NXe20B1pWPISUSKbY1NXOwMbivUfbBLj6/Kyq6heRD4nIQ8DjQADnDb0kVV0AbgW+AzwJfE1VHxeRW0TkFve0q4DHReQQTu/igznPuwF4HfDNvFt/AnidiDzjPv6J5dpiVldiuPhkZiGDAxGeGhtnem7la/H9LAM2tScibA2tPB1IYjjF4EC4ZIW/aKg29T9McUUnuUXkN3ACxADwNZwewbdU9eN+b66q9+P0UnKP3Z7z9Q+BXUWunQI2Fzh+GmellVmHvMnMd1x3se9rBmMRFrPKweNpfmLHphU9v59lwGZ19IdX1hMYn5nnmRMTvGXPtpLneUNV2azSYvt2VkWpPwk/j9O7eIeq/q6qPsaFk9vGnOfpsXFm5rNlDRV5GU+rMVzldxmwqb2V1h4/cCyN6vkZcQvpDwdZyCqnJq3XsVpKBY5tOHsvPu3u/v4DwH4bTUlLE+NlDBVt7QmyPdK54gnycpYBm9rz8lVVmsRyKZX6Mv+Xzq3gssCxWooGDlU9papfUNVX4AwNpYETIvKkiPzRqrXQ1BVvMjO2qbylsIOx8Ip7HN4yYAsc60M0HGRmPkt6urIqj4lkih2bN7Cxq73kebYJcPX5XVU1rKqfVNUX4+zUttBuCkok08tOZhYyFIswfHaaUxOV/9dabummWV19K3xDTyTTvv4IWEo7YoFj1fhb9pLDzSPle4LcNI+J2QWePjFe0V/83nDESnod5SwDNrXXv4Jd3aPpGUYzM77242zp7iDQIivebGj8KztwGFPMgWFnMtPPxr98ewbCtMgKA0eZy4BNba0kc+3SXJmPRRaBFqG3u6OqpWpNafYbZqomMZwClp/MLGRDeyuX9/UQd/dhlMtbBmw7xtePpaGqCiatE8MpWluE3f0hX+dHbff4qiq1j+NFpS5U1R9XvzmmnsWPprjYx2RmMUOxCP/n4CiqWvYcibcMeLmlm2b1tLe2sLmrvaK5h/jRFFf1hwi2BXydHw0FOXyyeokyTWmlstx+yv0cBPbhZLIVYC/wI+DltW2aqTeJ4RTX7ax8A99QLMI9Dyd5/vQUO7d0lffcSaencq3V4FhX+kJBRtPTZV2zmFUOHEvz9mvzk2kXFw0H+ZfDp8ptnqlQqeW4r1bVVwMvAC9yM82+GKcWx+HVaqCpD2OZGUbS/iYzi/Em1ePJs2VfG0+eZeOGtrKXAZva6g8HGS0zHcizJyeYmF0oa5FFXyjI+OwCE7MLy59sVszPHMeVqnrA+0ZVDwJDNWuRqUvxKtTA2LW1m862wFLvoRze0s1yh7hMbVWSubac7MqepfofNkG+KvwEjidF5C9E5FUi8koR+SJO0kJjliSSzmTm1dv8TWYW0hpoYc9AeOmNwy9vGXAlq7lMbUVDQc5MzjG74D+BZSKZoqejlUvKGK6Mhpyepk2Qrw4/geNdOFlxPwh8CHjCPWbMkniyvMnMYoZiEZ44ninrjcZbBmw7xtcfb1f3iTKGq+LJFHtj4bISFlrt8dXlp5DTDHA7cJuqvl1VP+MeMwaAbFZ5bLh4ec9yDMUizC1mOTQy7vsabxmwpVJff7zSrn73WMzML3JotPzeo6UdWV1+6nH8HBAHvu1+PyQi95W8yDSVSiYzizk3QZ7yfc1KlwGb2in3Df3gsTSLWS17kUVne4BQsNWGqlaJn6GqjwLXASkAVY0DO2rWIlN3zk1mrnwPxbZwkC3dHWXtIHeK/URW/Nym+rwhJL/pQCqZGM99Lts9vjr8BI4FVa1sO69pColhbzKze8X3EnFKycbd4afleMuAbWJ8fQoFW+lsC/jucSSG02wLB9nq9lTKsdL6H8Y/P4HjoIi8AwiIyC4R+TPgX2vcLlNHKpnMLGUoFubIyUnSU8un467GMmBTOyJCNBz0HTjiybMV/yxXWnHQ+Fdq57jn/cDv4KRS/wpODfE/qGWjGtm/HTnN3T98ngpr26xLh0bGec8rLqna/bw3jv/w5UcId5auHfbcqckVLwM2tdUX6uDfnj3Nf/ifj5Q8TxWSZ6b55Z/0X3Y4VzQU5NTELAuLWVorSHT590+MkZ6e5+dfPFDR8xfyvSfGmJpb4Poh/7vg64GfwPEWVf0dnOABgIj8AvA3NWtVA7vnoaP8/RMn2LFlw1o3pWou7+vhzXv6q3a/F1+8kZdcsolTE7O+6nO84ycvWvEyYFM7b927jbt/+DzP+sgltWd7mNft7qvoefrCQbIKJydm6Q+Xn0HgM3//NGOZGW540faqbST99PeeZnZ+sSkDx3/lwiBR6JjxYSQ9w2AszN/c8rK1bsq6taG9lXve89K1boapkl95ycX8yksq60WUw1vBNZKeKTtweMuAF7PKsdQ0AxtX/ofd1NwCT4+N09HaUlHizvWsVHbcNwFvBraLyP/IeSgEWEKYCo1lZthjK4CMqbpztcfLn+fwlgGDM29WjcBx8FiGxawyNbfI+OwCoWDpYdd6Umog8DiwH5gBHsn5uA94Q+2b1nhUldHMDNFQx1o3xZiG07+CErLeIovWFllRMbFcufdptOqERXscqpoAEiLyFVWtrNq8OU9meoGZ+ezSX0bGmOrZ1NVOe6ClosDhLQPuCwcrSrJZSO6S8tHMDLv6GqeksZ+lBztE5Osi8oSIHPE+at6yBuT9h/Y2RRljqkdE2BrqqOiv+3jyLEMXRRiKRThwLM3CYnbF7YkfTbFnu7MpttGWCfsJHF8CvoAzr/Fq4G7gr2rZqEY14ha0iVqPw5iaiIbK3z1+emKW5JlpBgecwDE9v8jTYyurJnhyfJZjqWnecLWzQqwZA0enqj4AiKq+oKofA15T22Y1Jm9Xqw1VGVMbldT/eMytcz8Yiyylrik3tX8+b37jJy/ZTGRDW8MlX/QTOGZEpAV4RkRuFZG3A1tr3K6GNJp29iRY4DCmNqIhZ5e6lrHD9tFkihZx9pBcvHkDkQ1tK54gTwynCLQI12wLE23AVCh+AseHgA3AB4AXA78K3Ozn5iLyRhF5SkQOi8htBR7fKCL3ishjIvKQiFyT81jEnVs5JCJPishL3eMfE5FjIhJ3P97spy3rwWhmhi3d7bS3lr+r1RizvP5wkJn5LJlp/zsGEskUl/f10NXRiogwOBBZStVfqXgyxRV9PXS2B5y6680WOFT1YVWdUNVhVX2Xqt6gqv+23HUiEgA+D7wJ2A3cJCK78077CBBX1b3AO4HP5jz2WeDbqnolMMj5VQc/o6pD7sf9y7VlvRjLzFhvw5ga6iszjbuqkhhOnZckcygW4emxcSYrrF+ezSqJZGopdY6TQ6u8uuvrXakNgH8LFO3vqerPLXPv64DDqnrEvd89wPU4FQQ9u4H/173fIRHZISJ9wDTwCuDX3MfmgLnlXsx6N5KeYZutqDKmZqI5ezmuiC6//PWF01OkpubPS6w4FIuQVThwLM1LLtlcdhuePz1JZmZhqcxAXyjI6clZ5hayDTPaUOpVfBL4FPAczhv5F92PCeCgj3tvB5I53w+7x3IlgBsAROQ64GJgALgEOAl8SUQedWue5xYgvtUd3rpTRDYWenIReY+I7BeR/SdPnvTR3Noby8wsVUQzxlTfUuEodwXjcrwhqdx6LnsHnDf8Suc5lipSxpy3pmg4iCqcGG+c4aqigUNVf6CqPwCuVdVfUtW/dT/eAbzcx70LJWbJ78F8AtgoInGcLLyP4iz7bQVeBHxBVa8FJgFvjuQLwKXAEDCCE9wKtf8OVd2nqvt6e3t9NLe2ZhcWOTM5Z0txjamhrW5WBr9DQ48eTdHZFuDyvnO1ZDZ3d3DRpg0Vr6yKH03R1R7gsq3OPb3f+UaaIPeT5LBXRC7JGXLaCfh5Jx4GYjnfD+CkMVmiqhngXe59Bad38xzOZPywqv7IPfXruIFDVce860Xki8Df+WjLmjuRcf4j2+Y/Y2qnozXA5q72MgpHOZv08tOwD8YiPPL8mYraEB9Os2cgTMCtT7M079JA8xx+Btw+DHxfRL4vIt8H/hFnpdVyHgZ2ichOEWkHbsTJc7XEXTnlFYp+N/CgqmZUdRRIisgV7mOvxZ0bEZHc/N1vx9+w2Zpb2jVuPQ5jaspvJcC5hSyPH88wWKDk8eBAmOPpGU6U2UuYXVjkyeOZ8+ZMVpJDa71atsehqt8WkV3Ale6hQ6q6bOhU1QURuRWn8FMAuFNVHxeRW9zHbweuAu4WkUWcwPDrObd4P/BlN7Acwe2ZAH8sIkM4w17PA+9d9lWuA97OUetxGFNbUZ+VAA+NZphbyC7NReS69qII4Cyrff3VUd/P/eTIOHOLWYZy5kwiG9pob21pjqEqEXmNqv6DiNyQ99ClIoKqfnO5m7tLZe/PO3Z7ztc/BHYVuTYO7Ctw/FeXe971yPuPbMtxjamtvlDQ1/xEYqns8IU9jqu3OUNNieHyAod3zyE38IBbPreCVCjrWakexyuBfwB+tsBjCiwbOMw5o5kZOtsChIJ+ppWMMZWKhoKcmZxjdmGRjtbilSHjyTRbutvZHrmw6FOwLcCV0Z6yJ8jjyRRbezouGJKOhoINlVq9VFr1j7qf31XsHOPfaGaG/nCwoaqAGbMeRcPOyqoTmVlim4oXZIonzzIUixT9nRyKRbgvfpxsVmlp8fd76238y7+nk6495e8F1IFSQ1X/qdSFqvrp6jencY2lbde4Mash6paNHc3MFA0cmZl5nj05ydtK1AIfjEX48o+OcuTU5NLS2lLSU/McOTXJz7944ILH+sNBvvP4TMOUkC21qqpnmQ9ThtHMjE2MG7MKzm0CLD40dMDNiJs7F5HvWndllN/hqnMb/y68Z18oyNxCltRUY9TEKzVU9fHVbEgjy2bV8lQZs0r8BA4vGOzdHil6ziW93XR3tJJIpvh3BXoR+RLJFCKwZ+DCyfZoTg6tjV3tFzxeb5adqRWRIM4y2auBpXc+Vf33NWxXQzkzNcf8olqtcWNWQaizlWBb6RKy8WSKS7Z0Ed7QVvScQIuwZ3vYd6bcxHCKS3u7CQUvvKc37zKanuGq/pCv+61nfjYA/hUQBd4A/ABnB/h4LRvVaGwPhzGrx1v+WixwqCrxZKrgkFK+oYsiPDmSYWZ+seR53j1zc17lKjdr73rnJ3Bcpqq/B0yq6l3AW4A9tW1WYxlbqjV+4bI/Y0z1RcPFl7+OpGc4OT573u7uYgYHIswvKk+MZEqedyw1zamJuaWMuPm29iw/fFZP/AQObzYn5RZaCgM7ataiBmTpRoxZXaV6HOc2/kWWvY/XK4kfTZU8z5szKbQLHaC9tYUt3R0Ns3vcz260O9zU5b+Hk2uq2/3a+DSWnqFFYEt3/U+KGVMPvNrjhfZgxJMp2gMtXNW//OLQaDhINBRcdp4jkUzR3tpSsgZINNzRMENVpfZxPAF8GbhHVc/izG9csloNayQj6Rl6ezouyMBpjKmNaCjI/KJyZmqOLd3nL0qJJ1NctS1Ucld5rsFYeNnNe4lkmmu2hUoWaoqGggyf9VcnZL0r9U52E07v4rsi8iMR+VBeZlrj02hmxoapjFlFxZbkLmaVA8fSS3s0/BiKbeT501OcnSxchHRhMcuBY+llh74aqfZ4qUJOCVX9r6p6KfBBnOp8PxKRfxCR31i1FjaAMdv8Z8yq8ipt5s8pPHNinKm5xYKJDYvxzi02XPX02ATT84vLrtKKhoKkpuaXXaFVD3yNnajqv6nqh4F3AhuBz9W0VQ1mNG09DmNWU7EaGEsT40WWzRayZ3sYEWc4qpBSO8ZzRYsEs3q0bOAQkZ8QkU+LyAvAx4E7uLB2uCliam6BzMyC1Ro3ZhX1dnfQIhcOVcWTaULBVnZu6fJ9r55gG7u2dhNPni34ePxoisiGNi4qkVARzgWORliSW2py/I+AXwLOAvcAP6Wqw6vVsEaxtPnPehzGrJrWgLP89cLAUTh77XIGByI8cOhEwSSFiWFn499y94w20CbAUj2OWeBNqrpPVT+pqsMi8tbValijsD0cxqyNaPj8yeipuQWeHhv3tWM832AswpnJOZJnzl8VNTnr/559DdTjKDU5/nFVfTrv8O/XuD0NxxvPtKEqY1ZXfu3xx49nWMxqRYFjaSNg3gT5gWNpsrr8/AZAT0crG9oDDd/jKKT+E8mvstG0U57dehzGrK7+vNrj3u5vPzvG810R7aGjteWC/Rze93sLZMTNJyJOKpQmDBzvrUkrGthYZoaeYCtdHVYy1pjV1BcKkplZYGpuAXB6CwMbOy/YEOhHW6CFPdvDF9TmiCdTXLRpA5t93jMaCjb2UJVHRH5BRLx99G8QkW+KyItq3K6GYUtxjVkb+ZsAvbKulRqMRTh4LM38YnbpWLn3jIaCjGVmK27DeuGnx/F7qjouIi8HXgfcBXyhts1qHCO2+c+YNRHN2ctxamKW4bPTDJWxfyPfYCzC7EKWp0adqhInMjMcT88w6GOYypObQ6ue+Qkc3jbHtwC3q+q3AMvW55PVGjdmbXi/d2OZmaW5iFKlYpeTX0rW+3xtGfeMhoIsZJVTk/Xd6/ATOI6JyP8H/CJwv4h0+Lyu6S1mlZMTszZUZcwaOLfhbpZ4MkWgRbh6W+XV9wY2drKpq30pCCWGU7S2CFdv89/jWNo9nm78wPGLwHeAN6pqCtgE/FYtG9UoTk3MsphVG6oyZg10d7TS09HKWGaGeDLF5X09bGivfJGKiDA4ED6vx3Flfw/BNn9ZdqFxNgH6CRz9wP9W1WdE5FXALwAP1bJRjcJ2jRuztvrCQY6npkn4LBW7nKHYRg6fnCA9Pc9jyXRZOa/g/HmXeuYncHwDWBSRy4C/BHYCX6lpqxrEiNUaN2ZNRUNB9r9wlszMQtGyruUYjIVRhfsSxxmfXSh7ldaW7g4CLVK0rG298BM4sqq6ANwA/KmbJdfqcviwtGvcehzGrIm+UJAzbh2NYmVdy+H1Wu7+1+cByqrrARBoEXq7O5b+qKxXvmqOi8hNOCnV/8491ubn5iLyRhF5SkQOi8htBR7fKCL3ishjIvKQW9PceywiIl8XkUMi8qSIvNQ9vklEviciz7ifV/6/oUZGMzO0BYTNXbYIzZi1EA07G/M2tAe4bGv3iu8X2dDOjs0beObEBN0drVzSW/49G2H3uJ/A8S7gpcAfqupzIrIT+J/LXSQiAeDzwJuA3cBNIrI777SPAHFV3YsTmD6b89hngW+r6pXAIPCke/w24AFV3QU84H6/Lo2lZ9jaE7yg5rExZnVEw52AU1MjUKXfQ294qtJ7RhugEuCygUNVnwB+Ezjg9giGVfUTPu59HXBYVY+o6hxOavbr887ZjfPmj6oeAnaISJ+IhIBX4MypoKpz7oou3Hvc5X59F/A2H22pyPHUND989nTF14/a5j9j1pS3MGUl+zfyecNVld4zGg42/hyHu5LqGZzew58DT4vIK3zcezuQzPl+mAsLQCVw5k4QketwytMOAJcAJ4EvicijIvIXIuJVXulT1REA9/PWIu1+j4jsF5H9J0+e9NHcC/3ZPzzDe/9qP6qV7fK0WuPGrC2vYNNLdm6u2j2v27npvM/l6gsFGZ9dYHJ2oWptWm1+hqo+BbxeVV+pqq8A3gB8xsd1hfpw+e/AnwA2ikgceD/wKLCAU2DqRcAXVPVaYJIyh6RU9Q63lsi+3t7eci5dMjgQITOzwHOnJsu+VlUZtV3jxqypy7Z284PfehWvuqKy94BCrt4Wdu55eWX39OZd6nm4yk/gaFPVp7xv3BodfibHh4FYzvcDwPHcE1Q1o6rvUtUhnDmOXuA599phVf2Re+rXcQIJwJiI9AO4n0/4aEtFvK5osSL1pYzPLjA1t7j0n8QYszYu3txVdsW/Wt4zGnLmXeo5S66fwPGIiPyliLzK/fgi8IiP6x4GdonIThFpB24E7ss9wV055S05ejfwoBtMRoGkiFzhPvZa4An36/uAm92vbwa+5aMtFdm1tYcN7YGiRepL8cYwrcdhjMnVCLXH/ey/vwV4H/ABnOGnB3HmOkpS1QURuRUnXUkAuFNVHxeRW9zHbweuAu4WkUWcwPDrObd4P/BlN7AcwVndBc7w1tdE5NeBozg72Wsi0CJcsz3Mo3k5+P3wuqH97qoOY4yBxkg7UjJwiEgL8IiqXgN8utybq+r9wP15x27P+fqHwK4i18aBfQWOn8bpgayKa2MRvvQvzzO7sEhHq/+cNJZuxBhTSGd7gFCwta73cpQcqlLVLJAQkYtWqT3rzmAswtxilkMj42Vd5wWOrSGb4zDGnC8aru9KgH6GqvqBx0XkIZzVTQCo6s/VrFXryGBODv5y8tKMZmbYuKGtrMyZxpjm0Beq793jfgLHx2veinVsWzhIb0/HBUXqlzOWsaW4xpjC+sPBpUqC9aho4HCz4fap6g/yjr8COFbrhq0XTg7+yAVF6pdju8aNMcVEQ0FOTcyysJilNVB/dfFKtfhPgUIhccp9rGlce1GEI6cmSU/N+75mND1LvwUOY0wBfeEgWYWTE/VZCbBU4Nihqo/lH1TV/cCOmrVoHfKKtTx2LOXr/LmFLKcnZ22oyhhT0NKS3DqdIC8VOEq96zXV5oQ9A04BmPjRlK/zT4zPoGpLcY0xhXl/VNbrBHmpwPGwiPxG/kF3452fneMNI9zZxqW9Xb5TjywVcLKhKmNMAd4wdr0WdCq1qupDwL0i8sucCxT7gHbg7TVu17ozGIvw4NOnUNVlc9SMpp1xS+txGGMK2dTVTnugpW53jxftcajqmKq+DGc57vPux8dV9aVuLqmmMhSLcGpilmOp6WXPPZduxAKHMeZCIsLWUEfd1uVYdh+Hqv4j8I+r0JZ1zSvekkimGdi4oeS5Y5kZOlpbCHf6qrBrjGlC9VwJsP4WEK+RK6Mh2ltbiCfPLnvuSNrZw1HtVM7GmMbRFw4ylmm85bgmR3trC1dvC/lKsT5mBZyMMcvoDzn5qiqtMLqWLHCUYXAgwoFjaRYWsyXPs5KxxpjlRMNBpucXyUzXXwlZCxxlGIpFmJ5f5OmxiaLnqKqlGzHGLKuvjutyWOAow9IEeYn9HKmpeeYWstbjMMaUtFQJ0AJHY7t48wbCnW0lM+V6/wmsx2GMKcX747Iel+Ra4CiDiDAYK50pd9RqjRtjfPCKvFmPowkMxSI8PTbO5GzhCS3rcRhj/OhoDbC5q90CRzMYioXJKhw4VnhZ7mh6BhHY2mMlY40xpfWF6rOErAWOMnkp1ovNc4xlZtjS3UFbHRZnMcasrnqtPW7vbmXa3N1BbFNn0ZVVtofDGONXvdYet8BRgcGBSNHaHKO2a9wY41N/OMjpyTlmFxbXuillscBRgaFYhOPpGU4U+EvB2fxn8xvGmOV5oxMn6ixnlQWOCngbAfOX5c7ML5KamrehKmOML311ugnQAkcFrt4WJtAiF8xzLFX+s8BhjPGhXmuPW+CoQGd7gCujPRdkyvV++P3hpirJboypULROa49b4KjQYCxCIpkimz2XEvnc5j+b4zDGLC/U2UpnW8B6HLlE5I0i8pSIHBaR2wo8vlFE7hWRx0TkIRG5Juex50XkgIjERWR/zvGPicgx93hcRN5cy9dQzFAswvjsAkdOTS4ds3QjxphyiIizl8N6HA4RCQCfB94E7AZuEpHdead9BIir6l7gncBn8x5/taoOqeq+vOOfcY8Pqer9tWj/cs6Vkk0tHRvNzNDVHqAnaCVjjTH+9IU6bKgqx3XAYVU9oqpzwD3A9Xnn7AYeAFDVQ8AOEemrYZuq5tLebrraA+etrBrLzCytkjDGGD+ioSAjNlS1ZDuQzPl+2D2WKwHcACAi1wEXAwPuYwp8V0QeEZH35F13qzu8daeIbKx+05cXaBH2DkTOW1k1mrZd48aY8vSFg5zIzNZVCdlaBg4pcCz/X+YTwEYRiQPvBx4FvLSzP6WqL8IZ6nqfiLzCPf4F4FJgCBgBPlXwyUXeIyL7RWT/yZMnV/I6ihqMRXhyJMPMvLPrcywza1lxjTFliYaCzC1mOTM5t9ZN8a2WgWMYiOV8PwAczz1BVTOq+i5VHcKZ4+gFnnMfO+5+PgHcizP0haqOqeqiqmaBL3rH86nqHaq6T1X39fb2VvWFeYZiEeYXlSdGMmSzypjlqTLGlKm/DjcB1jJwPAzsEpGdItIO3Ajcl3uCiETcxwDeDTyoqhkR6RKRHvecLuD1wEH3+/6cW7zdO74WcifIT03OspBV63EYY8rSV4d7OVprdWNVXRCRW4HvAAHgTlV9XERucR+/HbgKuFtEFoEngF93L+8D7hURr41fUdVvu4/9sYgM4Qx7PQ+8t1avYTnRcJC+UAfxZIp9F28CbCmuMaY83h+b9TRBXrPAAeAulb0/79jtOV//ENhV4LojwGCRe/5qlZu5IkPuRsClzX8WOIwxZejt7qBF6qv2uO0cX6HBWITnT0/x1GgGODdeaYwxfrQGWtjS3WFzHM1kyK0I+J3Hxwi0CJu7Ld2IMaY8/eEgo3WUWt0CxwrtGQgj4tQg39rTQaCl0CpkY4wpri8UtKGqZtITbOOy3m7AJsaNMZWpt3xVFjiqwFuWaxPjxphK9IWCpKfnmZ6rjxKyFjiqYNALHDYxboypwFJBpzrpddR0OW6z8HocNlRljKmEtxrznXf+iGBroKr3/qMb9vATOzZV9Z4WOKrgqv4Q73/NZbx1b//yJxtjTJ6hiyL84r4BJmYXlj+5TJ1t1Q1EAFJPGRkrtW/fPt2/f//yJxpjjFkiIo8UqIdkcxzGGGPKY4HDGGNMWSxwGGOMKYsFDmOMMWWxwGGMMaYsFjiMMcaUxQKHMcaYsljgMMYYU5am2AAoIieBF/IObwFOrUFzaqXRXg803mtqtNcDjfeaGu31wMpe08Wq2pt/sCkCRyEisr/Qjsh61WivBxrvNTXa64HGe02N9nqgNq/JhqqMMcaUxQKHMcaYsjRz4LhjrRtQZY32eqDxXlOjvR5ovNfUaK8HavCamnaOwxhjTGWaucdhjDGmAhY4jDHGlKXpAoeIvFFEnhKRwyJy21q3pxpE5HkROSAicRGpu4pVInKniJwQkYM5xzaJyPdE5Bn388a1bGO5irymj4nIMffnFBeRN69lG8shIjER+UcReVJEHheRD7rH6/LnVOL11PPPKCgiD4lIwn1NH3ePV/1n1FRzHCISAJ4GXgcMAw8DN6nqE2vasBUSkeeBfapalxuXROQVwARwt6pe4x77Y+CMqn7CDfAbVfW/rGU7y1HkNX0MmFDVT65l2yohIv1Av6r+WER6gEeAtwG/Rh3+nEq8nl+kfn9GAnSp6oSItAH/DHwQuIEq/4yarcdxHXBYVY+o6hxwD3D9Grep6anqg8CZvMPXA3e5X9+F80tdN4q8prqlqiOq+mP363HgSWA7dfpzKvF66pY6Jtxv29wPpQY/o2YLHNuBZM73w9T5fxaXAt8VkUdE5D1r3Zgq6VPVEXB+yYGta9yearlVRB5zh7LqYlgnn4jsAK4FfkQD/JzyXg/U8c9IRAIiEgdOAN9T1Zr8jJotcEiBY40wVvdTqvoi4E3A+9xhErP+fAG4FBgCRoBPrWlrKiAi3cA3gA+pamat27NSBV5PXf+MVHVRVYeAAeA6EbmmFs/TbIFjGIjlfD8AHF+jtlSNqh53P58A7sUZkqt3Y+44tDcefWKN27Niqjrm/mJngS9SZz8nd9z8G8CXVfWb7uG6/TkVej31/jPyqGoK+D7wRmrwM2q2wPEwsEtEdopIO3AjcN8at2lFRKTLndxDRLqA1wMHS19VF+4Dbna/vhn41hq2pSq8X17X26mjn5M78fqXwJOq+umch+ry51Ts9dT5z6hXRCLu153AzwCHqMHPqKlWVQG4y+v+FAgAd6rqH65ti1ZGRC7B6WUAtAJfqbfXJCJfBV6Fk/55DPgo8L+ArwEXAUeBX1DVuplsLvKaXoUzBKLA88B7vbHn9U5EXg78E3AAyLqHP4IzL1B3P6cSr+cm6vdntBdn8juA0yn4mqr+vohspso/o6YLHMYYY1am2YaqjDHGrJAFDmOMMWWxwGGMMaYsFjiMMcaUxQKHMcaYsljgMA1DRL4vIm/IO/YhEfnzZa7ZV+N2fdVNYfHhvOMfE5HfdL8OuplLP1rg+l9ws7j+4wraMJHz9ZvdTKkXuW2YEpGtRc5VEflUzve/6SZrNE3MAodpJF/F2dSZ60b3+JoQkSjwMlXdq6qfKXJOO84O5kdU9eMFTvl14D+q6qt9PmdricdeC/wZ8EZVPeoePgX85yKXzAI3iMgWP89tmoMFDtNIvg68VUQ6YCl53Tbgn0XkCyKyP7dOQb68v7T/nYj8/+7XvSLyDRF52P34qQLXBkXkS+LURXlURLw3+e8CW8Wp7fDTBZ62FSdL8zOqekF9GBH5b8DLgdtF5E+KPY+I/JqI/I2I/K37nIVe30/jpNF4i6o+m/PQncAvicimApct4NSs/nCBx0yTssBhGoaqngYewsnPA05v46/V2eX6O6q6D9gLvNLdZevXZ4HPqOpPAD8P/EWBc97ntmEPzu7ju0QkCPwc8KyqDqnqPxW47reBBVX9UJHX9PvAfuCXVfW3SjwPwEuBm1X1NQVu1YGTauJtqnoo77EJnODxwUJtAD4P/LKIhIs8bpqMBQ7TaHKHq3KHqX5RRH4MPApcDewu454/A3zOTVd9HxDy8oPleDnwVwDuG/MLwOU+7v3PwEtFxM+5yz3P90qkkpgH/hVn2KuQ/wHcLCKh/AfcrLF3Ax/w2UbT4CxwmEbzv4DXisiLgE63wttO4DeB16rqXuB/A8EC1+bm38l9vAV4qdtrGFLV7W7xn1yFUvb78SDwIeD/iMg2H+eXep7JEo9lcarb/YSIfCT/QTeb6leA/1jk+j/FCTpdPtpoGpwFDtNQ3Apo38cZevF6GyGcN9W0iPTh1C0pZExErhKRFpzMqJ7vArd634jIUIFrHwR+2X38cpyEck/5bPM3gD8Bvu1lNy1hJc8zBbwVZ9ipUM/j08B7ceZd8q89g5Mor1iPxTQRCxymEX0VGMSZdEZVEzhDVI/jBJR/KXLdbcDfAf+AU8TH8wFgn7uk9gnglgLX/jkQEJEDwF8Dv6aqs34brKq3A98E7suZsyhkpc9zBmcO6HdF5Pq8x07hZFruKHL5p3Cy/ZomZ9lxjTHGlMV6HMYYY8pigcMYY0xZLHAYY4wpiwUOY4wxZbHAYYwxpiwWOIwxxpTFAocxxpiy/F9vUSqEq14JhwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "# plot the value of K for KNN (x-axis) versus the cross-validated accuracy (y-axis)\n", "plt.plot(k_range, k_scores)\n", "plt.xlabel('Value of K for KNN')\n", "plt.ylabel('Cross-Validated Accuracy')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross-validation example: model selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal:** Compare the best KNN model with logistic regression on the iris dataset" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9800000000000001\n" ] } ], "source": [ "# 10-fold cross-validation with the best KNN model\n", "knn = KNeighborsClassifier(n_neighbors=20)\n", "print(cross_val_score(knn, X, y, cv=10, scoring='accuracy').mean())" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9533333333333334\n" ] } ], "source": [ "# 10-fold cross-validation with logistic regression\n", "from sklearn.linear_model import LogisticRegression\n", "logreg = LogisticRegression(solver='liblinear')\n", "print(cross_val_score(logreg, X, y, cv=10, scoring='accuracy').mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross-validation example: feature selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Goal**: Select whether the Newspaper feature should be included in the linear regression model on the advertising dataset" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.linear_model import LinearRegression" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# read in the advertising dataset\n", "data = pd.read_csv('data/Advertising.csv', index_col=0)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# create a Python list of three feature names\n", "feature_cols = ['TV', 'Radio', 'Newspaper']\n", "\n", "# use the list to select a subset of the DataFrame (X)\n", "X = data[feature_cols]\n", "\n", "# select the Sales column as the response (y)\n", "y = data.Sales" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-3.56038438 -3.29767522 -2.08943356 -2.82474283 -1.3027754 -1.74163618\n", " -8.17338214 -2.11409746 -3.04273109 -2.45281793]\n" ] } ], "source": [ "# 10-fold cross-validation with all three features\n", "lm = LinearRegression()\n", "scores = cross_val_score(lm, X, y, cv=10, scoring='neg_mean_squared_error')\n", "print(scores)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3.56038438 3.29767522 2.08943356 2.82474283 1.3027754 1.74163618\n", " 8.17338214 2.11409746 3.04273109 2.45281793]\n" ] } ], "source": [ "# fix the sign of MSE scores\n", "mse_scores = -scores\n", "print(mse_scores)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.88689808 1.81595022 1.44548731 1.68069713 1.14139187 1.31971064\n", " 2.85891276 1.45399362 1.7443426 1.56614748]\n" ] } ], "source": [ "# convert from MSE to RMSE\n", "rmse_scores = np.sqrt(mse_scores)\n", "print(rmse_scores)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.6913531708051797\n" ] } ], "source": [ "# calculate the average RMSE\n", "print(rmse_scores.mean())" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.6796748419090768\n" ] } ], "source": [ "# 10-fold cross-validation with two features (excluding Newspaper)\n", "feature_cols = ['TV', 'Radio']\n", "X = data[feature_cols]\n", "print(np.sqrt(-cross_val_score(lm, X, y, cv=10, scoring='neg_mean_squared_error')).mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Improvements to cross-validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Repeated cross-validation**\n", "\n", "- Repeat cross-validation multiple times (with **different random splits** of the data) and average the results\n", "- More reliable estimate of out-of-sample performance by **reducing the variance** associated with a single trial of cross-validation\n", "\n", "**Creating a hold-out set**\n", "\n", "- \"Hold out\" a portion of the data **before** beginning the model building process\n", "- Locate the best model using cross-validation on the remaining data, and test it **using the hold-out set**\n", "- More reliable estimate of out-of-sample performance since hold-out set is **truly out-of-sample**\n", "\n", "**Feature engineering and selection within cross-validation iterations**\n", "\n", "- Normally, feature engineering and selection occurs **before** cross-validation\n", "- Instead, perform all feature engineering and selection **within each cross-validation iteration**\n", "- More reliable estimate of out-of-sample performance since it **better mimics** the application of the model to out-of-sample data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resources\n", "\n", "- scikit-learn documentation: [Cross-validation](https://scikit-learn.org/stable/modules/cross_validation.html), [Model evaluation](https://scikit-learn.org/stable/modules/model_evaluation.html)\n", "- scikit-learn issue on GitHub: [MSE is negative when returned by cross_val_score](https://github.com/scikit-learn/scikit-learn/issues/2439)\n", "- Section 5.1 of [An Introduction to Statistical Learning](https://www.statlearning.com/) (11 pages) and related videos: [K-fold and leave-one-out cross-validation](https://www.youtube.com/watch?v=rSGzUy13F_0&list=PL5-da3qGB5IA6E6ZNXu7dp89_uv8yocmf&index=2) (14 minutes), [Cross-validation the right and wrong ways](https://www.youtube.com/watch?v=r64tRyHFAJ8&list=PL5-da3qGB5IA6E6ZNXu7dp89_uv8yocmf&index=3) (10 minutes)\n", "- Scott Fortmann-Roe: [Accurately Measuring Model Prediction Error](http://scott.fortmann-roe.com/docs/MeasuringError.html)\n", "- Machine Learning Mastery: [An Introduction to Feature Selection](https://machinelearningmastery.com/an-introduction-to-feature-selection/)\n", "- Harvard CS109: [Cross-Validation: The Right and Wrong Way](https://github.com/cs109/content/blob/master/lec_10_cross_val.ipynb)\n", "- Journal of Cheminformatics: [Cross-validation pitfalls when selecting and assessing regression and classification models](https://jcheminf.biomedcentral.com/track/pdf/10.1186/1758-2946-6-10.pdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comments or Questions?\n", "\n", "- Email: \n", "- Website: https://www.dataschool.io\n", "- Twitter: [@justmarkham](https://twitter.com/justmarkham)\n", "\n", "© 2021 [Data School](https://www.dataschool.io). All rights reserved." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" } }, "nbformat": 4, "nbformat_minor": 1 }