{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", " \n", "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n", "Author: [Yury Kashnitsky](https://yorko.github.io) (@yorko). Edited by Anna Tarelina (@feuerengel), and Mikhail Korshchikov (@MS4). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#
Assignment #2. Fall 2019\n", "##
Deadline for A2: 2019 October 6, 20:59 CET (London time)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt\n", "from sklearn.model_selection import train_test_split, GridSearchCV\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.tree import DecisionTreeClassifier, export_graphviz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Decision trees for regression: a toy example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's consider the following one-dimensional regression problem. We need to build a function $\\large a(x)$ to approximate the dependency $\\large y = f(x)$ using the mean-squared error criterion: $\\large \\min \\sum_i {(a(x_i) - f(x_i))}^2$." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEKCAYAAAAW8vJGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAEqtJREFUeJzt3X2QXXddx/H3xzTF5UECJliSNqaMEEVQg2unUh8KrYapDK2Pwx8o9Sk+IjhMsLEzMj7MgMTx+WkyBQfHjqglhIpgaC3o6Ewr26ZladNIqS10U+yiEwRZaRq+/rF34/6W3WSX7L3n7t73a2an555zes/nnt3s557fOXtPqgpJkuZ8WdcBJEnDxWKQJDUsBklSw2KQJDUsBklSw2KQJDUsBklSw2KQJDUsBklS47yuA3wpNm/eXDt27Og6hiStKXfeeeenqmrL2dZbk8WwY8cOJiYmuo4hSWtKkoeXs55DSZKkhsUgSWpYDJKkhsUgSWpYDJKkxlBclZTkF4GfAAqYBH60qv6321SSNBwOHZli/+FjHD8xw9ZNY+zdvZNrdm3r2/Y6P2JIsg34BWC8ql4AbABe2W0qSRoOh45Mse/gJFMnZihg6sQM+w5OcujIVN+22Xkx9JwHjCU5D3gycLzjPJI0FPYfPsbMyVPNvJmTp9h/+Fjfttl5MVTVFPBbwMeBR4FPV9X7F66XZE+SiSQT09PTg44pSZ04fmJmRfNXQ+fFkOQZwNXAxcBW4ClJXrVwvao6UFXjVTW+ZctZ/6JbktaFrZvGVjR/NXReDMCVwL9X1XRVnQQOAi/uOJMkDYW9u3cytnFDM29s4wb27t7Zt20Ow1VJHwcuTfJkYAa4AvCDkCQJTl99NMirkjovhqq6I8lNwF3AE8AR4EC3qSRpeFyza1tfi2ChzosBoKreCLyx6xySpOE4xyBJGiIWgySpYTFIkhoWgySpYTFIkhoWgySpYTFIkhoWgySpYTFIkhoWgySpYTFIkhoWgySpYTFIkhoWgySpYTFIkhpDUQxJNiW5Kcn9SY4m+dauM0nSqBqKG/UAvwf8fVX9QJLzgSd3HUiSRlXnxZDk6cB3ANcCVNXjwONdZpKkUTYMQ0kXA9PAnyU5kuSGJE/pOpQkjaphKIbzgBcBf1JVu4D/Aa5buFKSPUkmkkxMT08POqMkjYxhKIZHgEeq6o7e45uYLYpGVR2oqvGqGt+yZctAA0rSKOm8GKrqk8AnkuzszboCuK/DSJI00jo/+dzzGuDG3hVJDwI/2nEeSRpZQ1EMVXU3MN51DknSEAwlSZKGi8UgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWoMTTEk2ZDkSJL3dJ1FkkbZ0BQD8FrgaNchJGnUDUUxJLkQ+B7ghq6zSNKoG4piAH4XeAPwha6DSNKo67wYkrwceKyq7jzLenuSTCSZmJ6eHlA6SRo9nRcDcBnwiiQPAe8AXprkLxauVFUHqmq8qsa3bNky6IySNDI6L4aq2ldVF1bVDuCVwG1V9aqOY0nSyOq8GCRJw+W8rgPMV1UfBD7YcQxJGmkeMUiSGhaDJKlhMUiSGhaDJKlhMUiSGhaDJKlhMUiSGhaDJKlhMUiSGhaDJKlhMUiSGhaDJKkxVB+iJ0mr5dCRKfYfPsbxEzNs3TTG3t07uWbXtq5jrQkWg6R159CRKfYdnGTm5CkApk7MsO/gJIDlsAwOJUlad/YfPna6FObMnDzF/sPHOkq0tlgMktad4ydmVjRfrc6LIclFST6Q5L4k9yZ5bdeZJK1tWzeNrWi+Wp0XA/AE8Pqqej5wKfBzSZ7fcSZJa9je3TsZ27ihmTe2cQN7d+/sKNHa0vnJ56p6FHi0N/2ZJEeBbcB9nQaTtGbNnWD2qqQvTaqq6wynJdkB/BPwgqr67wXL9gB7ALZv3/7NDz/88MDzSdJaluTOqho/23rDMJQEQJKnAu8EXrewFACq6kBVjVfV+JYtWwYfUJJGxFAUQ5KNzJbCjVV1sOs8kjTKOi+GJAHeChytqt/uOo8kjbrOiwG4DPhh4KVJ7u59XdV1KEkaVcNwVdI/A+k6hyRp1jAcMUiShojFIElqWAySpIbFIElqWAySpIbFIElqWAySpIbFIElqWAySpIbFIElqWAySpIbFIElqdP4hepKGy6EjU94Sc8RZDJJOO3Rkin0HJ5k5eQqAqRMz7Ds4CWA5jBCHkiSdtv/wsdOlMGfm5Cn2Hz7WUSJ1YSiKIcnLkhxL8kCS67rOI42q4ydmVjRf61PnQ0lJNgB/BHwX8AjwoSQ3V9V93SaTlm+9jMtv3TTG1CIlsHXTWAdp1JVhOGK4BHigqh6sqseBdwBXd5xJWra5cfmpEzMU/z8uf+jIVNfRVmzv7p2MbdzQzBvbuIG9u3d2lEhdOGsxJLklyTf2McM24BPzHj/SmyetCetpXP6aXdt40/e9kG2bxgiwbdMYb/q+F67Jox996ZYzlPRLwO8meQj45ap6tL+RFpdkD7AHYPv27V1EkBa13sblr9m1zSIYcWc9Yqiqu6rqJcB7gL9P8sYkqzngOAVcNO/xhb15C3McqKrxqhrfsmXLKm5eXTp0ZIrL3nwbF1/3d1z25tvW5PDLUuPvjstrrVrWOYYkAY4BfwK8Bvhokh9epQwfAp6b5OIk5wOvBG5epefWEFsvY/OOy2u9Wc45hn9h9h387zA79n8tcDlwSZID5xqgqp4Afh44DBwF/rqq7j3X59XwWy9j847La71ZzjmGPcB9VVUL5r8mydHVCFFV7wXeuxrPpbVjPY3NOy6v9WQ55xjuXaQU5nzPKufRCHFsXhpO5/R3DFX14GoF0ehxbF4aTp3/5bNG19zQy3r4i2FpPbEY1CnH5qXhMwwfiSFJGiIWgySpYTFIkhoWgySpYTFIkhoWgySpYTFIkhoWgySpYTFIkhoWgySpYTFIkhoWgySp0WkxJNmf5P4kH07yriSbuswjSer+iOEW4AVV9Q3AvwH7Os4jSSOv02Koqvf37vkMcDtwYZd5JEndHzHM92PA+7oOIUmjru836klyK3DBIouur6p399a5HngCuPEMz7MH2AOwffv2PiSVJMEAiqGqrjzT8iTXAi8HrqiqOsPzHAAOAIyPjy+53ig4dGTK22FK6ptOb+2Z5GXAG4DvrKrPdZllrTh0ZIp9ByeZOXkKgKkTM+w7OAlgOUhaFV2fY/hD4GnALUnuTvKnHecZevsPHztdCnNmTp5i/+FjHSWStN50esRQVV/T5fbXouMnZlY0X5JWqusjBq3Q1k1jK5ovSStlMawxe3fvZGzjhmbe2MYN7N29s6NEktabToeStHJzJ5i9KklSv1gMa9A1u7ZZBJL6xqEkSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNYaiGJK8Pkkl2dx1FkkadZ0XQ5KLgO8GPt51FknSEBQD8DvAG4DqOogkqeNiSHI1MFVV9yxj3T1JJpJMTE9PDyCdJI2mvt+oJ8mtwAWLLLoe+GVmh5HOqqoOAAcAxsfHPbqQpD7pezFU1ZWLzU/yQuBi4J4kABcCdyW5pKo+2e9ckqTFdXZrz6qaBJ419zjJQ8B4VX2qq0ySpOE4+SxJGiKdHTEsVFU7us4gSfKIQZK0gMUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkhsUgSWpYDJKkRufFkOQ1Se5Pcm+St3SdR5JGXac36knyEuBq4Bur6vNJnnW2/0eS1F9dHzH8DPDmqvo8QFU91nEeSRp5XRfD84BvT3JHkn9M8i0d55Gkkdf3oaQktwIXLLLo+t72nwlcCnwL8NdJnlNVtcjz7AH2AGzfvr1/gSVpxPW9GKrqyqWWJfkZ4GCvCP41yReAzcD0Is9zADgAMD4+/kXFIUlaHV0PJR0CXgKQ5HnA+cCnOk0kSSOu06uSgLcBb0vyEeBx4NWLDSNJkgan02KoqseBV3WZQZLU6nooSZI0ZCwGSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNSwGSVLDYpAkNSwGSVKj02JI8k1Jbk9yd5KJJJd0mUeS1P2tPd8C/GpVvS/JVb3Hl/drY4eOTLH/8DGOn5hh66Yx9u7eyTW7tvVrc5K0JnVdDAV8RW/66cDxfm3o0JEp9h2cZObkKQCmTsyw7+AkgOUgSfN0fY7hdcD+JJ8AfgvY168N7T987HQpzJk5eYr9h4/1a5OStCb1/Yghya3ABYssuh64AvjFqnpnkh8C3gpcucTz7AH2AGzfvn3FOY6fmFnRfEkaVX0vhqpa9Bc9QJI/B17be/g3wA1neJ4DwAGA8fHxWmmOrZvGmFqkBLZuGlvpU0nSutb1UNJx4Dt70y8FPtqvDe3dvZOxjRuaeWMbN7B3985+bVKS1qSuTz7/JPB7Sc4D/pfeUFE/zJ1g9qokSTqzVK14VKZz4+PjNTEx0XUMSVpTktxZVeNnW6/roSRJ0pCxGCRJDYtBktSwGCRJDYtBktRYk1clJZkGHj6Hp9gMfGqV4qwmcy3fMGYCc62UuVbmXHN9dVVtOdtKa7IYzlWSieVcsjVo5lq+YcwE5lopc63MoHI5lCRJalgMkqTGqBbDga4DLMFcyzeMmcBcK2WulRlIrpE8xyBJWtqoHjFIkpYwEsWQZH+S+5N8OMm7kmxaYr2XJTmW5IEk1w0g1w8muTfJF5IseaVBkoeSTCa5O0nfPz1wBbkGtr+SPDPJLUk+2vvvM5ZY71RvP92d5OY+5jnja0/ypCR/1Vt+R5Id/cqywlzXJpmet49+YgCZ3pbksSQfWWJ5kvx+L/OHk7yo35mWmevyJJ+et69+ZUC5LkrygST39f4dvnaRdfq7z6pq3X8B3w2c15v+TeA3F1lnA/Ax4DnA+cA9wPP7nOvrgJ3AB4HxM6z3ELB5gPvrrLkGvb+AtwDX9aavW+x72Fv22QHsn7O+duBngT/tTb8S+KshyXUt8IeD+lnqbfM7gBcBH1li+VXA+4AAlwJ3DEmuy4H3DHJf9bb7bOBFvemnAf+2yPexr/tsJI4Yqur9VfVE7+HtwIWLrHYJ8EBVPVhVjwPvAK7uc66jVTV0N51eZq5B76+rgbf3pt8OXNPHbZ3Ncl77/Lw3AVckyRDkGriq+ifgv86wytXAn9es24FNSZ49BLk6UVWPVtVdvenPAEeBhTeO6es+G4liWODHmG3ahbYBn5j3+BG++JvRlQLen+TO3r2vh8Gg99dXVdWjvelPAl+1xHpfnmQiye1J+lUey3ntp9fpvSn5NPCVfcqzklwA398bfrgpyUV9zrQcw/xv71uT3JPkfUm+ftAb7w1B7gLuWLCor/us6zu4rZoktwIXLLLo+qp6d2+d64EngBuHKdcyfFtVTSV5FnBLkvt773a6zrWqzpRp/oOqqiRLXU731b199RzgtiSTVfWx1c66hv0t8JdV9fkkP8XsUc1LO840rO5i9ufps0muAg4Bzx3UxpM8FXgn8Lqq+u9BbRfWUTFU1ZVnWp7kWuDlwBXVG6RbYAqY/+7pwt68vuZa5nNM9f77WJJ3MTtkcE7FsAq5Vn1/nSlTkv9I8uyqerR3yPzYEs8xt68eTPJBZt9trXYxLOe1z63zSGZvXft04D9XOceKc1XV/Aw3MHvupmt9+bd3rub/Mq6q9yb54ySbq6rvn6GUZCOzpXBjVR1cZJW+7rORGEpK8jLgDcArqupzS6z2IeC5SS5Ocj6zJwz7dlXLciV5SpKnzU0zeyJ90asoBmzQ++tm4NW96VcDX3RUk+QZSZ7Um94MXAbc14csy3nt8/P+AHDbEm9IBpprwTj0K5gdv+7azcCP9K60uRT49Lxhw84kuWDuvFCSS5j9fdnvcqe3zbcCR6vqt5dYrb/7bNBn3Lv4Ah5gdjzu7t7X3NUiW4H3zlvvKmavAPgYs0Mq/c71vcyODX4e+A/g8MJczF5hck/v695hyTXo/cXs+Pw/AB8FbgWe2Zs/DtzQm34xMNnbV5PAj/cxzxe9duDXmH3zAfDlwN/0fvb+FXhOv79vy8z1pt7P0T3AB4CvHUCmvwQeBU72fq5+HPhp4Kd7ywP8US/zJGe4Qm/AuX5+3r66HXjxgHJ9G7PnFT8873fWVYPcZ/7lsySpMRJDSZKk5bMYJEkNi0GS1LAYJEkNi0GS1LAYJEkNi0GS1LAYpFXQ+/z87+pN/0aSP+g6k/SlWjeflSR17I3Ar/U+6HAXsx83Ia1J/uWztEqS/CPwVODymv0cfWlNcihJWgVJXsjsnbcetxS01lkM0jnqfWLpjczeVeuzvU/zldYsi0E6B0meDBwEXl9VR4FfZ/Z8g7RmeY5BktTwiEGS1LAYJEkNi0GS1LAYJEkNi0GS1LAYJEkNi0GS1LAYJEmN/wOg1rDmxPLdDAAAAABJRU5ErkJggg==\n", "text/plain": [ "