{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Compare the Tree-like algorithms: Tree, Random Forest, AdaBoost 비교" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import warnings\n", "from sklearn.model_selection import train_test_split,GridSearchCV\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.ensemble import AdaBoostClassifier\n", "from sklearn import metrics\n", "from sklearn.datasets import load_digits #digits은 해상도 수치\n", "from scipy.ndimage.interpolation import rotate\n", "warnings.filterwarnings(action='ignore') # Turn off the warnings.\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1. Read in data and explore:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data = load_digits() #digits은 해상도 수치" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".. _digits_dataset:\n", "\n", "Optical recognition of handwritten digits dataset\n", "--------------------------------------------------\n", "\n", "**Data Set Characteristics:**\n", "\n", " :Number of Instances: 5620\n", " :Number of Attributes: 64\n", " :Attribute Information: 8x8 image of integer pixels in the range 0..16.\n", " :Missing Attribute Values: None\n", " :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)\n", " :Date: July; 1998\n", "\n", "This is a copy of the test set of the UCI ML hand-written digits datasets\n", "https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits\n", "\n", "The data set contains images of hand-written digits: 10 classes where\n", "each class refers to a digit.\n", "\n", "Preprocessing programs made available by NIST were used to extract\n", "normalized bitmaps of handwritten digits from a preprinted form. From a\n", "total of 43 people, 30 contributed to the training set and different 13\n", "to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of\n", "4x4 and the number of on pixels are counted in each block. This generates\n", "an input matrix of 8x8 where each element is an integer in the range\n", "0..16. This reduces dimensionality and gives invariance to small\n", "distortions.\n", "\n", "For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.\n", "T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.\n", "L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,\n", "1994.\n", "\n", ".. topic:: References\n", "\n", " - C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their\n", " Applications to Handwritten Digit Recognition, MSc Thesis, Institute of\n", " Graduate Studies in Science and Engineering, Bogazici University.\n", " - E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.\n", " - Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.\n", " Linear dimensionalityreduction using relevance weighted LDA. School of\n", " Electrical and Electronic Engineering Nanyang Technological University.\n", " 2005.\n", " - Claudio Gentile. A New Approximate Maximal Margin Classification\n", " Algorithm. NIPS. 2000.\n" ] } ], "source": [ "print(data['DESCR'])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1797, 64)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#설명변수\n", "X = data['data']\n", "X.shape" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1797,)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#반응변수\n", "Y = data['target']\n", "Y.shape" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Define a visualization function.\n", "def show_data(X, Y, n, angle=0):\n", " print(Y[n])\n", " image_matrix = X[n,:].reshape((8,8)) #1차원에서 2차원 행렬로 변환\n", " image_matrix = rotate(image_matrix, angle, cval=0.01, reshape=False) # Rotate if wanted. \n", " plt.imshow(image_matrix, cmap='Greys',interpolation='None')\n", " plt.show()\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPUAAAD4CAYAAAA0L6C7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAK2ElEQVR4nO3dX2id9R3H8c9nUcn8R3EtQ5uyKEhABms1FCQorO1GnKK72EULCtqBN1OUDUR3tzuvxF0URapOsFO2WkGK0wlWV2VzTWu32qbOrHQ2q/1HEbXD1ep3FzmFqnF5zjnPv3x9vyCYkxzy+x7r2+ecJ6fPzxEhAHl8o+kBAJSLqIFkiBpIhqiBZIgaSOasKn7owoULY3h4uIof/SUnTpyoZR1JOnToUG1rSdInn3xS21p1/nus09KlS2tdb2BgoJZ19u/fr2PHjnm271US9fDwsCYmJqr40V+ybdu2WtaRpPvvv7+2tSTp8OHDta31+uuv17ZWnbZs2VLregsWLKhlndHR0a/8Hk+/gWSIGkiGqIFkiBpIhqiBZIgaSIaogWSIGkiGqIFkCkVte9z227anbN9b9VAAejdn1LYHJK2TdJ2kKyStsX1F1YMB6E2RI/VySVMRsS8iTkp6WtJN1Y4FoFdFol4s6cAZt6c7X/sc27fbnrA9cfTo0bLmA9ClIlHP9te7vnS1woh4JCJGI2J00aJF/U8GoCdFop6WtOSM20OSDlYzDoB+FYl6m6TLbV9q+xxJqyU9V+1YAHo150USIuKU7TskvShpQNJjEbG78skA9KTQlU8i4nlJz1c8C4AS8I4yIBmiBpIhaiAZogaSIWogGaIGkiFqIJlKduio08MPP1zbWps2baptLUm66KKLaltr3bp1ta21atWq2taqa8eMNuFIDSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSRD1EAyRA0kQ9RAMkV26HjM9hHbb9UxEID+FDlS/0bSeMVzACjJnFFHxJ8kHa9hFgAlKO01NdvuAO1QWtRsuwO0A2e/gWSIGkimyK+0npL0Z0kjtqdt/7T6sQD0qsheWmvqGARAOXj6DSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSQz77fdueqqq2pba+vWrbWtJUnXXHNNbWutXbu2trUGBwdrW+vriCM1kAxRA8kQNZAMUQPJEDWQDFEDyRA1kAxRA8kQNZAMUQPJFLlG2RLbW2xP2t5t+646BgPQmyLv/T4l6RcRscP2BZK2234pIvZUPBuAHhTZdue9iNjR+fxDSZOSFlc9GIDedPWa2vawpGWS3pjle2y7A7RA4ahtny/pGUl3R8QHX/w+2+4A7VAoattnayboDRGxqdqRAPSjyNlvS3pU0mREPFD9SAD6UeRIPSbpFkkrbO/sfPyo4rkA9KjItjuvSXINswAoAe8oA5IhaiAZogaSIWogGaIGkiFqIBmiBpIhaiCZeb+XVp3eeeedtOu9+uqrta01NTVV21pfRxypgWSIGkiGqIFkiBpIhqiBZIgaSIaogWSIGkiGqIFkilx4cND2X23/rbPtzq/qGAxAb4q8TfS/klZExEedSwW/ZvsPEfGXimcD0IMiFx4MSR91bp7d+YgqhwLQu6IX8x+wvVPSEUkvRQTb7gAtVSjqiPg0IpZKGpK03PZ3Z7kP2+4ALdDV2e+IeF/SK5LGK5kGQN+KnP1eZHtB5/NvSlolaW/VgwHoTZGz3xdLesL2gGb+J/C7iNhc7VgAelXk7PffNbMnNYB5gHeUAckQNZAMUQPJEDWQDFEDyRA1kAxRA8kQNZDMvN92Z+3atbWtNTIyUttakmS7trVWrlxZ21qoFkdqIBmiBpIhaiAZogaSIWogGaIGkiFqIBmiBpIhaiAZogaSKRx154L+b9rmooNAi3VzpL5L0mRVgwAoR9Ftd4YkXS9pfbXjAOhX0SP1g5LukfTZV92BvbSAdiiyQ8cNko5ExPb/dz/20gLaociRekzSjbb3S3pa0grbT1Y6FYCezRl1RNwXEUMRMSxptaSXI+LmyicD0BN+Tw0k09XljCLiFc1sZQugpThSA8kQNZAMUQPJEDWQDFEDyRA1kAxRA8nM+213BgcHa1ur7q1pJiYmal2vLgcPHqxtrUsuuaS2tdqCIzWQDFEDyRA1kAxRA8kQNZAMUQPJEDWQDFEDyRA1kAxRA8kUepto50qiH0r6VNKpiBitcigAvevmvd/fj4hjlU0CoBQ8/QaSKRp1SPqj7e22b5/tDmy7A7RD0ajHIuJKSddJ+pnta794B7bdAdqhUNQRcbDzzyOSnpW0vMqhAPSuyAZ559m+4PTnkn4o6a2qBwPQmyJnv78t6Vnbp+//24h4odKpAPRszqgjYp+k79UwC4AS8CstIBmiBpIhaiAZogaSIWogGaIGkiFqIJl5v+3Oxx9/XNtau3btqm0tSRofH69trbGxsdrW+jpuhVMnjtRAMkQNJEPUQDJEDSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSRTKGrbC2xvtL3X9qTtq6seDEBvir73+9eSXoiIn9g+R9K5Fc4EoA9zRm37QknXSrpVkiLipKST1Y4FoFdFnn5fJumopMdtv2l7fef635/DtjtAOxSJ+ixJV0p6KCKWSToh6d4v3oltd4B2KBL1tKTpiHijc3ujZiIH0EJzRh0RhyQdsD3S+dJKSXsqnQpAz4qe/b5T0obOme99km6rbiQA/SgUdUTslDRa8SwASsA7yoBkiBpIhqiBZIgaSIaogWSIGkiGqIFkiBpIZt7vpfXuu+/Wtlade1tJ0vHjx2tba/PmzbWthWpxpAaSIWogGaIGkiFqIBmiBpIhaiAZogaSIWogGaIGkpkzatsjtnee8fGB7bvrGA5A9+Z8m2hEvC1pqSTZHpD0b0nPVjwXgB51+/R7paR/RsS/qhgGQP+6jXq1pKdm+wbb7gDtUDjqzjW/b5T0+9m+z7Y7QDt0c6S+TtKOiDhc1TAA+tdN1Gv0FU+9AbRHoahtnyvpB5I2VTsOgH4V3XbnP5K+VfEsAErAO8qAZIgaSIaogWSIGkiGqIFkiBpIhqiBZIgaSMYRUf4PtY9K6vavZy6UdKz0Ydoh62PjcTXnOxEx69+cqiTqXtieiIjRpueoQtbHxuNqJ55+A8kQNZBMm6J+pOkBKpT1sfG4Wqg1r6kBlKNNR2oAJSBqIJlWRG173Pbbtqds39v0PGWwvcT2FtuTtnfbvqvpmcpke8D2m7Y3Nz1LmWwvsL3R9t7On93VTc/UrcZfU3c2CPiHZi6XNC1pm6Q1EbGn0cH6ZPtiSRdHxA7bF0jaLunH8/1xnWb755JGJV0YETc0PU9ZbD8haWtErO9cQffciHi/6bm60YYj9XJJUxGxLyJOSnpa0k0Nz9S3iHgvInZ0Pv9Q0qSkxc1OVQ7bQ5Kul7S+6VnKZPtCSddKelSSIuLkfAtaakfUiyUdOOP2tJL8x3+a7WFJyyS90ewkpXlQ0j2SPmt6kJJdJumopMc7Ly3W2z6v6aG61YaoPcvX0vyezfb5kp6RdHdEfND0PP2yfYOkIxGxvelZKnCWpCslPRQRyySdkDTvzvG0IeppSUvOuD0k6WBDs5TK9tmaCXpDRGS5vPKYpBtt79fMS6UVtp9sdqTSTEuajojTz6g2aibyeaUNUW+TdLntSzsnJlZLeq7hmfpm25p5bTYZEQ80PU9ZIuK+iBiKiGHN/Fm9HBE3NzxWKSLikKQDtkc6X1opad6d2Cx03e8qRcQp23dIelHSgKTHImJ3w2OVYUzSLZJ22d7Z+dovI+L5BmfC3O6UtKFzgNkn6baG5+la47/SAlCuNjz9BlAiogaSIWogGaIGkiFqIBmiBpIhaiCZ/wF3R7KFZeRv2AAAAABJRU5ErkJggg==\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "9\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPUAAAD4CAYAAAA0L6C7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAK1UlEQVR4nO3dXYhc9RnH8d/P1dL6xoYmBNmEroIEpFIjS0CDQrNpjVW0FwUTUKgUBGmC0oJo8aaQS1F7UQSJWsHE0EYFERsrqLTG1pqNaeu6GtOQkk1ik1DiW6Eh8enFTiDaTffMmfOWh+8Hgrs7w/6fIX5zZs7Onr8jQgDyOKvtAQBUi6iBZIgaSIaogWSIGkjm7Dq+6fz582N0dLSOb92qgwcPNrreRx991NhaCxcubGytefPmNbZWVnv37tWRI0c82221RD06Oqrt27fX8a1btX79+kbX27p1a2NrrVu3rrG1brnllsbWympsbOy0t/H0G0iGqIFkiBpIhqiBZIgaSIaogWSIGkiGqIFkiBpIplDUtlfZft/2btv31j0UgPLmjNr2kKRfSrpe0mWS1ti+rO7BAJRT5Ei9TNLuiNgTEcckbZZ0c71jASirSNQjkvad8vl072tfYPsO29ttbz98+HBV8wHoU5GoZ/v1rv+5WmFEPBoRYxExtmDBgsEnA1BKkainJS0+5fNFkg7UMw6AQRWJ+i1Jl9q+2PZXJK2W9Hy9YwEoa86LJETEcdtrJb0kaUjS4xExWftkAEopdOWTiHhR0os1zwKgAryjDEiGqIFkiBpIhqiBZIgaSIaogWSIGkimlh06spqYmGh0vW3btqVc67rrrmtsreHh4cbW6gqO1EAyRA0kQ9RAMkQNJEPUQDJEDSRD1EAyRA0kQ9RAMkQNJFNkh47HbR+y/U4TAwEYTJEj9a8krap5DgAVmTPqiPi9pH81MAuAClT2mpptd4BuqCxqtt0BuoGz30AyRA0kU+RHWk9L+qOkJbanbf+o/rEAlFVkL601TQwCoBo8/QaSIWogGaIGkiFqIBmiBpIhaiAZogaSYdudDrv66qsbW+uNN95obK0mty8aHx9vbK2u4EgNJEPUQDJEDSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSRD1EAyRa5Rttj2q7anbE/avquJwQCUU+S938cl/TQidti+QNKE7Zcj4t2aZwNQQpFtdw5GxI7ex59ImpI0UvdgAMrp6zW17VFJSyW9OcttbLsDdEDhqG2fL+kZSXdHxMdfvp1td4BuKBS17XM0E/TGiHi23pEADKLI2W9LekzSVEQ8WP9IAAZR5Ei9XNJtklbY3tn7872a5wJQUpFtd16X5AZmAVAB3lEGJEPUQDJEDSRD1EAyRA0kQ9RAMkQNJEPUQDLspdWHtWvXNrreypUrG12vKXfeeWdja+3atauxtbqCIzWQDFEDyRA1kAxRA8kQNZAMUQPJEDWQDFEDyRA1kEyRCw9+1fafbf+lt+3Oz5sYDEA5Rd4m+h9JKyLi096lgl+3/duI+FPNswEoociFB0PSp71Pz+n9iTqHAlBe0Yv5D9neKemQpJcjgm13gI4qFHVEnIiIKyQtkrTM9jdnuQ/b7gAd0NfZ74g4Kuk1SatqmQbAwIqc/V5ge7j38dckrZT0Xt2DASinyNnviyQ9aXtIM/8I/DoiXqh3LABlFTn7/VfN7EkN4AzAO8qAZIgaSIaogWSIGkiGqIFkiBpIhqiBZIgaSIZtd/owPj7e6Hozv/XajKNHjza21rx58xpba//+/Y2tJUkjIyONrjcbjtRAMkQNJEPUQDJEDSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSRTOOreBf3fts1FB4EO6+dIfZekqboGAVCNotvuLJJ0g6QN9Y4DYFBFj9QPS7pH0uenuwN7aQHdUGSHjhslHYqIif93P/bSArqhyJF6uaSbbO+VtFnSCttP1ToVgNLmjDoi7ouIRRExKmm1pFci4tbaJwNQCj+nBpLp63JGEfGaZrayBdBRHKmBZIgaSIaogWSIGkiGqIFkiBpIhqiBZNh2B5Kk4eHhxtZav359Y2s99NBDja0lSQ888ECj682GIzWQDFEDyRA1kAxRA8kQNZAMUQPJEDWQDFEDyRA1kAxRA8kUepto70qin0g6Iel4RIzVORSA8vp57/e3I+JIbZMAqARPv4FkikYdkn5ne8L2HbPdgW13gG4oGvXyiLhS0vWSfmz72i/fgW13gG4oFHVEHOj995Ck5yQtq3MoAOUV2SDvPNsXnPxY0nclvVP3YADKKXL2e6Gk52yfvP+miNha61QASpsz6ojYI+lbDcwCoAL8SAtIhqiBZIgaSIaogWSIGkiGqIFkiBpIhm13+nDgwIFG19u2bVtja23atKmxtSYnJxtb64MPPmhsLUm6//77G1nnxIkTp72NIzWQDFEDyRA1kAxRA8kQNZAMUQPJEDWQDFEDyRA1kAxRA8kUitr2sO0ttt+zPWX7qroHA1BO0fd+/0LS1oj4ge2vSDq3xpkADGDOqG1fKOlaST+UpIg4JulYvWMBKKvI0+9LJB2W9ITtt21v6F3/+wvYdgfohiJRny3pSkmPRMRSSZ9JuvfLd2LbHaAbikQ9LWk6It7sfb5FM5ED6KA5o46IDyXts72k96VxSe/WOhWA0oqe/V4naWPvzPceSbfXNxKAQRSKOiJ2ShqreRYAFeAdZUAyRA0kQ9RAMkQNJEPUQDJEDSRD1EAyRA0kw15afWhybytJ2rx5c2NrnXVWc/++X3755Y2tdc011zS2liQNDw83ss7Q0NBpb+NIDSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSRD1EAyRA0kM2fUtpfY3nnKn49t393EcAD6N+fbRCPifUlXSJLtIUn7JT1X81wASur36fe4pL9HxD/qGAbA4PqNerWkp2e7gW13gG4oHHXvmt83SfrNbLez7Q7QDf0cqa+XtCMi/lnXMAAG10/Ua3Sap94AuqNQ1LbPlfQdSc/WOw6AQRXdduffkr5e8ywAKsA7yoBkiBpIhqiBZIgaSIaogWSIGkiGqIFkiBpIxhFR/Te1D0vq99cz50s6Uvkw3ZD1sfG42vONiJj1N6dqiboM29sjYqztOeqQ9bHxuLqJp99AMkQNJNOlqB9te4AaZX1sPK4O6sxragDV6NKRGkAFiBpIphNR215l+33bu23f2/Y8VbC92PartqdsT9q+q+2ZqmR7yPbbtl9oe5Yq2R62vcX2e72/u6vanqlfrb+m7m0QsEszl0ualvSWpDUR8W6rgw3I9kWSLoqIHbYvkDQh6ftn+uM6yfZPJI1JujAibmx7nqrYflLSHyJiQ+8KuudGxNG25+pHF47UyyTtjog9EXFM0mZJN7c808Ai4mBE7Oh9/ImkKUkj7U5VDduLJN0gaUPbs1TJ9oWSrpX0mCRFxLEzLWipG1GPSNp3yufTSvI//0m2RyUtlfRmu5NU5mFJ90j6vO1BKnaJpMOSnui9tNhg+7y2h+pXF6L2LF9L83M22+dLekbS3RHxcdvzDMr2jZIORcRE27PU4GxJV0p6JCKWSvpM0hl3jqcLUU9LWnzK54skHWhplkrZPkczQW+MiCyXV14u6SbbezXzUmmF7afaHaky05KmI+LkM6otmon8jNKFqN+SdKnti3snJlZLer7lmQZm25p5bTYVEQ+2PU9VIuK+iFgUEaOa+bt6JSJubXmsSkTEh5L22V7S+9K4pDPuxGah637XKSKO214r6SVJQ5Iej4jJlseqwnJJt0n6m+2dva/9LCJebHEmzG2dpI29A8weSbe3PE/fWv+RFoBqdeHpN4AKETWQDFEDyRA1kAxRA8kQNZAMUQPJ/Bdvp7JVAtKZOAAAAABJRU5ErkJggg==\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPUAAAD4CAYAAAA0L6C7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAKiElEQVR4nO3d34tc9RnH8c+nq7L1F0oTiu6GrkJIkEKNDAFZEBrbEquYXvQiAYVqwZsqSguives/IFYogsRYwVRto6KI1Qoqraa1bmLSGldrGlKyjTYJEvxRbIg+vdgTiHZ1z5w5v/bp+wXBnZ1hzzPEd87MmbPn64gQgDy+1PUAAOpF1EAyRA0kQ9RAMkQNJHNKEz902bJlMTU11cSP7lTbnxTs3r27tW2Nj4+3tq3Vq1e3tq2s9u/fryNHjnih+xqJempqSjMzM0386E599NFHrW5vYmKitW21GdpLL73U2rayGgwGn3sfL7+BZIgaSIaogWSIGkiGqIFkiBpIhqiBZIgaSIaogWRKRW17ve03be+1fVvTQwGobtGobY9J+oWkKyRdJGmT7YuaHgxANWX21Gsl7Y2IfRFxTNJDkjY0OxaAqspEPSHpwEm354rvfYrtG2zP2J45fPhwXfMBGFKZqBf69a7/+R3EiLgnIgYRMVi+fPnokwGopEzUc5JWnHR7UtLBZsYBMKoyUb8iaaXtC2yfJmmjpCeaHQtAVYteJCEijtu+UdIzksYkbYmIPY1PBqCSUlc+iYinJD3V8CwAasAZZUAyRA0kQ9RAMkQNJEPUQDJEDSRD1EAyjazQgXq8++67rW1r+/btrW2rzdVbvmgli6zYUwPJEDWQDFEDyRA1kAxRA8kQNZAMUQPJEDWQDFEDyRA1kEyZFTq22D5k+7U2BgIwmjJ76l9KWt/wHABqsmjUEfF7Se39ZgGAkdT2nppld4B+qC1qlt0B+oGj30AyRA0kU+YjrQcl/VHSKttztn/Y/FgAqiqzltamNgYBUA9efgPJEDWQDFEDyRA1kAxRA8kQNZAMUQPJsOzOEB5//PGuR0jh/3EpnDaxpwaSIWogGaIGkiFqIBmiBpIhaiAZogaSIWogGaIGkiFqIJky1yhbYft527O299i+uY3BAFRT5tzv45J+EhE7bZ8laYftZyPi9YZnA1BBmWV33o6IncXX70ualTTR9GAAqhnqPbXtKUlrJL28wH0suwP0QOmobZ8p6RFJt0TEe5+9n2V3gH4oFbXtUzUf9NaIeLTZkQCMoszRb0u6V9JsRNzR/EgARlFmTz0t6VpJ62zvKv58t+G5AFRUZtmdFyW5hVkA1IAzyoBkiBpIhqiBZIgaSIaogWSIGkiGqIFkiBpIhrW0hnDXXXd1PUJjpqenux4BNWFPDSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSRD1EAyRA0kU+bCg+O2/2x7d7Hszs/aGAxANWVOE/2PpHUR8UFxqeAXbf82Iv7U8GwAKihz4cGQ9EFx89TiTzQ5FIDqyl7Mf8z2LkmHJD0bESy7A/RUqagj4uOIuFjSpKS1tr++wGNYdgfogaGOfkfEUUkvSFrfyDQARlbm6Pdy2+cUX39Z0rckvdH0YACqKXP0+zxJ99se0/w/Ar+OiCebHQtAVWWOfv9F82tSA1gCOKMMSIaogWSIGkiGqIFkiBpIhqiBZIgaSIaogWSW/LI7Dz/8cGvb2r59e2vbAqpiTw0kQ9RAMkQNJEPUQDJEDSRD1EAyRA0kQ9RAMkQNJEPUQDKloy4u6P+qbS46CPTYMHvqmyXNNjUIgHqUXXZnUtKVkjY3Ow6AUZXdU98p6VZJn3zeA1hLC+iHMit0XCXpUETs+KLHsZYW0A9l9tTTkq62vV/SQ5LW2X6g0akAVLZo1BFxe0RMRsSUpI2SnouIaxqfDEAlfE4NJDPU5Ywi4gXNL2ULoKfYUwPJEDWQDFEDyRA1kAxRA8kQNZAMUQPJLPlldzZs2NDatq6//vrWtiVJW7ZsaW1bEdHattAs9tRAMkQNJEPUQDJEDSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSRT6jTR4kqi70v6WNLxiBg0ORSA6oY59/ubEXGksUkA1IKX30AyZaMOSb+zvcP2DQs9gGV3gH4oG/V0RFwi6QpJP7J92WcfwLI7QD+UijoiDhb/PSTpMUlrmxwKQHVlFsg7w/ZZJ76W9B1JrzU9GIBqyhz9/qqkx2yfePyvIuLpRqcCUNmiUUfEPknfaGEWADXgIy0gGaIGkiFqIBmiBpIhaiAZogaSIWogGTex3MpgMIiZmZnaf27Xjh492ur2zj333Fa31xaW+BndYDDQzMyMF7qPPTWQDFEDyRA1kAxRA8kQNZAMUQPJEDWQDFEDyRA1kAxRA8mUitr2Oba32X7D9qztS5seDEA1ZZfd+bmkpyPi+7ZPk3R6gzMBGMGiUds+W9Jlkn4gSRFxTNKxZscCUFWZl98XSjos6T7br9reXFz/+1NYdgfohzJRnyLpEkl3R8QaSR9Kuu2zD2LZHaAfykQ9J2kuIl4ubm/TfOQAemjRqCPiHUkHbK8qvnW5pNcbnQpAZWWPft8kaWtx5HufpOuaGwnAKEpFHRG7JA0angVADTijDEiGqIFkiBpIhqiBZIgaSIaogWSIGkiGqIFkyp5RBknj4+Otbm/lypWtbeutt95qbVsHDx5sbVvnn39+a9vqC/bUQDJEDSRD1EAyRA0kQ9RAMkQNJEPUQDJEDSRD1EAyi0Zte5XtXSf9ec/2LW0MB2B4i54mGhFvSrpYkmyPSfqnpMcangtARcO+/L5c0t8j4h9NDANgdMNGvVHSgwvdwbI7QD+Ujrq45vfVkn6z0P0suwP0wzB76isk7YyIfzU1DIDRDRP1Jn3OS28A/VEqatunS/q2pEebHQfAqMouu/NvSV9peBYANeCMMiAZogaSIWogGaIGkiFqIBmiBpIhaiAZogaScUTU/0Ptw5KG/fXMZZKO1D5MP2R9bjyv7nwtIhb8zalGoq7C9kxEDLqeowlZnxvPq594+Q0kQ9RAMn2K+p6uB2hQ1ufG8+qh3rynBlCPPu2pAdSAqIFkehG17fW237S91/ZtXc9TB9srbD9ve9b2Hts3dz1TnWyP2X7V9pNdz1In2+fY3mb7jeLv7tKuZxpW5++piwUC/qb5yyXNSXpF0qaIeL3TwUZk+zxJ50XETttnSdoh6XtL/XmdYPvHkgaSzo6Iq7qepy6275f0h4jYXFxB9/SIONr1XMPow556raS9EbEvIo5JekjSho5nGllEvB0RO4uv35c0K2mi26nqYXtS0pWSNnc9S51sny3pMkn3SlJEHFtqQUv9iHpC0oGTbs8pyf/8J9iekrRG0svdTlKbOyXdKumTrgep2YWSDku6r3hrsdn2GV0PNaw+RO0FvpfmczbbZ0p6RNItEfFe1/OMyvZVkg5FxI6uZ2nAKZIukXR3RKyR9KGkJXeMpw9Rz0lacdLtSUkHO5qlVrZP1XzQWyMiy+WVpyVdbXu/5t8qrbP9QLcj1WZO0lxEnHhFtU3zkS8pfYj6FUkrbV9QHJjYKOmJjmcamW1r/r3ZbETc0fU8dYmI2yNiMiKmNP939VxEXNPxWLWIiHckHbC9qvjW5ZKW3IHNUtf9blJEHLd9o6RnJI1J2hIRezoeqw7Tkq6V9Ffbu4rv/TQinupwJizuJklbix3MPknXdTzP0Dr/SAtAvfrw8htAjYgaSIaogWSIGkiGqIFkiBpIhqiBZP4LawOntdDSjFoAAAAASUVORK5CYII=\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "for i in [15,29,99]: #해상도 수치\n", " show_data(X,Y,i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2. Data pre-processing: 데이터 전처리" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Min-Max scaling \n", "X_min = X.min()\n", "X_max = X.max()\n", "X_range = X_max - X_min\n", "X = (X - X_min)/X_range #x변수가 0~1수치가 되도록하는 공식(0~1사이로 정규화)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1234)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.3. Classification with Tree (optimized hyperparameters):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NOTE: 시간 상 일부 하이퍼 파라미터만 최적화" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "depth_grid = np.arange(1,21) #depth로 리스트 만들어 grid\n", "min_samples_leaf_grid = np.arange(2,31,2) #min_samples_leaf로 리스트 만들어 grid\n", "max_leaf_nodes_grid = np.arange(2,51,2) #max_leaf_nodes로 리스트 만들어 grid\n", "parameters = {'max_depth':depth_grid, 'min_samples_leaf':min_samples_leaf_grid, 'max_leaf_nodes':max_leaf_nodes_grid} #parameters 딕셔너리" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "gridCV = GridSearchCV(DecisionTreeClassifier(), parameters, cv=10, n_jobs = -1) #GridSearchCV로 최적의 하이퍼 파라미터 조합 만듬\n", "\n", "gridCV.fit(X_train, Y_train)\n", "\n", "best_depth = gridCV.best_params_['max_depth'] #딕셔너리에서 max_depth의 베스트 파라미터 뽑음\n", "best_min_samples_leaf = gridCV.best_params_['min_samples_leaf'] #딕셔너리에서 min_samples_leaf의 베스트 파라미터 뽑음\n", "best_max_leaf_nodes = gridCV.best_params_['max_leaf_nodes'] #딕셔너리에서 max_leaf_nodes의 베스트 파라미터 뽑음" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tree best max_depth : 18\n", "Tree best min_samples_leaf : 4\n", "Tree best max_leaf_nodes : 46\n" ] } ], "source": [ "print(\"Tree best max_depth : \" + str(best_depth))\n", "print(\"Tree best min_samples_leaf : \" + str(best_min_samples_leaf))\n", "print(\"Tree best max_leaf_nodes : \" + str(best_max_leaf_nodes))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tree best accuracy : 0.852\n" ] } ], "source": [ "DTC_best = DecisionTreeClassifier(max_depth=best_depth,min_samples_leaf=best_min_samples_leaf,max_leaf_nodes=best_max_leaf_nodes)\n", "DTC_best.fit(X_train, Y_train)\n", "Y_pred = DTC_best.predict(X_test)\n", "print( \"Tree best accuracy : \" + str(np.round(metrics.accuracy_score(Y_test,Y_pred),3)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.4. Classification with Random Forest (optimized hyperparameters):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NOTE: 시간 상 일부 하이퍼 파라미터만 최적화" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "n_estimators_grid = np.arange(20, 50,2)\n", "depth_grid = np.arange(1, 10)\n", "min_samples_leaf_grid = np.arange(10,21,2)\n", "parameters = {'n_estimators': n_estimators_grid, 'max_depth': depth_grid, 'min_samples_leaf':min_samples_leaf_grid}\n", "\n", "gridCV = GridSearchCV(RandomForestClassifier(), param_grid=parameters, cv=10, n_jobs=-1)\n", "\n", "gridCV.fit(X_train, Y_train)\n", "\n", "best_n_estim = gridCV.best_params_['n_estimators']\n", "best_depth = gridCV.best_params_['max_depth']\n", "best_min_samples_leaf = gridCV.best_params_['min_samples_leaf']" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Random Forest best n_estimator : 32\n", "Random Forest best max_depth : 8\n", "Random Forest best min_samples_leaf : 10\n" ] } ], "source": [ "print(\"Random Forest best n_estimator : \" + str(best_n_estim))\n", "print(\"Random Forest best max_depth : \" + str(best_depth))\n", "print(\"Random Forest best min_samples_leaf : \" + str(best_min_samples_leaf))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Random Forest best accuracy : 0.935\n" ] } ], "source": [ "RF_best = RandomForestClassifier(n_estimators=30,max_depth=best_depth,min_samples_leaf=best_min_samples_leaf,random_state=123)\n", "RF_best.fit(X_train, Y_train)\n", "Y_pred = RF_best.predict(X_test)\n", "print( \"Random Forest best accuracy : \" + str(np.round(metrics.accuracy_score(Y_test,Y_pred),3)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5. Classification with AdaBoost (optimized hyperparameters):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NOTE: 시간 상 일부 하이퍼 파라미터만 최적화" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "my_max_depth = 9 # Fixed.\n", "my_learn_rate = 0.01 # Fixed.\n", "n_estimators_grid = np.arange(50, 81, 2)\n", "parameters = {'n_estimators': n_estimators_grid}\n", "\n", "AB = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=my_max_depth), learning_rate=my_learn_rate) # Instantiate an estimator.\n", "gridCV = GridSearchCV(AB, param_grid=parameters, cv=10, n_jobs = -1)\n", "\n", "gridCV.fit(X_train, Y_train)\n", "\n", "best_n_estim = gridCV.best_params_['n_estimators']" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AdaBoost best n estimator : 62\n" ] } ], "source": [ "print(\"AdaBoost best n estimator : \" + str(best_n_estim))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AdaBoost best accuracy : 0.931\n" ] } ], "source": [ "AB_best = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=my_max_depth),n_estimators=best_n_estim,learning_rate=my_learn_rate,random_state=123)\n", "AB_best.fit(X_train, Y_train)\n", "Y_pred = AB_best.predict(X_test)\n", "print( \"AdaBoost best accuracy : \" + str(np.round(metrics.accuracy_score(Y_test,Y_pred),3)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "랜덤포레스트가 성능이 가장 좋음<br>\n", "셋 중에 랜덤포레스트 추천!!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }