{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.4 분류 예측의 불확실성 추정\n",
    "- 불확실성 추정\n",
    "    - 분류모델이 예측한 분류 클래스가 얼마나 정확한지 판단\n",
    "    - scikit-learn 에서 제공하는 불확실성 추정 함수\n",
    "        - decision_function\n",
    "        - predict_prob\n",
    "        \n",
    "### 2.4.3 다중 분류에서의 불확실성"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import sys \n",
    "sys.path.append('..')\n",
    "from preamble import *\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GradientBoostingClassifier(criterion='friedman_mse', init=None,\n",
       "              learning_rate=0.01, loss='deviance', max_depth=3,\n",
       "              max_features=None, max_leaf_nodes=None,\n",
       "              min_impurity_split=1e-07, min_samples_leaf=1,\n",
       "              min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
       "              n_estimators=100, presort='auto', random_state=0,\n",
       "              subsample=1.0, verbose=0, warm_start=False)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "from sklearn.datasets import load_iris\n",
    "\n",
    "iris = load_iris()\n",
    "X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)\n",
    "gbrt = GradientBoostingClassifier(learning_rate=0.01, random_state=0)\n",
    "gbrt.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "결정 함수의 결과 형태: (38, 3)\n",
      "결정 함수 결과: \n",
      "[[-0.529  1.466 -0.504]\n",
      " [ 1.512 -0.496 -0.503]\n",
      " [-0.524 -0.468  1.52 ]\n",
      " [-0.529  1.466 -0.504]\n",
      " [-0.531  1.282  0.215]\n",
      " [ 1.512 -0.496 -0.503]]\n"
     ]
    }
   ],
   "source": [
    "print(\"결정 함수의 결과 형태: {}\".format(gbrt.decision_function(X_test).shape))\n",
    "print(\"결정 함수 결과: \\n{}\".format(gbrt.decision_function(X_test)[:6, :]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 각 데이터의 포인트 마다 가장 큰값을 찾아 예측 결과를 재현"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "가장 큰 결정 함수의 인덱스: \n",
      "[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1\n",
      " 0]\n",
      "예측:\n",
      "[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1\n",
      " 0]\n"
     ]
    }
   ],
   "source": [
    "print(\"가장 큰 결정 함수의 인덱스: \\n{}\".format(np.argmax(gbrt.decision_function(X_test), axis=1)))\n",
    "print(\"예측:\\n{}\".format(gbrt.predict(X_test)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "예측 확률: \n",
      "[[ 0.107  0.784  0.109]\n",
      " [ 0.789  0.106  0.105]\n",
      " [ 0.102  0.108  0.789]\n",
      " [ 0.107  0.784  0.109]\n",
      " [ 0.108  0.663  0.228]\n",
      " [ 0.789  0.106  0.105]]\n",
      "합: [ 1.  1.  1.  1.  1.  1.]\n"
     ]
    }
   ],
   "source": [
    "print(\"예측 확률: \\n{}\".format(gbrt.predict_proba(X_test)[:6]))\n",
    "print(\"합: {}\".format(gbrt.predict_proba(X_test)[:6].sum(axis=1)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "가장 큰 결정 함수의 인덱스: \n",
      "[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1\n",
      " 0]\n",
      "예측:\n",
      "[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1\n",
      " 0]\n"
     ]
    }
   ],
   "source": [
    "print(\"가장 큰 결정 함수의 인덱스: \\n{}\".format(np.argmax(gbrt.predict_proba(X_test), axis=1)))\n",
    "print(\"예측:\\n{}\".format(gbrt.predict(X_test)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
       "          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
       "          verbose=0, warm_start=False)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn import linear_model\n",
    "logreg = linear_model.LogisticRegression()\n",
    "\n",
    "named_target = iris.target_names[y_train]\n",
    "logreg.fit(X_train, named_target)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "훈련 데이터에 있는 클래스 종류: ['setosa' 'versicolor' 'virginica']\n",
      "예측: ['versicolor' 'setosa' 'virginica' 'versicolor' 'versicolor' 'setosa'\n",
      " 'versicolor' 'virginica' 'versicolor' 'versicolor']\n",
      "가장 큰 결정 함수의 인덱스: [1 0 2 1 1 0 1 2 1 1]\n"
     ]
    }
   ],
   "source": [
    "print(\"훈련 데이터에 있는 클래스 종류: {}\".format(logreg.classes_))\n",
    "print(\"예측: {}\".format(logreg.predict(X_test)[:10]))\n",
    "argmax_dec_func = np.argmax(logreg.decision_function(X_test), axis=1)\n",
    "print(\"가장 큰 결정 함수의 인덱스: {}\".format(argmax_dec_func[:10]))\n",
    "print(\"인덱스를 classes_에 여\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}