{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# %load /Users/facai/Study/book_notes/preconfig.py\n", "%matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "from IPython.display import SVG" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "决策树在 sklearn 中的实现简介\n", "============================" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 0. 预前\n", "本文简单分析 [scikit-learn/scikit-learn](https://github.com/scikit-learn/scikit-learn) 中决策树涉及的代码模块关系。\n", "\n", "分析的代码版本信息是:\n", "```shell\n", "~/W/s/sklearn ❯❯❯ git log -n 1 study/analyses_decision_tree\n", "commit d161bfaa1a42da75f4940464f7f1c524ef53484f\n", "Author: John B Nelson \n", "Date: Thu May 26 18:36:37 2016 -0400\n", "\n", " Add missing double quote (#6831)\n", "```\n", "\n", "本文假设读者已经了解决策树的其本概念,阅读 [sklearn - Decision Trees](http://scikit-learn.org/stable/modules/tree.html) 有助于快速了解。 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. 
Overall structure\n", "\n", "The decision tree code lives under the `scikit-learn/sklearn/tree` directory; the files are summarized below:\n", "\n", "```\n", "tree\n", "+-- __init__.py\n", "+-- setup.py\n", "+-- tree.py main module\n", "+-- export.py exports the tree model\n", "+-- _tree.* classes that assemble the tree\n", "+-- _splitter.* splitting methods\n", "+-- _criterion.* impurity criteria\n", "+-- _utils.* helper data structures: stack and priority heap\n", "+-- tests/\n", " +-- __init__.py\n", " +-- test_tree.py\n", " +-- test_export.py\n", "```\n", "\n", "The rough relationships between the classes are as follows:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/svg+xml": [ "DecisionTreeClassifier+predict_proba()+predict_log_proba()BaseDecisionTree+fit()+predict()DecisionTreeRegressorExtraTreeClassifierExtraTreeRegressorTreeBuilder+splitter+min_samples_split+min_samples_leaf+min_weight_leaf+max_depth+build()+_check_input()Splitter+node_impurity()+node_reset()+node_split()+node_value()Criterion+proxy_impurity_improvement()+impurity_improvement()+1..1+1..1+1..1" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SVG(\"./res/uml/Model__tree_0.svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`tree.py` defines the `BaseDecisionTree` base class, which implements the complete classification and regression functionality; the derived subclasses mainly wrap the initialization parameters. The two families of subclasses differ in how they split: the `DecisionTree*` classes sweep over features and values to find the best split point, while the `ExtraTree*` classes draw features and values at random to find a split point.\n", "\n", "The base class's training method `fit` proceeds as follows:\n", "\n", "1. Check the parameters.\n", "2. Set up the criterion.\n", "3. Create the splitter: instantiate the class matching whether the data is a sparse matrix.\n", "4. Create the tree: choose depth-first or best-first building depending on the leaf-node limit (`max_leaf_nodes`).\n", "5. 
Build the tree: invoke the builder to grow the decision tree.\n", "\n", "The code is shown below, with the details folded away:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", " 72 class BaseDecisionTree(six.with_metaclass(ABCMeta, BaseEstimator,\n", " 73 _LearntSelectorMixin)):\n", " 74 \"\"\"Base class for decision trees.\n", " 75 #+-- 3 lines: Warning: This class should not be used directly.-------------------\n", " 78 \"\"\"\n", " 79\n", " 80 @abstractmethod\n", " 81 def __init__(self,\n", " 82 #+-- 30 lines: criterion,---------------------------------------------------------\n", " 112\n", " 113 def fit(self, X, y, sample_weight=None, check_input=True,\n", " 114 X_idx_sorted=None):\n", " 115 \"\"\"Build a decision tree from the training set (X, y).\n", " 116 #+-- 34 lines: Parameters---------------------------------------------------------\n", " 150 \"\"\"\n", " 151\n", " 152 #+--180 lines: random_state = check_random_state(self.random_state)---------------\n", " 332\n", " 333 # Build tree\n", " 334 criterion = self.criterion\n", " 335 #+-- 6 lines: if not isinstance(criterion, Criterion):---------------------------\n", " 341\n", " 342 SPLITTERS = SPARSE_SPLITTERS if issparse(X) else DENSE_SPLITTERS\n", " 343 #+-- 9 lines: splitter = self.splitter-------------------------------------------\n", " 352\n", " 353 self.tree_ = Tree(self.n_features_, self.n_classes_, self.n_outputs_)\n", " 354\n", " 355 # Use BestFirst if max_leaf_nodes given; use DepthFirst otherwise\n", " 356 if max_leaf_nodes < 0:\n", " 357 builder = DepthFirstTreeBuilder(splitter, min_samples_split,\n", " 358 min_samples_leaf,\n", " 359 min_weight_leaf,\n", " 360 max_depth)\n", " 361 else:\n", " 362 builder = BestFirstTreeBuilder(splitter, min_samples_split,\n", " 363 min_samples_leaf,\n", " 364 min_weight_leaf,\n", " 365 max_depth,\n", " 366 max_leaf_nodes)\n", " 367\n", " 368 builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)\n", " 369\n", " 370 #+-- 3 lines: if self.n_outputs_ == 1:-------------------------------------------\n", " 373\n", " 
374 return self\n", "```" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "The prediction method `predict` is very simple: it calls `tree_.predict()` to obtain the predicted values. For a classification problem it outputs the class with the largest predicted value; for a regression problem it outputs the value directly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", " 398 def predict(self, X, check_input=True):\n", " 399 \"\"\"Predict class or regression value for X.\n", " 400 +-- 20 lines: For a classification model, the predicted class for each sample in X\n", " 420 \"\"\"\n", " 421\n", " 422 X = self._validate_X_predict(X, check_input)\n", " 423 proba = self.tree_.predict(X)\n", " 424 n_samples = X.shape[0]\n", " 425\n", " 426 # Classification\n", " 427 if isinstance(self, ClassifierMixin):\n", " 428 if self.n_outputs_ == 1:\n", " 429 return self.classes_.take(np.argmax(proba, axis=1), axis=0) \n", " 430\n", " 431 +--- 9 lines: else:--------------------------------------------------------------\n", " 440\n", " 441 # Regression\n", " 442 else:\n", " 443 if self.n_outputs_ == 1:\n", " 444 return proba[:, 0]\n", " 445 +--- 3 lines: else:--------------------------------------------------------------\n", " ```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "sklearn's decision tree is the CART (Classification and Regression Trees) algorithm: classification problems are cast as regression on predicted probabilities, so both kinds of problems are handled in the same way; the main difference lies in the criterion." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2 The modules in brief" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.0 Criteria\n", "`_criterion.*` are the files for the impurity criteria, implemented in Cython; the *.pxd and *.pyx files correspond to C's *.h and *.c files, respectively.\n", "\n", "The class diagram is below:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/svg+xml": [ 
"Criterion+proxy_impurity_improvement()+impurity_improvement()ClassificationCriterion+init()+reset()+reverse_reset()+update()+node_value()RegressionCriterion+init()+reset()+reverse_reset()+update()+node_value()Entropy+node_impurity()+children_impurity()Gini+node_impurity()+children_impurity()MSE+node_impurity()+children_impurity()+proxy_impurity_improvement()FriedmanMSE+proxy_impurity_improvement()+impurity_improvement()" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SVG(\"./res/uml/Model___criterion_1.svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "+ 对于分类问题,sklearn 提供了 Gini 和 Entropy 两种评价函数;\n", " - 默认会用 Gini 。\n", " - [Decision Trees: “Gini” vs. “Entropy” criteria](https://www.garysieling.com/blog/sklearn-gini-vs-entropy-criteria)\n", "\n", "+ 对于回归问题,则提供了 MSE(均方差)和 FriedmanMSE。\n", " - 默认会用 MSE 。\n", " - FriedmanMSE 用于 gradient boosting。\n", " \n", "在实际使用中,我们应该都测试下同的评价函数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.1 分割方法\n", "`_splitter.*` 是分割方法相关的文件。\n", "\n", "下面是类的关系图:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/svg+xml": [ "Splitter+node_impurity()+node_reset()+node_split()+node_value()BaseDenseSplitter+init()BestSplitter+node_split()RandomSplitter+node_split()BaseSparseSplitter+init()+extract_nnz()BestSparseSplitter+node_split()RandomSparseSplitter+node_split()Best:遍历一个特征的值以确定最佳阈值;Random:在一个特征的最大值和最小值之间随机抽样一个值作为最佳阈值。«dataType»SplitRecord+feature+pos+threshold+improvement+impurity_left+impurity_right+1..*" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SVG(\"./res/uml/Model___splitter_2.svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Splitter` 基类依数据储存方式(实阵或稀疏阵)衍生为 `BaseDenseSplitte` 和 `BaseSparseSplitter`。在这之下根据阈值的寻优方法再细分两类:`Best*Splitter` 会遍历特征的可能值,而 `Random*Splitter` 则是随机抽取。" ] }, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.2 树的组建方法\n", "`_tree.*` 是树组建方法相关的文件。\n", "\n", "下面是类的关系图:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/svg+xml": [ "«dataType»Node+left_child+right_child+feature+threshold+impurity+n_node_samples+weighted_n_node_samplesTree+_add_note()+_resize()+predict()+apply()+decision_path()+compute_feature_importances()TreeBuilder+splitter+min_samples_split+min_samples_leaf+min_weight_leaf+max_depth+build()+_check_input()+1..1+1..*DepthFirstTreeBuilder+build()_utils.Stack+push()+pop()+1..1BestFirstTreeBuilder+build()_utils.PriorityHeap+push()+pop()+1..1" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SVG(\"./res/uml/Model___tree_3.svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "sklearn 提供了两种树的组建方法:一种是用栈实现的深度优先方法,它会先左后右地生成整颗决策树;另一种是用最大堆实现的最优优先方法,它每次在纯净度提升最大的节点进行分割生长。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3 结语\n", "本文简单介绍了 sklearn 中决策树的实现框架,后面会对各子模块作进一步的详述。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }