{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# %load /Users/facai/Study/book_notes/preconfig.py\n", "%matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "from IPython.display import SVG" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "逻辑回归在scikit-learn中的实现简介\n", "==============================" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "分析用的代码版本信息:\n", "\n", "```bash\n", "~/W/g/scikit-learn ❯❯❯ git log -n 1\n", "commit d161bfaa1a42da75f4940464f7f1c524ef53484f\n", "Author: John B Nelson \n", "Date: Thu May 26 18:36:37 2016 -0400\n", "\n", " Add missing double quote (#6831)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0. 总纲\n", "\n", "下面是sklearn中逻辑回归的构成情况:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/svg+xml": [ "LogisticRegression+fit()+predict_proba()+predict_log_proba()BaseEstimator+get_params()+set_params()-get_param_names()LinearClassifierMixin+decision_function()+predict()-predict_proba_lr()_LearntSelectorMixin+transform()SparseCoefMixin+densify()+sparsify()objectClassifierMixin-estimator_type+score()TransformerMixin+fit_transform()«dataType»base.py+_fit_liblinear()«dataType»logistic.py+logistic_regression_path()-check_solver_option()-logistic_loss()-logistic_loss_and_grad()-multinomial_loss()-multinomial_loss_grad()-multinomial_grad_hess()«dataType»optimize.py+newton_cg()«dataType»sag.py+sag_solver()" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SVG(\"./res/sklearn_lr.svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "如[逻辑回归在spark中的实现简介](./spark_ml_lr.ipynb)中分析一样,主要把精力定位到算法代码上,即寻优算子和损失函数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. 寻优算子\n", "\n", "sklearn支持liblinear, sag, lbfgs和newton-cg四种寻优算子,其中lbfgs属于scipy包,liblinear属于LibLinear库,剩下两种由sklearn自己实现。代码很好定位,逻辑也很明了,不多说:\n", "\n", "```python\n", " 704 if solver == 'lbfgs':\n", " 705 try:\n", " 706 w0, loss, info = optimize.fmin_l_bfgs_b(\n", " 707 func, w0, fprime=None,\n", " 708 args=(X, target, 1. / C, sample_weight),\n", " 709 iprint=(verbose > 0) - 1, pgtol=tol, maxiter=max_iter)\n", " 710 except TypeError:\n", " 711 # old scipy doesn't have maxiter\n", " 712 w0, loss, info = optimize.fmin_l_bfgs_b(\n", " 713 func, w0, fprime=None,\n", " 714 args=(X, target, 1. / C, sample_weight),\n", " 715 iprint=(verbose > 0) - 1, pgtol=tol)\n", " 716 if info[\"warnflag\"] == 1 and verbose > 0:\n", " 717 warnings.warn(\"lbfgs failed to converge. Increase the number \"\n", " 718 \"of iterations.\")\n", " 719 try:\n", " 720 n_iter_i = info['nit'] - 1\n", " 721 except:\n", " 722 n_iter_i = info['funcalls'] - 1\n", " 723 elif solver == 'newton-cg':\n", " 724 args = (X, target, 1. 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Loss functions\n",
    "\n",
    "#### 2.1 Binary classification\n",
    "\n",
    "The binary loss and its gradient are implemented in `_logistic_loss_and_grad`. The computation follows the same logic as in [the introduction to logistic regression and its Python implementation](./demo.ipynb), so it is not repeated here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.2 Multiclass\n",
    "\n",
    "For multiclass problems sklearn supports two schemes: ovr (one vs rest) and multinomial.\n",
    "\n",
    "\n",
    "##### 2.2.0 ovr\n",
    "The default is ovr, which trains one binary classifier per label, $K$ in total. The training code is:\n",
    "\n",
    "```python\n",
    "1230         fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,\n",
    "1231                                backend=backend)(\n",
    "1232             path_func(X, y, pos_class=class_, Cs=[self.C],\n",
    "1233                       fit_intercept=self.fit_intercept, tol=self.tol,\n",
    "1234                       verbose=self.verbose, solver=self.solver, copy=False,\n",
    "1235                       multi_class=self.multi_class, max_iter=self.max_iter,\n",
    "1236                       class_weight=self.class_weight, check_input=False,\n",
    "1237                       random_state=self.random_state, coef=warm_start_coef_,\n",
    "1238                       max_squared_sum=max_squared_sum,\n",
    "1239                       sample_weight=sample_weight)\n",
    "1240             for (class_, warm_start_coef_) in zip(classes_, warm_start_coef))\n",
    "```\n",
    "\n",
    "Note that the iteration over `classes_` on line 1240, combined with `pos_class=class_` on line 1232, is exactly the logic that trains the labels one at a time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 2.2.1 multinomial\n",
    "\n",
    "As described above, ovr iterates over the labels and trains them one by one. To fit into this loop, a genuinely binary problem needs a small adjustment:\n",
    "\n",
    "```python\n",
    "1201         if len(self.classes_) == 2:\n",
    "1202             n_classes = 1\n",
    "1203             classes_ = classes_[1:]\n",
    "```\n",
    "\n",
    "Likewise, the multinomial case processes all labels in a single pass, so it also needs an adjustment:\n",
    "\n",
    "```python\n",
    "1217         # Hack so that we iterate only once for the multinomial case.\n",
    "1218         if self.multi_class == 'multinomial':\n",
    "1219             classes_ = [None]\n",
    "1220             warm_start_coef = [warm_start_coef]\n",
    "```\n",
    "\n",
    "Next, let's look at the code that computes the multinomial loss and its gradient, the function `_multinomial_loss_grad`.\n",
    "\n",
    "The multiclass code in sklearn uses the same formulas as [the introduction to logistic regression and its Python implementation](./demo.ipynb), namely:\n",
    "\n",
    "\\begin{align}\n",
    "    L(\\beta) &= \\log \\left( \\sum_i e^{\\beta_{i0} + \\beta_i x} \\right) - (\\beta_{k0} + \\beta_k x) \\\\\n",
    "    \\frac{\\partial L}{\\partial \\beta} &= x \\left ( \\frac{e^{\\beta_{k0} + \\beta_k x}}{\\sum_i e^{\\beta_{i0} + \\beta_i x}} - I(y = k) \\right ) \\\\\n",
    "\\end{align}\n",
    "\n",
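    "Before diving into the sklearn code, here is a minimal NumPy sketch of these two formulas, written purely for illustration (it is not sklearn code): `W` holds one row of coefficients per class, `Y` is a one-hot label matrix, and the intercept and sample weights handled by sklearn are omitted.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def multinomial_loss_grad(W, X, Y, alpha):\n",
    "    # W: (n_classes, n_features); Y: one-hot, shape (n_samples, n_classes)\n",
    "    scores = X.dot(W.T)                           # beta_i x for every class i\n",
    "    scores -= scores.max(axis=1, keepdims=True)   # overflow protection (logsumexp trick)\n",
    "    log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))\n",
    "    loss = -(Y * log_prob).sum() + 0.5 * alpha * (W ** 2).sum()\n",
    "    prob = np.exp(log_prob)                       # softmax probabilities\n",
    "    grad = (prob - Y).T.dot(X) + alpha * W        # matches the gradient formula above\n",
    "    return loss, grad\n",
    "\n",
    "# tiny sanity check on random data: with W = 0 the loss is n_samples * log(n_classes)\n",
    "rng = np.random.RandomState(0)\n",
    "X = rng.randn(5, 3)\n",
    "Y = np.eye(4)[rng.randint(4, size=5)]\n",
    "print(multinomial_loss_grad(np.zeros((4, 3)), X, Y, alpha=1.0))\n",
    "```\n",
    "\n",
    "The sklearn version below additionally handles sample weights, the intercept and sparse input.\n",
    "\n",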
    "Turning to the loss function as it is implemented in sklearn:\n",
    "\n",
    "```python\n",
    " 244 def _multinomial_loss(w, X, Y, alpha, sample_weight):\n",
    " 245     #+-- 37 lines: \"\"\"Computes multinomial loss and class probabilities.---\n",
    " 282     n_classes = Y.shape[1]\n",
    " 283     n_features = X.shape[1]\n",
    " 284     fit_intercept = w.size == (n_classes * (n_features + 1))\n",
    " 285     w = w.reshape(n_classes, -1)\n",
    " 286     sample_weight = sample_weight[:, np.newaxis]\n",
    " 287     if fit_intercept:\n",
    " 288         intercept = w[:, -1]\n",
    " 289         w = w[:, :-1]\n",
    " 290     else:\n",
    " 291         intercept = 0\n",
    " 292     p = safe_sparse_dot(X, w.T)\n",
    " 293     p += intercept\n",
    " 294     p -= logsumexp(p, axis=1)[:, np.newaxis]\n",
    " 295     loss = -(sample_weight * Y * p).sum()\n",
    " 296     loss += 0.5 * alpha * squared_norm(w)\n",
    " 297     p = np.exp(p, p)\n",
    " 298     return loss, p, w\n",
    "```\n",
    "\n",
    "+ Lines 292-293 compute $\\beta_{i0} + \\beta_i x$.\n",
    "+ Line 294 computes $L(\\beta)$. Note that the protection against overflow happens inside the `logsumexp` function, using the same trick as in [the brief introduction to logistic regression in Spark](./spark_ml_lr.ipynb).\n",
    "+ Line 295 sums everything up (note that each row of $Y$ is a one-hot vector, so multiplying by $Y$ selects the terms belonging to label $k$).\n",
    "+ Line 296 adds the L2 regularization term.\n",
    "+ Note that line 297 turns `p` back into $\\frac{e^{\\beta_{k0} + \\beta_k x}}{\\sum_i e^{\\beta_{i0} + \\beta_i x}}$, so it can be reused directly when computing the gradient.\n",
    "\n",
    "Now the gradient:\n",
    "\n",
    "```python\n",
    " 301 def _multinomial_loss_grad(w, X, Y, alpha, sample_weight):\n",
    " 302     #+-- 37 lines: \"\"\"Computes the multinomial loss, gradient and class probabilities.---\n",
    " 339     n_classes = Y.shape[1]\n",
    " 340     n_features = X.shape[1]\n",
    " 341     fit_intercept = (w.size == n_classes * (n_features + 1))\n",
    " 342     grad = np.zeros((n_classes, n_features + bool(fit_intercept)))\n",
    " 343     loss, p, w = _multinomial_loss(w, X, Y, alpha, sample_weight)\n",
    " 344     sample_weight = sample_weight[:, np.newaxis]\n",
    " 345     diff = sample_weight * (p - Y)\n",
    " 346     grad[:, :n_features] = safe_sparse_dot(diff.T, X)\n",
    " 347     grad[:, :n_features] += alpha * w\n",
    " 348     if fit_intercept:\n",
    " 349         grad[:, -1] = diff.sum(axis=0)\n",
    " 350     return loss, grad.ravel(), p\n",
    "```\n",
    "\n",
    "+ Lines 345-346 correspond to the gradient formula;\n",
    "+ line 347 adds the derivative of the L2 term;\n",
    "+ lines 348-349 handle the intercept."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.3 Hessian\n",
    "\n",
    "Because sklearn supports Newton-type solvers, the Hessian matrix is needed. See the Wikipedia article [Hessian matrix](https://en.wikipedia.org/wiki/Hessian_matrix) for the definition,\n",
    "\n",
    "\\begin{equation}\n",
    "{\\mathbf H}={\\begin{bmatrix}{\\dfrac {\\partial ^{2}f}{\\partial x_{1}^{2}}}&{\\dfrac {\\partial ^{2}f}{\\partial x_{1}\\,\\partial x_{2}}}&\\cdots &{\\dfrac {\\partial ^{2}f}{\\partial x_{1}\\,\\partial x_{n}}}\\\\[2.2ex]{\\dfrac {\\partial ^{2}f}{\\partial x_{2}\\,\\partial x_{1}}}&{\\dfrac {\\partial ^{2}f}{\\partial x_{2}^{2}}}&\\cdots &{\\dfrac {\\partial ^{2}f}{\\partial x_{2}\\,\\partial x_{n}}}\\\\[2.2ex]\\vdots &\\vdots &\\ddots &\\vdots \\\\[2.2ex]{\\dfrac {\\partial ^{2}f}{\\partial x_{n}\\,\\partial x_{1}}}&{\\dfrac {\\partial ^{2}f}{\\partial x_{n}\\,\\partial x_{2}}}&\\cdots &{\\dfrac {\\partial ^{2}f}{\\partial x_{n}^{2}}}\\end{bmatrix}}.\n",
    "\\end{equation}\n",
    "\n",
    "It is simply the matrix of all second-order partial derivatives. The derivation is omitted here; if you are interested, see [Logistic Regression - Jia Li](http://sites.stat.psu.edu/~jiali/course/stat597e/notes2/logit.pdf) or [Logistic regression: a simple ANN, Nando de Freitas](https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/lecture6.pdf).\n",
    "\n",
    "The key formula is $\\mathbf{H} = \\mathbf{X}^T \\operatorname{diag}(\\pi_i (1 - \\pi_i)) \\mathbf{X}$, where $\\pi_i = \\operatorname{sigm}(x_i \\beta)$.\n",
    "\n",
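    "The solver never builds this matrix explicitly; `newton_cg` only needs Hessian-vector products. As a purely illustrative sketch (not sklearn code, and ignoring the intercept and sample weights that the real implementation handles), the product $s \\mapsto \\mathbf{H} s$ with the L2 term can be written as:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.special import expit  # the logistic sigmoid\n",
    "\n",
    "def make_hessian_vector_product(w, X, alpha):\n",
    "    # Returns s -> H s with H = X^T diag(pi_i (1 - pi_i)) X + alpha * I.\n",
    "    pi = expit(X.dot(w))            # pi_i = sigm(x_i beta)\n",
    "    d = pi * (1 - pi)               # diagonal entries\n",
    "    dX = d[:, np.newaxis] * X       # precompute diag(d) X\n",
    "\n",
    "    def Hs(s):\n",
    "        # X^T (diag(d) X s) + alpha s, without forming the full Hessian\n",
    "        return X.T.dot(dX.dot(s)) + alpha * s\n",
    "\n",
    "    return Hs\n",
    "```\n",
    "\n",
    "This mirrors how `newton_cg` consumes the `Hs` closure returned by the sklearn function below.\n",
    "\n",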
    "The corresponding sklearn code lives in `_logistic_grad_hess`:\n",
    "\n",
    "```python\n",
    " 167 def _logistic_grad_hess(w, X, y, alpha, sample_weight=None):\n",
    " 168     #+-- 33 lines: \"\"\"Computes the gradient and the Hessian, in the case of a logistic loss.\n",
    " 201     w, c, yz = _intercept_dot(w, X, y)\n",
    " 202     #+-- 4 lines: if sample_weight is None:---------\n",
    " 206     z = expit(yz)\n",
    " 207     #+-- 8 lines: z0 = sample_weight * (z - 1) * y---\n",
    " 215     # The mat-vec product of the Hessian\n",
    " 216     d = sample_weight * z * (1 - z)\n",
    " 217     if sparse.issparse(X):\n",
    " 218         dX = safe_sparse_dot(sparse.dia_matrix((d, 0),\n",
    " 219                              shape=(n_samples, n_samples)), X)\n",
    " 220     else:\n",
    " 221         # Precompute as much as possible\n",
    " 222         dX = d[:, np.newaxis] * X\n",
    " 223\n",
    " 224     if fit_intercept:\n",
    " 225         # Calculate the double derivative with respect to intercept\n",
    " 226         # In the case of sparse matrices this returns a matrix object.\n",
    " 227         dd_intercept = np.squeeze(np.array(dX.sum(axis=0)))\n",
    " 228\n",
    " 229     def Hs(s):\n",
    " 230         ret = np.empty_like(s)\n",
    " 231         ret[:n_features] = X.T.dot(dX.dot(s[:n_features]))\n",
    " 232         ret[:n_features] += alpha * s[:n_features]\n",
    " 233\n",
    " 234         # For the fit intercept case.\n",
    " 235         if fit_intercept:\n",
    " 236             ret[:n_features] += s[-1] * dd_intercept\n",
    " 237             ret[-1] = dd_intercept.dot(s[:n_features])\n",
    " 238             ret[-1] += d.sum() * s[-1]\n",
    " 239         return ret\n",
    " 240\n",
    " 241     return grad, Hs\n",
    "```\n",
    "\n",
    "+ Lines 201, 206 and 216 compute the intermediate quantity $\\pi_i (1 - \\pi_i)$.\n",
    "+ Lines 217-222 turn this quantity into a diagonal matrix and precompute the right half of the formula; together with line 231 that gives the whole expression.\n",
    "\n",
    "I only understand the mechanics here; I will dig deeper when I find the time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Summary\n",
    "\n",
    "This note gave a brief tour of the logistic regression implementation in sklearn, mapping the binary and multiclass code to the corresponding formulas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": { "collapsed": true },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}