{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# %load /Users/facai/Study/book_notes/preconfig.py\n", "%matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "from IPython.display import SVG" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "逻辑回归在TensorFlow contrib.learn中的实现简介\n", "==============================" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "分析用的代码版本信息\n", "\n", "```bash\n", "~/W/g/t/tensorflow ❯❯❯ git log -n 1\n", "commit 8308ecd1ec68d914365b8fdfa16d5ac97e69f18c\n", "Merge: f991800 310901d\n", "Author: Shanqing Cai \n", "Date: Sun Dec 25 08:44:49 2016 -0500\n", "\n", " Merge pull request #6465 from velaia/patch-1\n", "\n", " typo 'unit8' instead of 'uint8'\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0. 总纲\n", "\n", "在contrib.learn中确实有个[LogisticRegressor](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/LogisticRegressor),然而我理解它是一个低层的封装,需要写model_fn来指定损失函数。而真正可直接使用的是[LinearClassifier](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/LinearClassifier),默认是二分类逻辑回归,简单的构成图如下:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/svg+xml": [ "LinearClassifier+__init__()Evaluable+model_dir+evaluate()Trainable+fit()«dataType»linear.py+_get_default_optimizer()+_linear_model_fn()«dataType»header.py+_multi_class_head()+_log_loss_with_two_classes()_BinaryLogisticHead+_thresholds+_loss_fun_MultiClassHead_Head+head_ops()«dataType»nn_imply.py+sigmoid_cross_entropy_with_logits()ModelFnOps+__new__()«dataType»feature_column_ops.py+weighted_sum_from_feature_columns()+_create_embedding_lookup()«dataType»gradients_impl.py+gradients()" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SVG(\"./res/tensorflow_lr.svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TensorFlow的封装非常细,需要跳来跳去,但逻辑性很好,可以很容易追出整个流程,所以就不打算再细贴代码了,只说对于二分类的几个重点:\n", "\n", "0. LinearClassifier默认是二分类逻辑回归,见head_lib.multi_class_head方法。\n", "1. 损失函数:\n", " + _linear_model_fn中计算的logits = $w^T x + b$。 \n", " 有趣的是,这里用了embeding,似乎已经支持稀疏矩阵。我这方面不太熟,只是猜测。\n", " + 在_BinaryLogisticHead中调用sigmoid_cross_entropy_with_logits计算出损失函数。这里的公式是针对标签是0/1的推导,所以与spark和sklearn略有差异。\n", "2. 导数:在_linear_model_fn中调用gradients.gradients来计算。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. 小结\n", "\n", "本文概要介绍了TensorFlow contrib.learn中逻辑回归的实现。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }