{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Inference\n", "Chapter 8. Model Inference and Averaging\n", "\n", "本书中大部分模型的拟合,都是通过 \n", "> minimizing a sum of squares for regression, or by minimizing cross-entropy for classification.\n", "\n", "*单独摘引这句,因为最近在 run 一个 tensorflow 的例子,把别人的一个 regression 的代码用来做 classification,发现结果很差。 Google 后,把优化目标从 RMS (RMSPropOptimizer) 改成了 Entropy (tf.nn.sigmoid_cross_entropy_with_logits,AdamOptimizer) 就解决了,当时感觉特别神奇。今天读到这句话,原来是个常识。。。好吧,我要坚持读完这本书*\n", "\n", "当然,本章节重点不是这两个,而是更 general 的讨论 maximum likelihood。*嗯,通信里面,输出信号的判别,其实就是 maximum likelihood。*\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Outline\n", "\n", "1. *Maximum likelihood*. 对 *likelihood function* $L(\\theta;\\mathbf{Z})= \\prod_{i=1}^{N}g_{\\theta}(z_i)$ 取对数,得到\n", "\\begin{align}\n", "l(\\theta;\\mathbf{Z})= \\sum_{i=1}^N l(\\theta;z_i) = \\sum_{i=1}^N \\log g_{\\theta}(z_i)\n", "\\end{align}\n", ",其中 $l(\\theta;z_i)= \\log g_{\\theta}(z_i)$ 被称为 *log-likelihood component*\n", "\n", "1. *Bayesian Method*. 我们可以根据贝叶斯公式计算后验概率,\n", "\\begin{align}\n", "\\Pr(\\theta) = \\frac{\\Pr(\\mathbf{z}|\\theta)\\Pr(\\theta)}{\\int \\Pr(\\mathbf{z}|\\theta)\\Pr(\\theta) d\\theta}\n", "\\end{align}\n", "基于已有的参数分布 $\\Pr(\\theta)$,更新参数 $\\theta$。当然,最终目的是预测最新的数据分布,*predictive distribution*\n", "\\begin{align}\n", "\\Pr(z^{new}|\\mathbf{z}) = \\int \\Pr(z^{new}|\\theta) \\Pr(\\theta|\\mathbf{z})d\\theta.\n", "\\end{align}\n", "\n", "1. *EM Algorithm (Expection Maximum, 期望最大化)* 最常用的迭代求解 maximum likelihood 算法。\n", "\n", " ```\n", " 1. Initialize parameters.\n", " 2. Expectation likelihood.\n", " 3. Maximization the likelihood by reestimate parameters.\n", " 4. Iterate steps 2 and 3 until convergence.\n", " ```\n", "\n", "1. *Gibbs sampling*, a Markov chain Monte Carlo approach. 没怎么看懂,大概就是重复很多次后,Markov chain 的转移概率 $\\Pr(U_j|U_1,U_2,\\cdots,U_{j-1},U_{j+1},\\cdots, U_K)$ 会达到一个平衡,那么这个时候的采样就靠谱了,系统就稳定了,参数就得到了(通常需要求平均)。*我们跑仿真就是这个思路啊*\n", "\n", "1. *Model fitting*. 之前都是根据不同的数据,产生最 fitting 的模型。这里介绍了三种方法,是基于不同 bootstrap 得到的模型,然后 fitting 得到一个相对最好的最终模型。\n", "\n", " - *Bagging*,几个模型相加:\n", "\\begin{align}\n", "\\hat{f}_{bag}(x) = \\frac{1}{B} \\sum_{b=1}^B \\hat{f}^{*b}(x)\n", "\\end{align}\n", " - *Staking*, 几个模型加权相加:\n", " \\begin{align}\n", " \\hat{w}^{st} = \\arg \\min_{w} \\sum_{i=1}^N \\left[ y_i - \\sum_{m=1}^M w_m \\hat{f}_m^{-i}(x_i) \\right]^2\n", " \\end{align}\n", " - *Bumping*, 几个模型中选择最佳的那个:\n", " \\begin{align}\n", " \\hat{b} = \\arg \\min_b \\sum_{i=1}^N [y_i -\\hat{f}_m^{-i}(x_i) ]^2\n", " \\end{align}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 具体内容\n", "\n", "### Bootstrap vs. Maximum Likelihood\n", "> 本质上, in essence the bootstrap is a computer implementation of nonparametric or parametric maximum likelood.\n", "\n", "### Maximum Likelihood vs. Bayesian\n", "Bayesian 可以表示为\n", "\\begin{align}\n", "posterior = \\frac{likelihood \\cdot prior}{evidence}\n", "\\end{align}\n", "同时可以参考 Bayesian vs. Frequentist的讨论。\n", "\n", "### Bayesian inference vs. Frequentist inference \n", "\n", "\\begin{align}\n", "P(A |B) = \\frac{P(B|A)P(A)}{P(B)}\n", "\\end{align}\n", "\n", "Bayesian inference 贝叶斯推断 基于先验概率 $P(B)$ 和后验概率 $P(B|A)$; 而 Frequentist inference 频率论推断(大概是这么翻译的吧)只看 $P(B|A)$。两者互有优略,相关文献说,19世纪主要是 Bayesian, 20世纪开始流行 Frequentist, 到了21世纪,由于计算能力的发展以及大数据的流行,Bayesian 再次主导。\n", "\n", "*我认为,Bayesian 的先验概率 $P(B)$ 容易引入观察者误差,这正式 Frequentist 不采用它的原因。但是,随着大数据的盛行,$P(B)$ 可以被 Bootstrap 等技术很好的描述,所以 Bayesian 又再次胜出。*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }