{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Networks\n", "\n", "Chapter 11. Neural Networks\n", "\n", "*本章节总计24页,还包括好多例子。当 Neural networks and deep learning 已经被写成一本书的时候,24页只能是一个概览。* 接下来,就让我们从统计学家的角度来看看神经网络吧。\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Outline\n", "\n", "1. *Projection Pursuit Regerssion* 。*为什么要讲这个?说是这个给神经网络一些启发。好吧,我觉得启发于线性代数才对。*\n", "\\begin{align}\n", "f(x) = \\sum_{m=1}^M g_m(w_m^T X).\n", "\\end{align}\n", "Optimization 的套路,重复下列过程直至收敛:\n", " - 给定 $w$,优化 $g$\n", " - 给定 $g$,优化 $w$\n", " \n", "1. *Neural Networks*,以一个经典的三层网络(输入 X,中间 Z,输出 Y)为例,又叫 *vanilla neural net*,*single layer perceptron*,*single hidden layer back-propagation network*.\n", "\\begin{align}\n", "Z_m = & \\delta (\\alpha _{0m} + \\alpha_m^T X), \\quad m=1,2,\\cdots,M, \\\\\n", "f_k(K) = & g_k(\\beta_{0k} + \\beta_k^T Z), \\quad k= 1,2,\\cdots,K\n", "\\end{align}\n", " - $\\delta()$ 为激活函数 activation function,例如 *sigmoid * $\\delta(v) = 1/(1+e^{-v})$\n", " - $g()$ 为最终的转化函数,regression 的时候可以会 identity function $g_k(T) = T_k$, classifiction 的时候为 *softmax* $g_k(T) = \\frac{e^{T_k}}{\\sum_{l=1}^K e^{T_l}}$\n", " \n", "1. *Back-Propagation*,BP算法。其实是个 gradient descent update 过程,但是在求导过程中巧妙地采用了 back-propagation 的方法节省了巨大的求导开销。\n", "\\begin{align}\n", "\\beta_{km}^{(r+1)} = & \\beta_{km}^{(r)} - \\gamma_r \\sum_{i=1}^N \\frac{\\partial R_i}{\\partial \\beta_{km}^{(r)}}, \\\\\n", "\\alpha_{ml}^{(r+1)} = & \\alpha_{ml}^{(r)} - \\gamma_r \\sum_{i=1}^N \\frac{\\partial R_i}{\\partial \\beta_{ml}^{(r)}}, \\\\\n", "\\frac{1}{x_{il}}\\frac{\\partial R_i}{\\partial \\beta_{ml}^{(r)}} = & \\delta'(\\alpha_m^Tx_i)\\sum_{k=1}^K \\beta_{km} \\frac{1}{z_{mi}}\\frac{\\partial R_i}{\\partial \\beta_{km}^{(r)}}\n", "\\end{align}\n", " - BP 算法的精髓在最后一个 back-propagation 方程\n", " - $\\gamma_r$ 是 *leanrning rate*\n", " - $N$ 是 *batch learning* 的 batch size\n", " - *training epoch* 是指一次对所有训练数据的遍历" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 具体内容\n", "\n", "### 模型优化的讨论\n", "\n", "1. Starting Values:权重 w 的初始值是 0 附件的随机数。如果全部取0,那模型就没办法更新了。太大又会得到很差的效果。\n", "\n", "1. Overfitting:由于神经网络可以引入非常多的 神经元,非常容易达到过拟合,所以模型训练到一定程度就可以停止了,不可以无休止的训练到100%。可以采用类似 redge regression 的思路,引入 *weight decay*,实现 regularization $\\lambda$。即,给 error function $R(\\theta)$ 引入 penalty $J(\\theta)$,变成优化 $R(\\theta) + \\lambda J(\\theta)$.\n", " - Cross-validation 可以估计 $\\lambda$\n", " - *weight elimination* penalty \n", " \\begin{align}\n", " J(\\theta) = \\sum_{km} \\frac{\\beta_{km}^2}{1+\\beta_{km}^2} +\\sum_{ml}\\frac{\\alpha_{ml}^2}{1+\\alpha_{ml}^2}\n", " \\end{align}\n", "\n", "1. Scaling of the Inputs。最好把输入数据标准化,mean = 0, std = 1.\n", "\n", "1. Number of hidden units and layers. 越多越好。但是太多,会导致所有权重趋于0." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }