{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Theano 实例:更复杂的网络" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using gpu device 1: Tesla C2075 (CNMeM is disabled)\n" ] } ], "source": [ "import theano\n", "import theano.tensor as T\n", "import numpy as np\n", "from load import mnist\n", "from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams\n", "\n", "srng = RandomStreams()\n", "\n", "def floatX(X):\n", " return np.asarray(X, dtype=theano.config.floatX)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "上一节我们用了一个简单的神经网络来训练 MNIST 数据,这次我们使用更复杂的网络来进行训练,同时加入 `dropout` 机制,防止过拟合。\n", "\n", "这里采用比较简单的 `dropout` 机制,即将输入值按照一定的概率随机置零。" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def dropout(X, prob=0.):\n", " if prob > 0:\n", " X *= srng.binomial(X.shape, p=1-prob, dtype = theano.config.floatX)\n", " X /= 1 - prob\n", " return X" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "之前我们采用的的激活函数是 `sigmoid`,现在我们使用 `rectify` 激活函数。\n", "\n", "这可以使用 `T.nnet.relu(x, alpha=0)` 来实现,它本质上相当于:`T.switch(x > 0, x, alpha * x)`,而 `rectify` 函数的定义为:\n", "\n", "$$\n", "\\text{rectify}(x) = \\left\\{\n", "\\begin{aligned}\n", "x, & \\ x > 0 \\\\\n", "0, & \\ x < 0\n", "\\end{aligned}\\right.\n", "$$\n", "\n", "之前我们构造的是一个单隐层的神经网络结构,现在我们构造一个双隐层的结构即“输入-隐层1-隐层2-输出”的全连接结构。\n", "\n", "$$\n", "\\begin{aligned}\n", "& h_1 = \\text{rectify}(W_{h_1} \\ x) \\\\\n", "& h_2 = \\text{rectify}(W_{h_2} \\ h_1) \\\\\n", "& o = \\text{softmax}(W_o h_2)\n", "\\end{aligned}\n", "$$\n", "\n", "`Theano` 自带的 `T.nnet.softmax()` 的 GPU 实现目前似乎有 bug 会导致梯度溢出的问题,因此自定义了 `softmax` 函数:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def softmax(X):\n", " e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))\n", " return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')\n", "\n", "def model(X, w_h1, w_h2, w_o, p_drop_input, p_drop_hidden):\n", " \"\"\"\n", " input:\n", " X: input data\n", " w_h1: weights input layer to hidden layer 1\n", " w_h2: weights hidden layer 1 to hidden layer 2\n", " w_o: weights hidden layer 2 to output layer\n", " p_drop_input: dropout rate for input layer\n", " p_drop_hidden: dropout rate for hidden layer\n", " output:\n", " h1: hidden layer 1\n", " h2: hidden layer 2\n", " py_x: output layer\n", " \"\"\"\n", " X = dropout(X, p_drop_input)\n", " h1 = T.nnet.relu(T.dot(X, w_h1))\n", " \n", " h1 = dropout(h1, p_drop_hidden)\n", " h2 = T.nnet.relu(T.dot(h1, w_h2))\n", " \n", " h2 = dropout(h2, p_drop_hidden)\n", " py_x = softmax(T.dot(h2, w_o))\n", " return h1, h2, py_x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "随机初始化权重矩阵:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def init_weights(shape):\n", " return theano.shared(floatX(np.random.randn(*shape) * 0.01))\n", "\n", "w_h1 = init_weights((784, 625))\n", "w_h2 = init_weights((625, 625))\n", "w_o = init_weights((625, 10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "定义变量:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X = T.matrix()\n", "Y = T.matrix()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "定义更新的规则,之前我们使用的是简单的 SGD,这次我们使用 RMSprop 来更新,其规则为:\n", "$$\n", "\\begin{align}\n", "MS(w, t) & = \\rho MS(w, t-1) + (1-\\rho) \\left(\\left.\\frac{\\partial E}{\\partial w}\\right|_{w(t-1)}\\right)^2 \\\\\n", "w(t) & = w(t-1) - \\alpha \\left.\\frac{\\partial E}{\\partial w}\\right|_{w(t-1)} / \\sqrt{MS(w, t)}\n", "\\end{align}\n", "$$" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def RMSprop(cost, params, accs, lr=0.001, rho=0.9, epsilon=1e-6):\n", " grads = T.grad(cost=cost, wrt=params)\n", " updates = []\n", " for p, g, acc in zip(params, grads, accs):\n", " acc_new = rho * acc + (1 - rho) * g ** 2\n", " gradient_scaling = T.sqrt(acc_new + epsilon)\n", " g = g / gradient_scaling\n", " updates.append((acc, acc_new))\n", " updates.append((p, p - lr * g))\n", " return updates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "训练函数:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# 有 dropout,用来训练\n", "noise_h1, noise_h2, noise_py_x = model(X, w_h1, w_h2, w_o, 0.2, 0.5)\n", "cost = T.mean(T.nnet.categorical_crossentropy(noise_py_x, Y))\n", "params = [w_h1, w_h2, w_o]\n", "accs = [theano.shared(p.get_value() * 0.) for p in params]\n", "updates = RMSprop(cost, params, accs, lr=0.001)\n", "# 训练函数\n", "train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "预测函数:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# 没有 dropout,用来预测\n", "h1, h2, py_x = model(X, w_h1, w_h2, w_o, 0., 0.)\n", "# 预测的结果\n", "y_x = T.argmax(py_x, axis=1)\n", "predict = theano.function(inputs=[X], outputs=y_x, allow_input_downcast=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "训练:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "iter 001 accuracy: 0.943\n", "iter 002 accuracy: 0.9665\n", "iter 003 accuracy: 0.9732\n", "iter 004 accuracy: 0.9763\n", "iter 005 accuracy: 0.9767\n", "iter 006 accuracy: 0.9802\n", "iter 007 accuracy: 0.9795\n", "iter 008 accuracy: 0.979\n", "iter 009 accuracy: 0.9807\n", "iter 010 accuracy: 0.9805\n", "iter 011 accuracy: 0.9824\n", "iter 012 accuracy: 0.9816\n", "iter 013 accuracy: 0.9838\n", "iter 014 accuracy: 0.9846\n", "iter 015 accuracy: 0.983\n", "iter 016 accuracy: 0.9837\n", "iter 017 accuracy: 0.9841\n", "iter 018 accuracy: 0.9837\n", "iter 019 accuracy: 0.9835\n", "iter 020 accuracy: 0.9844\n", "iter 021 accuracy: 0.9837\n", "iter 022 accuracy: 0.9839\n", "iter 023 accuracy: 0.984\n", "iter 024 accuracy: 0.9851\n", "iter 025 accuracy: 0.985\n", "iter 026 accuracy: 0.9847\n", "iter 027 accuracy: 0.9851\n", "iter 028 accuracy: 0.9846\n", "iter 029 accuracy: 0.9846\n", "iter 030 accuracy: 0.9853\n", "iter 031 accuracy: 0.985\n", "iter 032 accuracy: 0.9844\n", "iter 033 accuracy: 0.9849\n", "iter 034 accuracy: 0.9845\n", "iter 035 accuracy: 0.9848\n", "iter 036 accuracy: 0.9868\n", "iter 037 accuracy: 0.9864\n", "iter 038 accuracy: 0.9866\n", "iter 039 accuracy: 0.9859\n", "iter 040 accuracy: 0.9857\n", "iter 041 accuracy: 0.9853\n", "iter 042 accuracy: 0.9855\n", "iter 043 accuracy: 0.9861\n", "iter 044 accuracy: 0.9865\n", "iter 045 accuracy: 0.9872\n", "iter 046 accuracy: 0.9867\n", "iter 047 accuracy: 0.9868\n", "iter 048 accuracy: 0.9863\n", "iter 049 accuracy: 0.9862\n", "iter 050 accuracy: 0.9856\n" ] } ], "source": [ "trX, teX, trY, teY = mnist(onehot=True)\n", "\n", "for i in range(50):\n", " for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):\n", " cost = train(trX[start:end], trY[start:end])\n", " print \"iter {:03d} accuracy:\".format(i + 1), np.mean(np.argmax(teY, axis=1) == predict(teX))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }