{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Theano 实例:人工神经网络" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "神经网络的模型可以参考 UFLDL 的教程,这里不做过多描述。 \n", "\n", "http://ufldl.stanford.edu/wiki/index.php/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using gpu device 1: Tesla K10.G2.8GB (CNMeM is disabled)\n" ] } ], "source": [ "import theano\n", "import theano.tensor as T\n", "\n", "import numpy as np\n", "from load import mnist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们在这里使用一个简单的三层神经网络:输入 - 隐层 - 输出。\n", "\n", "对于网络的激活函数,隐层用 `sigmoid` 函数,输出层用 `softmax` 函数,其模型如下: \n", "\n", "$$\n", "\\begin{aligned}\n", " h & = \\sigma (W_h X) \\\\\n", " o & = \\text{softmax} (W_o h)\n", "\\end{aligned}\n", "$$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def model(X, w_h, w_o):\n", " \"\"\"\n", " input:\n", " X: input data\n", " w_h: hidden unit weights\n", " w_o: output unit weights\n", " output:\n", " Y: probability of y given x\n", " \"\"\"\n", " # 隐层\n", " h = T.nnet.sigmoid(T.dot(X, w_h))\n", " # 输出层\n", " pyx = T.nnet.softmax(T.dot(h, w_o))\n", " return pyx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用随机梯度下降的方法进行训练:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def sgd(cost, params, lr=0.05):\n", " \"\"\"\n", " input:\n", " cost: cost function\n", " params: parameters\n", " lr: learning rate\n", " output:\n", " update rules\n", " \"\"\"\n", " grads = T.grad(cost=cost, wrt=params)\n", " updates = []\n", " for p, g in zip(params, grads):\n", " updates.append([p, p - g * lr])\n", " return updates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对于 `MNIST` 手写数字的问题,我们使用一个 `784 × 625 × 10` 即输入层大小为 `784`,隐层大小为 `625`,输出层大小为 `10` 的神经网络来模拟,最后的输出表示数字为 `0` 到 `9` 的概率。\n", "\n", "为了对权重进行更新,我们需要将权重设为 shared 变量:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def floatX(X):\n", " return np.asarray(X, dtype=theano.config.floatX)\n", "\n", "def init_weights(shape):\n", " return theano.shared(floatX(np.random.randn(*shape) * 0.01))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "因此变量初始化为:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X = T.matrix()\n", "Y = T.matrix()\n", "\n", "w_h = init_weights((784, 625))\n", "w_o = init_weights((625, 10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "模型输出为:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "py_x = model(X, w_h, w_o)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "预测的结果为:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y_x = T.argmax(py_x, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "模型的误差函数为:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "更新规则为:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "updates = sgd(cost, [w_h, w_o])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "定义训练和预测的函数:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)\n", "predict = theano.function(inputs=[X], outputs=y_x, allow_input_downcast=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "训练:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "导入 MNIST 数据:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "trX, teX, trY, teY = mnist(onehot=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "训练 100 轮,正确率为 0.956:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "000 0.7028\n", "001 0.8285\n", "002 0.8673\n", "003 0.883\n", "004 0.89\n", "005 0.895\n", "006 0.8984\n", "007 0.9017\n", "008 0.9047\n", "009 0.907\n", "010 0.9089\n", "011 0.9105\n", "012 0.9127\n", "013 0.914\n", "014 0.9152\n", "015 0.9159\n", "016 0.9169\n", "017 0.9173\n", "018 0.918\n", "019 0.9185\n", "020 0.919\n", "021 0.9197\n", "022 0.9201\n", "023 0.9205\n", "024 0.9206\n", "025 0.9212\n", "026 0.9219\n", "027 0.9228\n", "028 0.9228\n", "029 0.9229\n", "030 0.9236\n", "031 0.9244\n", "032 0.925\n", "033 0.9255\n", "034 0.9263\n", "035 0.927\n", "036 0.9274\n", "037 0.9278\n", "038 0.928\n", "039 0.9284\n", "040 0.9289\n", "041 0.9294\n", "042 0.9298\n", "043 0.9302\n", "044 0.9311\n", "045 0.932\n", "046 0.9325\n", "047 0.9332\n", "048 0.934\n", "049 0.9347\n", "050 0.9354\n", "051 0.9358\n", "052 0.9365\n", "053 0.9372\n", "054 0.9377\n", "055 0.9385\n", "056 0.9395\n", "057 0.9399\n", "058 0.9405\n", "059 0.9411\n", "060 0.9416\n", "061 0.9422\n", "062 0.9427\n", "063 0.9429\n", "064 0.9431\n", "065 0.9438\n", "066 0.9444\n", "067 0.9446\n", "068 0.9449\n", "069 0.9453\n", "070 0.9458\n", "071 0.9462\n", "072 0.9469\n", "073 0.9475\n", "074 0.9474\n", "075 0.9476\n", "076 0.948\n", "077 0.949\n", "078 0.9497\n", "079 0.95\n", "080 0.9503\n", "081 0.9507\n", "082 0.9507\n", "083 0.9515\n", "084 0.9519\n", "085 0.9521\n", "086 0.9523\n", "087 0.9529\n", "088 0.9536\n", "089 0.9538\n", "090 0.9542\n", "091 0.9545\n", "092 0.9544\n", "093 0.9546\n", "094 0.9547\n", "095 0.9549\n", "096 0.9552\n", "097 0.9554\n", "098 0.9557\n", "099 0.9562\n" ] } ], "source": [ "for i in range(100):\n", " for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):\n", " cost = train(trX[start:end], trY[start:end])\n", " print \"{0:03d}\".format(i), np.mean(np.argmax(teY, axis=1) == predict(teX))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }