{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Theano Example: Artificial Neural Network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the neural network model itself, see the UFLDL tutorial; it is not covered in detail here.\n",
    "\n",
    "http://ufldl.stanford.edu/wiki/index.php/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using gpu device 1: Tesla K10.G2.8GB (CNMeM is disabled)\n"
     ]
    }
   ],
   "source": [
    "import theano\n",
    "import theano.tensor as T\n",
    "\n",
    "import numpy as np\n",
    "from load import mnist"
   ]
  },
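  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we use a simple three-layer neural network: input, hidden layer, output.\n",
    "\n",
    "For the activation functions, the hidden layer uses `sigmoid` and the output layer uses `softmax`. With each row of $X$ holding one sample, the model is:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "    h & = \\sigma (X W_h) \\\\\n",
    "    o & = \\text{softmax} (h W_o)\n",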
    "\\end{aligned}\n",
    "$$"
   ]
  },
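  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the two activation functions have the standard definitions below. The symbols $z$, $j$, and $k$ are introduced here only for illustration: $z$ stands for a pre-activation value (a vector of 10 scores in the `softmax` case), and $k$ indexes the output classes.\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "    \\sigma(z) & = \\frac{1}{1 + e^{-z}} \\\\\n",
    "    \\text{softmax}(z)_k & = \\frac{e^{z_k}}{\\sum_{j} e^{z_j}}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "Each `softmax` output lies in $(0, 1)$ and the outputs sum to 1, which is why they can be read as class probabilities."
   ]
  },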
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def model(X, w_h, w_o):\n",
    "    \"\"\"\n",
    "    input:\n",
    "        X: input data\n",
    "        w_h: hidden unit weights\n",
    "        w_o: output unit weights\n",
    "    output:\n",
    "        pyx: probability of y given x\n",
    "    \"\"\"\n",
    "    # hidden layer\n",
    "    h = T.nnet.sigmoid(T.dot(X, w_h))\n",
    "    # output layer\n",
    "    pyx = T.nnet.softmax(T.dot(h, w_o))\n",
    "    return pyx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train with stochastic gradient descent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sgd(cost, params, lr=0.05):\n",
    "    \"\"\"\n",
    "    input:\n",
    "        cost: cost function\n",
    "        params: parameters\n",
    "        lr: learning rate\n",
    "    output:\n",
    "        update rules\n",
    "    \"\"\"\n",
    "    grads = T.grad(cost=cost, wrt=params)\n",
    "    updates = []\n",
    "    for p, g in zip(params, grads):\n",
    "        updates.append([p, p - g * lr])\n",
    "    return updates"
   ]
  },
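  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Written as a formula, the update that `sgd` builds for each parameter is plain gradient descent. Here $\\theta$ stands for any entry of `params`, $\\eta$ for `lr`, and $J$ for `cost`; these symbols are notation chosen for this note, not names from the code.\n",
    "\n",
    "$$\n",
    "\\theta \\leftarrow \\theta - \\eta \\frac{\\partial J}{\\partial \\theta}\n",
    "$$\n",
    "\n",
    "With the default `lr=0.05`, each step moves a weight by 0.05 times its gradient. `T.grad` derives the gradients symbolically, so no backpropagation code has to be written by hand."
   ]
  },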
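  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the `MNIST` handwritten digit problem, we use a `784 × 625 × 10` neural network, that is, an input layer of size `784`, a hidden layer of size `625`, and an output layer of size `10`. The final output gives the probability of each digit from `0` to `9`.\n",
    "\n",
    "To update the weights during training, they must be Theano shared variables:"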
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def floatX(X):\n",
    "    return np.asarray(X, dtype=theano.config.floatX)\n",
    "\n",
    "def init_weights(shape):\n",
    "    return theano.shared(floatX(np.random.randn(*shape) * 0.01))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The variables are then initialized as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "X = T.matrix()\n",
    "Y = T.matrix()\n",
    "\n",
    "w_h = init_weights((784, 625))\n",
    "w_o = init_weights((625, 10))"
   ]
  },
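  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough size check, assuming the two weight matrices above are the only trainable parameters (the model as written has no bias terms), the parameter count works out as:\n",
    "\n",
    "$$\n",
    "784 \\times 625 + 625 \\times 10 = 490000 + 6250 = 496250\n",
    "$$\n",
    "\n",
    "Each weight starts as an independent draw from $\\mathcal{N}(0, 0.01^2)$, since `np.random.randn` samples a standard normal and the result is scaled by `0.01`."
   ]
  },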
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model output is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "py_x = model(X, w_h, w_o)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The predicted label is the class with the highest probability:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y_x = T.argmax(py_x, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model's cost function is the mean categorical cross-entropy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))"
   ]
  },
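  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Spelled out, and assuming `Y` holds one-hot rows as in this example, the cost is the average negative log-probability assigned to the correct class. The symbols $N$ (number of samples in the batch), $n$, $k$, and $o_{nk}$ (the entry of `py_x` for sample $n$ and class $k$) are introduced here for illustration; $\\log$ is the natural logarithm.\n",
    "\n",
    "$$\n",
    "J = -\\frac{1}{N} \\sum_{n=1}^{N} \\sum_{k=0}^{9} Y_{nk} \\log o_{nk}\n",
    "$$\n",
    "\n",
    "Because each row of `Y` has a single 1, only the term for the true digit survives in the inner sum. As a sanity check, a model that assigns probability 0.1 to every class has $J = -\\log 0.1 \\approx 2.303$."
   ]
  },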
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The update rules are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "updates = sgd(cost, [w_h, w_o])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the training and prediction functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)\n",
    "predict = theano.function(inputs=[X], outputs=y_x, allow_input_downcast=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Training:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load the MNIST data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "trX, teX, trY, teY = mnist(onehot=True)"
   ]
  },
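  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The training loop walks the data in minibatches of 128, pairing start indices `range(0, len(trX), 128)` with end indices `range(128, len(trX), 128)`. Assuming `trX` holds the full 60000-sample MNIST training set (the `load` module is not shown, so this is an assumption), the number of updates per epoch and the number of samples they cover are:\n",
    "\n",
    "$$\n",
    "\\left\\lceil \\frac{60000}{128} \\right\\rceil - 1 = 469 - 1 = 468, \\qquad 468 \\times 128 = 59904\n",
    "$$\n",
    "\n",
    "The end-index range is one element shorter than the start-index range, and `zip` stops at the shorter one, so under this assumption the last $60000 - 59904 = 96$ samples are not used for training in any epoch."
   ]
  },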
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train for 100 epochs; test accuracy reaches about 0.956:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "000 0.7028\n",
      "001 0.8285\n",
      "002 0.8673\n",
      "003 0.883\n",
      "004 0.89\n",
      "005 0.895\n",
      "006 0.8984\n",
      "007 0.9017\n",
      "008 0.9047\n",
      "009 0.907\n",
      "010 0.9089\n",
      "011 0.9105\n",
      "012 0.9127\n",
      "013 0.914\n",
      "014 0.9152\n",
      "015 0.9159\n",
      "016 0.9169\n",
      "017 0.9173\n",
      "018 0.918\n",
      "019 0.9185\n",
      "020 0.919\n",
      "021 0.9197\n",
      "022 0.9201\n",
      "023 0.9205\n",
      "024 0.9206\n",
      "025 0.9212\n",
      "026 0.9219\n",
      "027 0.9228\n",
      "028 0.9228\n",
      "029 0.9229\n",
      "030 0.9236\n",
      "031 0.9244\n",
      "032 0.925\n",
      "033 0.9255\n",
      "034 0.9263\n",
      "035 0.927\n",
      "036 0.9274\n",
      "037 0.9278\n",
      "038 0.928\n",
      "039 0.9284\n",
      "040 0.9289\n",
      "041 0.9294\n",
      "042 0.9298\n",
      "043 0.9302\n",
      "044 0.9311\n",
      "045 0.932\n",
      "046 0.9325\n",
      "047 0.9332\n",
      "048 0.934\n",
      "049 0.9347\n",
      "050 0.9354\n",
      "051 0.9358\n",
      "052 0.9365\n",
      "053 0.9372\n",
      "054 0.9377\n",
      "055 0.9385\n",
      "056 0.9395\n",
      "057 0.9399\n",
      "058 0.9405\n",
      "059 0.9411\n",
      "060 0.9416\n",
      "061 0.9422\n",
      "062 0.9427\n",
      "063 0.9429\n",
      "064 0.9431\n",
      "065 0.9438\n",
      "066 0.9444\n",
      "067 0.9446\n",
      "068 0.9449\n",
      "069 0.9453\n",
      "070 0.9458\n",
      "071 0.9462\n",
      "072 0.9469\n",
      "073 0.9475\n",
      "074 0.9474\n",
      "075 0.9476\n",
      "076 0.948\n",
      "077 0.949\n",
      "078 0.9497\n",
      "079 0.95\n",
      "080 0.9503\n",
      "081 0.9507\n",
      "082 0.9507\n",
      "083 0.9515\n",
      "084 0.9519\n",
      "085 0.9521\n",
      "086 0.9523\n",
      "087 0.9529\n",
      "088 0.9536\n",
      "089 0.9538\n",
      "090 0.9542\n",
      "091 0.9545\n",
      "092 0.9544\n",
      "093 0.9546\n",
      "094 0.9547\n",
      "095 0.9549\n",
      "096 0.9552\n",
      "097 0.9554\n",
      "098 0.9557\n",
      "099 0.9562\n"
     ]
    }
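   ],
   "source": [
    "for i in range(100):\n",
    "    # one pass over the training set in minibatches of 128\n",
    "    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):\n",
    "        # use a new name so the symbolic `cost` variable is not overwritten\n",
    "        batch_cost = train(trX[start:end], trY[start:end])\n",
    "    # report accuracy on the test set after each epoch\n",
    "    accuracy = np.mean(np.argmax(teY, axis=1) == predict(teX))\n",
    "    print(\"{0:03d} {1}\".format(i, accuracy))"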
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}