{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Theano tensor 模块：conv 子模块" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`conv` 是 `tensor` 中处理卷积神经网络的子模块。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 卷积" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "这里只介绍二维卷积：\n", "\n", "`T.nnet.conv2d(input, filters, input_shape=None, filter_shape=None, border_mode='valid', subsample=(1, 1), filter_flip=True, image_shape=None, **kwargs)`\n", "\n", "`conv2d` 函数接受两个输入：\n", "\n", "- `4D` 张量 `input`，其形状如下：\n", " \n", " `[b, ic, i0, i1]`\n", " \n", " \n", "- `4D` 张量 `filter` ，其形状如下：\n", " \n", " `[oc, ic, f0, f1]`\n", " \n", "`border_mode` 控制输出大小：\n", "\n", "- `'valid'`：输出形状：\n", "\n", " `[b, oc, i0 - f0 + 1, i1 - f1 + 1]`\n", " \n", "- `'full'`：输出形状：\n", "\n", " `[b, oc, i0 + f0 - 1, i1 + f1 - 1]`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 池化" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "池化操作：\n", "\n", "`T.signal.downsample.max_pool_2d(input, ds, ignore_border=None, st=None, padding=(0, 0), mode='max')`\n", "\n", "`input` 池化操作在其最后两维进行。\n", "\n", "`ds` 是池化区域的大小，用长度为 2 的元组表示。\n", "\n", "`ignore_border` 设为 `Ture` 时，`(5, 5)` 在 `(2, 2)` 的池化下会变成 `(2, 2)`（5 % 2 == 1，多余的 1 个被舍去了），否则是 `(3, 3)`。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MNIST 卷积神经网络形状详解" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "def model(X, w, w2, w3, w4, p_drop_conv, p_drop_hidden):\n", " \n", " # X: 128 * 1 * 28 * 28\n", " # w: 32 * 1 * 3 * 3\n", " # full mode\n", " # l1a: 128 * 32 * (28 + 3 - 1) * (28 + 3 - 1)\n", " l1a = rectify(conv2d(X, w, border_mode='full'))\n", " # l1a: 128 * 32 * 30 * 30\n", " # ignore_border False\n", " # l1: 128 * 32 * (30 / 2) * (30 / 2)\n", " l1 = max_pool_2d(l1a, (2, 2), ignore_border=False)\n", " l1 = dropout(l1, p_drop_conv)\n", "\n", " # l1: 128 * 32 * 15 * 15\n", " # w2: 64 * 32 * 3 * 3\n", " # valid mode\n", " # l2a: 128 * 64 * (15 - 3 + 1) * (15 - 3 + 1)\n", " l2a = rectify(conv2d(l1, w2)) \n", " # l2a: 128 * 64 * 13 * 13\n", " # l2: 128 * 64 * (13 / 2 + 1) * (13 / 2 + 1)\n", " l2 = max_pool_2d(l2a, (2, 2), ignore_border=False)\n", " l2 = dropout(l2, p_drop_conv)\n", "\n", " # l2: 128 * 64 * 7 * 7\n", " # w3: 128 * 64 * 3 * 3\n", " # l3a: 128 * 128 * (7 - 3 + 1) * (7 - 3 + 1)\n", " l3a = rectify(conv2d(l2, w3))\n", " # l3a: 128 * 128 * 5 * 5\n", " # l3b: 128 * 128 * (5 / 2 + 1) * (5 / 2 + 1)\n", " l3b = max_pool_2d(l3a, (2, 2), ignore_border=False) \n", " # l3b: 128 * 128 * 3 * 3\n", " # l3: 128 * (128 * 3 * 3)\n", " l3 = T.flatten(l3b, outdim=2)\n", " l3 = dropout(l3, p_drop_conv)\n", " \n", " # l3: 128 * (128 * 3 * 3)\n", " # w4: (128 * 3 * 3) * 625\n", " # l4: 128 * 625\n", " l4 = rectify(T.dot(l3, w4))\n", " l4 = dropout(l4, p_drop_hidden)\n", "\n", " # l5: 128 * 625\n", " # w5: 625 * 10\n", " # pyx: 128 * 10\n", " pyx = softmax(T.dot(l4, w_o))\n", " return l1, l2, l3, l4, pyx\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }