{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Theano Basics" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, import `theano` and its `tensor` submodule:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using gpu device 1: Tesla K10.G2.8GB (CNMeM is disabled)\n" ] } ], "source": [ "import theano\n", "\n", "# By convention, the `tensor` submodule is imported as T\n", "import theano.tensor as T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `tensor` module contains many of the mathematical operations we use most often, so it is imported under the short name T for convenience." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Symbolic computation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In `theano`, every algorithm is expressed symbolically, so writing an algorithm in `theano` is, to some extent, more like writing mathematics (we met symbolic variables defined with `sympy` earlier, in [04.06 Integration](../04. 
scipy/04.06 integration in python.ipynb)).\n", "\n", "Use `T.scalar` to define a symbolic scalar:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "foo = T.scalar('x')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x\n" ] } ], "source": [ "print foo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Symbolic expressions can be built from it:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Elemwise{pow,no_inplace}.0\n" ] } ], "source": [ "bar = foo ** 2\n", "\n", "print bar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here `foo` is $x$ and `bar` is the expression $x^2$, but printing `bar` directly only shows an internal, auto-generated name that is hard to read.\n", "\n", "To display `bar` more readably, use the `theano.pp()` (pretty print) function:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(x ** TensorConstant{2})\n" ] } ], "source": [ "print theano.pp(bar)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the type:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "TensorType(float32, scalar)\n" ] } ], "source": [ "print type(foo)\n", "print foo.type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## theano functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we have symbolic variables, we can define functions of them. `theano.function()` compiles symbolic expressions into a callable function:\n", "\n", " theano.function(inputs, outputs)\n", "\n", "Here `inputs` is a list of the symbolic variables that serve as arguments, and `outputs` is either a single symbolic variable or a list of several symbolic variables.\n", "\n", "For example, define a function from the `foo` and `bar` created above:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "square = theano.function([foo], bar)" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "Call `square`:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9.0\n" ] } ], "source": [ "print square(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, use the `eval` method of `bar` to substitute a value for `x`. `eval` takes a dictionary that maps each symbolic variable to its value:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9.0\n" ] } ], "source": [ "print bar.eval({foo: 3})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## theano.tensor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides the scalar type `T.scalar()`, `Theano` provides many other symbolic variable types. They all live in the `tensor` submodule, which also provides many functions that operate on them.\n", "\n", "- `T.scalar(name=None, dtype=config.floatX)` \n", " - scalar, shape ()\n", "- `T.vector(name=None, dtype=config.floatX)` \n", " - vector, shape (?,)\n", "- `T.matrix(name=None, dtype=config.floatX)` \n", " - matrix, shape (?,?)\n", "- `T.row(name=None, dtype=config.floatX)` \n", " - row vector, shape (1,?)\n", "- `T.col(name=None, dtype=config.floatX)` \n", " - column vector, shape (?,1)\n", "- `T.tensor3(name=None, dtype=config.floatX)`\n", " - 3-D tensor, shape (?,?,?)\n", "- `T.tensor4(name=None, dtype=config.floatX)`\n", " - 4-D tensor, shape (?,?,?,?)\n", "\n", "Dimensions of size 1 in `shape` support broadcasting.\n", "\n", "Instead of passing the `dtype` explicitly (the default is `floatX`), you can prefix each constructor with a letter that selects the type:\n", "\n", "- `b` int8\n", "- `w` int16\n", "- `i` int32\n", "- `l` int64\n", "- `d` float64\n", "- `f` float32\n", "- `c` complex64\n", "- `z` complex128\n", "\n", "For example, `T.dvector()` creates a `float64` vector.\n", "\n", "You can also use the plural forms to define several symbolic variables at once:\n", "\n", " x,y,z = T.vectors('x','y','z')\n", " x,y,z = T.vectors(3)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "A = T.matrix('A')\n", "x = T.vector('x')\n", "b = T.vector('b')" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "`T.dot()` performs matrix multiplication (here a matrix-vector product):\n", "$$y = Ax+b$$" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y = T.dot(A, x) + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`T.sum()` computes a sum:\n", "$$z = \\sum_{i,j} A_{ij}^2$$" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "z = T.sum(A**2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a function that takes $A, x, b$ as arguments and returns $y$ and $z$:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "linear_mix = theano.function([A, x, b],\n", " [y, z])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Call the function with\n", "\n", "$$\n", "A = \\begin{bmatrix}\n", "1 & 2 & 3 \\\\\n", "4 & 5 & 6\n", "\\end{bmatrix}, \n", "x = \\begin{bmatrix}\n", "1 \\\\ 2 \\\\ 3\n", "\\end{bmatrix},\n", "b = \\begin{bmatrix}\n", "4 \\\\ 5\n", "\\end{bmatrix}\n", "$$" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[array([ 18., 37.], dtype=float32), array(91.0, dtype=float32)]\n" ] } ], "source": [ "print linear_mix(np.array([[1, 2, 3],\n", " [4, 5, 6]], dtype=theano.config.floatX), #A\n", " np.array([1, 2, 3], dtype=theano.config.floatX), #x\n", " np.array([4, 5], dtype=theano.config.floatX)) #b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, $Ax = (14, 32)^T$, so $y = Ax + b = (18, 37)^T$, and $z = 1 + 4 + 9 + 16 + 25 + 36 = 91$.\n", "\n", "Passing `dtype=theano.config.floatX` keeps the arrays consistent with the floating-point precision configured for `theano`. The default is `float64`, but `float32` is usually more efficient on a GPU (this notebook runs with `float32`, as the outputs show).\n", "\n", "As with ordinary Python functions, `theano` functions can have default argument values; this uses the `theano.Param` class:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "linear_mix_default = theano.function([A, x, theano.Param(b, default=np.zeros(2, dtype=theano.config.floatX))],\n", " [y, z])" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Compute the result with the default value of `b` (a zero vector), so that $y$ reduces to $Ax$:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[array([ 14., 32.], dtype=float32), array(91.0, dtype=float32)]\n" ] } ], "source": [ "print linear_mix_default(np.array([[1, 2, 3],\n", " [4, 5, 6]], dtype=theano.config.floatX), #A\n", " np.array([1, 2, 3], dtype=theano.config.floatX)) #x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Passing `b` explicitly reproduces the earlier result:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[array([ 18., 37.], dtype=float32), array(91.0, dtype=float32)]\n" ] } ], "source": [ "print linear_mix_default(np.array([[1, 2, 3],\n", " [4, 5, 6]], dtype=theano.config.floatX), #A\n", " np.array([1, 2, 3], dtype=theano.config.floatX), #x\n", " np.array([4, 5], dtype=theano.config.floatX)) #b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Shared variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Theano` lets you define shared variables, whose values can be shared among several functions. A shared variable is similar to a global variable in an ordinary Python function that has been declared `global`: functions can both read it and modify its value." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CudaNdarrayType(float32, matrix)\n" ] } ], "source": [ "shared_var = theano.shared(np.array([[1.0, 2.0], [3.0, 4.0]], dtype=theano.config.floatX))\n", "\n", "print shared_var.type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Its value can be changed with the `set_value` method:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "shared_var.set_value(np.array([[3.0, 4], [2, 1]], dtype=theano.config.floatX))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `get_value()` method returns its current value:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "[[ 3. 4.]\n", " [ 2. 1.]]\n" ] } ], "source": [ "print shared_var.get_value()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Shared variables can be used in expressions:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 9. 16.]\n", " [ 4. 1.]]\n" ] } ], "source": [ "shared_square = shared_var ** 2\n", "\n", "f = theano.function([], shared_square)\n", "\n", "print f()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function takes no arguments here, because the shared variable is treated as an implicit input.\n", "\n", "The result therefore follows the current value of the shared variable:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 4.]\n", " [ 9. 16.]]\n" ] } ], "source": [ "shared_var.set_value(np.array([[1.0, 2], [3, 4]], dtype=theano.config.floatX))\n", "\n", "print f()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `theano` function can also update the value of a shared variable through the `updates` keyword argument:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [], "source": [ "subtract = T.matrix('subtract')\n", "\n", "f_update = theano.function([subtract], shared_var, updates={shared_var: shared_var - subtract})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function first returns the current value and then replaces it with the old value minus the argument:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "before update:\n", "[[ 1. 2.]\n", " [ 3. 4.]]\n", "the return value:\n", "\n", "after update:\n", "[[ 0. 1.]\n", " [ 2. 
3.]]\n" ] } ], "source": [ "print 'before update:'\n", "print shared_var.get_value()\n", "\n", "print 'the return value:'\n", "print f_update(np.array([[1.0, 1], [1, 1]], dtype=theano.config.floatX))\n", "\n", "print 'after update:'\n", "print shared_var.get_value()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Derivatives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A major strength of `Theano` is its ability to compute derivatives of symbolic expressions.\n", "\n", "`T.grad()` computes derivatives. Earlier we defined `foo` and `bar` ($x$ and $x^2$, respectively); the derivative of `bar` with respect to `foo` should be $2x$:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "20.0\n" ] } ], "source": [ "bar_grad = T.grad(bar, foo) # derivative of bar (x^2) with respect to foo (x)\n", "\n", "print bar_grad.eval({foo: 10})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, the Jacobian of the earlier $y = Ax + b$ with respect to $x$ should be $A$:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 9. 8. 7.]\n", " [ 4. 5. 6.]]\n" ] } ], "source": [ "y_J = theano.gradient.jacobian(y, x)\n", "\n", "print y_J.eval({A: np.array([[9.0, 8, 7], [4, 5, 6]], dtype=theano.config.floatX), #A\n", " x: np.array([1.0, 2, 3], dtype=theano.config.floatX), #x\n", " b: np.array([4.0, 5], dtype=theano.config.floatX)}) #b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`theano.gradient.jacobian` computes Jacobian matrices, and `theano.gradient.hessian` computes Hessian matrices." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `R-op` and `L-op`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Rop` computes $\\frac{\\partial f}{\\partial x}v$ and `Lop` computes $v\\frac{\\partial f}{\\partial x}$.\n", "\n", "The first is the product of the Jacobian and a column vector; the second is the product of a row vector and the Jacobian. In the example below, $y = xW$, so the R-op of $y$ with respect to $W$ in the direction $V$ is $xV$; with $x = (0, 1)$ and every entry of $V$ equal to 2, this gives $(2, 2)$." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 2. 
2.]\n" ] } ], "source": [ "W = T.dmatrix('W')\n", "V = T.dmatrix('V')\n", "x = T.dvector('x')\n", "y = T.dot(x, W)\n", "JV = T.Rop(y, W, V)\n", "f = theano.function([W, V, x], JV)\n", "\n", "print f([[1, 1], [1, 1]], [[2, 2], [2, 2]], [0,1])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }