{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.4 Custom Layers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.4.1 Layers without Parameters" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from mxnet import gluon, nd\n", "from mxnet.gluon import nn\n", "\n", "class CenteredLayer(nn.Block):\n", "    def __init__(self, **kwargs):\n", "        super(CenteredLayer, self).__init__(**kwargs)\n", "\n", "    def forward(self, x):\n", "        return x - x.mean()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[-2. -1.  0.  1.  2.]\n", "<NDArray 5 @cpu(0)>" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "layer = CenteredLayer()\n", "X = nd.array([1, 2, 3, 4, 5])\n", "layer(X)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "net = nn.Sequential()\n", "net.add(\n", "    nn.Dense(128),\n", "    CenteredLayer()\n", ")\n", "net.initialize()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4, 128)\n", "\n", "[-1.2560122e-09]\n", "<NDArray 1 @cpu(0)>\n" ] } ], "source": [ "X = nd.random.uniform(shape=(4, 8))\n", "y = net(X)\n", "print(y.shape)\n", "print(y.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.4.2 Layers with Parameters\n", "- The `Parameter` class and the `ParameterDict` dictionary provide some basic housekeeping functionality.\n", "  - They govern access, initialization, sharing, saving, and loading of model parameters.\n", "- For instance, we can use the member variable `params` of the `ParameterDict` type that comes with the `Block` class.\n", "  - It is a dictionary that maps string-type parameter names to model parameters of the `Parameter` type.\n", "  - We can create a `Parameter` instance from a `ParameterDict` via the `get` function."
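, "\n", "- As a quick reference (a standard definition, not specific to Gluon): a fully connected (dense) layer with weight $\\mathbf{W}$, bias $\\mathbf{b}$, and ReLU activation computes\n", "\n", "$$\\mathbf{y} = \\max(\\mathbf{x}\\mathbf{W} + \\mathbf{b}, \\mathbf{0}),$$\n", "\n", "  i.e. exactly two parameters, a weight and a bias, which can be managed as `Parameter` instances."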
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "param_dict = gluon.ParameterDict()\n", "param = param_dict.get('param2', shape=(2, 3))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Parameter param2 (shape=(2, 3), dtype=<class 'numpy.float32'>)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "param" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(\n", "  Parameter param2 (shape=(2, 3), dtype=<class 'numpy.float32'>)\n", ")" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "param_dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Let's use this to implement our own version of the dense layer." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "class MyDense(nn.Block):\n", "    # units: the number of outputs in this layer; in_units: the number of inputs in this layer.\n", "    def __init__(self, units, in_units, **kwargs):\n", "        super(MyDense, self).__init__(**kwargs)\n", "        self.weight = self.params.get('weight', shape=(in_units, units))\n", "        self.bias = self.params.get('bias', shape=(units,))\n", "\n", "    def forward(self, x):\n", "        linear = nd.dot(x, self.weight.data()) + self.bias.data()\n", "        return nd.relu(linear)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mydense1_ (\n", "  Parameter mydense1_weight (shape=(5, 3), dtype=<class 'numpy.float32'>)\n", "  Parameter mydense1_bias (shape=(3,), dtype=<class 'numpy.float32'>)\n", ")" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dense = MyDense(units=3, in_units=5)\n", "dense.params" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[[0.00618806 0.06494527 0.12089312]\n", " [0.04054129 0.06180677 0.07008321]]\n", "<NDArray 2x3 @cpu(0)>" ] },
"execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dense.initialize()\n", "\n", "X = nd.random.uniform(shape=(2, 5))\n", "dense(X)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[[0.        ]\n", " [0.01760728]]\n", "<NDArray 2x1 @cpu(0)>" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "net = nn.Sequential()\n", "net.add(\n", "    MyDense(8, in_units=64),\n", "    MyDense(1, in_units=8)\n", ")\n", "net.initialize()\n", "\n", "X = nd.random.uniform(shape=(2, 64))\n", "net(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.5 File I/O\n", "- At some point, we want to save the results we obtained for later use and distribution.\n", "- Likewise, when running a long training process it is best practice to save intermediate results (checkpointing) to ensure that we don’t lose several days’ worth of computation.\n", "- At the same time, we might want to load a pretrained model.\n", "- For all of these cases we need to load and store both individual weight vectors and entire models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.5.1 NDArray" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "from mxnet import nd\n", "from mxnet.gluon import nn\n", "\n", "x = nd.arange(4)\n", "nd.save('x-file.dat', x)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\n", " [0. 1. 2. 3.]\n", " <NDArray 4 @cpu(0)>]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x2 = nd.load('x-file.dat')\n", "x2" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(\n", " [0. 1. 2. 3.]\n", " <NDArray 4 @cpu(0)>, \n", " [0. 0. 0. 0.]\n", " <NDArray 4 @cpu(0)>)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = nd.zeros(4)\n", "nd.save('x-files.dat', [x, y])\n", "x2, y2 = nd.load('x-files.dat')\n", "(x2, y2)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'x': \n", " [0. 1. 2. 3.]\n", " <NDArray 4 @cpu(0)>, 'y': \n", " [0. 0. 0. 0.]\n", " <NDArray 4 @cpu(0)>}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mydict = {'x': x, 'y': y}\n", "nd.save('mydict.dat', mydict)\n", "mydict2 = nd.load('mydict.dat')\n", "mydict2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.5.2 Gluon Model Parameters\n", "- Saving individual weight vectors (or other NDArray tensors) is useful, but it gets very tedious if we want to save (and later load) an entire model.\n", "- For this reason Gluon provides built-in functionality to load and save entire networks rather than just single weight vectors.\n", "  - This saves model parameters and not the entire model.\n", "  - That is, if we have a 3-layer MLP, we need to specify the architecture separately.\n", "- The result is that in order to reinstate a model we need to generate the architecture in code and then load the parameters from disk.\n", "  - Deferred initialization is quite advantageous here, since we can simply define a model without the need to put actual values in place."
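, "\n", "- A sketch of the checkpointing pattern mentioned above (the loop body, epoch count, and file name are illustrative, and `train_one_epoch` is a hypothetical helper, not part of the original example):\n", "```python\n", "for epoch in range(num_epochs):\n", "    train_one_epoch(net)  # hypothetical training step\n", "    if (epoch + 1) % 10 == 0:\n", "        # persist only the parameters; the architecture is re-created in code\n", "        net.save_parameters('mlp-epoch-%d.params' % (epoch + 1))\n", "```"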
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "class MLP(nn.Block):\n", "    def __init__(self, **kwargs):\n", "        super(MLP, self).__init__(**kwargs)\n", "        self.hidden = nn.Dense(256, activation='relu')\n", "        self.output = nn.Dense(10)\n", "\n", "    def forward(self, x):\n", "        return self.output(self.hidden(x))\n", "\n", "net = MLP()\n", "net.initialize()\n", "x = nd.random.uniform(shape=(2, 20))\n", "y = net(x)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "net.save_parameters('mlp.params')" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "clone = MLP()\n", "clone.load_parameters('mlp.params')" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n", " [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]\n", "<NDArray 2x10 @cpu(0)>\n" ] } ], "source": [ "yclone = clone(x)\n", "print(yclone == y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.6 GPUs\n", "- If a CPU version of MXNet is already installed, we need to uninstall it first:\n", "  - `pip uninstall mxnet`\n", "- Then install the MXNet version corresponding to the installed CUDA version.\n", "  - Assuming you have CUDA 9.0 installed: `pip install mxnet-cu90`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.6.1 Computing Devices\n", "- `mx.cpu()` (or `mx.cpu(i)` for any integer $i$) means ***all physical CPUs and memory***.\n", "  - MXNet's calculations will try to use all CPU cores.\n", "- `mx.gpu()` only represents one graphics card and the corresponding graphics memory.\n", "  - If there are multiple GPUs, we use `mx.gpu(i)` to represent the $i$-th GPU ($i$ starts from 0).\n", "  - Also, `mx.gpu(0)` and `mx.gpu()` are equivalent."
] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(cpu(0), gpu(0), gpu(1))" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import mxnet as mx\n", "from mxnet import nd\n", "from mxnet.gluon import nn\n", "\n", "mx.cpu(), mx.gpu(), mx.gpu(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- By default, `NDArray` objects are created on the CPU.\n", "- Therefore, we will see the `@cpu(0)` identifier each time we print an `NDArray`." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[1. 2. 3.]\n", "<NDArray 3 @cpu(0)>" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = nd.array([1, 2, 3])\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- We can use the `context` property of an `NDArray` to view the device where it is located.\n", "- Whenever we want to operate on multiple terms, they need to be ***in the same context***.\n", "  - For instance, if we sum two variables, we need to make sure that both arguments are on the same device; otherwise MXNet would not know where to store the result or even how to decide where to perform the computation."
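, "\n", "  - As a sketch of the failure mode (assuming a GPU is available; the names `a_cpu` and `a_gpu` are illustrative, not from the original):\n", "\n", "```python\n", "a_cpu = nd.ones((2, 3))                 # created on cpu(0)\n", "a_gpu = nd.ones((2, 3), ctx=mx.gpu())   # created on gpu(0)\n", "# a_cpu + a_gpu  # would raise an error: the operands live in different contexts\n", "a_cpu.copyto(mx.gpu()) + a_gpu          # works: both operands now live on gpu(0)\n", "```"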
] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "cpu(0)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.context" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Storage on the GPU\n", "  - We can specify a storage device with the `ctx` parameter when creating an `NDArray`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = nd.ones((2, 3), ctx=mx.gpu())\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = nd.random.uniform(shape=(2, 3), ctx=mx.gpu(1))\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Copying\n", "  - If we want to compute $\\mathbf{x} + \\mathbf{y}$, we need to decide where to perform this operation.\n", "  - For instance, we can transfer $\\mathbf{x}$ to `gpu(1)` and perform the operation there.\n", "  - Do not simply add $\\mathbf{x} + \\mathbf{y}$ directly, since this will result in an exception.\n", "  - If the runtime engine cannot find data on the same device, it fails.\n", "  ![](https://github.com/d2l-ai/d2l-en/raw/master/img/copyto.svg?sanitize=true)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = x.copyto(mx.gpu(1))\n", "print(x)\n", "print(z)\n", "y + z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Imagine that your variable $z$ already lives on your second GPU (`gpu(1)`).\n", "- We want to make a copy only if the variables currently live in different contexts.\n", "  - In this case, we can call `as_in_context()`.\n", "  - If the variable is already in the specified context, then this is a no-op.\n", "  - In fact, unless you specifically want to make a copy, `as_in_context()` is the method of choice."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = x.as_in_context(mx.gpu(1))\n", "z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- If the contexts of the source variable and the target variable are consistent, then the `as_in_context` function does nothing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y.as_in_context(mx.gpu(1)) is y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The `copyto` function always creates new memory for the target variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y.copyto(mx.gpu()) is y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Watch out\n", "  - Transferring data between devices (CPU, GPUs, other machines) is much slower than computation.\n", "  - It also makes parallelization a lot more difficult, since we have to wait for data to be sent (or rather, received) before we can proceed with more operations.\n", "  - As a rule of thumb:\n", "    - 1) Many small operations are much worse than one big operation.\n", "    - 2) Several operations at a time are much better than many single operations interspersed in the code.\n", "      - Such operations can block if one device has to wait for the other before it can do something else.\n", "  - Lastly, when we print `NDArray` data or convert `NDArray`s to NumPy format, if the data is not in main memory, MXNet will copy it to main memory first, resulting in additional transmission overhead.\n", "  - Even worse, it is then subject to the dreaded ***Global Interpreter Lock***, which makes everything wait for Python to complete.\n", "  - Computing the loss for ***every*** minibatch on the GPU and reporting it back to the user on the command line (or logging it in a NumPy array) will ***trigger the global interpreter lock, which stalls all GPUs***.\n", "  - It is much better to allocate memory for logging inside the GPU and only move larger logs.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.6.3 Gluon and GPUs\n", "- A Gluon model can specify devices through the `ctx` parameter during initialization." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net = nn.Sequential()\n", "net.add(nn.Dense(1))\n", "net.initialize(ctx=mx.gpu())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net[0].weight.data()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }