{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.4 Custom Layers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.4.1 Layers without Parameters" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from mxnet import gluon, nd\n", "from mxnet.gluon import nn\n", "\n", "class CenteredLayer(nn.Block):\n", "    def __init__(self, **kwargs):\n", "        super(CenteredLayer, self).__init__(**kwargs)\n", "\n", "    def forward(self, x):\n", "        return x - x.mean()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[-2. -1.  0.  1.  2.]\n", "<NDArray 5 @cpu(0)>" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "layer = CenteredLayer()\n", "X = nd.array([1, 2, 3, 4, 5])\n", "layer(X)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "net = nn.Sequential()\n", "net.add(\n", "    nn.Dense(128),\n", "    CenteredLayer()\n", ")\n", "net.initialize()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4, 128)\n", "\n", "[-1.2560122e-09]\n", "<NDArray 1 @cpu(0)>\n" ] } ], "source": [ "X = nd.random.uniform(shape=(4, 8))\n", "y = net(X)\n", "print(y.shape)\n", "print(y.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.4.2 Layers with Parameters\n", "- The `Parameter` class and the `ParameterDict` dictionary provide some basic housekeeping functionality.\n", "  - They govern access, initialization, sharing, saving, and loading of model parameters.\n", "- For instance, we can use the member variable `params` of the `ParameterDict` type that comes with the `Block` class.\n", "  - It is a dictionary that maps string-type parameter names to model parameters of the `Parameter` type.\n", "  - We can create a `Parameter` instance from a `ParameterDict` via the `get` function."
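, "\n", "- As a quick reference (a standard definition, not specific to Gluon): a fully connected (dense) layer with weight $\\mathbf{W}$, bias $\\mathbf{b}$, and ReLU activation computes\n", "\n", "$$\\mathbf{y} = \\max(\\mathbf{x}\\mathbf{W} + \\mathbf{b}, \\mathbf{0}),$$\n", "\n", "  i.e. exactly two parameters, a weight and a bias, which can be managed as `Parameter` instances."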
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "param_dict = gluon.ParameterDict()\n", "param = param_dict.get('param2', shape=(2, 3))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Parameter param2 (shape=(2, 3), dtype=<class 'numpy.float32'>)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "param" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(\n", "  Parameter param2 (shape=(2, 3), dtype=<class 'numpy.float32'>)\n", ")" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "param_dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Let's use this to implement our own version of the dense layer." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "class MyDense(nn.Block):\n", "    # units: the number of outputs in this layer; in_units: the number of inputs in this layer.\n", "    def __init__(self, units, in_units, **kwargs):\n", "        super(MyDense, self).__init__(**kwargs)\n", "        self.weight = self.params.get('weight', shape=(in_units, units))\n", "        self.bias = self.params.get('bias', shape=(units,))\n", "\n", "    def forward(self, x):\n", "        linear = nd.dot(x, self.weight.data()) + self.bias.data()\n", "        return nd.relu(linear)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mydense1_ (\n", "  Parameter mydense1_weight (shape=(5, 3), dtype=<class 'numpy.float32'>)\n", "  Parameter mydense1_bias (shape=(3,), dtype=<class 'numpy.float32'>)\n", ")" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dense = MyDense(units=3, in_units=5)\n", "dense.params" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[[0.00618806 0.06494527 0.12089312]\n", " [0.04054129 0.06180677 0.07008321]]\n", "<NDArray 2x3 @cpu(0)>" ] },
"execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dense.initialize()\n", "\n", "X = nd.random.uniform(shape=(2, 5))\n", "dense(X)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[[0.        ]\n", " [0.01760728]]\n", "<NDArray 2x1 @cpu(0)>" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "net = nn.Sequential()\n", "net.add(\n", "    MyDense(8, in_units=64),\n", "    MyDense(1, in_units=8)\n", ")\n", "net.initialize()\n", "\n", "X = nd.random.uniform(shape=(2, 64))\n", "net(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.5 File I/O\n", "- At some point, we want to save the results we obtained for later use and distribution.\n", "- Likewise, when running a long training process it is best practice to save intermediate results (checkpointing) to ensure that we don’t lose several days’ worth of computation.\n", "- At the same time, we might want to load a pretrained model.\n", "- For all of these cases we need to load and store both individual weight vectors and entire models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.5.1 NDArray" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "from mxnet import nd\n", "from mxnet.gluon import nn\n", "\n", "x = nd.arange(4)\n", "nd.save('x-file.dat', x)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\n", " [0. 1. 2. 3.]\n", " <NDArray 4 @cpu(0)>]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x2 = nd.load('x-file.dat')\n", "x2" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(\n", " [0. 1. 2. 3.]\n", " <NDArray 4 @cpu(0)>, \n", " [0. 0. 0. 0.]\n", " <NDArray 4 @cpu(0)>)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = nd.zeros(4)\n", "nd.save('x-files.dat', [x, y])\n", "x2, y2 = nd.load('x-files.dat')\n", "(x2, y2)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'x': \n", " [0. 1. 2. 3.]\n", " <NDArray 4 @cpu(0)>, 'y': \n", " [0. 0. 0. 0.]\n", " <NDArray 4 @cpu(0)>}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mydict = {'x': x, 'y': y}\n", "nd.save('mydict.dat', mydict)\n", "mydict2 = nd.load('mydict.dat')\n", "mydict2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.5.2 Gluon Model Parameters\n", "- Saving individual weight vectors (or other NDArray tensors) is useful, but it gets very tedious if we want to save (and later load) an entire model.\n", "- For this reason Gluon provides built-in functionality to load and save entire networks rather than just single weight vectors.\n", "  - This saves model parameters and not the entire model.\n", "  - That is, if we have a 3-layer MLP, we need to specify the architecture separately.\n", "- The result is that in order to reinstate a model we need to generate the architecture in code and then load the parameters from disk.\n", "  - Deferred initialization is quite advantageous here, since we can simply define a model without the need to put actual values in place."
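, "\n", "- A sketch of the checkpointing pattern mentioned above (the loop body, epoch count, and file name are illustrative, and `train_one_epoch` is a hypothetical helper, not part of the original example):\n", "```python\n", "for epoch in range(num_epochs):\n", "    train_one_epoch(net)  # hypothetical training step\n", "    if (epoch + 1) % 10 == 0:\n", "        # persist only the parameters; the architecture is re-created in code\n", "        net.save_parameters('mlp-epoch-%d.params' % (epoch + 1))\n", "```"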
] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "class MLP(nn.Block):\n", "    def __init__(self, **kwargs):\n", "        super(MLP, self).__init__(**kwargs)\n", "        self.hidden = nn.Dense(256, activation='relu')\n", "        self.output = nn.Dense(10)\n", "\n", "    def forward(self, x):\n", "        return self.output(self.hidden(x))\n", "\n", "net = MLP()\n", "net.initialize()\n", "x = nd.random.uniform(shape=(2, 20))\n", "y = net(x)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "net.save_parameters('mlp.params')" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "clone = MLP()\n", "clone.load_parameters('mlp.params')" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n", " [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]\n", "<NDArray 2x10 @cpu(0)>\n" ] } ], "source": [ "yclone = clone(x)\n", "print(yclone == y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.6 GPUs\n", "- If a CPU version of MXNet is already installed, we need to uninstall it first:\n", "  - `pip uninstall mxnet`\n", "- Then install the MXNet version corresponding to the installed CUDA version.\n", "  - Assuming you have CUDA 9.0 installed: `pip install mxnet-cu90`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.6.1 Computing Devices\n", "- `mx.cpu()` (or `mx.cpu(i)` for any integer $i$) means ***all physical CPUs and memory***.\n", "  - MXNet's calculations will try to use all CPU cores.\n", "- `mx.gpu()` only represents one graphics card and the corresponding graphics memory.\n", "  - If there are multiple GPUs, we use `mx.gpu(i)` to represent the $i$-th GPU ($i$ starts from 0).\n", "  - Also, `mx.gpu(0)` and `mx.gpu()` are equivalent."
] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(cpu(0), gpu(0), gpu(1))" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import mxnet as mx\n", "from mxnet import nd\n", "from mxnet.gluon import nn\n", "\n", "mx.cpu(), mx.gpu(), mx.gpu(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- By default, `NDArray` objects are created on the CPU.\n", "- Therefore, we will see the `@cpu(0)` identifier each time we print an `NDArray`." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[1. 2. 3.]\n", "<NDArray 3 @cpu(0)>" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = nd.array([1, 2, 3])\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- We can use the `context` property of an `NDArray` to view the device where it is located.\n", "- Whenever we want to operate on multiple terms, they need to be ***in the same context***.\n", "  - For instance, if we sum two variables, we need to make sure that both arguments are on the same device; otherwise MXNet would not know where to store the result or even how to decide where to perform the computation."
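, "\n", "  - As a sketch of the failure mode (assuming a GPU is available; the names `a_cpu` and `a_gpu` are illustrative, not from the original):\n", "\n", "```python\n", "a_cpu = nd.ones((2, 3))                 # created on cpu(0)\n", "a_gpu = nd.ones((2, 3), ctx=mx.gpu())   # created on gpu(0)\n", "# a_cpu + a_gpu  # would raise an error: the operands live in different contexts\n", "a_cpu.copyto(mx.gpu()) + a_gpu          # works: both operands now live on gpu(0)\n", "```"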
] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "cpu(0)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.context" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Storage on the GPU\n", "  - We can specify a storage device with the `ctx` parameter when creating an `NDArray`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = nd.ones((2, 3), ctx=mx.gpu())\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = nd.random.uniform(shape=(2, 3), ctx=mx.gpu(1))\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Copying\n", "  - If we want to compute $\\mathbf{x} + \\mathbf{y}$, we need to decide where to perform this operation.\n", "  - For instance, we can transfer $\\mathbf{x}$ to `gpu(1)` and perform the operation there.\n", "  - Do not simply add $\\mathbf{x} + \\mathbf{y}$ directly, since this will result in an exception.\n", "  - If the runtime engine cannot find data on the same device, it fails.\n", "  ![](https://github.com/d2l-ai/d2l-en/raw/master/img/copyto.svg?sanitize=true)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = x.copyto(mx.gpu(1))\n", "print(x)\n", "print(z)\n", "y + z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Imagine that your variable $z$ already lives on your second GPU (`gpu(1)`).\n", "- We want to make a copy only if the variables currently live in different contexts.\n", "  - In this case, we can call `as_in_context()`.\n", "  - If the variable is already in the specified context, then this is a no-op.\n", "  - In fact, unless you specifically want to make a copy, `as_in_context()` is the method of choice."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = x.as_in_context(mx.gpu(1))\n", "z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- If the contexts of the source variable and the target variable are consistent, then the `as_in_context` function does nothing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y.as_in_context(mx.gpu(1)) is y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The `copyto` function always creates new memory for the target variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y.copyto(mx.gpu()) is y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Watch out\n", "  - Transferring data between devices (CPU, GPUs, other machines) is much slower than computation.\n", "  - It also makes parallelization a lot more difficult, since we have to wait for data to be sent (or rather, received) before we can proceed with more operations.\n", "  - As a rule of thumb:\n", "    - 1) Many small operations are much worse than one big operation.\n", "    - 2) Several operations at a time are much better than many single operations interspersed in the code.\n", "      - Such operations can block if one device has to wait for the other before it can do something else.\n", "  - Lastly, when we print `NDArray` data or convert `NDArray`s to NumPy format, if the data is not in main memory, MXNet will copy it to main memory first, resulting in additional transmission overhead.\n", "  - Even worse, it is then subject to the dreaded ***Global Interpreter Lock***, which makes everything wait for Python to complete.\n", "  - Computing the loss for ***every*** minibatch on the GPU and reporting it back to the user on the command line (or logging it in a NumPy array) will ***trigger the global interpreter lock, which stalls all GPUs***.\n", "  - It is much better to allocate memory for logging inside the GPU and only move larger logs.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.6.3 Gluon and GPUs\n", "- A Gluon model can specify devices through the `ctx` parameter during initialization." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net = nn.Sequential()\n", "net.add(nn.Dense(1))\n", "net.initialize(ctx=mx.gpu())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net[0].weight.data()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }