{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Stack Semantics in Trax: Ungraded Lab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this ungraded lab, we will explain the stack semantics in Trax. This will help in understanding how to use layers like `Select` and `Residual` which gets . If you've taken a computer science class before, you will recall that a stack is a data structure that follows the Last In, First Out (LIFO) principle. That is, whatever is the latest element that is pushed into the stack will also be the first one to be popped out. If you're not yet familiar with stacks, then you may find this [short tutorial](https://www.tutorialspoint.com/python_data_structure/python_stack.htm) useful. In a nutshell, all you really need to remember is it puts elements one on top of the other. You should be aware of what is on top of the stack to know which element you will be popping. You will see this in the discussions below. Let's get started!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:tensorflow:tokens_length=568 inputs_length=512 targets_length=114 noise_density=0.15 mean_noise_span_length=3.0 \n" ] } ], "source": [ "import numpy as np # regular ol' numpy\n", "from trax import layers as tl # core building block\n", "from trax import shapes # data signatures: dimensionality and type\n", "from trax import fastmath # uses jax, offers numpy on steroids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. The tl.Serial Combinator is Stack Oriented.\n", "\n", "To understand how stack-orientation works in [Trax](https://trax-ml.readthedocs.io/en/latest/), most times one will be using the `Serial` layer. We will define two simple [Function layers](https://trax-ml.readthedocs.io/en/latest/notebooks/layers_intro.html?highlight=fn#With-the-Fn-layer-creating-function.): 1) Addition and 2) Multiplication.\n", "\n", "Suppose we want to make the simple calculation (3 + 4) * 15 + 3. `Serial` will perform the calculations in the following manner `3` `4` `add` `15` `mul` `3` `add`. The steps of the calculation are shown in the table below. The first column shows the operations made on the stack and the second column the output of those operations. Moreover, the rightmost element in the second column represents the top of the stack (e.g. in the second row, `Push(3)` pushes `3 ` on top of the stack and `4` is now under it).\n", "\n", "
\n", "\n", "After processing all the stack contains 108 which is the answer to our simple computation.\n", "\n", "From this, the following can be concluded: a stack-based layer has only one way to handle data, by taking one piece of data from atop the stack, termed popping, and putting data back atop the stack, termed pushing. Any expression that can be written conventionally, can be written in this form and thus be amenable to being interpreted by a stack-oriented layer like `Serial`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Coding the example in the table:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Defining addition**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-- Properties --\n", "name : Addition\n", "expected inputs : 2\n", "promised outputs : 1 \n", "\n", "-- Inputs --\n", "x : [3] \n", "\n", "y : [4] \n", "\n", "-- Outputs --\n", "z : [7]\n" ] } ], "source": [ "def Addition():\n", " layer_name = \"Addition\" # don't forget to give your custom layer a name to identify\n", "\n", " # Custom function for the custom layer\n", " def func(x, y):\n", " return x + y\n", "\n", " return tl.Fn(layer_name, func)\n", "\n", "\n", "# Test it\n", "add = Addition()\n", "\n", "# Inspect properties\n", "print(\"-- Properties --\")\n", "print(\"name :\", add.name)\n", "print(\"expected inputs :\", add.n_in)\n", "print(\"promised outputs :\", add.n_out, \"\\n\")\n", "\n", "# Inputs\n", "x = np.array([3])\n", "y = np.array([4])\n", "print(\"-- Inputs --\")\n", "print(\"x :\", x, \"\\n\")\n", "print(\"y :\", y, \"\\n\")\n", "\n", "# Outputs\n", "z = add((x, y))\n", "print(\"-- Outputs --\")\n", "print(\"z :\", z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Defining multiplication**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-- Properties --\n", "name : Multiplication\n", "expected inputs : 2\n", "promised outputs : 1 \n", "\n", "-- Inputs --\n", "x : [7] \n", "\n", "y : [15] \n", "\n", "-- Outputs --\n", "z : [105]\n" ] } ], "source": [ "def Multiplication():\n", " layer_name = (\n", " \"Multiplication\" # don't forget to give your custom layer a name to identify\n", " )\n", "\n", " # Custom function for the custom layer\n", " def func(x, y):\n", " return x * y\n", "\n", " return tl.Fn(layer_name, func)\n", "\n", "\n", "# Test it\n", "mul = Multiplication()\n", "\n", "# Inspect properties\n", "print(\"-- Properties --\")\n", "print(\"name :\", mul.name)\n", "print(\"expected inputs :\", mul.n_in)\n", "print(\"promised outputs :\", mul.n_out, \"\\n\")\n", "\n", "# Inputs\n", "x = np.array([7])\n", "y = np.array([15])\n", "print(\"-- Inputs --\")\n", "print(\"x :\", x, \"\\n\")\n", "print(\"y :\", y, \"\\n\")\n", "\n", "# Outputs\n", "z = mul((x, y))\n", "print(\"-- Outputs --\")\n", "print(\"z :\", z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Implementing the computations using Serial combinator.**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-- Serial Model --\n", "Serial_in4[\n", " Addition_in2\n", " Multiplication_in2\n", " Addition_in2\n", "] \n", "\n", "-- Properties --\n", "name : Serial\n", "sublayers : [Addition_in2, Multiplication_in2, Addition_in2]\n", "expected inputs : 4\n", "promised outputs : 1 \n", "\n", "-- Inputs --\n", "x : (array([3]), array([4]), array([15]), array([3])) \n", "\n", "-- Outputs --\n", "y : [108]\n" ] } ], "source": [ "# Serial combinator\n", "serial = tl.Serial(\n", " Addition(), Multiplication(), Addition() # add 3 + 4 # multiply result by 15\n", ")\n", "\n", "# Initialization\n", "x = (np.array([3]), np.array([4]), np.array([15]), np.array([3])) # input\n", "\n", "serial.init(shapes.signature(x)) # initializing serial instance\n", "\n", "\n", "print(\"-- Serial Model --\")\n", "print(serial, \"\\n\")\n", "print(\"-- Properties --\")\n", "print(\"name :\", serial.name)\n", "print(\"sublayers :\", serial.sublayers)\n", "print(\"expected inputs :\", serial.n_in)\n", "print(\"promised outputs :\", serial.n_out, \"\\n\")\n", "\n", "# Inputs\n", "print(\"-- Inputs --\")\n", "print(\"x :\", x, \"\\n\")\n", "\n", "# Outputs\n", "y = serial(x)\n", "print(\"-- Outputs --\")\n", "print(\"y :\", y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The example with the two simple adition and multiplication functions that where coded together with the serial combinator show how stack semantics work in `Trax`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. The tl.Select combinator in the context of the Serial combinator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having understood how stack semantics work in `Trax`, we will demonstrate how the [tl.Select](https://trax-ml.readthedocs.io/en/latest/trax.layers.html?highlight=select#trax.layers.combinators.Select) combinator works." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### First example of tl.Select" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we want to make the simple calculation (3 + 4) * 3 + 4. We can use `Select` to perform the calculations in the following manner:\n", "\n", "1. `4`\n", "2. `3`\n", "3. `tl.Select([0,1,0,1])` \n", "4. `add` \n", "5. `mul` \n", "6. `add`. \n", "\n", "The `tl.Select` requires a list or tuple of 0-based indices to select elements relative to the top of the stack. For our example, the top of the stack is `3` (which is at index 0) then `4` (index 1) and we Select to add in an ordered manner to the top of the stack which after the command is `3` `4` `3` `4`. The steps of the calculation for our example are shown in the table below. As in the previous table each column shows the contents of the stack and the outputs after the operations are carried out.\n", "\n", "\n", "\n", "After processing all the inputs the stack contains 25 which is the answer we get above." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-- Serial Model --\n", "Serial_in2[\n", " Select[0,1,0,1]_in2_out4\n", " Addition_in2\n", " Multiplication_in2\n", " Addition_in2\n", "] \n", "\n", "-- Properties --\n", "name : Serial\n", "sublayers : [Select[0,1,0,1]_in2_out4, Addition_in2, Multiplication_in2, Addition_in2]\n", "expected inputs : 2\n", "promised outputs : 1 \n", "\n", "-- Inputs --\n", "x : (array([3]), array([4])) \n", "\n", "-- Outputs --\n", "y : [25]\n" ] } ], "source": [ "serial = tl.Serial(tl.Select([0, 1, 0, 1]), Addition(), Multiplication(), Addition())\n", "\n", "# Initialization\n", "x = (np.array([3]), np.array([4])) # input\n", "\n", "serial.init(shapes.signature(x)) # initializing serial instance\n", "\n", "\n", "print(\"-- Serial Model --\")\n", "print(serial, \"\\n\")\n", "print(\"-- Properties --\")\n", "print(\"name :\", serial.name)\n", "print(\"sublayers :\", serial.sublayers)\n", "print(\"expected inputs :\", serial.n_in)\n", "print(\"promised outputs :\", serial.n_out, \"\\n\")\n", "\n", "# Inputs\n", "print(\"-- Inputs --\")\n", "print(\"x :\", x, \"\\n\")\n", "\n", "# Outputs\n", "y = serial(x)\n", "print(\"-- Outputs --\")\n", "print(\"y :\", y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Second example of tl.Select" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we want to make the simple calculation (3 + 4) * 4. We can use `Select` to perform the calculations in the following manner:\n", "\n", "1. `4`\n", "2. `3`\n", "3. `tl.Select([0,1,0,1])` \n", "4. `add` \n", "5. `tl.Select([0], n_in=2)`\n", "6. `mul`\n", "\n", "The example is a bit contrived but it demonstrates the flexibility of the command. The second `tl.Select` pops two elements (specified in n_in) from the stack starting from index 0 (i.e. top of the stack). This means that `7` and `3 ` will be popped out because `n_in = 2`) but only `7` is placed back on top because it only selects `[0]`. As in the previous table each column shows the contents of the stack and the outputs after the operations are carried out.\n", "\n", "\n", "\n", "After processing all the inputs the stack contains 28 which is the answer we get above." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-- Serial Model --\n", "Serial_in2[\n", " Select[0,1,0,1]_in2_out4\n", " Addition_in2\n", " Select[0]_in2\n", " Multiplication_in2\n", "] \n", "\n", "-- Properties --\n", "name : Serial\n", "sublayers : [Select[0,1,0,1]_in2_out4, Addition_in2, Select[0]_in2, Multiplication_in2]\n", "expected inputs : 2\n", "promised outputs : 1 \n", "\n", "-- Inputs --\n", "x : (array([3]), array([4])) \n", "\n", "-- Outputs --\n", "y : [28]\n" ] } ], "source": [ "serial = tl.Serial(\n", " tl.Select([0, 1, 0, 1]), Addition(), tl.Select([0], n_in=2), Multiplication()\n", ")\n", "\n", "# Initialization\n", "x = (np.array([3]), np.array([4])) # input\n", "\n", "serial.init(shapes.signature(x)) # initializing serial instance\n", "\n", "\n", "print(\"-- Serial Model --\")\n", "print(serial, \"\\n\")\n", "print(\"-- Properties --\")\n", "print(\"name :\", serial.name)\n", "print(\"sublayers :\", serial.sublayers)\n", "print(\"expected inputs :\", serial.n_in)\n", "print(\"promised outputs :\", serial.n_out, \"\\n\")\n", "\n", "# Inputs\n", "print(\"-- Inputs --\")\n", "print(\"x :\", x, \"\\n\")\n", "\n", "# Outputs\n", "y = serial(x)\n", "print(\"-- Outputs --\")\n", "print(\"y :\", y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**In summary, what Select does in this example is a copy of the inputs in order to be used further along in the stack of operations.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. The tl.Residual combinator in the context of the Serial combinator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### tl.Residual" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Residual networks](https://arxiv.org/pdf/1512.03385.pdf) are frequently used to make deep models easier to train and you will be using it in the assignment as well. Trax already has a built in layer for this. The [Residual layer](https://trax-ml.readthedocs.io/en/latest/trax.layers.html?highlight=residual#trax.layers.combinators.Residual) computes the element-wise sum of the *stack-top* input with the output of the layer series. For example, if we wanted the cumulative sum of the folowing series of computations (3 + 4) * 3 + 4. The result can be obtained with the use of the `Residual` combinator in the following manner \n", "\n", "1. `4`\n", "2. `3`\n", "3. `tl.Select([0,1,0,1])` \n", "4. `add` \n", "5. `mul` \n", "6. `tl.Residual`. \n", "\n", "For our example the top of the stack is `3` `4` and we select to add the same to numbers in an ordered manner to the top of the stack which after the command is `3` `4` `3` `4`. The steps of the calculation for our example are shown in the table below together with the cumulative sum which is the result of `tl.Residual`.\n", "\n", "\n", "\n", "After processing all the inputs the stack contains 50 which is the cumulative sum of all the operations." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "lines_to_next_cell": 2, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-- Serial Model --\n", "Serial_in2[\n", " Select[0,1,0,1]_in2_out4\n", " Addition_in2\n", " Multiplication_in2\n", " Addition_in2\n", " Serial[\n", " Branch_out2[\n", " None\n", " Serial\n", " ]\n", " Add_in2\n", " ]\n", "] \n", "\n", "-- Properties --\n", "name : Serial\n", "sublayers : [Select[0,1,0,1]_in2_out4, Addition_in2, Multiplication_in2, Addition_in2, Serial[\n", " Branch_out2[\n", " None\n", " Serial\n", " ]\n", " Add_in2\n", "]]\n", "expected inputs : 2\n", "promised outputs : 1 \n", "\n", "-- Inputs --\n", "x : (array([3]), array([4])) \n", "\n", "-- Outputs --\n", "y : [50]\n" ] } ], "source": [ "serial = tl.Serial(\n", " tl.Select([0, 1, 0, 1]), Addition(), Multiplication(), Addition(), tl.Residual()\n", ")\n", "\n", "# Initialization\n", "x = (np.array([3]), np.array([4])) # input\n", "\n", "serial.init(shapes.signature(x)) # initializing serial instance\n", "\n", "\n", "print(\"-- Serial Model --\")\n", "print(serial, \"\\n\")\n", "print(\"-- Properties --\")\n", "print(\"name :\", serial.name)\n", "print(\"sublayers :\", serial.sublayers)\n", "print(\"expected inputs :\", serial.n_in)\n", "print(\"promised outputs :\", serial.n_out, \"\\n\")\n", "\n", "# Inputs\n", "print(\"-- Inputs --\")\n", "print(\"x :\", x, \"\\n\")\n", "\n", "# Outputs\n", "y = serial(x)\n", "print(\"-- Outputs --\")\n", "print(\"y :\", y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A slightly trickier example:\n", "\n", "Normally, the Residual layer will accept a layer as an argument and it will add the output of that layer to the current stack top input. In the example below, you'll notice that in the last step, we specify `tl.Residual(Addition())`. If you refer to the same figure above, you'll notice that the stack at that point has `21 4` where `21` is the top of the stack. The Residual layer remembers this value (i.e. `21`) so the result of the `Addition()` layer nested into it (i.e. `25`) is added to this stack top input to arrive at the result: `46`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "serial = tl.Serial(\n", " tl.Select([0, 1, 0, 1]), Addition(), Multiplication(), tl.Residual(Addition()) \n", ")\n", "\n", "# Initialization\n", "x = (np.array([3]), np.array([4])) # input\n", "\n", "serial.init(shapes.signature(x)) # initializing serial instance\n", "\n", "\n", "print(\"-- Serial Model --\")\n", "print(serial, \"\\n\")\n", "print(\"-- Properties --\")\n", "print(\"name :\", serial.name)\n", "print(\"sublayers :\", serial.sublayers)\n", "print(\"expected inputs :\", serial.n_in)\n", "print(\"promised outputs :\", serial.n_out, \"\\n\")\n", "\n", "# Inputs\n", "print(\"-- Inputs --\")\n", "print(\"x :\", x, \"\\n\")\n", "\n", "# Outputs\n", "y = serial(x)\n", "print(\"-- Outputs --\")\n", "print(\"y :\", y)" ] } ], "metadata": { "jupytext": { "encoding": "# -*- coding: utf-8 -*-", "formats": "ipynb,py:percent" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }