{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Custom Layers in Tensorflow 2\n", "\n", "> Custom layers give you the flexibility to implement models that use non-standard layers. In this post, we will practice building on existing standard layers to create custom layers for your models. This is the summary of lecture \"Custom Models, Layers and Loss functions with Tensorflow\" from DeepLearning.AI.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Coursera, Tensorflow, DeepLearning.AI]\n", "- image: images/model_simpleQuadratic.png" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf\n", "from tensorflow.keras.utils import plot_model\n", "from tensorflow.keras import backend as K\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1 - Lambda Layer\n", "\n", "This section shows how to define custom layers with the [Lambda](https://keras.io/api/layers/core_layers/lambda/) layer. You can either use [lambda functions](https://www.w3schools.com/python/python_lambda.asp) within the Lambda layer or define a custom function that the Lambda layer will call." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare the data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "mnist = tf.keras.datasets.mnist\n", "\n", "(X_train, y_train), (X_test, y_test) = mnist.load_data()\n", "X_train, X_test = X_train / 255.0, X_test / 255.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Build the model\n", "\n", "Here, we'll use a Lambda layer to define a custom layer in our network. 
We're using a lambda function to get the absolute value of the layer input." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "model = tf.keras.Sequential([\n", " tf.keras.layers.Flatten(input_shape=(28, 28)),\n", " tf.keras.layers.Dense(128),\n", " tf.keras.layers.Lambda(lambda x: tf.abs(x)),\n", " tf.keras.layers.Dense(10, activation='softmax')\n", "])\n", "\n", "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_model(model, show_layer_names=True, show_shapes=True, show_dtype=True, to_file='./image/lambda_model.png')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n", "1875/1875 [==============================] - 2s 776us/step - loss: 0.4046 - accuracy: 0.8879\n", "Epoch 2/5\n", "1875/1875 [==============================] - 1s 744us/step - loss: 0.0971 - accuracy: 0.9718\n", "Epoch 3/5\n", "1875/1875 [==============================] - 1s 733us/step - loss: 0.0653 - accuracy: 0.9805\n", "Epoch 4/5\n", "1875/1875 [==============================] - 1s 735us/step - loss: 0.0480 - accuracy: 0.9848\n", "Epoch 5/5\n", "1875/1875 [==============================] - 1s 726us/step - loss: 0.0401 - accuracy: 0.9873\n", "313/313 [==============================] - 0s 710us/step - loss: 0.0847 - accuracy: 0.9764\n" ] }, { "data": { "text/plain": [ "[0.08468984067440033, 0.9764000177383423]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.fit(X_train, y_train, epochs=5)\n", "model.evaluate(X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to use the Lambda layer is to pass in a 
function defined outside the model. The code below shows how a custom ReLU-style function, which here clips its input from below at -0.1 rather than at 0, is used as a custom layer in the model." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def my_relu(x):\n", "    return K.maximum(-0.1, x)\n", "\n", "model = tf.keras.models.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=(28, 28)),\n", "    tf.keras.layers.Dense(128),\n", "    tf.keras.layers.Lambda(my_relu), \n", "    tf.keras.layers.Dense(10, activation='softmax')\n", "])\n", "\n", "model.compile(optimizer='adam',\n", "              loss='sparse_categorical_crossentropy',\n", "              metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n", "1875/1875 [==============================] - 2s 750us/step - loss: 0.4280 - accuracy: 0.8812\n", "Epoch 2/5\n", "1875/1875 [==============================] - 1s 759us/step - loss: 0.1238 - accuracy: 0.9639\n", "Epoch 3/5\n", "1875/1875 [==============================] - 1s 789us/step - loss: 0.0800 - accuracy: 0.9759\n", "Epoch 4/5\n", "1875/1875 [==============================] - 1s 755us/step - loss: 0.0557 - accuracy: 0.9830\n", "Epoch 5/5\n", "1875/1875 [==============================] - 1s 760us/step - loss: 0.0420 - accuracy: 0.9879\n", "313/313 [==============================] - 0s 706us/step - loss: 0.0743 - accuracy: 0.9775\n" ] }, { "data": { "text/plain": [ "[0.07428351789712906, 0.9775000214576721]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.fit(X_train, y_train, epochs=5)\n", "model.evaluate(X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2 - Building a Custom Dense Layer\n", "\n", "In this section, we'll walk through how to create a custom layer that inherits from the [Layer](https://keras.io/api/layers/base_layer/#layer-class) class. 
Unlike the simple Lambda layers used previously, the custom layer here will contain weights that can be updated during training." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare the Data" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# define the dataset\n", "xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)\n", "ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Custom Layer with weights\n", "\n", "To make a custom layer that is trainable, we need to define a class that inherits from the [Layer](https://keras.io/api/layers/base_layer/#layer-class) base class in Keras. The Python syntax is shown below in the class declaration. This class defines three methods: `__init__()`, `build()` and `call()`. These ensure that our custom layer has a *state* and *computation* that can be accessed during training or inference." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.layers import Layer\n", "\n", "class SimpleDense(Layer):\n", "    def __init__(self, units=32):\n", "        '''\n", "        Initialize the instance attributes\n", "        '''\n", "        super(SimpleDense, self).__init__()\n", "        self.units = units\n", "\n", "    def build(self, input_shape):\n", "        '''\n", "        Create the state of the layer (weights)\n", "        '''\n", "        w_init = tf.random_normal_initializer()\n", "        self.w = tf.Variable(name='kernel',\n", "                             initial_value=w_init(shape=(input_shape[-1], self.units), dtype='float32'),\n", "                             trainable=True)\n", "\n", "        # initialize bias\n", "        b_init = tf.zeros_initializer()\n", "        self.b = tf.Variable(name='bias',\n", "                             initial_value=b_init(shape=(self.units,), dtype='float32'),\n", "                             trainable=True)\n", "\n", "    def call(self, inputs):\n", "        '''\n", "        Defines the computation from inputs to outputs\n", "        '''\n", "        return tf.matmul(inputs, self.w) + self.b" ] }, { "cell_type": 
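"markdown", "metadata": {}, "source": [ "As a quick sanity check, the sketch below (not part of the original lecture) illustrates that `SimpleDense` creates its weights only when the layer first sees an input, because that is when Keras calls `build()`. The batch of shape `(2, 3)` and `units=4` are arbitrary values assumed for illustration; with them, the kernel should have shape `(3, 4)`, the bias `(4,)` and the output `(2, 4)`." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sketch: relies on tf and the SimpleDense class defined in the cells above\n", "layer = SimpleDense(units=4)\n", "\n", "# before the first call, build() has not run, so no kernel exists yet\n", "print(hasattr(layer, 'w'))\n", "\n", "# hypothetical batch: 2 samples with 3 features each\n", "out = layer(tf.ones((2, 3)))\n", "\n", "# kernel is (input features, units), bias is (units,)\n", "print(layer.w.shape, layer.b.shape, out.shape)" ] },
{ "cell_type": 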
"markdown", "metadata": {}, "source": [ "Now we can use our custom layer as shown below:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# declare an instance of the class\n", "my_dense = SimpleDense(units=1)\n", "\n", "# define an input and feed into the layer\n", "x = tf.ones((1, 1))\n", "y = my_dense(x)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[,\n", " ]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# attributes inherited from the base Layer class, such as 'variables', are available\n", "my_dense.variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's then try using it in a simple network:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use the Sequential API to build a model with our custom layer\n", "my_layer = SimpleDense(units=1)\n", "model = tf.keras.Sequential([my_layer])\n", "\n", "# configure and train the model\n", "model.compile(optimizer='sgd', loss='mean_squared_error')\n", "model.fit(xs, ys, epochs=500, verbose=0)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[18.981411]], dtype=float32)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# perform inference\n", "model.predict([10.0])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[,\n", " ]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_layer.variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Activation in a custom layer\n", "\n", "In this section, we extend our knowledge of building custom layers by adding an activation parameter. 
The implementation is pretty straightforward as you'll see below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare the Data\n", "\n", "We'll use the MNIST dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding an activation layer\n", "\n", "To use the built-in activations in Keras, we can specify an `activation` parameter in the `__init__()` method of our custom layer class. From there, we can initialize it by using the `tf.keras.activations.get()` function. This takes in a string identifier that corresponds to one of the [available activations](https://keras.io/api/layers/activations/#available-activations) in Keras. Then, in the `call()` method, pass the result of the forward computation through this activation." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "class SimpleDense(Layer):\n", "    # add an activation parameter\n", "    def __init__(self, units=32, activation=None):\n", "        super(SimpleDense, self).__init__()\n", "        self.units = units\n", "\n", "        # look up the built-in Keras activation function by its string identifier\n", "        self.activation = tf.keras.activations.get(activation)\n", "\n", "    def build(self, input_shape):\n", "        # initialize the weight\n", "        w_init = tf.random_normal_initializer()\n", "        self.w = tf.Variable(name='kernel',\n", "                             initial_value=w_init(shape=(input_shape[-1], self.units)),\n", "                             trainable=True)\n", "\n", "        # initialize the bias\n", "        b_init = tf.zeros_initializer()\n", "        self.b = tf.Variable(name='bias',\n", "                             initial_value=b_init(shape=(self.units, )),\n", "                             trainable=True)\n", "\n", "    def call(self, inputs):\n", "        # pass the computation to the activation function\n", "        return self.activation(tf.matmul(inputs, self.w) + self.b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now pass in an activation parameter to our custom layer. 
The string identifier is mostly the same as the function name so 'relu' below will get `tf.keras.activations.relu`." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "model = tf.keras.Sequential([\n", " tf.keras.layers.Flatten(input_shape=(28, 28)),\n", " SimpleDense(128, activation='relu'),\n", " tf.keras.layers.Dropout(0.2),\n", " tf.keras.layers.Dense(10, activation='softmax')\n", "])\n", "\n", "model.compile(optimizer='adam', \n", " loss='sparse_categorical_crossentropy',\n", " metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_model(model, show_shapes=True, show_layer_names=True, to_file='./image/model_simpleDense.png')" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n", "1875/1875 [==============================] - 2s 753us/step - loss: 0.4704 - accuracy: 0.8634\n", "Epoch 2/5\n", "1875/1875 [==============================] - 1s 743us/step - loss: 0.1502 - accuracy: 0.9564\n", "Epoch 3/5\n", "1875/1875 [==============================] - 1s 759us/step - loss: 0.1107 - accuracy: 0.9669\n", "Epoch 4/5\n", "1875/1875 [==============================] - 1s 719us/step - loss: 0.0898 - accuracy: 0.9720\n", "Epoch 5/5\n", "1875/1875 [==============================] - 1s 762us/step - loss: 0.0721 - accuracy: 0.9773\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.fit(X_train, y_train, epochs=5)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "313/313 [==============================] - 0s 742us/step - loss: 0.0743 - accuracy: 0.9752\n" ] }, { "data": { 
"text/plain": [ "[0.07433376461267471, 0.9751999974250793]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.evaluate(X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Application - Implement a Quadratic Layer\n", "\n", "In this section, you will build a custom quadratic layer which computes $y = ax^2 + bx + c$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define the quadratic layer \n", "Implement a simple quadratic layer. It has 3 state variables: $a$, $b$ and $c$. The computation returned is $ax^2 + bx + c$. Make sure it can also accept an activation function.\n", "\n", "#### `__init__`\n", "- call `super(my_fun, self)` to access the base class of `my_fun`, and call the `__init__()` function to initialize that base class. In this case, `my_fun` is `SimpleQuadratic` and its base class is `Layer`.\n", "- self.units: set this using one of the function parameters.\n", "- self.activation: The function parameter `activation` will be passed in as a string. To get the tensorflow object associated with the string, please use `tf.keras.activations.get()` \n", "\n", "\n", "#### `build`\n", "The following are suggested steps for writing your code. If you prefer to use fewer lines to implement it, feel free to do so. Either way, you'll want to set `self.a`, `self.b` and `self.c`.\n", "\n", "- a_init: set this to tensorflow's `random_normal_initializer()`\n", "- a_init_val: Use the `random_normal_initializer()` that you just created and invoke it, setting the `shape` and `dtype`.\n", " - The `shape` of `a` should have its row dimension equal to the last dimension of `input_shape`, and its column dimension equal to the number of units in the layer. 
\n", " - This is because you'll be matrix multiplying x^2 * a, so the dimensions should be compatible.\n", " - set the dtype to 'float32'\n", "- self.a: create a tensor using tf.Variable, setting the initial_value and set trainable to True.\n", "\n", "- b_init, b_init_val, and self.b: these will be set in the same way that you implemented a_init, a_init_val and self.a\n", "- c_init: set this to `tf.zeros_initializer`.\n", "- c_init_val: Set this by calling the tf.zeros_initializer that you just instantiated, and set the `shape` and `dtype`\n", " - shape: This will be a vector equal to the number of units. This expects a tuple, and remember that a tuple `(9,)` includes a comma.\n", " - dtype: set to 'float32'.\n", "- self.c: create a tensor using tf.Variable, and set the parameters `initial_value` and `trainable`.\n", "\n", "#### `call`\n", "The following section performs the multiplication x^2*a + x*b + c. The steps are broken down for clarity, but you can also perform this calculation in fewer lines if you prefer.\n", "- x_squared: use tf.math.square()\n", "- x_squared_times_a: use tf.matmul(). 
\n", " - If you see an error saying `InvalidArgumentError: Matrix size-incompatible`, please check the order of the matrix multiplication to make sure that the matrix dimensions line up.\n", "- x_times_b: use tf.matmul().\n", "- x2a_plus_xb_plus_c: add the three terms together.\n", "- activated_x2a_plus_xb_plus_c: apply the class's `activation` to the sum of the three terms.\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "class SimpleQuadratic(Layer):\n", "\n", "    def __init__(self, units=32, activation=None):\n", "        '''Initializes the class and sets up the internal variables'''\n", "        super(SimpleQuadratic, self).__init__()\n", "        self.units = units\n", "        self.activation = tf.keras.activations.get(activation)\n", "\n", "    def build(self, input_shape):\n", "        '''Create the state of the layer (weights)'''\n", "        a_init = tf.random_normal_initializer()\n", "        a_init_val = a_init(shape=(input_shape[-1], self.units),\n", "                            dtype='float32')\n", "        self.a = tf.Variable(name='a',\n", "                             initial_value=a_init_val,\n", "                             trainable=True)\n", "\n", "        b_init = tf.random_normal_initializer()\n", "        b_init_val = b_init(shape=(input_shape[-1], self.units),\n", "                            dtype='float32')\n", "        self.b = tf.Variable(name='b',\n", "                             initial_value=b_init_val,\n", "                             trainable=True)\n", "\n", "        c_init = tf.zeros_initializer()\n", "        c_init_val = c_init(shape=(self.units, ),\n", "                            dtype='float32')\n", "        self.c = tf.Variable(name='c',\n", "                             initial_value=c_init_val,\n", "                             trainable=True)\n", "        super().build(input_shape)\n", "\n", "\n", "    def call(self, inputs):\n", "        '''Defines the computation from inputs to outputs'''\n", "        x_squared = tf.math.square(inputs)\n", "        x_squared_times_a = tf.matmul(x_squared, self.a)\n", "        x_times_b = tf.matmul(inputs, self.b)\n", "        x2a_plus_xb_plus_c = x_squared_times_a + x_times_b + self.c\n", "        activated_x2a_plus_xb_plus_c = self.activation(x2a_plus_xb_plus_c)\n", "        return activated_x2a_plus_xb_plus_c" ] }, { "cell_type": 
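"markdown", "metadata": {}, "source": [ "Before training, it can help to check the layer against the formula by hand. The sketch below is an illustration rather than part of the original exercise: it builds a one-unit `SimpleQuadratic` on a single scalar input and overwrites the randomly initialized weights with the assumed values $a = 2$, $b = 3$, $c = 1$. For $x = 2$ the formula gives $2 \\cdot 2^2 + 3 \\cdot 2 + 1 = 15$, which is what the layer should return when no activation is set." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sketch: relies on tf and the SimpleQuadratic class defined in the cell above\n", "quad = SimpleQuadratic(units=1)  # activation=None behaves as the identity\n", "x = tf.constant([[2.0]])\n", "\n", "# the first call triggers build(), which creates a, b and c\n", "quad(x)\n", "\n", "# overwrite the weights with hand-picked values (assumed for illustration)\n", "quad.a.assign([[2.0]])\n", "quad.b.assign([[3.0]])\n", "quad.c.assign([1.0])\n", "\n", "# a*x^2 + b*x + c = 2*4 + 3*2 + 1 = 15\n", "result = quad(x).numpy()\n", "print(result)" ] },
{ "cell_type": 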
"markdown", "metadata": {}, "source": [ "Train your model with the `SimpleQuadratic` layer that you just implemented." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "model = tf.keras.Sequential([\n", " tf.keras.layers.Flatten(input_shape=(28, 28)),\n", " SimpleQuadratic(128, activation='relu'),\n", " tf.keras.layers.Dropout(0.2),\n", " tf.keras.layers.Dense(10, activation='softmax')\n", "])\n", "\n", "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot_model(model, show_shapes=True, show_layer_names=True, to_file='./image/model_simpleQuadratic.png')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n", "1875/1875 [==============================] - 2s 972us/step - loss: 0.4291 - accuracy: 0.8678\n", "Epoch 2/5\n", "1875/1875 [==============================] - 2s 1ms/step - loss: 0.1362 - accuracy: 0.9589\n", "Epoch 3/5\n", "1875/1875 [==============================] - 2s 883us/step - loss: 0.1015 - accuracy: 0.9683\n", "Epoch 4/5\n", "1875/1875 [==============================] - 1s 765us/step - loss: 0.0800 - accuracy: 0.9743\n", "Epoch 5/5\n", "1875/1875 [==============================] - 1s 755us/step - loss: 0.0702 - accuracy: 0.9772\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.fit(X_train, y_train, epochs=5)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "313/313 [==============================] - 0s 700us/step - loss: 0.0779 - accuracy: 0.9775\n" ] }, { "data": { 
"text/plain": [ "[0.07792823016643524, 0.9775000214576721]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.evaluate(X_test, y_test)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }