{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Guide to Tensorflow Keras on TPUs using MNIST.ipynb", "version": "0.3.2", "provenance": [], "collapsed_sections": [], "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "accelerator": "TPU" }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "[View in Colaboratory](https://colab.research.google.com/github/cedrickchee/data-science-notebooks/blob/master/notebooks/tensorflow/google_cloud_tpu/guide_to_tensorflow_keras_on_tpu_mnist.ipynb)" ] }, { "metadata": { "id": "ON-UmcT_ZFqw", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Guide to TensorFlow + Keras on TPUs for free on Google Colab\n", "\n", "Here is a very quick implementation and walkthrough showing how to use TPUs with Keras in Colab.\n", "\n", "If you have any questions or suggestions for improving it, please let me know." ] }, { "metadata": { "id": "dPamRBokUZEq", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "import numpy as np\n", "\n", "import tensorflow as tf\n", "import time\n", "import os\n", "\n", "import tensorflow.keras\n", "from tensorflow.keras.datasets import mnist, fashion_mnist\n", "from tensorflow.keras.models import Sequential, Model\n", "from tensorflow.keras.layers import Dense, Dropout, Flatten,Input\n", "from tensorflow.keras.layers import Conv2D, MaxPooling2D\n", "from tensorflow.keras import backend as K\n" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "95qn1rJyHFz5", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 53 }, "outputId": "1fd8e49e-d6de-40f8-e27b-885bb06cad94" }, "cell_type": "code", "source": [ "print(tf.__version__)\n", "print(tf.keras.__version__)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "1.11.0-rc2\n", "2.1.6-tf\n" ], "name": "stdout" } ] }, { "metadata": { "id": 
"Zq4l11_Dtx8b", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Check for TPU\n", "\n", "First, test if you have TPU set up.\n", "\n", "Run the Cell below.\n", "\n", "If no TPU is found, press \"Runtime\" (in the menu at the top) and choose \"Change Runtime Type\" to TPU.\n", "\n", "The `TPU_ADDRESS` variable will be needed to pass into the distribution strategy." ] }, { "metadata": { "id": "dwqHrONrZtng", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "outputId": "6ae3bbd0-1623-4873-f616-55bcb8ab0f88" }, "cell_type": "code", "source": [ "try:\n", " device_name = os.environ['COLAB_TPU_ADDR']\n", " TPU_ADDRESS = 'grpc://' + device_name\n", " print('Found TPU at: {}'.format(TPU_ADDRESS))\n", "\n", "except KeyError:\n", " print('TPU not found')" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Found TPU at: grpc://10.114.111.10:8470\n" ], "name": "stdout" } ] }, { "metadata": { "id": "I6mOaxj3k30j", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Normal MNIST Stuff" ] }, { "metadata": { "id": "181VT0eOUkL3", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "batch_size = 1024\n", "num_classes = 10\n", "epochs = 5\n", "learning_rate = 0.001\n", "\n", "# input image dimensions\n", "img_rows, img_cols = 28, 28" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "LHmvV1heVGDi", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 53 }, "outputId": "17240d89-63e9-4818-8f26-b869632d9c28" }, "cell_type": "code", "source": [ "# the data, shuffled and split between train and test sets\n", "(x_train, y_train), (x_test, y_test) = mnist.load_data()" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz\n", "11493376/11490434 [==============================] - 0s 0us/step\n" ], "name": "stdout" } ] }, { 
"metadata": { "id": "1tidRmu9VM4E", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)\n", "x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)\n", "input_shape = (img_rows, img_cols, 1)" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "vfinmHzDX6SH", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 71 }, "outputId": "5bd2c783-5964-4e40-d89c-3364c7544ca1" }, "cell_type": "code", "source": [ "x_train = x_train.astype('float32')\n", "x_test = x_test.astype('float32')\n", "x_train /= 255\n", "x_test /= 255\n", "print('x_train shape:', x_train.shape)\n", "print(x_train.shape[0], 'train samples')\n", "print(x_test.shape[0], 'test samples')" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "x_train shape: (60000, 28, 28, 1)\n", "60000 train samples\n", "10000 test samples\n" ], "name": "stdout" } ] }, { "metadata": { "id": "HpOYyqEnX-G1", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "# convert class vectors to binary class matrices\n", "y_train = tf.keras.utils.to_categorical(y_train, num_classes)\n", "y_test = tf.keras.utils.to_categorical(y_test, num_classes)" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "O_K39jsGzJL-", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Use `tf.data`\n", "\n", "You need to make sure you have `drop_remainder = True` as TPUs need to have a fixed shape." 
] }, { "metadata": { "id": "abbwQQfH0td3", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "def train_input_fn(batch_size=1024):\n", " # convert the inputs to a Dataset.\n", " dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train))\n", "\n", " # shuffle, repeat, and batch the examples.\n", " dataset = dataset.shuffle(1000).repeat().batch(batch_size, drop_remainder=True)\n", "\n", " # return the dataset.\n", " return dataset" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "QVu91avJzMAO", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "def test_input_fn(batch_size=1024):\n", " # convert the inputs to a Dataset.\n", " dataset = tf.data.Dataset.from_tensor_slices((x_test,y_test))\n", "\n", " # shuffle, repeat, and batch the examples.\n", " dataset = dataset.shuffle(1000).repeat().batch(batch_size, drop_remainder=True)\n", "\n", " # return the dataset.\n", " return dataset" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "G_spUwX0VYGt", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Create the model\n", "\n", "You must pass in an input shape and batch size as TPUs (and XLA) require fixed shapes.\n", "\n", "The rest of the model is just a simple CNN." 
] }, { "metadata": { "id": "qHzyhDMhVXHy", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "Inp = tf.keras.Input(\n", " name='input', shape=input_shape, batch_size=batch_size, dtype=tf.float32)\n", "\n", "x = Conv2D(32, kernel_size=(3, 3), activation='relu',name = 'Conv_01')(Inp)\n", "x = MaxPooling2D(pool_size=(2, 2),name = 'MaxPool_01')(x)\n", "x = Conv2D(64, (3, 3), activation='relu',name = 'Conv_02')(x)\n", "x = MaxPooling2D(pool_size=(2, 2),name = 'MaxPool_02')(x)\n", "x = Conv2D(64, (3, 3), activation='relu',name = 'Conv_03')(x)\n", "x = Flatten(name = 'Flatten_01')(x)\n", "x = Dense(64, activation='relu',name = 'Dense_01')(x)\n", "x = Dropout(0.5,name = 'Dropout_02')(x)\n", "\n", "output = Dense(num_classes, activation='softmax',name = 'Dense_02')(x)" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "xj-jMmGnuKX0", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "model = tf.keras.Model(inputs=[Inp], outputs=[output])" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "D00NKseRuOR3", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "# use a tf optimizer rather than a Keras one for now\n", "opt = tf.train.AdamOptimizer(learning_rate)\n", "\n", "model.compile(\n", " optimizer=opt,\n", " loss='categorical_crossentropy',\n", " metrics=['acc'])" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "mQnZM5JYlRvs", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Creating the TPU Model from a Keras Model\n", "\n", "`tf.contrib.tpu.keras_to_tpu_model` will eventually go away, and you will instead pass a distribution strategy to `model.compile`, but for TensorFlow 1.11 this works.\n", "\n", "We can see this is a TPU v2 with 8 cores.\n", "\n", "For batching, you want 128 examples per core, so 1024 overall.\n", "\n", "You could also use 128, 256, 512, etc." 
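, "\n",
"\n",
"The batch arithmetic can be sketched in plain Python (an illustration, not part of the original notebook). It assumes the sizes listed above are overall batch sizes and that each overall batch is split evenly across the 8 cores.\n",
"\n",
"```python\n",
"num_cores = 8  # core count stated above\n",
"\n",
"for overall_batch in (128, 256, 512, 1024):\n",
"    # each overall batch should divide evenly across the cores\n",
"    per_core, leftover = divmod(overall_batch, num_cores)\n",
"    print(overall_batch, per_core, leftover)\n",
"```\n",
"\n",
"Each size divides evenly, giving 16, 32, 64, and 128 examples per core."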
] }, { "metadata": { "id": "TFxqdkz-joxg", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 359 }, "outputId": "cec992c9-0563-4d03-c0ac-9ca9e3a01c08" }, "cell_type": "code", "source": [ "tpu_model = tf.contrib.tpu.keras_to_tpu_model(\n", " model,\n", " strategy=tf.contrib.tpu.TPUDistributionStrategy(\n", " tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "INFO:tensorflow:Querying Tensorflow master (b'grpc://10.114.111.10:8470') for TPU system metadata.\n", "INFO:tensorflow:Found TPU system:\n", "INFO:tensorflow:*** Num TPU Cores: 8\n", "INFO:tensorflow:*** Num TPU Workers: 1\n", "INFO:tensorflow:*** Num TPU Cores Per Worker: 8\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 973931917537708864)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 8792028991883212283)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 10595085297325393161)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 10139671714968909828)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 10491071598227653110)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 3213028352983874138)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 13713210220232872762)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 16117693853034682668)\n", "INFO:tensorflow:*** Available Device: 
_DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 3592681710289544177)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 12525050454546375568)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 17588780763802917777)\n", "INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 1127662344348348349)\n", "WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.\n", "INFO:tensorflow:Connecting to: b'grpc://10.114.111.10:8470'\n" ], "name": "stdout" } ] }, { "metadata": { "id": "g2u9PUA9W7NK", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 503 }, "outputId": "af9e35b6-79ee-4d4b-9711-2bcd26ae1ae0" }, "cell_type": "code", "source": [ "tpu_model.summary()" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "input (InputLayer) (1024, 28, 28, 1) 0 \n", "_________________________________________________________________\n", "Conv_01 (Conv2D) (1024, 26, 26, 32) 320 \n", "_________________________________________________________________\n", "MaxPool_01 (MaxPooling2D) (1024, 13, 13, 32) 0 \n", "_________________________________________________________________\n", "Conv_02 (Conv2D) (1024, 11, 11, 64) 18496 \n", "_________________________________________________________________\n", "MaxPool_02 (MaxPooling2D) (1024, 5, 5, 64) 0 \n", "_________________________________________________________________\n", "Conv_03 (Conv2D) (1024, 3, 3, 64) 36928 \n", 
"_________________________________________________________________\n", "Flatten_01 (Flatten) (1024, 576) 0 \n", "_________________________________________________________________\n", "Dense_01 (Dense) (1024, 64) 36928 \n", "_________________________________________________________________\n", "Dropout_02 (Dropout) (1024, 64) 0 \n", "_________________________________________________________________\n", "Dense_02 (Dense) (1024, 10) 650 \n", "=================================================================\n", "Total params: 93,322\n", "Trainable params: 93,322\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ], "name": "stdout" } ] }, { "metadata": { "id": "_w2mss3nltod", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Training using `tf.data pipeline`\n", "\n", "Obviously training MNIST on a TPU is a bit overkill and the TPU barely gets a chance to warm up. ^-^" ] }, { "metadata": { "id": "SSqmECv9kKFO", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 485 }, "outputId": "d06202cd-446d-42aa-ec34-c3b9bed6906c" }, "cell_type": "code", "source": [ "tpu_model.fit(\n", " train_input_fn,\n", " steps_per_epoch = 60,\n", " epochs=10,\n", ")" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Epoch 1/10\n", "INFO:tensorflow:New input shapes; (re-)compiling: mode=train, [TensorSpec(shape=(1024, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(1024, 10), dtype=tf.float32, name=None)]\n", "INFO:tensorflow:Overriding default placeholder.\n", "INFO:tensorflow:Remapping placeholder for input\n", "INFO:tensorflow:Started compiling\n", "INFO:tensorflow:Finished compiling. 
Time elapsed: 1.6537692546844482 secs\n", "INFO:tensorflow:Setting weights on TPU model.\n", "60/60 [==============================] - 6s 104ms/step - loss: 0.9355 - acc: 0.7056\n", "Epoch 2/10\n", "60/60 [==============================] - 3s 44ms/step - loss: 0.2260 - acc: 0.9349\n", "Epoch 3/10\n", "60/60 [==============================] - 3s 46ms/step - loss: 0.1372 - acc: 0.9606\n", "Epoch 4/10\n", "60/60 [==============================] - 3s 48ms/step - loss: 0.1055 - acc: 0.9702\n", "Epoch 5/10\n", "60/60 [==============================] - 3s 48ms/step - loss: 0.0838 - acc: 0.9760\n", "Epoch 6/10\n", "60/60 [==============================] - 3s 48ms/step - loss: 0.0696 - acc: 0.9799\n", "Epoch 7/10\n", "60/60 [==============================] - 3s 44ms/step - loss: 0.0623 - acc: 0.9820\n", "Epoch 8/10\n", "60/60 [==============================] - 3s 44ms/step - loss: 0.0576 - acc: 0.9838\n", "Epoch 9/10\n", "60/60 [==============================] - 3s 43ms/step - loss: 0.0492 - acc: 0.9858\n", "Epoch 10/10\n", "60/60 [==============================] - 3s 43ms/step - loss: 0.0449 - acc: 0.9875\n" ], "name": "stdout" } ] }, { "metadata": { "id": "BbC4yE3zYFhL", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "outputId": "2cf01879-4d29-4c60-d0ba-ae45e2820277" }, "cell_type": "code", "source": [ "tpu_model.save_weights('./MNIST_TPU_1024.h5', overwrite=True)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "INFO:tensorflow:Copying TPU weights to the CPU\n" ], "name": "stdout" } ] }, { "metadata": { "id": "4oVNywSIpt7-", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Inference\n", "\n", "Evaluate model." 
] }, { "metadata": { "id": "13MbQUW5khfC", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "outputId": "a7afc0a3-8d57-4b4b-d090-bb055340c1c7" }, "cell_type": "code", "source": [ "tpu_model.evaluate(test_input_fn, steps = 100)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "INFO:tensorflow:New input shapes; (re-)compiling: mode=eval, [TensorSpec(shape=(1024, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(1024, 10), dtype=tf.float32, name=None)]\n", "INFO:tensorflow:Overriding default placeholder.\n", "INFO:tensorflow:Remapping placeholder for input\n", "INFO:tensorflow:Started compiling\n", "INFO:tensorflow:Finished compiling. Time elapsed: 0.9941656589508057 secs\n", "100/100 [==============================] - 7s 65ms/step\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/plain": [ "[0.0268026649922831, 0.991123046875]" ] }, "metadata": { "tags": [] }, "execution_count": 18 } ] }, { "metadata": { "id": "hqMMMhPr4C0X", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Doing the exact same thing without `tf.data` is much slower!" ] }, { "metadata": { "id": "GjYJOcL7kunS", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "outputId": "7c5cdc6b-a06d-4c1e-daac-dc1f3d4f6b22" }, "cell_type": "code", "source": [ "tpu_model.fit(x_train, y_train, epochs=1)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Epoch 1/1\n", "INFO:tensorflow:New input shapes; (re-)compiling: mode=train, [TensorSpec(shape=(4, 28, 28, 1), dtype=tf.float32, name='input0'), TensorSpec(shape=(4, 10), dtype=tf.float32, name='Dense_02_target_10')]\n", "INFO:tensorflow:Overriding default placeholder.\n", "INFO:tensorflow:Remapping placeholder for input\n", "INFO:tensorflow:Started compiling\n", "INFO:tensorflow:Finished compiling. 
Time elapsed: 1.0541026592254639 secs\n", "60000/60000 [==============================] - 58s 964us/step - loss: 0.0991 - acc: 0.9708\n" ], "name": "stdout" } ] }, { "metadata": { "id": "ltIQKpHVoHfh", "colab_type": "text" }, "cell_type": "markdown", "source": [ "_**Note:**_\n", "\n", "_This notebook was adapted from the Jupyter notebook used for the demo during the talk, \"Get training in Keras on TPUs for free\" at Singapore TensorFlow and Deep Learning group meetup on 2018-09-28 GMT+8. Thanks to Sam Witteveen._\n", "\n", "_Slides: https://www.dropbox.com/s/jg7j07unw94wbom/TensorFlow%20Keras%20Colab%20TPUs.pdf?dl=0_" ] } ] }