[View in Colaboratory](https://colab.research.google.com/github/cedrickchee/data-science-notebooks/blob/master/notebooks/tensorflow/google_cloud_tpu/guide_to_tensorflow_keras_on_tpu_mnist.ipynb)

## Guide to TensorFlow + Keras on TPUs for free on Google Colab

Here is a very quick implementation and walkthrough showing how to use TPUs with Keras in Colab.

If you have any questions or suggestions for improving it, please let me know.

In [0]:
import numpy as np

import tensorflow as tf
import time
import os

import tensorflow.keras
from tensorflow.keras.datasets import mnist, fashion_mnist
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K


In [0]:
print(tf.__version__)
print(tf.keras.__version__)

1.11.0-rc2
2.1.6-tf


## Check for TPU

First, check whether a TPU is available by running the cell below.

If no TPU is found, open the "Runtime" menu at the top, choose "Change runtime type", and set the hardware accelerator to TPU.

The `TPU_ADDRESS` variable is needed later, when creating the TPU distribution strategy.

In [0]:
try:
    device_name = os.environ['COLAB_TPU_ADDR']
    TPU_ADDRESS = 'grpc://' + device_name
    print('Found TPU at: {}'.format(TPU_ADDRESS))

except KeyError:
    print('TPU not found')

Found TPU at: grpc://10.114.111.10:8470
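The try/except above can also be wrapped in a small function that returns `None` instead of only printing, which makes it easier to skip TPU-specific cells when no TPU is attached. This is a minimal sketch: `find_tpu_address` is a hypothetical helper name, and it assumes, as the cell above does, that Colab exposes the TPU through the `COLAB_TPU_ADDR` environment variable.

```python
import os


def find_tpu_address(env=None):
    """Return a 'grpc://host:port' TPU address, or None if no TPU is attached.

    Hypothetical helper mirroring the notebook's COLAB_TPU_ADDR lookup.
    """
    env = os.environ if env is None else env
    device_name = env.get("COLAB_TPU_ADDR")
    if device_name is None:
        return None
    return "grpc://" + device_name


tpu_address = find_tpu_address()
print(tpu_address if tpu_address else "TPU not found")
```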


## Standard MNIST setup

In [0]:
batch_size = 1024
num_classes = 10
epochs = 5
learning_rate = 0.001

# input image dimensions
img_rows, img_cols = 28, 28

In [0]:
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [0]:
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

In [0]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [0]:
# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
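`to_categorical` turns each integer label into a one-hot row, which is the target format `categorical_crossentropy` expects. A minimal pure-Python sketch of the same transformation (`one_hot` is a hypothetical name; the real function returns a NumPy array rather than nested lists):

```python
def one_hot(labels, num_classes):
    """Pure-Python equivalent of one-hot encoding integer class labels."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} out of range for {num_classes} classes")
        row = [0.0] * num_classes
        row[label] = 1.0  # a single 1.0 at the label's index
        rows.append(row)
    return rows


print(one_hot([3, 0], 4))
# [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
```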

## Use `tf.data`

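Make sure to pass `drop_remainder=True` to `batch`: TPUs need fixed tensor shapes, and this flag guarantees that every batch has exactly `batch_size` examples, so the batch dimension is known statically.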

In [0]:
def train_input_fn(batch_size=1024):
    # convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))

    # shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size, drop_remainder=True)

    # return the dataset.
    return dataset

In [0]:
def test_input_fn(batch_size=1024):
    # convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))

    # shuffle, repeat, and batch the examples
    # (shuffling isn't needed for evaluation, but it is harmless).
    dataset = dataset.shuffle(1000).repeat().batch(batch_size, drop_remainder=True)

    # return the dataset.
    return dataset
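A pure-Python sketch of what `drop_remainder` controls for a single pass over a finite dataset; `batch_sizes` is a hypothetical helper, not part of `tf.data`. Because the input functions above call `.repeat()` before `.batch()`, the stream is infinite and every batch is full in practice; the flag still matters because, with `drop_remainder=True`, `tf.data` can report a static batch dimension of `batch_size` instead of `None`.

```python
def batch_sizes(num_examples, batch_size, drop_remainder):
    """Sizes of the batches one pass over a finite dataset would yield."""
    full, rest = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_remainder:
        sizes.append(rest)  # final, smaller batch: its shape differs from the others
    return sizes


kept = batch_sizes(60000, 1024, drop_remainder=False)
dropped = batch_sizes(60000, 1024, drop_remainder=True)
print(len(kept), kept[-1])        # 59 608
print(len(dropped), dropped[-1])  # 58 1024
```

The 608-example batch is exactly the kind of odd-shaped input that would force XLA to compile a second program.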

## Create the model

You must pass both an input shape and a batch size to `tf.keras.Input`, because TPUs (via XLA) require fixed shapes.

The rest of the model is a simple CNN.

In [0]:
Inp = tf.keras.Input(
    name='input', shape=input_shape, batch_size=batch_size, dtype=tf.float32)

x = Conv2D(32, kernel_size=(3, 3), activation='relu', name='Conv_01')(Inp)
x = MaxPooling2D(pool_size=(2, 2), name='MaxPool_01')(x)
x = Conv2D(64, (3, 3), activation='relu', name='Conv_02')(x)
x = MaxPooling2D(pool_size=(2, 2), name='MaxPool_02')(x)
x = Conv2D(64, (3, 3), activation='relu', name='Conv_03')(x)
x = Flatten(name='Flatten_01')(x)
x = Dense(64, activation='relu', name='Dense_01')(x)
x = Dropout(0.5, name='Dropout_02')(x)

output = Dense(num_classes, activation='softmax', name='Dense_02')(x)

In [0]:
model = tf.keras.Model(inputs=[Inp], outputs=[output])
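As a sanity check on the fixed shapes, the layer sizes and parameter counts of this CNN can be worked out by hand: each 3x3 convolution with the default 'valid' padding shrinks the side by 2, each 2x2 max-pool halves it (rounding down), a Conv2D layer has k*k*c_in*c_out weights plus c_out biases, and a Dense layer has n_in*n_out weights plus n_out biases. The helper names in this sketch are hypothetical; its numbers should agree with the `tpu_model.summary()` output in this notebook, and it also gives the `Dense_02` count and the total, which are cut off there.

```python
def conv_valid(side, kernel=3):
    """Spatial side after a stride-1 convolution with 'valid' padding."""
    return side - kernel + 1


def pool(side, size=2):
    """Spatial side after non-overlapping max pooling with 'valid' padding."""
    return side // size


def conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out + c_out  # weights + biases


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


side = 28
side = conv_valid(side)  # Conv_01    -> 26
side = pool(side)        # MaxPool_01 -> 13
side = conv_valid(side)  # Conv_02    -> 11
side = pool(side)        # MaxPool_02 -> 5
side = conv_valid(side)  # Conv_03    -> 3
flat = side * side * 64  # Flatten_01 -> 576

params = {
    "Conv_01": conv_params(3, 1, 32),
    "Conv_02": conv_params(3, 32, 64),
    "Conv_03": conv_params(3, 64, 64),
    "Dense_01": dense_params(flat, 64),
    "Dense_02": dense_params(64, 10),
}
total = sum(params.values())
print(params)
print("total:", total)  # total: 93322
```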

In [0]:
# use a TensorFlow (tf.train) optimizer rather than a Keras one for now
opt = tf.train.AdamOptimizer(learning_rate)

model.compile(
 optimizer=opt,
 loss='categorical_crossentropy',
 metrics=['acc'])

## Creating a TPU model from a Keras model

`tf.contrib.tpu.keras_to_tpu_model` is expected to go away eventually, with the distribution strategy passed to `model.compile` instead; for TensorFlow 1.11, however, this is the approach that works.

The log below reports 8 TPU cores on a single worker: this is a TPU v2 with 8 cores.

For batching, aim for 128 examples per core, i.e. 8 × 128 = 1024 overall.

Smaller global batch sizes that divide evenly across the 8 cores, such as 128, 256, or 512, also work.
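The batch arithmetic above is easy to get wrong, so here is a minimal sketch of splitting a global batch across TPU cores; `per_core_batch` is a hypothetical helper, and `num_cores=8` is taken from the log in this notebook.

```python
def per_core_batch(global_batch, num_cores=8):
    """Split a global batch evenly across TPU cores."""
    if global_batch % num_cores:
        raise ValueError(
            f"global batch {global_batch} does not divide evenly across {num_cores} cores")
    return global_batch // num_cores


for global_batch in (128, 256, 512, 1024):
    print(global_batch, "->", per_core_batch(global_batch), "per core")
# 128 -> 16 per core
# 256 -> 32 per core
# 512 -> 64 per core
# 1024 -> 128 per core
```

By the same arithmetic, Keras's default batch size of 32 gives only 4 examples per core, which appears to match the `(4, 28, 28, 1)` input shape in the log of the `fit` call without `tf.data` at the end of this notebook.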

In [0]:
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))

INFO:tensorflow:Querying Tensorflow master (b'grpc://10.114.111.10:8470') for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 973931917537708864)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 8792028991883212283)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 10595085297325393161)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 10139671714968909828)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 10491071598227653110)
INFO:tensorflow:*** Available Device: _Devic

In [0]:
tpu_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (1024, 28, 28, 1)         0
_________________________________________________________________
Conv_01 (Conv2D)             (1024, 26, 26, 32)        320
_________________________________________________________________
MaxPool_01 (MaxPooling2D)    (1024, 13, 13, 32)        0
_________________________________________________________________
Conv_02 (Conv2D)             (1024, 11, 11, 64)        18496
_________________________________________________________________
MaxPool_02 (MaxPooling2D)    (1024, 5, 5, 64)          0
_________________________________________________________________
Conv_03 (Conv2D)             (1024, 3, 3, 64)          36928
_________________________________________________________________
Flatten_01 (Flatten)         (1024, 576)               0
_________________________________________________________________
Dense_01 (Dense)             (1024, 64)                36928
_________________________________________________________________
Dropout_02 (Dropout)         (1024, 64)                0
______

## Training with a `tf.data` pipeline

Training MNIST on a TPU is obviously overkill, and the TPU barely gets a chance to warm up. ^-^ Note that this call passes `epochs=10` explicitly, so the `epochs = 5` set earlier is not used here.

In [0]:
tpu_model.fit(
    train_input_fn,
    steps_per_epoch=60,
    epochs=10,
)

Epoch 1/10
INFO:tensorflow:New input shapes; (re-)compiling: mode=train, [TensorSpec(shape=(1024, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(1024, 10), dtype=tf.float32, name=None)]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 1.6537692546844482 secs
INFO:tensorflow:Setting weights on TPU model.
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
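Because the training stream repeats forever, `steps_per_epoch` alone decides how much data an "epoch" covers. A small arithmetic sketch (the `passes_over_data` name is hypothetical) shows that 60 steps of 1024 examples is slightly more than one pass over the 60,000 training images:

```python
def passes_over_data(steps, batch_size, num_examples):
    """How many passes over a dataset `steps` batches of `batch_size` represent."""
    return steps * batch_size / num_examples


train_passes = passes_over_data(steps=60, batch_size=1024, num_examples=60000)
print(f"{train_passes:.3f} passes per epoch")  # 1.024 passes per epoch
print(60000 // 1024, "full batches fit in one pass")  # 58 full batches fit in one pass
```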


In [0]:
tpu_model.save_weights('./MNIST_TPU_1024.h5', overwrite=True)

INFO:tensorflow:Copying TPU weights to the CPU


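## Evaluation

Evaluate the model on the test set. `evaluate` returns `[loss, accuracy]`, here about 0.027 and 99.1%. Because `test_input_fn` repeats the data, `steps=100` with batches of 1024 draws 102,400 examples, roughly ten passes over the 10,000 test images; `steps=9` (10000 // 1024) would see each test image at most once.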

In [0]:
tpu_model.evaluate(test_input_fn, steps=100)

INFO:tensorflow:New input shapes; (re-)compiling: mode=eval, [TensorSpec(shape=(1024, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(1024, 10), dtype=tf.float32, name=None)]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 0.9941656589508057 secs


[0.0268026649922831, 0.991123046875]

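## Doing the exact same thing without `tf.data` is much slower!

Without an explicit `batch_size`, `fit` falls back to Keras's default of 32 examples, split across the 8 cores; this is likely why the log below shows a per-core input shape of 4 rather than 128. Passing `batch_size=1024` should give each core the same 128-example batches as the `tf.data` version.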

In [0]:
tpu_model.fit(x_train, y_train, epochs=1)

Epoch 1/1
INFO:tensorflow:New input shapes; (re-)compiling: mode=train, [TensorSpec(shape=(4, 28, 28, 1), dtype=tf.float32, name='input0'), TensorSpec(shape=(4, 10), dtype=tf.float32, name='Dense_02_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 1.0541026592254639 secs


_**Note:**_

_This notebook was adapted from the Jupyter notebook used for the demo in the talk "Get training in Keras on TPUs for free" at the Singapore TensorFlow and Deep Learning meetup group on 2018-09-28 (GMT+8). Thanks to Sam Witteveen._

_Slides: https://www.dropbox.com/s/jg7j07unw94wbom/TensorFlow%20Keras%20Colab%20TPUs.pdf?dl=0_