{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "import os\n", "os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\";\n", "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tensorflow version 2.1.0\n" ] } ], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "print(\"Tensorflow version \" + tf.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using *ktrain* to Facilitate a Normal TensorFlow Workflow\n", "\n", "This example notebook illustrates how *ktrain* can be used in a **minimally-invasive** way within\n", "a normal TensorFlow workflow. In this notebook, we store our datasets as `tf.data.Dataset` objects and build our own `tf.keras` model, following the example of TensorFlow's [Keras MNIST TPU.ipynb](https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/keras_mnist_tpu.ipynb#scrollTo=cCpkS9C_H7Tl). We then use *ktrain* as a lightweight wrapper around our model and data to estimate a learning rate, train the model, inspect the model, and make predictions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Detect Hardware: CPU vs. GPU vs. 
TPU" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running on single GPU /device:GPU:0\n", "Number of accelerators: 1\n" ] } ], "source": [ "# Detect hardware\n", "try:\n", " tpu = tf.distribute.cluster_resolver.TPUClusterResolver() # TPU detection\n", "except ValueError:\n", " tpu = None\n", " gpus = tf.config.experimental.list_logical_devices(\"GPU\")\n", " \n", "# Select appropriate distribution strategy\n", "if tpu:\n", " tf.tpu.experimental.initialize_tpu_system(tpu)\n", " strategy = tf.distribute.experimental.TPUStrategy(tpu, steps_per_run=128) # Going back and forth between TPU and host is expensive. Better to run 128 batches on the TPU before reporting back.\n", " print('Running on TPU ', tpu.cluster_spec().as_dict()['worker']) \n", "elif len(gpus) > 1:\n", " strategy = tf.distribute.MirroredStrategy([gpu.name for gpu in gpus])\n", " print('Running on multiple GPUs ', [gpu.name for gpu in gpus])\n", "elif len(gpus) == 1:\n", " strategy = tf.distribute.get_strategy() # default strategy that works on CPU and single GPU\n", " print('Running on single GPU ', gpus[0].name)\n", "else:\n", " strategy = tf.distribute.get_strategy() # default strategy that works on CPU and single GPU\n", " print('Running on CPU')\n", "print(\"Number of accelerators: \", strategy.num_replicas_in_sync)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare Training and Validation Data as `tf.data.Dataset` Objects\n", "\n", "Download the dataset files from [LeCun's website](http://yann.lecun.com/exdb/mnist/)." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "BATCH_SIZE = 64 * strategy.num_replicas_in_sync # Global batch size.\n", "training_images_file = 'data/mnist_lecun/train-images-idx3-ubyte'\n", "training_labels_file = 'data/mnist_lecun/train-labels-idx1-ubyte'\n", "validation_images_file = 'data/mnist_lecun/t10k-images-idx3-ubyte'\n", "validation_labels_file = 'data/mnist_lecun/t10k-labels-idx1-ubyte'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that, when training on a TPU, these paths should instead be set as follows:\n", "\n", "```python\n", "training_images_file = 'gs://mnist-public/train-images-idx3-ubyte'\n", "training_labels_file = 'gs://mnist-public/train-labels-idx1-ubyte'\n", "validation_images_file = 'gs://mnist-public/t10k-images-idx3-ubyte'\n", "validation_labels_file = 'gs://mnist-public/t10k-labels-idx1-ubyte'\n", "```\n", "\n", "You may need to authenticate:\n", "```python\n", "IS_COLAB_BACKEND = 'COLAB_GPU' in os.environ # this is always set on Colab, the value is 0 or 1 depending on GPU presence\n", "if IS_COLAB_BACKEND:\n", " from google.colab import auth\n", " # Authenticates the Colab machine and also the TPU using your\n", " # credentials so that they can access your private GCS buckets.\n", " auth.authenticate_user()\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def read_label(tf_bytestring):\n", " label = tf.io.decode_raw(tf_bytestring, tf.uint8)\n", " label = tf.reshape(label, [])\n", " label = tf.one_hot(label, 10)\n", " return label\n", " \n", "def read_image(tf_bytestring):\n", " image = tf.io.decode_raw(tf_bytestring, tf.uint8)\n", " image = tf.cast(image, tf.float32)/255.0\n", " image = tf.reshape(image, [28*28])\n", " return image\n", " \n", "def load_dataset(image_file, label_file):\n", " imagedataset = tf.data.FixedLengthRecordDataset(image_file, 28*28, header_bytes=16)\n", " imagedataset = imagedataset.map(read_image, 
num_parallel_calls=16)\n", " labelsdataset = tf.data.FixedLengthRecordDataset(label_file, 1, header_bytes=8)\n", " labelsdataset = labelsdataset.map(read_label, num_parallel_calls=16)\n", " dataset = tf.data.Dataset.zip((imagedataset, labelsdataset))\n", " return dataset \n", " \n", "def get_training_dataset(image_file, label_file, batch_size):\n", " dataset = load_dataset(image_file, label_file)\n", " dataset = dataset.cache() # this small dataset can be entirely cached in RAM\n", " dataset = dataset.shuffle(5000, reshuffle_each_iteration=True)\n", " dataset = dataset.repeat() # Mandatory for Keras for now\n", " dataset = dataset.batch(batch_size, drop_remainder=True) # drop_remainder is important on TPU, batch size must be fixed\n", " dataset = dataset.prefetch(-1) # fetch next batches while training on the current one (-1: autotune prefetch buffer size)\n", " return dataset\n", " \n", "def get_validation_dataset(image_file, label_file):\n", " dataset = load_dataset(image_file, label_file)\n", " dataset = dataset.cache() # this small dataset can be entirely cached in RAM\n", " dataset = dataset.batch(10000, drop_remainder=True) # 10000 items in eval dataset, all in one batch\n", " dataset = dataset.repeat() # Mandatory for Keras for now\n", " return dataset\n", "\n", "def load_label_dataset(label_file):\n", " labelsdataset = tf.data.FixedLengthRecordDataset(label_file, 1, header_bytes=8)\n", " labelsdataset = labelsdataset.map(read_label, num_parallel_calls=16)\n", " return labelsdataset \n", "\n", "# instantiate the datasets\n", "training_dataset = get_training_dataset(training_images_file, training_labels_file, BATCH_SIZE)\n", "validation_dataset = get_validation_dataset(validation_images_file, validation_labels_file)\n", "\n", "# extract ground truth labels\n", "training_labels = np.vstack(list(load_label_dataset(training_labels_file).as_numpy_iterator()))\n", "validation_labels = np.vstack(list(load_label_dataset(validation_labels_file).as_numpy_iterator()))" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build a Model" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# This model trains to 99.4% accuracy in 10 epochs (with a batch size of 64) \n", "\n", "def make_model():\n", " model = tf.keras.Sequential(\n", " [\n", " tf.keras.layers.Reshape(input_shape=(28*28,), target_shape=(28, 28, 1), name=\"image\"),\n", "\n", " tf.keras.layers.Conv2D(filters=12, kernel_size=3, padding='same', use_bias=False), # no bias necessary before batch norm\n", " tf.keras.layers.BatchNormalization(scale=False, center=True), # no batch norm scaling necessary before \"relu\"\n", " tf.keras.layers.Activation('relu'), # activation after batch norm\n", "\n", " tf.keras.layers.Conv2D(filters=24, kernel_size=6, padding='same', use_bias=False, strides=2),\n", " tf.keras.layers.BatchNormalization(scale=False, center=True),\n", " tf.keras.layers.Activation('relu'),\n", "\n", " tf.keras.layers.Conv2D(filters=32, kernel_size=6, padding='same', use_bias=False, strides=2),\n", " tf.keras.layers.BatchNormalization(scale=False, center=True),\n", " tf.keras.layers.Activation('relu'),\n", "\n", " tf.keras.layers.Flatten(),\n", " tf.keras.layers.Dense(200, use_bias=False),\n", " tf.keras.layers.BatchNormalization(scale=False, center=True),\n", " tf.keras.layers.Activation('relu'),\n", " tf.keras.layers.Dropout(0.4), # Dropout on dense layer only\n", "\n", " tf.keras.layers.Dense(10, activation='softmax')\n", " ])\n", "\n", " model.compile(optimizer='adam', # learning rate will be set by LearningRateScheduler\n", " loss='categorical_crossentropy',\n", " metrics=['accuracy'])\n", " return model\n", " \n", "with strategy.scope():\n", " model = make_model()\n", "\n", "\n", "# set up learning rate decay [FROM ORIGINAL EXAMPLE BUT NOT USED]\n", "# NOT NEEDED: we will use ktrain to find LR and decay learning rate during training\n", "LEARNING_RATE = 0.01\n", "LEARNING_RATE_EXP_DECAY = 0.6 if 
strategy.num_replicas_in_sync == 1 else 0.7\n", "lr_decay = tf.keras.callbacks.LearningRateScheduler(\n", " lambda epoch: LEARNING_RATE * LEARNING_RATE_EXP_DECAY**epoch,\n", " verbose=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use *ktrain* With Our Model and Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wrap the `tf.data.Dataset` Objects in `ktrain.TFDataset` and Create a `Learner`" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/amaiya/projects/ghub/ktrain/ktrain/data.py:86: UserWarning: batch_size parameter is ignored, as pre-configured batch_size of tf.data.Dataset is used\n", " warnings.warn('batch_size parameter is ignored, as pre-configured batch_size of tf.data.Dataset is used')\n" ] } ], "source": [ "import ktrain\n", "trn = ktrain.TFDataset(training_dataset, n=training_labels.shape[0], y=training_labels)\n", "val = ktrain.TFDataset(validation_dataset, n=validation_labels.shape[0], y=validation_labels)\n", "learner = ktrain.get_learner(model, train_data=trn, val_data=val)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find Learning Rate" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "simulating training for different learning rates... this may take a few moments...\n", "Train for 937 steps\n", "Epoch 1/1024\n", "937/937 [==============================] - 8s 8ms/step - loss: 1.8162 - accuracy: 0.4173\n", "Epoch 2/1024\n", "604/937 [==================>...........] - ETA: 2s - loss: 0.2286 - accuracy: 0.9345\n", "\n", "done.\n", "Visually inspect loss plot and select learning rate associated with falling loss\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "learner.lr_find(show_plot=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train the Model Using a Cosine Annealing LR Schedule" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train for 938 steps, validate for 1 steps\n", "Epoch 1/10\n", "938/938 [==============================] - 7s 8ms/step - loss: 0.1176 - accuracy: 0.9641 - val_loss: 0.0513 - val_accuracy: 0.9825\n", "Epoch 2/10\n", "938/938 [==============================] - 6s 7ms/step - loss: 0.0504 - accuracy: 0.9844 - val_loss: 0.0375 - val_accuracy: 0.9874\n", "Epoch 3/10\n", "938/938 [==============================] - 6s 6ms/step - loss: 0.0413 - accuracy: 0.9875 - val_loss: 0.0336 - val_accuracy: 0.9888\n", "Epoch 4/10\n", "938/938 [==============================] - 6s 6ms/step - loss: 0.0327 - accuracy: 0.9899 - val_loss: 0.0388 - val_accuracy: 0.9891\n", "Epoch 5/10\n", "938/938 [==============================] - 6s 6ms/step - loss: 0.0268 - accuracy: 0.9918 - val_loss: 0.0278 - val_accuracy: 0.9906\n", "Epoch 6/10\n", "938/938 [==============================] - 6s 6ms/step - loss: 0.0186 - accuracy: 0.9943 - val_loss: 0.0254 - val_accuracy: 0.9921\n", "Epoch 7/10\n", "938/938 [==============================] - 6s 6ms/step - loss: 0.0135 - accuracy: 0.9955 - val_loss: 0.0224 - val_accuracy: 0.9933\n", "Epoch 8/10\n", "938/938 [==============================] - 6s 6ms/step - loss: 0.0083 - accuracy: 0.9974 - val_loss: 0.0191 - val_accuracy: 0.9937\n", "Epoch 9/10\n", "938/938 [==============================] - 6s 6ms/step - loss: 0.0044 - accuracy: 0.9988 - val_loss: 0.0190 - val_accuracy: 0.9943\n", "Epoch 10/10\n", "938/938 [==============================] - 6s 6ms/step - loss: 0.0035 - accuracy: 0.9992 - val_loss: 0.0190 - val_accuracy: 0.9943\n" ] }, { "data": { "text/plain": [ "" ] }, 
"execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learner.fit(5e-3, 1, cycle_len=10, checkpoint_folder='/tmp/mymodel')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# cosine annealed LR schedule\n", "learner.plot('lr')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# training vs. validation loss\n", "learner.plot('loss')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect Model\n", "\n", "#### Evaluate as Normal" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "1/1 [==============================] - 0s 57ms/step - loss: 0.0186 - accuracy: 0.9943\n" ] }, { "data": { "text/plain": [ "[0.018631214275956154, 0.9943]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learner.model.evaluate(validation_dataset, steps=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Validation Metrics" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.99 1.00 1.00 980\n", " 1 1.00 1.00 1.00 1135\n", " 2 0.99 1.00 1.00 1032\n", " 3 0.99 0.99 0.99 1010\n", " 4 0.99 0.99 0.99 982\n", " 5 0.99 0.99 0.99 892\n", " 6 0.99 0.99 0.99 958\n", " 7 1.00 0.99 1.00 1028\n", " 8 0.99 0.99 0.99 974\n", " 9 0.99 0.99 0.99 1009\n", "\n", " accuracy 0.99 10000\n", " macro avg 0.99 0.99 0.99 10000\n", "weighted avg 0.99 0.99 0.99 10000\n", "\n" ] }, { "data": { "text/plain": [ "array([[ 979, 0, 0, 0, 0, 0, 0, 1, 0, 0],\n", " [ 0, 1135, 0, 0, 0, 0, 0, 0, 0, 0],\n", " [ 0, 1, 1028, 0, 0, 0, 0, 3, 0, 0],\n", " [ 0, 0, 2, 1003, 0, 4, 0, 0, 1, 0],\n", " [ 0, 0, 0, 0, 975, 0, 4, 0, 0, 3],\n", " [ 1, 0, 0, 7, 0, 883, 1, 0, 0, 0],\n", " [ 3, 1, 0, 0, 0, 1, 951, 0, 2, 0],\n", " [ 0, 2, 2, 0, 0, 0, 0, 1022, 0, 2],\n", " [ 2, 0, 2, 1, 0, 0, 0, 0, 967, 2],\n", " [ 0, 0, 0, 0, 5, 2, 0, 0, 2, 1000]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learner.validate(class_names=list(map(str, range(10))))" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "#### View Top Losses" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "----------\n", "id:1014 | loss:7.4 | true:6 | pred:5)\n", "\n" ] } ], "source": [ "learner.view_top_losses(n=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Making Predictions" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "preds = learner.predict(val)\n", "preds = np.argmax(preds, axis=1)\n", "actual = learner.ground_truth(val)\n", "actual = np.argmax(actual, axis=1)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Predicted Actual\n", "0 7 7\n", "1 2 2\n", "2 1 1\n", "3 0 0\n", "4 4 4" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.DataFrame(zip(preds, actual), columns=['Predicted', 'Actual'])\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save Model and Reload Model" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "learner.save_model('/tmp/my_tf_model')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "learner.load_model('/tmp/my_tf_model')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", "1/1 [==============================] - 0s 176ms/step - loss: 0.0190 - accuracy: 0.9943\n" ] }, { "data": { "text/plain": [ "[0.018986882641911507, 0.9943]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learner.model.evaluate(validation_dataset, steps=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }