{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "nEF2_0sn2Tua"
},
"source": [
"# Image Classification using BigTransfer (BiT)\n",
"\n",
"**Author:** [Sayan Nath](https://twitter.com/sayannath2350)<br>\n",
"**Date created:** 2021/09/24<br>\n",
"**Last modified:** 2021/09/24<br>\n",
"**Description:** BigTransfer (BiT): state-of-the-art transfer learning for image classification."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5LUKJJ0l2Tuc"
},
"source": [
"## Introduction\n",
"\n",
"BigTransfer (also known as BiT) is a state-of-the-art transfer learning method for image\n",
"classification. Transferring pre-trained representations improves sample efficiency and\n",
"simplifies hyperparameter tuning when training deep neural networks for vision. BiT\n",
"revisits the paradigm of pre-training on large supervised datasets and fine-tuning the\n",
"model on a target task. The key is to appropriately choose the normalization layers and\n",
"to scale the architecture capacity as the amount of pre-training data increases.\n",
"\n",
"BigTransfer (BiT) models are trained on public datasets, and code is available in\n",
"[TF2, JAX and PyTorch](https://github.com/google-research/big_transfer). This helps anyone reach\n",
"state-of-the-art performance on their task of interest, even with just a handful of\n",
"labeled images per class.\n",
"\n",
"You can find BiT models pre-trained on\n",
"[ImageNet](https://image-net.org/challenges/LSVRC/2012/index) and ImageNet-21k in\n",
"[TFHub](https://tfhub.dev/google/collections/bit/1) as TensorFlow2 SavedModels that you\n",
"can easily use as Keras layers. There are a variety of sizes, ranging from a standard\n",
"ResNet50 to a ResNet152x4 (152 layers deep, 4x wider than a typical ResNet50), for users\n",
"with larger computational and memory budgets and higher accuracy requirements.\n",
"\n",
"![](https://i.imgur.com/XeWVfe7.jpeg)\n",
"Figure: The x-axis shows the number of images used per class, ranging from 1 to the full\n",
"dataset. In the plots on the left, the upper blue curve is the BiT-L model, while the\n",
"curve below it is a ResNet-50 pre-trained on ImageNet (ILSVRC-2012)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hvzPOZc72Tud"
},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "x2bXlydz2Tue"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"import tensorflow_hub as hub\n",
"import tensorflow_datasets as tfds\n",
"\n",
"tfds.disable_progress_bar()\n",
"\n",
"SEEDS = 42\n",
"\n",
"np.random.seed(SEEDS)\n",
"tf.random.set_seed(SEEDS)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iSr9snUg2Tuf"
},
"source": [
"## Gather Flower Dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jTf0_ElG2Tuf"
},
"outputs": [],
"source": [
"train_ds, validation_ds = tfds.load(\n",
" \"tf_flowers\",\n",
" split=[\"train[:85%]\", \"train[85%:]\"],\n",
" as_supervised=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RW6zvyF22Tuh"
},
"source": [
"## Visualise the dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6W-AiCml2Tui"
},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 10))\n",
"for i, (image, label) in enumerate(train_ds.take(9)):\n",
" ax = plt.subplot(3, 3, i + 1)\n",
" plt.imshow(image)\n",
" plt.title(int(label))\n",
" plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0CsqwOoT2Tui"
},
"source": [
"## Define hyperparameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5hy87EV42Tuj"
},
"outputs": [],
"source": [
"RESIZE_TO = 384\n",
"CROP_TO = 224\n",
"BATCH_SIZE = 64\n",
"STEPS_PER_EPOCH = 10\n",
"AUTO = tf.data.AUTOTUNE # optimise the pipeline performance\n",
"NUM_CLASSES = 5 # number of classes\n",
"SCHEDULE_LENGTH = (\n",
" 500 # we will train on lower resolution images and will still attain good results\n",
")\n",
"SCHEDULE_BOUNDARIES = [\n",
" 200,\n",
" 300,\n",
" 400,\n",
"] # the larger the dataset, the longer the schedule"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EV7hIbAh2Tuk"
},
"source": [
"Hyperparameters like `SCHEDULE_LENGTH` and `SCHEDULE_BOUNDARIES` are determined based\n",
"on empirical results. The method is explained in the [original\n",
"paper](https://arxiv.org/abs/1912.11370) and in the [Google AI Blog\n",
"Post](https://ai.googleblog.com/2020/05/open-sourcing-bit-exploring-large-scale.html).\n",
"\n",
"The `SCHEDULE_LENGTH` also determines whether to use [MixUp\n",
"Augmentation](https://arxiv.org/abs/1710.09412) or not. You can also find an easy MixUp\n",
"implementation in the [Keras code examples](https://keras.io/examples/vision/mixup/).\n",
"\n",
"![](https://i.imgur.com/oSaIBYZ.jpeg)"
]
},
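{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since this example's pipeline does not apply MixUp itself, the following is a minimal,\n",
"standalone NumPy sketch of the idea; the `mixup` helper, the toy inputs, and\n",
"`alpha=0.2` are illustrative assumptions rather than part of the original pipeline:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"\n",
"\n",
"def mixup(x1, y1, x2, y2, alpha=0.2):\n",
"    # Blend two samples with a Beta-distributed weight (Zhang et al., 2017).\n",
"    lam = rng.beta(alpha, alpha)\n",
"    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2\n",
"\n",
"\n",
"# Two toy \"images\" in [0, 1] and one-hot labels over 5 classes.\n",
"x1, x2 = np.zeros((4, 4, 3)), np.ones((4, 4, 3))\n",
"y1, y2 = np.eye(5)[0], np.eye(5)[3]\n",
"x, y = mixup(x1, y1, x2, y2)\n",
"print(x.min() >= 0.0 and x.max() <= 1.0)  # mixed image stays in [0, 1]\n",
"print(np.isclose(y.sum(), 1.0))  # mixed label still sums to 1"
]
},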
{
"cell_type": "markdown",
"metadata": {
"id": "9fc7_2AS2Tul"
},
"source": [
"## Define preprocessing helper functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zUQI1SWo2Tum"
},
"outputs": [],
"source": [
"SCHEDULE_LENGTH = SCHEDULE_LENGTH * 512 / BATCH_SIZE\n",
"\n",
"\n",
"@tf.function\n",
"def preprocess_train(image, label):\n",
" image = tf.image.random_flip_left_right(image)\n",
" image = tf.image.resize(image, (RESIZE_TO, RESIZE_TO))\n",
" image = tf.image.random_crop(image, (CROP_TO, CROP_TO, 3))\n",
" image = image / 255.0\n",
" return (image, label)\n",
"\n",
"\n",
"@tf.function\n",
"def preprocess_test(image, label):\n",
" image = tf.image.resize(image, (RESIZE_TO, RESIZE_TO))\n",
" image = image / 255.0\n",
" return (image, label)\n",
"\n",
"\n",
"DATASET_NUM_TRAIN_EXAMPLES = train_ds.cardinality().numpy()\n",
"\n",
"repeat_count = int(\n",
" SCHEDULE_LENGTH * BATCH_SIZE / DATASET_NUM_TRAIN_EXAMPLES * STEPS_PER_EPOCH\n",
")\n",
"repeat_count += 50 + 1 # To ensure there are at least 50 epochs of training"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7b8-1Z7y2Tum"
},
"source": [
"## Define the data pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "t8xK0KS12Tum"
},
"outputs": [],
"source": [
"# Training pipeline\n",
"pipeline_train = (\n",
" train_ds.shuffle(10000)\n",
" .repeat(repeat_count) # Repeat the dataset to cover the full training schedule\n",
" .map(preprocess_train, num_parallel_calls=AUTO)\n",
" .batch(BATCH_SIZE)\n",
" .prefetch(AUTO)\n",
")\n",
"\n",
"# Validation pipeline\n",
"pipeline_validation = (\n",
" validation_ds.map(preprocess_test, num_parallel_calls=AUTO)\n",
" .batch(BATCH_SIZE)\n",
" .prefetch(AUTO)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xUMadZeC2Tum"
},
"source": [
"## Visualise the training samples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "TrILDQhX2Tun"
},
"outputs": [],
"source": [
"image_batch, label_batch = next(iter(pipeline_train))\n",
"\n",
"plt.figure(figsize=(10, 10))\n",
"for n in range(25):\n",
" ax = plt.subplot(5, 5, n + 1)\n",
" plt.imshow(image_batch[n])\n",
" plt.title(label_batch[n].numpy())\n",
" plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Mp-qVs622Tun"
},
"source": [
"## Load pretrained TF-Hub model into a `KerasLayer`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zo_SlVWr2Tuo"
},
"outputs": [],
"source": [
"bit_model_url = \"https://tfhub.dev/google/bit/m-r50x1/1\"\n",
"bit_module = hub.KerasLayer(bit_model_url)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gRxcFeya2Tuo"
},
"source": [
"## Create BigTransfer (BiT) model\n",
"\n",
"To create the new model, we:\n",
"\n",
"1. Cut off the BiT model’s original head. This leaves us with the “pre-logits” output.\n",
"We do not have to do this if we use the ‘feature extractor’ models (i.e. all those in\n",
"subdirectories titled `feature_vectors`), since for those models the head has already\n",
"been cut off.\n",
"\n",
"2. Add a new head with the number of outputs equal to the number of classes of our new\n",
"task. Note that it is important that we initialise the head to all zeroes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9dfabz842Tup"
},
"outputs": [],
"source": [
"\n",
"class MyBiTModel(keras.Model):\n",
" def __init__(self, num_classes, module, **kwargs):\n",
" super().__init__(**kwargs)\n",
"\n",
" self.num_classes = num_classes\n",
" self.head = keras.layers.Dense(num_classes, kernel_initializer=\"zeros\")\n",
" self.bit_model = module\n",
"\n",
" def call(self, images):\n",
" bit_embedding = self.bit_model(images)\n",
" return self.head(bit_embedding)\n",
"\n",
"\n",
"model = MyBiTModel(num_classes=NUM_CLASSES, module=bit_module)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_CQ4Q8cu2Tup"
},
"source": [
"## Define optimizer and loss"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2AcOBGkb2Tup"
},
"outputs": [],
"source": [
"learning_rate = 0.003 * BATCH_SIZE / 512\n",
"\n",
"# Decay learning rate by a factor of 10 at SCHEDULE_BOUNDARIES.\n",
"lr_schedule = keras.optimizers.schedules.PiecewiseConstantDecay(\n",
" boundaries=SCHEDULE_BOUNDARIES,\n",
" values=[\n",
" learning_rate,\n",
" learning_rate * 0.1,\n",
" learning_rate * 0.01,\n",
" learning_rate * 0.001,\n",
" ],\n",
")\n",
"optimizer = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)\n",
"\n",
"loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)"
]
},
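{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the decay pattern concretely, here is a small pure-Python sketch that mirrors\n",
"what `PiecewiseConstantDecay` computes at a few sample steps; the `piecewise_lr`\n",
"helper is an illustrative stand-in, not a Keras API:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def piecewise_lr(step, boundaries, values):\n",
"    # Return the value for the interval that contains `step`.\n",
"    for boundary, value in zip(boundaries, values):\n",
"        if step <= boundary:\n",
"            return value\n",
"    return values[-1]\n",
"\n",
"\n",
"base_lr = 0.003 * 64 / 512  # 0.000375 for BATCH_SIZE = 64\n",
"values = [base_lr, base_lr * 0.1, base_lr * 0.01, base_lr * 0.001]\n",
"print(piecewise_lr(0, [200, 300, 400], values))  # base_lr\n",
"print(piecewise_lr(250, [200, 300, 400], values))  # base_lr * 0.1\n",
"print(piecewise_lr(450, [200, 300, 400], values))  # base_lr * 0.001"
]
},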
{
"cell_type": "markdown",
"metadata": {
"id": "m7pBODp82Tuq"
},
"source": [
"## Compile the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "R_FYzSq92Tuq"
},
"outputs": [],
"source": [
"model.compile(optimizer=optimizer, loss=loss_fn, metrics=[\"accuracy\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "54D9h9F42Tuq"
},
"source": [
"## Set up callbacks"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UGSXyb-c2Tus"
},
"outputs": [],
"source": [
"train_callbacks = [\n",
" keras.callbacks.EarlyStopping(\n",
" monitor=\"val_accuracy\", patience=2, restore_best_weights=True\n",
" )\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7dMcs7yZ2Tus"
},
"source": [
"## Train the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9B4wTD_u2Tus"
},
"outputs": [],
"source": [
"history = model.fit(\n",
" pipeline_train,\n",
" batch_size=BATCH_SIZE,\n",
" epochs=int(SCHEDULE_LENGTH / STEPS_PER_EPOCH),\n",
" steps_per_epoch=STEPS_PER_EPOCH,\n",
" validation_data=pipeline_validation,\n",
" callbacks=train_callbacks,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9x9dUuIE2Tut"
},
"source": [
"## Plot the training and validation metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5s8YQTw52Tut"
},
"outputs": [],
"source": [
"\n",
"def plot_hist(hist):\n",
" plt.plot(hist.history[\"accuracy\"])\n",
" plt.plot(hist.history[\"val_accuracy\"])\n",
" plt.plot(hist.history[\"loss\"])\n",
" plt.plot(hist.history[\"val_loss\"])\n",
" plt.title(\"Training Progress\")\n",
" plt.ylabel(\"Accuracy/Loss\")\n",
" plt.xlabel(\"Epochs\")\n",
" plt.legend([\"train_acc\", \"val_acc\", \"train_loss\", \"val_loss\"], loc=\"upper left\")\n",
" plt.show()\n",
"\n",
"\n",
"plot_hist(history)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v5M_9ejY2Tut"
},
"source": [
"## Evaluate the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yhQdkrrY2Tuu"
},
"outputs": [],
"source": [
"accuracy = model.evaluate(pipeline_validation)[1] * 100\n",
"print(\"Accuracy: {:.2f}%\".format(accuracy))"
]
},
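{
"cell_type": "markdown",
"metadata": {},
"source": [
"To turn the model's logits into flower names, take an argmax over the class axis. Below\n",
"is a minimal sketch with a hypothetical logits batch standing in for `model.predict`;\n",
"the `tf_flowers` label order shown is an assumption worth verifying via the `tfds`\n",
"dataset metadata:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Assumed tf_flowers label order; verify with tfds.builder(\"tf_flowers\").info\n",
"class_names = [\"dandelion\", \"daisy\", \"tulips\", \"sunflowers\", \"roses\"]\n",
"\n",
"# Hypothetical logits for a batch of 2 images over the 5 classes.\n",
"logits = np.array([[2.0, 0.1, -1.0, 0.3, 0.0], [0.0, 0.2, 0.1, 3.0, -0.5]])\n",
"predictions = logits.argmax(axis=-1)\n",
"print([class_names[i] for i in predictions])  # ['dandelion', 'sunflowers']"
]
},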
{
"cell_type": "markdown",
"metadata": {
"id": "j7ZRc0sD2Tuu"
},
"source": [
"## Conclusion\n",
"\n",
"BiT performs well across a surprisingly wide range of data regimes\n",
"-- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on\n",
"ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark\n",
"(VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class,\n",
"and 97.0% on CIFAR-10 with 10 examples per class.\n",
"\n",
"![](https://i.imgur.com/b1Lw5fz.png)\n",
"\n",
"You can experiment further with the BigTransfer Method by following the\n",
"[original paper](https://arxiv.org/abs/1912.11370).\n",
"\n",
"\n",
"**Example available on HuggingFace**\n",
"| Trained Model | Demo |\n",
"| :--: | :--: |\n",
"| [![Generic badge](https://img.shields.io/badge/%F0%9F%A4%97%20Model-bit-black.svg)](https://huggingface.co/keras-io/bit) | [![Generic badge](https://img.shields.io/badge/%F0%9F%A4%97%20Spaces-bit-black.svg)](https://huggingface.co/spaces/keras-io/siamese-contrastive) |"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"name": "bit",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}