{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "perceiver_example.ipynb", "provenance": [], "authorship_tag": "ABX9TyOqmZf5kbA3yzzIhxWVRZ9Q", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "8zkCmgOstzIn" }, "source": [ "# Perceiver Example\n", "\n", "This Python package implements [Perceiver: General Perception with Iterative Attention](https://arxiv.org/abs/2103.03206) by Andrew Jaegle et al. in TensorFlow. The model builds on \n", "Transformers so that the data enters only through the cross-attention mechanism (see figure), which allows it to scale to hundreds of thousands of inputs, like ConvNets. This \n", "also partly addresses the Transformer's quadratic compute and memory bottleneck.\n", "\n", "![](https://github.com/Rishit-dagli/Perceiver/raw/main/images/architecture.PNG)\n", "\n", "## A bit about Perceiver\n", "\n", "The Perceiver model aims to handle arbitrary configurations of different modalities using a single Transformer-based architecture. Transformers are flexible and make few assumptions about their inputs, but they scale quadratically with the number of inputs in both memory and computation. The Perceiver proposes a mechanism that makes it possible to handle high-dimensional inputs while retaining the expressivity and flexibility to deal with arbitrary input configurations.\n", "\n", "The idea is to introduce a small set of latent units that forms an attention bottleneck through which the inputs must pass. This avoids the quadratic scaling of the all-to-all attention in a classical Transformer. 
The model can be seen as performing a fully end-to-end clustering of the inputs, with the latent units as the cluster centres, leveraging a highly asymmetric cross-attention layer. To preserve spatial information, the authors compensate for the model's lack of explicit grid structure by associating Fourier feature position encodings with the inputs.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "RDSIYllpuR1N" }, "source": [ "## Setup" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qVG7fStTtUy6", "outputId": "84157780-7f23-4e7f-b080-64c59e14c714" }, "source": [ "!pip install perceiver" ], "execution_count": 1, "outputs": [ { "output_type": "stream", "text": [ "Collecting perceiver\n", " Downloading https://files.pythonhosted.org/packages/33/a9/a59f7928263242cf8d1272b0087c73cd64d0999b5872ccb325788a477027/perceiver-0.1.0-py3-none-any.whl\n", "Requirement already satisfied: tensorflow~=2.4.0 in /usr/local/lib/python3.7/dist-packages (from perceiver) (2.4.1)\n", "Collecting einops>=0.3\n", " Downloading https://files.pythonhosted.org/packages/5d/a0/9935e030634bf60ecd572c775f64ace82ceddf2f504a5fd3902438f07090/einops-0.3.0-py2.py3-none-any.whl\n", "Requirement already satisfied: grpcio~=1.32.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (1.32.0)\n", "Requirement already satisfied: wheel~=0.35 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (0.36.2)\n", "Requirement already satisfied: h5py~=2.10.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (2.10.0)\n", "Requirement already satisfied: numpy~=1.19.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (1.19.5)\n", "Requirement already satisfied: astunparse~=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (1.6.3)\n", "Requirement already satisfied: protobuf>=3.9.2 in /usr/local/lib/python3.7/dist-packages (from 
tensorflow~=2.4.0->perceiver) (3.12.4)\n", "Requirement already satisfied: termcolor~=1.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (1.1.0)\n", "Requirement already satisfied: tensorflow-estimator<2.5.0,>=2.4.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (2.4.0)\n", "Requirement already satisfied: keras-preprocessing~=1.1.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (1.1.2)\n", "Requirement already satisfied: typing-extensions~=3.7.4 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (3.7.4.3)\n", "Requirement already satisfied: opt-einsum~=3.3.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (3.3.0)\n", "Requirement already satisfied: wrapt~=1.12.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (1.12.1)\n", "Requirement already satisfied: gast==0.3.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (0.3.3)\n", "Requirement already satisfied: flatbuffers~=1.12.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (1.12)\n", "Requirement already satisfied: six~=1.15.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (1.15.0)\n", "Requirement already satisfied: absl-py~=0.10 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (0.12.0)\n", "Requirement already satisfied: google-pasta~=0.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (0.2.0)\n", "Requirement already satisfied: tensorboard~=2.4 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.4.0->perceiver) (2.4.1)\n", "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from protobuf>=3.9.2->tensorflow~=2.4.0->perceiver) (54.2.0)\n", "Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.7/dist-packages (from 
tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (2.23.0)\n", "Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (1.8.0)\n", "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (3.3.4)\n", "Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (1.0.1)\n", "Requirement already satisfied: google-auth<2,>=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (1.28.0)\n", "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (0.4.3)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (1.24.3)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (2020.12.5)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (3.0.4)\n", "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.21.0->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (2.10)\n", "Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from markdown>=2.6.8->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (3.8.1)\n", "Requirement already satisfied: rsa<5,>=3.1.4; python_version >= \"3.6\" in /usr/local/lib/python3.7/dist-packages (from 
google-auth<2,>=1.6.3->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (4.7.2)\n", "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.3->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (4.2.1)\n", "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.3->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (0.2.8)\n", "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (1.3.0)\n", "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->markdown>=2.6.8->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (3.4.1)\n", "Requirement already satisfied: pyasn1>=0.1.3 in /usr/local/lib/python3.7/dist-packages (from rsa<5,>=3.1.4; python_version >= \"3.6\"->google-auth<2,>=1.6.3->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (0.4.8)\n", "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.4->tensorflow~=2.4.0->perceiver) (3.1.0)\n", "Installing collected packages: einops, perceiver\n", "Successfully installed einops-0.3.0 perceiver-0.1.0\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "U0OB1TDEuXI1" }, "source": [ "import tensorflow as tf" ], "execution_count": 2, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "_Tm3LK7WuZ6t" }, "source": [ "## Create a Perceiver Class" ] }, { "cell_type": "code", "metadata": { "id": "L5anRaSTtyN-" }, "source": [ "from perceiver import Perceiver" ], "execution_count": 3, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "5DHVDLo8tn3a" }, "source": [ "model = Perceiver(\n", " input_channels = 3, # number of channels for each token of 
the input\n", " input_axis = 2, # number of input axes (2 for images, 3 for video)\n", " num_freq_bands = 6, # number of Fourier frequency bands K; the position encoding uses (2 * K + 1) values per input axis\n", " max_freq = 10., # maximum frequency; a hyperparameter that depends on how fine-grained the data is\n", " depth = 6, # depth of the network\n", " num_latents = 256, # number of latent vectors\n", " latent_dim = 512, # dimension of each latent vector\n", " cross_heads = 1, # number of heads for cross-attention (the paper uses 1)\n", " latent_heads = 8, # number of heads for latent self-attention\n", " cross_dim_head = 64, # dimension of each cross-attention head\n", " latent_dim_head = 64, # dimension of each latent self-attention head\n", " num_classes = 1000, # number of output classes\n", " attn_dropout = 0., # dropout rate in the attention layers\n", " ff_dropout = 0., # dropout rate in the feed-forward layers\n", ")" ], "execution_count": 4, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Y1Ho6X1dunzE", "outputId": "4ae2ba11-1d34-4cfa-c2d9-7e634b862e15" }, "source": [ "img = tf.random.normal([1, 224, 224, 3]) # random tensor with the shape of one ImageNet image\n", "model(img) # output shape: (1, 1000)" ], "execution_count": 5, "outputs": [ { "output_type": "stream", "text": [ "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_3_input'), name='dense_3_input', description=\"created by layer 'dense_3_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_8_input'), name='dense_8_input', description=\"created by layer 'dense_8_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_13_input'), name='dense_13_input', description=\"created by layer 'dense_13_input'\"), but it was called on an input with 
incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_18_input'), name='dense_18_input', description=\"created by layer 'dense_18_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_23_input'), name='dense_23_input', description=\"created by layer 'dense_23_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_28_input'), name='dense_28_input', description=\"created by layer 'dense_28_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_33_input'), name='dense_33_input', description=\"created by layer 'dense_33_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_38_input'), name='dense_38_input', description=\"created by layer 'dense_38_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_43_input'), name='dense_43_input', description=\"created by layer 'dense_43_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 
512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_48_input'), name='dense_48_input', description=\"created by layer 'dense_48_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_53_input'), name='dense_53_input', description=\"created by layer 'dense_53_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_58_input'), name='dense_58_input', description=\"created by layer 'dense_58_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 5 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "RZH9iWbRuxnF", "outputId": "48ac2d3e-b689-465b-af31-9fb729fa8478" }, "source": [ "model(img).shape" ], "execution_count": 6, "outputs": [ { "output_type": "stream", "text": [ "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_3_input'), name='dense_3_input', description=\"created by layer 'dense_3_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_8_input'), name='dense_8_input', description=\"created by layer 'dense_8_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 
512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_13_input'), name='dense_13_input', description=\"created by layer 'dense_13_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_18_input'), name='dense_18_input', description=\"created by layer 'dense_18_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_23_input'), name='dense_23_input', description=\"created by layer 'dense_23_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_28_input'), name='dense_28_input', description=\"created by layer 'dense_28_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_33_input'), name='dense_33_input', description=\"created by layer 'dense_33_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_38_input'), name='dense_38_input', description=\"created by layer 'dense_38_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, 
name='dense_43_input'), name='dense_43_input', description=\"created by layer 'dense_43_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_48_input'), name='dense_48_input', description=\"created by layer 'dense_48_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_53_input'), name='dense_53_input', description=\"created by layer 'dense_53_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n", "WARNING:tensorflow:Model was constructed with shape (None, 512) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512), dtype=tf.float32, name='dense_58_input'), name='dense_58_input', description=\"created by layer 'dense_58_input'\"), but it was called on an input with incompatible shape (1, 256, 512).\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/plain": [ "TensorShape([1, 1000])" ] }, "metadata": { "tags": [] }, "execution_count": 6 } ] } ] }