{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "wJcYs_ERTnnI" }, "source": [ "##### Copyright 2021 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "HMUDt0CiUJk9" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "77z2OchJTk0l" }, "source": [ "# Migrate from TPU embedding_columns to TPUEmbedding layer\n", "\n", "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " View on TensorFlow.org\n", " \n", " \n", " \n", " Run in Google Colab\n", " \n", " \n", " \n", " View source on GitHub\n", " \n", " Download notebook\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "meUTrR4I6m1C" }, "source": [ "This guide demonstrates how to migrate embedding training on on [TPUs](../../guide/tpu.ipynb) from TensorFlow 1's `embedding_column` API with `TPUEstimator` to TensorFlow 2's `TPUEmbedding` layer API with `TPUStrategy`.\n", "\n", "Embeddings are (large) matrices. They are lookup tables that map from a sparse feature space to dense vectors. Embeddings provide efficient and dense representations, capturing complex similarities and relationships between features.\n", "\n", "TensorFlow includes specialized support for training embeddings on TPUs. This TPU-specific embedding support allows you to train embeddings that are larger than the memory of a single TPU device, and to use sparse and ragged inputs on TPUs.\n", "\n", "- In TensorFlow 1, `tf.compat.v1.estimator.tpu.TPUEstimator` is a high level API that encapsulates training, evaluation, prediction, and exporting for serving with TPUs. It has special support for `tf.compat.v1.tpu.experimental.embedding_column`.\n", "- To implement this in TensorFlow 2, use the TensorFlow Recommenders' `tfrs.layers.embedding.TPUEmbedding` layer. For training and evaluation, use a TPU distribution strategy—`tf.distribute.TPUStrategy`—which is compatible with the Keras APIs for, for example, model building (`tf.keras.Model`), optimizers (`tf.keras.optimizers.Optimizer`), and training with `Model.fit` or a custom training loop with `tf.function` and `tf.GradientTape`.\n", "\n", "For additional information, refer to the `tfrs.layers.embedding.TPUEmbedding` layer's API documentation, as well as the `tf.tpu.experimental.embedding.TableConfig` and `tf.tpu.experimental.embedding.FeatureConfig` docs for additional information. For an overview of `tf.distribute.TPUStrategy`, check out the [Distributed training](../../guide/distributed_training.ipynb) guide and the [Use TPUs](../../guide/tpu.ipynb) guide. If you're migrating from `TPUEstimator` to `TPUStrategy`, check out [the TPU migration guide](tpu_estimator.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "id": "YdZSoIXEbhg-" }, "source": [ "## Setup\n", "\n", "Start by installing [TensorFlow Recommenders](https://www.tensorflow.org/recommenders) and importing some necessary packages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tYE3RnRN2jNu" }, "outputs": [], "source": [ "!pip install tensorflow-recommenders" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iE0vSfMXumKI" }, "outputs": [], "source": [ "import tensorflow as tf\n", "import tensorflow.compat.v1 as tf1\n", "\n", "# TPUEmbedding layer is not part of TensorFlow.\n", "import tensorflow_recommenders as tfrs" ] }, { "cell_type": "markdown", "metadata": { "id": "Jsm9Rxx7s1OZ" }, "source": [ "And prepare a simple dataset for demonstration purposes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "m7rnGxsXtDkV" }, "outputs": [], "source": [ "features = [[1., 1.5]]\n", "embedding_features_indices = [[0, 0], [0, 1]]\n", "embedding_features_values = [0, 5]\n", "labels = [[0.3]]\n", "eval_features = [[4., 4.5]]\n", "eval_embedding_features_indices = [[0, 0], [0, 1]]\n", "eval_embedding_features_values = [4, 3]\n", "eval_labels = [[0.8]]" ] }, { "cell_type": "markdown", "metadata": { "id": "4uXff1BEssdE" }, "source": [ "## TensorFlow 1: Train embeddings on TPUs with TPUEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "pc-WSeYG2oje" }, "source": [ "In TensorFlow 1, you set up TPU embeddings using the `tf.compat.v1.tpu.experimental.embedding_column` API and train/evaluate the model on TPUs with `tf.compat.v1.estimator.tpu.TPUEstimator`.\n", "\n", "The inputs are integers ranging from zero to the vocabulary size for the TPU embedding table. Begin with encoding the inputs to categorical ID with `tf.feature_column.categorical_column_with_identity`. Use `\"sparse_feature\"` for the `key` parameter, since the input features are integer-valued, while `num_buckets` is the vocabulary size for the embedding table (`10`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sO_y-IRT3dcM" }, "outputs": [], "source": [ "embedding_id_column = (\n", " tf1.feature_column.categorical_column_with_identity(\n", " key=\"sparse_feature\", num_buckets=10))" ] }, { "cell_type": "markdown", "metadata": { "id": "57e2dec8ed4a" }, "source": [ "Next, convert the sparse categorical inputs to a dense representation with `tpu.experimental.embedding_column`, where `dimension` is the width of the embedding table. It will store an embedding vector for each of the `num_buckets`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6d61c855011f" }, "outputs": [], "source": [ "embedding_column = tf1.tpu.experimental.embedding_column(\n", " embedding_id_column, dimension=5)" ] }, { "cell_type": "markdown", "metadata": { "id": "c6061452ee5a" }, "source": [ "Now, define the TPU-specific embedding configuration via `tf.estimator.tpu.experimental.EmbeddingConfigSpec`. You will pass it later to `tf.estimator.tpu.TPUEstimator` as an `embedding_config_spec` parameter." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6abbf967fc82" }, "outputs": [], "source": [ "embedding_config_spec = tf1.estimator.tpu.experimental.EmbeddingConfigSpec(\n", " feature_columns=(embedding_column,),\n", " optimization_parameters=(\n", " tf1.tpu.experimental.AdagradParameters(0.05)))" ] }, { "cell_type": "markdown", "metadata": { "id": "BVWHEQj5a7rN" }, "source": [ "Next, to use a `TPUEstimator`, define: \n", "- An input function for the training data\n", "- An evaluation input function for the evaluation data\n", "- A model function for instructing the `TPUEstimator` how the training op is defined with the features and labels" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lqe9obf7suIj" }, "outputs": [], "source": [ "def _input_fn(params):\n", " dataset = tf1.data.Dataset.from_tensor_slices((\n", " {\"dense_feature\": features,\n", " \"sparse_feature\": tf1.SparseTensor(\n", " embedding_features_indices,\n", " embedding_features_values, [1, 2])},\n", " labels))\n", " dataset = dataset.repeat()\n", " return dataset.batch(params['batch_size'], drop_remainder=True)\n", "\n", "def _eval_input_fn(params):\n", " dataset = tf1.data.Dataset.from_tensor_slices((\n", " {\"dense_feature\": eval_features,\n", " \"sparse_feature\": tf1.SparseTensor(\n", " eval_embedding_features_indices,\n", " eval_embedding_features_values, [1, 2])},\n", " eval_labels))\n", " dataset = dataset.repeat()\n", " return dataset.batch(params['batch_size'], drop_remainder=True)\n", "\n", "def _model_fn(features, labels, mode, params):\n", " embedding_features = tf1.keras.layers.DenseFeatures(embedding_column)(features)\n", " concatenated_features = tf1.keras.layers.Concatenate(axis=1)(\n", " [embedding_features, features[\"dense_feature\"]])\n", " logits = tf1.layers.Dense(1)(concatenated_features)\n", " loss = tf1.losses.mean_squared_error(labels=labels, predictions=logits)\n", " optimizer = tf1.train.AdagradOptimizer(0.05)\n", " optimizer = tf1.tpu.CrossShardOptimizer(optimizer)\n", " train_op = optimizer.minimize(loss, global_step=tf1.train.get_global_step())\n", " return tf1.estimator.tpu.TPUEstimatorSpec(mode, loss=loss, train_op=train_op)" ] }, { "cell_type": "markdown", "metadata": { "id": "QYnP3Dszc-2R" }, "source": [ "With those functions defined, create a `tf.distribute.cluster_resolver.TPUClusterResolver` that provides the cluster information, and a `tf.compat.v1.estimator.tpu.RunConfig` object.\n", "\n", "Along with the model function you have defined, you can now create a `TPUEstimator`. Here, you will simplify the flow by skipping checkpoint savings. Then, you will specify the batch size for both training and evaluation for the `TPUEstimator`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WAqyqawemlcl" }, "outputs": [], "source": [ "cluster_resolver = tf1.distribute.cluster_resolver.TPUClusterResolver(tpu='')\n", "print(\"All devices: \", tf1.config.list_logical_devices('TPU'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HsOpjW5plH9Q" }, "outputs": [], "source": [ "tpu_config = tf1.estimator.tpu.TPUConfig(\n", " iterations_per_loop=10,\n", " per_host_input_for_training=tf1.estimator.tpu.InputPipelineConfig\n", " .PER_HOST_V2)\n", "config = tf1.estimator.tpu.RunConfig(\n", " cluster=cluster_resolver,\n", " save_checkpoints_steps=None,\n", " tpu_config=tpu_config)\n", "estimator = tf1.estimator.tpu.TPUEstimator(\n", " model_fn=_model_fn, config=config, train_batch_size=8, eval_batch_size=8,\n", " embedding_config_spec=embedding_config_spec)" ] }, { "cell_type": "markdown", "metadata": { "id": "Uxw7tWrcepaZ" }, "source": [ "Call `TPUEstimator.train` to begin training the model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WZPKFOMAcyrP" }, "outputs": [], "source": [ "estimator.train(_input_fn, steps=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "ev1vjIz9euIw" }, "source": [ "Then, call `TPUEstimator.evaluate` to evaluate the model using the evaluation data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "bqiKRiwWc0cz" }, "outputs": [], "source": [ "estimator.evaluate(_eval_input_fn, steps=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "KEmzBjfnsxwT" }, "source": [ "## TensorFlow 2: Train embeddings on TPUs with TPUStrategy" ] }, { "cell_type": "markdown", "metadata": { "id": "UesuXNbShrbi" }, "source": [ "In TensorFlow 2, to train on the TPU workers, use `tf.distribute.TPUStrategy` together with the Keras APIs for model definition and training/evaluation. (Refer to the [Use TPUs](https://render.githubusercontent.com/guide/tpu.ipynb) guide for more examples of training with Keras Model.fit and a custom training loop (with `tf.function` and `tf.GradientTape`).)\n", "\n", "Since you need to perform some initialization work to connect to the remote cluster and initialize the TPU workers, start by creating a `TPUClusterResolver` to provide the cluster information and connect to the cluster. (Learn more in the *TPU initialization* section of the [Use TPUs](../../guide/tpu.ipynb) guide.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_TgdPNgXoS63" }, "outputs": [], "source": [ "cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')\n", "tf.config.experimental_connect_to_cluster(cluster_resolver)\n", "tf.tpu.experimental.initialize_tpu_system(cluster_resolver)\n", "print(\"All devices: \", tf.config.list_logical_devices('TPU'))" ] }, { "cell_type": "markdown", "metadata": { "id": "94JBD0HxmdPI" }, "source": [ "Next, prepare your data. This is similar to how you created a dataset in the TensorFlow 1 example, except the dataset function is now passed a `tf.distribute.InputContext` object rather than a `params` dict. You can use this object to determine the local batch size (and which host this pipeline is for, so you can properly partition your data).\n", "\n", "- When using the `tfrs.layers.embedding.TPUEmbedding` API, it is important to include the `drop_remainder=True` option when batching the dataset with `Dataset.batch`, since `TPUEmbedding` requires a fixed batch size.\n", "- Additionally, the same batch size must be used for evaluation and training if they are taking place on the same set of devices.\n", "- Finally, you should use `tf.keras.utils.experimental.DatasetCreator` along with the special input option—`experimental_fetch_to_device=False`—in `tf.distribute.InputOptions` (which holds strategy-specific configurations). This is demonstrated below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9NTruOw6mcy9" }, "outputs": [], "source": [ "global_batch_size = 8\n", "\n", "def _input_dataset(context: tf.distribute.InputContext):\n", " dataset = tf.data.Dataset.from_tensor_slices((\n", " {\"dense_feature\": features,\n", " \"sparse_feature\": tf.SparseTensor(\n", " embedding_features_indices,\n", " embedding_features_values, [1, 2])},\n", " labels))\n", " dataset = dataset.shuffle(10).repeat()\n", " dataset = dataset.batch(\n", " context.get_per_replica_batch_size(global_batch_size),\n", " drop_remainder=True)\n", " return dataset.prefetch(2)\n", "\n", "def _eval_dataset(context: tf.distribute.InputContext):\n", " dataset = tf.data.Dataset.from_tensor_slices((\n", " {\"dense_feature\": eval_features,\n", " \"sparse_feature\": tf.SparseTensor(\n", " eval_embedding_features_indices,\n", " eval_embedding_features_values, [1, 2])},\n", " eval_labels))\n", " dataset = dataset.repeat()\n", " dataset = dataset.batch(\n", " context.get_per_replica_batch_size(global_batch_size),\n", " drop_remainder=True)\n", " return dataset.prefetch(2)\n", "\n", "input_options = tf.distribute.InputOptions(\n", " experimental_fetch_to_device=False)\n", "\n", "input_dataset = tf.keras.utils.experimental.DatasetCreator(\n", " _input_dataset, input_options=input_options)\n", "\n", "eval_dataset = tf.keras.utils.experimental.DatasetCreator(\n", " _eval_dataset, input_options=input_options)" ] }, { "cell_type": "markdown", "metadata": { "id": "R4EHXhN3CVmo" }, "source": [ "Next, once the data is prepared, you will create a `TPUStrategy`, and define a model, metrics, and an optimizer under the scope of this strategy (`Strategy.scope`).\n", "\n", "You should pick a number for `steps_per_execution` in `Model.compile` since it specifies the number of batches to run during each `tf.function` call, and is critical for performance. This argument is similar to `iterations_per_loop` used in `TPUEstimator`.\n", "\n", "The features and table configuration that were specified in TensorFlow 1 via the `tf.tpu.experimental.embedding_column` (and `tf.tpu.experimental.shared_embedding_column`) can be specified directly in TensorFlow 2 via a pair of configuration objects:\n", "- `tf.tpu.experimental.embedding.FeatureConfig`\n", "- `tf.tpu.experimental.embedding.TableConfig`\n", "\n", "(Refer to the associated API documentation for more details.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "atVciNgPs0fw" }, "outputs": [], "source": [ "strategy = tf.distribute.TPUStrategy(cluster_resolver)\n", "with strategy.scope():\n", " if hasattr(tf.keras.optimizers, \"legacy\"):\n", " optimizer = tf.keras.optimizers.legacy.Adagrad(learning_rate=0.05)\n", " else:\n", " optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.05)\n", " dense_input = tf.keras.Input(shape=(2,), dtype=tf.float32, batch_size=global_batch_size)\n", " sparse_input = tf.keras.Input(shape=(), dtype=tf.int32, batch_size=global_batch_size)\n", " embedded_input = tfrs.layers.embedding.TPUEmbedding(\n", " feature_config=tf.tpu.experimental.embedding.FeatureConfig(\n", " table=tf.tpu.experimental.embedding.TableConfig(\n", " vocabulary_size=10,\n", " dim=5,\n", " initializer=tf.initializers.TruncatedNormal(mean=0.0, stddev=1)),\n", " name=\"sparse_input\"),\n", " optimizer=optimizer)(sparse_input)\n", " input = tf.keras.layers.Concatenate(axis=1)([dense_input, embedded_input])\n", " result = tf.keras.layers.Dense(1)(input)\n", " model = tf.keras.Model(inputs={\"dense_feature\": dense_input, \"sparse_feature\": sparse_input}, outputs=result)\n", " model.compile(optimizer, \"mse\", steps_per_execution=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "FkM2VZyni98F" }, "source": [ "With that, you are ready to train the model with the training dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Kip65sYBlKiu" }, "outputs": [], "source": [ "model.fit(input_dataset, epochs=5, steps_per_epoch=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "r0AEK8sNjLOj" }, "source": [ "Finally, evaluate the model using the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6tMRkyfKhqSL" }, "outputs": [], "source": [ "model.evaluate(eval_dataset, steps=1, return_dict=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "a97b888c1911" }, "source": [ "## Next steps" ] }, { "cell_type": "markdown", "metadata": { "id": "gHx_RUL8xcJ3" }, "source": [ "Learn more about setting up TPU-specific embeddings in the API docs:\n", "\n", "- `tfrs.layers.embedding.TPUEmbedding`: particularly about feature and table configuration, setting the optimizer, creating a model (using the Keras [functional](https://www.tensorflow.org/guide/keras/functional) API or via [subclassing](../..guide/keras/custom_layers_and_models.ipynb) `tf.keras.Model`), training/evaluation, and model serving with `tf.saved_model`\n", "- `tf.tpu.experimental.embedding.TableConfig`\n", "- `tf.tpu.experimental.embedding.FeatureConfig`\n", "\n", "For more information about `TPUStrategy` in TensorFlow 2, consider the following resources:\n", "\n", "- Guide: [Use TPUs](../../guide/tpu.ipynb) (covering training with Keras `Model.fit`/a custom training loop with `tf.distribute.TPUStrategy`, as well as tips on improving the performance with `tf.function`)\n", "- Guide: [Distributed training with TensorFlow](../../guide/distributed_training.ipynb)\n", "- Guide: [Migrate from TPUEstimator to TPUStrategy](tpu_estimator.ipynb).\n", "\n", "To learn more about customizing your training, refer to:\n", "\n", "- Guide: [Customize what happens in Model.fit](../..guide/keras/customizing_what_happens_in_fit.ipynb)\n", "- Guide: [Writing a training loop from scratch](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch)\n", "\n", "TPUs—Google's specialized ASICs for machine learning—are available through [Google Colab](https://colab.research.google.com/), the [TPU Research Cloud](https://sites.research.google/trc/), and [Cloud TPU](https://cloud.google.com/tpu)." ] } ], "metadata": { "accelerator": "TPU", "colab": { "collapsed_sections": [], "name": "tpu_embedding.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }