{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "wJcYs_ERTnnI" }, "source": [ "##### Copyright 2021 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "HMUDt0CiUJk9" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "77z2OchJTk0l" }, "source": [ "# Migration examples: Canned Estimators\n", "\n", "\n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "meUTrR4I6m1C" }, "source": [ "Canned (or Premade) Estimators have traditionally been used in TensorFlow 1 as quick and easy ways to train models for a variety of typical use cases. TensorFlow 2 provides straightforward approximate substitutes for a number of them by way of Keras models. For those canned estimators that do not have built-in TensorFlow 2 substitutes, you can still build your own replacement fairly easily.\n", "\n", "This guide will walk you through a few examples of direct equivalents and custom substitutions to demonstrate how TensorFlow 1's `tf.estimator`-derived models can be migrated to TensorFlow 2 with Keras.\n", "\n", "Namely, this guide includes examples for migrating:\n", "* From `tf.estimator`'s `LinearEstimator`, `Classifier` or `Regressor` in TensorFlow 1 to Keras `tf.compat.v1.keras.models.LinearModel` in TensorFlow 2\n", "* From `tf.estimator`'s `DNNEstimator`, `Classifier` or `Regressor` in TensorFlow 1 to a custom Keras DNN model in TensorFlow 2\n", "* From `tf.estimator`'s `DNNLinearCombinedEstimator`, `Classifier` or `Regressor` in TensorFlow 1 to `tf.compat.v1.keras.models.WideDeepModel` in TensorFlow 2\n", "* From `tf.estimator`'s `BoostedTreesEstimator`, `Classifier` or `Regressor` in TensorFlow 1 to `tfdf.keras.GradientBoostedTreesModel` in TensorFlow 2\n", "\n", "A common precursor to the training of a model is feature preprocessing, which is done for TensorFlow 1 Estimator models with `tf.feature_column`. For more information on feature preprocessing in TensorFlow 2, see [this guide on migrating from feature columns to the Keras preprocessing layers API](migrating_feature_columns.ipynb)." 
] }, { "cell_type": "markdown", "metadata": { "id": "YdZSoIXEbhg-" }, "source": [ "## Setup\n", "\n", "Start with a couple of necessary TensorFlow imports," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qsgZp0f-nu9s" }, "outputs": [], "source": [ "!pip install tensorflow_decision_forests" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iE0vSfMXumKI" }, "outputs": [], "source": [ "import pandas as pd\n", "import tensorflow as tf\n", "import tensorflow.compat.v1 as tf1\n", "import tensorflow_decision_forests as tfdf\n", "from tensorflow import keras\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Jsm9Rxx7s1OZ" }, "source": [ "prepare some simple data for demonstration from the standard Titanic dataset," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wC6i_bEZPrPY" }, "outputs": [], "source": [ "x_train = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')\n", "x_eval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')\n", "x_train['sex'].replace(('male', 'female'), (0, 1), inplace=True)\n", "x_eval['sex'].replace(('male', 'female'), (0, 1), inplace=True)\n", "\n", "x_train['alone'].replace(('n', 'y'), (0, 1), inplace=True)\n", "x_eval['alone'].replace(('n', 'y'), (0, 1), inplace=True)\n", "\n", "x_train['class'].replace(('First', 'Second', 'Third'), (1, 2, 3), inplace=True)\n", "x_eval['class'].replace(('First', 'Second', 'Third'), (1, 2, 3), inplace=True)\n", "\n", "x_train.drop(['embark_town', 'deck'], axis=1, inplace=True)\n", "x_eval.drop(['embark_town', 'deck'], axis=1, inplace=True)\n", "\n", "y_train = x_train.pop('survived')\n", "y_eval = x_eval.pop('survived')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lqe9obf7suIj" }, "outputs": [], "source": [ "# Data setup for TensorFlow 1 with `tf.estimator`\n", "def _input_fn():\n", " return tf1.data.Dataset.from_tensor_slices((dict(x_train), 
y_train)).batch(32)\n", "\n", "\n", "def _eval_input_fn():\n", " return tf1.data.Dataset.from_tensor_slices((dict(x_eval), y_eval)).batch(32)\n", "\n", "\n", "FEATURE_NAMES = [\n", " 'age', 'fare', 'sex', 'n_siblings_spouses', 'parch', 'class', 'alone'\n", "]\n", "\n", "feature_columns = []\n", "for fn in FEATURE_NAMES:\n", " feat_col = tf1.feature_column.numeric_column(fn, dtype=tf.float32)\n", " feature_columns.append(feat_col)" ] }, { "cell_type": "markdown", "metadata": { "id": "bYSgoezeMrpI" }, "source": [ "and create a method to instantiate a simplistic sample optimizer to use with various TensorFlow 1 Estimator and TensorFlow 2 Keras models." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YHB_nuzVLVLe" }, "outputs": [], "source": [ "def create_sample_optimizer(tf_version):\n", " if tf_version == 'tf1':\n", " optimizer = lambda: tf.keras.optimizers.legacy.Ftrl(\n", " l1_regularization_strength=0.001,\n", " learning_rate=tf1.train.exponential_decay(\n", " learning_rate=0.1,\n", " global_step=tf1.train.get_global_step(),\n", " decay_steps=10000,\n", " decay_rate=0.9))\n", " elif tf_version == 'tf2':\n", " optimizer = tf.keras.optimizers.legacy.Ftrl(\n", " l1_regularization_strength=0.001,\n", " learning_rate=tf.keras.optimizers.schedules.ExponentialDecay(\n", " initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.9))\n", " return optimizer" ] }, { "cell_type": "markdown", "metadata": { "id": "4uXff1BEssdE" }, "source": [ "## Example 1: Migrating from LinearEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "_O7fyhCnpvED" }, "source": [ "### TensorFlow 1: Using LinearEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "A9560BqEOTpb" }, "source": [ "In TensorFlow 1, you can use `tf.estimator.LinearEstimator` to create a baseline linear model for regression and classification problems." 
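] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both branches of `create_sample_optimizer` above decay the learning rate with the same continuous exponential rule, `initial_rate * decay_rate ** (step / decay_steps)`. The plain-Python sketch below reproduces that arithmetic for the values used in this guide. `exponential_decay` is a hypothetical helper written for illustration, not a TensorFlow API, and it assumes the default non-staircase behaviour.

```python
def exponential_decay(initial_learning_rate, decay_rate, decay_steps, step):
    # Continuous (non-staircase) exponential decay of the learning rate.
    return initial_learning_rate * decay_rate ** (step / decay_steps)


# Values taken from the sample optimizer in this guide.
for step in (0, 100, 10000, 20000):
    lr = exponential_decay(0.1, 0.9, 10000, step)
    print(f'step {step:>5}: learning rate = {lr:.5f}')
```

Since each Estimator in this guide trains for only 100 steps while `decay_steps` is 10000, the rate stays within roughly 0.1% of its initial value of 0.1 during those runs."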
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "oWfh0QW4IXTn" }, "outputs": [], "source": [ "linear_estimator = tf.estimator.LinearEstimator(\n", " head=tf.estimator.BinaryClassHead(),\n", " feature_columns=feature_columns,\n", " optimizer=create_sample_optimizer('tf1'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hi77Sg4k-0TR" }, "outputs": [], "source": [ "linear_estimator.train(input_fn=_input_fn, steps=100)\n", "linear_estimator.evaluate(input_fn=_eval_input_fn, steps=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "KEmzBjfnsxwT" }, "source": [ "### TensorFlow 2: Using Keras LinearModel" ] }, { "cell_type": "markdown", "metadata": { "id": "fkgkGf_AOaRR" }, "source": [ "In TensorFlow 2, you can create an instance of the Keras `tf.compat.v1.keras.models.LinearModel`, which is the substitute for `tf.estimator.LinearEstimator`. The `tf.compat.v1.keras` path is used to signify that the pre-made model exists for compatibility." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Kip65sYBlKiu" }, "outputs": [], "source": [ "linear_model = tf.compat.v1.keras.experimental.LinearModel()\n", "linear_model.compile(loss='mse', optimizer=create_sample_optimizer('tf2'), metrics=['accuracy'])\n", "linear_model.fit(x_train, y_train, epochs=10)\n", "linear_model.evaluate(x_eval, y_eval, return_dict=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "RRrj78Lqplni" }, "source": [ "## Example 2: Migrating from DNNEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "YKl6XZ7Bp1t5" }, "source": [ "### TensorFlow 1: Using DNNEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "J7wJUmgypln8" }, "source": [ "In TensorFlow 1, you can use `tf.estimator.DNNEstimator` to create a baseline deep neural network (DNN) model for regression and classification problems." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qHbgXCzfpln9" }, "outputs": [], "source": [ "dnn_estimator = tf.estimator.DNNEstimator(\n", " head=tf.estimator.BinaryClassHead(),\n", " feature_columns=feature_columns,\n", " hidden_units=[128],\n", " activation_fn=tf.nn.relu,\n", " optimizer=create_sample_optimizer('tf1'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6DTnXxU2pln-" }, "outputs": [], "source": [ "dnn_estimator.train(input_fn=_input_fn, steps=100)\n", "dnn_estimator.evaluate(input_fn=_eval_input_fn, steps=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "6xJz6px6pln-" }, "source": [ "### TensorFlow 2: Using Keras to create a custom DNN model" ] }, { "cell_type": "markdown", "metadata": { "id": "7cgc72rzpln-" }, "source": [ "In TensorFlow 2, you can create a custom DNN model to substitute for one generated by `tf.estimator.DNNEstimator`, with similar levels of user-specified customization (for instance, as in the previous example, the ability to customize a chosen model optimizer).\n", "\n", "A similar workflow can be used to replace `tf.estimator.experimental.RNNEstimator` with a Keras recurrent neural network (RNN) model. Keras provides a number of built-in, customizable choices by way of `tf.keras.layers.RNN`, `tf.keras.layers.LSTM`, and `tf.keras.layers.GRU`. To learn more, check out the _Built-in RNN layers: a simple example_ section of [RNN with Keras guide](https://www.tensorflow.org/guide/keras/rnn)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "B5SdsjlL49RG" }, "outputs": [], "source": [ "dnn_model = tf.keras.models.Sequential(\n", " [tf.keras.layers.Dense(128, activation='relu'),\n", " tf.keras.layers.Dense(1)])\n", "\n", "dnn_model.compile(loss='mse', optimizer=create_sample_optimizer('tf2'), metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JQmRw9_Upln_" }, "outputs": [], "source": [ "dnn_model.fit(x_train, y_train, epochs=10)\n", "dnn_model.evaluate(x_eval, y_eval, return_dict=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "UeBHZ0cd1Pl2" }, "source": [ "## Example 3: Migrating from DNNLinearCombinedEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "GfRaObf5g4TU" }, "source": [ "### TensorFlow 1: Using DNNLinearCombinedEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "2r13RMX-g4TV" }, "source": [ "In TensorFlow 1, you can use `tf.estimator.DNNLinearCombinedEstimator` to create a baseline combined model for regression and classification problems with customization capacity for both its linear and DNN components." 
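] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A combined (wide and deep) model adds the output of a linear component to the output of a DNN component computed from the same features. The sketch below shows that sum in plain Python with made-up parameters. The function names are hypothetical, and the final sigmoid is an assumption that mirrors what a binary classification head applies to the summed logit.

```python
import math


def linear_part(features, weights, bias):
    # Wide component: a weighted sum of the raw features.
    return sum(w * x for w, x in zip(weights, features)) + bias


def dnn_part(features, hidden_weights, hidden_biases, output_weights, output_bias):
    # Deep component: one ReLU hidden layer followed by a single linear output unit.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(unit_weights, features)) + b)
        for unit_weights, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def wide_deep_probability(features, linear_params, dnn_params):
    # The two components are added, then a sigmoid maps the sum to (0, 1).
    logit = linear_part(features, *linear_params) + dnn_part(features, *dnn_params)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical parameters for two features and two hidden units.
linear_params = ([0.5, -0.25], 0.1)
dnn_params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.2, 0.3], -0.1)
print(wide_deep_probability([2.0, 4.0], linear_params, dnn_params))
```

The two components share nothing except the final sum, so each keeps its own parameters and can be paired with its own optimizer."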
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OyyDCqc5j7rf" }, "outputs": [], "source": [ "optimizer = create_sample_optimizer('tf1')\n", "\n", "combined_estimator = tf.estimator.DNNLinearCombinedEstimator(\n", " head=tf.estimator.BinaryClassHead(),\n", " # Wide settings\n", " linear_feature_columns=feature_columns,\n", " linear_optimizer=optimizer,\n", " # Deep settings\n", " dnn_feature_columns=feature_columns,\n", " dnn_hidden_units=[128],\n", " dnn_optimizer=optimizer)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aXN-BxwzmRaf" }, "outputs": [], "source": [ "combined_estimator.train(input_fn=_input_fn, steps=100)\n", "combined_estimator.evaluate(input_fn=_eval_input_fn, steps=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "BeMikL5ug4TX" }, "source": [ "### TensorFlow 2: Using Keras WideDeepModel" ] }, { "cell_type": "markdown", "metadata": { "id": "CYByxxBhg4TX" }, "source": [ "In TensorFlow 2, you can create an instance of the Keras `tf.compat.v1.keras.models.WideDeepModel` to substitute for one generated by `tf.estimator.DNNLinearCombinedEstimator`, with similar levels of user-specified customization (for instance, as in the previous example, the ability to customize a chosen model optimizer).\n", "\n", "This `WideDeepModel` is constructed on the basis of a constituent `LinearModel` and a custom DNN Model, both of which are discussed in the preceding two examples. A custom linear model can also be used in place of the built-in Keras `LinearModel` if desired.\n", "\n", "If you would like to build your own model instead of using a canned estimator, check out the [Keras Sequential model](https://www.tensorflow.org/guide/keras/sequential_model) guide. For more information on custom training and optimizers, check out the [Custom training: walkthrough](https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough) guide." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mIFM3e-_RLSX" }, "outputs": [], "source": [ "# Create LinearModel and DNN Model as in Examples 1 and 2\n", "optimizer = create_sample_optimizer('tf2')\n", "\n", "linear_model = tf.compat.v1.keras.experimental.LinearModel()\n", "linear_model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])\n", "linear_model.fit(x_train, y_train, epochs=10, verbose=0)\n", "\n", "dnn_model = tf.keras.models.Sequential(\n", " [tf.keras.layers.Dense(128, activation='relu'),\n", " tf.keras.layers.Dense(1)])\n", "dnn_model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mFmQz9kjmMSx" }, "outputs": [], "source": [ "combined_model = tf.compat.v1.keras.experimental.WideDeepModel(linear_model,\n", " dnn_model)\n", "combined_model.compile(\n", " optimizer=[optimizer, optimizer], loss='mse', metrics=['accuracy'])\n", "combined_model.fit([x_train, x_train], y_train, epochs=10)\n", "combined_model.evaluate(x_eval, y_eval, return_dict=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "wP1DBRhpeOJn" }, "source": [ "## Example 4: Migrating from BoostedTreesEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "_3mCQVDSeOKD" }, "source": [ "### TensorFlow 1: Using BoostedTreesEstimator" ] }, { "cell_type": "markdown", "metadata": { "id": "oEWYHNt4eOKD" }, "source": [ "In TensorFlow 1, you could use `tf.estimator.BoostedTreesEstimator` to create a baseline Gradient Boosting model using an ensemble of decision trees for regression and classification problems. This functionality is no longer included in TensorFlow 2." 
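] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gradient boosting builds its ensemble one tree at a time, with each new tree fitted to the errors left by the trees before it. The following is a minimal plain-Python sketch of that idea, assuming squared-error loss, a single numeric feature, and depth-1 trees (stumps). It is an illustration only and does not reflect how `BoostedTreesEstimator` or TensorFlow Decision Forests are implemented; all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    # Pick the split threshold that minimizes squared error when each side
    # predicts the mean residual of the points that fall on it.
    # Assumes xs contains at least two distinct values.
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_trees=20, learning_rate=0.5):
    # Start from the mean label, then repeatedly fit a stump to the current
    # residuals (the negative gradient of squared error) and add a damped
    # copy of its prediction.
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Toy data: the label switches from 0 to 1 between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = fit_boosted_stumps(xs, ys)
print([round(boosted_predict(model, x), 3) for x in xs])
```

In this sketch `n_trees` and `learning_rate` trade off against each other: a smaller learning rate needs more trees to reach the same fit."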
] }, { "cell_type": "markdown", "metadata": { "id": "wliVIER1jLnA" }, "source": [ "```\n", "bt_estimator = tf1.estimator.BoostedTreesEstimator(\n", " head=tf.estimator.BinaryClassHead(),\n", " n_batches_per_layer=1,\n", " max_depth=10,\n", " n_trees=1000,\n", " feature_columns=feature_columns)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "-K87uBrZjR0u" }, "source": [ "```\n", "bt_estimator.train(input_fn=_input_fn, steps=1000)\n", "bt_estimator.evaluate(input_fn=_eval_input_fn, steps=100)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "eNuLP6BeeOKF" }, "source": [ "### TensorFlow 2: Using TensorFlow Decision Forests" ] }, { "cell_type": "markdown", "metadata": { "id": "m3EVq388eOKF" }, "source": [ "In TensorFlow 2, `tf.estimator.BoostedTreesEstimator` is replaced by [tfdf.keras.GradientBoostedTreesModel](https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/GradientBoostedTreesModel#attributes) from the [TensorFlow Decision Forests](https://www.tensorflow.org/decision_forests) package.\n", "\n", "TensorFlow Decision Forests provides various advantages over the `tf.estimator.BoostedTreesEstimator`, notably regarding quality, speed, ease of use and flexibility. To learn about TensorFlow Decision Forests, start with the [beginner colab](https://www.tensorflow.org/decision_forests/tutorials/beginner_colab).\n", "\n", "The following example shows how to train a Gradient Boosted Trees model using TensorFlow 2:" ] }, { "cell_type": "markdown", "metadata": { "id": "UB90fXJdVWC5" }, "source": [ "Install TensorFlow Decision Forests." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9097mTCIVVE9" }, "outputs": [], "source": [ "!pip install tensorflow_decision_forests" ] }, { "cell_type": "markdown", "metadata": { "id": "B1qTdAS-VpXk" }, "source": [ "Create a TensorFlow dataset. Note that Decision Forests natively support many types of features and do not need pre-processing." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jkjFHmDTVswY" }, "outputs": [], "source": [ "train_dataframe = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')\n", "eval_dataframe = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')\n", "\n", "# Convert the Pandas Dataframes into TensorFlow datasets.\n", "train_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(train_dataframe, label=\"survived\")\n", "eval_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(eval_dataframe, label=\"survived\")" ] }, { "cell_type": "markdown", "metadata": { "id": "7fPa-LfDWDzB" }, "source": [ "Train the model on the `train_dataset` dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JO0yCH9hWPvJ" }, "outputs": [], "source": [ "# Use the default hyper-parameters of the model.\n", "gbt_model = tfdf.keras.GradientBoostedTreesModel()\n", "gbt_model.fit(train_dataset)" ] }, { "cell_type": "markdown", "metadata": { "id": "2Y5xm29AWGxt" }, "source": [ "Evaluate the quality of the model on the `eval_dataset` dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JLS_2vKKeOKF" }, "outputs": [], "source": [ "gbt_model.compile(metrics=['accuracy'])\n", "gbt_evaluation = gbt_model.evaluate(eval_dataset, return_dict=True)\n", "print(gbt_evaluation)" ] }, { "cell_type": "markdown", "metadata": { "id": "Z22UJ5SUqToQ" }, "source": [ "Gradient Boosted Trees is just one of the many decision forest algorithms available in TensorFlow Decision Forests. 
For example, Random Forest (available as [tfdf.keras.RandomForestModel](https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/RandomForestModel)) is very resistant to overfitting, while CART (available as [tfdf.keras.CartModel](https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/CartModel)) is great for model interpretation.\n", "\n", "In the next example, train and evaluate a Random Forest model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "W3slOhn4Zi9X" }, "outputs": [], "source": [ "# Train a Random Forest model\n", "rf_model = tfdf.keras.RandomForestModel()\n", "rf_model.fit(train_dataset)\n", "\n", "# Evaluate the Random Forest model\n", "rf_model.compile(metrics=['accuracy'])\n", "rf_evaluation = rf_model.evaluate(eval_dataset, return_dict=True)\n", "print(rf_evaluation)" ] }, { "cell_type": "markdown", "metadata": { "id": "Z0QYolhoZb_k" }, "source": [ "In the final example, train and plot a CART model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "027bGnCork_W" }, "outputs": [], "source": [ "# Train a CART model\n", "cart_model = tfdf.keras.CartModel()\n", "cart_model.fit(train_dataset)\n", "\n", "# Plot the CART model\n", "tfdf.model_plotter.plot_model_in_colab(cart_model, max_depth=2)" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "canned_estimators.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }