eager_guide.Rmd
Eager execution is a way to train a Keras model without building a graph. Operations return concrete values instead of symbolic tensors. Consequently, you can inspect what goes into and comes out of an operation simply by printing a variable’s contents. This is an important advantage in model development and debugging.
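You can use eager execution with Keras as long as you use the TensorFlow implementation. This guide outlines the workflow by way of a simple regression example. Specifically, you will see how to define a custom model, choose a loss function and optimizer, feed data with tfdatasets, write a training loop, and save and restore model weights.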
To use eager execution with Keras, you need a current version of the R package keras with a TensorFlow backend of version at least 1.9.
The following preamble is required when using eager execution:
library(keras)
# make sure we use the tensorflow implementation of Keras
# this line has to be executed immediately after loading the library
use_implementation("tensorflow")
library(tensorflow)
# enable eager execution
# the argument device_policy is needed only when using a GPU
tfe_enable_eager_execution(device_policy = "silent")
When in doubt, check if you are in fact using eager execution:
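A minimal sketch of such a check, assuming the preamble above has already been run in the current session. `tf$executing_eagerly()` is TensorFlow's own query for the execution mode and should return `TRUE` here.

```r
# assumes library(tensorflow) is loaded and eager execution was enabled above
tf$executing_eagerly()
```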
Models for use with eager execution are defined as Keras custom models.
Custom models are usually made up of normal Keras layers, which you configure as usual. However, you are free to implement custom logic in the model’s (implicit) call function.
Our simple regression example will use iris to predict Sepal.Width from Petal.Length, Sepal.Length and Petal.Width.
Here is a model that can be used for that purpose:
# model instantiator
iris_regression_model <- function(name = NULL) {
  keras_model_custom(name = name, function(self) {
    # define any number of layers here
    self$dense1 <- layer_dense(units = 32)
    self$dropout <- layer_dropout(rate = 0.5)
    self$dense2 <- layer_dense(units = 1)
    # this is the "call" function that defines what happens when the model is called
    function(x, mask = NULL) {
      x %>%
        self$dense1() %>%
        self$dropout() %>%
        self$dense2()
    }
  })
}
The model is created simply by instantiating it via its wrapper:
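Concretely, instantiation is a single call; the complete example at the end of this guide uses the same line. The sketch assumes the wrapper defined above, and `model` is the name the rest of the guide refers to.

```r
# assumes iris_regression_model() has been defined as shown above
model <- iris_regression_model()
```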
At this point, the shapes of the model’s weights are still unknown (note how no input_shape has been defined for its first layer). You can, however, already call the model on some dummy data:
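A sketch of such a call, assuming `model` was created via `iris_regression_model()`. The input is a made-up 2 x 3 matrix (two observations, three predictors) converted to a tensor with `k_constant()`; the name `dummy_input` and the numbers are arbitrary, and the returned values depend on the random weight initialization.

```r
# dummy input: 2 rows (observations) x 3 columns (predictors), values are arbitrary
dummy_input <- k_constant(matrix(as.numeric(1:6), nrow = 2, ncol = 3))
# calling the model creates its weights and returns one value per row
model(dummy_input)
```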
tf.Tensor(
[[-1.1474639]
[-1.0472134]], shape=(2, 1), dtype=float32)
After that call, you can inspect the model’s weights using model$weights.
This displays not just the tensor shapes, but the actual weight values.
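A sketch of that inspection, assuming `model` has already been called on some input so that its weights exist. `weights` is the standard Keras model attribute listing all weight tensors; for this model one would expect a kernel and a bias for each of the two dense layers, since dropout has no weights.

```r
# assumes the model has been called at least once, so its weights have been created
model$weights
```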
An appropriate loss function for a regression task like this is mean squared error:
mse_loss <- function(y_true, y_pred, x) {
  # it's required to use a TensorFlow function here, not loss_mean_squared_error() from Keras
  mse <- tf$losses$mean_squared_error(y_true, y_pred)
  # here you could compute and add other losses
  mse
}
Note how we have to use loss functions from TensorFlow, not the Keras equivalents. In the same vein, we need to use an optimizer from the tf$train module.
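For example, the Adam optimizer from `tf$train` can be created with its default settings. This is the same choice the complete example at the end of this guide makes; any other `tf$train` optimizer could presumably be substituted.

```r
# Adam optimizer from the tf$train module, using its default learning rate
optimizer <- tf$train$AdamOptimizer()
```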
In eager execution, you use tfdatasets to stream input and target data to the model. In our simple iris example, we use tensor_slices_dataset to directly create a dataset from the underlying R matrices x_train and y_train.
However, a wide variety of other dataset creation functions is available. Datasets also allow for a variety of pre-processing transformations.
x_train <-
  iris[1:120, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix()
y_train <-
  iris[1:120, c("Sepal.Width")] %>% as.matrix()
# Convert to appropriate tensor floating point type for backend
x_train <- k_constant(x_train)
y_train <- k_constant(y_train)
# same for test set
x_test <-
  iris[121:150, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix()
y_test <-
  iris[121:150, c("Sepal.Width")] %>% as.matrix()
x_test <- k_constant(x_test)
y_test <- k_constant(y_test)
library(tfdatasets)
train_dataset <- tensor_slices_dataset(list(x_train, y_train)) %>%
  dataset_batch(10)
test_dataset <- tensor_slices_dataset(list(x_test, y_test)) %>%
  dataset_batch(10)
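As one illustration of the pre-processing transformations mentioned above, the sketch below shuffles the training observations before batching. It is not part of the workflow in this guide: the name `shuffled_train_dataset` is made up for this example, and the buffer size of 120 is an assumption chosen to cover all training rows.

```r
# assumes x_train, y_train and library(tfdatasets) from the code above
shuffled_train_dataset <- tensor_slices_dataset(list(x_train, y_train)) %>%
  dataset_shuffle(buffer_size = 120) %>%  # buffer large enough to hold all 120 rows
  dataset_batch(10)
```

Because the rows of iris are ordered by species, shuffling changes which observations end up in a batch together.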
Data is accessed from a dataset via make_iterator_one_shot (to create an iterator) and iterator_get_next (to obtain the next batch).
Datasets are available in non-eager (graph) execution as well. However, in eager mode, we can examine the actual values returned from the iterator:
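A sketch of that inspection, assuming `train_dataset` was created with `tensor_slices_dataset()` and `dataset_batch(10)`; `iter` and `batch` are just local names. Each batch is a list holding a tensor of predictors and a tensor of targets.

```r
# create an iterator over the training dataset and pull the first batch
iter <- make_iterator_one_shot(train_dataset)
batch <- iterator_get_next(iter)
# printing shows the actual values, not just a symbolic description
batch
```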
[[1]]
tf.Tensor(
[[1.4 5.1 0.2]
[1.4 4.9 0.2]
[1.3 4.7 0.2]
[1.5 4.6 0.2]
[1.4 5. 0.2]
[1.7 5.4 0.4]
[1.4 4.6 0.3]
[1.5 5. 0.2]
[1.4 4.4 0.2]
[1.5 4.9 0.1]], shape=(10, 3), dtype=float32)
[[2]]
tf.Tensor(
[[3.5]
[3. ]
[3.2]
[3.1]
[3.6]
[3.9]
[3.4]
[3.4]
[2.9]
[3.1]], shape=(10, 1), dtype=float32)
With eager execution, you take full control over the training process.
In general, you will have at least two loops: an outer loop over epochs, and an inner loop over batches of data returned by the iterator (implemented implicitly by until_out_of_range). The iterator is recreated at the start of each new epoch.
n_epochs <- 10
for (i in seq_len(n_epochs)) {
  iter <- make_iterator_one_shot(train_dataset)
  total_loss <- 0
  until_out_of_range({
    # get a new batch and run forward pass on it
    # calculate loss
    # calculate gradients of loss w.r.t. model weights
    # update model weights
  })
  cat("Total loss (epoch): ", i, ": ", as.numeric(total_loss), "\n")
}
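Filling in the missing pieces in the above outline, we will see that the forward pass is just a call to model(). That call has to happen inside the context of a GradientTape that records all operations. The loss is computed with the loss function defined above, the GradientTape then determines the gradients, and the optimizer uses them to update the model’s weights.
Here is the complete code for the training loop: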
n_epochs <- 10
# loop over epochs
for (i in seq_len(n_epochs)) {
  # create fresh iterator from dataset
  iter <- make_iterator_one_shot(train_dataset)
  # accumulate current epoch's loss (for display purposes only)
  total_loss <- 0
  # loop once through the dataset
  until_out_of_range({
    # get next batch
    batch <- iterator_get_next(iter)
    x <- batch[[1]]
    y <- batch[[2]]
    # forward pass is recorded by tf$GradientTape
    with(tf$GradientTape() %as% tape, {
      # run model on current batch
      preds <- model(x)
      # compute the loss
      loss <- mse_loss(y, preds, x)
    })
    # update total loss
    total_loss <- total_loss + loss
    # get gradients of loss w.r.t. model weights
    gradients <- tape$gradient(loss, model$variables)
    # update model weights
    optimizer$apply_gradients(
      purrr::transpose(list(gradients, model$variables)),
      global_step = tf$train$get_or_create_global_step()
    )
  })
  cat("Total loss (epoch): ", i, ": ", as.numeric(total_loss), "\n")
}
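To see in isolation what the tape does, here is a small sketch unrelated to the iris model, assuming eager execution is enabled as in the preamble. The tensor `x` and the function y = x^2 are made up for illustration; since `x` is a constant rather than a model variable, it has to be watched explicitly with `tape$watch()`.

```r
# a scalar input, chosen arbitrarily for this illustration
x <- tf$constant(3)
with(tf$GradientTape() %as% tape, {
  tape$watch(x)   # constants are not tracked automatically
  y <- x * x      # operations inside the block are recorded
})
# derivative of x^2 with respect to x, evaluated at x = 3
dy_dx <- tape$gradient(y, x)
as.numeric(dy_dx)  # 6
```

In the training loop, no `watch()` call is needed because the model's variables are tracked by the tape automatically.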
Getting predictions on the test set is just a call to model, just as in training.
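In code, assuming the trained `model` and the `x_test` tensor defined earlier, this is the same call the complete example at the end of this guide uses. It should return one predicted Sepal.Width per test row, that is, a tensor of shape (30, 1).

```r
# assumes the trained model and the x_test tensor from above
model(x_test)
```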
To save model weights, create an instance of tf$train$Checkpoint and pass it the objects to be saved: in our case, the model and the optimizer. This has to happen after the respective objects have been created, but before the training loop.
checkpoint_dir <- "./checkpoints"
checkpoint_prefix <- file.path(checkpoint_dir, "ckpt")
checkpoint <-
  tf$train$Checkpoint(
    optimizer = optimizer,
    model = model
  )
Then at the end of each epoch, you save the model’s current weights, like so:
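A sketch of the save call, assuming `checkpoint` and `checkpoint_prefix` were defined as shown above. The complete example at the end of this guide places this line at the end of each pass of the epoch loop.

```r
# assumes checkpoint and checkpoint_prefix from the checkpointing setup;
# inside the training loop, this goes after the inner loop over batches
checkpoint$save(file_prefix = checkpoint_prefix)
```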
This call saves model weights only, not the complete graph. Thus, on restore, you re-create all components in the same way as above, and then load the saved model weights using e.g.
# restore from recent checkpoint, you can also use a different one
checkpoint$restore(tf$train$latest_checkpoint(checkpoint_dir))
You can then obtain predictions from the restored model, on the test set as a whole or batch-wise, using an iterator.
Here is the complete example.
library(keras)
use_implementation("tensorflow")
library(tensorflow)
tfe_enable_eager_execution(device_policy = "silent")
library(tfdatasets)
# Prepare training and test sets ------------------------------------------
x_train <-
  iris[1:120, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix()
x_train <- k_constant(x_train)
y_train <-
  iris[1:120, c("Sepal.Width")] %>% as.matrix()
y_train <- k_constant(y_train)
x_test <-
  iris[121:150, c("Petal.Length", "Sepal.Length", "Petal.Width")] %>% as.matrix()
x_test <- k_constant(x_test)
y_test <-
  iris[121:150, c("Sepal.Width")] %>% as.matrix()
y_test <- k_constant(y_test)
# Create datasets for training and testing --------------------------------
train_dataset <- tensor_slices_dataset(list(x_train, y_train)) %>%
  dataset_batch(10)
test_dataset <- tensor_slices_dataset(list(x_test, y_test)) %>%
  dataset_batch(10)
# Create model ------------------------------------------------------------
iris_regression_model <- function(name = NULL) {
  keras_model_custom(name = name, function(self) {
    self$dense1 <- layer_dense(units = 32, input_shape = 3)
    self$dropout <- layer_dropout(rate = 0.5)
    self$dense2 <- layer_dense(units = 1)
    function(x, mask = NULL) {
      self$dense1(x) %>%
        self$dropout() %>%
        self$dense2()
    }
  })
}
model <- iris_regression_model()
# Define loss function and optimizer --------------------------------------
mse_loss <- function(y_true, y_pred, x) {
  mse <- tf$losses$mean_squared_error(y_true, y_pred)
  mse
}
optimizer <- tf$train$AdamOptimizer()
# Set up checkpointing ----------------------------------------------------
checkpoint_dir <- "./checkpoints"
checkpoint_prefix <- file.path(checkpoint_dir, "ckpt")
checkpoint <-
  tf$train$Checkpoint(optimizer = optimizer,
                      model = model)
n_epochs <- 10
# change to TRUE if you want to restore weights
restore <- FALSE
if (!restore) {
  for (i in seq_len(n_epochs)) {
    iter <- make_iterator_one_shot(train_dataset)
    total_loss <- 0
    until_out_of_range({
      batch <- iterator_get_next(iter)
      x <- batch[[1]]
      y <- batch[[2]]
      with(tf$GradientTape() %as% tape, {
        preds <- model(x)
        loss <- mse_loss(y, preds, x)
      })
      total_loss <- total_loss + loss
      gradients <- tape$gradient(loss, model$variables)
      optimizer$apply_gradients(
        purrr::transpose(list(gradients, model$variables)),
        global_step = tf$train$get_or_create_global_step()
      )
    })
    cat("Total loss (epoch): ", i, ": ", as.numeric(total_loss), "\n")
    checkpoint$save(file_prefix = checkpoint_prefix)
  }
} else {
  checkpoint$restore(tf$train$latest_checkpoint(checkpoint_dir))
}
# Get model predictions on test set ---------------------------------------
model(x_test)
iter <- make_iterator_one_shot(test_dataset)
until_out_of_range({
  batch <- iterator_get_next(iter)
  preds <- model(batch[[1]])
  print(preds)
})
In this guide, the task and, consequently, the custom model, its loss and the training routine were chosen for their simplicity. Visit the TensorFlow for R blog for case studies and paper implementations that use more intricate custom logic.