Model loss functions

loss_mean_squared_error(y_true, y_pred)

loss_mean_absolute_error(y_true, y_pred)

loss_mean_absolute_percentage_error(y_true, y_pred)

loss_mean_squared_logarithmic_error(y_true, y_pred)

loss_squared_hinge(y_true, y_pred)

loss_hinge(y_true, y_pred)

loss_categorical_hinge(y_true, y_pred)

loss_logcosh(y_true, y_pred)

loss_categorical_crossentropy(y_true, y_pred)

loss_sparse_categorical_crossentropy(y_true, y_pred)

loss_binary_crossentropy(y_true, y_pred)

loss_kullback_leibler_divergence(y_true, y_pred)

loss_poisson(y_true, y_pred)

loss_cosine_proximity(y_true, y_pred)

Arguments

y_true

True labels (Tensor)

y_pred

Predictions (Tensor of the same shape as y_true)

Details

Loss functions are to be supplied in the loss parameter of the compile.keras.engine.training.Model() function.

Loss functions can be specified either using the name of a built in loss function (e.g. 'loss = binary_crossentropy'), a reference to a built in loss function (e.g. 'loss = loss_binary_crossentropy()') or by passing an artitrary function that returns a scalar for each data-point and takes the following two arguments:

  • y_true True labels (Tensor)

  • y_pred Predictions (Tensor of the same shape as y_true)

The actual optimized objective is the mean of the output array across all datapoints.

Categorical Crossentropy

When using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample). In order to convert integer targets into categorical targets, you can use the Keras utility function to_categorical():

categorical_labels <- to_categorical(int_labels, num_classes = NULL)

loss_logcosh

log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x. This means that 'logcosh' works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction. However, it may return NaNs if the intermediate value cosh(y_pred - y_true) is too large to be represented in the chosen precision.

See also