Adamax optimizer

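Adamax optimizer from Section 7 of the Adam paper (Kingma and Ba). It is a variant of Adam that scales each update by an exponentially weighted infinity norm of past gradients rather than by an estimate of their second moment.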
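
To make the infinity-norm idea concrete, here is a minimal base R sketch of one Adamax update. It is not the package's implementation: `adamax_step()` and its `state` list are hypothetical names introduced for illustration, and adding `epsilon` to the denominator is an assumption made to avoid division by zero.

```r
# Hypothetical helper (not part of keras): one Adamax update for a
# numeric parameter vector. `state` holds the step count t, the first
# moment estimate m, and the exponentially weighted infinity norm u.
adamax_step <- function(param, grad, state, lr = 0.002, beta_1 = 0.9,
                        beta_2 = 0.999, epsilon = 1e-7) {
  t <- state$t + 1
  m <- beta_1 * state$m + (1 - beta_1) * grad   # first moment estimate
  u <- pmax(beta_2 * state$u, abs(grad))        # infinity-norm estimate
  # Bias-correct the first moment through the step size; epsilon in the
  # denominator is an assumption of this sketch.
  param <- param - (lr / (1 - beta_1^t)) * m / (u + epsilon)
  list(param = param, state = list(t = t, m = m, u = u))
}

# Minimise f(x) = sum(x^2), whose gradient is 2 * x.
x <- c(1, -2)
state <- list(t = 0, m = c(0, 0), u = c(0, 0))
first <- adamax_step(x, 2 * x, state, lr = 0.01)

for (i in 1:2000) {
  out <- adamax_step(x, 2 * x, state, lr = 0.01)
  x <- out$param
  state <- out$state
}
```

On the very first step the ratio `m / u` has magnitude close to `1 - beta_1`, so after bias correction each coordinate moves by roughly `lr` in the direction opposite its gradient, regardless of the gradient's scale.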

optimizer_adamax(lr = 0.002, beta_1 = 0.9, beta_2 = 0.999,
  epsilon = NULL, decay = 0, clipnorm = NULL, clipvalue = NULL)
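
A typical call passes the optimizer to `compile()`. The sketch below assumes the keras R package with a working backend is installed; the layer sizes and loss are arbitrary choices for illustration only.

```r
library(keras)

# Illustrative model; the architecture is an arbitrary assumption.
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(8)) %>%
  layer_dense(units = 1)

# Use Adamax with its default learning rate and decay rates.
model %>% compile(
  optimizer = optimizer_adamax(lr = 0.002, beta_1 = 0.9, beta_2 = 0.999),
  loss = "mse"
)
```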

Arguments

lr	float >= 0. Learning rate.
beta_1	The exponential decay rate for the 1st moment estimates. float, 0 < beta_1 < 1. Generally close to 1.
beta_2	The exponential decay rate for the exponentially weighted infinity norm. float, 0 < beta_2 < 1. Generally close to 1.
epsilon	float >= 0. Fuzz factor: a small constant added for numerical stability. If `NULL`, defaults to `k_epsilon()`.
decay	float >= 0. Learning rate decay over each update.
clipnorm	Gradients will be clipped when their L2 norm exceeds this value.
clipvalue	Gradients will be clipped when their absolute value exceeds this value.
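
The effect of `decay`, `clipnorm` and `clipvalue` can be sketched in base R. The helpers below are hypothetical and only illustrate the documented behaviour. Two things are assumptions of the sketch: the decay schedule `lr / (1 + decay * iterations)`, which is the time-based form used by classic Keras optimizers, and taking the norm over a single gradient vector, since whether it is computed per tensor or across all gradients depends on the backend version.

```r
# Hypothetical helpers (not part of keras) illustrating the arguments.

# clipnorm: rescale the gradient so its L2 norm is at most `clipnorm`.
clip_by_norm <- function(grad, clipnorm) {
  norm <- sqrt(sum(grad^2))
  if (norm > clipnorm) grad * clipnorm / norm else grad
}

# clipvalue: clamp each element to the range [-clipvalue, clipvalue].
clip_by_value <- function(grad, clipvalue) {
  pmin(pmax(grad, -clipvalue), clipvalue)
}

# decay: assumed time-based schedule applied to the learning rate.
decayed_lr <- function(lr, decay, iterations) {
  lr / (1 + decay * iterations)
}

g <- c(3, -4)                      # L2 norm is 5
clip_by_norm(g, 1)                 # 0.6 -0.8
clip_by_value(g, 0.5)              # 0.5 -0.5
decayed_lr(0.002, 0.01, 100)       # 0.001
```

Norm clipping preserves the gradient's direction, while value clipping can change it because each element is clamped independently.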