Class Adam

java.lang.Object
org.tensorflow.framework.optimizers.Optimizer
org.tensorflow.framework.optimizers.Adam

@Operator public class Adam extends Optimizer
Optimizer that implements the Adam algorithm.

Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.

According to Kingma et al., 2014, the method is "computationally efficient, has little memory requirement, is invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters".

@see Kingma et al., 2014, Adam: A Method for Stochastic Optimization (https://arxiv.org/abs/1412.6980).
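
A minimal usage sketch (the toy loss, target value, and training loop are illustrative assumptions, not part of this class): build a graph, construct the optimizer, and run the Op returned by minimize in a Session.

    import org.tensorflow.Graph;
    import org.tensorflow.Operand;
    import org.tensorflow.Session;
    import org.tensorflow.framework.optimizers.Adam;
    import org.tensorflow.op.Op;
    import org.tensorflow.op.Ops;
    import org.tensorflow.op.core.Variable;
    import org.tensorflow.types.TFloat32;

    public class AdamExample {
      public static void main(String[] args) {
        try (Graph graph = new Graph()) {
          Ops tf = Ops.create(graph);

          // Toy problem (illustrative): fit a single scalar weight to the target 3.0.
          Variable<TFloat32> weight = tf.variable(tf.constant(0.0f));
          Operand<TFloat32> loss = tf.math.square(tf.math.sub(tf.constant(3.0f), weight));

          // learningRate 0.001f matches the documented default; see the constructors below.
          Adam adam = new Adam(graph, 0.001f);
          Op minimize = adam.minimize(loss);

          try (Session session = new Session(graph)) {
            session.run(tf.init());        // run the collected variable initializers
            for (int step = 0; step < 100; step++) {
              session.run(minimize);       // one Adam update step
            }
          }
        }
      }
    }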

  • Field Details

  • Constructor Details

    • Adam

      public Adam(Graph graph)
      Creates an Adam optimizer
      Parameters:
      graph - the TensorFlow graph
    • Adam

      public Adam(Graph graph, float learningRate)
      Creates an Adam optimizer
      Parameters:
      graph - the TensorFlow graph
      learningRate - the learning rate
    • Adam

      public Adam(Graph graph, float learningRate, float betaOne, float betaTwo, float epsilon)
      Creates an Adam optimizer
      Parameters:
      graph - the TensorFlow graph
      learningRate - the learning rate
      betaOne - The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
      betaTwo - The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.
      epsilon - A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper. Defaults to 1e-8.
    • Adam

      public Adam(Graph graph, String name, float learningRate)
      Creates an Adam optimizer
      Parameters:
      graph - the TensorFlow graph
      name - the Optimizer name, defaults to "Adam"
      learningRate - the learning rate
    • Adam

      public Adam(Graph graph, String name, float learningRate, float betaOne, float betaTwo, float epsilon)
      Creates an Adam optimizer
      Parameters:
      graph - the TensorFlow graph
      name - the Optimizer name, defaults to "Adam"
      learningRate - the learning rate
      betaOne - The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
      betaTwo - The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.
      epsilon - A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper. Defaults to 1e-8.
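
For reference, a sketch of the update rule as it appears just before Section 2.1 of Kingma and Ba; the epsilon parameter documented above is the paper's "epsilon hat", betaOne and betaTwo correspond to β1 and β2, g_t is the gradient, α the learning rate, and θ the variable being updated:

    \begin{aligned}
    m_t      &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
    v_t      &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
    \alpha_t &= \alpha \sqrt{1 - \beta_2^t} \,/\, (1 - \beta_1^t) \\
    \theta_t &= \theta_{t-1} - \alpha_t\, m_t / (\sqrt{v_t} + \hat{\epsilon})
    \end{aligned}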
  • Method Details

    • createAdamMinimize

      @Endpoint(name="adam_minimize") public static <T extends TType> Op createAdamMinimize(Scope scope, Operand<T> loss, float learningRate, float betaOne, float betaTwo, float epsilon, Optimizer.Options... options)
      Creates the Operation that minimizes the loss
      Type Parameters:
      T - the data type for the loss
      Parameters:
      scope - the TensorFlow scope
      loss - the loss to minimize
      learningRate - the learning rate
      betaOne - The exponential decay rate for the 1st moment estimates.
      betaTwo - The exponential decay rate for the 2nd moment estimates.
      epsilon - A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
      options - Optional Optimizer attributes
      Returns:
      the Operation that minimizes the loss
      Throws:
      IllegalArgumentException - if scope does not represent a Graph
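
A hedged sketch of calling this static endpoint directly (same imports and toy loss as the class-level example above; tf.scope() supplies the graph-backed Scope, and the hyper-parameter values are the defaults documented in the constructors):

    try (Graph graph = new Graph()) {
      Ops tf = Ops.create(graph);
      Variable<TFloat32> weight = tf.variable(tf.constant(0.0f));
      Operand<TFloat32> loss = tf.math.square(weight);

      // The scope must come from a Graph, otherwise IllegalArgumentException is thrown.
      Op minimize = Adam.createAdamMinimize(tf.scope(), loss, 0.001f, 0.9f, 0.999f, 1e-8f);

      try (Session session = new Session(graph)) {
        session.run(tf.init());   // initialize the variable and optimizer state
        session.run(minimize);    // one Adam update step
      }
    }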
    • createSlots

      protected void createSlots(List<Output<? extends TType>> variables)
      Creates the slots needed by Adam for each variable: one for the first moment estimate and one for the second moment estimate.
      Overrides:
      createSlots in class Optimizer
      Parameters:
      variables - The variables to create slots for.
    • prepare

      protected Optional<Op> prepare(String scopeName)
      Prepares this optimizer for the update step, returning a no-op, or empty if no preparation is needed.
      Overrides:
      prepare in class Optimizer
      Parameters:
      scopeName - The scope name to use for any variable creations.
      Returns:
      a No-op to prepare this optimizer, or empty if none.
    • applyDense

      protected <T extends TType> Op applyDense(Ops deps, Output<T> gradient, Output<T> variable)
      Generates the gradient update operations for the specific variable and gradient.
      Specified by:
      applyDense in class Optimizer
      Type Parameters:
      T - The type of the variable.
      Parameters:
      deps - The Ops used to build the update operations for the variable.
      gradient - The gradient to use.
      variable - The variable to update.
      Returns:
      An operand which applies the desired optimizer update to the variable.
    • finish

      protected Op finish(List<Op> updateOperations, String name)
      Gathers up the update operations into a single op that can be used as a run target.

      Adds the betaOne and betaTwo updates to the end of the updates list.

      Overrides:
      finish in class Optimizer
      Parameters:
      updateOperations - The update operations.
      name - The name of the run target.
      Returns:
      A NoOp with a control dependency on each update operation.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getOptimizerName

      public String getOptimizerName()
      Gets the name of the optimizer.
      Specified by:
      getOptimizerName in class Optimizer
      Returns:
      The optimizer name.