Class AdaDelta
Adadelta optimization is a stochastic gradient descent method that adapts the learning rate per dimension to address two drawbacks:
- the continual decay of learning rates throughout training
- the need for a manually selected global learning rate
Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even when many updates have been done. Compared to Adagrad, in the original version of Adadelta you don't have to set an initial learning rate. In this version, an initial learning rate can be set, as in most other optimizers.
According to section 4.3 ("Effective Learning rates"), near the end of training step sizes converge to 1, which is effectively a high learning rate that would cause divergence. This occurs only near the end of training, when gradients and step sizes are small: there the epsilon constant in the numerator and denominator dominates the accumulated past gradients and parameter updates, which drives the effective learning rate toward 1.
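For reference, here is a sketch of the per-dimension update rule from the paper (rho is the decay factor and epsilon the conditioning constant documented below); epsilon sits inside both RMS terms, i.e. in both the numerator and the denominator of the effective step size:

E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
\Delta x_t = - \frac{\mathrm{RMS}[\Delta x]_{t-1}}{\mathrm{RMS}[g]_t} \, g_t, \qquad \mathrm{RMS}[a]_t = \sqrt{E[a^2]_t + \epsilon}
E[\Delta x^2]_t = \rho \, E[\Delta x^2]_{t-1} + (1 - \rho) \, \Delta x_t^2
x_{t+1} = x_t + \Delta x_t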
According to section 4.4 ("Speech Data"), where a large neural network with 4 hidden layers was trained on a corpus of US English data, ADADELTA was used with 100 network replicas. The epsilon used is 1e-6 with rho = 0.95, which converged faster than ADAGRAD, using the following construction:
new AdaDelta(graph, 1.0f, 0.95f, 1e-6f);
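A minimal end-to-end sketch of how this might be wired up (not taken from the library's documentation): the toy variable w, the loss (w - 3)^2, and the training loop are illustrative assumptions, and only the AdaDelta constructor and the inherited minimize method come from this class.

import org.tensorflow.Graph;
import org.tensorflow.Operand;
import org.tensorflow.Session;
import org.tensorflow.framework.optimizers.AdaDelta;
import org.tensorflow.framework.optimizers.Optimizer;
import org.tensorflow.ndarray.Shape;
import org.tensorflow.op.Op;
import org.tensorflow.op.Ops;
import org.tensorflow.op.core.Variable;
import org.tensorflow.types.TFloat32;

public class AdaDeltaExample {
  public static void main(String[] args) {
    try (Graph graph = new Graph()) {
      Ops tf = Ops.create(graph);

      // Toy trainable variable and scalar loss: minimize (w - 3)^2.
      Variable<TFloat32> w = tf.variable(Shape.scalar(), TFloat32.class);
      Op init = tf.assign(w, tf.constant(0f));
      Operand<TFloat32> loss = tf.math.square(tf.math.sub(w, tf.constant(3f)));

      // Hyperparameters quoted above from section 4.4: learning rate 1.0, rho 0.95, epsilon 1e-6.
      Optimizer optimizer = new AdaDelta(graph, 1.0f, 0.95f, 1e-6f);
      Op minimize = optimizer.minimize(loss);

      try (Session session = new Session(graph)) {
        session.runner().addTarget(init).run();
        for (int step = 0; step < 200; step++) {
          session.runner().addTarget(minimize).run();
        }
      }
    }
  }
}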
- See Also:
"ADADELTA: An Adaptive Learning Rate Method", Matthew D. Zeiler, 2012, https://arxiv.org/abs/1212.5701
Nested Class Summary
Nested classes/interfaces inherited from class Optimizer:
Optimizer.GradAndVar<T>, Optimizer.Options
Field Summary
Fields:
- static final String ACCUMULATOR
- static final String ACCUMULATOR_UPDATE
- static final float LEARNING_RATE_DEFAULT
- static final float RHO_DEFAULT
- static final float EPSILON_DEFAULT
Fields inherited from class Optimizer:
globals, graph, tf, VARIABLE_V2
Constructor Summary
Constructors: overloads of AdaDelta, each of which creates an AdaDelta Optimizer (see Constructor Details below for the parameter lists).
Method Summary
- applyDense(Ops deps, Output<T> gradient, Output<T> variable) - Generates the gradient update operations for the specific variable and gradient.
- protected void createSlots(List<Output<? extends TType>> variables) - Performs a No-op slot creation method.
- getOptimizerName() - Get the Name of the optimizer.
- toString()
Methods inherited from class Optimizer:
applyGradients, computeGradients, createName, createSlot, finish, getSlot, getTF, minimize, minimize, prepare
-
Field Details
- ACCUMULATOR
public static final String ACCUMULATOR
- ACCUMULATOR_UPDATE
public static final String ACCUMULATOR_UPDATE
- LEARNING_RATE_DEFAULT
public static final float LEARNING_RATE_DEFAULT
- RHO_DEFAULT
public static final float RHO_DEFAULT
- EPSILON_DEFAULT
public static final float EPSILON_DEFAULT
-
Constructor Details
-
AdaDelta
-
AdaDelta
Creates an AdaDelta Optimizer
Parameters:
graph - the TensorFlow Graph
learningRate - the learning rate
-
AdaDelta
Creates an AdaDelta Optimizer
Parameters:
graph - the TensorFlow Graph
learningRate - the learning rate
rho - the decay factor
epsilon - a constant epsilon used to better condition the grad update
-
AdaDelta
-
AdaDelta
Creates an AdaDelta Optimizer
Parameters:
graph - the TensorFlow Graph
name - the name for this Optimizer (defaults to 'Adadelta')
learningRate - the learning rate
rho - the decay factor
epsilon - a constant epsilon used to better condition the grad update
-
-
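As a hedged illustration of the documented overloads (graph is assumed to be an existing TensorFlow Graph, the hyperparameter values are arbitrary, and 'my-adadelta' is a made-up name; omitted hyperparameters presumably fall back to the LEARNING_RATE_DEFAULT, RHO_DEFAULT and EPSILON_DEFAULT constants listed above):

// Explicit learning rate; rho and epsilon left to their defaults.
AdaDelta byLearningRate = new AdaDelta(graph, 0.5f);

// Fully explicit hyperparameters (the construction quoted from section 4.4 of the paper).
AdaDelta explicit = new AdaDelta(graph, 1.0f, 0.95f, 1e-6f);

// Named optimizer; when no name is given it defaults to 'Adadelta'.
AdaDelta named = new AdaDelta(graph, "my-adadelta", 1.0f, 0.95f, 1e-6f);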
Method Details
-
createSlots
Performs a No-op slot creation method.
Overrides:
createSlots in class Optimizer
Parameters:
variables - The variables to create slots for.
-
applyDense
Generates the gradient update operations for the specific variable and gradient.
Specified by:
applyDense in class Optimizer
Type Parameters:
T - The type of the variable.
Parameters:
gradient - The gradient to use.
variable - The variable to update.
Returns:
An operand which applies the desired optimizer update to the variable.
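The update itself is emitted as TensorFlow graph ops, but here is a plain-Java sketch of the per-element arithmetic it is assumed to perform (the standard Adadelta recurrence, with the ACCUMULATOR slot holding the decayed squared gradients and the ACCUMULATOR_UPDATE slot the decayed squared updates):

// Hedged scalar sketch of one Adadelta step for a single variable element.
// accum and accumUpdate stand in for the per-variable slot values and are updated in place.
static float adadeltaStep(float value, float grad, float[] accum, float[] accumUpdate,
                          float learningRate, float rho, float epsilon) {
  accum[0] = rho * accum[0] + (1 - rho) * grad * grad;                  // decayed E[g^2]
  float update = (float) (Math.sqrt(accumUpdate[0] + epsilon)
      / Math.sqrt(accum[0] + epsilon)) * grad;                          // RMS ratio times gradient
  accumUpdate[0] = rho * accumUpdate[0] + (1 - rho) * update * update;  // decayed E[dx^2]
  return value - learningRate * update;                                 // apply the scaled step
}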
-
toString
-
getOptimizerName
Get the Name of the optimizer.
Specified by:
getOptimizerName in class Optimizer
Returns:
The optimizer name.
-