lingvo.core.egdd module
Exponentiated Gradient Delta-Delta optimizer.
- class lingvo.core.egdd.EGDD(learning_rate, momentum, beta=0.9, gain_learning_rate=0.01, scale_learning_rate=0.001, initial_gain=1.0, min_gain=0.01, max_gain=100.0, initial_scale=1.0, min_scale=0.1, max_scale=10.0, use_directions=True, use_signs=True, name='EGDD')[source]
Bases: Optimizer

A version of GD Momentum with adaptive gain and learning rate.
The Exponentiated Gradient Delta-Delta optimizer starts with a local gain of 1.0 for every weight and an lr_scale of 1.0 for all weights. The EGDD update rule is:
    momentum <- mu * momentum + learning_rate * gain * grad
    var      <- var - lr_scale * momentum
Both the gain and the lr_scale are updated using the unnormalized exponentiated gradient algorithm [KW97].
Reference: TBA
[KW97] Kivinen, J., & Warmuth, M. K. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 1997.
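To make the rule concrete, here is a minimal NumPy sketch of one possible EGDD loop on a toy quadratic loss. The momentum and variable updates follow the rule above; the exact exponentiated-gradient updates for gain and lr_scale are not spelled out on this page, so the sign-agreement form below (with clipping to the constructor's min/max bounds) is an assumption for illustration only:

    import numpy as np

    mu, lr = 0.9, 0.1                # momentum coefficient and learning rate
    gain_lr, scale_lr = 0.01, 0.001  # EG step sizes for gain and lr_scale
    var = np.array([5.0, -3.0])      # toy parameters; loss = 0.5 * ||var||^2
    momentum = np.zeros_like(var)
    gain = np.ones_like(var)         # per-weight gain, initial_gain = 1.0
    lr_scale = 1.0                   # initial_scale = 1.0

    for _ in range(100):
        grad = var  # gradient of 0.5 * ||var||^2
        # Assumed EG updates: multiplicative steps driven by the sign
        # agreement of the incoming gradient with the running momentum,
        # clipped to the [min, max] bounds from the constructor.
        gain = np.clip(
            gain * np.exp(gain_lr * np.sign(grad) * np.sign(momentum)),
            0.01, 100.0)
        lr_scale = np.clip(
            lr_scale * np.exp(scale_lr * np.sign(np.sum(grad * momentum))),
            0.1, 10.0)
        # The documented EGDD rule:
        momentum = mu * momentum + lr * gain * grad
        var = var - lr_scale * momentum

    print(var)  # approaches the origin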
- _create_slots(var_list)[source]
Create all slots needed by the variables.
- Parameters
var_list – A list of Variable objects.
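Given the constructor arguments, the slots created here plausibly include a momentum accumulator plus gain and lr_scale state per variable. The following is a hedged sketch of that pattern using the tf.compat.v1 Optimizer slot helpers; the slot names and shapes are assumptions, not confirmed by this page:

    import tensorflow.compat.v1 as tf

    class EGDDSketch(tf.train.Optimizer):
        """Illustrative subclass showing the assumed slot layout."""

        def __init__(self, initial_gain=1.0, initial_scale=1.0,
                     use_locking=False, name="EGDDSketch"):
            super().__init__(use_locking, name)
            self._initial_gain = initial_gain
            self._initial_scale = initial_scale

        def _create_slots(self, var_list):
            for v in var_list:
                # Momentum accumulator, one entry per weight.
                self._zeros_slot(v, "momentum", self._name)
                # Per-weight gains, starting at initial_gain.
                self._get_or_make_slot(
                    v, self._initial_gain * tf.ones_like(v),
                    "gain", self._name)
                # A scalar lr_scale per variable, starting at initial_scale.
                self._get_or_make_slot(
                    v, tf.constant(self._initial_scale,
                                   dtype=v.dtype.base_dtype),
                    "lr_scale", self._name)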
- _prepare()[source]
Create all needed tensors before applying gradients.
This is called within a name_scope based on the “name” that users have chosen for the application of gradients.
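Typically this step just converts Python-number hyperparameters into tensors once, so the per-variable update ops can reuse them. A hedged sketch of that pattern follows; the attribute names are illustrative, not taken from the actual implementation:

    import tensorflow.compat.v1 as tf

    def _prepare_sketch(self):
        # Convert constructor hyperparameters to tensors exactly once.
        self._learning_rate_t = tf.convert_to_tensor(
            self._learning_rate, name="learning_rate")
        self._momentum_t = tf.convert_to_tensor(
            self._momentum, name="momentum")
        self._beta_t = tf.convert_to_tensor(self._beta, name="beta")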
- _apply_dense(grad, var)[source]
Add ops to apply dense gradients to var.
- Parameters
grad – A Tensor.
var – A Variable object.
- Returns
An Operation.
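Combining the slots and prepared tensors from the sketches above, a dense update consistent with the documented rule might look as follows. The exponentiated-gradient updates to gain and lr_scale are omitted for brevity, and the slot and attribute names remain assumptions:

    def _apply_dense_sketch(self, grad, var):
        momentum = self.get_slot(var, "momentum")
        gain = self.get_slot(var, "gain")
        lr_scale = self.get_slot(var, "lr_scale")
        # momentum <- mu * momentum + learning_rate * gain * grad
        new_momentum = momentum.assign(
            self._momentum_t * momentum +
            self._learning_rate_t * gain * grad)
        # var <- var - lr_scale * momentum
        return var.assign_sub(lr_scale * new_momentum)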
- _resource_apply_dense(grad, var)[source]
Add ops to apply dense gradients to the variable handle.
- Parameters
grad – A Tensor representing the gradient.
handle – A Tensor of dtype resource which points to the variable to be updated.
- Returns
An Operation which updates the value of the variable.
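The resource variant performs the same math but writes through the variable's resource handle. This standalone snippet only illustrates that mechanism, the handle-safe raw assign op, on a plain TF2 variable; it is not the optimizer's actual implementation:

    import tensorflow as tf

    v = tf.Variable([1.0, 2.0])  # a ResourceVariable; v.handle is a
                                 # Tensor of dtype resource
    tf.raw_ops.AssignSubVariableOp(resource=v.handle,
                                   value=tf.constant([0.5, 0.5]))
    print(v.numpy())  # [0.5 1.5]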
- _resource_apply_sparse(grad_values, var, grad_indices)[source]
Add ops to apply sparse gradients to the variable handle.
Similar to _apply_sparse, the indices argument to this method has been de-duplicated. Optimizers which deal correctly with non-unique indices may instead override _resource_apply_sparse_duplicate_indices to avoid this overhead.
- Parameters
grad – A Tensor representing the gradient for the affected indices.
handle – A Tensor of dtype resource which points to the variable to be updated.
indices – A Tensor of integral type representing the indices for which the gradient is nonzero. Indices are unique.
- Returns
An Operation which updates the value of the variable.
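The de-duplication contract described above can be reproduced directly: repeated indices in a sparse gradient are summed into unique rows before this method sees them. This standalone snippet mirrors that pre-processing:

    import tensorflow as tf

    indices = tf.constant([0, 2, 0])              # index 0 appears twice
    values = tf.constant([[1.0], [2.0], [3.0]])
    unique_idx, segment_ids = tf.unique(indices)
    summed = tf.math.unsorted_segment_sum(values, segment_ids,
                                          tf.size(unique_idx))
    print(unique_idx.numpy())  # [0 2]
    print(summed.numpy())      # [[4.] [2.]] (rows for index 0 were summed)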
- _apply_sparse(grad, var)[source]
Add ops to apply sparse gradients to var.
The IndexedSlices object passed to grad in this function is by default pre-processed in _apply_sparse_duplicate_indices to remove duplicate indices (see its docstring for details). Optimizers which can tolerate or have correct special cases for duplicate sparse indices may override _apply_sparse_duplicate_indices instead of this function, avoiding that overhead.
- Parameters
grad – An IndexedSlices, with no repeated indices.
var – A Variable object.
- Returns
An Operation.
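Since EGDD subclasses the TF1-style Optimizer, the usual compute_gradients/apply_gradients flow should apply. A hedged end-to-end sketch, assuming the standard tf.compat.v1 session workflow:

    import tensorflow.compat.v1 as tf
    from lingvo.core.egdd import EGDD

    tf.disable_eager_execution()
    w = tf.get_variable("w", initializer=tf.constant([5.0, -3.0]))
    loss = 0.5 * tf.reduce_sum(tf.square(w))  # toy quadratic objective
    opt = EGDD(learning_rate=0.1, momentum=0.9)
    train_op = opt.minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(10):
            sess.run(train_op)
        print(sess.run(w))  # moves toward the origin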