lingvo.core.egdd module
Exponentiated Gradient Delta-Delta optimizer.
- class lingvo.core.egdd.EGDD(learning_rate, momentum, beta=0.9, gain_learning_rate=0.01, scale_learning_rate=0.001, initial_gain=1.0, min_gain=0.01, max_gain=100.0, initial_scale=1.0, min_scale=0.1, max_scale=10.0, use_directions=True, use_signs=True, name='EGDD')[source]
Bases: Optimizer
A version of GD Momentum with adaptive gain and learning rate.
The Exponentiated Gradient Delta-Delta optimizer starts with a local gain of 1.0 for every weight and an lr_scale of 1.0 shared by all weights. The EGDD update rule is:
momentum <- mu * momentum + learning_rate * gain * grad
var <- var - lr_scale * momentum
Both the gain and the lr_scale are updated using the unnormalized exponentiated gradient algorithm [KW97].
Reference: TBA
[KW97] Kivinen, J., & Warmuth, M. K. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 1997.
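To make the update rule above concrete, here is a minimal NumPy sketch of one EGDD step. The momentum and variable updates follow the documented rule; the particular exponentiated-gradient gain update shown (grow the gain when the gradient agrees in sign with the momentum, shrink it otherwise) and its position before the momentum update are assumptions for illustration, not necessarily what the implementation does.

```python
import numpy as np

# learning_rate and mu are demo choices; the gain/scale settings follow the
# constructor defaults in the signature above.
learning_rate, mu = 0.1, 0.9
gain_lr, min_gain, max_gain = 0.01, 0.01, 100.0

var = np.array([2.0, -3.0])
momentum = np.zeros_like(var)
gain = np.ones_like(var)   # initial_gain = 1.0, one gain per weight
lr_scale = 1.0             # initial_scale = 1.0, shared by all weights

def egdd_step(grad):
    global momentum, gain
    # Hypothetical exponentiated-gradient gain update [KW97]: multiply the
    # gain up when grad agrees in sign with the momentum, down otherwise,
    # clipped to [min_gain, max_gain].
    gain = np.clip(gain * np.exp(gain_lr * np.sign(grad) * np.sign(momentum)),
                   min_gain, max_gain)
    # The documented EGDD update rule.
    momentum = mu * momentum + learning_rate * gain * grad
    var[:] = var - lr_scale * momentum

for _ in range(200):
    egdd_step(2.0 * var)   # gradient of sum(var ** 2)
print(var)                 # converges toward [0.0, 0.0]
```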
- _create_slots(var_list)[source]
Create all slots needed by the variables.
- Parameters
var_list – A list of Variable objects.
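As a sketch of what slot creation could look like here: given the constructor arguments, one would expect a zero-initialized momentum buffer plus gain and lr_scale slots seeded from initial_gain and initial_scale. The helpers _zeros_slot and _get_or_make_slot are the standard tf.compat.v1.train.Optimizer slot utilities; the slot names and the self._initial_* attributes are assumptions. This is a method-body sketch meant to live inside the class, not a verbatim copy of the implementation.

```python
import tensorflow.compat.v1 as tf

def _create_slots(self, var_list):
    # Hypothetical sketch; actual slot names and shapes may differ.
    for v in var_list:
        self._zeros_slot(v, "momentum", self._name)      # momentum buffer
        self._get_or_make_slot(                          # per-weight gain
            v, tf.ones_like(v) * self._initial_gain, "gain", self._name)
        self._get_or_make_slot(                          # shared lr scale
            v, tf.constant(self._initial_scale), "lr_scale", self._name)
```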
- _prepare()[source]
Create all needed tensors before applying gradients.
This is called with the name_scope using the “name” that users have chosen for the application of gradients.
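For context, _prepare in tf.compat.v1.train.Optimizer subclasses conventionally materializes Python-number hyperparameters as tensors once, so the per-variable apply methods can consume them. A sketch under that assumption (the attribute names are hypothetical):

```python
def _prepare(self):
    # Hypothetical sketch: convert hyperparameters to tensors up front.
    self._learning_rate_tensor = tf.convert_to_tensor(
        self._learning_rate, name="learning_rate")
    self._momentum_tensor = tf.convert_to_tensor(
        self._momentum, name="momentum")
    self._gain_learning_rate_tensor = tf.convert_to_tensor(
        self._gain_learning_rate, name="gain_learning_rate")
```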
- _apply_dense(grad, var)[source]
Add ops to apply dense gradients to var.
- Parameters
grad – A Tensor.
var – A Variable object.
- Returns
An Operation.
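In practice the _apply_* methods are not called directly; they are reached through minimize or apply_gradients on the optimizer. A minimal graph-mode usage sketch (hyperparameter values are arbitrary; whether _apply_dense or _resource_apply_dense is hit depends on the variable type):

```python
import tensorflow.compat.v1 as tf
from lingvo.core import egdd

tf.disable_eager_execution()

w = tf.get_variable("w", shape=[4], initializer=tf.ones_initializer())
loss = tf.reduce_sum(tf.square(w))
opt = egdd.EGDD(learning_rate=0.1, momentum=0.9)
train_op = opt.minimize(loss)  # dense gradient -> a dense apply method

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run(w))  # driven toward zeros
```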
- _resource_apply_dense(grad, var)[source]
Add ops to apply dense gradients to the variable handle.
- Parameters
grad – A Tensor representing the gradient.
var – A Tensor of dtype resource which points to the variable to be updated.
- Returns
An Operation which updates the value of the variable.
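With TF2-style resource variables, the same routing can be exercised through apply_gradients; assuming the optimizer works under eager execution, a dense gradient on a resource variable lands in _resource_apply_dense:

```python
import tensorflow as tf
from lingvo.core import egdd

v = tf.Variable([2.0, -3.0])  # resource variable
opt = egdd.EGDD(learning_rate=0.1, momentum=0.9)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(v))
grads = tape.gradient(loss, [v])
opt.apply_gradients(zip(grads, [v]))  # -> _resource_apply_dense
```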
- _resource_apply_sparse(grad_values, var, grad_indices)[source]
Add ops to apply sparse gradients to the variable handle.
Similar to _apply_sparse, the grad_indices argument to this method has been de-duplicated. Optimizers that deal correctly with non-unique indices may instead override _resource_apply_sparse_duplicate_indices to avoid this overhead.
- Parameters
grad_values – A Tensor representing the gradient for the affected indices.
var – A Tensor of dtype resource which points to the variable to be updated.
grad_indices – A Tensor of integral type representing the indices for which the gradient is nonzero. Indices are unique.
- Returns
An Operation which updates the value of the variable.
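The de-duplication mentioned above amounts to summing gradient rows that share an index before the sparse apply method runs. A small standalone illustration of that pre-processing:

```python
import tensorflow as tf

values = tf.constant([[1.0], [2.0], [3.0]])  # gradient rows
indices = tf.constant([0, 2, 0])             # index 0 repeats
unique_indices, positions = tf.unique(indices)
summed = tf.math.unsorted_segment_sum(
    values, positions, tf.size(unique_indices))
# unique_indices == [0, 2]; summed == [[4.0], [3.0]]
# The sparse apply method then sees the summed rows and unique indices.
```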
- _apply_sparse(grad, var)[source]
Add ops to apply sparse gradients to var.
The IndexedSlices object passed to grad in this function is by default pre-processed in _apply_sparse_duplicate_indices to remove duplicate indices (see its docstring for details). Optimizers that can tolerate or have correct special cases for duplicate sparse indices may override _apply_sparse_duplicate_indices instead of this function, avoiding that overhead.
- Parameters
grad – An IndexedSlices with no repeated indices.
var – A Variable object.
- Returns
An Operation.
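Sparse gradients typically arise from gather or embedding lookups. A graph-mode sketch that produces an IndexedSlices gradient with a repeated index; duplicates are summed by the base class before the sparse apply method runs, and which sparse method is hit depends on the variable type:

```python
import tensorflow.compat.v1 as tf
from lingvo.core import egdd

tf.disable_eager_execution()

emb = tf.get_variable("emb", shape=[100, 8])
ids = tf.constant([3, 7, 7])  # index 7 repeats
loss = tf.reduce_sum(tf.nn.embedding_lookup(emb, ids))
opt = egdd.EGDD(learning_rate=0.1, momentum=0.9)
# The gradient w.r.t. emb is an IndexedSlices; rows for index 7 are summed
# during de-duplication before the sparse apply method is invoked.
train_op = opt.minimize(loss)
```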