lingvo.core.egdd module
Exponentiated Gradient Delta-Delta optimizer.
- class lingvo.core.egdd.EGDD(learning_rate, momentum, beta=0.9, gain_learning_rate=0.01, scale_learning_rate=0.001, initial_gain=1.0, min_gain=0.01, max_gain=100.0, initial_scale=1.0, min_scale=0.1, max_scale=10.0, use_directions=True, use_signs=True, name='EGDD')[source]
Bases: Optimizer
A version of GD Momentum with adaptive gain and learning rate.
The Exponentiated Gradient Delta-Delta optimizer starts with a local gain of 1.0 for every weight and an lr_scale of 1.0 shared by all weights. The EGDD update rule is:
momentum <- mu * momentum + learning_rate * gain * grad
var <- var - lr_scale * momentum
Both the gain and the lr_scale are updated using the unnormalized exponentiated gradient algorithm [KW97].
Reference: TBA
[KW97] Kivinen, J., & Warmuth, M. K. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 1997.
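To make the update rule above concrete, here is a minimal NumPy sketch of one EGDD step. The momentum and variable updates follow the documented rule; the particular exponentiated-gradient gain update shown (grow the gain when the gradient agrees in sign with the momentum, shrink it otherwise) and its position before the momentum update are assumptions for illustration, not necessarily what the implementation does.

```python
import numpy as np

# learning_rate and mu are demo choices; the gain/scale settings follow the
# constructor defaults in the signature above.
learning_rate, mu = 0.1, 0.9
gain_lr, min_gain, max_gain = 0.01, 0.01, 100.0

var = np.array([2.0, -3.0])
momentum = np.zeros_like(var)
gain = np.ones_like(var)   # initial_gain = 1.0, one gain per weight
lr_scale = 1.0             # initial_scale = 1.0, shared by all weights

def egdd_step(grad):
    global momentum, gain
    # Hypothetical exponentiated-gradient gain update [KW97]: multiply the
    # gain up when grad agrees in sign with the momentum, down otherwise,
    # clipped to [min_gain, max_gain].
    gain = np.clip(gain * np.exp(gain_lr * np.sign(grad) * np.sign(momentum)),
                   min_gain, max_gain)
    # The documented EGDD update rule.
    momentum = mu * momentum + learning_rate * gain * grad
    var[:] = var - lr_scale * momentum

for _ in range(200):
    egdd_step(2.0 * var)   # gradient of sum(var ** 2)
print(var)                 # converges toward [0.0, 0.0]
```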
- _create_slots(var_list)[source]
Create all slots needed by the variables.
- Parameters
var_list – A list of Variable objects.
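As a sketch of what slot creation could look like here: given the constructor arguments, one would expect a zero-initialized momentum buffer plus gain and lr_scale slots seeded from initial_gain and initial_scale. The helpers _zeros_slot and _get_or_make_slot are the standard tf.compat.v1.train.Optimizer slot utilities; the slot names and the self._initial_* attributes are assumptions. This is a method-body sketch meant to live inside the class, not a verbatim copy of the implementation.

```python
import tensorflow.compat.v1 as tf

def _create_slots(self, var_list):
    # Hypothetical sketch; actual slot names and shapes may differ.
    for v in var_list:
        self._zeros_slot(v, "momentum", self._name)      # momentum buffer
        self._get_or_make_slot(                          # per-weight gain
            v, tf.ones_like(v) * self._initial_gain, "gain", self._name)
        self._get_or_make_slot(                          # shared lr scale
            v, tf.constant(self._initial_scale), "lr_scale", self._name)
```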
- _prepare()[source]
Create all needed tensors before applying gradients.
This is called with the name_scope using the “name” that users have chosen for the application of gradients.
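For context, _prepare in tf.compat.v1.train.Optimizer subclasses conventionally materializes Python-number hyperparameters as tensors once, so the per-variable apply methods can consume them. A sketch under that assumption (the attribute names are hypothetical):

```python
def _prepare(self):
    # Hypothetical sketch: convert hyperparameters to tensors up front.
    self._learning_rate_tensor = tf.convert_to_tensor(
        self._learning_rate, name="learning_rate")
    self._momentum_tensor = tf.convert_to_tensor(
        self._momentum, name="momentum")
    self._gain_learning_rate_tensor = tf.convert_to_tensor(
        self._gain_learning_rate, name="gain_learning_rate")
```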
- _apply_dense(grad, var)[source]
Add ops to apply dense gradients to var.
- Parameters
grad – A Tensor.
var – A Variable object.
- Returns
An Operation.
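In practice the _apply_* methods are not called directly; they are reached through minimize or apply_gradients on the optimizer. A minimal graph-mode usage sketch (hyperparameter values are arbitrary; whether _apply_dense or _resource_apply_dense is hit depends on the variable type):

```python
import tensorflow.compat.v1 as tf
from lingvo.core import egdd

tf.disable_eager_execution()

w = tf.get_variable("w", shape=[4], initializer=tf.ones_initializer())
loss = tf.reduce_sum(tf.square(w))
opt = egdd.EGDD(learning_rate=0.1, momentum=0.9)
train_op = opt.minimize(loss)  # dense gradient -> a dense apply method

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run(w))  # driven toward zeros
```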
- _resource_apply_dense(grad, var)[source]
Add ops to apply dense gradients to the variable handle.
- Parameters
grad – A Tensor representing the gradient.
var – A Tensor of dtype resource which points to the variable to be updated.
- Returns
An Operation which updates the value of the variable.
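With TF2-style resource variables, the same routing can be exercised through apply_gradients; assuming the optimizer works under eager execution, a dense gradient on a resource variable lands in _resource_apply_dense:

```python
import tensorflow as tf
from lingvo.core import egdd

v = tf.Variable([2.0, -3.0])  # resource variable
opt = egdd.EGDD(learning_rate=0.1, momentum=0.9)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.square(v))
grads = tape.gradient(loss, [v])
opt.apply_gradients(zip(grads, [v]))  # -> _resource_apply_dense
```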
- _resource_apply_sparse(grad_values, var, grad_indices)[source]
Add ops to apply sparse gradients to the variable handle.
Similar to _apply_sparse, the grad_indices argument to this method has been de-duplicated. Optimizers that deal correctly with non-unique indices may instead override _resource_apply_sparse_duplicate_indices to avoid this overhead.
- Parameters
grad_values – A Tensor representing the gradient for the affected indices.
var – A Tensor of dtype resource which points to the variable to be updated.
grad_indices – A Tensor of integral type representing the indices for which the gradient is nonzero. Indices are unique.
- Returns
An Operation which updates the value of the variable.
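The de-duplication mentioned above amounts to summing gradient rows that share an index before the sparse apply method runs. A small standalone illustration of that pre-processing:

```python
import tensorflow as tf

values = tf.constant([[1.0], [2.0], [3.0]])  # gradient rows
indices = tf.constant([0, 2, 0])             # index 0 repeats
unique_indices, positions = tf.unique(indices)
summed = tf.math.unsorted_segment_sum(
    values, positions, tf.size(unique_indices))
# unique_indices == [0, 2]; summed == [[4.0], [3.0]]
# The sparse apply method then sees the summed rows and unique indices.
```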
- _apply_sparse(grad, var)[source]
Add ops to apply sparse gradients to var.
The IndexedSlices object passed to grad in this function is by default pre-processed in _apply_sparse_duplicate_indices to remove duplicate indices (see its docstring for details). Optimizers that can tolerate or have correct special cases for duplicate sparse indices may override _apply_sparse_duplicate_indices instead of this function, avoiding that overhead.
- Parameters
grad – An IndexedSlices with no repeated indices.
var – A Variable object.
- Returns
An Operation.
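Sparse gradients typically arise from gather or embedding lookups. A graph-mode sketch that produces an IndexedSlices gradient with a repeated index; duplicates are summed by the base class before the sparse apply method runs, and which sparse method is hit depends on the variable type:

```python
import tensorflow.compat.v1 as tf
from lingvo.core import egdd

tf.disable_eager_execution()

emb = tf.get_variable("emb", shape=[100, 8])
ids = tf.constant([3, 7, 7])  # index 7 repeats
loss = tf.reduce_sum(tf.nn.embedding_lookup(emb, ids))
opt = egdd.EGDD(learning_rate=0.1, momentum=0.9)
# The gradient w.r.t. emb is an IndexedSlices; rows for index 7 are summed
# during de-duplication before the sparse apply method is invoked.
train_op = opt.minimize(loss)
```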