Class TrainOps

java.lang.Object
org.tensorflow.op.TrainOps

public final class TrainOps extends Object
An API for building train operations as Ops.
  • Method Details

    • accumulatorApplyGradient

      public AccumulatorApplyGradient accumulatorApplyGradient(Operand<TString> handle, Operand<TInt64> localStep, Operand<? extends TType> gradient)
      Applies a gradient to a given accumulator. Does not add if local_step is less than the accumulator's global_step.
      Parameters:
      handle - The handle to an accumulator.
      localStep - The local_step value at which the gradient was computed.
      gradient - A tensor of the gradient to be accumulated.
      Returns:
      a new instance of AccumulatorApplyGradient
    • accumulatorNumAccumulated

      public AccumulatorNumAccumulated accumulatorNumAccumulated(Operand<TString> handle)
      Returns the number of gradients aggregated in the given accumulators.
      Parameters:
      handle - The handle to an accumulator.
      Returns:
      a new instance of AccumulatorNumAccumulated
    • accumulatorSetGlobalStep

      public AccumulatorSetGlobalStep accumulatorSetGlobalStep(Operand<TString> handle, Operand<TInt64> newGlobalStep)
      Updates the accumulator with a new value for global_step. Logs a warning if the accumulator's value is already higher than new_global_step.
      Parameters:
      handle - The handle to an accumulator.
      newGlobalStep - The new global_step value to set.
      Returns:
      a new instance of AccumulatorSetGlobalStep
    • accumulatorTakeGradient

      public <T extends TType> AccumulatorTakeGradient<T> accumulatorTakeGradient(Operand<TString> handle, Operand<TInt32> numRequired, Class<T> dtype)
      Extracts the average gradient in the given ConditionalAccumulator. The op blocks until sufficient (i.e., more than num_required) gradients have been accumulated. If the accumulator has already aggregated more than num_required gradients, it returns the average of the accumulated gradients. Also automatically increments the recorded global_step in the accumulator by 1, and resets the aggregate to 0.
      Type Parameters:
      T - data type for AccumulatorTakeGradient output and operands
      Parameters:
      handle - The handle to an accumulator.
      numRequired - Number of gradients required before we return an aggregate.
      dtype - The data type of accumulated gradients. Needs to correspond to the type of the accumulator.
      Returns:
      a new instance of AccumulatorTakeGradient
    • applyAdaMax

      public <T extends TType> ApplyAdaMax<T> applyAdaMax(Operand<T> var, Operand<T> m, Operand<T> v, Operand<T> beta1Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ApplyAdaMax.Options... options)
      Update '*var' according to the AdaMax algorithm.

       m_t <- beta1 * m_{t-1} + (1 - beta1) * g
       v_t <- max(beta2 * v_{t-1}, abs(g))
       variable <- variable - learning_rate / (1 - beta1^t) * m_t / (v_t + epsilon)
      Type Parameters:
      T - data type for ApplyAdaMax output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      v - Should be from a Variable().
      beta1Power - Must be a scalar.
      lr - Scaling factor. Must be a scalar.
      beta1 - Momentum factor. Must be a scalar.
      beta2 - Momentum factor. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyAdaMax
    • applyAdadelta

      public <T extends TType> ApplyAdadelta<T> applyAdadelta(Operand<T> var, Operand<T> accum, Operand<T> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, ApplyAdadelta.Options... options)
      Update '*var' according to the adadelta scheme.

       accum = rho() * accum + (1 - rho()) * grad.square();
       update = (update_accum + epsilon).sqrt() * (accum + epsilon()).rsqrt() * grad;
       update_accum = rho() * update_accum + (1 - rho()) * update.square();
       var -= update;
      Type Parameters:
      T - data type for ApplyAdadelta output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      accumUpdate - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay factor. Must be a scalar.
      epsilon - Constant factor. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyAdadelta
    • applyAdagrad

      public <T extends TType> ApplyAdagrad<T> applyAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, ApplyAdagrad.Options... options)
      Update '*var' according to the adagrad scheme.

       accum += grad * grad
       var -= lr * grad * (1 / sqrt(accum))
      Type Parameters:
      T - data type for ApplyAdagrad output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyAdagrad
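
      For example, a minimal sketch of wiring this update into a graph. The surrounding setup (Graph, Ops.create, tf.variable, tf.constant and the package locations in the imports) is ordinary TensorFlow Java usage assumed for illustration; the shapes, values, and class name are not part of this method's contract, and running the op in a session is omitted.

       import org.tensorflow.Graph;
       import org.tensorflow.Operand;
       import org.tensorflow.ndarray.Shape;
       import org.tensorflow.op.Ops;
       import org.tensorflow.op.core.Variable;
       import org.tensorflow.op.train.ApplyAdagrad;
       import org.tensorflow.types.TFloat32;

       public final class AdagradSketch {
         public static void main(String[] args) {
           try (Graph graph = new Graph()) {
             Ops tf = Ops.create(graph);

             // Ref variables holding the parameters and the Adagrad accumulator.
             Variable<TFloat32> var = tf.variable(Shape.of(3), TFloat32.class);
             Variable<TFloat32> accum = tf.variable(Shape.of(3), TFloat32.class);

             // Illustrative gradient and scalar learning rate.
             Operand<TFloat32> grad = tf.constant(new float[] {0.1f, -0.2f, 0.3f});
             Operand<TFloat32> lr = tf.constant(0.01f);

             // Builds the op: accum += grad * grad; var -= lr * grad * (1 / sqrt(accum)).
             ApplyAdagrad<TFloat32> update = tf.train.applyAdagrad(var, accum, lr, grad);
           }
         }
       }
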
    • applyAdagradDa

      public <T extends TType> ApplyAdagradDa<T> applyAdagradDa(Operand<T> var, Operand<T> gradientAccumulator, Operand<T> gradientSquaredAccumulator, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ApplyAdagradDa.Options... options)
      Update '*var' according to the proximal adagrad scheme.
      Type Parameters:
      T - data type for ApplyAdagradDA output and operands
      Parameters:
      var - Should be from a Variable().
      gradientAccumulator - Should be from a Variable().
      gradientSquaredAccumulator - Should be from a Variable().
      grad - The gradient.
      lr - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      globalStep - Training step number. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyAdagradDa
    • applyAdagradV2

      public <T extends TType> ApplyAdagradV2<T> applyAdagradV2(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, ApplyAdagradV2.Options... options)
      Update '*var' according to the adagrad scheme.

       accum += grad * grad
       var -= lr * grad * (1 / sqrt(accum))
      Type Parameters:
      T - data type for ApplyAdagradV2 output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      epsilon - Constant factor. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyAdagradV2
    • applyAdam

      public <T extends TType> ApplyAdam<T> applyAdam(Operand<T> var, Operand<T> m, Operand<T> v, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ApplyAdam.Options... options)
      Update '*var' according to the Adam algorithm.

       $$\text{lr}_t := \mathrm{lr} \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t}$$
       $$m_t := \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g$$
       $$v_t := \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g^2$$
       $$\text{var} := \begin{cases} \text{var} - (m_t \beta_1 + g \cdot (1 - \beta_1)) \cdot \text{lr}_t / (\sqrt{v_t} + \epsilon), & \text{if use\_nesterov} \\ \text{var} - m_t \cdot \text{lr}_t / (\sqrt{v_t} + \epsilon), & \text{otherwise} \end{cases}$$
      Type Parameters:
      T - data type for ApplyAdam output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      v - Should be from a Variable().
      beta1Power - Must be a scalar.
      beta2Power - Must be a scalar.
      lr - Scaling factor. Must be a scalar.
      beta1 - Momentum factor. Must be a scalar.
      beta2 - Momentum factor. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyAdam
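
      As a sketch (reusing the Graph/Ops setup and imports from the applyAdagrad example above; hyperparameter values are illustrative, and beta1Power/beta2Power stand for beta1^t and beta2^t at the current step):

       Variable<TFloat32> var = tf.variable(Shape.of(3), TFloat32.class);
       Variable<TFloat32> m = tf.variable(Shape.of(3), TFloat32.class);
       Variable<TFloat32> v = tf.variable(Shape.of(3), TFloat32.class);
       Operand<TFloat32> grad = tf.constant(new float[] {0.1f, -0.2f, 0.3f});

       Operand<TFloat32> lr = tf.constant(0.001f);
       Operand<TFloat32> beta1 = tf.constant(0.9f);
       Operand<TFloat32> beta2 = tf.constant(0.999f);
       Operand<TFloat32> epsilon = tf.constant(1e-8f);
       Operand<TFloat32> beta1Power = tf.constant(0.9f);   // beta1^t for the current step t
       Operand<TFloat32> beta2Power = tf.constant(0.999f); // beta2^t for the current step t

       ApplyAdam<TFloat32> update = tf.train.applyAdam(
           var, m, v, beta1Power, beta2Power, lr, beta1, beta2, epsilon, grad);
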
    • applyAddSign

      public <T extends TType> ApplyAddSign<T> applyAddSign(Operand<T> var, Operand<T> m, Operand<T> lr, Operand<T> alpha, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ApplyAddSign.Options... options)
      Update '*var' according to the AddSign update.

       m_t <- beta1 * m_{t-1} + (1 - beta1) * g
       update <- (alpha + sign_decay * sign(g) * sign(m)) * g
       variable <- variable - lr_t * update
      Type Parameters:
      T - data type for ApplyAddSign output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      alpha - Must be a scalar.
      signDecay - Must be a scalar.
      beta - Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyAddSign
    • applyCenteredRmsProp

      public <T extends TType> ApplyCenteredRmsProp<T> applyCenteredRmsProp(Operand<T> var, Operand<T> mg, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ApplyCenteredRmsProp.Options... options)
      Update '*var' according to the centered RMSProp algorithm. The centered RMSProp algorithm uses an estimate of the centered second moment (i.e., the variance) for normalization, as opposed to regular RMSProp, which uses the (uncentered) second moment. This often helps with training, but is slightly more expensive in terms of computation and memory.

      Note that in dense implementation of this algorithm, mg, ms, and mom will update even if the grad is zero, but in this sparse implementation, mg, ms, and mom will not update in iterations during which the grad is zero.

       mean_square = decay * mean_square + (1-decay) * gradient ** 2
       mean_grad = decay * mean_grad + (1-decay) * gradient
       Delta = learning_rate * gradient / sqrt(mean_square + epsilon - mean_grad ** 2)

       mg <- rho * mg_{t-1} + (1-rho) * grad
       ms <- rho * ms_{t-1} + (1-rho) * grad * grad
       mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms - mg * mg + epsilon)
       var <- var - mom

      Type Parameters:
      T - data type for ApplyCenteredRMSProp output and operands
      Parameters:
      var - Should be from a Variable().
      mg - Should be from a Variable().
      ms - Should be from a Variable().
      mom - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay rate. Must be a scalar.
      momentum - Momentum Scale. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyCenteredRmsProp
    • applyFtrl

      public <T extends TType> ApplyFtrl<T> applyFtrl(Operand<T> var, Operand<T> accum, Operand<T> linear, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ApplyFtrl.Options... options)
      Update '*var' according to the Ftrl-proximal scheme.

       grad_with_shrinkage = grad + 2 * l2_shrinkage * var
       accum_new = accum + grad * grad
       linear += grad_with_shrinkage - (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
       quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2
       var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0
       accum = accum_new
      Type Parameters:
      T - data type for ApplyFtrlV2 output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      linear - Should be from a Variable().
      grad - The gradient.
      lr - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 shrinkage regularization. Must be a scalar.
      l2Shrinkage - The l2Shrinkage value
      lrPower - Scaling factor. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyFtrl
    • applyGradientDescent

      public <T extends TType> ApplyGradientDescent<T> applyGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> delta, ApplyGradientDescent.Options... options)
      Update '*var' by subtracting 'alpha' * 'delta' from it.
      Type Parameters:
      T - data type for ApplyGradientDescent output and operands
      Parameters:
      var - Should be from a Variable().
      alpha - Scaling factor. Must be a scalar.
      delta - The change.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyGradientDescent
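
      A minimal sketch (same assumed setup as above): build the op that performs var -= alpha * delta.

       Variable<TFloat32> var = tf.variable(Shape.of(2), TFloat32.class);
       Operand<TFloat32> alpha = tf.constant(0.1f);                      // scalar step size
       Operand<TFloat32> delta = tf.constant(new float[] {0.5f, -1.0f}); // the change, scaled by alpha
       ApplyGradientDescent<TFloat32> step = tf.train.applyGradientDescent(var, alpha, delta);
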
    • applyMomentum

      public <T extends TType> ApplyMomentum<T> applyMomentum(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ApplyMomentum.Options... options)
      Update '*var' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.

       accum = accum * momentum + grad
       var -= lr * accum

      Type Parameters:
      T - data type for ApplyMomentum output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      grad - The gradient.
      momentum - Momentum. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyMomentum
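
      A sketch under the same assumed setup; the ApplyMomentum.useNesterov option mentioned in the comment is assumed to be the generated Options accessor for the use_nesterov attribute.

       Variable<TFloat32> var = tf.variable(Shape.of(2), TFloat32.class);
       Variable<TFloat32> accum = tf.variable(Shape.of(2), TFloat32.class);
       Operand<TFloat32> lr = tf.constant(0.01f);
       Operand<TFloat32> momentum = tf.constant(0.9f);
       Operand<TFloat32> grad = tf.constant(new float[] {0.2f, -0.4f});

       // accum = accum * momentum + grad; var -= lr * accum
       // (ApplyMomentum.useNesterov(true) could be passed as a trailing option for Nesterov updates.)
       ApplyMomentum<TFloat32> step = tf.train.applyMomentum(var, accum, lr, grad, momentum);
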
    • applyPowerSign

      public <T extends TType> ApplyPowerSign<T> applyPowerSign(Operand<T> var, Operand<T> m, Operand<T> lr, Operand<T> logbase, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ApplyPowerSign.Options... options)
      Update '*var' according to the PowerSign update.

       m_t <- beta1 * m_{t-1} + (1 - beta1) * g
       update <- exp(logbase * sign_decay * sign(g) * sign(m_t)) * g
       variable <- variable - lr_t * update
      Type Parameters:
      T - data type for ApplyPowerSign output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      logbase - Must be a scalar.
      signDecay - Must be a scalar.
      beta - Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyPowerSign
    • applyProximalAdagrad

      public <T extends TType> ApplyProximalAdagrad<T> applyProximalAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, ApplyProximalAdagrad.Options... options)
      Update '*var' and '*accum' according to FOBOS with Adagrad learning rate.

       accum += grad * grad
       prox_v = var - lr * grad * (1 / sqrt(accum))
       var = sign(prox_v) / (1 + lr * l2) * max{|prox_v| - lr * l1, 0}
      Type Parameters:
      T - data type for ApplyProximalAdagrad output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyProximalAdagrad
    • applyProximalGradientDescent

      public <T extends TType> ApplyProximalGradientDescent<T> applyProximalGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> delta, ApplyProximalGradientDescent.Options... options)
      Update '*var' as FOBOS algorithm with fixed learning rate.

       prox_v = var - alpha * delta
       var = sign(prox_v) / (1 + alpha * l2) * max{|prox_v| - alpha * l1, 0}
      Type Parameters:
      T - data type for ApplyProximalGradientDescent output and operands
      Parameters:
      var - Should be from a Variable().
      alpha - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      delta - The change.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyProximalGradientDescent
    • applyRmsProp

      public <T extends TType> ApplyRmsProp<T> applyRmsProp(Operand<T> var, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ApplyRmsProp.Options... options)
      Update '*var' according to the RMSProp algorithm. Note that in dense implementation of this algorithm, ms and mom will update even if the grad is zero, but in this sparse implementation, ms and mom will not update in iterations during which the grad is zero.

       mean_square = decay * mean_square + (1-decay) * gradient ** 2
       Delta = learning_rate * gradient / sqrt(mean_square + epsilon)

       ms <- rho * ms_{t-1} + (1-rho) * grad * grad
       mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
       var <- var - mom

      Type Parameters:
      T - data type for ApplyRMSProp output and operands
      Parameters:
      var - Should be from a Variable().
      ms - Should be from a Variable().
      mom - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay rate. Must be a scalar.
      momentum - The momentum value
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ApplyRmsProp
    • batchMatMul

      public <V extends TType> BatchMatMul<V> batchMatMul(Operand<? extends TType> x, Operand<? extends TType> y, Class<V> Tout, BatchMatMul.Options... options)
      Multiplies slices of two tensors in batches. Multiplies all slices of Tensor x and y (each slice can be viewed as an element of a batch), and arranges the individual results in a single output tensor of the same batch size. Each of the individual slices can optionally be adjointed (to adjoint a matrix means to transpose and conjugate it) before multiplication by setting the adj_x or adj_y flag to True; both default to False.

      The input tensors x and y are 2-D or higher with shape [..., r_x, c_x] and [..., r_y, c_y].

      The output tensor is 2-D or higher with shape [..., r_o, c_o], where:

       r_o = c_x if adj_x else r_x
       c_o = r_y if adj_y else c_y
       

      It is computed as:

       output[..., :, :] = matrix(x[..., :, :]) * matrix(y[..., :, :])
       

      NOTE: train.BatchMatMul supports broadcasting in the batch dimensions.

      Type Parameters:
      V - data type for BatchMatMulV3 output and operands
      Parameters:
      x - 2-D or higher with shape [..., r_x, c_x].
      y - 2-D or higher with shape [..., r_y, c_y].
      Tout - If not specified, Tout is the same as the input type.
      options - carries optional attribute values
      Returns:
      a new instance of BatchMatMul
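
      For example, a sketch of a batched product of two stacks of matrices under the same assumed setup (the multi-dimensional tf.constant overload is assumed; the values are all zeros purely for shape illustration):

       Operand<TFloat32> x = tf.constant(new float[2][2][3]); // batch of 2 matrices, each 2x3
       Operand<TFloat32> y = tf.constant(new float[2][3][4]); // batch of 2 matrices, each 3x4
       // Result shape: [2, 2, 4]. BatchMatMul.adjX(true) / adjY(true) could adjoint a side first.
       BatchMatMul<TFloat32> product = tf.train.batchMatMul(x, y, TFloat32.class);
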
    • computeBatchSize

      public ComputeBatchSize computeBatchSize(Operand<? extends TType> inputDataset)
      Computes the static batch size of a dataset sans partial batches.
      Parameters:
      inputDataset - The inputDataset value
      Returns:
      a new instance of ComputeBatchSize
    • conditionalAccumulator

      public <T extends TType> ConditionalAccumulator conditionalAccumulator(Class<T> dtype, Shape shape, ConditionalAccumulator.Options... options)
      A conditional accumulator for aggregating gradients. The accumulator accepts gradients marked with a local_step greater than or equal to the most recent global_step known to the accumulator. The average can be extracted from the accumulator, provided sufficient gradients have been accumulated. Extracting the average automatically resets the aggregate to 0, and increments the global_step recorded by the accumulator.
      Type Parameters:
      T - data type for ConditionalAccumulator output and operands
      Parameters:
      dtype - The type of the value being accumulated.
      shape - The shape of the values, can be [], in which case shape is unknown.
      options - carries optional attribute values
      Returns:
      a new instance of ConditionalAccumulator
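
      A sketch of the full accumulator workflow together with the accumulatorApplyGradient, accumulatorNumAccumulated and accumulatorTakeGradient methods above, assuming the same setup as earlier and that the ConditionalAccumulator handle can be passed directly wherever an Operand<TString> handle is expected:

       // A float accumulator for gradients of shape [3].
       ConditionalAccumulator acc = tf.train.conditionalAccumulator(TFloat32.class, Shape.of(3));

       // Workers apply gradients tagged with the local_step at which they were computed.
       Operand<TFloat32> grad = tf.constant(new float[] {0.1f, 0.2f, 0.3f});
       tf.train.accumulatorApplyGradient(acc, tf.constant(1L), grad);

       // The chief can inspect progress and take the averaged gradient once enough have arrived.
       AccumulatorNumAccumulated count = tf.train.accumulatorNumAccumulated(acc);
       AccumulatorTakeGradient<TFloat32> avg =
           tf.train.accumulatorTakeGradient(acc, tf.constant(1), TFloat32.class);
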
    • distributedSave

      public DistributedSave distributedSave(Operand<? extends TType> dataset, Operand<TString> directory, Operand<TString> address, DistributedSave.Options... options)
      The DistributedSave operation
      Parameters:
      dataset - The dataset value
      directory - The directory value
      address - The address value
      options - carries optional attribute values
      Returns:
      a new instance of DistributedSave
    • generateVocabRemapping

      public GenerateVocabRemapping generateVocabRemapping(Operand<TString> newVocabFile, Operand<TString> oldVocabFile, Long newVocabOffset, Long numNewVocab, GenerateVocabRemapping.Options... options)
      Given a path to new and old vocabulary files, returns a remapping Tensor of length num_new_vocab, where remapping[i] contains the row number in the old vocabulary that corresponds to row i in the new vocabulary (starting at line new_vocab_offset and up to num_new_vocab entities), or -1 if entry i in the new vocabulary is not in the old vocabulary. The old vocabulary is constrained to the first old_vocab_size entries if old_vocab_size is not the default value of -1.

      num_vocab_offset enables use in the partitioned variable case, and should generally be set by examining partitioning info. Each file should be a text file, with each line containing a single entity within the vocabulary.

      For example, with new_vocab_file a text file containing each of the following elements on a single line: [f0, f1, f2, f3], old_vocab_file = [f1, f0, f3], num_new_vocab = 3, new_vocab_offset = 1, the returned remapping would be [0, -1, 2].

      The op also returns a count of how many entries in the new vocabulary were present in the old vocabulary, which is used to calculate the number of values to initialize in a weight matrix remapping.

      This functionality can be used to remap both row vocabularies (typically, features) and column vocabularies (typically, classes) from TensorFlow checkpoints. Note that the partitioning logic relies on contiguous vocabularies corresponding to div-partitioned variables. Moreover, the underlying remapping uses an IndexTable (as opposed to an inexact CuckooTable), so client code should use the corresponding index_table_from_file() as the FeatureColumn framework does (as opposed to tf.feature_to_id(), which uses a CuckooTable).

      Parameters:
      newVocabFile - Path to the new vocab file.
      oldVocabFile - Path to the old vocab file.
      newVocabOffset - How many entries into the new vocab file to start reading.
      numNewVocab - Number of entries in the new vocab file to remap.
      options - carries optional attribute values
      Returns:
      a new instance of GenerateVocabRemapping
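
      A sketch under the same assumed setup; the file paths are placeholders, and remapping()/numPresent() are assumed to be the generated accessors for the op's two outputs:

       Operand<TString> newVocab = tf.constant("/tmp/vocab_new.txt");
       Operand<TString> oldVocab = tf.constant("/tmp/vocab_old.txt");
       // Remap 3 new-vocab rows, starting at line 1 of the new vocab file.
       GenerateVocabRemapping remap =
           tf.train.generateVocabRemapping(newVocab, oldVocab, 1L, 3L);
       // remap.remapping(): old-row index (or -1) for each new row.
       // remap.numPresent(): how many new entries were found in the old vocab.
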
    • mergeV2Checkpoints

      public MergeV2Checkpoints mergeV2Checkpoints(Operand<TString> checkpointPrefixes, Operand<TString> destinationPrefix, MergeV2Checkpoints.Options... options)
      V2 format specific: merges the metadata files of sharded checkpoints. The result is one logical checkpoint, with one physical metadata file and renamed data files.

      Intended for "grouping" multiple checkpoints in a sharded checkpoint setup.

      If delete_old_dirs is true, attempts to delete recursively the dirname of each path in the input checkpoint_prefixes. This is useful when those paths are non user-facing temporary locations.

      If allow_missing_files is true, merges the checkpoint prefixes as long as at least one file exists. Otherwise, if no files exist, an error will be thrown. The default value for allow_missing_files is false.

      Parameters:
      checkpointPrefixes - prefixes of V2 checkpoints to merge.
      destinationPrefix - scalar. The desired final prefix. Allowed to be the same as one of the checkpoint_prefixes.
      options - carries optional attribute values
      Returns:
      a new instance of MergeV2Checkpoints
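
      A sketch under the same assumed setup (tf.array is assumed here for building the 1-D string constant of prefixes; paths are placeholders):

       Operand<TString> checkpointPrefixes = tf.array("/tmp/ckpt-part-0", "/tmp/ckpt-part-1");
       Operand<TString> destinationPrefix = tf.constant("/tmp/ckpt-merged");
       // MergeV2Checkpoints.deleteOldDirs(false) could be passed to keep the source directories.
       MergeV2Checkpoints merge = tf.train.mergeV2Checkpoints(checkpointPrefixes, destinationPrefix);
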
    • negTrain

      public NegTrain negTrain(Operand<TFloat32> wIn, Operand<TFloat32> wOut, Operand<TInt32> examples, Operand<TInt32> labels, Operand<TFloat32> lr, List<Long> vocabCount, Long numNegativeSamples)
      Training via negative sampling.
      Parameters:
      wIn - input word embedding.
      wOut - output word embedding.
      examples - A vector of word ids.
      labels - A vector of word ids.
      lr - The lr value
      vocabCount - Count of words in the vocabulary.
      numNegativeSamples - Number of negative samples per example.
      Returns:
      a new instance of NegTrain
    • preventGradient

      public <T extends TType> PreventGradient<T> preventGradient(Operand<T> input, PreventGradient.Options... options)
      An identity op that triggers an error if a gradient is requested. When executed in a graph, this op outputs its input tensor as-is.

      When building ops to compute gradients, the TensorFlow gradient system will return an error when trying to look up the gradient of this op, because no gradient must ever be registered for this function. This op exists to prevent subtle bugs from silently returning unimplemented gradients in some corner cases.

      Type Parameters:
      T - data type for PreventGradient output and operands
      Parameters:
      input - any tensor.
      options - carries optional attribute values
      Returns:
      a new instance of PreventGradient
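
      A sketch under the same assumed setup: the tensor passes through unchanged, but any attempt to differentiate through it fails at gradient-construction time.

       Operand<TFloat32> features = tf.constant(new float[] {1f, 2f, 3f});
       // PreventGradient.message("...") could be supplied to customize the error text.
       PreventGradient<TFloat32> guarded = tf.train.preventGradient(features);
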
    • resourceAccumulatorApplyGradient

      public ResourceAccumulatorApplyGradient resourceAccumulatorApplyGradient(Operand<? extends TType> handle, Operand<TInt64> localStep, Operand<? extends TType> gradient)
      Applies a gradient to a given accumulator. Does not add if local_step is less than the accumulator's global_step.
      Parameters:
      handle - The handle to an accumulator.
      localStep - The local_step value at which the gradient was computed.
      gradient - A tensor of the gradient to be accumulated.
      Returns:
      a new instance of ResourceAccumulatorApplyGradient
    • resourceAccumulatorNumAccumulated

      public ResourceAccumulatorNumAccumulated resourceAccumulatorNumAccumulated(Operand<? extends TType> handle)
      Returns the number of gradients aggregated in the given accumulators.
      Parameters:
      handle - The handle to an accumulator.
      Returns:
      a new instance of ResourceAccumulatorNumAccumulated
    • resourceAccumulatorSetGlobalStep

      public ResourceAccumulatorSetGlobalStep resourceAccumulatorSetGlobalStep(Operand<? extends TType> handle, Operand<TInt64> newGlobalStep)
      Updates the accumulator with a new value for global_step. Logs a warning if the accumulator's value is already higher than new_global_step.
      Parameters:
      handle - The handle to an accumulator.
      newGlobalStep - The new global_step value to set.
      Returns:
      a new instance of ResourceAccumulatorSetGlobalStep
    • resourceAccumulatorTakeGradient

      public <T extends TType> ResourceAccumulatorTakeGradient<T> resourceAccumulatorTakeGradient(Operand<? extends TType> handle, Operand<TInt32> numRequired, Class<T> dtype)
      Extracts the average gradient in the given ConditionalAccumulator. The op blocks until sufficient (i.e., more than num_required) gradients have been accumulated. If the accumulator has already aggregated more than num_required gradients, it returns the average of the accumulated gradients. Also automatically increments the recorded global_step in the accumulator by 1, and resets the aggregate to 0.
      Type Parameters:
      T - data type for ResourceAccumulatorTakeGradient output and operands
      Parameters:
      handle - The handle to an accumulator.
      numRequired - Number of gradients required before we return an aggregate.
      dtype - The data type of accumulated gradients. Needs to correspond to the type of the accumulator.
      Returns:
      a new instance of ResourceAccumulatorTakeGradient
    • resourceApplyAdaMax

      public <T extends TType> ResourceApplyAdaMax resourceApplyAdaMax(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<T> beta1Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdaMax.Options... options)
      Update '*var' according to the AdaMax algorithm.

       m_t <- beta1 * m_{t-1} + (1 - beta1) * g
       v_t <- max(beta2 * v_{t-1}, abs(g))
       variable <- variable - learning_rate / (1 - beta1^t) * m_t / (v_t + epsilon)
      Type Parameters:
      T - data type for ResourceApplyAdaMax output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      v - Should be from a Variable().
      beta1Power - Must be a scalar.
      lr - Scaling factor. Must be a scalar.
      beta1 - Momentum factor. Must be a scalar.
      beta2 - Momentum factor. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyAdaMax
    • resourceApplyAdadelta

      public <T extends TType> ResourceApplyAdadelta resourceApplyAdadelta(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdadelta.Options... options)
      Update '*var' according to the adadelta scheme.

       accum = rho() * accum + (1 - rho()) * grad.square();
       update = (update_accum + epsilon).sqrt() * (accum + epsilon()).rsqrt() * grad;
       update_accum = rho() * update_accum + (1 - rho()) * update.square();
       var -= update;
      Type Parameters:
      T - data type for ResourceApplyAdadelta output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      accumUpdate - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay factor. Must be a scalar.
      epsilon - Constant factor. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyAdadelta
    • resourceApplyAdagrad

      public <T extends TType> ResourceApplyAdagrad resourceApplyAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdagrad.Options... options)
      Update '*var' according to the adagrad scheme.

       accum += grad * grad
       var -= lr * grad * (1 / (sqrt(accum) + epsilon))
      Type Parameters:
      T - data type for ResourceApplyAdagradV2 output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      epsilon - Constant factor. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyAdagrad
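
      A sketch of the resource-variable variant, assuming the same setup as earlier plus the tf.varHandleOp and tf.assignVariableOp helpers for creating and initializing resource handles (shapes and values are illustrative):

       VarHandleOp var = tf.varHandleOp(TFloat32.class, Shape.of(3));
       VarHandleOp accum = tf.varHandleOp(TFloat32.class, Shape.of(3));
       tf.assignVariableOp(var, tf.constant(new float[] {0f, 0f, 0f}));
       tf.assignVariableOp(accum, tf.constant(new float[] {0.1f, 0.1f, 0.1f}));

       Operand<TFloat32> lr = tf.constant(0.01f);
       Operand<TFloat32> epsilon = tf.constant(1e-7f);
       Operand<TFloat32> grad = tf.constant(new float[] {0.1f, -0.2f, 0.3f});
       ResourceApplyAdagrad update = tf.train.resourceApplyAdagrad(var, accum, lr, epsilon, grad);
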
    • resourceApplyAdagradDa

      public <T extends TType> ResourceApplyAdagradDa resourceApplyAdagradDa(Operand<? extends TType> var, Operand<? extends TType> gradientAccumulator, Operand<? extends TType> gradientSquaredAccumulator, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ResourceApplyAdagradDa.Options... options)
      Update '*var' according to the proximal adagrad scheme.
      Type Parameters:
      T - data type for ResourceApplyAdagradDA output and operands
      Parameters:
      var - Should be from a Variable().
      gradientAccumulator - Should be from a Variable().
      gradientSquaredAccumulator - Should be from a Variable().
      grad - The gradient.
      lr - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      globalStep - Training step number. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyAdagradDa
    • resourceApplyAdam

      public <T extends TType> ResourceApplyAdam resourceApplyAdam(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdam.Options... options)
      Update '*var' according to the Adam algorithm.

       $$\text{lr}_t := \mathrm{lr} \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t}$$
       $$m_t := \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g$$
       $$v_t := \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g^2$$
       $$\text{var} := \begin{cases} \text{var} - (m_t \beta_1 + g \cdot (1 - \beta_1)) \cdot \text{lr}_t / (\sqrt{v_t} + \epsilon), & \text{if use\_nesterov} \\ \text{var} - m_t \cdot \text{lr}_t / (\sqrt{v_t} + \epsilon), & \text{otherwise} \end{cases}$$
      Type Parameters:
      T - data type for ResourceApplyAdam output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      v - Should be from a Variable().
      beta1Power - Must be a scalar.
      beta2Power - Must be a scalar.
      lr - Scaling factor. Must be a scalar.
      beta1 - Momentum factor. Must be a scalar.
      beta2 - Momentum factor. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyAdam
    • resourceApplyAdamWithAmsgrad

      public <T extends TType> ResourceApplyAdamWithAmsgrad resourceApplyAdamWithAmsgrad(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<? extends TType> vhat, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdamWithAmsgrad.Options... options)
      Update '*var' according to the Adam algorithm.

       $$\text{lr}_t := \mathrm{learning\_rate} \cdot \sqrt{1 - \beta_2^t} / (1 - \beta_1^t)$$
       $$m_t := \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g$$
       $$v_t := \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g \cdot g$$
       $$\hat{v}_t := \max(\hat{v}_{t-1}, v_t)$$
       $$\text{variable} := \text{variable} - \text{lr}_t \cdot m_t / (\sqrt{\hat{v}_t} + \epsilon)$$
      Type Parameters:
      T - data type for ResourceApplyAdamWithAmsgrad output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      v - Should be from a Variable().
      vhat - Should be from a Variable().
      beta1Power - Must be a scalar.
      beta2Power - Must be a scalar.
      lr - Scaling factor. Must be a scalar.
      beta1 - Momentum factor. Must be a scalar.
      beta2 - Momentum factor. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyAdamWithAmsgrad
    • resourceApplyAddSign

      public <T extends TType> ResourceApplyAddSign resourceApplyAddSign(Operand<? extends TType> var, Operand<? extends TType> m, Operand<T> lr, Operand<T> alpha, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ResourceApplyAddSign.Options... options)
      Update '*var' according to the AddSign update.

       m_t <- beta1 * m_{t-1} + (1 - beta1) * g
       update <- (alpha + sign_decay * sign(g) * sign(m)) * g
       variable <- variable - lr_t * update
      Type Parameters:
      T - data type for ResourceApplyAddSign output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      alpha - Must be a scalar.
      signDecay - Must be a scalar.
      beta - Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyAddSign
    • resourceApplyCenteredRmsProp

      public <T extends TType> ResourceApplyCenteredRmsProp resourceApplyCenteredRmsProp(Operand<? extends TType> var, Operand<? extends TType> mg, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ResourceApplyCenteredRmsProp.Options... options)
      Update '*var' according to the centered RMSProp algorithm. The centered RMSProp algorithm uses an estimate of the centered second moment (i.e., the variance) for normalization, as opposed to regular RMSProp, which uses the (uncentered) second moment. This often helps with training, but is slightly more expensive in terms of computation and memory.

      Note that in dense implementation of this algorithm, mg, ms, and mom will update even if the grad is zero, but in this sparse implementation, mg, ms, and mom will not update in iterations during which the grad is zero.

       mean_square = decay * mean_square + (1-decay) * gradient ** 2
       mean_grad = decay * mean_grad + (1-decay) * gradient
       Delta = learning_rate * gradient / sqrt(mean_square + epsilon - mean_grad ** 2)

       mg <- rho * mg_{t-1} + (1-rho) * grad
       ms <- rho * ms_{t-1} + (1-rho) * grad * grad
       mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms - mg * mg + epsilon)
       var <- var - mom

      Type Parameters:
      T - data type for ResourceApplyCenteredRMSProp output and operands
      Parameters:
      var - Should be from a Variable().
      mg - Should be from a Variable().
      ms - Should be from a Variable().
      mom - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay rate. Must be a scalar.
      momentum - Momentum Scale. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyCenteredRmsProp
    • resourceApplyFtrl

      public <T extends TType> ResourceApplyFtrl resourceApplyFtrl(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> linear, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ResourceApplyFtrl.Options... options)
      Update '*var' according to the Ftrl-proximal scheme.

       accum_new = accum + grad * grad
       grad_with_shrinkage = grad + 2 * l2_shrinkage * var
       linear += grad_with_shrinkage + (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
       quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2
       var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0
       accum = accum_new
      Type Parameters:
      T - data type for ResourceApplyFtrlV2 output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      linear - Should be from a Variable().
      grad - The gradient.
      lr - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 shrinkage regularization. Must be a scalar.
      l2Shrinkage - The l2Shrinkage value
      lrPower - Scaling factor. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyFtrl
    • resourceApplyGradientDescent

      public <T extends TType> ResourceApplyGradientDescent resourceApplyGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> delta, ResourceApplyGradientDescent.Options... options)
      Update '*var' by subtracting 'alpha' * 'delta' from it.
      Type Parameters:
      T - data type for ResourceApplyGradientDescent output and operands
      Parameters:
      var - Should be from a Variable().
      alpha - Scaling factor. Must be a scalar.
      delta - The change.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyGradientDescent
    • resourceApplyKerasMomentum

      public <T extends TType> ResourceApplyKerasMomentum resourceApplyKerasMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ResourceApplyKerasMomentum.Options... options)
      Update '*var' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.

       accum = accum * momentum - lr * grad
       var += accum

      Type Parameters:
      T - data type for ResourceApplyKerasMomentum output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      grad - The gradient.
      momentum - Momentum. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyKerasMomentum
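
      A sketch with resource handles (same assumptions as the resourceApplyAdagrad example above); note the sign convention differs from resourceApplyMomentum:

       VarHandleOp var = tf.varHandleOp(TFloat32.class, Shape.of(2));
       VarHandleOp accum = tf.varHandleOp(TFloat32.class, Shape.of(2));
       Operand<TFloat32> lr = tf.constant(0.01f);
       Operand<TFloat32> momentum = tf.constant(0.9f);
       Operand<TFloat32> grad = tf.constant(new float[] {0.2f, -0.4f});

       // accum = accum * momentum - lr * grad; var += accum
       ResourceApplyKerasMomentum step =
           tf.train.resourceApplyKerasMomentum(var, accum, lr, grad, momentum);
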
    • resourceApplyMomentum

      public <T extends TType> ResourceApplyMomentum resourceApplyMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ResourceApplyMomentum.Options... options)
      Update '*var' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.

       accum = accum * momentum + grad
       var -= lr * accum

      Type Parameters:
      T - data type for ResourceApplyMomentum output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      grad - The gradient.
      momentum - Momentum. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyMomentum
    • resourceApplyPowerSign

      public <T extends TType> ResourceApplyPowerSign resourceApplyPowerSign(Operand<? extends TType> var, Operand<? extends TType> m, Operand<T> lr, Operand<T> logbase, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ResourceApplyPowerSign.Options... options)
      Update '*var' according to the PowerSign update.

       m_t <- beta1 * m_{t-1} + (1 - beta1) * g
       update <- exp(logbase * sign_decay * sign(g) * sign(m_t)) * g
       variable <- variable - lr_t * update
      Type Parameters:
      T - data type for ResourceApplyPowerSign output and operands
      Parameters:
      var - Should be from a Variable().
      m - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      logbase - Must be a scalar.
      signDecay - Must be a scalar.
      beta - Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyPowerSign
    • resourceApplyProximalAdagrad

      public <T extends TType> ResourceApplyProximalAdagrad resourceApplyProximalAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, ResourceApplyProximalAdagrad.Options... options)
      Update '*var' and '*accum' according to FOBOS with Adagrad learning rate.

       accum += grad * grad
       prox_v = var - lr * grad * (1 / sqrt(accum))
       var = sign(prox_v) / (1 + lr * l2) * max{|prox_v| - lr * l1, 0}
      Type Parameters:
      T - data type for ResourceApplyProximalAdagrad output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyProximalAdagrad
    • resourceApplyProximalGradientDescent

      public <T extends TType> ResourceApplyProximalGradientDescent resourceApplyProximalGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> delta, ResourceApplyProximalGradientDescent.Options... options)
      Update '*var' as FOBOS algorithm with fixed learning rate.

       prox_v = var - alpha * delta
       var = sign(prox_v) / (1 + alpha * l2) * max{|prox_v| - alpha * l1, 0}
      Type Parameters:
      T - data type for ResourceApplyProximalGradientDescent output and operands
      Parameters:
      var - Should be from a Variable().
      alpha - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      delta - The change.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyProximalGradientDescent
    • resourceApplyRmsProp

      public <T extends TType> ResourceApplyRmsProp resourceApplyRmsProp(Operand<? extends TType> var, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ResourceApplyRmsProp.Options... options)
      Update '*var' according to the RMSProp algorithm. Note that in dense implementation of this algorithm, ms and mom will update even if the grad is zero, but in this sparse implementation, ms and mom will not update in iterations during which the grad is zero.

       mean_square = decay * mean_square + (1-decay) * gradient ** 2
       Delta = learning_rate * gradient / sqrt(mean_square + epsilon)

       ms <- rho * ms_{t-1} + (1-rho) * grad * grad
       mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
       var <- var - mom

      Type Parameters:
      T - data type for ResourceApplyRMSProp output and operands
      Parameters:
      var - Should be from a Variable().
      ms - Should be from a Variable().
      mom - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay rate. Must be a scalar.
      momentum - The momentum value
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceApplyRmsProp
    • resourceConditionalAccumulator

      public <T extends TType> ResourceConditionalAccumulator resourceConditionalAccumulator(Class<T> dtype, Shape shape, ResourceConditionalAccumulator.Options... options)
      A conditional accumulator for aggregating gradients. The accumulator accepts gradients marked with a local_step greater than or equal to the most recent global_step known to the accumulator. The average can be extracted from the accumulator, provided sufficient gradients have been accumulated. Extracting the average automatically resets the aggregate to 0, and increments the global_step recorded by the accumulator. This is a resource version of ConditionalAccumulator that will work in TF2.0 with tf.cond version 2.
      Type Parameters:
      T - data type for ResourceConditionalAccumulator output and operands
      Parameters:
      dtype - The type of the value being accumulated.
      shape - The shape of the values, can be [], in which case shape is unknown.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceConditionalAccumulator
    • resourceSparseApplyAdadelta

      public <T extends TType> ResourceSparseApplyAdadelta resourceSparseApplyAdadelta(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdadelta.Options... options)
      Sparse update of '*var' and '*accum' according to the adadelta scheme.
      Type Parameters:
      T - data type for ResourceSparseApplyAdadelta output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      accumUpdate - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      rho - Decay factor. Must be a scalar.
      epsilon - Constant factor. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyAdadelta
    • resourceSparseApplyAdagrad

      public <T extends TType> ResourceSparseApplyAdagrad resourceSparseApplyAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdagrad.Options... options)
      Update relevant entries in '*var' and '*accum' according to the adagrad scheme. That is, for rows we have grad for, we update var and accum as follows:

       accum += grad * grad
       var -= lr * grad * (1 / sqrt(accum))
      Type Parameters:
      T - data type for ResourceSparseApplyAdagrad output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyAdagrad
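
      A sketch of the sparse variant (same resource-handle assumptions as above; tf.array is assumed for the TInt32 index vector). Only the rows listed in indices are touched:

       VarHandleOp var = tf.varHandleOp(TFloat32.class, Shape.of(4, 2));
       VarHandleOp accum = tf.varHandleOp(TFloat32.class, Shape.of(4, 2));
       Operand<TFloat32> lr = tf.constant(0.01f);
       Operand<TFloat32> grad = tf.constant(new float[][] {{0.1f, 0.2f}, {0.3f, 0.4f}});
       Operand<TInt32> indices = tf.array(0, 3); // rows of var/accum updated by this gradient
       ResourceSparseApplyAdagrad update =
           tf.train.resourceSparseApplyAdagrad(var, accum, lr, grad, indices);
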
    • resourceSparseApplyAdagradDa

      public <T extends TType> ResourceSparseApplyAdagradDa resourceSparseApplyAdagradDa(Operand<? extends TType> var, Operand<? extends TType> gradientAccumulator, Operand<? extends TType> gradientSquaredAccumulator, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ResourceSparseApplyAdagradDa.Options... options)
      Update entries in '*var' and '*accum' according to the proximal adagrad scheme.
      Type Parameters:
      T - data type for ResourceSparseApplyAdagradDA output and operands
      Parameters:
      var - Should be from a Variable().
      gradientAccumulator - Should be from a Variable().
      gradientSquaredAccumulator - Should be from a Variable().
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      lr - Learning rate. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      globalStep - Training step number. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyAdagradDa
    • resourceSparseApplyAdagradV2

      public <T extends TType> ResourceSparseApplyAdagradV2 resourceSparseApplyAdagradV2(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdagradV2.Options... options)
      Update relevant entries in '*var' and '*accum' according to the adagrad scheme. That is, for rows we have grad for, we update var and accum as follows:

       accum += grad * grad
       var -= lr * grad * (1 / sqrt(accum))
      Type Parameters:
      T - data type for ResourceSparseApplyAdagradV2 output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      epsilon - Constant factor. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyAdagradV2
    • resourceSparseApplyCenteredRmsProp

      public <T extends TType> ResourceSparseApplyCenteredRmsProp resourceSparseApplyCenteredRmsProp(Operand<? extends TType> var, Operand<? extends TType> mg, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyCenteredRmsProp.Options... options)
      Update '*var' according to the centered RMSProp algorithm. The centered RMSProp algorithm uses an estimate of the centered second moment (i.e., the variance) for normalization, as opposed to regular RMSProp, which uses the (uncentered) second moment. This often helps with training, but is slightly more expensive in terms of computation and memory.

      Note that in dense implementation of this algorithm, mg, ms, and mom will update even if the grad is zero, but in this sparse implementation, mg, ms, and mom will not update in iterations during which the grad is zero.

       mean_square = decay * mean_square + (1-decay) * gradient ** 2
       mean_grad = decay * mean_grad + (1-decay) * gradient
       Delta = learning_rate * gradient / sqrt(mean_square + epsilon - mean_grad ** 2)

       ms <- rho * ms_{t-1} + (1-rho) * grad * grad
       mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
       var <- var - mom

      Type Parameters:
      T - data type for ResourceSparseApplyCenteredRMSProp output and operands
      Parameters:
      var - Should be from a Variable().
      mg - Should be from a Variable().
      ms - Should be from a Variable().
      mom - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay rate. Must be a scalar.
      momentum - The momentum value
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var, ms and mom.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyCenteredRmsProp
    • resourceSparseApplyFtrl

      public <T extends TType> ResourceSparseApplyFtrl resourceSparseApplyFtrl(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> linear, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ResourceSparseApplyFtrl.Options... options)
      Update relevant entries in '*var' according to the Ftrl-proximal scheme. That is, for rows we have grad for, we update var, accum and linear as follows:

       grad_with_shrinkage = grad + 2 * l2_shrinkage * var
       accum_new = accum + grad_with_shrinkage * grad_with_shrinkage
       linear += grad_with_shrinkage + (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
       quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2
       var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0
       accum = accum_new
      Type Parameters:
      T - data type for ResourceSparseApplyFtrlV2 output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      linear - Should be from a Variable().
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      lr - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 shrinkage regularization. Must be a scalar.
      l2Shrinkage - The l2Shrinkage value
      lrPower - Scaling factor. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyFtrl
    • resourceSparseApplyKerasMomentum

      public <T extends TType> ResourceSparseApplyKerasMomentum resourceSparseApplyKerasMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, ResourceSparseApplyKerasMomentum.Options... options)
      Update relevant entries in '*var' and '*accum' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.

      That is, for rows we have grad for, we update var and accum as follows:

       accum = accum * momentum - lr * grad
       var += accum

      Type Parameters:
      T - data type for ResourceSparseApplyKerasMomentum output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      momentum - Momentum. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyKerasMomentum
    • resourceSparseApplyMomentum

      public <T extends TType> ResourceSparseApplyMomentum resourceSparseApplyMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, ResourceSparseApplyMomentum.Options... options)
      Update relevant entries in '*var' and '*accum' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.

      That is, for rows we have grad for, we update var and accum as follows:

       accum = accum * momentum + grad
       var -= lr * accum

      Type Parameters:
      T - data type for ResourceSparseApplyMomentum output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      momentum - Momentum. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyMomentum
    • resourceSparseApplyProximalAdagrad

      public <T extends TType> ResourceSparseApplyProximalAdagrad resourceSparseApplyProximalAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyProximalAdagrad.Options... options)
      Sparse update entries in '*var' and '*accum' according to FOBOS algorithm. That is for rows we have grad for, we update var and accum as follows:

      accum += grad * grad
      prox_v = var
      prox_v -= lr * grad * (1 / sqrt(accum))
      var = sign(prox_v) / (1 + lr * l2) * max{|prox_v| - lr * l1, 0}
      Type Parameters:
      T - data type for ResourceSparseApplyProximalAdagrad output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyProximalAdagrad
    • resourceSparseApplyProximalGradientDescent

      public <T extends TType> ResourceSparseApplyProximalGradientDescent resourceSparseApplyProximalGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyProximalGradientDescent.Options... options)
      Sparse update '*var' as FOBOS algorithm with fixed learning rate. That is for rows we have grad for, we update var as follows:

      prox_v = var - alpha * grad
      var = sign(prox_v) / (1 + alpha * l2) * max{|prox_v| - alpha * l1, 0}
      Type Parameters:
      T - data type for ResourceSparseApplyProximalGradientDescent output and operands
      Parameters:
      var - Should be from a Variable().
      alpha - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyProximalGradientDescent
    • resourceSparseApplyRmsProp

      public <T extends TType> ResourceSparseApplyRmsProp resourceSparseApplyRmsProp(Operand<? extends TType> var, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyRmsProp.Options... options)
      Update '*var' according to the RMSProp algorithm. Note that in dense implementation of this algorithm, ms and mom will update even if the grad is zero, but in this sparse implementation, ms and mom will not update in iterations during which the grad is zero.

      mean_square = decay * mean_square + (1-decay) * gradient ** 2
      Delta = learning_rate * gradient / sqrt(mean_square + epsilon)

      ms <- rho * ms_{t-1} + (1-rho) * grad * grad
      mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
      var <- var - mom

      Type Parameters:
      T - data type for ResourceSparseApplyRMSProp output and operands
      Parameters:
      var - Should be from a Variable().
      ms - Should be from a Variable().
      mom - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay rate. Must be a scalar.
      momentum - Momentum factor. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var, ms and mom.
      options - carries optional attribute values
      Returns:
      a new instance of ResourceSparseApplyRmsProp
    • restore

      public Restore restore(Operand<TString> prefix, Operand<TString> tensorNames, Operand<TString> shapeAndSlices, List<Class<? extends TType>> dtypes)
      Restores tensors from a V2 checkpoint. For backward compatibility with the V1 format, this Op currently allows restoring from a V1 checkpoint as well:
      • This Op first attempts to find the V2 index file pointed to by "prefix", and if found, proceeds to read it as a V2 checkpoint;
      • Otherwise the V1 read path is invoked. Relying on this behavior is not recommended, as the ability to fall back to read V1 might be deprecated and eventually removed.

      By default, restores the named tensors in full. If the caller wishes to restore specific slices of stored tensors, "shape_and_slices" should be non-empty strings and correspondingly well-formed.

      Callers must ensure all the named tensors are indeed stored in the checkpoint.

      Parameters:
      prefix - Must have a single element. The prefix of a V2 checkpoint.
      tensorNames - shape {N}. The names of the tensors to be restored.
      shapeAndSlices - shape {N}. The slice specs of the tensors to be restored. Empty strings indicate that they are non-partitioned tensors.
      dtypes - shape {N}. The list of expected dtype for the tensors. Must match those stored in the checkpoint.
      Returns:
      a new instance of Restore
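      A minimal sketch of wiring up a restore (not part of the generated Javadoc): the checkpoint prefix and tensor names are hypothetical, and tf.array(...) is assumed to be available for building the rank-1 string constants.

          // imports as in the earlier sketch, plus org.tensorflow.op.train.Restore,
          // org.tensorflow.types.TInt64 and java.util.Arrays
          Graph g = new Graph();
          Ops tf = Ops.create(g);
          Restore restored = tf.train.restore(
              tf.constant("/tmp/model/ckpt"),           // V2 checkpoint prefix (hypothetical path)
              tf.array("dense/kernel", "global_step"),  // tensor names, shape {2}
              tf.array("", ""),                         // empty slice specs: restore in full
              Arrays.asList(TFloat32.class, TInt64.class));
          // The N outputs of `restored` are typically fed into assign ops for the
          // corresponding variables and then run once in a Session.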
    • restoreSlice

      public <T extends TType> RestoreSlice<T> restoreSlice(Operand<TString> filePattern, Operand<TString> tensorName, Operand<TString> shapeAndSlice, Class<T> dt, RestoreSlice.Options... options)
      Restores a tensor from checkpoint files. This is like Restore except that restored tensor can be listed as filling only a slice of a larger tensor. shape_and_slice specifies the shape of the larger tensor and the slice that the restored tensor covers.

      The shape_and_slice input has the same format as the elements of the shapes_and_slices input of the SaveSlices op.

      Type Parameters:
      T - data type for RestoreSlice output and operands
      Parameters:
      filePattern - Must have a single element. The pattern of the files from which we read the tensor.
      tensorName - Must have a single element. The name of the tensor to be restored.
      shapeAndSlice - Scalar. The shape and slice specification to use when restoring a tensor.
      dt - The type of the tensor to be restored.
      options - carries optional attribute values
      Returns:
      a new instance of RestoreSlice
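      A fragment (same setup as the earlier sketches) showing the shape_and_slice string, which follows the format documented under saveSlices; the stored tensor name, path, and shapes are assumptions.

          // restore rows 0..4 (all columns) of a 10x10 checkpointed tensor as a 5x10 tensor
          RestoreSlice<TFloat32> slice = tf.train.restoreSlice(
              tf.constant("/tmp/model/ckpt"),  // file pattern (hypothetical)
              tf.constant("dense/kernel"),     // name of the stored tensor
              tf.constant("10 10 0,5:-"),      // shape of the larger tensor, then the slice spec
              TFloat32.class);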
    • save

      public Save save(Operand<TString> prefix, Operand<TString> tensorNames, Operand<TString> shapeAndSlices, Iterable<Operand<?>> tensors)
      Saves tensors in V2 checkpoint format. By default, saves the named tensors in full. If the caller wishes to save specific slices of full tensors, "shape_and_slices" should be non-empty strings and correspondingly well-formed.
      Parameters:
      prefix - Must have a single element. The prefix of the V2 checkpoint to which we write the tensors.
      tensorNames - shape {N}. The names of the tensors to be saved.
      shapeAndSlices - shape {N}. The slice specs of the tensors to be saved. Empty strings indicate that they are non-partitioned tensors.
      tensors - N tensors to save.
      Returns:
      a new instance of Save
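      A minimal sketch of saving two variables (not part of the generated Javadoc): the names, shapes and checkpoint prefix are assumptions, and tf.array(...) is assumed for the rank-1 string constants.

          // imports as in the earlier sketches, plus org.tensorflow.Session,
          // org.tensorflow.op.core.Variable, org.tensorflow.op.core.Assign and org.tensorflow.op.train.Save
          try (Graph g = new Graph(); Session session = new Session(g)) {
            Ops tf = Ops.create(g);
            Variable<TFloat32> kernel = tf.variable(Shape.of(3, 2), TFloat32.class);
            Variable<TFloat32> bias = tf.variable(Shape.of(2), TFloat32.class);
            Assign<TFloat32> initKernel = tf.assign(kernel, tf.constant(new float[3][2]));
            Assign<TFloat32> initBias = tf.assign(bias, tf.constant(new float[2]));
            Save save = tf.train.save(
                tf.constant("/tmp/model/ckpt"),          // checkpoint prefix (hypothetical path)
                tf.array("dense/kernel", "dense/bias"),  // tensor names, shape {2}
                tf.array("", ""),                        // empty slice specs: save in full
                Arrays.asList(kernel, bias));
            session.runner().addTarget(initKernel).addTarget(initBias).run();
            session.runner().addTarget(save).run();
          }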
    • saveSlices

      public SaveSlices saveSlices(Operand<TString> filename, Operand<TString> tensorNames, Operand<TString> shapesAndSlices, Iterable<Operand<?>> data)
      Saves input tensor slices to disk. This is like Save except that tensors can be listed in the saved file as being a slice of a larger tensor. shapes_and_slices specifies the shape of the larger tensor and the slice that this tensor covers. shapes_and_slices must have as many elements as tensor_names.

      Elements of the shapes_and_slices input must either be:

      • The empty string, in which case the corresponding tensor is saved normally.
      • A string of the form dim0 dim1 ... dimN-1 slice-spec where the dimI are the dimensions of the larger tensor and slice-spec specifies what part is covered by the tensor to save.

      slice-spec itself is a :-separated list: slice0:slice1:...:sliceN-1 where each sliceI is either:

      • The string - meaning that the slice covers all indices of this dimension
      • start,length where start and length are integers. In that case the slice covers length indices starting at start.

      See also Save.

      Parameters:
      filename - Must have a single element. The name of the file to which we write the tensor.
      tensorNames - Shape [N]. The names of the tensors to be saved.
      shapesAndSlices - Shape [N]. The shapes and slice specifications to use when saving the tensors.
      data - N tensors to save.
      Returns:
      a new instance of SaveSlices
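      A fragment (same setup as the save sketch) showing the shapes_and_slices string for a partitioned tensor; the 10x10 full shape and the saved 5x10 piece are assumptions.

          SaveSlices saveSlices = tf.train.saveSlices(
              tf.constant("/tmp/model.ckpt"),   // output filename (hypothetical)
              tf.array("dense/kernel"),
              tf.array("10 10 0,5:-"),          // full tensor is 10x10; this piece covers rows 0..4, all columns
              Arrays.asList(tf.constant(new float[5][10])));
          // run `saveSlices` as a Session target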
    • sdcaFprint

      public SdcaFprint sdcaFprint(Operand<TString> input)
      Computes fingerprints of the input strings.
      Parameters:
      input - vector of strings to compute fingerprints on.
      Returns:
      a new instance of SdcaFprint
    • sdcaOptimizer

      public SdcaOptimizer sdcaOptimizer(Iterable<Operand<TInt64>> sparseExampleIndices, Iterable<Operand<TInt64>> sparseFeatureIndices, Iterable<Operand<TFloat32>> sparseFeatureValues, Iterable<Operand<TFloat32>> denseFeatures, Operand<TFloat32> exampleWeights, Operand<TFloat32> exampleLabels, Iterable<Operand<TInt64>> sparseIndices, Iterable<Operand<TFloat32>> sparseWeights, Iterable<Operand<TFloat32>> denseWeights, Operand<TFloat32> exampleStateData, String lossType, Float l1, Float l2, Long numLossPartitions, Long numInnerIterations, SdcaOptimizer.Options... options)
      Distributed version of Stochastic Dual Coordinate Ascent (SDCA) optimizer for linear models with L1 + L2 regularization. As the global optimization objective is strongly convex, the optimizer optimizes the dual objective at each step. The optimizer applies each update one example at a time. Examples are sampled uniformly, and the optimizer is learning-rate-free and enjoys a linear convergence rate.

      Proximal Stochastic Dual Coordinate Ascent.
      Shai Shalev-Shwartz, Tong Zhang. 2012

      $$\text{Loss Objective} = \sum_i f_i(w x_i) + \frac{l_2}{2} |w|^2 + l_1 |w|$$

      Adding vs. Averaging in Distributed Primal-Dual Optimization.
      Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtarik, Martin Takac. 2015

      Stochastic Dual Coordinate Ascent with Adaptive Probabilities.
      Dominik Csiba, Zheng Qu, Peter Richtarik. 2015

      Parameters:
      sparseExampleIndices - a list of vectors which contain example indices.
      sparseFeatureIndices - a list of vectors which contain feature indices.
      sparseFeatureValues - a list of vectors which contain the feature values associated with each feature group.
      denseFeatures - a list of matrices which contains the dense feature values.
      exampleWeights - a vector which contains the weight associated with each example.
      exampleLabels - a vector which contains the label/target associated with each example.
      sparseIndices - a list of vectors where each value is the index that has a corresponding weight in sparse_weights. This field may be omitted for the dense approach.
      sparseWeights - a list of vectors where each value is the weight associated with a sparse feature group.
      denseWeights - a list of vectors where the values are the weights associated with a dense feature group.
      exampleStateData - a list of vectors containing the example state data.
      lossType - Type of the primal loss. Currently SdcaSolver supports logistic, squared and hinge losses.
      l1 - Symmetric l1 regularization strength.
      l2 - Symmetric l2 regularization strength.
      numLossPartitions - Number of partitions of the global loss function.
      numInnerIterations - Number of iterations per mini-batch.
      options - carries optional attribute values
      Returns:
      a new instance of SdcaOptimizer
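      A dense-features-only sketch (not part of the generated Javadoc): the shapes, the example-state width of 4, and the "logistic_loss" loss-type string are assumptions.

          // imports as in the earlier sketches, plus org.tensorflow.Operand,
          // org.tensorflow.op.train.SdcaOptimizer, org.tensorflow.types.TInt64,
          // java.util.Collections and java.util.List
          Graph g = new Graph();
          Ops tf = Ops.create(g);
          List<Operand<TInt64>> noSparseIdx = Collections.emptyList();
          List<Operand<TFloat32>> noSparseVal = Collections.emptyList();
          SdcaOptimizer sdca = tf.train.sdcaOptimizer(
              noSparseIdx, noSparseIdx, noSparseVal,   // no sparse feature groups in this sketch
              Arrays.asList(tf.constant(new float[][] {{1f, 0f}, {0f, 1f}, {1f, 1f}})),  // one dense group, 3 examples x 2 features
              tf.constant(new float[] {1f, 1f, 1f}),   // exampleWeights
              tf.constant(new float[] {0f, 1f, 1f}),   // exampleLabels
              noSparseIdx, noSparseVal,                // sparseIndices, sparseWeights
              Arrays.asList(tf.constant(new float[] {0f, 0f})),  // denseWeights: one vector per dense group
              tf.constant(new float[3][4]),            // exampleStateData (assumed shape {num_examples, 4})
              "logistic_loss",                         // lossType (assumed attribute string)
              0.0f, 1.0f,                              // l1, l2
              1L, 1L);                                 // numLossPartitions, numInnerIterations
          // The op returns updated example state and weight deltas; callers typically add the
          // deltas to their weights and feed the new state into the next call.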
    • sdcaShrinkL1

      public SdcaShrinkL1 sdcaShrinkL1(Iterable<Operand<TFloat32>> weights, Float l1, Float l2)
      Applies L1 regularization shrink step on the parameters.
      Parameters:
      weights - a list of vectors where each value is the weight associated with a feature group.
      l1 - Symmetric l1 regularization strength.
      l2 - Symmetric l2 regularization strength. Should be a positive float.
      Returns:
      a new instance of SdcaShrinkL1
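      A short fragment (same setup as the earlier sketches) applying the shrink step to two weight vectors; the shapes and regularization strengths are assumptions.

          Variable<TFloat32> denseWeights = tf.variable(Shape.of(2), TFloat32.class);
          Variable<TFloat32> sparseWeights = tf.variable(Shape.of(5), TFloat32.class);
          SdcaShrinkL1 shrink = tf.train.sdcaShrinkL1(
              Arrays.asList(denseWeights, sparseWeights),
              0.01f,   // l1
              1.0f);   // l2
          // run `shrink` as a Session target after the weights are initialized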
    • sparseApplyAdadelta

      public <T extends TType> SparseApplyAdadelta<T> sparseApplyAdadelta(Operand<T> var, Operand<T> accum, Operand<T> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyAdadelta.Options... options)
      Update relevant entries in '*var', '*accum' and '*accum_update' according to the Adadelta scheme.
      Type Parameters:
      T - data type for SparseApplyAdadelta output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      accumUpdate - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      rho - Decay factor. Must be a scalar.
      epsilon - Constant factor. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyAdadelta
    • sparseApplyAdagrad

      public <T extends TType> SparseApplyAdagrad<T> sparseApplyAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyAdagrad.Options... options)
      Update relevant entries in '*var' and '*accum' according to the adagrad scheme. That is for rows we have grad for, we update var and accum as follows:

      $$accum += grad * grad$$
      $$var -= lr * grad * (1 / sqrt(accum))$$
      Type Parameters:
      T - data type for SparseApplyAdagradV2 output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      epsilon - Constant factor. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyAdagrad
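      A minimal sketch updating a single row of a 4x2 variable (not part of the generated Javadoc); the shapes and hyper-parameter values are assumptions.

          // imports as in the earlier sketches, plus org.tensorflow.op.train.SparseApplyAdagrad
          try (Graph g = new Graph(); Session session = new Session(g)) {
            Ops tf = Ops.create(g);
            Variable<TFloat32> var = tf.variable(Shape.of(4, 2), TFloat32.class);
            Variable<TFloat32> accum = tf.variable(Shape.of(4, 2), TFloat32.class);
            Assign<TFloat32> initVar = tf.assign(var, tf.constant(new float[4][2]));
            Assign<TFloat32> initAccum = tf.assign(accum, tf.constant(new float[4][2]));
            SparseApplyAdagrad<TFloat32> update = tf.train.sparseApplyAdagrad(
                var, accum,
                tf.constant(0.1f),    // lr
                tf.constant(1e-7f),   // epsilon
                tf.constant(new float[][] {{0.5f, -0.5f}}),  // grad for one row
                tf.constant(new int[] {2}));                 // update only row 2
            session.runner().addTarget(initVar).addTarget(initAccum).run();
            session.runner().addTarget(update).run();
          }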
    • sparseApplyAdagradDa

      public <T extends TType> SparseApplyAdagradDa<T> sparseApplyAdagradDa(Operand<T> var, Operand<T> gradientAccumulator, Operand<T> gradientSquaredAccumulator, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, SparseApplyAdagradDa.Options... options)
      Update entries in '*var' and '*accum' according to the proximal adagrad scheme.
      Type Parameters:
      T - data type for SparseApplyAdagradDA output and operands
      Parameters:
      var - Should be from a Variable().
      gradientAccumulator - Should be from a Variable().
      gradientSquaredAccumulator - Should be from a Variable().
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      lr - Learning rate. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      globalStep - Training step number. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyAdagradDa
    • sparseApplyCenteredRmsProp

      public <T extends TType> SparseApplyCenteredRmsProp<T> sparseApplyCenteredRmsProp(Operand<T> var, Operand<T> mg, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyCenteredRmsProp.Options... options)
      Update '*var' according to the centered RMSProp algorithm. The centered RMSProp algorithm uses an estimate of the centered second moment (i.e., the variance) for normalization, as opposed to regular RMSProp, which uses the (uncentered) second moment. This often helps with training, but is slightly more expensive in terms of computation and memory.

      Note that in dense implementation of this algorithm, mg, ms, and mom will update even if the grad is zero, but in this sparse implementation, mg, ms, and mom will not update in iterations during which the grad is zero.

      mean_square = decay * mean_square + (1-decay) * gradient ** 2
      mean_grad = decay * mean_grad + (1-decay) * gradient
      Delta = learning_rate * gradient / sqrt(mean_square + epsilon - mean_grad ** 2)

      $$ms <- rho * ms_{t-1} + (1-rho) * grad * grad$$
      $$mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)$$
      $$var <- var - mom$$

      Type Parameters:
      T - data type for SparseApplyCenteredRMSProp output and operands
      Parameters:
      var - Should be from a Variable().
      mg - Should be from a Variable().
      ms - Should be from a Variable().
      mom - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay rate. Must be a scalar.
      momentum - Momentum factor. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var, ms and mom.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyCenteredRmsProp
    • sparseApplyFtrl

      public <T extends TType> SparseApplyFtrl<T> sparseApplyFtrl(Operand<T> var, Operand<T> accum, Operand<T> linear, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, SparseApplyFtrl.Options... options)
      Update relevant entries in '*var' according to the Ftrl-proximal scheme. That is for rows we have grad for, we update var, accum and linear as follows:

      grad_with_shrinkage = grad + 2 * l2_shrinkage * var
      accum_new = accum + grad * grad
      linear += grad_with_shrinkage - (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
      quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2
      var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0
      accum = accum_new
      Type Parameters:
      T - data type for SparseApplyFtrlV2 output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      linear - Should be from a Variable().
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      lr - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 shrinkage regularization. Must be a scalar.
      l2Shrinkage - L2 shrinkage regularization term (the l2_shrinkage used in grad_with_shrinkage above). Must be a scalar.
      lrPower - Scaling factor. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyFtrl
    • sparseApplyMomentum

      public <T extends TType> SparseApplyMomentum<T> sparseApplyMomentum(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, SparseApplyMomentum.Options... options)
      Update relevant entries in '*var' and '*accum' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.

      That is for rows we have grad for, we update var and accum as follows:

      $$accum = accum * momentum + grad$$
      $$var -= lr * accum$$

      Type Parameters:
      T - data type for SparseApplyMomentum output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      momentum - Momentum. Must be a scalar.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyMomentum
    • sparseApplyProximalAdagrad

      public <T extends TType> SparseApplyProximalAdagrad<T> sparseApplyProximalAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyProximalAdagrad.Options... options)
      Sparse update entries in '*var' and '*accum' according to FOBOS algorithm. That is for rows we have grad for, we update var and accum as follows:

      $$accum += grad * grad$$
      $$prox_v = var$$
      $$prox_v -= lr * grad * (1 / sqrt(accum))$$
      $$var = sign(prox_v) / (1 + lr * l2) * max{|prox_v| - lr * l1, 0}$$
      Type Parameters:
      T - data type for SparseApplyProximalAdagrad output and operands
      Parameters:
      var - Should be from a Variable().
      accum - Should be from a Variable().
      lr - Learning rate. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var and accum.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyProximalAdagrad
    • sparseApplyProximalGradientDescent

      public <T extends TType> SparseApplyProximalGradientDescent<T> sparseApplyProximalGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyProximalGradientDescent.Options... options)
      Sparse update '*var' as FOBOS algorithm with fixed learning rate. That is for rows we have grad for, we update var as follows:

      $$prox_v = var - alpha * grad$$
      $$var = sign(prox_v) / (1 + alpha * l2) * max{|prox_v| - alpha * l1, 0}$$
      Type Parameters:
      T - data type for SparseApplyProximalGradientDescent output and operands
      Parameters:
      var - Should be from a Variable().
      alpha - Scaling factor. Must be a scalar.
      l1 - L1 regularization. Must be a scalar.
      l2 - L2 regularization. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyProximalGradientDescent
    • sparseApplyRmsProp

      public <T extends TType> SparseApplyRmsProp<T> sparseApplyRmsProp(Operand<T> var, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyRmsProp.Options... options)
      Update '*var' according to the RMSProp algorithm. Note that in dense implementation of this algorithm, ms and mom will update even if the grad is zero, but in this sparse implementation, ms and mom will not update in iterations during which the grad is zero.

      mean_square = decay * mean_square + (1-decay) * gradient ** 2
      Delta = learning_rate * gradient / sqrt(mean_square + epsilon)

      $$ms <- rho * ms_{t-1} + (1-rho) * grad * grad$$
      $$mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)$$
      $$var <- var - mom$$

      Type Parameters:
      T - data type for SparseApplyRMSProp output and operands
      Parameters:
      var - Should be from a Variable().
      ms - Should be from a Variable().
      mom - Should be from a Variable().
      lr - Scaling factor. Must be a scalar.
      rho - Decay rate. Must be a scalar.
      momentum - Momentum factor. Must be a scalar.
      epsilon - Ridge term. Must be a scalar.
      grad - The gradient.
      indices - A vector of indices into the first dimension of var, ms and mom.
      options - carries optional attribute values
      Returns:
      a new instance of SparseApplyRmsProp
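      A fragment (same setup as the sparseApplyAdagrad sketch) showing the three-slot RMSProp signature; the shapes and values are assumptions.

          Variable<TFloat32> var = tf.variable(Shape.of(4, 2), TFloat32.class);
          Variable<TFloat32> ms = tf.variable(Shape.of(4, 2), TFloat32.class);
          Variable<TFloat32> mom = tf.variable(Shape.of(4, 2), TFloat32.class);
          SparseApplyRmsProp<TFloat32> update = tf.train.sparseApplyRmsProp(
              var, ms, mom,
              tf.constant(0.01f),   // lr
              tf.constant(0.9f),    // rho (decay)
              tf.constant(0.0f),    // momentum
              tf.constant(1e-10f),  // epsilon
              tf.constant(new float[][] {{0.2f, -0.1f}}),  // grad for one row
              tf.constant(new int[] {1}));                 // update only row 1
          // initialize var, ms and mom (e.g. with tf.assign) and run `update` as a Session target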
    • symbolicGradient

      public SymbolicGradient symbolicGradient(Iterable<Operand<?>> input, List<Class<? extends TType>> Tout, ConcreteFunction f)
      Computes the gradient function for function f via backpropagation.
      Parameters:
      input - a list of input tensors of size N + M;
      Tout - the type list for the input list.
      f - The function we want to compute the gradient for.

      The function 'f' must be a numerical function which takes N inputs and produces M outputs. Its gradient function 'g', which is computed by this SymbolicGradient op, is a function that takes N + M inputs and produces N outputs.

      That is, if we have (y1, y2, ..., y_M) = f(x1, x2, ..., x_N), then g is (dL/dx1, dL/dx2, ..., dL/dx_N) = g(x1, x2, ..., x_N, dL/dy1, dL/dy2, ..., dL/dy_M),

      where L is a scalar-valued function of (x1, x2, ..., x_N) (e.g., the loss function), and dL/dx_i is the partial derivative of L with respect to x_i.

      Returns:
      a new instance of SymbolicGradient
    • tileGrad

      public <T extends TType> TileGrad<T> tileGrad(Operand<T> input, Operand<TInt32> multiples)
      Returns the gradient of Tile. Since Tile takes an input and repeats the input multiples times along each dimension, train.TileGrad takes in multiples and aggregates each repeated tile of input into output.
      Type Parameters:
      T - data type for TileGrad output and operands
      Parameters:
      input - The input value
      multiples - The multiples value
      Returns:
      a new instance of TileGrad
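      A fragment (same setup as the earlier sketches) illustrating the aggregation; the concrete values are assumptions.

          // `tiled` stands in for the result of tiling a 2x2 tensor with multiples = [2, 1]
          Constant<TFloat32> tiled = tf.constant(new float[][] {{1f, 2f}, {3f, 4f}, {1f, 2f}, {3f, 4f}});
          // tileGrad sums the two repeated 2x2 tiles, yielding {{2, 4}, {6, 8}}
          TileGrad<TFloat32> aggregated = tf.train.tileGrad(tiled, tf.constant(new int[] {2, 1}));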
    • ops

      public final Ops ops()
      Get the parent Ops object.
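      A short illustration (not part of the generated Javadoc) of how the group relates back to its parent Ops instance:

          Ops tf = Ops.create();      // an eager-mode Ops instance
          TrainOps train = tf.train;  // the group documented on this page
          Ops parent = train.ops();   // the same instance as tf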