Class TrainOps
-
Method Summary
Modifier and Type / Method / Description
AccumulatorApplyGradient accumulatorApplyGradient(Operand<TString> handle, Operand<TInt64> localStep, Operand<? extends TType> gradient) - Applies a gradient to a given accumulator.
AccumulatorNumAccumulated accumulatorNumAccumulated(Operand<TString> handle) - Returns the number of gradients aggregated in the given accumulators.
AccumulatorSetGlobalStep accumulatorSetGlobalStep(Operand<TString> handle, Operand<TInt64> newGlobalStep) - Updates the accumulator with a new value for global_step.
<T extends TType> AccumulatorTakeGradient<T> accumulatorTakeGradient(Operand<TString> handle, Operand<TInt32> numRequired, Class<T> dtype) - Extracts the average gradient in the given ConditionalAccumulator.
<T extends TType> ApplyAdadelta<T> applyAdadelta(Operand<T> var, Operand<T> accum, Operand<T> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, ApplyAdadelta.Options... options) - Update '*var' according to the adadelta scheme.
<T extends TType> ApplyAdagrad<T> applyAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, ApplyAdagrad.Options... options) - Update '*var' according to the adagrad scheme.
<T extends TType> ApplyAdagradDa<T> applyAdagradDa(Operand<T> var, Operand<T> gradientAccumulator, Operand<T> gradientSquaredAccumulator, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ApplyAdagradDa.Options... options) - Update '*var' according to the proximal adagrad scheme.
<T extends TType> ApplyAdagradV2<T> applyAdagradV2(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, ApplyAdagradV2.Options... options) - Update '*var' according to the adagrad scheme.
<T extends TType> ApplyAdam<T> applyAdam(Operand<T> var, Operand<T> m, Operand<T> v, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ApplyAdam.Options... options) - Update '*var' according to the Adam algorithm.
<T extends TType> ApplyAdaMax<T> applyAdaMax(Operand<T> var, Operand<T> m, Operand<T> v, Operand<T> beta1Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ApplyAdaMax.Options... options) - Update '*var' according to the AdaMax algorithm.
<T extends TType> ApplyAddSign<T> applyAddSign(Operand<T> var, Operand<T> m, Operand<T> lr, Operand<T> alpha, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ApplyAddSign.Options... options) - Update '*var' according to the AddSign update.
<T extends TType> ApplyCenteredRmsProp<T> applyCenteredRmsProp(Operand<T> var, Operand<T> mg, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ApplyCenteredRmsProp.Options... options) - Update '*var' according to the centered RMSProp algorithm.
<T extends TType> ApplyFtrl<T> applyFtrl(Operand<T> var, Operand<T> accum, Operand<T> linear, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ApplyFtrl.Options... options) - Update '*var' according to the Ftrl-proximal scheme.
<T extends TType> ApplyGradientDescent<T> applyGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> delta, ApplyGradientDescent.Options... options) - Update '*var' by subtracting 'alpha' * 'delta' from it.
<T extends TType> ApplyMomentum<T> applyMomentum(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ApplyMomentum.Options... options) - Update '*var' according to the momentum scheme.
<T extends TType> ApplyPowerSign<T> applyPowerSign(Operand<T> var, Operand<T> m, Operand<T> lr, Operand<T> logbase, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ApplyPowerSign.Options... options) - Update '*var' according to the PowerSign update.
<T extends TType> ApplyProximalAdagrad<T> applyProximalAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, ApplyProximalAdagrad.Options... options) - Update '*var' and '*accum' according to FOBOS with Adagrad learning rate.
<T extends TType> ApplyProximalGradientDescent<T> applyProximalGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> delta, ApplyProximalGradientDescent.Options... options) - Update '*var' as FOBOS algorithm with fixed learning rate.
<T extends TType> ApplyRmsProp<T> applyRmsProp(Operand<T> var, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ApplyRmsProp.Options... options) - Update '*var' according to the RMSProp algorithm.
<V extends TType> BatchMatMul<V> batchMatMul(Operand<? extends TType> x, Operand<? extends TType> y, Class<V> Tout, BatchMatMul.Options... options) - Multiplies slices of two tensors in batches.
ComputeBatchSize computeBatchSize(Operand<? extends TType> inputDataset) - Computes the static batch size of a dataset sans partial batches.
<T extends TType> ConditionalAccumulator conditionalAccumulator(Class<T> dtype, Shape shape, ConditionalAccumulator.Options... options) - A conditional accumulator for aggregating gradients.
DistributedSave distributedSave(Operand<? extends TType> dataset, Operand<TString> directory, Operand<TString> address, DistributedSave.Options... options) - The DistributedSave operation.
GenerateVocabRemapping generateVocabRemapping(Operand<TString> newVocabFile, Operand<TString> oldVocabFile, Long newVocabOffset, Long numNewVocab, GenerateVocabRemapping.Options... options) - Given a path to new and old vocabulary files, returns a remapping Tensor of length num_new_vocab, where remapping[i] contains the row number in the old vocabulary that corresponds to row i in the new vocabulary (starting at line new_vocab_offset and up to num_new_vocab entities), or -1 if entry i in the new vocabulary is not in the old vocabulary.
MergeV2Checkpoints mergeV2Checkpoints(Operand<TString> checkpointPrefixes, Operand<TString> destinationPrefix, MergeV2Checkpoints.Options... options) - V2 format specific: merges the metadata files of sharded checkpoints.
NegTrain negTrain(Operand<TFloat32> wIn, Operand<TFloat32> wOut, Operand<TInt32> examples, Operand<TInt32> labels, Operand<TFloat32> lr, List<Long> vocabCount, Long numNegativeSamples) - Training via negative sampling.
final Ops ops() - Get the parent Ops object.
<T extends TType> PreventGradient<T> preventGradient(Operand<T> input, PreventGradient.Options... options) - An identity op that triggers an error if a gradient is requested.
ResourceAccumulatorApplyGradient resourceAccumulatorApplyGradient(Operand<? extends TType> handle, Operand<TInt64> localStep, Operand<? extends TType> gradient) - Applies a gradient to a given accumulator.
ResourceAccumulatorNumAccumulated resourceAccumulatorNumAccumulated(Operand<? extends TType> handle) - Returns the number of gradients aggregated in the given accumulators.
ResourceAccumulatorSetGlobalStep resourceAccumulatorSetGlobalStep(Operand<? extends TType> handle, Operand<TInt64> newGlobalStep) - Updates the accumulator with a new value for global_step.
<T extends TType> ResourceAccumulatorTakeGradient<T> resourceAccumulatorTakeGradient(Operand<? extends TType> handle, Operand<TInt32> numRequired, Class<T> dtype) - Extracts the average gradient in the given ConditionalAccumulator.
<T extends TType> ResourceApplyAdadelta resourceApplyAdadelta(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdadelta.Options... options) - Update '*var' according to the adadelta scheme.
<T extends TType> ResourceApplyAdagrad resourceApplyAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdagrad.Options... options) - Update '*var' according to the adagrad scheme.
<T extends TType> ResourceApplyAdagradDa resourceApplyAdagradDa(Operand<? extends TType> var, Operand<? extends TType> gradientAccumulator, Operand<? extends TType> gradientSquaredAccumulator, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ResourceApplyAdagradDa.Options... options) - Update '*var' according to the proximal adagrad scheme.
<T extends TType> ResourceApplyAdam resourceApplyAdam(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdam.Options... options) - Update '*var' according to the Adam algorithm.
<T extends TType> ResourceApplyAdaMax resourceApplyAdaMax(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<T> beta1Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdaMax.Options... options) - Update '*var' according to the AdaMax algorithm.
<T extends TType> ResourceApplyAdamWithAmsgrad resourceApplyAdamWithAmsgrad(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<? extends TType> vhat, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdamWithAmsgrad.Options... options) - Update '*var' according to the Adam algorithm.
<T extends TType> ResourceApplyAddSign resourceApplyAddSign(Operand<? extends TType> var, Operand<? extends TType> m, Operand<T> lr, Operand<T> alpha, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ResourceApplyAddSign.Options... options) - Update '*var' according to the AddSign update.
<T extends TType> ResourceApplyCenteredRmsProp resourceApplyCenteredRmsProp(Operand<? extends TType> var, Operand<? extends TType> mg, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ResourceApplyCenteredRmsProp.Options... options) - Update '*var' according to the centered RMSProp algorithm.
<T extends TType> ResourceApplyFtrl resourceApplyFtrl(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> linear, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ResourceApplyFtrl.Options... options) - Update '*var' according to the Ftrl-proximal scheme.
<T extends TType> ResourceApplyGradientDescent resourceApplyGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> delta, ResourceApplyGradientDescent.Options... options) - Update '*var' by subtracting 'alpha' * 'delta' from it.
<T extends TType> ResourceApplyKerasMomentum resourceApplyKerasMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ResourceApplyKerasMomentum.Options... options) - Update '*var' according to the momentum scheme.
<T extends TType> ResourceApplyMomentum resourceApplyMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ResourceApplyMomentum.Options... options) - Update '*var' according to the momentum scheme.
<T extends TType> ResourceApplyPowerSign resourceApplyPowerSign(Operand<? extends TType> var, Operand<? extends TType> m, Operand<T> lr, Operand<T> logbase, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ResourceApplyPowerSign.Options... options) - Update '*var' according to the PowerSign update.
<T extends TType> ResourceApplyProximalAdagrad resourceApplyProximalAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, ResourceApplyProximalAdagrad.Options... options) - Update '*var' and '*accum' according to FOBOS with Adagrad learning rate.
<T extends TType> ResourceApplyProximalGradientDescent resourceApplyProximalGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> delta, ResourceApplyProximalGradientDescent.Options... options) - Update '*var' as FOBOS algorithm with fixed learning rate.
<T extends TType> ResourceApplyRmsProp resourceApplyRmsProp(Operand<? extends TType> var, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ResourceApplyRmsProp.Options... options) - Update '*var' according to the RMSProp algorithm.
<T extends TType> ResourceConditionalAccumulator resourceConditionalAccumulator(Class<T> dtype, Shape shape, ResourceConditionalAccumulator.Options... options) - A conditional accumulator for aggregating gradients.
<T extends TType> ResourceSparseApplyAdadelta resourceSparseApplyAdadelta(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdadelta.Options... options) - var: Should be from a Variable().
<T extends TType> ResourceSparseApplyAdagrad resourceSparseApplyAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdagrad.Options... options) - Update relevant entries in '*var' and '*accum' according to the adagrad scheme.
<T extends TType> ResourceSparseApplyAdagradDa resourceSparseApplyAdagradDa(Operand<? extends TType> var, Operand<? extends TType> gradientAccumulator, Operand<? extends TType> gradientSquaredAccumulator, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ResourceSparseApplyAdagradDa.Options... options) - Update entries in '*var' and '*accum' according to the proximal adagrad scheme.
<T extends TType> ResourceSparseApplyAdagradV2 resourceSparseApplyAdagradV2(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdagradV2.Options... options) - Update relevant entries in '*var' and '*accum' according to the adagrad scheme.
<T extends TType> ResourceSparseApplyCenteredRmsProp resourceSparseApplyCenteredRmsProp(Operand<? extends TType> var, Operand<? extends TType> mg, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyCenteredRmsProp.Options... options) - Update '*var' according to the centered RMSProp algorithm.
<T extends TType> ResourceSparseApplyFtrl resourceSparseApplyFtrl(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> linear, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ResourceSparseApplyFtrl.Options... options) - Update relevant entries in '*var' according to the Ftrl-proximal scheme.
<T extends TType> ResourceSparseApplyKerasMomentum resourceSparseApplyKerasMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, ResourceSparseApplyKerasMomentum.Options... options) - Update relevant entries in '*var' and '*accum' according to the momentum scheme.
<T extends TType> ResourceSparseApplyMomentum resourceSparseApplyMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, ResourceSparseApplyMomentum.Options... options) - Update relevant entries in '*var' and '*accum' according to the momentum scheme.
<T extends TType> ResourceSparseApplyProximalAdagrad resourceSparseApplyProximalAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyProximalAdagrad.Options... options) - Sparse update entries in '*var' and '*accum' according to FOBOS algorithm.
<T extends TType> ResourceSparseApplyProximalGradientDescent resourceSparseApplyProximalGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyProximalGradientDescent.Options... options) - Sparse update '*var' as FOBOS algorithm with fixed learning rate.
<T extends TType> ResourceSparseApplyRmsProp resourceSparseApplyRmsProp(Operand<? extends TType> var, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyRmsProp.Options... options) - Update '*var' according to the RMSProp algorithm.
restore(Operand<TString> prefix, Operand<TString> tensorNames, Operand<TString> shapeAndSlices, List<Class<? extends TType>> dtypes) - Restores tensors from a V2 checkpoint.
<T extends TType> RestoreSlice<T> restoreSlice(Operand<TString> filePattern, Operand<TString> tensorName, Operand<TString> shapeAndSlice, Class<T> dt, RestoreSlice.Options... options) - Restores a tensor from checkpoint files.
save(Operand<TString> prefix, Operand<TString> tensorNames, Operand<TString> shapeAndSlices, Iterable<Operand<?>> tensors) - Saves tensors in V2 checkpoint format.
saveSlices(Operand<TString> filename, Operand<TString> tensorNames, Operand<TString> shapesAndSlices, Iterable<Operand<?>> data) - Saves input tensors slices to disk.
sdcaFprint(Operand<TString> input) - Computes fingerprints of the input strings.
sdcaOptimizer(Iterable<Operand<TInt64>> sparseExampleIndices, Iterable<Operand<TInt64>> sparseFeatureIndices, Iterable<Operand<TFloat32>> sparseFeatureValues, Iterable<Operand<TFloat32>> denseFeatures, Operand<TFloat32> exampleWeights, Operand<TFloat32> exampleLabels, Iterable<Operand<TInt64>> sparseIndices, Iterable<Operand<TFloat32>> sparseWeights, Iterable<Operand<TFloat32>> denseWeights, Operand<TFloat32> exampleStateData, String lossType, Float l1, Float l2, Long numLossPartitions, Long numInnerIterations, SdcaOptimizer.Options... options) - Distributed version of Stochastic Dual Coordinate Ascent (SDCA) optimizer for linear models with L1 + L2 regularization.
sdcaShrinkL1 - Applies L1 regularization shrink step on the parameters.
<T extends TType> SparseApplyAdadelta<T> sparseApplyAdadelta(Operand<T> var, Operand<T> accum, Operand<T> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyAdadelta.Options... options) - var: Should be from a Variable().
<T extends TType> SparseApplyAdagrad<T> sparseApplyAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyAdagrad.Options... options) - Update relevant entries in '*var' and '*accum' according to the adagrad scheme.
<T extends TType> SparseApplyAdagradDa<T> sparseApplyAdagradDa(Operand<T> var, Operand<T> gradientAccumulator, Operand<T> gradientSquaredAccumulator, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, SparseApplyAdagradDa.Options... options) - Update entries in '*var' and '*accum' according to the proximal adagrad scheme.
<T extends TType> SparseApplyCenteredRmsProp<T> sparseApplyCenteredRmsProp(Operand<T> var, Operand<T> mg, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyCenteredRmsProp.Options... options) - Update '*var' according to the centered RMSProp algorithm.
<T extends TType> SparseApplyFtrl<T> sparseApplyFtrl(Operand<T> var, Operand<T> accum, Operand<T> linear, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, SparseApplyFtrl.Options... options) - Update relevant entries in '*var' according to the Ftrl-proximal scheme.
<T extends TType> SparseApplyMomentum<T> sparseApplyMomentum(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, SparseApplyMomentum.Options... options) - Update relevant entries in '*var' and '*accum' according to the momentum scheme.
<T extends TType> SparseApplyProximalAdagrad<T> sparseApplyProximalAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyProximalAdagrad.Options... options) - Sparse update entries in '*var' and '*accum' according to FOBOS algorithm.
<T extends TType> SparseApplyProximalGradientDescent<T> sparseApplyProximalGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyProximalGradientDescent.Options... options) - Sparse update '*var' as FOBOS algorithm with fixed learning rate.
<T extends TType> SparseApplyRmsProp<T> sparseApplyRmsProp(Operand<T> var, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyRmsProp.Options... options) - Update '*var' according to the RMSProp algorithm.
symbolicGradient(Iterable<Operand<?>> input, List<Class<? extends TType>> Tout, ConcreteFunction f) - Computes the gradient function for function f via backpropagation.
tileGrad - Returns the gradient of Tile.
-
Method Details
-
accumulatorApplyGradient
public AccumulatorApplyGradient accumulatorApplyGradient(Operand<TString> handle, Operand<TInt64> localStep, Operand<? extends TType> gradient)
Applies a gradient to a given accumulator. Does not add if local_step is less than the accumulator's global_step.
- Parameters:
handle - The handle to an accumulator.
localStep - The local_step value at which the gradient was computed.
gradient - A tensor of the gradient to be accumulated.
- Returns:
- a new instance of AccumulatorApplyGradient
-
accumulatorNumAccumulated
Returns the number of gradients aggregated in the given accumulators.- Parameters:
handle- The handle to an accumulator.- Returns:
- a new instance of AccumulatorNumAccumulated
-
accumulatorSetGlobalStep
public AccumulatorSetGlobalStep accumulatorSetGlobalStep(Operand<TString> handle, Operand<TInt64> newGlobalStep)
Updates the accumulator with a new value for global_step. Logs a warning if the accumulator's value is already higher than new_global_step.
- Parameters:
handle - The handle to an accumulator.
newGlobalStep - The new global_step value to set.
- Returns:
- a new instance of AccumulatorSetGlobalStep
-
accumulatorTakeGradient
public <T extends TType> AccumulatorTakeGradient<T> accumulatorTakeGradient(Operand<TString> handle, Operand<TInt32> numRequired, Class<T> dtype) Extracts the average gradient in the given ConditionalAccumulator. The op blocks until sufficient (i.e., more than num_required) gradients have been accumulated. If the accumulator has already aggregated more than num_required gradients, it returns the average of the accumulated gradients. Also automatically increments the recorded global_step in the accumulator by 1, and resets the aggregate to 0.- Type Parameters:
T- data type forAccumulatorTakeGradientoutput and operands- Parameters:
handle- The handle to an accumulator.numRequired- Number of gradients required before we return an aggregate.dtype- The data type of accumulated gradients. Needs to correspond to the type of the accumulator.- Returns:
- a new instance of AccumulatorTakeGradient
-
applyAdaMax
public <T extends TType> ApplyAdaMax<T> applyAdaMax(Operand<T> var, Operand<T> m, Operand<T> v, Operand<T> beta1Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ApplyAdaMax.Options... options)
Update '*var' according to the AdaMax algorithm.
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- max(beta2 * v_{t-1}, abs(g))
variable <- variable - learning_rate / (1 - beta1^t) * m_t / (v_t + epsilon)
- Type Parameters:
T- data type forApplyAdaMaxoutput and operands- Parameters:
var- Should be from a Variable().m- Should be from a Variable().v- Should be from a Variable().beta1Power- Must be a scalar.lr- Scaling factor. Must be a scalar.beta1- Momentum factor. Must be a scalar.beta2- Momentum factor. Must be a scalar.epsilon- Ridge term. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyAdaMax
-
applyAdadelta
public <T extends TType> ApplyAdadelta<T> applyAdadelta(Operand<T> var, Operand<T> accum, Operand<T> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, ApplyAdadelta.Options... options)
Update '*var' according to the adadelta scheme.
accum = rho() * accum + (1 - rho()) * grad.square();
update = (update_accum + epsilon).sqrt() * (accum + epsilon()).rsqrt() * grad;
update_accum = rho() * update_accum + (1 - rho()) * update.square();
var -= update;
- Type Parameters:
T- data type forApplyAdadeltaoutput and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().accumUpdate- Should be from a Variable().lr- Scaling factor. Must be a scalar.rho- Decay factor. Must be a scalar.epsilon- Constant factor. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyAdadelta
-
applyAdagrad
public <T extends TType> ApplyAdagrad<T> applyAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, ApplyAdagrad.Options... options)
Update '*var' according to the adagrad scheme.
accum += grad * grad
var -= lr * grad * (1 / sqrt(accum))
- Type Parameters:
T- data type forApplyAdagradoutput and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().lr- Scaling factor. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyAdagrad
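For orientation, here is a minimal, self-contained sketch of the slot-variable pattern these apply ops expect: the Adagrad accumulator is just a second variable of the same shape as var, created and initialized by the caller. The shapes, initial values, the constant learning rate, and the stand-in gradient are illustrative assumptions, not part of this API.

import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.ndarray.Shape;
import org.tensorflow.op.Ops;
import org.tensorflow.op.core.Assign;
import org.tensorflow.op.core.Variable;
import org.tensorflow.op.train.ApplyAdagrad;
import org.tensorflow.types.TFloat32;

public class AdagradSketch {
  public static void main(String[] args) {
    try (Graph graph = new Graph()) {
      Ops tf = Ops.create(graph);
      // The variable being trained and its Adagrad accumulator slot (same shape).
      Variable<TFloat32> var = tf.variable(Shape.of(3), TFloat32.class);
      Variable<TFloat32> accum = tf.variable(Shape.of(3), TFloat32.class);
      Assign<TFloat32> initVar = tf.assign(var, tf.constant(new float[] {1f, 2f, 3f}));
      Assign<TFloat32> initAccum = tf.assign(accum, tf.constant(new float[] {0.1f, 0.1f, 0.1f}));
      // Stand-in gradient; in a real model this comes from the gradient computation.
      ApplyAdagrad<TFloat32> step = tf.train.applyAdagrad(
          var, accum, tf.constant(0.01f), tf.constant(new float[] {0.5f, 0.5f, 0.5f}));
      try (Session session = new Session(graph)) {
        session.run(initVar);
        session.run(initAccum);
        session.run(step);  // accum += grad * grad; var -= lr * grad / sqrt(accum)
      }
    }
  }
}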
-
applyAdagradDa
public <T extends TType> ApplyAdagradDa<T> applyAdagradDa(Operand<T> var, Operand<T> gradientAccumulator, Operand<T> gradientSquaredAccumulator, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ApplyAdagradDa.Options... options) Update '*var' according to the proximal adagrad scheme.- Type Parameters:
T- data type forApplyAdagradDAoutput and operands- Parameters:
var- Should be from a Variable().gradientAccumulator- Should be from a Variable().gradientSquaredAccumulator- Should be from a Variable().grad- The gradient.lr- Scaling factor. Must be a scalar.l1- L1 regularization. Must be a scalar.l2- L2 regularization. Must be a scalar.globalStep- Training step number. Must be a scalar.options- carries optional attribute values- Returns:
- a new instance of ApplyAdagradDa
-
applyAdagradV2
public <T extends TType> ApplyAdagradV2<T> applyAdagradV2(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, ApplyAdagradV2.Options... options)
Update '*var' according to the adagrad scheme.
accum += grad * grad
var -= lr * grad * (1 / (sqrt(accum) + epsilon))
- Type Parameters:
T- data type forApplyAdagradV2output and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().lr- Scaling factor. Must be a scalar.epsilon- Constant factor. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyAdagradV2
-
applyAdam
public <T extends TType> ApplyAdam<T> applyAdam(Operand<T> var, Operand<T> m, Operand<T> v, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ApplyAdam.Options... options)
Update '*var' according to the Adam algorithm.
$$\text{lr}_t := \mathrm{lr} \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t}$$
$$m_t := \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g$$
$$v_t := \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g^2$$
$$\text{var} := \begin{cases} \text{var} - (m_t \beta_1 + g \cdot (1 - \beta_1)) \cdot \text{lr}_t / (\sqrt{v_t} + \epsilon), &\text{if use\_nesterov}\\ \text{var} - m_t \cdot \text{lr}_t / (\sqrt{v_t} + \epsilon), &\text{otherwise} \end{cases}$$
- Type Parameters:
T- data type forApplyAdamoutput and operands- Parameters:
var- Should be from a Variable().m- Should be from a Variable().v- Should be from a Variable().beta1Power- Must be a scalar.beta2Power- Must be a scalar.lr- Scaling factor. Must be a scalar.beta1- Momentum factor. Must be a scalar.beta2- Momentum factor. Must be a scalar.epsilon- Ridge term. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyAdam
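Adam follows the same slot-variable pattern but additionally takes the running powers of beta1 and beta2 as inputs. A sketch of the call, assuming the same imports and graph/Ops setup as the Adagrad example above; the hyperparameter values and the beta power constants are illustrative, and in practice the caller maintains beta1Power and beta2Power across steps.

// Assumes: Ops tf, Variable<TFloat32> var, and a gradient, as in the Adagrad sketch above.
Variable<TFloat32> m = tf.variable(Shape.of(3), TFloat32.class);  // first-moment slot
Variable<TFloat32> v = tf.variable(Shape.of(3), TFloat32.class);  // second-moment slot
ApplyAdam<TFloat32> adamStep = tf.train.applyAdam(
    var, m, v,
    tf.constant(0.9f),    // beta1Power = beta1^t at the current step t
    tf.constant(0.999f),  // beta2Power = beta2^t
    tf.constant(1e-3f),   // lr
    tf.constant(0.9f),    // beta1
    tf.constant(0.999f),  // beta2
    tf.constant(1e-8f),   // epsilon
    tf.constant(new float[] {0.5f, 0.5f, 0.5f}));  // grad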
-
applyAddSign
public <T extends TType> ApplyAddSign<T> applyAddSign(Operand<T> var, Operand<T> m, Operand<T> lr, Operand<T> alpha, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ApplyAddSign.Options... options)
Update '*var' according to the AddSign update.
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
update <- (alpha + sign_decay * sign(g) * sign(m)) * g
variable <- variable - lr_t * update
- Type Parameters:
T- data type forApplyAddSignoutput and operands- Parameters:
var- Should be from a Variable().m- Should be from a Variable().lr- Scaling factor. Must be a scalar.alpha- Must be a scalar.signDecay- Must be a scalar.beta- Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyAddSign
-
applyCenteredRmsProp
public <T extends TType> ApplyCenteredRmsProp<T> applyCenteredRmsProp(Operand<T> var, Operand<T> mg, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ApplyCenteredRmsProp.Options... options)
Update '*var' according to the centered RMSProp algorithm. The centered RMSProp algorithm uses an estimate of the centered second moment (i.e., the variance) for normalization, as opposed to regular RMSProp, which uses the (uncentered) second moment. This often helps with training, but is slightly more expensive in terms of computation and memory.
Note that in the dense implementation of this algorithm, mg, ms, and mom will update even if the grad is zero, but in the sparse implementation, mg, ms, and mom will not update in iterations during which the grad is zero.
mean_square = decay * mean_square + (1-decay) * gradient ** 2
mean_grad = decay * mean_grad + (1-decay) * gradient
Delta = learning_rate * gradient / sqrt(mean_square + epsilon - mean_grad ** 2)
mg <- rho * mg_{t-1} + (1-rho) * grad
ms <- rho * ms_{t-1} + (1-rho) * grad * grad
mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms - mg * mg + epsilon)
var <- var - mom
- Type Parameters:
T- data type forApplyCenteredRMSPropoutput and operands- Parameters:
var- Should be from a Variable().mg- Should be from a Variable().ms- Should be from a Variable().mom- Should be from a Variable().lr- Scaling factor. Must be a scalar.rho- Decay rate. Must be a scalar.momentum- Momentum Scale. Must be a scalar.epsilon- Ridge term. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyCenteredRmsProp
-
applyFtrl
public <T extends TType> ApplyFtrl<T> applyFtrl(Operand<T> var, Operand<T> accum, Operand<T> linear, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ApplyFtrl.Options... options)
Update '*var' according to the Ftrl-proximal scheme.
grad_with_shrinkage = grad + 2 * l2_shrinkage * var
accum_new = accum + grad * grad
linear += grad_with_shrinkage - (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2
var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0
accum = accum_new
- Type Parameters:
T- data type forApplyFtrlV2output and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().linear- Should be from a Variable().grad- The gradient.lr- Scaling factor. Must be a scalar.l1- L1 regularization. Must be a scalar.l2- L2 shrinkage regularization. Must be a scalar.l2Shrinkage- The l2Shrinkage valuelrPower- Scaling factor. Must be a scalar.options- carries optional attribute values- Returns:
- a new instance of ApplyFtrl
-
applyGradientDescent
public <T extends TType> ApplyGradientDescent<T> applyGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> delta, ApplyGradientDescent.Options... options) Update '*var' by subtracting 'alpha' * 'delta' from it.- Type Parameters:
T- data type forApplyGradientDescentoutput and operands- Parameters:
var- Should be from a Variable().alpha- Scaling factor. Must be a scalar.delta- The change.options- carries optional attribute values- Returns:
- a new instance of ApplyGradientDescent
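Since applyGradientDescent has the fewest inputs, its arithmetic is easy to check by hand. A sketch, assuming the same setup as the Adagrad example above; the constants stand in for a real learning rate and gradient.

// If var holds {1, 2, 3}, one run of this op leaves it at {0.95, 1.95, 2.95}.
ApplyGradientDescent<TFloat32> sgdStep = tf.train.applyGradientDescent(
    var,
    tf.constant(0.1f),                              // alpha (learning rate)
    tf.constant(new float[] {0.5f, 0.5f, 0.5f}));   // delta (gradient)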
-
applyMomentum
public <T extends TType> ApplyMomentum<T> applyMomentum(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ApplyMomentum.Options... options)
Update '*var' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.
accum = accum * momentum + grad
var -= lr * accum
- Type Parameters:
T- data type forApplyMomentumoutput and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().lr- Scaling factor. Must be a scalar.grad- The gradient.momentum- Momentum. Must be a scalar.options- carries optional attribute values- Returns:
- a new instance of ApplyMomentum
-
applyPowerSign
public <T extends TType> ApplyPowerSign<T> applyPowerSign(Operand<T> var, Operand<T> m, Operand<T> lr, Operand<T> logbase, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ApplyPowerSign.Options... options)
Update '*var' according to the PowerSign update.
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
update <- exp(logbase * sign_decay * sign(g) * sign(m_t)) * g
variable <- variable - lr_t * update
- Type Parameters:
T- data type forApplyPowerSignoutput and operands- Parameters:
var- Should be from a Variable().m- Should be from a Variable().lr- Scaling factor. Must be a scalar.logbase- Must be a scalar.signDecay- Must be a scalar.beta- Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyPowerSign
-
applyProximalAdagrad
public <T extends TType> ApplyProximalAdagrad<T> applyProximalAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, ApplyProximalAdagrad.Options... options)
Update '*var' and '*accum' according to FOBOS with Adagrad learning rate.
accum += grad * grad
prox_v = var - lr * grad * (1 / sqrt(accum))
var = sign(prox_v) / (1 + lr * l2) * max{|prox_v| - lr * l1, 0}
- Type Parameters:
T- data type forApplyProximalAdagradoutput and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().lr- Scaling factor. Must be a scalar.l1- L1 regularization. Must be a scalar.l2- L2 regularization. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyProximalAdagrad
-
applyProximalGradientDescent
public <T extends TType> ApplyProximalGradientDescent<T> applyProximalGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> delta, ApplyProximalGradientDescent.Options... options)
Update '*var' as FOBOS algorithm with fixed learning rate.
prox_v = var - alpha * delta
var = sign(prox_v) / (1 + alpha * l2) * max{|prox_v| - alpha * l1, 0}
- Type Parameters:
T- data type forApplyProximalGradientDescentoutput and operands- Parameters:
var- Should be from a Variable().alpha- Scaling factor. Must be a scalar.l1- L1 regularization. Must be a scalar.l2- L2 regularization. Must be a scalar.delta- The change.options- carries optional attribute values- Returns:
- a new instance of ApplyProximalGradientDescent
-
applyRmsProp
public <T extends TType> ApplyRmsProp<T> applyRmsProp(Operand<T> var, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ApplyRmsProp.Options... options)
Update '*var' according to the RMSProp algorithm. Note that in the dense implementation of this algorithm, ms and mom will update even if the grad is zero, but in the sparse implementation, ms and mom will not update in iterations during which the grad is zero.
mean_square = decay * mean_square + (1-decay) * gradient ** 2
Delta = learning_rate * gradient / sqrt(mean_square + epsilon)
ms <- rho * ms_{t-1} + (1-rho) * grad * grad
mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
var <- var - mom
- Type Parameters:
T- data type forApplyRMSPropoutput and operands- Parameters:
var- Should be from a Variable().ms- Should be from a Variable().mom- Should be from a Variable().lr- Scaling factor. Must be a scalar.rho- Decay rate. Must be a scalar.momentum- The momentum valueepsilon- Ridge term. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ApplyRmsProp
-
batchMatMul
public <V extends TType> BatchMatMul<V> batchMatMul(Operand<? extends TType> x, Operand<? extends TType> y, Class<V> Tout, BatchMatMul.Options... options)
Multiplies slices of two tensors in batches. Multiplies all slices of Tensor x and y (each slice can be viewed as an element of a batch), and arranges the individual results in a single output tensor of the same batch size. Each of the individual slices can optionally be adjointed (to adjoint a matrix means to transpose and conjugate it) before multiplication by setting the adj_x or adj_y flag to True, which are by default False.
The input tensors x and y are 2-D or higher with shape [..., r_x, c_x] and [..., r_y, c_y].
The output tensor is 2-D or higher with shape [..., r_o, c_o], where:
r_o = c_x if adj_x else r_x
c_o = r_y if adj_y else c_y
It is computed as:
output[..., :, :] = matrix(x[..., :, :]) * matrix(y[..., :, :])
NOTE: train.BatchMatMul supports broadcasting in the batch dimensions.
- Type Parameters:
V - data type for BatchMatMulV3 output and operands
- Parameters:
x - 2-D or higher with shape [..., r_x, c_x].
y - 2-D or higher with shape [..., r_y, c_y].
Tout - If not specified, Tout is the same type as the input type.
options - carries optional attribute values
- Returns:
- a new instance of BatchMatMul
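A shape-focused sketch, assuming the same Ops setup as above; the zero-filled inputs and the Tout choice are illustrative. With x of shape [2, 3, 4] and y of shape [2, 4, 5], each of the 2 batch slices is multiplied independently, giving an output of shape [2, 3, 5].

// Two stacked 3x4 and 4x5 matrices; the batch dimension (2) is carried through.
BatchMatMul<TFloat32> z = tf.train.batchMatMul(
    tf.constant(new float[2][3][4]),   // x: batch of two 3x4 matrices
    tf.constant(new float[2][4][5]),   // y: batch of two 4x5 matrices
    TFloat32.class);                   // z: shape [2, 3, 5]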
-
computeBatchSize
Computes the static batch size of a dataset sans partial batches.- Parameters:
inputDataset- The inputDataset value- Returns:
- a new instance of ComputeBatchSize
-
conditionalAccumulator
public <T extends TType> ConditionalAccumulator conditionalAccumulator(Class<T> dtype, Shape shape, ConditionalAccumulator.Options... options) A conditional accumulator for aggregating gradients. The accumulator accepts gradients marked with local_step greater or equal to the most recent global_step known to the accumulator. The average can be extracted from the accumulator, provided sufficient gradients have been accumulated. Extracting the average automatically resets the aggregate to 0, and increments the global_step recorded by the accumulator.- Type Parameters:
T- data type forConditionalAccumulatoroutput and operands- Parameters:
dtype- The type of the value being accumulated.shape- The shape of the values, can be [], in which case shape is unknown.options- carries optional attribute values- Returns:
- a new instance of ConditionalAccumulator
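To show how the accumulator ops fit together, here is a sketch of the aggregate-then-average cycle: create the accumulator, apply gradients tagged with the local_step they were computed at, and take the average once enough have arrived. The shape, step values, and gradient values are illustrative assumptions, and the sketch assumes the handle returned by conditionalAccumulator can be passed directly as the handle operand of the other accumulator ops.

// Same Ops setup as above.
ConditionalAccumulator acc = tf.train.conditionalAccumulator(TFloat32.class, Shape.of(2));
// Each worker applies its gradient, tagged with the local step it was computed at.
AccumulatorApplyGradient apply1 =
    tf.train.accumulatorApplyGradient(acc, tf.constant(0L), tf.constant(new float[] {1f, 2f}));
AccumulatorApplyGradient apply2 =
    tf.train.accumulatorApplyGradient(acc, tf.constant(0L), tf.constant(new float[] {3f, 4f}));
// Blocks until 2 gradients are in, then returns their average ({2, 3}) and resets the aggregate.
AccumulatorTakeGradient<TFloat32> avg =
    tf.train.accumulatorTakeGradient(acc, tf.constant(2), TFloat32.class);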
-
distributedSave
public DistributedSave distributedSave(Operand<? extends TType> dataset, Operand<TString> directory, Operand<TString> address, DistributedSave.Options... options) The DistributedSave operation- Parameters:
dataset- The dataset valuedirectory- The directory valueaddress- The address valueoptions- carries optional attribute values- Returns:
- a new instance of DistributedSave
-
generateVocabRemapping
public GenerateVocabRemapping generateVocabRemapping(Operand<TString> newVocabFile, Operand<TString> oldVocabFile, Long newVocabOffset, Long numNewVocab, GenerateVocabRemapping.Options... options)
Given a path to new and old vocabulary files, returns a remapping Tensor of length num_new_vocab, where remapping[i] contains the row number in the old vocabulary that corresponds to row i in the new vocabulary (starting at line new_vocab_offset and up to num_new_vocab entities), or -1 if entry i in the new vocabulary is not in the old vocabulary. The old vocabulary is constrained to the first old_vocab_size entries if old_vocab_size is not the default value of -1. new_vocab_offset enables use in the partitioned variable case, and should generally be set through examining partitioning info. The format of the files should be a text file, with each line containing a single entity within the vocabulary.
For example, with new_vocab_file a text file containing each of the following elements on a single line: [f0, f1, f2, f3], old_vocab_file = [f1, f0, f3], num_new_vocab = 3, new_vocab_offset = 1, the returned remapping would be [0, -1, 2].
The op also returns a count of how many entries in the new vocabulary were present in the old vocabulary, which is used to calculate the number of values to initialize in a weight matrix remapping.
This functionality can be used to remap both row vocabularies (typically, features) and column vocabularies (typically, classes) from TensorFlow checkpoints. Note that the partitioning logic relies on contiguous vocabularies corresponding to div-partitioned variables. Moreover, the underlying remapping uses an IndexTable (as opposed to an inexact CuckooTable), so client code should use the corresponding index_table_from_file() as the FeatureColumn framework does (as opposed to tf.feature_to_id(), which uses a CuckooTable).
- Parameters:
newVocabFile- Path to the new vocab file.oldVocabFile- Path to the old vocab file.newVocabOffset- How many entries into the new vocab file to start reading.numNewVocab- Number of entries in the new vocab file to remap.options- carries optional attribute values- Returns:
- a new instance of GenerateVocabRemapping
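Reusing the worked example above, the call itself looks like the sketch below; the file paths are illustrative, and the same Ops setup as above is assumed.

// With old_vocab.txt = [f1, f0, f3] and new_vocab.txt = [f0, f1, f2, f3],
// numNewVocab = 3 and newVocabOffset = 1 yield the remapping [0, -1, 2].
GenerateVocabRemapping remap = tf.train.generateVocabRemapping(
    tf.constant("new_vocab.txt"),  // newVocabFile
    tf.constant("old_vocab.txt"),  // oldVocabFile
    1L,                            // newVocabOffset
    3L);                           // numNewVocab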
-
mergeV2Checkpoints
public MergeV2Checkpoints mergeV2Checkpoints(Operand<TString> checkpointPrefixes, Operand<TString> destinationPrefix, MergeV2Checkpoints.Options... options)
V2 format specific: merges the metadata files of sharded checkpoints. The result is one logical checkpoint, with one physical metadata file and renamed data files.
Intended for "grouping" multiple checkpoints in a sharded checkpoint setup.
If delete_old_dirs is true, attempts to delete recursively the dirname of each path in the input checkpoint_prefixes. This is useful when those paths are non user-facing temporary locations.
If allow_missing_files is true, merges the checkpoint prefixes as long as at least one file exists. Otherwise, if no files exist, an error will be thrown. The default value for allow_missing_files is false.
- Parameters:
checkpointPrefixes- prefixes of V2 checkpoints to merge.destinationPrefix- scalar. The desired final prefix. Allowed to be the same as one of the checkpoint_prefixes.options- carries optional attribute values- Returns:
- a new instance of MergeV2Checkpoints
-
negTrain
public NegTrain negTrain(Operand<TFloat32> wIn, Operand<TFloat32> wOut, Operand<TInt32> examples, Operand<TInt32> labels, Operand<TFloat32> lr, List<Long> vocabCount, Long numNegativeSamples) Training via negative sampling.- Parameters:
wIn- input word embedding.wOut- output word embedding.examples- A vector of word ids.labels- A vector of word ids.lr- The lr valuevocabCount- Count of words in the vocabulary.numNegativeSamples- Number of negative samples per example.- Returns:
- a new instance of NegTrain
-
preventGradient
public <T extends TType> PreventGradient<T> preventGradient(Operand<T> input, PreventGradient.Options... options)
An identity op that triggers an error if a gradient is requested. When executed in a graph, this op outputs its input tensor as-is.
When building ops to compute gradients, the TensorFlow gradient system will return an error when trying to look up the gradient of this op, because no gradient must ever be registered for this function. This op exists to prevent subtle bugs from silently returning unimplemented gradients in some corner cases.
- Type Parameters:
T- data type forPreventGradientoutput and operands- Parameters:
input- any tensor.options- carries optional attribute values- Returns:
- a new instance of PreventGradient
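A typical use is to wrap an intermediate tensor whose gradient is intentionally unimplemented, so that any later attempt to differentiate through it fails loudly rather than silently. A sketch, assuming the same Ops setup as above and an arbitrary input tensor.

// Forward pass passes the input through unchanged; requesting its gradient raises an error.
PreventGradient<TFloat32> guarded =
    tf.train.preventGradient(tf.constant(new float[] {1f, 2f, 3f}));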
-
resourceAccumulatorApplyGradient
public ResourceAccumulatorApplyGradient resourceAccumulatorApplyGradient(Operand<? extends TType> handle, Operand<TInt64> localStep, Operand<? extends TType> gradient)
Applies a gradient to a given accumulator. Does not add if local_step is less than the accumulator's global_step.
- Parameters:
handle - The handle to an accumulator.
localStep - The local_step value at which the gradient was computed.
gradient - A tensor of the gradient to be accumulated.
- Returns:
- a new instance of ResourceAccumulatorApplyGradient
-
resourceAccumulatorNumAccumulated
public ResourceAccumulatorNumAccumulated resourceAccumulatorNumAccumulated(Operand<? extends TType> handle) Returns the number of gradients aggregated in the given accumulators.- Parameters:
handle- The handle to an accumulator.- Returns:
- a new instance of ResourceAccumulatorNumAccumulated
-
resourceAccumulatorSetGlobalStep
public ResourceAccumulatorSetGlobalStep resourceAccumulatorSetGlobalStep(Operand<? extends TType> handle, Operand<TInt64> newGlobalStep)
Updates the accumulator with a new value for global_step. Logs a warning if the accumulator's value is already higher than new_global_step.
- Parameters:
handle - The handle to an accumulator.
newGlobalStep - The new global_step value to set.
- Returns:
- a new instance of ResourceAccumulatorSetGlobalStep
-
resourceAccumulatorTakeGradient
public <T extends TType> ResourceAccumulatorTakeGradient<T> resourceAccumulatorTakeGradient(Operand<? extends TType> handle, Operand<TInt32> numRequired, Class<T> dtype) Extracts the average gradient in the given ConditionalAccumulator. The op blocks until sufficient (i.e., more than num_required) gradients have been accumulated. If the accumulator has already aggregated more than num_required gradients, it returns the average of the accumulated gradients. Also automatically increments the recorded global_step in the accumulator by 1, and resets the aggregate to 0.- Type Parameters:
T- data type forResourceAccumulatorTakeGradientoutput and operands- Parameters:
handle- The handle to an accumulator.numRequired- Number of gradients required before we return an aggregate.dtype- The data type of accumulated gradients. Needs to correspond to the type of the accumulator.- Returns:
- a new instance of ResourceAccumulatorTakeGradient
-
resourceApplyAdaMax
public <T extends TType> ResourceApplyAdaMax resourceApplyAdaMax(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<T> beta1Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdaMax.Options... options)
Update '*var' according to the AdaMax algorithm.
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- max(beta2 * v_{t-1}, abs(g))
variable <- variable - learning_rate / (1 - beta1^t) * m_t / (v_t + epsilon)
- Type Parameters:
T- data type forResourceApplyAdaMaxoutput and operands- Parameters:
var- Should be from a Variable().m- Should be from a Variable().v- Should be from a Variable().beta1Power- Must be a scalar.lr- Scaling factor. Must be a scalar.beta1- Momentum factor. Must be a scalar.beta2- Momentum factor. Must be a scalar.epsilon- Ridge term. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyAdaMax
-
resourceApplyAdadelta
public <T extends TType> ResourceApplyAdadelta resourceApplyAdadelta(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdadelta.Options... options)
Update '*var' according to the adadelta scheme.
accum = rho() * accum + (1 - rho()) * grad.square();
update = (update_accum + epsilon).sqrt() * (accum + epsilon()).rsqrt() * grad;
update_accum = rho() * update_accum + (1 - rho()) * update.square();
var -= update;
- Type Parameters:
T- data type forResourceApplyAdadeltaoutput and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().accumUpdate- Should be from a Variable().lr- Scaling factor. Must be a scalar.rho- Decay factor. Must be a scalar.epsilon- Constant factor. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyAdadelta
-
resourceApplyAdagrad
public <T extends TType> ResourceApplyAdagrad resourceApplyAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdagrad.Options... options)
Update '*var' according to the adagrad scheme.
accum += grad * grad
var -= lr * grad * (1 / (sqrt(accum) + epsilon))
- Type Parameters:
T- data type forResourceApplyAdagradV2output and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().lr- Scaling factor. Must be a scalar.epsilon- Constant factor. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyAdagrad
-
resourceApplyAdagradDa
public <T extends TType> ResourceApplyAdagradDa resourceApplyAdagradDa(Operand<? extends TType> var, Operand<? extends TType> gradientAccumulator, Operand<? extends TType> gradientSquaredAccumulator, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ResourceApplyAdagradDa.Options... options) Update '*var' according to the proximal adagrad scheme.- Type Parameters:
T- data type forResourceApplyAdagradDAoutput and operands- Parameters:
var- Should be from a Variable().gradientAccumulator- Should be from a Variable().gradientSquaredAccumulator- Should be from a Variable().grad- The gradient.lr- Scaling factor. Must be a scalar.l1- L1 regularization. Must be a scalar.l2- L2 regularization. Must be a scalar.globalStep- Training step number. Must be a scalar.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyAdagradDa
-
resourceApplyAdam
public <T extends TType> ResourceApplyAdam resourceApplyAdam(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdam.Options... options)
Update '*var' according to the Adam algorithm.
$$\text{lr}_t := \mathrm{lr} \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t}$$
$$m_t := \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g$$
$$v_t := \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g^2$$
$$\text{var} := \begin{cases} \text{var} - (m_t \beta_1 + g \cdot (1 - \beta_1)) \cdot \text{lr}_t / (\sqrt{v_t} + \epsilon), &\text{if use\_nesterov}\\ \text{var} - m_t \cdot \text{lr}_t / (\sqrt{v_t} + \epsilon), &\text{otherwise} \end{cases}$$
- Type Parameters:
T- data type forResourceApplyAdamoutput and operands- Parameters:
var- Should be from a Variable().m- Should be from a Variable().v- Should be from a Variable().beta1Power- Must be a scalar.beta2Power- Must be a scalar.lr- Scaling factor. Must be a scalar.beta1- Momentum factor. Must be a scalar.beta2- Momentum factor. Must be a scalar.epsilon- Ridge term. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyAdam
-
resourceApplyAdamWithAmsgrad
public <T extends TType> ResourceApplyAdamWithAmsgrad resourceApplyAdamWithAmsgrad(Operand<? extends TType> var, Operand<? extends TType> m, Operand<? extends TType> v, Operand<? extends TType> vhat, Operand<T> beta1Power, Operand<T> beta2Power, Operand<T> lr, Operand<T> beta1, Operand<T> beta2, Operand<T> epsilon, Operand<T> grad, ResourceApplyAdamWithAmsgrad.Options... options)
Update '*var' according to the Adam algorithm.
$$\text{lr}_t := \mathrm{learning\_rate} \cdot \sqrt{1 - \beta_2^t} / (1 - \beta_1^t)$$
$$m_t := \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g$$
$$v_t := \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g \cdot g$$
$$\hat{v}_t := \max\{\hat{v}_{t-1}, v_t\}$$
$$\text{variable} := \text{variable} - \text{lr}_t \cdot m_t / (\sqrt{\hat{v}_t} + \epsilon)$$
- Type Parameters:
T- data type forResourceApplyAdamWithAmsgradoutput and operands- Parameters:
var- Should be from a Variable().m- Should be from a Variable().v- Should be from a Variable().vhat- Should be from a Variable().beta1Power- Must be a scalar.beta2Power- Must be a scalar.lr- Scaling factor. Must be a scalar.beta1- Momentum factor. Must be a scalar.beta2- Momentum factor. Must be a scalar.epsilon- Ridge term. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyAdamWithAmsgrad
-
resourceApplyAddSign
public <T extends TType> ResourceApplyAddSign resourceApplyAddSign(Operand<? extends TType> var, Operand<? extends TType> m, Operand<T> lr, Operand<T> alpha, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ResourceApplyAddSign.Options... options)
Update '*var' according to the AddSign update.
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
update <- (alpha + sign_decay * sign(g) * sign(m)) * g
variable <- variable - lr_t * update
- Type Parameters:
T- data type forResourceApplyAddSignoutput and operands- Parameters:
var- Should be from a Variable().m- Should be from a Variable().lr- Scaling factor. Must be a scalar.alpha- Must be a scalar.signDecay- Must be a scalar.beta- Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyAddSign
-
resourceApplyCenteredRmsProp
public <T extends TType> ResourceApplyCenteredRmsProp resourceApplyCenteredRmsProp(Operand<? extends TType> var, Operand<? extends TType> mg, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ResourceApplyCenteredRmsProp.Options... options)
Update '*var' according to the centered RMSProp algorithm. The centered RMSProp algorithm uses an estimate of the centered second moment (i.e., the variance) for normalization, as opposed to regular RMSProp, which uses the (uncentered) second moment. This often helps with training, but is slightly more expensive in terms of computation and memory.
Note that in the dense implementation of this algorithm, mg, ms, and mom will update even if the grad is zero, but in the sparse implementation, mg, ms, and mom will not update in iterations during which the grad is zero.
mean_square = decay * mean_square + (1-decay) * gradient ** 2
mean_grad = decay * mean_grad + (1-decay) * gradient
Delta = learning_rate * gradient / sqrt(mean_square + epsilon - mean_grad ** 2)
mg <- rho * mg_{t-1} + (1-rho) * grad
ms <- rho * ms_{t-1} + (1-rho) * grad * grad
mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms - mg * mg + epsilon)
var <- var - mom
- Type Parameters:
T- data type forResourceApplyCenteredRMSPropoutput and operands- Parameters:
var- Should be from a Variable().mg- Should be from a Variable().ms- Should be from a Variable().mom- Should be from a Variable().lr- Scaling factor. Must be a scalar.rho- Decay rate. Must be a scalar.momentum- Momentum Scale. Must be a scalar.epsilon- Ridge term. Must be a scalar.grad- The gradient.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyCenteredRmsProp
-
resourceApplyFtrl
public <T extends TType> ResourceApplyFtrl resourceApplyFtrl(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> linear, Operand<T> grad, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ResourceApplyFtrl.Options... options)
Update '*var' according to the Ftrl-proximal scheme.
accum_new = accum + grad * grad
grad_with_shrinkage = grad + 2 * l2_shrinkage * var
linear += grad_with_shrinkage + (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2
var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0
accum = accum_new
- Type Parameters:
T- data type forResourceApplyFtrlV2output and operands- Parameters:
var- Should be from a Variable().accum- Should be from a Variable().linear- Should be from a Variable().grad- The gradient.lr- Scaling factor. Must be a scalar.l1- L1 regularization. Must be a scalar.l2- L2 shrinkage regularization. Must be a scalar.l2Shrinkage- The l2Shrinkage valuelrPower- Scaling factor. Must be a scalar.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyFtrl
-
resourceApplyGradientDescent
public <T extends TType> ResourceApplyGradientDescent resourceApplyGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> delta, ResourceApplyGradientDescent.Options... options) Update '*var' by subtracting 'alpha' * 'delta' from it.- Type Parameters:
T- data type forResourceApplyGradientDescentoutput and operands- Parameters:
var- Should be from a Variable().alpha- Scaling factor. Must be a scalar.delta- The change.options- carries optional attribute values- Returns:
- a new instance of ResourceApplyGradientDescent
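For example, a minimal sketch (assuming an Ops handle tf and a resource-variable handle obtained elsewhere, e.g. from tf.varHandleOp; names are hypothetical):
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.ResourceApplyGradientDescent;
import org.tensorflow.types.TFloat32;
import org.tensorflow.types.family.TType;

// Hypothetical helper: var -= alpha * delta on a resource variable.
final class SgdStep {
  static ResourceApplyGradientDescent step(Ops tf,
      Operand<? extends TType> var, Operand<TFloat32> delta) {
    Operand<TFloat32> alpha = tf.constant(0.1f); // scalar step size
    return tf.train.resourceApplyGradientDescent(var, alpha, delta);
  }
}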
-
resourceApplyKerasMomentum
public <T extends TType> ResourceApplyKerasMomentum resourceApplyKerasMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ResourceApplyKerasMomentum.Options... options)
Update '*var' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.
accum = accum * momentum - lr * grad
var += accum
- Type Parameters:
T - data type for ResourceApplyKerasMomentum output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
grad - The gradient.
momentum - Momentum. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of ResourceApplyKerasMomentum
-
resourceApplyMomentum
public <T extends TType> ResourceApplyMomentum resourceApplyMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<T> momentum, ResourceApplyMomentum.Options... options)
Update '*var' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.
accum = accum * momentum + grad
var -= lr * accum
- Type Parameters:
T - data type for ResourceApplyMomentum output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
grad - The gradient.
momentum - Momentum. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of ResourceApplyMomentum
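A sketch of the call including the optional Nesterov flag (assuming the generated ResourceApplyMomentum.useNesterov option factory; the helper name and values are hypothetical):
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.ResourceApplyMomentum;
import org.tensorflow.types.TFloat32;
import org.tensorflow.types.family.TType;

// Hypothetical helper: one momentum update; Nesterov variant enabled via the options varargs.
final class MomentumStep {
  static ResourceApplyMomentum step(Ops tf,
      Operand<? extends TType> var, Operand<? extends TType> accum,
      Operand<TFloat32> grad) {
    return tf.train.resourceApplyMomentum(
        var, accum,
        tf.constant(0.01f), // lr
        grad,
        tf.constant(0.9f),  // momentum
        ResourceApplyMomentum.useNesterov(true)); // assumed mapping of use_nesterov
  }
}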
-
resourceApplyPowerSign
public <T extends TType> ResourceApplyPowerSign resourceApplyPowerSign(Operand<? extends TType> var, Operand<? extends TType> m, Operand<T> lr, Operand<T> logbase, Operand<T> signDecay, Operand<T> beta, Operand<T> grad, ResourceApplyPowerSign.Options... options)
Update '*var' according to the AddSign update.
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
update <- exp(logbase * sign_decay * sign(g) * sign(m_t)) * g
variable <- variable - lr_t * update
- Type Parameters:
T - data type for ResourceApplyPowerSign output and operands
- Parameters:
var - Should be from a Variable().
m - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
logbase - Must be a scalar.
signDecay - Must be a scalar.
beta - Must be a scalar.
grad - The gradient.
options - carries optional attribute values
- Returns:
- a new instance of ResourceApplyPowerSign
-
resourceApplyProximalAdagrad
public <T extends TType> ResourceApplyProximalAdagrad resourceApplyProximalAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, ResourceApplyProximalAdagrad.Options... options)
Update '*var' and '*accum' according to FOBOS with Adagrad learning rate.
accum += grad * grad
prox_v = var - lr * grad * (1 / sqrt(accum))
var = sign(prox_v) / (1 + lr * l2) * max{|prox_v| - lr * l1, 0}
- Type Parameters:
T - data type for ResourceApplyProximalAdagrad output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 regularization. Must be a scalar.
grad - The gradient.
options - carries optional attribute values
- Returns:
- a new instance of ResourceApplyProximalAdagrad
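A minimal sketch of the call (assumptions as in the earlier examples; the regularization strengths are arbitrary):
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.ResourceApplyProximalAdagrad;
import org.tensorflow.types.TFloat32;
import org.tensorflow.types.family.TType;

// Hypothetical helper: FOBOS-with-Adagrad update on var/accum resource handles.
final class ProximalAdagradStep {
  static ResourceApplyProximalAdagrad step(Ops tf,
      Operand<? extends TType> var, Operand<? extends TType> accum,
      Operand<TFloat32> grad) {
    return tf.train.resourceApplyProximalAdagrad(
        var, accum,
        tf.constant(0.05f),  // lr
        tf.constant(0.001f), // l1
        tf.constant(0.001f), // l2
        grad);
  }
}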
-
resourceApplyProximalGradientDescent
public <T extends TType> ResourceApplyProximalGradientDescent resourceApplyProximalGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> delta, ResourceApplyProximalGradientDescent.Options... options)
Update '*var' as FOBOS algorithm with fixed learning rate.
prox_v = var - alpha * delta
var = sign(prox_v) / (1 + alpha * l2) * max{|prox_v| - alpha * l1, 0}
- Type Parameters:
T - data type for ResourceApplyProximalGradientDescent output and operands
- Parameters:
var - Should be from a Variable().
alpha - Scaling factor. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 regularization. Must be a scalar.
delta - The change.
options - carries optional attribute values
- Returns:
- a new instance of ResourceApplyProximalGradientDescent
-
resourceApplyRmsProp
public <T extends TType> ResourceApplyRmsProp resourceApplyRmsProp(Operand<? extends TType> var, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, ResourceApplyRmsProp.Options... options) Update '*var' according to the RMSProp algorithm. Note that in dense implementation of this algorithm, ms and mom will update even if the grad is zero, but in this sparse implementation, ms and mom will not update in iterations during which the grad is zero.mean_square = decay * mean_square + (1-decay) * gradient ** 2 Delta = learning_rate * gradient / sqrt(mean_square + epsilon)
ms <- rho * ms_{t-1} + (1-rho) * grad * grad
mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
var <- var - mom
- Type Parameters:
T - data type for ResourceApplyRMSProp output and operands
- Parameters:
var - Should be from a Variable().
ms - Should be from a Variable().
mom - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
rho - Decay rate. Must be a scalar.
momentum - The momentum value
epsilon - Ridge term. Must be a scalar.
grad - The gradient.
options - carries optional attribute values
- Returns:
- a new instance of ResourceApplyRmsProp
-
resourceConditionalAccumulator
public <T extends TType> ResourceConditionalAccumulator resourceConditionalAccumulator(Class<T> dtype, Shape shape, ResourceConditionalAccumulator.Options... options) A conditional accumulator for aggregating gradients. The accumulator accepts gradients marked with local_step greater or equal to the most recent global_step known to the accumulator. The average can be extracted from the accumulator, provided sufficient gradients have been accumulated. Extracting the average automatically resets the aggregate to 0, and increments the global_step recorded by the accumulator. This is a resource version of ConditionalAccumulator that will work in TF2.0 with tf.cond version 2.- Type Parameters:
T - data type for ResourceConditionalAccumulator output and operands
- Parameters:
dtype - The type of the value being accumulated.
shape - The shape of the values, can be [], in which case shape is unknown.
options - carries optional attribute values
- Returns:
- a new instance of ResourceConditionalAccumulator
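For example, creating an accumulator for float gradients of a fixed shape (a sketch assuming an existing Ops handle tf; the shape is arbitrary):
import org.tensorflow.ndarray.Shape;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.ResourceConditionalAccumulator;
import org.tensorflow.types.TFloat32;

// Sketch: a resource conditional accumulator for TFloat32 gradients of shape [2, 3].
final class AccumulatorSketch {
  static ResourceConditionalAccumulator create(Ops tf) {
    return tf.train.resourceConditionalAccumulator(TFloat32.class, Shape.of(2, 3));
  }
}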
-
resourceSparseApplyAdadelta
public <T extends TType> ResourceSparseApplyAdadelta resourceSparseApplyAdadelta(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdadelta.Options... options)
Update relevant entries in '*var', '*accum' and '*accum_update' according to the adadelta scheme.
- Type Parameters:
T - data type for ResourceSparseApplyAdadelta output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
accumUpdate - Should be from a Variable().
lr - Learning rate. Must be a scalar.
rho - Decay factor. Must be a scalar.
epsilon - Constant factor. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyAdadelta
-
resourceSparseApplyAdagrad
public <T extends TType> ResourceSparseApplyAdagrad resourceSparseApplyAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdagrad.Options... options)
Update relevant entries in '*var' and '*accum' according to the adagrad scheme. That is for rows we have grad for, we update var and accum as follows:
accum += grad * grad
var -= lr * grad * (1 / sqrt(accum))
- Type Parameters:
T - data type for ResourceSparseApplyAdagrad output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Learning rate. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyAdagrad
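A sketch of a sparse update touching only two rows (assuming resource handles for var and accum and a gradient with one slice per index; the index values are hypothetical):
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.ResourceSparseApplyAdagrad;
import org.tensorflow.types.TFloat32;
import org.tensorflow.types.family.TType;

// Hypothetical helper: adagrad update restricted to the rows listed in indices.
final class SparseAdagradStep {
  static ResourceSparseApplyAdagrad step(Ops tf,
      Operand<? extends TType> var, Operand<? extends TType> accum,
      Operand<TFloat32> grad) {
    return tf.train.resourceSparseApplyAdagrad(
        var, accum,
        tf.constant(0.05f),             // lr
        grad,                           // shape [2, ...]: one slice per index below
        tf.constant(new int[] {0, 2})); // rows of var/accum to update
  }
}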
-
resourceSparseApplyAdagradDa
public <T extends TType> ResourceSparseApplyAdagradDa resourceSparseApplyAdagradDa(Operand<? extends TType> var, Operand<? extends TType> gradientAccumulator, Operand<? extends TType> gradientSquaredAccumulator, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, ResourceSparseApplyAdagradDa.Options... options) Update entries in '*var' and '*accum' according to the proximal adagrad scheme.- Type Parameters:
T - data type for ResourceSparseApplyAdagradDA output and operands
- Parameters:
var - Should be from a Variable().
gradientAccumulator - Should be from a Variable().
gradientSquaredAccumulator - Should be from a Variable().
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
lr - Learning rate. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 regularization. Must be a scalar.
globalStep - Training step number. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyAdagradDa
-
resourceSparseApplyAdagradV2
public <T extends TType> ResourceSparseApplyAdagradV2 resourceSparseApplyAdagradV2(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyAdagradV2.Options... options)
Update relevant entries in '*var' and '*accum' according to the adagrad scheme. That is for rows we have grad for, we update var and accum as follows:
accum += grad * grad
var -= lr * grad * (1 / sqrt(accum))
- Type Parameters:
T - data type for ResourceSparseApplyAdagradV2 output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Learning rate. Must be a scalar.
epsilon - Constant factor. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyAdagradV2
-
resourceSparseApplyCenteredRmsProp
public <T extends TType> ResourceSparseApplyCenteredRmsProp resourceSparseApplyCenteredRmsProp(Operand<? extends TType> var, Operand<? extends TType> mg, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyCenteredRmsProp.Options... options) Update '*var' according to the centered RMSProp algorithm. The centered RMSProp algorithm uses an estimate of the centered second moment (i.e., the variance) for normalization, as opposed to regular RMSProp, which uses the (uncentered) second moment. This often helps with training, but is slightly more expensive in terms of computation and memory.Note that in dense implementation of this algorithm, mg, ms, and mom will update even if the grad is zero, but in this sparse implementation, mg, ms, and mom will not update in iterations during which the grad is zero.
mean_square = decay * mean_square + (1-decay) * gradient ** 2
mean_grad = decay * mean_grad + (1-decay) * gradient
Delta = learning_rate * gradient / sqrt(mean_square + epsilon - mean_grad ** 2)
ms <- rho * ms_{t-1} + (1-rho) * grad * grad
mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
var <- var - mom
- Type Parameters:
T - data type for ResourceSparseApplyCenteredRMSProp output and operands
- Parameters:
var - Should be from a Variable().
mg - Should be from a Variable().
ms - Should be from a Variable().
mom - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
rho - Decay rate. Must be a scalar.
momentum - The momentum value
epsilon - Ridge term. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var, ms and mom.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyCenteredRmsProp
-
resourceSparseApplyFtrl
public <T extends TType> ResourceSparseApplyFtrl resourceSparseApplyFtrl(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<? extends TType> linear, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, ResourceSparseApplyFtrl.Options... options)
Update relevant entries in '*var' according to the Ftrl-proximal scheme. That is for rows we have grad for, we update var, accum and linear as follows:
grad_with_shrinkage = grad + 2 * l2_shrinkage * var
accum_new = accum + grad_with_shrinkage * grad_with_shrinkage
linear += grad_with_shrinkage + (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2
var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0
accum = accum_new
- Type Parameters:
T - data type for ResourceSparseApplyFtrlV2 output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
linear - Should be from a Variable().
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
lr - Scaling factor. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 shrinkage regularization. Must be a scalar.
l2Shrinkage - The l2Shrinkage value
lrPower - Scaling factor. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyFtrl
-
resourceSparseApplyKerasMomentum
public <T extends TType> ResourceSparseApplyKerasMomentum resourceSparseApplyKerasMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, ResourceSparseApplyKerasMomentum.Options... options) Update relevant entries in '*var' and '*accum' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.That is for rows we have grad for, we update var and accum as follows:
accum = accum * momentum - lr * grad
var += accum
- Type Parameters:
T - data type for ResourceSparseApplyKerasMomentum output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Learning rate. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
momentum - Momentum. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyKerasMomentum
-
resourceSparseApplyMomentum
public <T extends TType> ResourceSparseApplyMomentum resourceSparseApplyMomentum(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, ResourceSparseApplyMomentum.Options... options) Update relevant entries in '*var' and '*accum' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.That is for rows we have grad for, we update var and accum as follows:
accum = accum * momentum + grad
var -= lr * accum
- Type Parameters:
T - data type for ResourceSparseApplyMomentum output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Learning rate. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
momentum - Momentum. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyMomentum
-
resourceSparseApplyProximalAdagrad
public <T extends TType> ResourceSparseApplyProximalAdagrad resourceSparseApplyProximalAdagrad(Operand<? extends TType> var, Operand<? extends TType> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyProximalAdagrad.Options... options)
Sparse update entries in '*var' and '*accum' according to FOBOS algorithm. That is for rows we have grad for, we update var and accum as follows:
accum += grad * grad
prox_v = var
prox_v -= lr * grad * (1 / sqrt(accum))
var = sign(prox_v) / (1 + lr * l2) * max{|prox_v| - lr * l1, 0}
- Type Parameters:
T - data type for ResourceSparseApplyProximalAdagrad output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Learning rate. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 regularization. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyProximalAdagrad
-
resourceSparseApplyProximalGradientDescent
public <T extends TType> ResourceSparseApplyProximalGradientDescent resourceSparseApplyProximalGradientDescent(Operand<? extends TType> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyProximalGradientDescent.Options... options)
Sparse update '*var' as FOBOS algorithm with fixed learning rate. That is for rows we have grad for, we update var as follows:
prox_v = var - alpha * grad
var = sign(prox_v) / (1 + alpha * l2) * max{|prox_v| - alpha * l1, 0}
- Type Parameters:
T - data type for ResourceSparseApplyProximalGradientDescent output and operands
- Parameters:
var - Should be from a Variable().
alpha - Scaling factor. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 regularization. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyProximalGradientDescent
-
resourceSparseApplyRmsProp
public <T extends TType> ResourceSparseApplyRmsProp resourceSparseApplyRmsProp(Operand<? extends TType> var, Operand<? extends TType> ms, Operand<? extends TType> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, ResourceSparseApplyRmsProp.Options... options) Update '*var' according to the RMSProp algorithm. Note that in dense implementation of this algorithm, ms and mom will update even if the grad is zero, but in this sparse implementation, ms and mom will not update in iterations during which the grad is zero.mean_square = decay * mean_square + (1-decay) * gradient ** 2 Delta = learning_rate * gradient / sqrt(mean_square + epsilon)
ms <- rho * ms_{t-1} + (1-rho) * grad * grad
mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
var <- var - mom
- Type Parameters:
T - data type for ResourceSparseApplyRMSProp output and operands
- Parameters:
var - Should be from a Variable().
ms - Should be from a Variable().
mom - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
rho - Decay rate. Must be a scalar.
momentum - The momentum value
epsilon - Ridge term. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var, ms and mom.
options - carries optional attribute values
- Returns:
- a new instance of ResourceSparseApplyRmsProp
-
restore
public Restore restore(Operand<TString> prefix, Operand<TString> tensorNames, Operand<TString> shapeAndSlices, List<Class<? extends TType>> dtypes)
Restores tensors from a V2 checkpoint. For backward compatibility with the V1 format, this Op currently allows restoring from a V1 checkpoint as well:
- This Op first attempts to find the V2 index file pointed to by "prefix", and if found proceeds to read it as a V2 checkpoint;
- Otherwise the V1 read path is invoked. Relying on this behavior is not recommended, as the ability to fall back to read V1 might be deprecated and eventually removed.
By default, restores the named tensors in full. If the caller wishes to restore specific slices of stored tensors, "shape_and_slices" should be non-empty strings and correspondingly well-formed.
Callers must ensure all the named tensors are indeed stored in the checkpoint.
- Parameters:
prefix - Must have a single element. The prefix of a V2 checkpoint.
tensorNames - shape {N}. The names of the tensors to be restored.
shapeAndSlices - shape {N}. The slice specs of the tensors to be restored. Empty strings indicate that they are non-partitioned tensors.
dtypes - shape {N}. The list of expected dtypes for the tensors. Must match those stored in the checkpoint.
- Returns:
- a new instance of Restore
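For illustration, restoring two float tensors in full from a checkpoint prefix (a sketch; the checkpoint path is hypothetical and tensorNames/shapeAndSlices are assumed to be rank-1 TString tensors built elsewhere):
import java.util.Arrays;
import java.util.List;
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.Restore;
import org.tensorflow.types.TFloat32;
import org.tensorflow.types.TString;
import org.tensorflow.types.family.TType;

// Sketch: restore two TFloat32 tensors in full (empty slice specs) from a V2 checkpoint.
final class RestoreSketch {
  static Restore restoreTwoFloats(Ops tf, Operand<TString> tensorNames,
      Operand<TString> shapeAndSlices) {
    List<Class<? extends TType>> dtypes =
        Arrays.<Class<? extends TType>>asList(TFloat32.class, TFloat32.class);
    Operand<TString> prefix = tf.constant("/tmp/model/ckpt"); // hypothetical prefix
    return tf.train.restore(prefix, tensorNames, shapeAndSlices, dtypes);
  }
}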
-
restoreSlice
public <T extends TType> RestoreSlice<T> restoreSlice(Operand<TString> filePattern, Operand<TString> tensorName, Operand<TString> shapeAndSlice, Class<T> dt, RestoreSlice.Options... options)
Restores a tensor from checkpoint files. This is like Restore except that the restored tensor can be listed as filling only a slice of a larger tensor. shape_and_slice specifies the shape of the larger tensor and the slice that the restored tensor covers.
The shape_and_slice input has the same format as the elements of the shapes_and_slices input of the SaveSlices op.
- Type Parameters:
T - data type for RestoreSlice output and operands
- Parameters:
filePattern - Must have a single element. The pattern of the files from which we read the tensor.
tensorName - Must have a single element. The name of the tensor to be restored.
shapeAndSlice - Scalar. The shape and slice specification to use when restoring the tensor.
dt - The type of the tensor to be restored.
options - carries optional attribute values
- Returns:
- a new instance of RestoreSlice
-
save
public Save save(Operand<TString> prefix, Operand<TString> tensorNames, Operand<TString> shapeAndSlices, Iterable<Operand<?>> tensors) Saves tensors in V2 checkpoint format. By default, saves the named tensors in full. If the caller wishes to save specific slices of full tensors, "shape_and_slices" should be non-empty strings and correspondingly well-formed.- Parameters:
prefix - Must have a single element. The prefix of the V2 checkpoint to which we write the tensors.
tensorNames - shape {N}. The names of the tensors to be saved.
shapeAndSlices - shape {N}. The slice specs of the tensors to be saved. Empty strings indicate that they are non-partitioned tensors.
tensors - N tensors to save.
- Returns:
- a new instance of Save
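A matching sketch for writing a V2 checkpoint (same assumptions: the prefix is hypothetical and the name/slice-spec tensors are built elsewhere as rank-1 TString tensors):
import java.util.Arrays;
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.Save;
import org.tensorflow.types.TString;

// Sketch: save two tensors in full under a V2 checkpoint prefix.
final class SaveSketch {
  static Save saveTwo(Ops tf, Operand<TString> tensorNames,
      Operand<TString> shapeAndSlices, Operand<?> weights, Operand<?> bias) {
    Operand<TString> prefix = tf.constant("/tmp/model/ckpt"); // hypothetical prefix
    return tf.train.save(prefix, tensorNames, shapeAndSlices,
        Arrays.<Operand<?>>asList(weights, bias));
  }
}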
-
saveSlices
public SaveSlices saveSlices(Operand<TString> filename, Operand<TString> tensorNames, Operand<TString> shapesAndSlices, Iterable<Operand<?>> data)
Saves input tensor slices to disk. This is like Save except that tensors can be listed in the saved file as being a slice of a larger tensor. shapes_and_slices specifies the shape of the larger tensor and the slice that this tensor covers. shapes_and_slices must have as many elements as tensor_names.
Elements of the shapes_and_slices input must either be:
- The empty string, in which case the corresponding tensor is saved normally.
- A string of the form dim0 dim1 ... dimN-1 slice-spec, where the dimI are the dimensions of the larger tensor and slice-spec specifies what part is covered by the tensor to save.
slice-spec itself is a :-separated list, slice0:slice1:...:sliceN-1, where each sliceI is either:
- The string -, meaning that the slice covers all indices of this dimension.
- start,length, where start and length are integers. In that case the slice covers length indices starting at start.
See also Save.
- Parameters:
filename - Must have a single element. The name of the file to which we write the tensor.
tensorNames - Shape [N]. The names of the tensors to be saved.
shapesAndSlices - Shape [N]. The shapes and slice specifications to use when saving the tensors.
data - N tensors to save.
- Returns:
- a new instance of SaveSlices
-
sdcaFprint
Computes fingerprints of the input strings.
- Parameters:
input - vector of strings to compute fingerprints on.
- Returns:
- a new instance of SdcaFprint
-
sdcaOptimizer
public SdcaOptimizer sdcaOptimizer(Iterable<Operand<TInt64>> sparseExampleIndices, Iterable<Operand<TInt64>> sparseFeatureIndices, Iterable<Operand<TFloat32>> sparseFeatureValues, Iterable<Operand<TFloat32>> denseFeatures, Operand<TFloat32> exampleWeights, Operand<TFloat32> exampleLabels, Iterable<Operand<TInt64>> sparseIndices, Iterable<Operand<TFloat32>> sparseWeights, Iterable<Operand<TFloat32>> denseWeights, Operand<TFloat32> exampleStateData, String lossType, Float l1, Float l2, Long numLossPartitions, Long numInnerIterations, SdcaOptimizer.Options... options) Distributed version of Stochastic Dual Coordinate Ascent (SDCA) optimizer for linear models with L1 + L2 regularization. As global optimization objective is strongly-convex, the optimizer optimizes the dual objective at each step. The optimizer applies each update one example at a time. Examples are sampled uniformly, and the optimizer is learning rate free and enjoys linear convergence rate.Proximal Stochastic Dual Coordinate Ascent .
Shai Shalev-Shwartz, Tong Zhang. 2012.
$$Loss Objective = \sum f_{i}(wx_{i}) + (l2/2) * |w|^2 + l1 * |w|$$
Adding vs. Averaging in Distributed Primal-Dual Optimization.
Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtarik, Martin Takac. 2015.
Stochastic Dual Coordinate Ascent with Adaptive Probabilities.
Dominik Csiba, Zheng Qu, Peter Richtarik. 2015.
- Parameters:
sparseExampleIndices - a list of vectors which contain example indices.
sparseFeatureIndices - a list of vectors which contain feature indices.
sparseFeatureValues - a list of vectors which contains the feature value associated with each feature group.
denseFeatures - a list of matrices which contains the dense feature values.
exampleWeights - a vector which contains the weight associated with each example.
exampleLabels - a vector which contains the label/target associated with each example.
sparseIndices - a list of vectors where each value is the indices which has corresponding weights in sparse_weights. This field may be omitted for the dense approach.
sparseWeights - a list of vectors where each value is the weight associated with a sparse feature group.
denseWeights - a list of vectors where the values are the weights associated with a dense feature group.
exampleStateData - a list of vectors containing the example state data.
lossType - Type of the primal loss. Currently SdcaSolver supports logistic, squared and hinge losses.
l1 - Symmetric l1 regularization strength.
l2 - Symmetric l2 regularization strength.
numLossPartitions - Number of partitions of the global loss function.
numInnerIterations - Number of iterations per mini-batch.
options - carries optional attribute values
- Returns:
- a new instance of SdcaOptimizer
-
sdcaShrinkL1
Applies L1 regularization shrink step on the parameters.
- Parameters:
weights - a list of vectors where each value is the weight associated with a feature group.
l1 - Symmetric l1 regularization strength.
l2 - Symmetric l2 regularization strength. Should be a positive float.
- Returns:
- a new instance of SdcaShrinkL1
-
sparseApplyAdadelta
public <T extends TType> SparseApplyAdadelta<T> sparseApplyAdadelta(Operand<T> var, Operand<T> accum, Operand<T> accumUpdate, Operand<T> lr, Operand<T> rho, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyAdadelta.Options... options)
Update relevant entries in '*var', '*accum' and '*accum_update' according to the adadelta scheme.
- Type Parameters:
T - data type for SparseApplyAdadelta output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
accumUpdate - Should be from a Variable().
lr - Learning rate. Must be a scalar.
rho - Decay factor. Must be a scalar.
epsilon - Constant factor. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyAdadelta
-
sparseApplyAdagrad
public <T extends TType> SparseApplyAdagrad<T> sparseApplyAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyAdagrad.Options... options) Update relevant entries in '*var' and '*accum' according to the adagrad scheme. That is for rows we have grad for, we update var and accum as follows: $$accum += grad * grad$$ $$var -= lr * grad * (1 / sqrt(accum))$$- Type Parameters:
T - data type for SparseApplyAdagradV2 output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Learning rate. Must be a scalar.
epsilon - Constant factor. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyAdagrad
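A sketch of the ref-variable variant (var and accum would typically come from tf.variable; the epsilon and index values are arbitrary):
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.SparseApplyAdagrad;
import org.tensorflow.types.TFloat32;

// Hypothetical helper: adagrad update on selected rows of a ref variable.
final class RefSparseAdagradStep {
  static SparseApplyAdagrad<TFloat32> step(Ops tf,
      Operand<TFloat32> var, Operand<TFloat32> accum, Operand<TFloat32> grad) {
    return tf.train.sparseApplyAdagrad(
        var, accum,
        tf.constant(0.05f),             // lr
        tf.constant(1e-8f),             // epsilon
        grad,
        tf.constant(new int[] {1, 3})); // rows to update
  }
}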
-
sparseApplyAdagradDa
public <T extends TType> SparseApplyAdagradDa<T> sparseApplyAdagradDa(Operand<T> var, Operand<T> gradientAccumulator, Operand<T> gradientSquaredAccumulator, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<TInt64> globalStep, SparseApplyAdagradDa.Options... options) Update entries in '*var' and '*accum' according to the proximal adagrad scheme.- Type Parameters:
T - data type for SparseApplyAdagradDA output and operands
- Parameters:
var - Should be from a Variable().
gradientAccumulator - Should be from a Variable().
gradientSquaredAccumulator - Should be from a Variable().
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
lr - Learning rate. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 regularization. Must be a scalar.
globalStep - Training step number. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyAdagradDa
-
sparseApplyCenteredRmsProp
public <T extends TType> SparseApplyCenteredRmsProp<T> sparseApplyCenteredRmsProp(Operand<T> var, Operand<T> mg, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyCenteredRmsProp.Options... options) Update '*var' according to the centered RMSProp algorithm. The centered RMSProp algorithm uses an estimate of the centered second moment (i.e., the variance) for normalization, as opposed to regular RMSProp, which uses the (uncentered) second moment. This often helps with training, but is slightly more expensive in terms of computation and memory.Note that in dense implementation of this algorithm, mg, ms, and mom will update even if the grad is zero, but in this sparse implementation, mg, ms, and mom will not update in iterations during which the grad is zero.
mean_square = decay * mean_square + (1-decay) * gradient ** 2
mean_grad = decay * mean_grad + (1-decay) * gradient
Delta = learning_rate * gradient / sqrt(mean_square + epsilon - mean_grad ** 2)
$$ms <- rho * ms_{t-1} + (1-rho) * grad * grad$$ $$mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)$$ $$var <- var - mom$$
- Type Parameters:
T - data type for SparseApplyCenteredRMSProp output and operands
- Parameters:
var - Should be from a Variable().
mg - Should be from a Variable().
ms - Should be from a Variable().
mom - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
rho - Decay rate. Must be a scalar.
momentum - The momentum value
epsilon - Ridge term. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var, ms and mom.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyCenteredRmsProp
-
sparseApplyFtrl
public <T extends TType> SparseApplyFtrl<T> sparseApplyFtrl(Operand<T> var, Operand<T> accum, Operand<T> linear, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> l2Shrinkage, Operand<T> lrPower, SparseApplyFtrl.Options... options)
Update relevant entries in '*var' according to the Ftrl-proximal scheme. That is for rows we have grad for, we update var, accum and linear as follows:
grad_with_shrinkage = grad + 2 * l2_shrinkage * var
accum_new = accum + grad * grad
linear += grad_with_shrinkage - (accum_new^(-lr_power) - accum^(-lr_power)) / lr * var
quadratic = 1.0 / (accum_new^(lr_power) * lr) + 2 * l2
var = (sign(linear) * l1 - linear) / quadratic if |linear| > l1 else 0.0
accum = accum_new
- Type Parameters:
T - data type for SparseApplyFtrlV2 output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
linear - Should be from a Variable().
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
lr - Scaling factor. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 shrinkage regularization. Must be a scalar.
l2Shrinkage - The l2Shrinkage value
lrPower - Scaling factor. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyFtrl
-
sparseApplyMomentum
public <T extends TType> SparseApplyMomentum<T> sparseApplyMomentum(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> grad, Operand<? extends TNumber> indices, Operand<T> momentum, SparseApplyMomentum.Options... options) Update relevant entries in '*var' and '*accum' according to the momentum scheme. Set use_nesterov = True if you want to use Nesterov momentum.That is for rows we have grad for, we update var and accum as follows:
$$accum = accum * momentum + grad$$ $$var -= lr * accum$$
- Type Parameters:
T - data type for SparseApplyMomentum output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Learning rate. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
momentum - Momentum. Must be a scalar.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyMomentum
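A sketch of the call (ref variables assumed to exist; the index, learning-rate and momentum values are arbitrary):
import org.tensorflow.Operand;
import org.tensorflow.op.Ops;
import org.tensorflow.op.train.SparseApplyMomentum;
import org.tensorflow.types.TFloat32;

// Hypothetical helper: momentum update on the rows named by indices.
final class RefSparseMomentumStep {
  static SparseApplyMomentum<TFloat32> step(Ops tf,
      Operand<TFloat32> var, Operand<TFloat32> accum, Operand<TFloat32> grad) {
    return tf.train.sparseApplyMomentum(
        var, accum,
        tf.constant(0.01f),          // lr
        grad,
        tf.constant(new int[] {0}),  // row to update
        tf.constant(0.9f));          // momentum
  }
}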
-
sparseApplyProximalAdagrad
public <T extends TType> SparseApplyProximalAdagrad<T> sparseApplyProximalAdagrad(Operand<T> var, Operand<T> accum, Operand<T> lr, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyProximalAdagrad.Options... options)
Sparse update entries in '*var' and '*accum' according to FOBOS algorithm. That is for rows we have grad for, we update var and accum as follows:
$$accum += grad * grad$$
$$prox_v = var$$
$$prox_v -= lr * grad * (1 / sqrt(accum))$$
$$var = sign(prox_v) / (1 + lr * l2) * max{|prox_v| - lr * l1, 0}$$
- Type Parameters:
T - data type for SparseApplyProximalAdagrad output and operands
- Parameters:
var - Should be from a Variable().
accum - Should be from a Variable().
lr - Learning rate. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 regularization. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyProximalAdagrad
-
sparseApplyProximalGradientDescent
public <T extends TType> SparseApplyProximalGradientDescent<T> sparseApplyProximalGradientDescent(Operand<T> var, Operand<T> alpha, Operand<T> l1, Operand<T> l2, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyProximalGradientDescent.Options... options)
Sparse update '*var' as FOBOS algorithm with fixed learning rate. That is for rows we have grad for, we update var as follows:
$$prox_v = var - alpha * grad$$
$$var = sign(prox_v) / (1 + alpha * l2) * max{|prox_v| - alpha * l1, 0}$$
- Type Parameters:
T - data type for SparseApplyProximalGradientDescent output and operands
- Parameters:
var - Should be from a Variable().
alpha - Scaling factor. Must be a scalar.
l1 - L1 regularization. Must be a scalar.
l2 - L2 regularization. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var and accum.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyProximalGradientDescent
-
sparseApplyRmsProp
public <T extends TType> SparseApplyRmsProp<T> sparseApplyRmsProp(Operand<T> var, Operand<T> ms, Operand<T> mom, Operand<T> lr, Operand<T> rho, Operand<T> momentum, Operand<T> epsilon, Operand<T> grad, Operand<? extends TNumber> indices, SparseApplyRmsProp.Options... options) Update '*var' according to the RMSProp algorithm. Note that in dense implementation of this algorithm, ms and mom will update even if the grad is zero, but in this sparse implementation, ms and mom will not update in iterations during which the grad is zero.mean_square = decay * mean_square + (1-decay) * gradient ** 2 Delta = learning_rate * gradient / sqrt(mean_square + epsilon)
$$ms <- rho * ms_{t-1} + (1-rho) * grad * grad$$ $$mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)$$ $$var <- var - mom$$
- Type Parameters:
T - data type for SparseApplyRMSProp output and operands
- Parameters:
var - Should be from a Variable().
ms - Should be from a Variable().
mom - Should be from a Variable().
lr - Scaling factor. Must be a scalar.
rho - Decay rate. Must be a scalar.
momentum - The momentum value
epsilon - Ridge term. Must be a scalar.
grad - The gradient.
indices - A vector of indices into the first dimension of var, ms and mom.
options - carries optional attribute values
- Returns:
- a new instance of SparseApplyRmsProp
-
symbolicGradient
public SymbolicGradient symbolicGradient(Iterable<Operand<?>> input, List<Class<? extends TType>> Tout, ConcreteFunction f) Computes the gradient function for function f via backpropagation.- Parameters:
input - a list of input tensors of size N + M.
Tout - the type list for the input list.
f - The function we want to compute the gradient for. The function 'f' must be a numerical function which takes N inputs and produces M outputs. Its gradient function 'g', which is computed by this SymbolicGradient op, is a function taking N + M inputs and producing N outputs.
I.e. if we have (y1, y2, ..., y_M) = f(x1, x2, ..., x_N), then g is (dL/dx1, dL/dx2, ..., dL/dx_N) = g(x1, x2, ..., x_N, dL/dy1, dL/dy2, ..., dL/dy_M),
where L is a scalar-valued function of (x1, x2, ..., x_N) (e.g., the loss function) and dL/dx_i is the partial derivative of L with respect to x_i.
- Returns:
- a new instance of SymbolicGradient
-
tileGrad
Returns the gradient of Tile. Since Tile takes an input and repeats the input multiples times along each dimension, train.TileGrad takes in multiples and aggregates each repeated tile of input into output.
- Type Parameters:
T - data type for TileGrad output and operands
- Parameters:
input - The input value
multiples - The multiples value
- Returns:
- a new instance of TileGrad
-
ops
-