Class NnOps
These are higher-level ops that may invoke core ops. A higher-level op may perform the operation solely in the TensorFlow framework, or preprocess the operands before invoking a core-level op.
-
Method Summary
gelu(Operand<T> input) - Compute the Gaussian Error Linear Unit (GELU) activation function without approximation.
gelu(Operand<T> input, boolean approximate) - Compute the Gaussian Error Linear Unit (GELU) activation function.
sigmoidCrossEntropyWithLogits(Operand<T> labels, Operand<T> logits) - Computes sigmoid cross entropy given logits.
softmaxCrossEntropyWithLogits(Operand<U> labels, Operand<T> logits, int axis) - Computes softmax cross entropy between logits and labels.
sparseSoftmaxCrossEntropyWithLogits(Operand<U> labels, Operand<T> logits) - Computes sparse softmax cross entropy between logits and labels.
-
Method Details
-
sigmoidCrossEntropyWithLogits
public <T extends TNumber> Operand<T> sigmoidCrossEntropyWithLogits(Operand<T> labels, Operand<T> logits)

Computes sigmoid cross entropy given logits.

Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.
For brevity, let x = logits, z = labels. The logistic loss in pseudo-code is

  z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
= z * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
= z * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
= z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)))
= (1 - z) * x + log(1 + exp(-x))
= x - x * z + log(1 + exp(-x))
For x < 0, to avoid overflow in exp(-x), we reformulate the above as

  x - x * z + log(1 + exp(-x))
= log(exp(x)) - x * z + log(1 + exp(-x))
= - x * z + log(1 + exp(x))
Hence, to ensure stability and avoid overflow, the implementation uses this equivalent formulation:

  max(x, 0) - x * z + log(1 + exp(-abs(x)))
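The following is a minimal sketch of that stable formula built from basic math ops, assuming an eager Ops handle tf (in the same style as the usage example for softmaxCrossEntropyWithLogits below). It is only an illustration of the formula, not the library's implementation, and the values are made up:

try (EagerSession env = EagerSession.create()) {
  Ops tf = Ops.create(env);
  Operand<TFloat32> labels = tf.constant(new float[] {1.0F, 0.0F, 1.0F});   // z
  Operand<TFloat32> logits = tf.constant(new float[] {2.5F, -1.0F, 0.3F});  // x

  // max(x, 0) - x * z + log(1 + exp(-abs(x))), evaluated element-wise
  Operand<TFloat32> manual =
      tf.math.add(
          tf.math.sub(tf.math.maximum(logits, tf.constant(0.0F)),
                      tf.math.mul(logits, labels)),
          tf.math.log(tf.math.add(tf.constant(1.0F),
                      tf.math.exp(tf.math.neg(tf.math.abs(logits))))));

  // The documented helper produces the same component-wise losses.
  Operand<TFloat32> loss = tf.nn.sigmoidCrossEntropyWithLogits(labels, logits);
}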
logits and labels must have the same type and shape.
- Type Parameters:
T - the type of labels and logits
- Parameters:
labels - the labels
logits - the logits of type float32 or float64
- Returns:
- the component-wise logistic losses.
- Throws:
IllegalArgumentException - if logits and labels do not have the same shape
-
softmaxCrossEntropyWithLogits
public <T extends TNumber, U extends TNumber> Operand<T> softmaxCrossEntropyWithLogits(Operand<U> labels, Operand<T> logits, int axis)

Computes softmax cross entropy between logits and labels.

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

If using exclusive labels (wherein one and only one class is true at a time), see NnOps.sparseSoftmaxCrossEntropyWithLogits(Operand, Operand).

Usage:
Operand<TFloat32> logits =
    tf.constant(new float[][] {{4.0F, 2.0F, 1.0F}, {0.0F, 5.0F, 1.0F}});
Operand<TFloat32> labels =
    tf.constant(new float[][] {{1.0F, 0.0F, 0.0F}, {0.0F, 0.8F, 0.2F}});
Operand<TFloat32> output =
    tf.nn.softmaxCrossEntropyWithLogits(labels, logits, -1);
// output Shape = [2]
// dataType = FLOAT (1)
// values { 0.169846, 0.824745 }

Backpropagation will happen into both logits and labels. To disallow backpropagation into labels, pass the label tensors through tf.stopGradient before feeding them to this function (see the sketch after this method's details).
- Type Parameters:
T - the number type of the operands
U - the data type for the labels
- Parameters:
labels - Each vector along the class dimension should hold a valid probability distribution, e.g. for the case in which labels are of shape [batch_size, num_classes], each row of labels[i] must be a valid probability distribution.
logits - Per-label activations, typically a linear output. These activation energies are interpreted as unnormalized log probabilities.
axis - The class dimension. -1 is the last dimension.
- Returns:
- the softmax cross entropy loss. Its type is the same as logits and its shape is the same as labels, except that it does not have the last dimension of labels.
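As referenced above, the following is a minimal sketch of disallowing backpropagation into soft labels. It reuses the labels and logits tensors from the usage example and assumes the same Ops handle tf:

Operand<TFloat32> frozenLabels = tf.stopGradient(labels);   // no gradient flows into labels
Operand<TFloat32> lossNoLabelGrad =
    tf.nn.softmaxCrossEntropyWithLogits(frozenLabels, logits, -1);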
-
sparseSoftmaxCrossEntropyWithLogits
public <T extends TNumber, U extends TNumber> Operand<T> sparseSoftmaxCrossEntropyWithLogits(Operand<U> labels, Operand<T> logits)

Computes sparse softmax cross entropy between logits and labels.

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry). For soft softmax classification with a probability distribution for each entry, see NnOps.softmaxCrossEntropyWithLogits(Operand, Operand).

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

A common use case is to have logits of shape [batchSize, numClasses] and labels of shape [batchSize], but higher dimensions are supported, in which case the dim-th dimension is assumed to be of size numClasses. logits must have the dataType of TFloat16, TFloat32, or TFloat64, and labels must have the dtype of TInt32 or TInt64.
- Type Parameters:
T - the data type for the loss and logits
U - the data type for the labels
- Parameters:
labels - Tensor of shape [d_0, d_1, ..., d_{r-1}] (where r is the rank of labels and result) and dataType of TInt32 or TInt64. Each entry in labels must be an index in [0, numClasses). Other values will raise an exception when this op is run on CPU, and return NaN for the corresponding loss and gradient rows on GPU.
logits - Per-label activations (typically a linear output) of shape [d_0, d_1, ..., d_{r-1}, numClasses] and dataType of TFloat16, TFloat32, or TFloat64. These activation energies are interpreted as unnormalized log probabilities.
- Returns:
- the loss
- Throws:
IllegalArgumentException - if logits are scalars (need to have rank >= 1) or if the rank of the labels is not equal to the rank of the logits minus one.
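A minimal sketch of the common [batchSize, numClasses] case described above, assuming the same eager Ops handle tf as in the earlier usage example; the values are illustrative only:

Operand<TInt32> labels = tf.constant(new int[] {0, 2});           // shape [2], one class index per row
Operand<TFloat32> logits = tf.constant(new float[][] {
    {4.0F, 2.0F, 1.0F},
    {0.0F, 5.0F, 1.0F}});                                          // shape [2, 3]
Operand<TFloat32> loss =
    tf.nn.sparseSoftmaxCrossEntropyWithLogits(labels, logits);     // loss shape = [2]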
-
gelu
Compute the Gaussian Error Linear Unit (GELU) activation function without approximation.

The Gaussian error linear unit (GELU) computes x * P(X <= x), where P(X) ~ N(0, 1). The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLU.
- Type Parameters:
T - the data type for the input and result
- Parameters:
input - the input
- Returns:
- The Gaussian Error Linear Unit computation
-
gelu
Compute the Gaussian Error Linear Unit (GELU) activation function.

The Gaussian error linear unit (GELU) computes x * P(X <= x), where P(X) ~ N(0, 1). The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLU.
- Type Parameters:
T - the data type for the input and result
- Parameters:
input - the input
approximate - whether to enable approximation
- Returns:
- The Gaussian Error Linear Unit computation
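A minimal sketch comparing the two gelu overloads, assuming they are reachable through the same tf.nn handle used in the earlier usage example and that approximate is a boolean flag (an assumption, since the summary does not show the parameter type); the input values are illustrative only:

Operand<TFloat32> input = tf.constant(new float[] {-1.0F, 0.0F, 0.5F, 2.0F});
Operand<TFloat32> exact = tf.nn.gelu(input);           // without approximation
Operand<TFloat32> approx = tf.nn.gelu(input, true);    // approximation enabled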
-