org.tensorflow.op.strings.UnicodeTranscode

All Implemented Interfaces: Shaped, Op, Operand<TString>

@Operator(group="strings") public final class UnicodeTranscode extends RawOp implements Operand<TString>

Transcodes the input text from a source encoding to a destination encoding. The input is a string tensor of any shape. The output is a string tensor of the same shape containing the transcoded strings. Output strings are always valid Unicode. If the input contains invalid encoding positions, the errors attribute sets the policy for handling them. Under the default policy ('replace'), invalid formatting is substituted in the output by the replacement_char. If the policy is 'ignore', invalid encoding positions in the input are skipped and not included in the output. If it is set to 'strict', any invalid formatting results in an InvalidArgument error.
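The three policies have close counterparts in the JDK's own charset decoders, which makes them easy to try outside a TensorFlow graph. The sketch below is a plain-JDK analogue and does not call the op: `ErrorPolicySketch` and `decode` are hypothetical names, and `CodingErrorAction.REPLACE`, `IGNORE` and `REPORT` are assumed stand-ins for 'replace', 'ignore' and 'strict'.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Plain-JDK analogue of the errors policies; this does not call TensorFlow.
final class ErrorPolicySketch {

    // Decodes UTF-8 bytes, applying the given action to malformed input.
    // A decoding failure is surfaced as an unchecked IllegalArgumentException.
    static String decode(byte[] utf8, CodingErrorAction action) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(action)
                    .onUnmappableCharacter(action)
                    .decode(ByteBuffer.wrap(utf8))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("invalid UTF-8 input", e);
        }
    }

    public static void main(String[] args) {
        // The byte 0xFF never appears in well-formed UTF-8.
        byte[] bad = {'a', (byte) 0xFF, 'b'};

        // Like 'replace': the invalid byte becomes U+FFFD.
        System.out.println(decode(bad, CodingErrorAction.REPLACE).equals("a\uFFFDb"));

        // Like 'ignore': the invalid byte is dropped from the result.
        System.out.println(decode(bad, CodingErrorAction.IGNORE));

        // Like 'strict': decoding fails instead of producing output.
        try {
            decode(bad, CodingErrorAction.REPORT);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The op applies its policy per string element of the tensor; this sketch handles a single byte array only.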

This operation can be used with output_encoding = input_encoding to enforce correct formatting for inputs even if they are already in the desired encoding.
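The idea of transcoding to the same encoding purely to sanitize input can be illustrated without TensorFlow. The following is a minimal JDK-only sketch under that assumption; `SameEncodingSketch` and `normalizeUtf8` are hypothetical names, and the JDK's `String` constructor is used because it always substitutes U+FFFD for malformed bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// JDK-only illustration of "transcode to the same encoding to enforce validity".
final class SameEncodingSketch {

    // Decodes the bytes as UTF-8 and re-encodes them as UTF-8.
    // Well-formed input comes back unchanged; malformed bytes come back as U+FFFD.
    static byte[] normalizeUtf8(byte[] maybeUtf8) {
        return new String(maybeUtf8, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] valid = "h\u00E9llo".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = {'h', 'i', (byte) 0x80}; // lone continuation byte

        System.out.println(Arrays.equals(valid, normalizeUtf8(valid)));
        System.out.println(Arrays.toString(normalizeUtf8(invalid)));
    }
}
```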

If the input is prefixed by a Byte Order Mark (BOM) that is needed to determine the encoding (e.g. the encoding is UTF-16 and the BOM indicates big-endian), that BOM is consumed and not emitted into the output. If the input encoding is marked with an explicit endianness (e.g. UTF-16-BE), the BOM is interpreted as a zero-width no-break space (U+FEFF) and is preserved in the output. This is always the case for UTF-8.

The end result is that if the input is marked with an explicit endianness, the transcoding is faithful to all codepoints in the source. If it is not marked with an explicit endianness, the BOM is treated as metadata rather than as part of the string itself, and so is not preserved in the output.
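The JDK's charsets happen to follow the same BOM convention, so the distinction can be observed with a few lines of standard Java. This is only an analogue of the behavior described above, not a call into the op; `BomSketch` and `decode` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// JDK-only illustration of BOM handling with implicit vs. explicit endianness.
final class BomSketch {

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Big-endian BOM (FE FF) followed by 'A' (00 41).
        byte[] utf16WithBom = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
        // UTF-8 encoded BOM (EF BB BF) followed by 'A' (41).
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};

        // "UTF-16": the BOM selects the byte order and is consumed.
        System.out.println(decode(utf16WithBom, StandardCharsets.UTF_16).length());
        // "UTF-16BE": byte order is explicit, so the BOM survives as U+FEFF.
        System.out.println(decode(utf16WithBom, StandardCharsets.UTF_16BE).length());
        // "UTF-8": the BOM is kept as U+FEFF as well.
        System.out.println(decode(utf8WithBom, StandardCharsets.UTF_8).length());
    }
}
```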

Examples:

tf.strings.unicode_transcode(["Hello", "TensorFlow", "2.x"], "UTF-8", "UTF-16-BE")
<tf.Tensor: shape=(3,), dtype=string, numpy=
array([b'\x00H\x00e\x00l\x00l\x00o', b'\x00T\x00e\x00n\x00s\x00o\x00r\x00F\x00l\x00o\x00w',
       b'\x002\x00.\x00x'], dtype=object)>

tf.strings.unicode_transcode(["A", "B", "C"], "US ASCII", "UTF-8").numpy()
array([b'A', b'B', b'C'], dtype=object)

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | UnicodeTranscode.Inputs | |
| static class | UnicodeTranscode.Options | Optional attributes for UnicodeTranscode |

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final String | OP_NAME | The name of this op, as known by TensorFlow core engine |

Fields inherited from class RawOp
operation

| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected final Operation | operation | |

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| UnicodeTranscode(Operation operation) | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| Output<TString> | asOutput() | Returns the symbolic handle of the tensor. |
| static UnicodeTranscode | create(Scope scope, Operand<TString> input, String inputEncoding, String outputEncoding, UnicodeTranscode.Options... options) | Factory method to create a class wrapping a new UnicodeTranscode operation. |
| static UnicodeTranscode.Options | errors(String errors) | Sets the errors option. |
| Output<TString> | output() | Gets output. |
| static UnicodeTranscode.Options | replaceControlCharacters(Boolean replaceControlCharacters) | Sets the replaceControlCharacters option. |
| static UnicodeTranscode.Options | replacementChar(Long replacementChar) | Sets the replacementChar option. |

Methods inherited from class RawOp
equals, hashCode, op, toString

| Modifier and Type | Method | Description |
| --- | --- | --- |
| final boolean | equals(Object obj) | |
| final int | hashCode() | |
| Operation | op() | Return this unit of computation as a single Operation. |
| final String | toString() | |

Methods inherited from class Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface Op
env

| Modifier and Type | Method | Description |
| --- | --- | --- |
| default ExecutionEnvironment | env() | Return the execution environment this op was created in. |

Methods inherited from interface Operand
asTensor, shape, type

| Modifier and Type | Method | Description |
| --- | --- | --- |
| default TString | asTensor() | Returns the tensor at this operand. |
| default Shape | shape() | Returns the (possibly partially known) shape of the tensor referred to by the Output of this operand. |
| default Class<TString> | type() | Returns the tensor type of this operand |

Methods inherited from interface Shaped
rank, size

| Modifier and Type | Method | Description |
| --- | --- | --- |
| default int | rank() | |
| default long | size() | Computes and returns the total size of this container, in number of values. |

Field Details
- OP_NAME
  public static final String OP_NAME
  
  The name of this op, as known by TensorFlow core engine
  
  See Also:
  
  Constant Field Values
Constructor Details
- UnicodeTranscode
  
  public UnicodeTranscode(Operation operation)
Method Details
- create
  
  @Endpoint(describeByClass=true) public static UnicodeTranscode create(Scope scope, Operand<TString> input, String inputEncoding, String outputEncoding, UnicodeTranscode.Options... options)
  
  Factory method to create a class wrapping a new UnicodeTranscode operation.
  
  Parameters:
  
  scope - current scope
  
  input - The text to be processed. Can have any shape.
  
  inputEncoding - Text encoding of the input strings. This is any of the encodings supported by ICU ucnv algorithmic converters. Examples: "UTF-16", "US ASCII", "UTF-8".
  
  outputEncoding - The unicode encoding to use in the output. Must be one of "UTF-8", "UTF-16-BE", "UTF-32-BE". Multi-byte encodings will be big-endian.
  
  options - carries optional attribute values
  
  Returns:
  
  a new instance of UnicodeTranscode
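To make the three permitted outputEncoding values concrete, the sketch below prints the big-endian byte layout each one yields for the same text. It uses only the JDK and is an illustration, not the op: `OutputWidthSketch`, `hex` and `encodeHex` are hypothetical names, and the JDK charset names "UTF-16BE" and "UTF-32BE" are assumed to correspond to the op's "UTF-16-BE" and "UTF-32-BE".

```java
import java.nio.charset.Charset;

// JDK-only illustration of the byte layouts of UTF-8, UTF-16-BE and UTF-32-BE.
final class OutputWidthSketch {

    // Renders bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes text with the named JDK charset and returns the bytes as hex.
    static String encodeHex(String text, String jdkCharsetName) {
        return hex(text.getBytes(Charset.forName(jdkCharsetName)));
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane.
        String supplementary = new String(Character.toChars(0x1F600));
        for (String name : new String[] {"UTF-8", "UTF-16BE", "UTF-32BE"}) {
            System.out.println(name
                    + "  A=" + encodeHex("A", name)
                    + "  U+1F600=" + encodeHex(supplementary, name));
        }
    }
}
```

In the multi-byte encodings the most significant byte comes first, which is what "big-endian" means for the outputEncoding parameter.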
- errors
  
  public static UnicodeTranscode.Options errors(String errors)
  
  Sets the errors option.
  
  Parameters:
  
  errors - Error handling policy when there is invalid formatting found in the input. The value of 'strict' will cause the operation to produce an InvalidArgument error on any invalid input formatting. A value of 'replace' (the default) will cause the operation to replace any invalid formatting in the input with the replacement_char codepoint. A value of 'ignore' will cause the operation to skip any invalid formatting in the input and produce no corresponding output character.
  
  Returns:
  
  this Options instance.
- replacementChar
  
  public static UnicodeTranscode.Options replacementChar(Long replacementChar)
  
  Sets the replacementChar option.
  
  Parameters:
  
  replacementChar - The replacement character codepoint to be used in place of any invalid formatting in the input when errors='replace'. Any valid Unicode codepoint may be used. The default value is the Unicode replacement character, U+FFFD (decimal 65533).
  
  Note that for UTF-8, passing a replacement character expressible in 1 byte, such as ' ', will preserve string alignment to the source since invalid bytes will be replaced with a 1-byte replacement. For UTF-16-BE and UTF-16-LE, any 1 or 2 byte replacement character will preserve byte alignment to the source.
  
  Returns:
  
  this Options instance.
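The alignment remark for replacementChar can be checked with a small JDK-only experiment. The sketch below is an analogue under stated assumptions, not the op itself: `ReplacementCharSketch` and `repair` are hypothetical names, the JDK decoder's `replaceWith` stands in for replacement_char, and only code points in the Basic Multilingual Plane are supported because the JDK's UTF-8 decoder accepts a single-char replacement.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// JDK-only illustration of how the replacement code point affects output length.
final class ReplacementCharSketch {

    // Decodes UTF-8, substitutes the given BMP code point for malformed input,
    // and re-encodes the result as UTF-8.
    static byte[] repair(byte[] utf8, int replacementCodePoint) {
        String replacement = new String(Character.toChars(replacementCodePoint));
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .replaceWith(replacement)
                    .decode(ByteBuffer.wrap(utf8))
                    .toString()
                    .getBytes(StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE; rethrown unchecked to keep the sketch simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bad = {'a', (byte) 0xFF, 'b'}; // one invalid byte between two ASCII bytes

        // The default replacement, U+FFFD, is 65533 in decimal and takes 3 bytes in UTF-8.
        System.out.println(0xFFFD == 65533);
        System.out.println(repair(bad, 0xFFFD).length);

        // A 1-byte replacement such as ' ' keeps the output the same length as the input.
        System.out.println(repair(bad, ' ').length);
    }
}
```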
- replaceControlCharacters
  
  public static UnicodeTranscode.Options replaceControlCharacters(Boolean replaceControlCharacters)
  
  Sets the replaceControlCharacters option.
  
  Parameters:
  
  replaceControlCharacters - Whether to replace the C0 control characters (00-1F) with the replacement_char. Default is false.
  
  Returns:
  
  this Options instance.
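The effect of replaceControlCharacters can be pictured as a per-codepoint substitution. The following minimal sketch takes the documented range (00-1F) literally and is not the op's implementation; `ControlCharSketch` and `replaceC0` are hypothetical names.

```java
// JDK-only illustration of replacing C0 control characters with a replacement code point.
final class ControlCharSketch {

    // Returns a copy of text in which every code point in 0x00..0x1F
    // is replaced by replacementCodePoint.
    static String replaceC0(String text, int replacementCodePoint) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
                .forEach(cp -> out.appendCodePoint(cp <= 0x1F ? replacementCodePoint : cp));
        return out.toString();
    }

    public static void main(String[] args) {
        // Tab (0x09) and 0x01 are both C0 control characters.
        String cleaned = replaceC0("a\tb\u0001c", 0xFFFD);
        System.out.println(cleaned.equals("a\uFFFDb\uFFFDc"));
    }
}
```

Under this reading, common whitespace controls such as tab and line feed fall inside the range too, which is worth keeping in mind before enabling the option on multi-line text.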
- output
  
  public Output<TString> output()
  
  Gets output. A string tensor containing unicode text encoded using output_encoding.
  
  Returns:
  
  output.
- asOutput
  public Output<TString> asOutput()
  
  Description copied from interface: Operand
  
  Returns the symbolic handle of the tensor.
  Inputs to TensorFlow operations are outputs of another TensorFlow operation. This method is used to obtain a symbolic handle that represents the computation of the input.
  
  Specified by:
  
  asOutput in interface Operand<TString>
  
  See Also:
  
  OperationBuilder.addInput(Output)
