lingvo.tasks.car.base_extractor module

Base extractor interface.

lingvo.tasks.car.base_extractor._ParseSequenceExample(record, feature_map, context_map)[source]

Parse a SequenceExample, adding the context features to the features.
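
The merge described above can be pictured with plain dictionaries. This is only an illustrative sketch, not the module's code: `merge_context_into_features` is a hypothetical helper, the dictionaries stand in for the parsed context and sequence tensors, and raising on overlapping keys is an assumed policy.

```python
def merge_context_into_features(context, features):
    """Return a new dict holding the sequence features plus the context features.

    Hypothetical helper: plain dicts stand in for parsed tensors.
    """
    # Assumed policy: refuse to silently overwrite a sequence feature.
    common_keys = set(context) & set(features)
    if common_keys:
        raise ValueError(f"Keys present in both maps: {sorted(common_keys)}")
    merged = dict(features)
    merged.update(context)
    return merged


# Made-up stand-ins for parsed context and sequence features.
context = {"run_id": "segment-0"}
features = {"points": [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]}
merged = merge_context_into_features(context, features)
```

After the call, `merged` holds both the `points` and `run_id` entries, so downstream code can treat context values like any other feature.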

lingvo.tasks.car.base_extractor._TextInput(record, feature_map)[source]

class lingvo.tasks.car.base_extractor._BaseExtractor(*args, **kwargs)[source]

Bases: lingvo.core.base_input_generator.BaseInputGeneratorFromFiles

The base extractor for all lingvo car task datasets.

Subclasses should define and pass in a custom dictionary of extractors to select which fields from car datasets to output from an input generator.

Preprocessors are applied to all the extracted outputs jointly, in the specified sequence.
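
A rough sketch of what "jointly, in the specified sequence" means, using plain Python instead of tensors. All names here (`apply_preprocessors`, `drop_far_points`, `scale_points`) are hypothetical and are not part of lingvo; the point is only that each step receives every extracted field and that order matters.

```python
def apply_preprocessors(extracted, preprocessors):
    """Run each preprocessor on the whole dict of extracted outputs, in order."""
    for preprocess in preprocessors:
        # Each step sees every extracted field, not just one extractor's output.
        extracted = preprocess(extracted)
    return extracted


def drop_far_points(extracted, max_range=50.0):
    # Hypothetical step: filter points and their labels together.
    keep = [i for i, p in enumerate(extracted["points"]) if abs(p) <= max_range]
    return {
        "points": [extracted["points"][i] for i in keep],
        "labels": [extracted["labels"][i] for i in keep],
    }


def scale_points(extracted, factor=0.5):
    # Hypothetical step: rescale points, leave labels untouched.
    return {**extracted, "points": [p * factor for p in extracted["points"]]}


extracted = {"points": [10.0, 80.0, -20.0], "labels": [1, 2, 3]}
result = apply_preprocessors(extracted, [drop_far_points, scale_points])
# result == {"points": [5.0, -10.0], "labels": [1, 3]}
```

Swapping the two steps would change the result, because the range filter would then see the scaled points.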

classmethod Params(extractors)[source]

Default params.

Parameters

extractors – A hyperparams.Params mapping extractor names to Extractors. A few extractor types are required: ‘labels’, a LabelExtractor.Params().

Returns

A base_layer Params object.

FeatureMap()[source]

Get a mapping from feature names to feature tensors.

ContextMap()[source]

Get a mapping from context names to context tensors.

ContextMap() is used for tf.SequenceExample datasets to extract context_features. In that scenario, FeatureMap() is used to extract the sequence_features.

Returns

A map from context keys to context features.

Shape()[source]

DType()[source]

property class_names

_DataSourceFromFilePattern(file_pattern, input_source_weights=None)[source]

Return a NestedMap containing an input batch from a string file_pattern.

Subclasses should implement this function.

Parameters
  • file_pattern – A string file pattern.

  • input_source_weights – A list of float input source weights that controls the mix of input examples in the batch. Records are sampled from the inputs in proportion to these weights. Defaults to None, which should be treated as an empty list.

Returns

A NestedMap of tf.Tensors containing a batch of input data with shapes [batch, …].
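
To illustrate how input_source_weights shapes a batch, here is a minimal stdlib-only sketch of proportional sampling. It does not reproduce how a subclass reads records: `sample_sources` is a hypothetical helper, the source names are made up, and treating an empty weight list as uniform sampling is an assumption.

```python
import random


def sample_sources(sources, input_source_weights=None, num_records=8, seed=0):
    """Pick which source each record of a batch is drawn from."""
    # None is treated as an empty list; an empty list is assumed here to
    # mean "no weighting", i.e. uniform sampling across sources.
    weights = list(input_source_weights or [])
    if weights and len(weights) != len(sources):
        raise ValueError("Need exactly one weight per source.")
    rng = random.Random(seed)
    return rng.choices(sources, weights=weights or None, k=num_records)


# Made-up source names; weights 3:1 favour the first source.
sources = ["train_a", "train_b"]
picks = sample_sources(sources, [3.0, 1.0], num_records=4000)
fraction_a = picks.count("train_a") / len(picks)
```

With weights of 3.0 and 1.0, roughly three quarters of the sampled records come from the first source.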

ProcessFeatures(features)[source]

Process extracted features.

Parameters

features – A dict of extracted Tensors from the records.

Returns

A tuple of:

  • bucket_id: A scalar int Tensor.

  • extracted: a NestedMap of Tensors extracted.

ExtractUsingExtractors(record)[source]

Extracts Tensors from a tf.Example record using self.extractors.

Parameters

record – A tf.Example input to pass to tf.io.parse_single_example.

Returns

A tuple of:

  • bucket_id: A scalar int Tensor.

  • extracted: a NestedMap of Tensors extracted.

GetCpuPassthroughKeys()[source]

Return a list of keys from the input to skip sending to the device.

When running on TPU, a user may want to avoid sending some inputs to the device: either the type is not supported (e.g., string), or the input will not be processed on the device at all. However, these items may still be useful to pass through to the “output”, e.g., for decoding purposes.

This function should return a list of keys from InputBatch() that should not be sent to the TPU, but can be combined with the outputs of Decode() before passing to PostProcessDecodeOut().

Returns

A list of keys from the input to filter from being sent to the device, which may be combined with the output of Decode() prior to PostProcessDecodeOut().
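
The passthrough flow can be sketched with plain dictionaries. This is an assumption-laden illustration rather than lingvo's implementation: `split_batch` and `fake_decode` are hypothetical stand-ins, and the batch contents are made up.

```python
def split_batch(input_batch, cpu_passthrough_keys):
    """Split a batch into the part sent to the device and the part kept on the host."""
    passthrough = set(cpu_passthrough_keys)
    device_batch = {k: v for k, v in input_batch.items() if k not in passthrough}
    host_batch = {k: v for k, v in input_batch.items() if k in passthrough}
    return device_batch, host_batch


def fake_decode(device_batch):
    # Hypothetical stand-in for Decode(): it only sees device-side inputs.
    return {"num_points": [len(p) for p in device_batch["points"]]}


input_batch = {
    "points": [[1.0, 2.0], [3.0]],
    "source_id": ["a.tfrecord:0", "b.tfrecord:7"],  # strings stay on the host
}
device_batch, host_batch = split_batch(input_batch, ["source_id"])

# Recombine the host-side values with the decode output before post-processing.
decode_out = fake_decode(device_batch)
decode_out.update(host_batch)
```

Here `source_id` never reaches the device-side computation but is still available next to `num_points` for post-processing.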

_NestedMapFromBatchedOutputs(outputs)[source]

NestedMapFromBatchedOutputs(outputs)[source]

Create a NestedMap from a list/tuple of batched outputs.

Parameters

outputs – A tuple or list of Tensors whose order matches the flattened structure of Shape() and DType().

Returns

A NestedMap reconstructing the structure of the output of extractors and preprocessors, where each Tensor’s shape is statically padded/trimmed to match the Shape() specification.

Raises
  • ValueError – If outputs contains a shape that is not fully defined.

  • AssertionError – If any Tensor in outputs cannot be padded or trimmed (via PadOrTrimTo) to the shape given by the corresponding Shape() specification.
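
As a simplified picture of the repacking and pad-or-trim step, the sketch below uses one-dimensional Python lists in place of Tensors. `pad_or_trim_to` and `pack_outputs` are hypothetical helpers, the sorted-key flattening order is an assumption, and the real method operates on a NestedMap and tf.Tensors.

```python
def pad_or_trim_to(values, length, pad_value=0):
    """Pad a 1-D list with pad_value, or trim it, so it has exactly `length` items."""
    padded = list(values) + [pad_value] * max(0, length - len(values))
    return padded[:length]


def pack_outputs(shape_spec, outputs):
    """Rebuild a name -> value mapping from a flat sequence of outputs."""
    keys = sorted(shape_spec)  # assumed flattening order: sorted keys
    if len(keys) != len(outputs):
        raise ValueError("outputs does not match the flattened shape spec.")
    return {
        key: pad_or_trim_to(value, shape_spec[key][0])
        for key, value in zip(keys, outputs)
    }


# Made-up spec: 'labels' holds 3 items, 'points' holds 4.
shape_spec = {"labels": (3,), "points": (4,)}
packed = pack_outputs(shape_spec, [[7, 8, 9, 10], [1.0, 2.0]])
# packed == {"labels": [7, 8, 9], "points": [1.0, 2.0, 0, 0]}
```

The first output is trimmed to three items and the second is padded with zeros to four, so every entry ends up with the statically known length from the spec.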