lingvo.tasks.car.input_extractor module

Input extractors.

Input extractors are an API for parsing and processing a set of fields from serialized records.

class lingvo.tasks.car.input_extractor.FieldsExtractor(*args, **kwargs)

Bases: lingvo.core.base_layer.BaseLayer

An API for parsing and processing a set of fields from serialized records.

Input generators often need to parse several fields from a serialized record. This involves two stages: specifying the name and type of the fields to extract from serialized records (tf.Example or tf.SequenceExample), and then processing the raw output into a form to be consumed by higher-level callers.

This class attempts to modularize this processing within the Minecraft input generators, so that users can easily create input generator pipelines that mix and match the composition of different fields from the same dataset.

A subclass of this class implements three methods:

  1. FeatureMap(): returning a dictionary of field names to field types, e.g., ‘images’ to tf.io.VarLenFeature(tf.string). For PlainTextIterator datasets, FeatureMap() should be empty.

  2. _Extract(features): Given a ‘features’ dictionary containing the result of calling tf.io.parse_example or tf.io.parse_sequence_example on all extractors’ features, produce a NestedMap of Tensors.

    NOTE: The overall pipeline returns a NestedMap of batched Tensors. However, the names and associations of each extractor’s fields are lost at the boundary of the map fn. For now, _Extract() must be implemented so that the names of the fields in the returned NestedMap match self.Shape()’s keys; this is checked during the parent’s Extract() call.

  3. Shape(): A NestedMap mapping names of outputs to their static shape, without the batch dimension. In _InputBatch, this shape will be used to ensure that every output has a statically known shape.
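The three-method contract, together with the key check that the parent’s Extract() performs, can be modeled in plain Python. This is a minimal sketch, not lingvo code: `ExtractorSketch`, `ImageCountExtractor`, the `num_images` output, and the string placeholder for the feature spec are all hypothetical, plain dicts stand in for NestedMap, and tuples stand in for tf.TensorShape.

```python
class ExtractorSketch:
    """Hypothetical stand-in for FieldsExtractor (not the lingvo class)."""

    def FeatureMap(self):
        return {}

    def Shape(self):
        raise NotImplementedError

    def _Extract(self, features):
        raise NotImplementedError

    def Extract(self, features):
        # Delegate to the subclass, then check that the output keys match
        # Shape()'s keys, mirroring the check described for Extract().
        outputs = self._Extract(features)
        if set(outputs) != set(self.Shape()):
            raise ValueError(
                f"_Extract() keys {sorted(outputs)} do not match "
                f"Shape() keys {sorted(self.Shape())}")
        return outputs


class ImageCountExtractor(ExtractorSketch):
    """Hypothetical extractor that counts the encoded images in a record."""

    def FeatureMap(self):
        # In lingvo this would map 'images' to tf.io.VarLenFeature(tf.string);
        # a string placeholder is used here so the sketch runs without TF.
        return {"images": "VarLenFeature(string)"}

    def Shape(self):
        # Un-batched static shape: a scalar count.
        return {"num_images": ()}

    def _Extract(self, features):
        return {"num_images": len(features["images"])}


extractor = ImageCountExtractor()
print(extractor.Extract({"images": [b"img0", b"img1"]}))  # {'num_images': 2}
```

Putting the key check in the base class means each subclass only writes _Extract(), and a mismatched field name fails loudly at extraction time rather than silently after batching.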

The caller of the extractors calls each extractor’s FeatureMap() to populate the schema passed to tf.io.parse_example() or tf.io.parse_sequence_example(). The resulting dictionary of Tensors is then passed to each extractor’s _Extract() function (via FieldsExtractor.Extract()) to produce that extractor’s output.

The caller is responsible for maintaining the order of outputs, since NestedMaps have no inherent ordering during iteration.
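The caller-side flow described above (merge every extractor’s FeatureMap() into one schema, parse each record once, hand the full dictionary to every extractor, and keep outputs in an explicit order) can be sketched in plain Python. Everything here is hypothetical: `XyzExtractor`, `LabelExtractor`, `MergeFeatureMaps`, and `ParseRecord` are illustrative names, string placeholders stand in for tf.io feature specs, and `ParseRecord` merely stands in for tf.io.parse_example. The conflict check is an illustrative assumption, not necessarily lingvo behavior.

```python
class XyzExtractor:
    """Hypothetical extractor that reads 'xyz' from the parsed features."""

    def FeatureMap(self):
        return {"xyz": "VarLenFeature(float32)"}

    def Extract(self, features):
        return {"points_xyz": features["xyz"]}


class LabelExtractor:
    """Hypothetical extractor that reads 'label' from the parsed features."""

    def FeatureMap(self):
        return {"label": "FixedLenFeature(int64)"}

    def Extract(self, features):
        return {"label": features["label"]}


def MergeFeatureMaps(extractors):
    """Combines every extractor's FeatureMap() into a single parse schema."""
    schema = {}
    for _, ex in extractors:
        for key, feature in ex.FeatureMap().items():
            # Illustrative safeguard: two extractors must agree on a spec.
            if key in schema and schema[key] != feature:
                raise ValueError(f"Conflicting specs for feature {key!r}")
            schema[key] = feature
    return schema


def ParseRecord(record, schema):
    """Stand-in for tf.io.parse_example: keeps only the schema's fields."""
    return {key: record[key] for key in schema}


# A list of (name, extractor) pairs, so the output order is explicit.
extractors = [("laser", XyzExtractor()), ("labels", LabelExtractor())]
schema = MergeFeatureMaps(extractors)
features = ParseRecord({"xyz": [1.0, 2.0, 3.0], "label": 7, "unused": 0}, schema)
# Every extractor sees the full parsed dictionary, not just its own fields.
outputs = [(name, ex.Extract(features)) for name, ex in extractors]
print(outputs)
```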

classmethod Params()

Return the default params.

FeatureMap()

Return a dictionary from tf.Example feature names to Features.

ContextMap()

Return a dict mapping tf.SequenceExample context names to Features.

Extract(features)

Given ‘features’ (Sparse)Tensors, output Tensors for consumption.

NOTE: Implementation provided by the subclass’s _Extract() method.

Parameters

features – A dictionary of (Sparse)Tensors which includes tensors from all extractors.

Returns

A NestedMap of output Tensors.

ExtractBatch(features)

Given ‘features’ batched Tensors, output Tensors for consumption.

NOTE: Implementation provided by the subclass’s _ExtractBatch() method.

Parameters

features – A dictionary of Tensors which includes tensors from this extractor.

Returns

A NestedMap of batched output Tensors.

Filter(outputs)

Return the bucket based on the result of Extract().

This function should return 1 if the example should pass through without being dropped, and a value in [BUCKET_UPPER_BOUND, inf) if the example should be dropped. Currently no other bucketing strategies are supported.

Parameters

outputs – The NestedMap returned by this extractor’s _Extract() function. This is useful to implement filtering based on the values of the extracted example.

Returns

A scalar bucket id.
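Under this contract, a filter returns 1 to keep an example and any value at or above BUCKET_UPPER_BOUND to drop it. The sketch below is hypothetical: `MinPointsFilter` and `KeepExamples` are illustrative names, the value 9999 is only a placeholder for BUCKET_UPPER_BOUND, dicts and lists stand in for NestedMaps and Tensors, and the real pipeline performs the dropping through bucketing rather than a Python loop.

```python
# Placeholder value; the real constant is defined by the input pipeline.
BUCKET_UPPER_BOUND = 9999


def MinPointsFilter(outputs, min_points=1):
    """Hypothetical Filter(): keep the example only if it has enough points.

    Args:
      outputs: dict standing in for the NestedMap returned by _Extract(),
        holding a per-point 'points_padding' list (0 = real, 1 = padding).
      min_points: minimum number of real points required to keep the example.

    Returns:
      1 to keep the example, BUCKET_UPPER_BOUND to drop it.
    """
    num_real = sum(1 for pad in outputs["points_padding"] if pad == 0)
    return 1 if num_real >= min_points else BUCKET_UPPER_BOUND


def KeepExamples(examples, filter_fn):
    """Mimics the pipeline: examples bucketed at or above the bound are dropped."""
    return [ex for ex in examples if filter_fn(ex) < BUCKET_UPPER_BOUND]


examples = [
    {"points_padding": [0, 0, 1]},  # two real points: kept
    {"points_padding": [1, 1, 1]},  # no real points: dropped
]
print(len(KeepExamples(examples, MinPointsFilter)))  # 1
```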

FilterBatch(outputs)

Like Filter but runs over batches of outputs.

This function should be called to decide whether the entire batch should be dropped. Downstream implementations that do not run within an input pipeline must figure out how to handle these outputs, if filtering at the batch level is desired.

Parameters

outputs – A NestedMap of preprocessed Tensors.

Returns

A scalar bucket id.
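At the batch level, the same bucket convention decides the fate of the whole batch. In this hypothetical sketch, `AnyRealPointsFilterBatch` drops a batch only when no example in it has a real point; nested lists stand in for a batched points_padding Tensor of shape [batch_size, max_num_points], and 9999 is a placeholder for BUCKET_UPPER_BOUND.

```python
BUCKET_UPPER_BOUND = 9999  # Placeholder value for illustration only.


def AnyRealPointsFilterBatch(outputs):
    """Hypothetical FilterBatch(): drop the batch if every point is padding.

    Args:
      outputs: dict whose 'points_padding' is a [batch_size][max_num_points]
        nested list (0 = real point, 1 = padding).

    Returns:
      A scalar bucket id: 1 keeps the whole batch, BUCKET_UPPER_BOUND drops it.
    """
    has_real_point = any(
        pad == 0 for example in outputs["points_padding"] for pad in example)
    return 1 if has_real_point else BUCKET_UPPER_BOUND


print(AnyRealPointsFilterBatch({"points_padding": [[1, 1], [0, 1]]}))  # 1
print(AnyRealPointsFilterBatch({"points_padding": [[1, 1], [1, 1]]}))  # 9999
```

Code that consumes batches outside an input pipeline would have to act on this id itself, for example by skipping any batch whose id is at or above the bound.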

Shape()

Return a NestedMap of un-batched fully-specified tf.TensorShapes.
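Because Shape() excludes the batch dimension and is fully specified, each per-example output can be compared against it before batching. The helpers below (`NestedListShape`, `CheckShapes`) are hypothetical and written in plain Python, with rectangular nested lists standing in for Tensors and tuples for tf.TensorShape. They only illustrate what a statically known, un-batched shape means; they are not the check lingvo performs in _InputBatch.

```python
def NestedListShape(value):
    """Returns the shape of a rectangular nested list, e.g. [[1, 2]] -> (1, 2).

    A non-list value (a scalar) has shape ().
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def CheckShapes(outputs, shapes):
    """Raises ValueError if outputs disagree with the declared shapes."""
    if set(outputs) != set(shapes):
        raise ValueError(f"Keys differ: {sorted(outputs)} vs {sorted(shapes)}")
    for key, declared in shapes.items():
        actual = NestedListShape(outputs[key])
        if actual != tuple(declared):
            raise ValueError(f"{key}: shape {actual} != declared {declared}")


shapes = {"points_xyz": (2, 3), "num_points": ()}
# Passes silently: both outputs match their declared un-batched shapes.
CheckShapes({"points_xyz": [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]], "num_points": 1},
            shapes)
```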

DType()

Return a NestedMap mapping names to tf.DType.

_Extract(features)

The subclass-defined implementation of Extract().

Parameters

features – A dictionary of (Sparse)Tensors which includes tensors from this extractor.

Returns

A NestedMap of output Tensors whose key names match self.Shape()’s keys.

_ExtractBatch(features)

The subclass-defined implementation of ExtractBatch().

Parameters

features – A dictionary of batched Tensors including tensors from this extractor.

Returns

A NestedMap of output Tensors whose key names match self.Shape()’s keys.

class lingvo.tasks.car.input_extractor.LaserExtractor(*args, **kwargs)

Bases: lingvo.tasks.car.input_extractor.FieldsExtractor

Interface for extracting laser data.

Must produce:

points_xyz: [max_num_points, 3] - XYZ coordinates of laser points.

points_feature: [max_num_points, num_features] - Features for each point in points_xyz.

points_padding: [max_num_points] - Padding for points. 0 means the corresponding point is the original, and 1 means there is no point (xyz or feature) present. Only present if max_num_points is not None.
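When max_num_points is set, a variable number of points has to be brought to a fixed length, with points_padding recording which rows are real. The sketch below shows that bookkeeping in plain Python. `PadPoints` is a hypothetical helper, lists stand in for Tensors, zeros are assumed as the fill value, and truncating surplus points is an assumption; the actual extractor may sample or otherwise handle overflow differently.

```python
def PadPoints(points_xyz, points_feature, max_num_points, num_features):
    """Pads per-point data to max_num_points and builds points_padding.

    Args:
      points_xyz: list of [x, y, z] coordinates, one per point.
      points_feature: list of per-point feature lists, aligned with points_xyz.
      max_num_points: fixed number of rows in each output.
      num_features: length of each per-point feature list.

    Returns:
      A dict with 'points_xyz' [max_num_points, 3], 'points_feature'
      [max_num_points, num_features] and 'points_padding' [max_num_points],
      where padding is 0.0 for a real point and 1.0 for a padded row.
    """
    # Assumption: points beyond max_num_points are simply truncated.
    xyz = points_xyz[:max_num_points]
    feat = points_feature[:max_num_points]
    num_real = len(xyz)
    num_pad = max_num_points - num_real
    return {
        "points_xyz": xyz + [[0.0, 0.0, 0.0] for _ in range(num_pad)],
        "points_feature": feat + [[0.0] * num_features for _ in range(num_pad)],
        "points_padding": [0.0] * num_real + [1.0] * num_pad,
    }


out = PadPoints([[1.0, 2.0, 3.0]], [[0.5]], max_num_points=3, num_features=1)
print(out["points_padding"])  # [0.0, 1.0, 1.0]
```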

classmethod Params()

Return the default params.

Shape()

Return a NestedMap of un-batched fully-specified tf.TensorShapes.

DType()

Return a NestedMap mapping names to tf.DType.
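The shape contract for LaserExtractor’s outputs follows directly from the list of required fields. In this sketch, tuples stand in for tf.TensorShape, strings stand in for tf.DType, and None marks an unknown dimension. The `LaserShape` and `LaserDType` helpers and the float32 dtypes are assumptions, not confirmed lingvo details. As documented, points_padding is included only when max_num_points is not None.

```python
def LaserShape(max_num_points, num_features):
    """Sketch of LaserExtractor.Shape(); None marks an unknown dimension."""
    shapes = {
        "points_xyz": (max_num_points, 3),
        "points_feature": (max_num_points, num_features),
    }
    # points_padding is only produced for a fixed number of points.
    if max_num_points is not None:
        shapes["points_padding"] = (max_num_points,)
    return shapes


def LaserDType(max_num_points):
    """Sketch of LaserExtractor.DType(); float32 everywhere is an assumption."""
    dtypes = {"points_xyz": "float32", "points_feature": "float32"}
    if max_num_points is not None:
        dtypes["points_padding"] = "float32"
    return dtypes


print(LaserShape(1024, 4))
# {'points_xyz': (1024, 3), 'points_feature': (1024, 4), 'points_padding': (1024,)}
```

Keeping Shape() and DType() keyed identically lets downstream code iterate over both maps together when it sets static shapes and casts outputs.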