lingvo.tasks.car.input_extractor module
Input extractors.
Input extractors are an API for parsing and processing a set of fields from serialized records.
class lingvo.tasks.car.input_extractor.FieldsExtractor(*args, **kwargs)
Bases: lingvo.core.base_layer.BaseLayer
An API for parsing and processing a set of fields from serialized records.
Input generators often need to parse several fields from a serialized record. This involves two stages: specifying the name and type of the fields to extract from serialized records (tf.Example or tf.SequenceExample), and then processing the raw output into a form to be consumed by higher-level callers.
This class attempts to modularize this processing within the Minecraft input generators, so that users can easily build input generator pipelines that mix and match different fields from the same dataset.
A descendant of this class implements three methods:
FeatureMap(): returns a dictionary mapping field names to field types, e.g., ‘images’ to tf.io.VarLenFeature(tf.string). For PlainTextIterator datasets, FeatureMap() should return an empty dictionary.
_Extract(features): Given a ‘features’ dictionary containing the result of calling tf.io.parse_example or tf.io.parse_sequence_example with the features of all extractors, produces a NestedMap of Tensors.
NOTE: The overall pipeline returns a NestedMap of batched Tensors. However, the field names and their association with each extractor are lost at the boundary of the map function. Currently, _Extract() must be implemented so that the keys of the returned NestedMap match self.Shape()’s keys; this is checked in the parent’s Extract() call.
Shape(): returns a NestedMap mapping output names to their static shapes, without the batch dimension. _InputBatch uses these shapes to ensure that every output has a statically known shape.
The caller of the extractors calls each extractor’s FeatureMap() to populate the schema passed to tf.io.parse_example() or tf.io.parse_sequence_example(). The resulting dictionary of Tensors is then passed to each extractor’s _Extract() method (via FieldsExtractor.Extract()), which returns that extractor’s output.
The caller is responsible for maintaining the order of outputs, since NestedMaps have no inherent ordering during iteration.
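The division of labor described above can be sketched without TensorFlow or Lingvo. In this minimal sketch, plain dicts stand in for both the parsed-features dictionary and NestedMap, the string values returned by FeatureMap() stand in for tf.io feature specs, and ToyFieldsExtractor, LabelExtractor, ImageExtractor, and RunExtractors are hypothetical names, not part of lingvo. It shows the merged parsing schema, the key check against Shape(), and the caller keeping its own output order.

```python
# Toy, TensorFlow-free sketch of the FieldsExtractor contract. Plain dicts
# stand in for parsed features and for NestedMap; every name is hypothetical.


class ToyFieldsExtractor:
  """Mimics the FeatureMap() / _Extract() / Shape() split."""

  def FeatureMap(self):
    raise NotImplementedError

  def _Extract(self, features):
    raise NotImplementedError

  def Shape(self):
    raise NotImplementedError

  def Extract(self, features):
    # Delegate to the subclass, then check the output keys against Shape(),
    # mirroring the key check the documented parent Extract() performs.
    outputs = self._Extract(features)
    if set(outputs) != set(self.Shape()):
      raise ValueError('Output keys %s do not match Shape() keys %s' %
                       (sorted(outputs), sorted(self.Shape())))
    return outputs


class LabelExtractor(ToyFieldsExtractor):

  def FeatureMap(self):
    return {'label': 'int64 scalar'}  # stand-in for a tf.io feature spec

  def _Extract(self, features):
    return {'labels': features['label']}

  def Shape(self):
    return {'labels': ()}


class ImageExtractor(ToyFieldsExtractor):

  def FeatureMap(self):
    return {'images': 'variable-length strings'}  # stand-in spec

  def _Extract(self, features):
    return {'num_images': len(features['images'])}

  def Shape(self):
    return {'num_images': ()}


def RunExtractors(named_extractors, record):
  """named_extractors is a list of (name, extractor) pairs."""
  # 1. Merge every extractor's FeatureMap() into one parsing schema.
  schema = {}
  for _, extractor in named_extractors:
    schema.update(extractor.FeatureMap())
  # 2. Stand-in for parsing: keep only the fields named in the schema.
  features = {key: record[key] for key in schema}
  # 3. Every extractor receives all parsed features; the caller's list
  #    fixes the order of the results.
  return [(name, extractor.Extract(features))
          for name, extractor in named_extractors]


record = {'label': 3, 'images': [b'a', b'b'], 'unused': 0}
results = RunExtractors(
    [('labels', LabelExtractor()), ('images', ImageExtractor())], record)
print(results)  # [('labels', {'labels': 3}), ('images', {'num_images': 2})]
```

Returning a list of (name, outputs) pairs rather than one merged mapping is one way for the caller to keep a stable order; the field that no extractor requested ('unused') never reaches any _Extract().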
Extract(features)
Given ‘features’ (Sparse)Tensors, output Tensors for consumption.
NOTE: The implementation is provided by the subclass’s _Extract() method.
- Parameters
features – A dictionary of (Sparse)Tensors which includes tensors from all extractors.
- Returns
A NestedMap of output Tensors.
ExtractBatch(features)
Given ‘features’ batched Tensors, output Tensors for consumption.
NOTE: The implementation is provided by the subclass’s _ExtractBatch() method.
- Parameters
features – A dictionary of Tensors which includes tensors from this extractor.
- Returns
A NestedMap of batched output Tensors.
Filter(outputs)
Return the bucket id based on the result of Extract().
This function should return 1 if the example should pass through without being dropped, and a value in [BUCKET_UPPER_BOUND, inf) if the example should be dropped. Currently no other bucketing strategies are supported.
- Parameters
outputs – The NestedMap returned by this extractor’s _Extract() function. This is useful to implement filtering based on the values of the extracted example.
- Returns
A scalar bucket id.
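As an illustration of the bucket convention, the following TensorFlow-free sketch keeps an example only if it has at least one point. The value of BUCKET_UPPER_BOUND, the 'num_points' output, and the helper names are assumptions made for this example; in the real pipeline the bucket id would be a scalar Tensor.

```python
# Toy sketch of Filter() bucket semantics. BUCKET_UPPER_BOUND's value, the
# 'num_points' output, and both helpers are assumptions for illustration.
BUCKET_UPPER_BOUND = 9999  # assumed value


def FilterByPointCount(outputs, min_points=1):
  """Returns 1 to keep the example, BUCKET_UPPER_BOUND to drop it."""
  if outputs['num_points'] >= min_points:
    return 1
  return BUCKET_UPPER_BOUND


def KeepExamples(examples):
  # Stand-in for the input pipeline: any bucket id in
  # [BUCKET_UPPER_BOUND, inf) means the example is discarded.
  return [ex for ex in examples
          if FilterByPointCount(ex) < BUCKET_UPPER_BOUND]


examples = [{'num_points': 5}, {'num_points': 0}, {'num_points': 2}]
print(KeepExamples(examples))  # [{'num_points': 5}, {'num_points': 2}]
```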
FilterBatch(outputs)
Like Filter, but runs over batches of outputs.
This function decides whether the entire batch should be dropped. If batch-level filtering is desired, downstream implementations that do not run within an input pipeline must decide for themselves how to handle these outputs.
- Parameters
outputs – A NestedMap of preprocessed Tensors.
- Returns
A scalar bucket id.
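For a consumer that does not run inside an input pipeline, batch-level dropping has to be done by hand. This sketch assumes a hypothetical rule (drop a batch whose examples are all padding), a hypothetical 'example_padding' field, and an assumed BUCKET_UPPER_BOUND value; it only illustrates how a caller might act on the returned bucket id.

```python
# Toy sketch of acting on a FilterBatch()-style bucket id outside an input
# pipeline. The rule, field names, and BUCKET_UPPER_BOUND are assumptions.
BUCKET_UPPER_BOUND = 9999  # assumed value


def FilterBatchAllPadded(batch):
  """Returns BUCKET_UPPER_BOUND if every example in the batch is padding."""
  if all(p == 1 for p in batch['example_padding']):
    return BUCKET_UPPER_BOUND
  return 1


def ConsumeBatches(batches):
  # Nothing drops batches automatically here, so the consumer checks the
  # bucket id itself and skips any batch marked for dropping.
  consumed = []
  for batch in batches:
    if FilterBatchAllPadded(batch) >= BUCKET_UPPER_BOUND:
      continue
    consumed.append(batch['id'])
  return consumed


batches = [
    {'id': 0, 'example_padding': [0, 0, 1]},
    {'id': 1, 'example_padding': [1, 1, 1]},
    {'id': 2, 'example_padding': [0, 1, 1]},
]
print(ConsumeBatches(batches))  # [0, 2]
```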
class lingvo.tasks.car.input_extractor.LaserExtractor(*args, **kwargs)
Bases: lingvo.tasks.car.input_extractor.FieldsExtractor
Interface for extracting laser data.
Must produce:
- points_xyz: [max_num_points, 3] - XYZ coordinates of laser points.
- points_feature: [max_num_points, num_features] - Features for each point in points_xyz.
- points_padding: [max_num_points] - Padding for points. 0 means the corresponding point is the original, and 1 means there is no point (xyz or feature) present. Only present if max_num_points is not None.
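A minimal pure-Python sketch of producing these three outputs for one example, assuming the raw points arrive as lists and that points beyond max_num_points are simply truncated. The truncation policy, PadPoints, and LaserShape are assumptions for illustration, not lingvo's implementation. The padded outputs match the static per-example shapes a Shape() implementation would report.

```python
# Toy sketch: pad (or truncate) one point cloud to max_num_points rows.
# PadPoints, LaserShape, and the truncation policy are hypothetical.


def LaserShape(max_num_points, num_features):
  # Static per-example shapes, without the batch dimension (cf. Shape()).
  return {
      'points_xyz': (max_num_points, 3),
      'points_feature': (max_num_points, num_features),
      'points_padding': (max_num_points,),
  }


def PadPoints(points_xyz, points_feature, max_num_points, num_features):
  n = min(len(points_xyz), max_num_points)  # assumed: truncate extras
  num_pad = max_num_points - n
  xyz = [list(p) for p in points_xyz[:n]]
  xyz += [[0.0] * 3 for _ in range(num_pad)]
  feat = [list(f) for f in points_feature[:n]]
  feat += [[0.0] * num_features for _ in range(num_pad)]
  # 0 marks an original point, 1 marks a slot with no point present.
  padding = [0.0] * n + [1.0] * num_pad
  return {'points_xyz': xyz, 'points_feature': feat,
          'points_padding': padding}


out = PadPoints([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[0.5], [0.7]],
                max_num_points=4, num_features=1)
print(out['points_padding'])  # [0.0, 0.0, 1.0, 1.0]
```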