lingvo.tasks.car.kitti_input_generator module

Input generator for KITTI data.

lingvo.tasks.car.kitti_input_generator._Dense(sparse, default_value=0)[source]
lingvo.tasks.car.kitti_input_generator._NestedMapToParams(nmap)[source]
lingvo.tasks.car.kitti_input_generator.ComputeKITTIDifficulties(box_image_height, occlusion, truncation)[source]

Compute difficulties from box height, occlusion, and truncation.
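As an illustration of how the three inputs combine, here is a minimal pure-Python sketch that uses the thresholds published for the KITTI object detection benchmark (easy: height at least 40 px, fully visible, truncation at most 15%; moderate: 25 px, partly occluded, 30%; hard: 25 px, largely occluded, 50%). The function name `kitti_difficulty` and the string labels are hypothetical; this page does not state how ComputeKITTIDifficulties encodes its output, so treat this as a sketch of the benchmark rule rather than the library's implementation.

```python
# KITTI benchmark thresholds per level:
# (min box height in pixels, max occlusion level, max truncation fraction).
_THRESHOLDS = {
    "easy": (40.0, 0, 0.15),
    "moderate": (25.0, 1, 0.30),
    "hard": (25.0, 2, 0.50),
}


def kitti_difficulty(box_image_height, occlusion, truncation):
    """Returns the easiest level a box satisfies, or "ignore" if none.

    The string labels are an assumption made for this sketch.
    """
    for level in ("easy", "moderate", "hard"):
        min_height, max_occlusion, max_truncation = _THRESHOLDS[level]
        if (box_image_height >= min_height
                and occlusion <= max_occlusion
                and truncation <= max_truncation):
            return level
    return "ignore"
```

A box that is 30 px tall, fully visible, and 10% truncated fails the easy height requirement but passes the moderate one, so `kitti_difficulty(30, 0, 0.1)` returns `"moderate"`.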

class lingvo.tasks.car.kitti_input_generator.KITTILaserExtractor(*args, **kwargs)[source]

Bases: lingvo.tasks.car.input_extractor.LaserExtractor

Base extractor for the laser points from a KITTI tf.Example.

classmethod Params()[source]

Default params.

FeatureMap()[source]

Return a dictionary from tf.Example feature names to Features.

_Extract(features)[source]

The subclass-defined implementation of Extract().

Parameters

features – A dictionary of (Sparse)Tensors which includes tensors from this extractor.

Returns

A NestedMap of output Tensors whose key names match self.Shape()’s keys.

class lingvo.tasks.car.kitti_input_generator.KITTIImageExtractor(*args, **kwargs)[source]

Bases: lingvo.tasks.car.input_extractor.FieldsExtractor

Extracts the image information (left camera) from a KITTI tf.Example.

Produces:

image: [512, 1382, 3] - Floating point Tensor containing image data. Note that image may not be produced if decode_image is set to False. During training, we may not want to decode the images.

width: [1] - integer scalar width of the original image.

height: [1] - integer scalar height of the original image.

velo_to_image_plane: [3, 4] - transformation matrix from velo xyz to image plane xy. After multiplication, you need to divide by last coordinate to recover 2D pixel locations.

velo_to_camera: [4, 4] - transformation matrix from velo xyz to camera xyz.

camera_to_velo: [4, 4] - transformation matrix from camera xyz to velo xyz.
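The divide-by-last-coordinate step described for velo_to_image_plane can be sketched in plain Python as follows. The helper name `project_velo_to_pixel` and the example matrix are hypothetical (a made-up pinhole camera, not real KITTI calibration); only the [3, 4] shape and the homogeneous divide come from the description above.

```python
def project_velo_to_pixel(velo_to_image_plane, point_xyz):
    """Projects one velodyne-frame point to (u, v) pixel coordinates.

    velo_to_image_plane is a [3, 4] matrix given as nested lists.
    Returns None when the point is on or behind the image plane.
    """
    x, y, z = point_xyz
    homogeneous = (x, y, z, 1.0)
    # [3, 4] matrix times [4] vector gives homogeneous image coordinates.
    u_h, v_h, w = (
        sum(m * p for m, p in zip(row, homogeneous))
        for row in velo_to_image_plane
    )
    if w <= 0:
        return None
    # Dividing by the last coordinate recovers 2D pixel locations.
    return (u_h / w, v_h / w)


# Hypothetical matrix: focal length 700 px, principal point (600, 180),
# velodyne x forward, y left, z up.
example_matrix = [
    [600.0, -700.0, 0.0, 0.0],
    [180.0, 0.0, -700.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
pixel = project_velo_to_pixel(example_matrix, (10.0, 1.0, -1.0))
```

With this made-up matrix, the point 10 m ahead, 1 m to the left, and 1 m below the sensor lands at pixel (530.0, 250.0); a point behind the sensor returns None.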

_KITTI_MAX_HEIGHT = 512

_KITTI_MAX_WIDTH = 1382

classmethod Params()[source]

Default params.

FeatureMap()[source]

Return a dictionary from tf.Example feature names to Features.

_Extract(features)[source]

The subclass-defined implementation of Extract().

Parameters

features – A dictionary of (Sparse)Tensors which includes tensors from this extractor.

Returns

A NestedMap of output Tensors whose key names match self.Shape()’s keys.

Shape()[source]

Return a NestedMap of un-batched fully-specified tf.TensorShapes.

DType()[source]

Return a NestedMap mapping names to tf.DType.

class lingvo.tasks.car.kitti_input_generator.KITTILabelExtractor(*args, **kwargs)[source]

Bases: lingvo.tasks.car.input_extractor.FieldsExtractor

Extracts the object labels from a KITTI tf.Example.

Emits:

bboxes_count: Scalar number of 2D bounding boxes in the example.

bboxes: [p.max_num_objects, 4] - 2D bounding box data in [ymin, xmin, ymax, xmax] format.

bboxes_padding: [p.max_num_objects] - Padding for bboxes.

bboxes_3d: [p.max_num_objects, 7] - 3D bounding box data in [x, y, z, dx, dy, dz, phi] format. x, y, z are the object center; dx, dy, dz are the dimensions of the box, and phi is the rotation angle around the z-axis. 3D bboxes are defined in the velodyne coordinate frame.

bboxes_3d_mask: [p.max_num_objects] - Mask for bboxes (mask is the inversion of padding).

bboxes3d_proj_to_image_plane: [p.max_num_objects, 8, 2] - For each bounding box, the 8 corners of the bounding box in projected image coordinates (x, y).

bboxes_td: [p.max_num_objects, 4] - The 3D bounding box data in top down projected coordinates (ymin, xmin, ymax, xmax). This currently ignores rotation.

bboxes_td_mask: [p.max_num_objects] - Mask for bboxes_td.

bboxes_3d_num_points: [p.max_num_objects] - Number of points in each box.

labels: [p.max_num_objects] - Integer label for each bounding box object corresponding to the index in KITTI_CLASS_NAMES.

texts: [p.max_num_objects] - The class name for each label in labels.

source_id: Scalar string. The unique identifier for each example.

See ComputeKITTIDifficulties for more information on the following:

box_image_height: [p.max_num_objects] - The height in pixels of each box in the projected image plane.

occlusion: [p.max_num_objects] - The occlusion level of each bounding box.

truncation: [p.max_num_objects] - The truncation level of each bounding box.

difficulties: [p.max_num_objects] - The computed difficulty based on the above three factors.
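To make the box formats above concrete, the following pure-Python sketch derives a rotation-free top-down extent, the 8 corners of a box, and a mask from a padding vector, starting from one [x, y, z, dx, dy, dz, phi] row. All three helper names are hypothetical, the corner ordering is arbitrary, and the extractor itself may scale, clip, or order values differently; this is a sketch of the geometry under those assumptions.

```python
import math


def top_down_extent(bbox_3d):
    """Axis-aligned (ymin, xmin, ymax, xmax) footprint, ignoring phi."""
    x, y, _, dx, dy, _, _ = bbox_3d
    return (y - dy / 2, x - dx / 2, y + dy / 2, x + dx / 2)


def box_corners(bbox_3d):
    """Returns the 8 corners of a box rotated by phi around the z-axis."""
    x, y, z, dx, dy, dz, phi = bbox_3d
    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Offset of this corner in the box's own (unrotated) frame.
                lx, ly = sx * dx, sy * dy
                corners.append((
                    x + cos_phi * lx - sin_phi * ly,
                    y + sin_phi * lx + cos_phi * ly,
                    z + sz * dz,
                ))
    return corners


def mask_from_padding(padding):
    """A mask is the inversion of padding: 1 for real boxes, 0 for padded."""
    return [1 - p for p in padding]


box = (10.0, 2.0, 0.0, 4.0, 2.0, 1.5, 0.0)
extent = top_down_extent(box)
corners = box_corners(box)
```

Projecting each of the 8 corners through velo_to_image_plane, with the divide by the last coordinate, would give one [8, 2] slice of the kind described for bboxes3d_proj_to_image_plane.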
KITTI_CLASS_NAMES = ['Background', 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc', 'DontCare']
SUBCLASS_DICT = {'cyclist': [6], 'human': [4, 5], 'motor': [1, 2, 3, 7], 'pedestrian': [4]}
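The integers in SUBCLASS_DICT index into KITTI_CLASS_NAMES. The short sketch below copies both constants from above and resolves each subclass to its class names; the helper `subclass_names` is hypothetical and not part of the module.

```python
KITTI_CLASS_NAMES = [
    'Background', 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting',
    'Cyclist', 'Tram', 'Misc', 'DontCare'
]
SUBCLASS_DICT = {
    'cyclist': [6],
    'human': [4, 5],
    'motor': [1, 2, 3, 7],
    'pedestrian': [4],
}


def subclass_names(subclass):
    """Maps a subclass key to the KITTI class names it covers."""
    return [KITTI_CLASS_NAMES[i] for i in SUBCLASS_DICT[subclass]]


motor_names = subclass_names('motor')
```

Here `motor_names` is `['Car', 'Van', 'Truck', 'Tram']`, and 'human' covers both 'Pedestrian' and 'Person_sitting' while 'pedestrian' covers only the former.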
classmethod Params()[source]

Default params.

FeatureMap()[source]

Return a dictionary from tf.Example feature names to Features.

_Extract(features)[source]

The subclass-defined implementation of Extract().

Parameters

features – A dictionary of (Sparse)Tensors which includes tensors from this extractor.

Returns

A NestedMap of output Tensors whose key names match self.Shape()’s keys.

Shape()[source]

Return a NestedMap of un-batched fully-specified tf.TensorShapes.

DType()[source]

Return a NestedMap mapping names to tf.DType.

class lingvo.tasks.car.kitti_input_generator.KITTIBase(*args, **kwargs)[source]

Bases: lingvo.tasks.car.base_extractor._BaseExtractor

KITTI dataset base parameters.

classmethod Params(*args, **kwargs)[source]

Default params.

Parameters

extractors – A hyperparams.Params of extractor names to Extractors. A few extractor types are required: ‘labels’: A LabelExtractor.Params().

Returns

A base_layer Params object.

property class_names

class lingvo.tasks.car.kitti_input_generator.KITTILaser(*args, **kwargs)[source]

Bases: lingvo.tasks.car.kitti_input_generator.KITTIBase

KITTI object detection dataset.

This class emits KITTI images, labels, and the raw laser representation of the data. See KITTIGrid and KITTISparseLaser for alternative laser representations.

Input batch contains outputs from:
  • KITTIImageExtractor

  • KITTILabelExtractor

  • KITTILaserExtractor

classmethod Params()[source]

Default params.

property class_names

class lingvo.tasks.car.kitti_input_generator.KITTISparseLaser(*args, **kwargs)[source]

Bases: lingvo.tasks.car.kitti_input_generator.KITTIBase

KITTI object detection dataset for sparse detection models.

This class emits KITTI images, labels, and the sparse laser representation of the data. See KITTILaser and KITTIGrid for alternative laser representations.

Input batch contains outputs from:
  • KITTILabelExtractor

  • KITTILaserExtractor

Transformed with:
  • Metadata annotation: CountNumberOfPointsInBoxes3D

  • Visualization: CreateDecoderCopy

  • Sparse gather of points for featurization: SparseCenterSelector, SparseCellGatherFeatures

  • Anchor creation for classification and regression targets: TileAnchorBBoxes, AnchorAssignment

classmethod Params()[source]

Default params.

class lingvo.tasks.car.kitti_input_generator.KITTIGrid(*args, **kwargs)[source]

Bases: lingvo.tasks.car.kitti_input_generator.KITTIBase

KITTI object detection dataset.

This class emits KITTI images, labels, and the fixed grid laser representation of the data.

Input batch contains outputs from:
  • KITTILabelExtractor

  • KITTILaserExtractor

Transformed with:
  • Metadata annotation: CountNumberOfPointsInBoxes3D

  • Visualization: CreateDecoderCopy

  • Points to Pillars: PointsToGrid, GridToPillars

  • Anchor creation for classification and regression targets: GridAnchorCenters, TileAnchorBBoxes, AnchorAssignment

classmethod Params()[source]

Default params.