lingvo.tasks.car.waymo.tools.waymo_proto_to_tfe module¶
Library to convert Waymo Open Dataset to tf.Examples.
Generates a tf.Example proto for every dataset_pb2.Frame; each example contains the following keys and values:
Frame-level metadata
run_segment: string - The identifier of the driving sequence in the dataset.
run_start_offset: int64 - The start offset within the run_segment sequence.
time_of_day: string - Categorical description of time of day, e.g., “Day”.
location: string - Categorical description of geographical location, e.g., “location_sf”.
weather: string - Categorical description of weather of scene, e.g., “sunny”.
pose: float - 4x4 transformation matrix for converting from “world” coordinates to SDC coordinates.
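As an illustration, the frame-level metadata can be read back from a serialized tf.Example roughly as follows. This is a minimal sketch: the key names come from the list above, while the helper name itself is only illustrative.

    import numpy as np
    import tensorflow as tf

    def parse_frame_metadata(serialized_example):
      # Illustrative helper, not part of this module.
      example = tf.train.Example.FromString(serialized_example)
      feature = example.features.feature
      run_segment = feature['run_segment'].bytes_list.value[0].decode('utf-8')
      run_start_offset = feature['run_start_offset'].int64_list.value[0]
      time_of_day = feature['time_of_day'].bytes_list.value[0].decode('utf-8')
      # The pose is stored flattened; reshape it back to a 4x4 matrix.
      pose = np.array(feature['pose'].float_list.value).reshape([4, 4])
      return run_segment, run_start_offset, time_of_day, pose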
Lasers
There are 5 LIDAR sensors: “TOP”, “SIDE_LEFT”, “SIDE_RIGHT”, “FRONT”, “REAR”. Each LIDAR currently provides two returns, “ri1” and “ri2” for the first and second returns of each shot.
For every $LASER and $RI, we embed the raw range image:
$LASER_$RI: float - flattened range image data of shape [H, W, C] from the original proto.
$LASER_$RI_shape: int64 - shape of the range image.
For every lidar $LASER, we extract the calibrations:
$LASER_beam_inclinations: float - List of beam angle inclinations for TOP LIDAR (non-uniform).
$LASER_beam_inclination_min: float - Minimum beam inclination for uniform LIDARs.
$LASER_beam_inclination_max: float - Maximum beam inclination for uniform LIDARs.
$LASER_extrinsics: float - 4x4 transformation matrix for converting from SDC coordinates to LIDAR coordinates.
The TOP LIDAR currently has a per-pixel range image pose to account for rolling shutter effects when projecting to 3D cartesian coordinates. We embed this range image pose as TOP_pose.
To allow for easier use, we also project all $LASERs to a stacked 3D cartesian coordinate point cloud as:
laser_$LASER_$RI: float - An [N, 6] matrix where there are N total points, the first three dimensions are the x, y, z cartesian coordinates, and the last three dimensions are the intensity, elongation, and “is_in_no_label_zone” bit for each point.
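For example, the TOP first-return range image and its projected point cloud can be recovered from a parsed tf.Example roughly as follows (a sketch assuming the key naming above, with $LASER = TOP and $RI = ri1; the helper name is illustrative).

    import numpy as np
    import tensorflow as tf

    def get_top_ri1(example: tf.train.Example):
      # Illustrative helper, not part of this module.
      feature = example.features.feature
      # The range image values are flattened; the [H, W, C] shape is stored
      # separately under the _shape key.
      shape = list(feature['TOP_ri1_shape'].int64_list.value)
      range_image = np.array(feature['TOP_ri1'].float_list.value).reshape(shape)
      # Projected point cloud: [N, 6] rows of
      # (x, y, z, intensity, elongation, is_in_no_label_zone).
      points = np.array(
          feature['laser_TOP_ri1'].float_list.value).reshape([-1, 6])
      return range_image, points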
Camera images
There are 5 cameras in the dataset: “FRONT”, “FRONT_LEFT”, “FRONT_RIGHT”, “SIDE_LEFT”, and “SIDE_RIGHT”.
For each $CAM, we store:
image_$CAM: string - Scalar PNG-format camera image.
image_$CAM_shape: int64 - [3] - Vector containing the shape of the camera image as [height, width, channels].
image_$CAM_pose: float - [4, 4] Matrix transformation for converting from world coordinates to camera center.
image_$CAM_pose_timestamp: float - Scalar timestamp offset of when the image was taken.
image_$CAM_shutter: float - Scalar shutter value.
image_$CAM_velocity: float - [6] Vector describing velocity of camera for rolling shutter adjustment. See original proto for details.
image_$CAM_camera_trigger_time: float - Scalar value for when the camera was triggered.
image_$CAM_camera_readout_done_time: float - Scalar value for when the camera image finished reading out data.
camera_$CAM_extrinsics: float - 4x4 pose transformation for converting from camera center coordinates to 2d projected view.
camera_$CAM_intrinsics: float - [9] intrinsics transformation for converting from camera center coordinates to 2d projected view.
camera_$CAM_width: int64 - Scalar width of image.
camera_$CAM_height: int64 - Scalar height of image.
camera_$CAM_rolling_shutter_direction: int64 - Scalar value indicating the direction of the rolling shutter adjustment.
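As a usage sketch (key names as above; the FRONT camera and the helper name are chosen only for illustration), the stored image and calibration can be decoded like this:

    import numpy as np
    import tensorflow as tf

    def get_front_camera(example: tf.train.Example):
      # Illustrative helper, not part of this module.
      feature = example.features.feature
      png_bytes = feature['image_FRONT'].bytes_list.value[0]
      image = tf.io.decode_png(png_bytes)  # Shape matches image_FRONT_shape.
      intrinsics = np.array(
          feature['camera_FRONT_intrinsics'].float_list.value)  # [9]
      extrinsics = np.array(
          feature['camera_FRONT_extrinsics'].float_list.value).reshape([4, 4])
      return image, intrinsics, extrinsics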
Labels
For each frame, we store the following label information for the M bounding boxes in the frame.
labels: int64 - [M] - The integer label class for every 3D bounding box corresponding to the enumeration defined in the proto.
label_ids: string - [M] - The unique label string identifying each labeled object. This can be used for associating the same object across frames of the same run segment.
bboxes_3d: float - A flattened [M, 7] matrix where there are M boxes in the frame, and each box is defined by a 7-DOF format - [center_x, center_y, center_z, length, width, height, heading].
label_metadata: float - A flattened [M, 4] matrix where there are M boxes in the frame, and each metadata entry is the [speed_x, speed_y, accel_x, accel_y] of the object.
bboxes_3d_num_points: int64 - [M] - The number of points that fall into each 3D bounding box: can be used for computing the difficulty of each bounding box.
detection_difficulties: int64 - DO NOT USE FOR EVALUATION. Indicates whether the labelers have determined that the object is of LEVEL_2 difficulty. Should be used jointly with bboxes_3d_num_points above to set the difficulty level, which we save in single_frame_detection_difficulties. Because it does not include information about the number of points in its calculation, it is an incomplete definition of difficulty and will not correspond to the leaderboard if used to calculate metrics.
single_frame_detection_difficulties: int64 - Indicates the difficulty level as either LEVEL_1 (1), LEVEL_2 (2), or IGNORE (999). We first ignore all 3D labels without any LiDAR points. Next, we assign LEVEL_2 to examples that the labeler annotated as hard or that have <= 5 LiDAR points. Finally, the rest of the examples are assigned LEVEL_1.
tracking_difficulties: int64 - Indicates whether the labelers have determined that the tracked object is of LEVEL_2 difficulty.
nlz_proto_strs: string - Vector of NoLabelZone polygon protos. Currently unused.
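For illustration, the label tensors can be unflattened and filtered by difficulty roughly as follows (a sketch; LEVEL_1 == 1 per the definition above, and the helper name is illustrative).

    import numpy as np
    import tensorflow as tf

    def get_level1_boxes(example: tf.train.Example):
      # Illustrative helper, not part of this module.
      feature = example.features.feature
      labels = np.array(feature['labels'].int64_list.value)
      bboxes_3d = np.array(
          feature['bboxes_3d'].float_list.value).reshape([-1, 7])
      difficulties = np.array(
          feature['single_frame_detection_difficulties'].int64_list.value)
      mask = difficulties == 1  # Keep LEVEL_1 boxes only.
      return labels[mask], bboxes_3d[mask]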
class lingvo.tasks.car.waymo.tools.waymo_proto_to_tfe.FrameToTFE(use_range_image_index_as_lidar_feature=None)[source]¶
Bases: object
Converter utility from car.open_dataset.Frame to tf.Examples.
extract_camera_images(feature, camera_images, camera_calibrations_dict)[source]¶ Extract the images into the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
camera_images – A repeated car.open_dataset.CameraImage proto.
camera_calibrations_dict – A dictionary mapping camera name to car.open_dataset.CameraCalibration proto.
extract_camera_calibrations(feature, camera_calibrations)[source]¶ Extract the camera calibrations into the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
camera_calibrations – A CameraCalibration proto from the Waymo Dataset.
extract_lasers(feature, lasers)[source]¶ Extract the lasers from range_images into the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
lasers – A repeated car.open_dataset.Laser proto.
extract_laser_calibrations(feature, laser_calibrations)[source]¶ Extract the laser calibrations into the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
laser_calibrations – A LaserCalibrations proto from the Waymo Dataset.
add_point_cloud(feature, laser_names, range_image_pose)[source]¶ Convert the range images in feature to 3D point clouds. Adds the point cloud data to the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
laser_names – A list of laser names (e.g., ‘TOP’, ‘REAR’, ‘SIDE_LEFT’).
range_image_pose – A range image pose Tensor for the GBR.
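The extraction methods above all write into a shared tf.Example feature map. The following is a sketch (not the module's exact internals) of how they might be combined for a single dataset_pb2.Frame; the Frame attribute names come from the Waymo Open Dataset protos, while the wrapper function and dict construction are assumptions.

    import tensorflow as tf

    def fill_example_from_frame(converter, frame):
      # Illustrative wrapper, not part of this module.
      example = tf.train.Example()
      feature = example.features.feature  # The tf.Example feature map.
      camera_calibrations_dict = {
          c.name: c for c in frame.context.camera_calibrations
      }
      converter.extract_camera_images(
          feature, frame.images, camera_calibrations_dict)
      converter.extract_camera_calibrations(
          feature, frame.context.camera_calibrations)
      converter.extract_lasers(feature, frame.lasers)
      converter.extract_laser_calibrations(
          feature, frame.context.laser_calibrations)
      # add_point_cloud would then project the range images to 3D using the
      # TOP per-pixel pose described in the file docstring.
      return example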
_single_frame_detection_difficulty(human_difficulty, num_points)[source]¶ Create the single_frame_detection_difficulty field.
When labeling, humans have the option to label a particular frame’s bbox as difficult, which overrides the normal points-based definition. Additionally, boxes with 0 points are ignored by the metric code.
- Parameters
human_difficulty – What the human raters labeled the difficulty as. This is from the detection_difficulty_level field, and will be either 0 (the default value, which is UNKNOWN in the proto enum) or 2 (LEVEL_2 difficulty).
num_points – The number of points in the bbox.
- Returns
The single frame detection difficulty per the Waymo Open Dataset paper’s definition.
- Return type
single_frame_detection_difficulty
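The assignment rule described in the file docstring can be summarized by the sketch below; the constants and function name are illustrative, and the actual implementation may differ in detail.

    LEVEL_1, LEVEL_2, IGNORE = 1, 2, 999

    def single_frame_difficulty(human_difficulty, num_points):
      # Illustrative sketch of the rule, not the module's exact code.
      if num_points == 0:
        return IGNORE      # Labels without any LiDAR points are ignored.
      if human_difficulty == 2 or num_points <= 5:
        return LEVEL_2     # Rater-marked hard, or at most 5 LiDAR points.
      return LEVEL_1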
class lingvo.tasks.car.waymo.tools.waymo_proto_to_tfe.WaymoOpenDatasetConverter(emitter_fn)[source]¶
Bases: apache_beam.transforms.core.DoFn
Converts WaymoOpenDataset into tf.Examples. See file docstring.
process(item)[source]¶ Method to use for processing elements.
This is invoked by DoFnRunner for each element of an input PCollection.
The following parameters can be used as default values on process arguments to indicate that a DoFn accepts the corresponding parameters. For example, a DoFn might accept the element and its timestamp with the following signature:
def process(element=DoFn.ElementParam, timestamp=DoFn.TimestampParam): ...
The full set of parameters is:
DoFn.ElementParam: element to be processed, should not be mutated.
DoFn.SideInputParam: a side input that may be used when processing.
DoFn.TimestampParam: timestamp of the input element.
DoFn.WindowParam: Window the input element belongs to.
DoFn.TimerParam: a userstate.RuntimeTimer object defined by the spec of the parameter.
DoFn.StateParam: a userstate.RuntimeState object defined by the spec of the parameter.
DoFn.KeyParam: key associated with the element.
DoFn.RestrictionParam: an iobase.RestrictionTracker will be provided here to allow treatment as a Splittable DoFn. The restriction tracker will be derived from the restriction provider in the parameter.
DoFn.WatermarkEstimatorParam: a function that can be used to track output watermark of Splittable DoFn implementations.
- Parameters
element – The element to be processed
*args – side inputs
**kwargs – other keyword arguments.
- Returns
An Iterable of output elements or None.
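A hedged sketch of wiring this DoFn into a Beam pipeline follows. It assumes the input is a TFRecord of serialized dataset_pb2.Frame protos and that the supplied emitter_fn yields serialized tf.Examples; the real entry points and the emitter contract may differ, so treat this as illustrative only.

    import apache_beam as beam
    from lingvo.tasks.car.waymo.tools import waymo_proto_to_tfe

    def run(input_tfrecord_pattern, output_path):
      # Hypothetical emitter: pass the serialized example through unchanged.
      def emitter_fn(key, tf_example):
        return [tf_example.SerializeToString()]

      with beam.Pipeline() as p:
        _ = (p
             | 'Read' >> beam.io.ReadFromTFRecord(input_tfrecord_pattern)
             | 'Convert' >> beam.ParDo(
                 waymo_proto_to_tfe.WaymoOpenDatasetConverter(emitter_fn))
             | 'Write' >> beam.io.WriteToTFRecord(output_path))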