lingvo.tasks.car.waymo.tools.waymo_proto_to_tfe module¶
Library to convert Waymo Open Dataset to tf.Examples.
Generates a tf.Example proto for every dataset_pb2.Frame; each example contains the following keys and values:
Frame-level metadata
run_segment: string - The identifier of the driving sequence in the dataset.
run_start_offset: int64 - The start offset within the run_segment sequence.
time_of_day: string - Categorical description of time of day, e.g., “Day”.
location: string - Categorical description of geographical location, e.g., “location_sf”.
weather: string - Categorical description of weather of scene, e.g., “sunny”.
pose: float - 4x4 transformation matrix for converting from “world” coordinates to SDC coordinates.
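As an illustration, the frame-level metadata can be read back from a serialized tf.Example roughly as follows. This is a minimal sketch: the key names come from the list above, while the helper name itself is only illustrative.

    import numpy as np
    import tensorflow as tf

    def parse_frame_metadata(serialized_example):
      # Illustrative helper, not part of this module.
      example = tf.train.Example.FromString(serialized_example)
      feature = example.features.feature
      run_segment = feature['run_segment'].bytes_list.value[0].decode('utf-8')
      run_start_offset = feature['run_start_offset'].int64_list.value[0]
      time_of_day = feature['time_of_day'].bytes_list.value[0].decode('utf-8')
      # The pose is stored flattened; reshape it back to a 4x4 matrix.
      pose = np.array(feature['pose'].float_list.value).reshape([4, 4])
      return run_segment, run_start_offset, time_of_day, pose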
Lasers
There are 5 LIDAR sensors: “TOP”, “SIDE_LEFT”, “SIDE_RIGHT”, “FRONT”, “REAR”. Each LIDAR currently provides two returns, “ri1” and “ri2” for the first and second returns of each shot.
For every $LASER and $RI, we embed the raw range image:
$LASER_$RI: float - flattened range image data of shape [H, W, C] from the original proto.
$LASER_$RI_shape: int64 - shape of the range image.
For every lidar $LASER, we extract the calibrations:
$LASER_beam_inclinations: float - List of beam angle inclinations for TOP LIDAR (non-uniform).
$LASER_beam_inclination_min: float - Minimum beam inclination for uniform LIDARs.
$LASER_beam_inclination_max: float - Maximum beam inclination for uniform LIDARs.
$LASER_extrinsics: float - 4x4 transformation matrix for converting from SDC coordinates to LIDAR coordinates.
The TOP LIDAR currently has a per-pixel range image pose to account for rolling shutter effects when projecting to 3D cartesian coordinates. We embed this range image pose as TOP_pose.
To allow for easier use, we also project all $LASERs to a stacked 3D cartesian coordinate point cloud as:
laser_$LASER_$RI: float - An [N, 6] matrix where there are N total points, the first three dimensions are the x, y, z cartesian coordinates, and the last three dimensions are the intensity, elongation, and “is_in_no_label_zone” bit for each point.
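For example, the TOP first-return range image and its projected point cloud can be recovered from a parsed tf.Example roughly as follows (a sketch assuming the key naming above, with $LASER = TOP and $RI = ri1; the helper name is illustrative).

    import numpy as np
    import tensorflow as tf

    def get_top_ri1(example: tf.train.Example):
      # Illustrative helper, not part of this module.
      feature = example.features.feature
      # The range image values are flattened; the [H, W, C] shape is stored
      # separately under the _shape key.
      shape = list(feature['TOP_ri1_shape'].int64_list.value)
      range_image = np.array(feature['TOP_ri1'].float_list.value).reshape(shape)
      # Projected point cloud: [N, 6] rows of
      # (x, y, z, intensity, elongation, is_in_no_label_zone).
      points = np.array(
          feature['laser_TOP_ri1'].float_list.value).reshape([-1, 6])
      return range_image, points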
Camera images
There are 5 cameras in the dataset: “FRONT”, “FRONT_LEFT”, “FRONT_RIGHT”, “SIDE_LEFT”, and “SIDE_RIGHT”.
For each $CAM, we store:
image_$CAM: string - Scalar PNG-format camera image.
image_$CAM_shape: int64 - [3] - Vector containing the shape of the camera image as [height, width, channels].
image_$CAM_pose: float - [4, 4] Matrix transformation for converting from world coordinates to camera center.
image_$CAM_pose_timestamp: float - Scalar timestamp offset of when the image was taken.
image_$CAM_shutter: float - Scalar shutter value.
image_$CAM_velocity: float - [6] Vector describing velocity of camera for rolling shutter adjustment. See original proto for details.
image_$CAM_camera_trigger_time: float - Scalar value for when the camera was triggered.
image_$CAM_camera_readout_done_time: float - Scalar value for when the camera image finished reading out data.
camera_$CAM_extrinsics: float - 4x4 pose transformation for converting from camera center coordinates to 2d projected view.
camera_$CAM_intrinsics: float - [9] intrinsics transformation for converting from camera center coordinates to 2d projected view.
camera_$CAM_width: int64 - Scalar width of image.
camera_$CAM_height: int64 - Scalar height of image.
camera_$CAM_rolling_shutter_direction: int64 - Scalar value indicating the direction of the rolling shutter adjustment.
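As a usage sketch (key names as above; the FRONT camera and the helper name are chosen only for illustration), the stored image and calibration can be decoded like this:

    import numpy as np
    import tensorflow as tf

    def get_front_camera(example: tf.train.Example):
      # Illustrative helper, not part of this module.
      feature = example.features.feature
      png_bytes = feature['image_FRONT'].bytes_list.value[0]
      image = tf.io.decode_png(png_bytes)  # Shape matches image_FRONT_shape.
      intrinsics = np.array(
          feature['camera_FRONT_intrinsics'].float_list.value)  # [9]
      extrinsics = np.array(
          feature['camera_FRONT_extrinsics'].float_list.value).reshape([4, 4])
      return image, intrinsics, extrinsics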
Labels
For each frame, we store the following label information for the M bounding boxes in the frame.
labels: int64 - [M] - The integer label class for every 3D bounding box corresponding to the enumeration defined in the proto.
label_ids: string - [M] - The unique label string identifying each labeled object. This can be used for associating the same object across frames of the same run segment.
bboxes_3d: float - A flattened [M, 7] matrix where there are M boxes in the frame, and each box is defined by a 7-DOF format - [center_x, center_y, center_z, length, width, height, heading].
label_metadata: float - A flattened [M, 4] matrix where there are M boxes in the frame, and each metadata entry is the [speed_x, speed_y, accel_x, accel_y] of the object.
bboxes_3d_num_points: int64 - [M] - The number of points that fall into each 3D bounding box: can be used for computing the difficulty of each bounding box.
detection_difficulties: int64 - DO NOT USE FOR EVALUATION. Indicates whether the labelers have determined that the object is of LEVEL_2 difficulty. Should be used jointly with bboxes_3d_num_points above to set the difficulty level, which we save in single_frame_detection_difficulties. Because it does not include information about the number of points in its calculation, it is an incomplete definition of difficulty and will not correspond to the leaderboard if used to calculate metrics.
single_frame_detection_difficulties: int64 - Indicates the difficulty level as either LEVEL_1 (1), LEVEL_2 (2), or IGNORE (999). We first ignore all 3D labels without any LiDAR points. Next, we assign LEVEL_2 to examples that the labeler annotated as hard or that have <= 5 LiDAR points. Finally, the rest of the examples are assigned LEVEL_1.
tracking_difficulties: int64 - Indicates whether the labelers have determined that the tracked object is of LEVEL_2 difficulty.
nlz_proto_strs: string - Vector of NoLabelZone polygon protos. Currently unused.
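For illustration, the label tensors can be unflattened and filtered by difficulty roughly as follows (a sketch; LEVEL_1 == 1 per the definition above, and the helper name is illustrative).

    import numpy as np
    import tensorflow as tf

    def get_level1_boxes(example: tf.train.Example):
      # Illustrative helper, not part of this module.
      feature = example.features.feature
      labels = np.array(feature['labels'].int64_list.value)
      bboxes_3d = np.array(
          feature['bboxes_3d'].float_list.value).reshape([-1, 7])
      difficulties = np.array(
          feature['single_frame_detection_difficulties'].int64_list.value)
      mask = difficulties == 1  # Keep LEVEL_1 boxes only.
      return labels[mask], bboxes_3d[mask]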
class lingvo.tasks.car.waymo.tools.waymo_proto_to_tfe.FrameToTFE(use_range_image_index_as_lidar_feature=None)[source]¶
Bases: object
Converter utility from car.open_dataset.Frame to tf.Examples.
extract_camera_images(feature, camera_images, camera_calibrations_dict)[source]¶ Extract the images into the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
camera_images – A repeated car.open_dataset.CameraImage proto.
camera_calibrations_dict – A dictionary mapping camera name to car.open_dataset.CameraCalibration proto.
extract_camera_calibrations(feature, camera_calibrations)[source]¶ Extract the camera calibrations into the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
camera_calibrations – A CameraCalibration proto from the Waymo Dataset.
extract_lasers(feature, lasers)[source]¶ Extract the lasers from range_images into the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
lasers – A repeated car.open_dataset.Laser proto.
extract_laser_calibrations(feature, laser_calibrations)[source]¶ Extract the laser calibrations into the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
laser_calibrations – A LaserCalibrations proto from the Waymo Dataset.
add_point_cloud(feature, laser_names, range_image_pose)[source]¶ Convert the range images in feature to 3D point clouds. Adds the point cloud data to the tf.Example feature map.
- Parameters
feature – A tf.Example feature map.
laser_names – A list of laser names (e.g., ‘TOP’, ‘REAR’, ‘SIDE_LEFT’).
range_image_pose – A range image pose Tensor for the GBR.
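The extraction methods above all write into a shared tf.Example feature map. The following is a sketch (not the module's exact internals) of how they might be combined for a single dataset_pb2.Frame; the Frame attribute names come from the Waymo Open Dataset protos, while the wrapper function and dict construction are assumptions.

    import tensorflow as tf

    def fill_example_from_frame(converter, frame):
      # Illustrative wrapper, not part of this module.
      example = tf.train.Example()
      feature = example.features.feature  # The tf.Example feature map.
      camera_calibrations_dict = {
          c.name: c for c in frame.context.camera_calibrations
      }
      converter.extract_camera_images(
          feature, frame.images, camera_calibrations_dict)
      converter.extract_camera_calibrations(
          feature, frame.context.camera_calibrations)
      converter.extract_lasers(feature, frame.lasers)
      converter.extract_laser_calibrations(
          feature, frame.context.laser_calibrations)
      # add_point_cloud would then project the range images to 3D using the
      # TOP per-pixel pose described in the file docstring.
      return example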
_single_frame_detection_difficulty(human_difficulty, num_points)[source]¶ Create the single_frame_detection_difficulty field.
When labeling, humans have the option to label a particular frame’s bbox as difficult, which overrides the normal points-based definition. Additionally, boxes with 0 points are ignored by the metric code.
- Parameters
human_difficulty – What the human raters labeled the difficulty as. This is from the detection_difficulty_level field, and will be either 0 (the default value, which is UNKNOWN in the proto enum) or 2 (LEVEL_2 difficulty).
num_points – The number of points in the bbox.
- Returns
The single frame detection difficulty per the Waymo Open Dataset paper’s definition.
- Return type
single_frame_detection_difficulty
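The assignment rule described in the file docstring can be summarized by the sketch below; the constants and function name are illustrative, and the actual implementation may differ in detail.

    LEVEL_1, LEVEL_2, IGNORE = 1, 2, 999

    def single_frame_difficulty(human_difficulty, num_points):
      # Illustrative sketch of the rule, not the module's exact code.
      if num_points == 0:
        return IGNORE      # Labels without any LiDAR points are ignored.
      if human_difficulty == 2 or num_points <= 5:
        return LEVEL_2     # Rater-marked hard, or at most 5 LiDAR points.
      return LEVEL_1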
class lingvo.tasks.car.waymo.tools.waymo_proto_to_tfe.WaymoOpenDatasetConverter(emitter_fn)[source]¶
Bases: apache_beam.transforms.core.DoFn
Converts WaymoOpenDataset into tf.Examples. See file docstring.
process(item)[source]¶ Method to use for processing elements.
This is invoked by DoFnRunner for each element of an input PCollection.
The following parameters can be used as default values on process arguments to indicate that a DoFn accepts the corresponding parameters. For example, a DoFn might accept the element and its timestamp with the following signature:
def process(element=DoFn.ElementParam, timestamp=DoFn.TimestampParam): ...
The full set of parameters is:
DoFn.ElementParam: element to be processed, should not be mutated.
DoFn.SideInputParam: a side input that may be used when processing.
DoFn.TimestampParam: timestamp of the input element.
DoFn.WindowParam: Window the input element belongs to.
DoFn.TimerParam: a userstate.RuntimeTimer object defined by the spec of the parameter.
DoFn.StateParam: a userstate.RuntimeState object defined by the spec of the parameter.
DoFn.KeyParam: key associated with the element.
DoFn.RestrictionParam: an iobase.RestrictionTracker will be provided here to allow treatment as a Splittable DoFn. The restriction tracker will be derived from the restriction provider in the parameter.
DoFn.WatermarkEstimatorParam: a function that can be used to track output watermark of Splittable DoFn implementations.
- Parameters
element – The element to be processed
*args – side inputs
**kwargs – other keyword arguments.
- Returns
An Iterable of output elements or None.
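A hedged sketch of wiring this DoFn into a Beam pipeline follows. It assumes the input is a TFRecord of serialized dataset_pb2.Frame protos and that the supplied emitter_fn yields serialized tf.Examples; the real entry points and the emitter contract may differ, so treat this as illustrative only.

    import apache_beam as beam
    from lingvo.tasks.car.waymo.tools import waymo_proto_to_tfe

    def run(input_tfrecord_pattern, output_path):
      # Hypothetical emitter: pass the serialized example through unchanged.
      def emitter_fn(key, tf_example):
        return [tf_example.SerializeToString()]

      with beam.Pipeline() as p:
        _ = (p
             | 'Read' >> beam.io.ReadFromTFRecord(input_tfrecord_pattern)
             | 'Convert' >> beam.ParDo(
                 waymo_proto_to_tfe.WaymoOpenDatasetConverter(emitter_fn))
             | 'Write' >> beam.io.WriteToTFRecord(output_path))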