lingvo.tasks.car.tools.kitti_exporter module

Create TFRecords files from KITTI raw data.

Parses KITTI raw data with different splits, indicated with split files. A split file is a text file that specifies frame names to be included in the split, with one name per line. Splits with ‘test’ in the filename use testing data, while other splits use training data. This program expects KITTI raw data in the following directory structure:

kitti_object/
  training/    # Contains KITTI raw train data
    label2/
    velodyne/
    calib/
    image_2/
  testing/     # Contains KITTI raw test data
    velodyne/
    calib/
    image_2/
  splits/      # Contains split files identifying frame names in the split.
    split_name.txt
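
The split-file convention described above can be sketched as follows. This is a minimal illustration, not the module's own code: the helper names `read_frame_names` and `data_subdir_for_split` are hypothetical, and matching ‘test’ against the base filename rather than the full path is an assumption.

```python
import os


def read_frame_names(split_file):
    """Returns the non-empty frame names listed one per line in split_file."""
    with open(split_file) as f:
        return [line.strip() for line in f if line.strip()]


def data_subdir_for_split(split_file):
    """Returns 'testing' if the split filename contains 'test', else 'training'.

    Assumption: only the base filename is checked, not the directories above it.
    """
    return "testing" if "test" in os.path.basename(split_file) else "training"
```

Under these assumptions, a split file named `val.txt` would resolve to the training/ directory and one named `test.txt` to testing/.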

The output TFRecords files contain examples that correspond to KITTI frames, with the following format:

# Frame information
image/source_id: unique frame name, e.g. ‘000000’, ‘000010’

# 2D image data
image/encoded: PNG encoded string
image/height: image height
image/width: image width
image/format: ‘PNG’

# 3D velodyne pointcloud data (variable P points per frame)
pointcloud/xyz: point positions (P x 3 tensor).
pointcloud/reflectance: point reflectances (P x 1 tensor).

# Object level data (variable N objects per frame)
object/image/bbox/xmin: min X pixel location in raw image (N x 1 tensor).
object/image/bbox/xmax: max X pixel location in raw image (N x 1 tensor).
object/image/bbox/ymin: min Y pixel location in raw image (N x 1 tensor).
object/image/bbox/ymax: max Y pixel location in raw image (N x 1 tensor).
object/label: one of {‘Car’, ‘Pedestrian’, ‘Cyclist’} identifying object class (N x 1 tensor).
object/has_3d_info: 1 if object has valid 3D info else 0 (N x 1 tensor).
object/occlusion: int in {0, 1, 2, 3} of occlusion state (N x 1 tensor).
object/truncation: float in 0 (non-truncated) to 1 (truncated) (N x 1 tensor).
object/velo/bbox/xyz: 3D bbox locations in velo frame (N x 3 tensor).
object/velo/bbox/dim_xyz: length (dx), width (dy), height (dz) indicating object dimensions (N x 3 tensor).
object/velo/bbox/phi: bbox rotation in velo frame (N x 1 tensor).

# Transformation matrices
transform/velo_to_image_plane: 3x4 matrix from velo xyz to image plane xy. After multiplication, you need to divide by last coordinate to recover 2D pixel locations.
transform/velo_to_camera: 4x4 matrix from velo xyz to camera xyz.
transform/camera_to_velo: 4x4 matrix from camera xyz to velo xyz.
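
The "divide by last coordinate" step for transform/velo_to_image_plane can be sketched in plain Python. The function name `project_velo_to_image` is hypothetical, and the matrix in the usage lines is a toy value chosen for easy arithmetic, not a real KITTI calibration. Appending 1 to each point to form homogeneous coordinates is the usual convention for a 3x4 projection and is assumed here.

```python
def project_velo_to_image(points_xyz, velo_to_image_plane):
    """Projects velo-frame points to 2D pixel locations.

    points_xyz: iterable of (x, y, z) points in the velo frame.
    velo_to_image_plane: 3x4 matrix given as 3 rows of 4 floats.
    """
    pixels = []
    for x, y, z in points_xyz:
        homogeneous = (x, y, z, 1.0)  # Assumed homogeneous form of the point.
        u, v, w = (
            sum(m * p for m, p in zip(row, homogeneous))
            for row in velo_to_image_plane
        )
        # Divide by the last coordinate to recover pixel locations.
        # Points with w == 0 cannot be projected and would raise here.
        pixels.append((u / w, v / w))
    return pixels


# Toy matrix (not real calibration data): scales x and y by 2, keeps z as depth.
toy_matrix = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(project_velo_to_image([(1.0, 2.0, 4.0)], toy_matrix))  # [(0.5, 1.0)]
```

In practice the same computation would likely be done with a matrix library over all P points at once; callers would typically also discard points whose last coordinate is not positive before dividing.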

lingvo.tasks.car.tools.kitti_exporter._ReadObjectDataset(root_dir, frame_names)

Reads and parses KITTI dataset files into a list of TFExample protos.

lingvo.tasks.car.tools.kitti_exporter._ExportObjectDatasetToTFRecord(root_dir, split_file, tfrecord_path, num_shards)

Exports KITTI dataset files to TFRecord files.
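
This page does not say how num_shards is applied. As a rough sketch only, one common way to spread frames over a fixed number of output files is round-robin assignment with a `-NNNNN-of-NNNNN` filename suffix. Both the naming pattern and the helper names `shard_paths` and `assign_round_robin` are assumptions made for illustration, not confirmed behavior of this function.

```python
def shard_paths(tfrecord_path, num_shards):
    """Returns one output path per shard (assumed naming pattern)."""
    return [
        "%s-%05d-of-%05d" % (tfrecord_path, i, num_shards)
        for i in range(num_shards)
    ]


def assign_round_robin(frame_names, num_shards):
    """Distributes frame names over num_shards lists, one frame at a time."""
    shards = [[] for _ in range(num_shards)]
    for i, name in enumerate(frame_names):
        shards[i % num_shards].append(name)
    return shards
```

Round-robin keeps shard sizes within one frame of each other, which is convenient when the shards are later read in parallel.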

lingvo.tasks.car.tools.kitti_exporter.main(unused_argv)