lingvo.tasks.milan.common_schema module

Defines a common tf.train.Example format for image-caption(-like) data.

lingvo.tasks.milan.common_schema._Feature(shape, dtype=tf.float32)

lingvo.tasks.milan.common_schema.ImageFeatures(images_per_example=1)

Returns definitions of common image features.

Parameters

images_per_example – Number of images stored in each example.

Returns

A dict of feature definitions usable with tf.io.parse_example.
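
As a rough illustration of how a dict of feature definitions is consumed by tf.io.parse_example, the sketch below builds a stand-in schema by hand rather than calling ImageFeatures. The key name 'image/encoded', the string dtype, and the helper make_example are assumptions made for this example; the keys actually returned by ImageFeatures are not listed on this page.

```python
import tensorflow as tf

# Hypothetical stand-in for the dict returned by ImageFeatures(); the key name,
# dtype, and shape are assumptions, not the module's actual schema.
images_per_example = 2
features = {
    'image/encoded': tf.io.FixedLenFeature([images_per_example], tf.string),
}


def make_example(encoded_images):
  """Packs a list of byte strings into a tf.train.Example (illustrative helper)."""
  feature = tf.train.Feature(
      bytes_list=tf.train.BytesList(value=encoded_images))
  return tf.train.Example(
      features=tf.train.Features(feature={'image/encoded': feature}))


# A batch containing one serialized example that holds two images.
serialized = [make_example([b'img0', b'img1']).SerializeToString()]

# parse_example returns a dict of tensors keyed like the feature definitions.
parsed = tf.io.parse_example(serialized, features)
image_shape = tuple(parsed['image/encoded'].shape)  # (batch, images_per_example)
```

With a schema obtained from ImageFeatures itself, the same tf.io.parse_example call would presumably apply unchanged, with the returned dict passed in place of the hand-built features.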

lingvo.tasks.milan.common_schema.TextFeatures(captions_per_example=1, bert_embeddings_shape=None)

Returns definitions of common text features.

Parameters

captions_per_example – Number of text captions stored in each example.
bert_embeddings_shape – Optional time-major shape of BERT embedding sequences to include in the schema (if given). Set the leading (time) dimension to None if the sequences have variable length.

Returns

A dict of feature definitions usable with tf.io.parse_example.
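
To make the None convention for bert_embeddings_shape concrete, here is a minimal pure-Python sketch of how a time-major shape with a variable leading dimension can be read. The helper shape_matches and the embedding width 768 are hypothetical and chosen only for illustration; they are not part of this module.

```python
from typing import Optional, Sequence


def shape_matches(actual: Sequence[int],
                  declared: Sequence[Optional[int]]) -> bool:
  """Returns True if `actual` fits `declared`; a None entry matches any size."""
  if len(actual) != len(declared):
    return False
  return all(d is None or a == d for a, d in zip(actual, declared))


# Hypothetical time-major shape: a variable number of tokens (leading None),
# each with an assumed 768-dimensional embedding.
bert_embeddings_shape = [None, 768]

short_caption_ok = shape_matches([5, 768], bert_embeddings_shape)
long_caption_ok = shape_matches([40, 768], bert_embeddings_shape)
wrong_width_ok = shape_matches([5, 512], bert_embeddings_shape)
```

Only the leading (time) dimension is left open here, so sequences of 5 and 40 tokens both fit, while a sequence with a different embedding width does not.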

lingvo.tasks.milan.common_schema.AudioFeatures(mfcc_shape=None, cpc8k_shape=None)

Returns definitions of common audio features.

Parameters

mfcc_shape – Optional time-major shape of MFCC features to include in the schema (if given). Set the leading (time) dimension to None if the sequences have variable length.
cpc8k_shape – Optional time-major shape of CPC-8K features to include in the schema (if given).

Returns

A dict of feature definitions usable with tf.io.parse_example.
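
Because each of the functions above returns a plain dict, one plausible way to describe examples that carry several modalities is to merge the dicts into a single schema before parsing. The sketch below assumes this usage; merge_schemas is a hypothetical helper, and the keys and string values are placeholders standing in for real feature definitions.

```python
def merge_schemas(*schemas):
  """Merges feature dicts, raising ValueError if two schemas share a key."""
  merged = {}
  for schema in schemas:
    overlap = merged.keys() & schema.keys()
    if overlap:
      raise ValueError(f'Duplicate feature keys: {sorted(overlap)}')
    merged.update(schema)
  return merged


# Placeholder dicts standing in for the results of ImageFeatures(),
# TextFeatures(), and AudioFeatures(); keys and values are assumptions.
image_schema = {'image/encoded': '<image feature definition>'}
text_schema = {'text/captions': '<caption feature definition>'}
audio_schema = {'audio/mfccs': '<mfcc feature definition>'}

combined = merge_schemas(image_schema, text_schema, audio_schema)
combined_keys = sorted(combined)
```

Rejecting duplicate keys is a deliberate choice in this sketch: silently overwriting one definition with another would leave the parser with a schema that no longer matches the stored data.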