lingvo.tools.beam_utils module

Tools for car beam pipelines.

lingvo.tools.beam_utils.BeamInit()[source]

Initialize the beam program.

Typically the first thing to run in main(). For example, this call is needed before FLAGS are accessed.

lingvo.tools.beam_utils.GetPipelineRoot(options=None)[source]

Return the root of the beam pipeline.

Typical usage looks like:

    with GetPipelineRoot() as root:
      _ = (root | beam.ParDo() | …)

In this example, the pipeline is automatically executed when the context is exited, though one can manually run the pipeline built from the root object as well.

Parameters

options – An optional beam.options.pipeline_options.PipelineOptions object. Defaults to None.

Returns

A beam.Pipeline root object.
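The run-on-exit behavior described above follows the ordinary context-manager protocol. The sketch below illustrates that pattern in pure Python. `ToyPipeline` is a hypothetical stand-in, not part of lingvo or Beam: it only records the chained steps and marks itself as executed when the `with` block exits cleanly.

```python
class ToyPipeline:
  """Hypothetical stand-in for a pipeline root; not the Beam API."""

  def __init__(self):
    self.steps = []        # Steps queued while the context is open.
    self.executed = False  # Set to True once run() has been called.

  def __or__(self, step):
    # Mimic the `root | transform` chaining syntax by queuing the step.
    self.steps.append(step)
    return self

  def run(self):
    # Manual execution path: usable without a `with` block.
    self.executed = True
    return list(self.steps)

  def __enter__(self):
    return self

  def __exit__(self, exc_type, exc_value, traceback):
    # Run only if the body finished without raising an exception.
    if exc_type is None:
      self.run()
    return False  # Do not suppress exceptions from the body.


with ToyPipeline() as root:
  _ = (root | 'read' | 'transform' | 'write')
  ran_inside = root.executed  # Nothing has run yet inside the body.
```

In this sketch the steps are only collected inside the `with` body; execution is deferred until the context exits, which mirrors the usage pattern shown for GetPipelineRoot.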

lingvo.tools.beam_utils.GetReader(record_format, file_pattern, value_coder, **kwargs)[source]

Returns a beam Reader based on record_format and file_pattern.

Parameters
  • record_format – String record format, e.g., ‘tfrecord’.

  • file_pattern – String path describing files to be read.

  • value_coder – Coder to use for the values of each record.

  • **kwargs – Keyword arguments to pass to the corresponding Reader object constructor.

Returns

A beam reader object.

Raises

ValueError – If an unsupported record_format is provided.
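The lookup-by-format behavior, including the ValueError for unknown formats, can be sketched as a small dispatch table. This is an illustration under stated assumptions, not the lingvo implementation: `toy_tfrecord_reader`, `_TOY_READERS`, and `get_reader` are hypothetical names, and the "reader" returned here is just a dictionary describing the request.

```python
def toy_tfrecord_reader(file_pattern, coder=None, **kwargs):
  # Hypothetical constructor: returns a description, not a real Beam reader.
  return {'pattern': file_pattern, 'coder': coder, 'options': kwargs}


# Hypothetical registry mapping record_format strings to constructors.
_TOY_READERS = {'tfrecord': toy_tfrecord_reader}


def get_reader(record_format, file_pattern, value_coder, **kwargs):
  """Looks up a constructor by record_format and forwards the arguments."""
  if record_format not in _TOY_READERS:
    raise ValueError('Unsupported record format: %s' % record_format)
  return _TOY_READERS[record_format](
      file_pattern, coder=value_coder, **kwargs)


reader = get_reader('tfrecord', '/tmp/data-*', 'bytes', validate=True)

# An unknown format is rejected up front instead of failing later.
try:
  get_reader('csv', '/tmp/data-*', 'bytes')
  error = None
except ValueError as e:
  error = str(e)
```

Extra keyword arguments pass through unchanged to the selected constructor, which is the role `**kwargs` plays in the signature above.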

lingvo.tools.beam_utils.GetWriter(record_format, file_pattern, value_coder, **kwargs)[source]

Returns a beam Writer based on record_format and file_pattern.

Parameters
  • record_format – String record format to write as, e.g., ‘tfrecord’.

  • file_pattern – String path describing files to be written to.

  • value_coder – Coder to use for the values of each written record.

  • **kwargs – Keyword arguments to pass to the corresponding Writer object constructor.

Returns

A beam writer object.

Raises

ValueError – If an unsupported record_format is provided.

lingvo.tools.beam_utils.GetEmitterFn(record_format)[source]

Returns an Emitter function for the given record_format.

An Emitter function takes in a key and value as arguments and returns a structure that is compatible with the Beam Writer associated with the corresponding record_format.

Parameters

record_format – String record format to write as, e.g., ‘tfrecord’.

Returns

An emitter function of (key, value) -> Writer’s input type.

Raises

ValueError – If an unsupported record_format is provided.
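To make the (key, value) -> Writer input contract concrete, here is a minimal sketch. It assumes, purely for illustration, that the writer for ‘tfrecord’ stores values only, so the emitter drops the key; `get_emitter_fn` and `value_emitter` are hypothetical names and may not match the library's internals.

```python
def get_emitter_fn(record_format):
  """Returns a (key, value) -> writer-input function for record_format."""

  def value_emitter(key, value):
    # Assumption: the writer stores values only, so the key is discarded.
    del key
    return [value]

  if record_format == 'tfrecord':
    return value_emitter
  raise ValueError('Unsupported record format: %s' % record_format)


emit = get_emitter_fn('tfrecord')

# Collect what a value-only writer would receive for two keyed records.
records = []
for key, value in [('a', b'one'), ('b', b'two')]:
  records.extend(emit(key, value))

# Unsupported formats are rejected when the emitter is requested.
try:
  get_emitter_fn('csv')
  error = None
except ValueError as e:
  error = str(e)
```

Returning a list lets the same function be used where zero or more outputs per input are expected, for example in a flat-map style step placed before the writer.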