lingvo.core.scorers module

Helper classes for computing scores.

lingvo.core.scorers._ToUnicode(line)

lingvo.core.scorers._Tokenize(string)

lingvo.core.scorers.NGrams(lst, order): Generator that yields all n-grams of the given order present in lst.
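As an illustration of what such a generator does, here is a minimal sketch. The name ngrams and the choice to yield tuples are assumptions made for this example, not Lingvo's implementation.

```python
def ngrams(tokens, order):
    """Yield every contiguous n-gram of length `order` found in `tokens`.

    Hypothetical stand-in for illustration; tuples are used so the
    results can be counted in a dict or collections.Counter.
    """
    # A sequence of length L contains L - order + 1 n-grams of this order;
    # range() is empty when order exceeds the sequence length.
    for start in range(len(tokens) - order + 1):
        yield tuple(tokens[start:start + order])


bigrams = list(ngrams(["the", "cat", "sat"], 2))
print(bigrams)  # [('the', 'cat'), ('cat', 'sat')]
```

When the requested order is longer than the input, the sketch yields nothing rather than raising an error.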

class lingvo.core.scorers.Unsegmenter(separator_type=None)

Bases: object

Un-segments (merges) segmented strings.

Used to recover the original surface form of strings that were encoded using byte-pair encoding (BPE), word-piece models (WPM), or sentence-piece models (SPM).

_BPE_SEPARATOR = '@@ '

_WPM_SEPARATOR = '▁'

_UnsegmentWpm(line)

_UnsegmentBpe(line)
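The two separators suggest how merging might work. The following is a sketch under assumptions, not the library's code: the function names are hypothetical, and the exact replacement rules are inferred only from the separator values listed above.

```python
BPE_SEPARATOR = "@@ "  # marks a BPE piece that continues into the next token
WPM_SEPARATOR = "▁"    # marks the start of a word in WPM/SPM output


def unsegment_bpe(line):
    """Join BPE pieces by deleting every '@@ ' continuation marker (assumed rule)."""
    return line.replace(BPE_SEPARATOR, "").strip()


def unsegment_wpm(line):
    """Join word pieces (assumed rule): drop the spaces between pieces,
    then turn each word-start marker back into a single space."""
    return line.replace(" ", "").replace(WPM_SEPARATOR, " ").strip()


print(unsegment_bpe("hel@@ lo wor@@ ld"))  # hello world
print(unsegment_wpm("▁hel lo ▁wor ld"))    # hello world
```

Both helpers assume the input was produced by the matching segmenter. A string that legitimately contains the separator characters would be altered as well.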

class lingvo.core.scorers.BleuScorer(max_ngram=4, separator_type=None)

Bases: object

Scorer to compute BLEU scores to measure translation quality.

The BLEU score is the geometric mean of the token n-gram precisions for orders 1 through max_ngram, computed across all sentences.

Successive calls to AddSentence() accumulate statistics which are converted to an overall score on calls to ComputeOverallScore().

Example usage:

>>> scorer = BleuScorer(max_ngram=4)
>>> scorer.AddSentence("hyp matches ref str", "hyp matches ref str")
>>> scorer.AddSentence("almost right", "almost write")
>>> print(scorer.ComputeOverallScore())
0.6687…

property unsegmenter

AddSentence(ref_str, hyp_str): Accumulates n-gram statistics for the given ref and hyp string pair.

ComputeOverallScore(): Computes overall BLEU score from the statistics accumulated so far.
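The accumulate-then-score pattern can be sketched as follows. Everything here is an assumption made for illustration: the class SimpleBleu and its method names are hypothetical, tokens are split on whitespace, matches are clipped to the reference counts, and no brevity penalty or smoothing is applied. The sketch follows only the description of BLEU as a geometric mean of n-gram precisions and is not expected to reproduce the values Lingvo reports.

```python
import collections
import math


class SimpleBleu:
    """Hypothetical corpus-level accumulator; not Lingvo's BleuScorer."""

    def __init__(self, max_ngram=4):
        self.max_ngram = max_ngram
        self.matches = [0] * max_ngram  # clipped n-gram matches, per order
        self.totals = [0] * max_ngram   # hypothesis n-grams, per order

    @staticmethod
    def _counts(tokens, order):
        # Count every contiguous n-gram of the given order.
        return collections.Counter(
            tuple(tokens[i:i + order])
            for i in range(len(tokens) - order + 1))

    def add_sentence(self, ref_str, hyp_str):
        ref, hyp = ref_str.split(), hyp_str.split()  # assumed tokenization
        for n in range(1, self.max_ngram + 1):
            ref_counts = self._counts(ref, n)
            hyp_counts = self._counts(hyp, n)
            # Counter & Counter keeps the minimum count of each n-gram,
            # so a hypothesis n-gram matches at most as often as it
            # occurs in the reference.
            self.matches[n - 1] += sum((ref_counts & hyp_counts).values())
            self.totals[n - 1] += sum(hyp_counts.values())

    def compute_overall_score(self):
        # Any order without a match makes the geometric mean zero.
        if any(m == 0 for m in self.matches):
            return 0.0
        log_sum = sum(
            math.log(m / t) for m, t in zip(self.matches, self.totals))
        return math.exp(log_sum / self.max_ngram)


scorer = SimpleBleu(max_ngram=2)
scorer.add_sentence("a b c", "a b c")
print(scorer.compute_overall_score())  # 1.0 for an exact match
```

Keeping per-order match and total counts, rather than per-sentence scores, is what makes the final number a corpus-level precision: sentences with more n-grams contribute proportionally more.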