lingvo.tasks.asr.tools.simple_wer_v2 module

The new version of the script for evaluating the word error rate (WER) for ASR tasks.

TensorFlow and Lingvo are not required to run this script.

Example usage:

python simple_wer_v2.py file_hypothesis file_reference
python simple_wer_v2.py file_hypothesis file_reference file_keyphrases

where file_hypothesis is the filename for hypothesis text, file_reference is the filename for reference text, and file_keyphrases is the optional filename for important phrases (one phrase per line).

Note that the program also generates an HTML file for diagnosing the errors. The HTML filename is {$file_hypothesis}_diagnosis.html.

Alternatively, this file can be used as a stand-alone library through the SimpleWER class, which provides the following member functions:

AddHypRef(hyp, ref): Updates the evaluation for each (hyp,ref) pair.
GetWER(): Computes word error rate (WER) for all the added hyp-ref pairs.
GetSummaries(): Generates strings to summarize word and key phrase errors.
GetKeyPhraseStats(): Measures stats for key phrases.
Stats include: (1) Jaccard similarity: https://en.wikipedia.org/wiki/Jaccard_index. (2) F1 score: https://en.wikipedia.org/wiki/Precision_and_recall.

lingvo.tasks.asr.tools.simple_wer_v2.TxtPreprocess(txt)[source]: Preprocess text before WER calculation.

lingvo.tasks.asr.tools.simple_wer_v2.RemoveCommentTxtPreprocess(txt)[source]: Preprocess text and remove comments enclosed in brackets, such as [comments].

lingvo.tasks.asr.tools.simple_wer_v2.HighlightAlignedHtml(hyp, ref, err_type)[source]

Generate an HTML element to highlight the differences between hyp and ref.

Parameters

hyp – Hypothesis string.
ref – Reference string.
err_type – one of ‘none’, ‘sub’, ‘del’, ‘ins’.

Returns

An HTML string where disagreements are highlighted. Note that hyp is highlighted in green and marked with <del> </del>, while ref is highlighted in yellow. If you want HTML with other styles, consider writing your own function.

Raises

ValueError – if err_type is not one of [‘none’, ‘sub’, ‘del’, ‘ins’], or if err_type == ‘none’ and hyp != ref.

lingvo.tasks.asr.tools.simple_wer_v2.ComputeEditDistanceMatrix(hyp_words, ref_words)[source]

Compute the edit distance between two lists of strings.

Parameters

hyp_words – the list of words in the hypothesis sentence
ref_words – the list of words in the reference sentence

Returns

Edit distance matrix (in the format of list of lists), where the first index is the reference and the second index is the hypothesis.
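The indexing convention can be illustrated with a minimal sketch of a word-level Levenshtein matrix. This is not the module's own implementation: the function name `edit_distance_matrix`, the unit cost for every error type, and the (len(ref)+1) x (len(hyp)+1) shape are assumptions made for illustration.

```python
def edit_distance_matrix(hyp_words, ref_words):
    """Return a word-level Levenshtein matrix as a list of lists.

    matrix[i][j] is the distance between the first i reference words and
    the first j hypothesis words (reference first, hypothesis second).
    Unit costs are an assumption of this sketch.
    """
    rows, cols = len(ref_words) + 1, len(hyp_words) + 1
    matrix = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        matrix[i][0] = i  # i reference words missing from an empty hypothesis
    for j in range(cols):
        matrix[0][j] = j  # j extra hypothesis words against an empty reference
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            matrix[i][j] = min(
                matrix[i - 1][j] + 1,         # deletion
                matrix[i][j - 1] + 1,         # insertion
                matrix[i - 1][j - 1] + cost,  # match or substitution
            )
    return matrix


hyp = "the cat sat".split()
ref = "the cat sat down".split()
m = edit_distance_matrix(hyp, ref)
print(m[-1][-1])  # 1: the hypothesis drops the word "down"
```

The bottom-right cell holds the total distance for the full sentences; walking back from that cell toward `m[0][0]` recovers which words were substituted, inserted, or deleted.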

lingvo.tasks.asr.tools.simple_wer_v2.RemoveTags(txt)[source]: Remove angle-bracket enclosed tags, such as <tag>.
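As a rough illustration of this kind of tag stripping, the sketch below removes every angle-bracket span with a regular expression. The helper name `remove_tags` and the exact pattern are assumptions; the module's own pattern and whitespace handling may differ.

```python
import re


def remove_tags(txt):
    """Drop every <...> span from txt (illustrative stand-in only)."""
    return re.sub(r"<[^<>]*>", "", txt)


cleaned = remove_tags("hello <noise> world")
print(repr(cleaned))
```

Note that this sketch leaves the surrounding whitespace untouched, so removing a tag between two words leaves two adjacent spaces.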

class lingvo.tasks.asr.tools.simple_wer_v2.HtmlHandler[source]

Bases: object

Template class for HtmlHandler children.

Each handler needs to implement the Render method, which incrementally writes the html for the current word. It has access to various relevant variables from kwargs, such as the current hyp and ref word, the current hyp and ref positions, and the error type.

A handler can optionally implement the Setup method, which is run once at the beginning.

Setup(hypothesis, reference)[source]: Set up the handler state prior to looping through the transcript.

Render(**kwargs)[source]

Render the html for each word as we loop through the transcript.

Parameters: **kwargs – dict of arguments needed for the handler to render correctly
Returns: Html string for word
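A custom handler might look like the following sketch. To keep it runnable without Lingvo, it defines a stand-in `HtmlHandler` base class; `BracketHandler` and its output format are hypothetical. The keyword names `hyp_word`, `ref_word`, and `err_type` are borrowed from the `HighlightAlignedHtmlHandler.Render` signature documented in this module.

```python
import html


class HtmlHandler:
    """Stand-in for the documented base class so this sketch runs alone."""

    def Setup(self, hypothesis, reference):
        pass

    def Render(self, **kwargs):
        raise NotImplementedError


class BracketHandler(HtmlHandler):
    """Hypothetical handler that wraps each error in a bold bracket."""

    def Setup(self, hypothesis, reference):
        self.num_rendered = 0  # reset per-transcript state

    def Render(self, hyp_word=None, ref_word=None, err_type=None, **kwargs):
        self.num_rendered += 1
        hyp = html.escape(hyp_word or "")
        ref = html.escape(ref_word or "")
        if err_type == "none":
            return hyp
        return "<b>[%s: %s -> %s]</b>" % (err_type, ref, hyp)


handler = BracketHandler()
handler.Setup("the cat", "the hat")
parts = [
    handler.Render(hyp_word="the", ref_word="the", err_type="none"),
    handler.Render(hyp_word="cat", ref_word="hat", err_type="sub"),
]
print(" ".join(parts))  # the <b>[sub: hat -> cat]</b>
```

Accepting `**kwargs` in `Render` lets the handler ignore any additional variables it does not need, such as word positions.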

class lingvo.tasks.asr.tools.simple_wer_v2.HighlightAlignedHtmlHandler(highlight_fn=<function HighlightAlignedHtml>)[source]

Bases: HtmlHandler

Handler for HighlightAlignedHtml.

Render(hyp_word=None, ref_word=None, err_type=None, **kwargs)[source]

Render the html for each word as we loop through the transcript.

Parameters: **kwargs – dict of arguments needed for the handler to render correctly
Returns: Html string for word

class lingvo.tasks.asr.tools.simple_wer_v2.SimpleWER(key_phrases=None, html_handler=<lingvo.tasks.asr.tools.simple_wer_v2.HighlightAlignedHtmlHandler object>, preprocess_handler=<function RemoveCommentTxtPreprocess>)[source]

Bases: object

Compute word error rates after the alignment.

key_phrases: list of important phrases.

aligned_htmls: list of diagnois htmls, each of which corresponding to a pair of hypothesis and reference.

hyp_keyphrase_counts: dict. hyp_keyphrase_counts[w] counts how often a key phrases w appear in the hypotheses.

ref_keyphrase_counts: dict. ref_keyphrase_counts[w] counts how often a key phrases w appear in the references.

matched_keyphrase_counts: dict. matched_keyphrase_counts[w] counts how often a key phrase w appear in the aligned transcripts when the reference and hyp_keyphrase match.

wer_info: dict with four keys: ‘sub’ (substitution error), ‘ins’ (insersion error), ‘del’ (deletion error), ‘nw’ (number of words). We can use wer_info to compute word error rate (WER) as (wer_info[‘sub’]+wer_info[‘ins’]+wer_info[‘del’])*100.0/wer_info[‘nw’]

AddHypRef(hypothesis, reference)[source]

Update WER when adding one pair of strings: (hypothesis, reference).

Parameters

hypothesis – Hypothesis string.
reference – Reference string.

Raises

ValueError – when the program fails to parse the edit distance matrix.

GetWER()[source]

Compute Word Error Rate (WER).

Note that WER can be larger than 100.0, especially when there are many insertion errors.

Returns: WER as a percentage, usually between 0.0 and 100.0.

GetBreakdownWER()[source]

Compute breakdown WER.

Returns: A dictionary with del/ins/sub as keys and the corresponding error rates, in percent, as values.
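A per-type breakdown could be derived from a wer_info-style dict as sketched below. This assumes each rate is the count for that error type divided by the same word count used for the overall WER; the helper name `breakdown_wer` and the counts are hypothetical.

```python
def breakdown_wer(wer_info):
    """Return per-type error rates in percent, keyed by 'del', 'ins', 'sub'."""
    nw = wer_info["nw"]
    return {key: wer_info[key] * 100.0 / nw for key in ("del", "ins", "sub")}


# Hypothetical counts, not taken from any real evaluation.
rates = breakdown_wer({"sub": 1, "ins": 2, "del": 3, "nw": 20})
print(rates)  # {'del': 15.0, 'ins': 10.0, 'sub': 5.0}
```

Under this assumption the three rates sum to the overall WER, which is 30.0 for these counts.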

GetKeyPhraseStats()[source]

Measure the Jaccard similarity of key phrases between hyps and refs.

Returns

jaccard_similarity – Jaccard similarity, between 0.0 and 1.0.
F1_keyphrase – F1 score (=2/(1/prec + 1/recall)), between 0.0 and 1.0.
matched_keyphrases – number of matched key phrases.
ref_keyphrases – number of key phrases in the reference strings.
hyp_keyphrases – number of key phrases in the hypothesis strings.
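The two similarity scores can be derived from the three counts as in the sketch below. It assumes precision is matched/hyp, recall is matched/ref, and the Jaccard union is ref + hyp - matched; the helper name `key_phrase_scores`, the zero-count fallbacks, and the example counts are all hypothetical and may not match the module's behavior.

```python
def key_phrase_scores(matched, ref_total, hyp_total):
    """Return (jaccard, f1) from matched, reference, and hypothesis counts."""
    union = ref_total + hyp_total - matched
    jaccard = matched / union if union else 0.0
    precision = matched / hyp_total if hyp_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    if precision > 0 and recall > 0:
        # Harmonic mean, written as in the documented formula.
        f1 = 2.0 / (1.0 / precision + 1.0 / recall)
    else:
        f1 = 0.0
    return jaccard, f1


# Hypothetical counts: 3 matched, 4 in the references, 5 in the hypotheses.
jaccard, f1 = key_phrase_scores(matched=3, ref_total=4, hyp_total=5)
print(jaccard)       # 0.5
print(round(f1, 3))  # 0.667
```

With these counts, precision is 0.6 and recall is 0.75, so F1 is their harmonic mean of 2/3.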

GetSummaries()[source]

Generate strings to summarize word errors and key phrase errors.

Returns

str_sum – string summarizing total errors, total words, and WER.
str_details – string breaking down the three error types: del, ins, sub.
str_keyphrases_info – string summarizing key phrase information.

lingvo.tasks.asr.tools.simple_wer_v2.main(argv)[source]