lingvo.tasks.asr.tools.simple_wer_v2 module
A newer version of the script for evaluating the word error rate (WER) of ASR tasks.
TensorFlow and Lingvo are not required to run this script.
Example usage:
python simple_wer_v2.py file_hypothesis file_reference
python simple_wer_v2.py file_hypothesis file_reference file_keyphrases
where file_hypothesis is the filename for the hypothesis text, file_reference is the filename for the reference text, and file_keyphrases is the optional filename for important phrases (one phrase per line).
Note that the program also generates an HTML file for diagnosing the errors; its filename is {$file_hypothesis}_diagnois.html.
Alternatively, use this file as a stand-alone library through the SimpleWER class, which provides the following member functions:
- AddHypRef(hyp, ref): Updates the evaluation for each (hyp, ref) pair.
- GetWER(): Computes the word error rate (WER) over all added hyp-ref pairs.
- GetSummaries(): Generates strings that summarize word and key phrase errors.
- GetKeyPhraseStats(): Measures stats for key phrases. Stats include: (1) Jaccard similarity: https://en.wikipedia.org/wiki/Jaccard_index. (2) F1 score: https://en.wikipedia.org/wiki/Precision_and_recall.
- lingvo.tasks.asr.tools.simple_wer_v2.TxtPreprocess(txt)[source]
Preprocess text before WER calculation.
- lingvo.tasks.asr.tools.simple_wer_v2.RemoveCommentTxtPreprocess(txt)[source]
Preprocess text and remove comments in brackets, such as [comments].
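As an illustration of the bracket-comment removal described above, here is a minimal sketch. The helper name remove_bracket_comments is hypothetical and this is not the library's implementation; the real preprocessing function may apply further normalization that is not shown here.

```python
import re


def remove_bracket_comments(txt):
    """Hypothetical helper: drop [bracketed] comments from a transcript."""
    # Replace every non-nested [...] span with a space.
    without_comments = re.sub(r"\[[^\]]*\]", " ", txt)
    # Collapse the whitespace left behind by the removed spans.
    return " ".join(without_comments.split())


cleaned = remove_bracket_comments("hello [noise] world [laughter]")
print(cleaned)  # prints "hello world"
```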
- lingvo.tasks.asr.tools.simple_wer_v2.HighlightAlignedHtml(hyp, ref, err_type)[source]
Generate an HTML element that highlights the differences between hyp and ref.
- Parameters
hyp – Hypothesis string.
ref – Reference string.
err_type – one of ‘none’, ‘sub’, ‘del’, ‘ins’.
- Returns
An HTML string in which disagreements are highlighted.
Note
hyp is highlighted in green and marked with <del> </del>; ref is highlighted in yellow. If you want HTML in another style, consider writing your own function.
- Raises
ValueError – if err_type is not one of [‘none’, ‘sub’, ‘del’, ‘ins’], or if err_type == ‘none’ and hyp != ref.
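For readers who want to write their own highlighting function, the following sketch shows one possible shape. The name highlight_pair, the exact markup, and the choice of which words appear for each error type are assumptions made for illustration; only the general contract (green with <del> for hyp, yellow for ref, ValueError on bad input) follows the description above.

```python
import html


def highlight_pair(hyp, ref, err_type):
    """Hypothetical stand-in for a custom highlight function."""
    # Assumed markup: hyp in green and struck through, ref in yellow.
    hyp_html = ('<span style="background-color: lightgreen"><del>%s</del></span>'
                % html.escape(hyp))
    ref_html = ('<span style="background-color: yellow">%s</span>'
                % html.escape(ref))
    if err_type == "none":
        if hyp != ref:
            raise ValueError("err_type is 'none' but hyp != ref")
        return html.escape(hyp)
    if err_type == "sub":
        return hyp_html + " " + ref_html  # show both the wrong and the right word
    if err_type == "ins":
        return hyp_html  # extra hypothesis word only
    if err_type == "del":
        return ref_html  # missing reference word only
    raise ValueError("unknown err_type: %r" % (err_type,))


print(highlight_pair("cat", "cat", "none"))  # prints "cat"
```

A function like this could be passed wherever a highlight function is accepted, provided it takes the same (hyp, ref, err_type) arguments.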
- lingvo.tasks.asr.tools.simple_wer_v2.ComputeEditDistanceMatrix(hyp_words, ref_words)[source]
Compute the edit distance between two lists of strings.
- Parameters
hyp_words – the list of words in the hypothesis sentence
ref_words – the list of words in the reference sentence
- Returns
Edit distance matrix (in the format of list of lists), where the first index is the reference and the second index is the hypothesis.
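The matrix layout described above can be sketched with a standard Levenshtein dynamic program. This is an illustrative reimplementation under a hypothetical name (edit_distance_matrix), assuming unit cost for substitutions, insertions, and deletions; it is not taken from the library's source.

```python
def edit_distance_matrix(hyp_words, ref_words):
    """Levenshtein matrix: first index is the reference, second the hypothesis."""
    rows, cols = len(ref_words) + 1, len(hyp_words) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # i reference words against an empty hypothesis: i deletions
    for j in range(cols):
        dist[0][j] = j  # j hypothesis words against an empty reference: j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion (reference word missing)
                dist[i][j - 1] + 1,         # insertion (extra hypothesis word)
            )
    return dist


matrix = edit_distance_matrix("the cat sat".split(), "the cat sat down".split())
print(matrix[-1][-1])  # prints 1 (one deleted word)
```

The bottom-right cell holds the total edit distance; backtracking from it through the matrix recovers the alignment and the individual error types.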
- lingvo.tasks.asr.tools.simple_wer_v2.RemoveTags(txt)[source]
Remove angle-bracket enclosed tags, such as <tag>.
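A minimal sketch of this kind of tag stripping, using a regular expression. The helper name remove_angle_tags and the exact pattern are assumptions for illustration and may differ from what the library does.

```python
import re


def remove_angle_tags(txt):
    """Hypothetical helper: delete <...> tags and keep the surrounding text."""
    # Match a '<', any run of characters that are not angle brackets, then '>'.
    return re.sub(r"<[^<>]*>", "", txt)


print(remove_angle_tags("hello <unk>world"))  # prints "hello world"
```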
- class lingvo.tasks.asr.tools.simple_wer_v2.HtmlHandler[source]
Bases:
object
Template class for HtmlHandler children.
Each handler needs to implement the Render method, which incrementally writes the HTML for the current word. It has access to various relevant variables from kwargs, such as the current hyp and ref words, the current hyp and ref positions, and the error type.
A handler can optionally implement the Setup method, which is run once at the beginning.
- class lingvo.tasks.asr.tools.simple_wer_v2.HighlightAlignedHtmlHandler(highlight_fn=<function HighlightAlignedHtml>)[source]
Bases:
HtmlHandler
Handler for HighlightAlignedHtml.
- class lingvo.tasks.asr.tools.simple_wer_v2.SimpleWER(key_phrases=None, html_handler=<lingvo.tasks.asr.tools.simple_wer_v2.HighlightAlignedHtmlHandler object>, preprocess_handler=<function RemoveCommentTxtPreprocess>)[source]
Bases:
object
Compute word error rates after the alignment.
- key_phrases
list of important phrases.
- aligned_htmls
list of diagnosis HTML strings, each of which corresponds to one pair of hypothesis and reference.
- hyp_keyphrase_counts
dict. hyp_keyphrase_counts[w] counts how often a key phrase w appears in the hypotheses.
- ref_keyphrase_counts
dict. ref_keyphrase_counts[w] counts how often a key phrase w appears in the references.
- matched_keyphrase_counts
dict. matched_keyphrase_counts[w] counts how often a key phrase w appears in the aligned transcripts where the reference and hypothesis key phrases match.
- wer_info
dict with four keys: ‘sub’ (substitution errors), ‘ins’ (insertion errors), ‘del’ (deletion errors), ‘nw’ (number of words). The word error rate (WER) can be computed from wer_info as (wer_info[‘sub’] + wer_info[‘ins’] + wer_info[‘del’]) * 100.0 / wer_info[‘nw’].
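The formula above can be written out as a short sketch. The function names wer_from_info and breakdown_from_info are hypothetical, and guarding against a zero word count with max(nw, 1) is an assumption made here to avoid division by zero; the per-type breakdown simply applies the same formula to each error count on its own.

```python
def wer_from_info(wer_info):
    """Overall WER in percent from a dict with 'sub', 'ins', 'del', 'nw'."""
    errors = wer_info["sub"] + wer_info["ins"] + wer_info["del"]
    # Assumption: treat an empty word count as 1 to avoid dividing by zero.
    return errors * 100.0 / max(wer_info["nw"], 1)


def breakdown_from_info(wer_info):
    """Per-error-type rates in percent, keyed by 'del', 'ins', 'sub'."""
    num_words = max(wer_info["nw"], 1)
    return {key: wer_info[key] * 100.0 / num_words for key in ("del", "ins", "sub")}


info = {"sub": 1, "ins": 1, "del": 0, "nw": 4}
print(wer_from_info(info))        # prints 50.0
print(breakdown_from_info(info))  # prints {'del': 0.0, 'ins': 25.0, 'sub': 25.0}

# Many insertions can push WER above 100: 5 errors against 2 words.
print(wer_from_info({"sub": 0, "ins": 5, "del": 0, "nw": 2}))  # prints 250.0
```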
- AddHypRef(hypothesis, reference)[source]
Update WER when adding one pair of strings: (hypothesis, reference).
- Parameters
hypothesis – Hypothesis string.
reference – Reference string.
- Raises
ValueError – when the program fails to parse the edit distance matrix.
- GetWER()[source]
Compute Word Error Rate (WER).
Note that WER can be larger than 100.0, especially when there are many insertion errors.
- Returns
WER as a percentage, usually between 0.0 and 100.0.
- GetBreakdownWER()[source]
Compute breakdown WER.
- Returns
A dictionary with del/ins/sub as keys and the corresponding error rates, as percentages, as values.
- GetKeyPhraseStats()[source]
Measure the Jaccard similarity of key phrases between hyps and refs.
- Returns
jaccard_similarity: Jaccard similarity, between 0.0 and 1.0.
F1_keyphrase: F1 score (=2/(1/prec + 1/recall)), between 0.0 and 1.0.
matched_keyphrases: number of matched key phrases.
ref_keyphrases: number of key phrases in the reference strings.
hyp_keyphrases: number of key phrases in the hypothesis strings.
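To make the two key phrase statistics concrete, here is a sketch that derives them from the three counts. The function name key_phrase_stats is hypothetical, and the edge-case conventions (returning 1.0 when there are no key phrases at all, and an F1 of 0.0 when nothing matches) are assumptions rather than documented behavior.

```python
def key_phrase_stats(matched, ref_count, hyp_count):
    """Return (jaccard, f1) from matched, reference, and hypothesis counts."""
    union = ref_count + hyp_count - matched  # size of the union of both sets
    if union == 0:
        # Assumption: with no key phrases on either side, call it a perfect match.
        return 1.0, 1.0
    jaccard = matched / union
    if matched == 0:
        return jaccard, 0.0  # assumption: F1 is 0 when nothing matches
    precision = matched / hyp_count  # matched share of hypothesis key phrases
    recall = matched / ref_count     # matched share of reference key phrases
    f1 = 2.0 / (1.0 / precision + 1.0 / recall)
    return jaccard, f1


jaccard, f1 = key_phrase_stats(matched=3, ref_count=4, hyp_count=5)
print(jaccard)       # prints 0.5
print(round(f1, 4))  # prints 0.6667
```

With 3 matches out of 4 reference and 5 hypothesis key phrases, the union has 6 entries, precision is 0.6, and recall is 0.75, which gives the values shown.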
- GetSummaries()[source]
Generate strings to summarize word errors and key phrase errors.
- Returns
str_sum: string summarizing total errors, total words, and WER.
str_details: string breaking down the three error types: del, ins, sub.
str_str_keyphrases_info: string summarizing key phrase information.