Normalize wer computation
Closes #23 (closed)
Edited by Solene Tarride
Activity
assigned to @starride
I updated `format_string_for_wer` to remove the punctuation if `remove_punct=True`:

```python
>>> format_string_for_wer("Hello! This is a string, and it contains punctuation.", layout_tokens=None, remove_punct=False)
['Hello!', 'This', 'is', 'a', 'string,', 'and', 'it', 'contains', 'punctuation.']
>>> format_string_for_wer("Hello! This is a string, and it contains punctuation.", layout_tokens=None, remove_punct=True)
['Hello', 'This', 'is', 'a', 'string', 'and', 'it', 'contains', 'punctuation']
```
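For reference, a minimal sketch of what this behaviour amounts to, assuming the function simply strips Python's `string.punctuation` characters and splits on whitespace; the actual implementation (and its `layout_tokens` handling) may differ:

```python
import re
import string


def format_string_for_wer(text, layout_tokens=None, remove_punct=False):
    """Hypothetical sketch: tokenize a transcription into words for WER.

    Assumes punctuation is dropped with str.translate when remove_punct=True;
    layout_tokens handling is omitted in this sketch.
    """
    if remove_punct:
        # Drop ASCII punctuation characters before splitting into words
        text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace and split into a word list
    return re.sub(r"\s+", " ", text).strip().split(" ")
```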
Here is a training log:
```
EPOCH 84/50000: 100%|██████████████████████████| 7/7 [00:00<00:00, 10.38it/s, values={'loss_ce': 1.2848, 'cer': 0.3093, 'wer': 0.7069, 'wer_no_punct': 0.6897, 'syn_max_lines': 1.0, 'syn_prob_lines': 0.9}]
EPOCH 85/50000: 100%|██████████████████████████| 7/7 [00:00<00:00, 9.83it/s, values={'loss_ce': 1.3133, 'cer': 0.378, 'wer': 0.7416, 'wer_no_punct': 0.7273, 'syn_max_lines': 1.0, 'syn_prob_lines': 0.9}]
Evaluation E85: 100%|██████████████████████████| 7/7 [00:00<00:00, 8.26it/s, values={'cer': 1.0, 'wer': 1.0, 'wer_no_punct': 1.0}]
```
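For context, the `wer` / `wer_no_punct` values above correspond to a word-level edit distance over the lists produced by `format_string_for_wer`. Here is a hedged sketch with hypothetical helper names (plain Levenshtein, not necessarily how the training loop aggregates the metric over a batch):

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (unit cost for sub/ins/del)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_w in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_w in enumerate(hyp_words, start=1):
            cost = 0 if ref_w == hyp_w else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def compute_wer(reference, hypothesis, remove_punct=False):
    # Normalize both strings the same way, then divide edits by the reference length
    ref = format_string_for_wer(reference, layout_tokens=None, remove_punct=remove_punct)
    hyp = format_string_for_wer(hypothesis, layout_tokens=None, remove_punct=remove_punct)
    return word_edit_distance(ref, hyp) / max(len(ref), 1)
```

Under these assumptions, `compute_wer(gt, pred, remove_punct=True)` would correspond to `wer_no_punct`.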
Note: I also fixed an indentation error in `get_syn_proba_lines`.
changed milestone to %ML Prod - December 2022 n°1
added P2 label