Worst predictions display fails due to an empty string
Evaluation fails on Socface images that contain empty lines. The metrics and the Nerval report are computed, but displaying the worst predictions then crashes because of an empty string:
Evaluation: 1%| | 1/126 [00:06<12:56, 6.21s/it, values={'cer': nan, 'cer_no_token': nan, 'wer': 0.0,
2024-02-08 11:06:23,748 INFO/dan.ocr.evaluate: Evaluating on set `test`
Evaluation: 1%| | 1/109 [00:22<40:16, 22.37s/it, values={'cer': 0.0599, 'cer_no_token': 0.0651, 'wer':
#### DAN evaluation
| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) | NER |
|:-----:|:-------------:|:---------:|:-------------:|:---------:|:------------------:|:----:|
| val | nan | nan | 0.0 | 0.0 | 0.0 | nan |
| test | 5.99 | 6.51 | 17.24 | 16.14 | 16.14 | 1.84 |
#### Nerval evaluation
##### test
| tag | predicted | matched | Precision | Recall | F1 | Support |
|:-----------------:|:---------:|:-------:|:---------:|:------:|:-----:|:-------:|
| surname_household | 10 | 9 | 90.0 | 81.82 | 85.71 | 11 |
| surname | 20 | 18 | 90.0 | 94.74 | 92.31 | 19 |
| occupation | 30 | 30 | 100.0 | 100.0 | 100.0 | 30 |
| nationality | 30 | 30 | 100.0 | 100.0 | 100.0 | 30 |
| lob | 30 | 23 | 76.67 | 79.31 | 77.97 | 29 |
| link | 30 | 30 | 100.0 | 100.0 | 100.0 | 30 |
| firstname | 30 | 28 | 93.33 | 93.33 | 93.33 | 30 |
| employer | 8 | 7 | 87.5 | 87.5 | 87.5 | 8 |
| birth_date | 30 | 30 | 100.0 | 100.0 | 100.0 | 30 |
| All | 218 | 205 | 94.04 | 94.47 | 94.25 | 217 |
Traceback (most recent call last):
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/env/bin/teklia-dan", line 8, in <module>
sys.exit(main())
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/cli.py", line 26, in main
status = args.pop("func")(**args)
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 221, in run
eval(0, config, nerval_threshold, mlflow_logging)
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 199, in eval
print_worst_predictions(all_inferences)
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 90, in print_worst_predictions
alignment = getNiceAlignment(
File "edlib.pyx", line 195, in edlib.getNiceAlignment
Exception: The object alignResult contains an empty CIGAR string. Users must run align() with task='path'. Please check the input alignResult.
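
A possible workaround, sketched below, is to skip pairs that cannot be aligned before calling `getNiceAlignment`. This assumes the error is raised whenever the ground truth or the prediction is empty, which matches the `nan` CER on the `val` split but has not been verified against edlib itself. The `Inference` record and both helper functions are hypothetical names for illustration; the actual structure of `all_inferences` in `dan/ocr/evaluate.py` may differ.

```python
from typing import NamedTuple


class Inference(NamedTuple):
    # Hypothetical record, not the real structure used in dan/ocr/evaluate.py.
    image: str
    ground_truth: str
    prediction: str


def split_alignable(inferences):
    """Separate pairs that can be aligned from pairs with an empty side."""
    alignable, skipped = [], []
    for item in inferences:
        # Assumption: an empty string on either side leads to an empty CIGAR.
        if item.ground_truth and item.prediction:
            alignable.append(item)
        else:
            skipped.append(item)
    return alignable, skipped


def describe_skipped(item):
    """Plain-text fallback for a pair that is not passed to the aligner."""
    return (
        f"{item.image}: alignment skipped "
        f"(ground truth: {item.ground_truth!r}, prediction: {item.prediction!r})"
    )


inferences = [
    Inference("a.png", "Dupont Jean", "Dupond Jean"),
    Inference("b.png", "", ""),
    Inference("c.png", "Martin", ""),
]
alignable, skipped = split_alignable(inferences)
for item in skipped:
    print(describe_skipped(item))
```

Only the `alignable` pairs would then be passed to the alignment step, so the worst predictions table can still be printed for the rest of the set.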