Worst predictions display fails due to an empty string

Evaluation fails on Socface images because of empty lines: the worst predictions cannot be displayed, since an empty string makes the alignment step crash:

Evaluation:   1%| | 1/126 [00:06<12:56,  6.21s/it, values={'cer': nan, 'cer_no_token': nan, 'wer': 0.0,
2024-02-08 11:06:23,748 INFO/dan.ocr.evaluate: Evaluating on set `test`
Evaluation:   1%| | 1/109 [00:22<40:16, 22.37s/it, values={'cer': 0.0599, 'cer_no_token': 0.0651, 'wer': 

#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) | NER  |
|:-----:|:-------------:|:---------:|:-------------:|:---------:|:------------------:|:----:|
|  val  |      nan      |    nan    |      0.0      |    0.0    |        0.0         | nan  |
|  test |      5.99     |    6.51   |     17.24     |   16.14   |       16.14        | 1.84 |

#### Nerval evaluation

##### test

|        tag        | predicted | matched | Precision | Recall |   F1  | Support |
|:-----------------:|:---------:|:-------:|:---------:|:------:|:-----:|:-------:|
| surname_household |     10    |    9    |    90.0   | 81.82  | 85.71 |    11   |
|      surname      |     20    |    18   |    90.0   | 94.74  | 92.31 |    19   |
|     occupation    |     30    |    30   |   100.0   | 100.0  | 100.0 |    30   |
|    nationality    |     30    |    30   |   100.0   | 100.0  | 100.0 |    30   |
|        lob        |     30    |    23   |   76.67   | 79.31  | 77.97 |    29   |
|        link       |     30    |    30   |   100.0   | 100.0  | 100.0 |    30   |
|     firstname     |     30    |    28   |   93.33   | 93.33  | 93.33 |    30   |
|      employer     |     8     |    7    |    87.5   |  87.5  |  87.5 |    8    |
|     birth_date    |     30    |    30   |   100.0   | 100.0  | 100.0 |    30   |
|        All        |    218    |   205   |   94.04   | 94.47  | 94.25 |   217   |

Traceback (most recent call last):
  File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/env/bin/teklia-dan", line 8, in <module>
    sys.exit(main())
  File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/cli.py", line 26, in main
    status = args.pop("func")(**args)
  File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 221, in run
    eval(0, config, nerval_threshold, mlflow_logging)
  File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 199, in eval
    print_worst_predictions(all_inferences)
  File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 90, in print_worst_predictions
    alignment = getNiceAlignment(
  File "edlib.pyx", line 195, in edlib.getNiceAlignment
Exception: The object alignResult contains an empty CIGAR string. Users must run align() with task='path'. Please check the input alignResult.
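
A possible guard, sketched under the assumption that the empty CIGAR string comes from an empty ground truth or prediction being passed to edlib. The helper name `nice_alignment_or_none` is hypothetical and not part of DAN; only `edlib.align` and `edlib.getNiceAlignment` are real calls. The caller would skip the alignment display whenever `None` is returned.

```python
from typing import Optional


def nice_alignment_or_none(ground_truth: str, prediction: str) -> Optional[dict]:
    """Align two strings with edlib, or return None if either one is empty.

    Hypothetical helper for illustration, not part of DAN.
    """
    # Assumption: an empty input is what leaves the CIGAR string empty,
    # so there is nothing to align and nothing to display.
    if not ground_truth or not prediction:
        return None

    # Imported lazily so the empty-string path does not require edlib.
    import edlib

    # task="path" asks edlib to compute the CIGAR string that
    # getNiceAlignment needs.
    result = edlib.align(ground_truth, prediction, task="path")
    return edlib.getNiceAlignment(result, ground_truth, prediction)
```

With this check, a line with an empty transcription or an empty prediction would simply be left out of the alignment display instead of aborting the evaluation. Whether such lines should still be listed among the worst predictions without an alignment is a separate choice.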