Update eval data for tests (!340) · Merge requests · Automatic Text Recognition / DAN

Manon Blanco requested to merge update-eval-data-test into main Dec 21, 2023

If we follow the current documentation, the `teklia-dan evaluate --config configs/eval.json` command gives the following results:

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) |
|:-----:|:-------------:|:---------:|:-------------:|:---------:|:------------------:|
| train |     130.23    |   130.23  |     100.0     |   100.0   |       100.0        |
|  val  |     126.83    |   126.83  |     100.0     |   100.0   |       100.0        |
|  test |     112.24    |   112.24  |     100.0     |   100.0   |       100.0        |
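
For context on why these numbers look wrong: a CER above 100 is possible because the edit distance is normalised by the reference length, so a prediction that is much longer than (and unrelated to) the ground truth needs more edits than the reference has characters. The snippet below is a minimal, generic sketch of that computation. It is not DAN's metric code, and the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, normalised by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


print(cer("abc", "abc"))     # 0.0
print(cer("abc", "xyzxyz"))  # 200.0: 3 substitutions + 3 insertions over 3 reference characters
```

Scores like the ones in the table therefore suggest that the predictions have little in common with the labels, which is consistent with the model and the evaluation data not matching.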

So I updated the data to have consistent results:

- I used a `tokens.yml` file,
- I used a batch size of 1,
- I used the prediction model/data:
  - I replaced the previous `tests/data/evaluate/checkpoints/best_0.pt` file with the `tests/data/prediction/model.pt` file,
  - I wrote a `labels.json` file.

Edited Dec 21, 2023 by Manon Blanco
