Evaluation
Description
Use the `teklia-dan evaluate` command to evaluate a trained DAN model.
To evaluate DAN on your dataset:
- Create a JSON configuration file. You can base it on the training configuration. Refer to the dedicated page for a description of the parameters.
- Run `teklia-dan evaluate --config path/to/your/config.json`.
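If you prefer to launch the evaluation from a script, the command above can be wrapped as in the minimal sketch below. It only assembles the flags documented on this page (`--config`, `--nerval-threshold`, `--output-json`); the helper names `build_evaluate_command` and `run_evaluation` are hypothetical and not part of DAN. `--sets` is left out because this page does not specify how several values are passed.

```python
import subprocess
from typing import List, Optional


def build_evaluate_command(
    config: str,
    nerval_threshold: Optional[float] = None,
    output_json: Optional[str] = None,
) -> List[str]:
    """Assemble the `teklia-dan evaluate` argument list (hypothetical helper)."""
    command = ["teklia-dan", "evaluate", "--config", str(config)]
    # Optional flags are only added when a value is given,
    # so the CLI defaults apply otherwise.
    if nerval_threshold is not None:
        command += ["--nerval-threshold", str(nerval_threshold)]
    if output_json is not None:
        command += ["--output-json", str(output_json)]
    return command


def run_evaluation(config: str, **options) -> subprocess.CompletedProcess:
    """Run the evaluation and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_evaluate_command(config, **options), check=True)
```

Calling `run_evaluation("path/to/your/config.json")` requires `teklia-dan` to be installed and available on the `PATH`.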
This will, for each evaluated split:
- Create a YAML file with the evaluation results in the `results` subfolder of the `training.output_folder` indicated in your configuration.
- Print in the console a metrics Markdown table (see HTR example below).
- Print in the console a Nerval metrics Markdown table, if the `dataset.tokens` parameter in your configuration is defined (see HTR and NER example below).
- Print in the console the 5 worst predictions (see examples below).
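To locate the YAML result files after a run, something like the following sketch may help. It assumes that the dotted name `training.output_folder` maps to an `output_folder` key nested under a top-level `training` key in the JSON configuration, and that the result files use a `.yaml` or `.yml` extension. The file names themselves are not specified on this page, and `find_result_files` is a hypothetical helper, not part of DAN.

```python
import json
from pathlib import Path
from typing import List


def find_result_files(config_path: str) -> List[Path]:
    """List YAML files in the `results` subfolder of `training.output_folder`."""
    config = json.loads(Path(config_path).read_text())
    # Assumption: dotted parameter names correspond to nested JSON keys.
    results_dir = Path(config["training"]["output_folder"]) / "results"
    return sorted(
        path for path in results_dir.glob("*") if path.suffix in {".yaml", ".yml"}
    )
```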
!!! warning
The display of the worst predictions does not support batch evaluation. If the `training.data.batch_size` parameter is not equal to `1`, then the `WER` displayed is the `WER` of the **whole batch** and not just the image.
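One way to avoid the batch-level `WER` is to derive the evaluation configuration from the training one and force the batch size to `1`. The sketch below does that under the assumption that `training.data.batch_size` is stored as nested keys in the JSON file; `with_batch_size_one` and `write_evaluation_config` are hypothetical helpers, not part of DAN.

```python
import copy
import json
from pathlib import Path


def with_batch_size_one(config: dict) -> dict:
    """Return a copy of the configuration with `training.data.batch_size` set to 1."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    # Assumption: dotted parameter names correspond to nested JSON keys.
    updated.setdefault("training", {}).setdefault("data", {})["batch_size"] = 1
    return updated


def write_evaluation_config(training_config_path: str, evaluation_config_path: str) -> None:
    """Write an evaluation configuration derived from the training configuration."""
    config = json.loads(Path(training_config_path).read_text())
    Path(evaluation_config_path).write_text(
        json.dumps(with_batch_size_one(config), indent=4)
    )
```

The resulting file can then be passed to `teklia-dan evaluate --config`.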
| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| `--config` | Path to the configuration file. | `pathlib.Path` | |
| `--nerval-threshold` | Distance threshold for the match between gold and predicted entity during Nerval evaluation. `0` would impose perfect matches, `1` would allow completely different strings to be considered as a match. | `float` | `0.3` |
| `--output-json` | Where to save evaluation results in JSON format. | `pathlib.Path` | `None` |
| `--sets` | Which sets should be evaluated. | `str` | `["train", "val", "test"]` |
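To give an intuition for `--nerval-threshold`, the sketch below compares a gold and a predicted entity using a normalized character edit distance. This is not Nerval's implementation: dividing by the length of the longer string and using an inclusive comparison are assumptions, chosen only because they agree with the description above (`0` accepts exact matches only, `1` accepts any pair of strings). The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or exact match
                )
            )
        previous = current
    return previous[-1]


def is_match(gold: str, predicted: str, threshold: float = 0.3) -> bool:
    """Accept the pair when the normalized distance does not exceed the threshold."""
    longest = max(len(gold), len(predicted))
    if longest == 0:
        return True  # two empty strings are identical
    # Assumption: normalize by the longer string so the ratio stays in [0, 1].
    return edit_distance(gold, predicted) / longest <= threshold


print(is_match("Dupont", "Dupond"))                 # True: 1 edit over 6 characters
print(is_match("Dupont", "Dupond", threshold=0.0))  # False: only exact matches pass
```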
Examples
HTR evaluation
#### DAN evaluation
| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: |
| train | x | x | x | x | x |
| val | x | x | x | x | x |
| test | x | x | x | x | x |
#### 5 worst prediction(s)
| Image name | WER | Alignment between ground truth - prediction |
| :------------: | :-: | :-----------------------------------------: |
| <image_id>.png | x | x |
| | | | |
| | | x |
HTR and NER evaluation
#### DAN evaluation
| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) | NER |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: | :-: |
| train | x | x | x | x | x | x |
| val | x | x | x | x | x | x |
| test | x | x | x | x | x | x |
#### Nerval evaluation
##### train
| tag | predicted | matched | Precision | Recall | F1 | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname | x | x | x | x | x | x |
| All | x | x | x | x | x | x |
##### val
| tag | predicted | matched | Precision | Recall | F1 | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname | x | x | x | x | x | x |
| All | x | x | x | x | x | x |
##### test
| tag | predicted | matched | Precision | Recall | F1 | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname | x | x | x | x | x | x |
| All | x | x | x | x | x | x |
#### 5 worst prediction(s)
| Image name | WER | Alignment between ground truth - prediction |
| :------------: | :-: | :-----------------------------------------: |
| <image_id>.png | x | x |
| | | | |
| | | x |