# Evaluation

## Description

Use the `teklia-dan evaluate` command to evaluate a trained DAN model.

To evaluate DAN on your dataset:

1. Create a JSON configuration file. You can base it on the training configuration; refer to the [dedicated page](../train/config.md) for a description of the parameters. A minimal sketch is shown after the parameter table below.
1. Run `teklia-dan evaluate --config path/to/your/config.json`. This will, for each evaluated split:
    1. Create a YAML file with the evaluation results in the `results` subfolder of the `training.output_folder` indicated in your configuration.
    1. Print a metrics Markdown table in the console (see the [HTR example below](#htr-evaluation)).
    1. Print a [Nerval](https://gitlab.teklia.com/ner/nerval) metrics Markdown table in the console, if the `dataset.tokens` parameter is defined in your configuration (see the [HTR and NER example below](#htr-and-ner-evaluation)).
    1. Print the 5 worst predictions in the console (see the [examples below](#examples)).

!!! warning
    The display of the worst predictions does not support batch evaluation. If the `training.data.batch_size` parameter is not equal to `1`, then the displayed `WER` is the `WER` of the **whole batch**, not of the single image.

| Parameter            | Description                                                                                                                                                                                    | Type           | Default                    |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- | -------------------------- |
| `--config`           | Path to the configuration file.                                                                                                                                                                | `pathlib.Path` |                            |
| `--nerval-threshold` | Distance threshold for the match between gold and predicted entities during Nerval evaluation. `0` requires perfect matches; `1` allows completely different strings to be considered a match. | `float`        | `0.3`                      |
| `--output-json`      | Where to save the evaluation results in JSON format.                                                                                                                                           | `pathlib.Path` | `None`                     |
| `--sets`             | Which sets should be evaluated.                                                                                                                                                                | `str`          | `["train", "val", "test"]` |
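The full list of configuration keys is described on the [dedicated page](../train/config.md). As a starting point, here is a minimal, illustrative sketch that only exercises the keys mentioned on this page (`training.output_folder`, `training.data.batch_size` and `dataset.tokens`); the nesting and placeholder values are assumptions, so copy the actual values from your training configuration.

```json
{
  "dataset": {
    "tokens": null
  },
  "training": {
    "output_folder": "path/to/your/output_folder",
    "data": {
      "batch_size": 1
    }
  }
}
```

Leaving `dataset.tokens` undefined skips the Nerval evaluation, while keeping `training.data.batch_size` at `1` makes the worst-predictions display reliable (see the warning above).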
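The flags from the table above can be combined. A hypothetical invocation (placeholder paths, illustrative threshold, and assuming `--sets` takes set names as values) could look like:

```shell
# Evaluate the test split only, tighten the Nerval matching threshold,
# and also save the results to a JSON file.
teklia-dan evaluate \
  --config path/to/your/config.json \
  --sets test \
  --nerval-threshold 0.2 \
  --output-json path/to/evaluation.json
```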
## Examples

### HTR evaluation

```
#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: |
| train |       x       |     x     |       x       |     x     |         x          |
|  val  |       x       |     x     |       x       |     x     |         x          |
| test  |       x       |     x     |       x       |     x     |         x          |

#### 5 worst prediction(s)

|   Image name   | WER | Alignment between ground truth - prediction |
| :------------: | :-: | :-----------------------------------------: |
| <image_id>.png |  x  |                      x                      |
|                |     |                      |                      |
|                |     |                      x                      |
```

### HTR and NER evaluation

```
#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) | NER |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: | :-: |
| train |       x       |     x     |       x       |     x     |         x          |  x  |
|  val  |       x       |     x     |       x       |     x     |         x          |  x  |
| test  |       x       |     x     |       x       |     x     |         x          |  x  |

#### Nerval evaluation

##### train

|   tag   | predicted | matched | Precision | Recall | F1  | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname |     x     |    x    |     x     |   x    |  x  |    x    |
|   All   |     x     |    x    |     x     |   x    |  x  |    x    |

##### val

|   tag   | predicted | matched | Precision | Recall | F1  | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname |     x     |    x    |     x     |   x    |  x  |    x    |
|   All   |     x     |    x    |     x     |   x    |  x  |    x    |

##### test

|   tag   | predicted | matched | Precision | Recall | F1  | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname |     x     |    x    |     x     |   x    |  x  |    x    |
|   All   |     x     |    x    |     x     |   x    |  x  |    x    |

#### 5 worst prediction(s)

|   Image name   | WER | Alignment between ground truth - prediction |
| :------------: | :-: | :-----------------------------------------: |
| <image_id>.png |  x  |                      x                      |
|                |     |                      |                      |
|                |     |                      x                      |
```