# Evaluation

## Description

Use the `teklia-dan evaluate` command to evaluate a trained DAN model.

To evaluate DAN on your dataset:

1. Create a JSON configuration file. You can base it on the training configuration; refer to the [dedicated page](../train/config.md) for a description of the parameters. A minimal sketch is shown after the parameter table below.
1. Run `teklia-dan evaluate --config path/to/your/config.json`. This will, for each evaluated split:
    1. Create a YAML file with the evaluation results in the `results` subfolder of the `training.output_folder` indicated in your configuration.
    1. Print a metrics Markdown table in the console (see the [HTR example below](#htr-evaluation)).
    1. Print a [Nerval](https://gitlab.teklia.com/ner/nerval) metrics Markdown table in the console, if the `dataset.tokens` parameter is defined in your configuration (see the [HTR and NER example below](#htr-and-ner-evaluation)).
    1. Print the 5 worst predictions in the console (see the [examples below](#examples)).

!!! warning
    The display of the worst predictions does not support batch evaluation. If the `training.data.batch_size` parameter is not equal to `1`, then the displayed `WER` is the `WER` of the **whole batch**, not of the single image.

| Parameter            | Description                                                                                                                                                                                    | Type           | Default                    |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- | -------------------------- |
| `--config`           | Path to the configuration file.                                                                                                                                                                | `pathlib.Path` |                            |
| `--nerval-threshold` | Distance threshold for the match between gold and predicted entities during Nerval evaluation. `0` requires perfect matches; `1` allows completely different strings to be considered a match. | `float`        | `0.3`                      |
| `--output-json`      | Where to save the evaluation results in JSON format.                                                                                                                                           | `pathlib.Path` | `None`                     |
| `--sets`             | Which sets should be evaluated.                                                                                                                                                                | `str`          | `["train", "val", "test"]` |
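The full list of configuration keys is described on the [dedicated page](../train/config.md). As a starting point, here is a minimal, illustrative sketch that only exercises the keys mentioned on this page (`training.output_folder`, `training.data.batch_size` and `dataset.tokens`); the nesting and placeholder values are assumptions, so copy the actual values from your training configuration.

```json
{
  "dataset": {
    "tokens": null
  },
  "training": {
    "output_folder": "path/to/your/output_folder",
    "data": {
      "batch_size": 1
    }
  }
}
```

Leaving `dataset.tokens` undefined skips the Nerval evaluation, while keeping `training.data.batch_size` at `1` makes the worst-predictions display reliable (see the warning above).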
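The flags from the table above can be combined. A hypothetical invocation (placeholder paths, illustrative threshold, and assuming `--sets` takes set names as values) could look like:

```shell
# Evaluate the test split only, tighten the Nerval matching threshold,
# and also save the results to a JSON file.
teklia-dan evaluate \
  --config path/to/your/config.json \
  --sets test \
  --nerval-threshold 0.2 \
  --output-json path/to/evaluation.json
```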
## Examples

### HTR evaluation

```
#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: |
| train |       x       |     x     |       x       |     x     |         x          |
|  val  |       x       |     x     |       x       |     x     |         x          |
| test  |       x       |     x     |       x       |     x     |         x          |

#### 5 worst prediction(s)

|   Image name   | WER | Alignment between ground truth - prediction |
| :------------: | :-: | :-----------------------------------------: |
| <image_id>.png |  x  |                      x                      |
|                |     |                      |                      |
|                |     |                      x                      |
```

### HTR and NER evaluation

```
#### DAN evaluation

| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) | NER |
| :---: | :-----------: | :-------: | :-----------: | :-------: | :----------------: | :-: |
| train |       x       |     x     |       x       |     x     |         x          |  x  |
|  val  |       x       |     x     |       x       |     x     |         x          |  x  |
| test  |       x       |     x     |       x       |     x     |         x          |  x  |

#### Nerval evaluation

##### train

|   tag   | predicted | matched | Precision | Recall | F1  | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname |     x     |    x    |     x     |   x    |  x  |    x    |
|   All   |     x     |    x    |     x     |   x    |  x  |    x    |

##### val

|   tag   | predicted | matched | Precision | Recall | F1  | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname |     x     |    x    |     x     |   x    |  x  |    x    |
|   All   |     x     |    x    |     x     |   x    |  x  |    x    |

##### test

|   tag   | predicted | matched | Precision | Recall | F1  | Support |
| :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
| Surname |     x     |    x    |     x     |   x    |  x  |    x    |
|   All   |     x     |    x    |     x     |   x    |  x  |    x    |

#### 5 worst prediction(s)

|   Image name   | WER | Alignment between ground truth - prediction |
| :------------: | :-: | :-----------------------------------------: |
| <image_id>.png |  x  |                      x                      |
|                |     |                      |                      |
|                |     |                      x                      |
```