diff --git a/docs/usage/evaluate/index.md b/docs/usage/evaluate/index.md
index 2153ebd776c0767b8e55edf8ffaa9cec2422f307..94d57d18c812acbe184c721c0de3e52ad196d8a2 100644
--- a/docs/usage/evaluate/index.md
+++ b/docs/usage/evaluate/index.md
@@ -12,15 +12,19 @@ To evaluate DAN on your dataset:
 This will, for each evaluated split:
 
 1. Create a YAML file with the evaluation results in the `results` subfolder of the `training.output_folder` indicated in your configuration.
-1. Print in the console a metrics Markdown table (see [table example below](#htr-evaluation)).
-1. Print in the console a [Nerval](https://gitlab.teklia.com/ner/nerval) metrics Markdown table, if the `dataset.tokens` parameter in your configuration is defined (see [table example below](#htr-and-ner-evaluation)).
+1. Print in the console a metrics Markdown table (see [HTR example below](#htr-evaluation)).
+1. Print in the console a [Nerval](https://gitlab.teklia.com/ner/nerval) metrics Markdown table, if the `dataset.tokens` parameter in your configuration is defined (see [HTR and NER example below](#htr-and-ner-evaluation)).
+1. Print in the console the 5 worst predictions (see [examples below](#examples)).
+
+!!! warning
+    The display of the worst predictions does not support batch evaluation. If the `training.data.batch_size` parameter is not equal to `1`, the `WER` displayed for each image is the `WER` of the **whole batch**, not the `WER` of that image alone.
 
 | Parameter            | Description                                                                                                                                                                                              | Type           | Default |
 | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- | ------- |
 | `--config`           | Path to the configuration file.                                                                                                                                                                          | `pathlib.Path` |         |
 | `--nerval-threshold` | Distance threshold for the match between gold and predicted entity during Nerval evaluation. `0` would impose perfect matches, `1` would allow completely different strings to be considered as a match. | `float`        | `0.3`   |
 
-## Example output
+## Examples
 
 ### HTR evaluation
 
@@ -32,6 +36,14 @@ This will, for each evaluated split:
 | train |       x       |     x     |       x       |     x     |         x          |
 |  val  |       x       |     x     |       x       |     x     |         x          |
 | test  |       x       |     x     |       x       |     x     |         x          |
+
+#### 5 worst prediction(s)
+
+|   Image name   | WER | Alignment between ground truth - prediction |
+| :------------: | :-: | :-----------------------------------------: |
+| <image_id>.png |  x  |                      x                      |
+|                |     |                      |                      |
+|                |     |                      x                      |
 ```
 
 ### HTR and NER evaluation
@@ -67,4 +79,12 @@ This will, for each evaluated split:
 | :-----: | :-------: | :-----: | :-------: | :----: | :-: | :-----: |
 | Surname |     x     |    x    |     x     |   x    |  x  |    x    |
 |   All   |     x     |    x    |     x     |   x    |  x  |    x    |
+
+#### 5 worst prediction(s)
+
+|   Image name   | WER | Alignment between ground truth - prediction |
+| :------------: | :-: | :-----------------------------------------: |
+| <image_id>.png |  x  |                      x                      |
+|                |     |                      |                      |
+|                |     |                      x                      |
 ```