Compare revisions

5a7dd7ef
--- a/docs/assets/augmentations/document_shearx.png
+++ b/docs/assets/augmentations/document_shearx.png
--- a/docs/assets/augmentations/line_color_jitter.png
+++ b/docs/assets/augmentations/line_color_jitter.png
--- a/docs/assets/augmentations/line_downscale.png
+++ b/docs/assets/augmentations/line_downscale.png
--- a/docs/assets/augmentations/line_dropout.png
+++ b/docs/assets/augmentations/line_dropout.png
--- a/docs/assets/augmentations/line_elastic.png
+++ b/docs/assets/augmentations/line_elastic.png
--- a/docs/assets/augmentations/line_erosion_dilation.png
+++ b/docs/assets/augmentations/line_erosion_dilation.png
--- a/docs/assets/augmentations/line_full_pipeline.png
+++ b/docs/assets/augmentations/line_full_pipeline.png
--- a/docs/assets/augmentations/line_gaussian_blur.png
+++ b/docs/assets/augmentations/line_gaussian_blur.png
--- a/docs/assets/augmentations/line_gaussian_noise.png
+++ b/docs/assets/augmentations/line_gaussian_noise.png
--- a/docs/assets/augmentations/line_grayscale.png
+++ b/docs/assets/augmentations/line_grayscale.png
--- a/docs/assets/augmentations/line_perspective.png
+++ b/docs/assets/augmentations/line_perspective.png
--- a/docs/assets/augmentations/line_piecewise.png
+++ b/docs/assets/augmentations/line_piecewise.png
--- a/docs/assets/augmentations/line_sharpen.png
+++ b/docs/assets/augmentations/line_sharpen.png
--- a/docs/assets/augmentations/line_shearx.png
+++ b/docs/assets/augmentations/line_shearx.png
--- a/docs/usage/train/augmentation.md
+++ b/docs/usage/train/augmentation.md
+# Data augmentation transforms
+
+This page lists data augmentation transforms used in DAN.
+
+## Individual augmentation transforms
+
+### Elastic Transform
+
+|                              | Elastic Transform                                                                                                                                                                              |
+| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation applies local distortions that rotate characters locally.                                                                                                                  |
+| Comments                     | The impact of this transformation is mostly visible on documents, not so much on lines. Results are comparable to the original DAN implementation.                                             |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.ElasticTransform). |
+| Examples                     | ![](../../assets/augmentations/line_elastic.png) ![](../../assets/augmentations/document_elastic.png)                                                                                          |
+| CPU time (seconds/10 images) | 0.44 (3013x128 pixels) / 0.86 (1116x581 pixels)                                                                                                                                                |
+
+### PieceWise Affine
+
+|                              | PieceWise Affine                                                                                                                                                                              |
+| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation also applies local distortions but with a larger grid than ElasticTransform.                                                                                              |
+| Comments                     | This transformation is very slow. It is a new transform that was not in the original DAN implementation.                                                                                      |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.PiecewiseAffine). |
+| Examples                     | ![](../../assets/augmentations/line_piecewise.png) ![](../../assets/augmentations/document_piecewise.png)                                                                                     |
+| CPU time (seconds/10 images) | 2.92 (3013x128 pixels) / 3.76 (1116x581 pixels)                                                                                                                                               |
+
+### Dilation & Erosion
+
+|                              | Dilation & Erosion                                                                                                                                                                         |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Description                  | This transformation makes the pen stroke thicker or thinner.                                                                                                                               |
+| Comments                     | The `RandomDilationErosion` class randomly selects a kernel size and applies a dilation or an erosion to the image. It relies on opencv and is similar to the original DAN implementation. |
+| Documentation                | See the [`opencv` documentation](https://docs.opencv.org/3.4/db/df6/tutorial_erosion_dilatation.html).                                                                                     |
+| Examples                     | ![](../../assets/augmentations/line_erosion_dilation.png) ![](../../assets/augmentations/document_erosion_dilation.png)                                                                    |
+| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.03 (1116x581 pixels)                                                                                                                                            |
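
The behaviour described in the comments row (pick a random kernel size, then apply either a dilation or an erosion) can be illustrated with a minimal pure-Python sketch. This is not the `RandomDilationErosion` class from DAN, which relies on opencv. The names `morph` and `random_dilation_erosion` are hypothetical, the image is a plain 2D list of gray levels, and treating an even kernel size like the next odd one is an assumption of this sketch.

```python
import random


def morph(image, kernel, op):
    """Apply a square min (erosion) or max (dilation) filter to a 2D list."""
    height, width = len(image), len(image[0])
    half = kernel // 2  # assumption: the kernel is centred on the pixel
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            row.append(op(window))
        out.append(row)
    return out


def random_dilation_erosion(image, rng, min_kernel=1, max_kernel=3):
    """Pick a random kernel size, then apply an erosion or a dilation."""
    kernel = rng.randint(min_kernel, max_kernel)
    # min is an erosion and max a dilation, in opencv terms.
    op = rng.choice([min, max])
    return morph(image, kernel, op)


# A single dark pixel (ink) on a light background.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
result = random_dilation_erosion(page, random.Random(0))
```

With dark ink on a light background, the min filter makes strokes thicker and the max filter makes them thinner.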
+
+### Sharpen
+
+|                              | Sharpen                                                                                                                                                           |
+| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation makes the image sharper.                                                                                                                      |
+| Comments                     | Similar to the original DAN implementation.                                                                                                                       |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Sharpen). |
+| Examples                     | ![](../../assets/augmentations/line_sharpen.png) ![](../../assets/augmentations/document_sharpen.png)                                                             |
+| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.04 (1116x581 pixels)                                                                                                                   |
+
+### Color Jittering
+
+|                              | Color Jittering                                                                                                                                                       |
+| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation alters the colors of the image.                                                                                                                   |
+| Comments                     | Similar to the original DAN implementation.                                                                                                                           |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.ColorJitter). |
+| Examples                     | ![](../../assets/augmentations/line_color_jitter.png) ![](../../assets/augmentations/document_color_jitter.png)                                                       |
+| CPU time (seconds/10 images) | 0.03 (3013x128 pixels) / 0.04 (1116x581 pixels)                                                                                                                       |
+
+### Gaussian Noise
+
+|                              | Gaussian Noise                                                                                                                                                          |
+| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation adds Gaussian noise to the image.                                                                                                                   |
+| Comments                     | The noise from the original DAN implementation is more uniform.                                                                                                         |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.GaussianNoise). |
+| Examples                     | ![](../../assets/augmentations/line_gaussian_noise.png) ![](../../assets/augmentations/document_gaussian_noise.png)                                                     |
+| CPU time (seconds/10 images) | 0.29 (3013x128 pixels) / 0.53 (1116x581 pixels)                                                                                                                         |
+
+### Gaussian Blur
+
+|                              | Gaussian Blur                                                                                                                                                          |
+| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation blurs the image.                                                                                                                                   |
+| Comments                     | Similar to the original DAN implementation.                                                                                                                            |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.GaussianBlur). |
+| Examples                     | ![](../../assets/augmentations/line_gaussian_blur.png) ![](../../assets/augmentations/document_gaussian_blur.png)                                                      |
+| CPU time (seconds/10 images) | 0.01 (3013x128 pixels) / 0.02 (1116x581 pixels)                                                                                                                        |
+
+### Random Perspective
+
+|                              | Random Perspective                                                                                                                                                    |
+| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation changes the perspective from which the photo is taken.                                                                                            |
+| Comments                     | Similar to the original DAN implementation.                                                                                                                           |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Perspective). |
+| Examples                     | ![](../../assets/augmentations/line_perspective.png) ![](../../assets/augmentations/document_perspective.png)                                                         |
+| CPU time (seconds/10 images) | 0.05 (3013x128 pixels) / 0.05 (1116x581 pixels)                                                                                                                       |
+
+### Shearing (x-axis)
+
+|                              | Shearing (x-axis)                                                                                                                                                                    |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Description                  | This transformation changes the slant of the text on the image.                                                                                                                      |
+| Comments                     | New transform that was not in the original DAN implementation.                                                                                                                       |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.Affine). |
+| Examples                     | ![](../../assets/augmentations/line_shearx.png) ![](../../assets/augmentations/document_shearx.png)                                                                                  |
+| CPU time (seconds/10 images) | 0.05 (3013x128 pixels) / 0.04 (1116x581 pixels)                                                                                                                                      |
+
+### Coarse Dropout
+
+|                              | Coarse Dropout                                                                                                                                                                              |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation adds dropout on the image, turning small patches into black pixels.                                                                                                     |
+| Comments                     | It is a new transform that was not in the original DAN implementation.                                                                                                                      |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/dropout/coarse_dropout/#coarsedropout-augmentation-augmentationsdropoutcoarse_dropout). |
+| Examples                     | ![](../../assets/augmentations/line_dropout.png) ![](../../assets/augmentations/document_dropout.png)                                                                                       |
+| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.02 (1116x581 pixels)                                                                                                                                             |
+
+### Downscale
+
+|                              | Downscale                                                                                                                                                           |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation downscales the image by a random factor.                                                                                                        |
+| Comments                     | It is a new transform that was not in the original DAN implementation.                                                                                              |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Downscale). |
+| Examples                     | ![](../../assets/augmentations/line_downscale.png) ![](../../assets/augmentations/document_downscale.png)                                                           |
+| CPU time (seconds/10 images) | 0.03 (3013x128 pixels) / 0.03 (1116x581 pixels)                                                                                                                     |
+
+### Grayscale
+
+|                              | Grayscale                                                                                                                                                        |
+| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation converts an RGB image to grayscale.                                                                                                          |
+| Comments                     | It is a new transform that was not in the original DAN implementation.                                                                                           |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.ToGray). |
+| Examples                     | ![](../../assets/augmentations/line_grayscale.png) ![](../../assets/augmentations/document_grayscale.png)                                                        |
+| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.02 (1116x581 pixels)                                                                                                                  |
+
+## Full augmentation pipeline
+
+* Data augmentation is applied with a probability of 0.9.
+* In this case, two transformations are randomly selected to be applied.
+* `ElasticTransform` and `PiecewiseAffine` cannot be applied to the same image.
+* Results are reproducible when `random.seed` and `np.random.seed` are set (already done in `dan/ocr/document/train.py`).
+* Examples with the new pipeline:
+
+![](../../assets/augmentations/line_full_pipeline.png)
+![](../../assets/augmentations/document_full_pipeline.png)
+![](../../assets/augmentations/document_full_pipeline_2.png)
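
The selection rules listed above can be sketched in plain Python. This is only an illustration of the logic, not the `albumentations` implementation: the transform names are strings standing in for the real transforms, `pick_transforms` is a hypothetical helper, and drawing the two transforms without replacement is an assumption of this sketch.

```python
import random

# Stand-ins for the transforms listed on this page. The tuple groups the two
# transforms that must never be applied to the same image.
TRANSFORMS = [
    "Perspective",
    "GaussianBlur",
    "GaussNoise",
    "ColorJitter",
    ("ElasticTransform", "PiecewiseAffine"),
    "Sharpen",
    "ErosionDilation",
    "Affine",
    "CoarseDropout",
    "Downscale",
    "ToGray",
]


def pick_transforms(rng, n=2, p=0.9):
    """Return the names of the transforms selected for one image."""
    # With probability 1 - p the image is left untouched.
    if rng.random() >= p:
        return []
    # Draw n distinct entries. The elastic/piecewise pair counts as a single
    # entry, so at most one of the two can be selected.
    chosen = rng.sample(TRANSFORMS, n)
    return [rng.choice(t) if isinstance(t, tuple) else t for t in chosen]


rng = random.Random(0)  # a fixed seed makes the draws reproducible
draws = [pick_transforms(rng) for _ in range(1000)]
```

Seeding a dedicated `random.Random` instance keeps the sketch self-contained; the training script seeds the global generators instead.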
--- a/docs/usage/train/index.md
+++ b/docs/usage/train/index.md
@@ -21,3 +21,4 @@ To train DAN on lines, run `teklia-dan train document` with a line dataset.
 ## Additional page

 * [Jean Zay tutorial](jeanzay.md)
+* [Data augmentation](augmentation.md)
--- a/docs/usage/train/parameters.md
+++ b/docs/usage/train/parameters.md
@@ -3,65 +3,82 @@ All hyperparameters are specified and editable in the training scripts (meaning

 ## Dataset parameters

-| Parameter                               | Description                                                                            | Type         | Default                                        |
-| --------------------------------------- | -------------------------------------------------------------------------------------- | ------------ | ---------------------------------------------- |
-| `dataset_name`                          | Name of the dataset.                                                                   | `str`        |                                                |
-| `dataset_level`                         | Level of the dataset. Should be named after the element type.                          | `str`        |                                                |
-| `dataset_variant`                       | Variant of the dataset. Usually empty for HTR datasets, `"_sem"` for HTR+NER datasets. | `str`        |                                                |
-| `dataset_path`                          | Path to the dataset.                                                                   | `str`        |                                                |
-| `dataset_params.config.dataset_manager` | Dataset manager class.                                                                 | custom class | `OCRDatasetManager`                            |
-| `dataset_params.config.dataset_class`   | Dataset class.                                                                         | custom class | `OCRDataset`                                   |
-| `dataset_params.config.datasets`        | Dataset dictionary with the dataset name as key and dataset path as value.             | `dict`       |                                                |
-| `dataset_params.config.load_in_memory`  | Load all images in CPU memory.                                                         | `str`        | `True`                                         |
-| `dataset_params.config.worker_per_gpu`  | Number of parallel processes per gpu for data loading.                                 | `int`        | `4`                                            |
-| `dataset_params.config.preprocessings`  | List of pre-processing functions to apply to input images.                             | `list`       | (see [dedicated section](#data-preprocessing)) |
-| `dataset_params.config.augmentation`    | Configuration for data augmentation.                                                   | `dict`       | (see [dedicated section](#data-augmentation))  |
+| Parameter                               | Description                                                                            | Type         | Default                                              |
+| --------------------------------------- | -------------------------------------------------------------------------------------- | ------------ | ---------------------------------------------------- |
+| `dataset_name`                          | Name of the dataset.                                                                   | `str`        |                                                      |
+| `dataset_level`                         | Level of the dataset. Should be named after the element type.                          | `str`        |                                                      |
+| `dataset_variant`                       | Variant of the dataset. Usually empty for HTR datasets, `"_sem"` for HTR+NER datasets. | `str`        |                                                      |
+| `dataset_path`                          | Path to the dataset.                                                                   | `str`        |                                                      |
+| `dataset_params.config.dataset_manager` | Dataset manager class.                                                                 | custom class | `OCRDatasetManager`                                  |
+| `dataset_params.config.dataset_class`   | Dataset class.                                                                         | custom class | `OCRDataset`                                         |
+| `dataset_params.config.datasets`        | Dataset dictionary with the dataset name as key and dataset path as value.             | `dict`       |                                                      |
+| `dataset_params.config.load_in_memory`  | Load all images in CPU memory.                                                         | `bool`       | `True`                                               |
+| `dataset_params.config.worker_per_gpu`  | Number of parallel processes per gpu for data loading.                                 | `int`        | `4`                                                  |
+| `dataset_params.config.preprocessings`  | List of pre-processing functions to apply to input images.                             | `list`       | (see [dedicated section](#data-preprocessing))       |
+| `dataset_params.config.augmentation`    | Whether to use data augmentation on the training set.                                  | `bool`       | `True` (see [dedicated section](#data-augmentation)) |


 ### Data preprocessing

-Preprocessing is applied before training the network (see `dan/manager/dataset.py`).
-The following transformations are implemented:
+Preprocessing is applied before training the network (see `dan/manager/dataset.py`). The list of accepted transforms is defined in `dan/transforms.py`:

-* Convert to grayscale
 ```py
-{
-    "type": "to_grayscaled"
-}
+class Preprocessing(Enum):
+    # If the image is bigger than the given size, resize it while keeping the original ratio
+    MaxResize = "max_resize"
+    # Resize the height to a fixed value while keeping the original ratio
+    FixedHeightResize = "fixed_height_resize"
+    # Resize the width to a fixed value while keeping the original ratio
+    FixedWidthResize = "fixed_width_resize"
 ```
-* Convert to RGB
+
+Usage:
+
+* Resize to a fixed height
+
 ```py
-{
-    "type": "to_RGB"
-}
+[
+    {
+        "type": Preprocessing.FixedHeightResize,
+        "fixed_height": 1500,
+    }
+]
 ```
-* Resize to a fixed height
+
+* Resize to a fixed width
+
 ```py
-{
-    "type": "fixed_height",
-    "fixed_height": 1000,
-}
+[
+    {
+        "type": Preprocessing.FixedWidthResize,
+        "fixed_width": 1500,
+    }
+]
 ```
-* Resize to a maximum size
+
+* Resize to a maximum size (only if the image is bigger than the given size)
+
 ```py
-{
-    "type": "resize",
-    "keep_ratio": True,
-    "max_height": 1000,
-    "max_width": None,
-}
+[
+    {
+        "type": Preprocessing.MaxResize,
+        "max_height": 2000,
+        "max_width": 2000,
+    }
+]
 ```

-Multiple transformations can be combined. For example, to resize an image to a fixed height of 1000 pixels and convert images to RGB, use the following configuration in `dataset_params.config.preprocessings`:
+* Combine these pre-processings

 ```py
 [
    {
-        "type": "fixed_height",
-        "fixed_height": 1000
+        "type": Preprocessing.FixedHeightResize,
+        "fixed_height": 2000,
    },
    {
-        "type": "to_RGB"
+        "type": Preprocessing.FixedWidthResize,
+        "fixed_width": 2000,
    }
 ]
 ```
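
To make the three resize modes concrete, here is a minimal sketch of the target size each one would produce. The function names are hypothetical and this is not the code from `dan/transforms.py`; rounding to the nearest pixel is an assumption.

```python
def max_resize(width, height, max_width, max_height):
    """Shrink (never enlarge) to fit within the bounds, keeping the ratio."""
    ratio = min(max_width / width, max_height / height, 1.0)
    return round(width * ratio), round(height * ratio)


def fixed_height_resize(width, height, fixed_height):
    """Set the height to a fixed value and scale the width by the same ratio."""
    return round(width * fixed_height / height), fixed_height


def fixed_width_resize(width, height, fixed_width):
    """Set the width to a fixed value and scale the height by the same ratio."""
    return fixed_width, round(height * fixed_width / width)


# A 4000x1000 image is halved to fit within 2000x2000.
small = max_resize(4000, 1000, max_width=2000, max_height=2000)
# A 1000x500 image already fits, so it is left unchanged.
same = max_resize(1000, 500, max_width=2000, max_height=2000)
```

When several pre-processings are combined, they would be applied in order, each one working on the size produced by the previous one.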
@@ -70,83 +87,41 @@ Multiple transformations can be combined. For example, to resize an image to a f

 Augmentation transformations are applied on-the-fly during training to artificially increase data variability.

-The following transformations are implemented in `dan/transforms.py`:
-* Color inversion
-* Dilation and erosion
-* Elastic distortion
-* Reducing interline spacing
-* Gaussian blur
-* Gaussian noise
-
-DAN also takes advantage of [transforms from torchvision](https://pytorch.org/vision/stable/transforms.html):
-* ColorJitter
-* GaussianBlur
-* RandomCrop
-* RandomPerspective
-
-The following configuration is used by default when using the `teklia-dan train document` command. Data augmentation is applied with a probability of 0.9, and each transformation has a 0.1 probability to be used.
+DAN takes advantage of transforms from [albumentations](https://albumentations.ai/).
+The following configuration is used by default when using the `teklia-dan train document` command. Data augmentation is applied with a probability of 0.9. In this case, two transformations are randomly selected to be applied.

 ```py
-{
-        "order": "random",
-        "proba": 0.9,
-        "augmentations": [
-            {
-                "type": "perspective",
-                "proba": 0.1,
-                "min_factor": 0,
-                "max_factor": 0.4,
-            },
-            {
-                "type": "elastic_distortion",
-                "proba": 0.1,
-                "min_alpha": 0.5,
-                "max_alpha": 1,
-                "min_sigma": 1,
-                "max_sigma": 10,
-                "min_kernel_size": 3,
-                "max_kernel_size": 9,
-            },
-            {
-                "type": "dilation_erosion",
-                "proba": 0.1,
-                "min_kernel": 1,
-                "max_kernel": 3,
-                "iterations": 1,
-            },
-            {
-                "type": "color_jittering",
-                "proba": 0.1,
-                "factor_hue": 0.2,
-                "factor_brightness": 0.4,
-                "factor_contrast": 0.4,
-                "factor_saturation": 0.4,
-            },
-            {
-                "type": "gaussian_blur",
-                "proba": 0.1,
-                "min_kernel": 3,
-                "max_kernel": 5,
-                "min_sigma": 3,
-                "max_sigma": 5,
-            },
-            {
-                "type": "gaussian_noise",
-                "proba": 0.1,
-                "std": 0.5,
-            },
-            {
-                "type": "sharpen",
-                "proba": 0.1,
-                "min_alpha": 0,
-                "max_alpha": 1,
-                "min_strength": 0,
-                "max_strength": 1,
-            },
-        ],
-    }
+transforms = SomeOf(
+    [
+        Perspective(scale=(0.05, 0.09), fit_output=True),
+        GaussianBlur(sigma_limit=2.5),
+        GaussNoise(var_limit=50**2),
+        ColorJitter(contrast=0.2, brightness=0.2, saturation=0.2, hue=0.2),
+        OneOf(
+            [
+                ElasticTransform(
+                    alpha=20.0,
+                    sigma=5.0,
+                    alpha_affine=1.0,
+                    border_mode=0,
+                ),
+                PiecewiseAffine(scale=(0.01, 0.04), nb_rows=1, nb_cols=4),
+            ]
+        ),
+        Sharpen(alpha=(0.0, 1.0)),
+        ErosionDilation(min_kernel=1, max_kernel=4, iterations=1),
+        Affine(shear={"x": (-20, 20), "y": (0, 0)}),
+        CoarseDropout(),
+        Downscale(scale_min=0.5, scale_max=0.9, interpolation=INTER_NEAREST),
+        ToGray(),
+    ],
+    n=2,
+    p=0.9,
+)
 ```

+For a detailed description of all augmentation transforms, see the [dedicated page](augmentation.md).
+
 ## Model parameters

 | Name                                      | Description                                                                          | Type          | Default                                                           |
@@ -180,32 +155,32 @@ The following configuration is used by default when using the `teklia-dan train

 ## Training parameters

-| Name                                                        | Description                                                                 | Type         | Default                                     |
-| ----------------------------------------------------------- | --------------------------------------------------------------------------- | ------------ | ------------------------------------------- |
-| `training_params.output_folder`                             | Directory for checkpoint and results.                                       | `str`        |                                             |
-| `training_params.max_nb_epochs`                             | Maximum number of epochs before stopping training.                          | `int`        | `800`                                       |
-| `training_params.max_training_time`                         | Maximum time (in seconds) before stopping training.                         | `int`        | `350000`                                    |
-| `training_params.load_epoch`                                | Model to load. Should be either `"best"` (evaluation) or `last` (training). | `str`        | `"last"`                                    |
-| `training_params.interval_save_weights`                     | Step to save weights. Set to `None` to keep only best and last epochs.      | `int`        | `None`                                      |
-| `training_params.batch_size`                                | Mini-batch size for the training loop.                                      | `int`        | `2`                                         |
-| `training_params.valid_batch_size`                          | Mini-batch size for the validation loop.                                    | `int`        | `4`                                         |
-| `training_params.use_ddp`                                   | Whether to use DistributedDataParallel.                                     | `bool`       | `False`                                     |
-| `training_params.ddp_port`                                  | DDP port.                                                                   | `int`        | `20027`                                     |
-| `training_params.use_amp`                                   | Whether to enable automatic mixed precision.                                | `int`        | `torch.cuda.device_count()`                 |
-| `training_params.nb_gpu`                                    | Number of GPUs to train DAN.                                                | `str`        |                                             |
-| `training_params.optimizers.all.class`                      | Optimizer class.                                                            | custom class | `Adam`                                      |
-| `training_params.optimizers.all.args.lr`                    | Learning rate for the optimizer.                                            | `float`      | `0.0001`                                    |
-| `training_params.optimizers.all.args.amsgrad`               | Whether to use AMSGrad optimization.                                        | custom class | `False`                                     |
-| `training_params.lr_schedulers`                             | Learning rate schedulers.                                                   | custom class | `None`                                      |
-| `training_params.eval_on_valid`                             | Whether to evaluate and log metrics on the validation set during training.  | `bool`       | `True`                                      |
-| `training_params.eval_on_valid_interval`                    | Interval (in epochs) to evaluate during training.                           | `int`        | `5`                                         |
-| `training_params.focus_metric`                              | Metrics to focus on to determine best epoch.                                | `str`        | `cer`                                       |
-| `training_params.expected_metric_value`                     | Best value for the focus metric. Should be either `"high"` or `"low"`.      | `str`        | `low`                                       |
-| `training_params.set_name_focus_metric`                     | Dataset to focus on to select best weights.                                 | `str`        |                                             |
-| `training_params.train_metrics`                             | List of metrics to compute during training.                                 | `list`       | `["loss_ce", "cer", "wer", "wer_no_punct"]` |
-| `training_params.eval_metrics`                              | List of metrics to compute during validation.                               | `list`       | `["cer", "wer", "wer_no_punct"]`            |
-| `training_params.force_cpu`                                 | Whether to train on CPU (for debugging).                                    | `bool`       | `False`                                     |
-| `training_params.max_char_prediction`                       | Maximum number of characters to predict.                                    | `int`        | `1000`                                      |
+| Name                                                    | Description                                                                 | Type         | Default                                     |
+| ------------------------------------------------------- | --------------------------------------------------------------------------- | ------------ | ------------------------------------------- |
+| `training_params.output_folder`                         | Directory for checkpoint and results.                                       | `str`        |                                             |
+| `training_params.max_nb_epochs`                         | Maximum number of epochs before stopping training.                          | `int`        | `800`                                       |
+| `training_params.max_training_time`                     | Maximum time (in seconds) before stopping training.                         | `int`        | `350000`                                    |
+| `training_params.load_epoch`                            | Model to load. Should be either `"best"` (evaluation) or `last` (training). | `str`        | `"last"`                                    |
+| `training_params.interval_save_weights`                 | Step to save weights. Set to `None` to keep only best and last epochs.      | `int`        | `None`                                      |
+| `training_params.batch_size`                            | Mini-batch size for the training loop.                                      | `int`        | `2`                                         |
+| `training_params.valid_batch_size`                      | Mini-batch size for the validation loop.                                    | `int`        | `4`                                         |
+| `training_params.use_ddp`                               | Whether to use DistributedDataParallel.                                     | `bool`       | `False`                                     |
+| `training_params.ddp_port`                              | DDP port.                                                                   | `int`        | `20027`                                     |
+| `training_params.use_amp`                               | Whether to enable automatic mixed precision.                                | `int`        | `torch.cuda.device_count()`                 |
+| `training_params.nb_gpu`                                | Number of GPUs to train DAN.                                                | `str`        |                                             |
+| `training_params.optimizers.all.class`                  | Optimizer class.                                                            | custom class | `Adam`                                      |
+| `training_params.optimizers.all.args.lr`                | Learning rate for the optimizer.                                            | `float`      | `0.0001`                                    |
+| `training_params.optimizers.all.args.amsgrad`           | Whether to use AMSGrad optimization.                                        | custom class | `False`                                     |
+| `training_params.lr_schedulers`                         | Learning rate schedulers.                                                   | custom class | `None`                                      |
+| `training_params.eval_on_valid`                         | Whether to evaluate and log metrics on the validation set during training.  | `bool`       | `True`                                      |
+| `training_params.eval_on_valid_interval`                | Interval (in epochs) to evaluate during training.                           | `int`        | `5`                                         |
+| `training_params.focus_metric`                          | Metrics to focus on to determine best epoch.                                | `str`        | `cer`                                       |
+| `training_params.expected_metric_value`                 | Best value for the focus metric. Should be either `"high"` or `"low"`.      | `str`        | `low`                                       |
+| `training_params.set_name_focus_metric`                 | Dataset to focus on to select best weights.                                 | `str`        |                                             |
+| `training_params.train_metrics`                         | List of metrics to compute during training.                                 | `list`       | `["loss_ce", "cer", "wer", "wer_no_punct"]` |
+| `training_params.eval_metrics`                          | List of metrics to compute during validation.                               | `list`       | `["cer", "wer", "wer_no_punct"]`            |
+| `training_params.force_cpu`                             | Whether to train on CPU (for debugging).                                    | `bool`       | `False`                                     |
+| `training_params.max_char_prediction`                   | Maximum number of characters to predict.                                    | `int`        | `1000`                                      |
 | `training_params.label_noise_scheduler.min_error_rate`  | Minimum ratio of teacher forcing.                                           | `float`      | `0.2`                                       |
 | `training_params.label_noise_scheduler.max_error_rate`  | Maximum ratio of teacher forcing.                                           | `float`      | `0.2`                                       |
 | `training_params.label_noise_scheduler.total_num_steps` | Number of steps before stopping teacher forcing.                            | `float`      | `5e4`                                       |
@@ -214,11 +189,12 @@ The following configuration is used by default when using the `teklia-dan train
 ## MLFlow logging

 To log your experiment on MLFlow, you need to:
+
 - install the extra requirements via

-    ```shell
-    $ pip install .[mlflow]
-    ```
+```shell
+$ pip install .[mlflow]
+```

 - update the following arguments:


--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -62,6 +62,7 @@ nav:
    - Training:
      - usage/train/index.md
      - Parameters: usage/train/parameters.md
+      - Data augmentation: usage/train/augmentation.md
      - Jean Zay tutorial: usage/train/jeanzay.md
    - Predict: usage/predict.md
  - Documentation development: dev/build_docs.md

--- a/requirements.txt
+++ b/requirements.txt
+albumentations==1.3.1
 arkindex-export==0.1.3
 boto3==1.26.124
 editdistance==0.6.2

--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -9,7 +9,7 @@ from torch.optim import Adam
 from dan.decoder import GlobalHTADecoder
 from dan.encoder import FCN_Encoder
 from dan.schedulers import exponential_dropout_scheduler
-from dan.transforms import aug_config
+from dan.transforms import Preprocessing

 FIXTURES = Path(__file__).resolve().parent / "data"

@@ -70,11 +70,12 @@ def training_config():
                "load_in_memory": True,  # Load all images in CPU memory
                "preprocessings": [
                    {
-                        "type": "to_RGB",
-                        # if grayscaled image, produce RGB one (3 channels with same value) otherwise do nothing
+                        "type": Preprocessing.MaxResize,
+                        "max_width": 2000,
+                        "max_height": 2000,
                    },
                ],
-                "augmentation": aug_config(0.9, 0.1),
+                "augmentation": True,
            },
        },
        "model_params": {
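
Note on the `tests/conftest.py` hunk above: the `to_RGB` preprocessing is replaced by `Preprocessing.MaxResize` with `max_width` and `max_height` set to 2000. The sketch below shows one plausible reading of such a step, assuming it shrinks an image to fit the box while keeping its aspect ratio and never upscales. `max_resize_shape` is a hypothetical helper written for illustration; it is not part of `dan.transforms`, and the real implementation may round or interpolate differently.

```python
from typing import Tuple


def max_resize_shape(
    width: int, height: int, max_width: int, max_height: int
) -> Tuple[int, int]:
    """Return the (width, height) of an image after fitting it inside the box.

    Assumption: the aspect ratio is preserved and smaller images are left as-is.
    """
    # The most restrictive dimension decides the scaling factor.
    ratio = min(max_width / width, max_height / height)
    if ratio >= 1:
        return width, height  # already fits, no upscaling
    return max(1, int(width * ratio)), max(1, int(height * ratio))


# A wide page is limited by its width, a tall page by its height.
print(max_resize_shape(4000, 1000, max_width=2000, max_height=2000))
print(max_resize_shape(1000, 4000, max_width=2000, max_height=2000))
print(max_resize_shape(1500, 1200, max_width=2000, max_height=2000))
```

Only the target shape is computed here; an actual preprocessing step would also resample the pixels.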
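
Note on the `docs/usage/train/parameters.md` hunk: the new default pipeline wraps its transforms in `SomeOf(..., n=2, p=0.9)`. The sketch below is a simplified, standard-library model of that selection logic, not the albumentations implementation: with probability `p` it applies `n` randomly chosen transforms, otherwise it returns the sample unchanged. Drawing without replacement and with equal weights are assumptions made for readability, and `tag` and `some_of` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence

# Stand-in "image": the list of transform names applied so far.
Sample = List[str]
Transform = Callable[[Sample], Sample]


def tag(name: str) -> Transform:
    """Build a dummy transform that only records its own name."""
    return lambda sample: sample + [name]


def some_of(
    transforms: Sequence[Transform], n: int, p: float, rng: random.Random
) -> Transform:
    """With probability p, apply n transforms drawn at random; otherwise do nothing."""

    def apply(sample: Sample) -> Sample:
        if rng.random() >= p:
            return sample  # the whole pipeline is skipped for this sample
        # Assumption: draw without replacement and with equal weights.
        for transform in rng.sample(list(transforms), k=n):
            sample = transform(sample)
        return sample

    return apply


pipeline = some_of(
    [tag("perspective"), tag("gaussian_blur"), tag("gauss_noise"), tag("to_gray")],
    n=2,
    p=0.9,
    rng=random.Random(0),
)
applied = [pipeline([]) for _ in range(1000)]
# Fraction of samples that were augmented; expected to be roughly p.
print(sum(1 for a in applied if a) / len(applied))
```

Under this model every training sample receives either no augmentation or exactly two of the listed transforms, which keeps the distortions moderate compared with chaining all of them.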