Compare revisions

5a7dd7ef
--- a/docs/assets/augmentations/document_shearx.png
+++ b/docs/assets/augmentations/document_shearx.png
--- a/docs/assets/augmentations/line_color_jitter.png
+++ b/docs/assets/augmentations/line_color_jitter.png
--- a/docs/assets/augmentations/line_downscale.png
+++ b/docs/assets/augmentations/line_downscale.png
--- a/docs/assets/augmentations/line_dropout.png
+++ b/docs/assets/augmentations/line_dropout.png
--- a/docs/assets/augmentations/line_elastic.png
+++ b/docs/assets/augmentations/line_elastic.png
--- a/docs/assets/augmentations/line_erosion_dilation.png
+++ b/docs/assets/augmentations/line_erosion_dilation.png
--- a/docs/assets/augmentations/line_full_pipeline.png
+++ b/docs/assets/augmentations/line_full_pipeline.png
--- a/docs/assets/augmentations/line_gaussian_blur.png
+++ b/docs/assets/augmentations/line_gaussian_blur.png
--- a/docs/assets/augmentations/line_gaussian_noise.png
+++ b/docs/assets/augmentations/line_gaussian_noise.png
--- a/docs/assets/augmentations/line_grayscale.png
+++ b/docs/assets/augmentations/line_grayscale.png
--- a/docs/assets/augmentations/line_perspective.png
+++ b/docs/assets/augmentations/line_perspective.png
--- a/docs/assets/augmentations/line_piecewise.png
+++ b/docs/assets/augmentations/line_piecewise.png
--- a/docs/assets/augmentations/line_sharpen.png
+++ b/docs/assets/augmentations/line_sharpen.png
--- a/docs/assets/augmentations/line_shearx.png
+++ b/docs/assets/augmentations/line_shearx.png
--- a/docs/usage/train/augmentation.md
+++ b/docs/usage/train/augmentation.md
+# Data augmentation transforms
+
+This page lists the data augmentation transforms used in DAN.
+
+## Individual augmentation transforms
+
+### Elastic Transform
+
+|                              | Elastic Transform                                                                                                                                                                              |
+| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation applies local distortions that rotate characters locally.                                                                                                                  |
+| Comments                     | The impact of this transformation is mostly visible on documents, not so much on lines. Results are comparable to the original DAN implementation.                                             |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.ElasticTransform). |
+| Examples                     | ![](../../assets/augmentations/line_elastic.png) ![](../../assets/augmentations/document_elastic.png)                                                                                          |
+| CPU time (seconds/10 images) | 0.44 (3013x128 pixels) / 0.86 (1116x581 pixels)                                                                                                                                                |
+
+### PieceWise Affine
+
+|                              | PieceWise Affine                                                                                                                                                                              |
+| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation also applies local distortions but with a larger grid than ElasticTransform.                                                                                              |
+| Comments                     | This transformation is very slow. It is a new transform that was not in the original DAN implementation.                                                                                      |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.PiecewiseAffine). |
+| Examples                     | ![](../../assets/augmentations/line_piecewise.png) ![](../../assets/augmentations/document_piecewise.png)                                                                                     |
+| CPU time (seconds/10 images) | 2.92 (3013x128 pixels) / 3.76 (1116x581 pixels)                                                                                                                                               |
+
+### Dilation Erosion
+
+|                              | Dilation & Erosion                                                                                                                                                                         |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Description                  | This transformation makes the pen stroke thicker or thinner.                                                                                                                               |
+| Comments                     | The `RandomDilationErosion` class randomly selects a kernel size and applies a dilation or an erosion to the image. It relies on opencv and is similar to the original DAN implementation. |
+| Documentation                | See the [`opencv` documentation](https://docs.opencv.org/3.4/db/df6/tutorial_erosion_dilatation.html).                                                                                     |
+| Examples                     | ![](../../assets/augmentations/line_erosion_dilation.png) ![](../../assets/augmentations/document_erosion_dilation.png)                                                                    |
+| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.03 (1116x581 pixels)                                                                                                                                            |
+
+### Sharpen
+
+|                              | Sharpen                                                                                                                                                           |
+| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation makes the image sharper.                                                                                                                      |
+| Comments                     | Similar to the original DAN implementation.                                                                                                                       |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Sharpen). |
+| Examples                     | ![](../../assets/augmentations/line_sharpen.png) ![](../../assets/augmentations/document_sharpen.png)                                                             |
+| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.04 (1116x581 pixels)                                                                                                                   |
+
+### Color Jittering
+
+|                              | Color Jittering                                                                                                                                                       |
+| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation alters the colors of the image.                                                                                                                   |
+| Comments                     | Similar to the original DAN implementation.                                                                                                                           |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.ColorJitter). |
+| Examples                     | ![](../../assets/augmentations/line_color_jitter.png) ![](../../assets/augmentations/document_color_jitter.png)                                                       |
+| CPU time (seconds/10 images) | 0.03 (3013x128 pixels) / 0.04 (1116x581 pixels)                                                                                                                       |
+
+### Gaussian Noise
+
+|                              | Gaussian Noise                                                                                                                                                          |
+| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation adds Gaussian noise to the image.                                                                                                                   |
+| Comments                     | The noise from the original DAN implementation is more uniform.                                                                                                         |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.GaussianNoise). |
+| Examples                     | ![](../../assets/augmentations/line_gaussian_noise.png) ![](../../assets/augmentations/document_gaussian_noise.png)                                                     |
+| CPU time (seconds/10 images) | 0.29 (3013x128 pixels) / 0.53 (1116x581 pixels)                                                                                                                         |
+
+### Gaussian Blur
+
+|                              | Gaussian Blur                                                                                                                                                          |
+| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation blurs the image.                                                                                                                                   |
+| Comments                     | Similar to the original DAN implementation.                                                                                                                            |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.GaussianBlur). |
+| Examples                     | ![](../../assets/augmentations/line_gaussian_blur.png) ![](../../assets/augmentations/document_gaussian_blur.png)                                                      |
+| CPU time (seconds/10 images) | 0.01 (3013x128 pixels) / 0.02 (1116x581 pixels)                                                                                                                        |
+
+### Random Perspective
+
+|                              | Random Perspective                                                                                                                                                    |
+| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation changes the perspective from which the photo is taken.                                                                                            |
+| Comments                     | Similar to the original DAN implementation.                                                                                                                           |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Perspective). |
+| Examples                     | ![](../../assets/augmentations/line_perspective.png) ![](../../assets/augmentations/document_perspective.png)                                                         |
+| CPU time (seconds/10 images) | 0.05 (3013x128 pixels) / 0.05 (1116x581 pixels)                                                                                                                       |
+
+### Shearing (x-axis)
+
+|                              | Shearing (x-axis)                                                                                                                                                                    |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Description                  | This transformation changes the slant of the text on the image.                                                                                                                      |
+| Comments                     | New transform that was not in the original DAN implementation.                                                                                                                       |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.Affine). |
+| Examples                     | ![](../../assets/augmentations/line_shearx.png) ![](../../assets/augmentations/document_shearx.png)                                                                                  |
+| CPU time (seconds/10 images) | 0.05 (3013x128 pixels) / 0.04 (1116x581 pixels)                                                                                                                                      |
+
+### Coarse Dropout
+
+|                              | Coarse Dropout                                                                                                                                                                              |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation adds dropout on the image, turning small patches into black pixels.                                                                                                     |
+| Comments                     | It is a new transform that was not in the original DAN implementation.                                                                                                                      |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/dropout/coarse_dropout/#coarsedropout-augmentation-augmentationsdropoutcoarse_dropout). |
+| Examples                     | ![](../../assets/augmentations/line_dropout.png) ![](../../assets/augmentations/document_dropout.png)                                                                                       |
+| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.02 (1116x581 pixels)                                                                                                                                             |
+
+### Downscale
+
+|                              | Downscale                                                                                                                                                           |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation downscales the image by a random factor.                                                                                                        |
+| Comments                     | It is a new transform that was not in the original DAN implementation.                                                                                              |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Downscale). |
+| Examples                     | ![](../../assets/augmentations/line_downscale.png) ![](../../assets/augmentations/document_downscale.png)                                                           |
+| CPU time (seconds/10 images) | 0.03 (3013x128 pixels) / 0.03 (1116x581 pixels)                                                                                                                     |
+
+### Grayscale
+
+|                              | Grayscale                                                                                                                                                        |
+| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description                  | This transformation converts an RGB image into grayscale.                                                                                                        |
+| Comments                     | It is a new transform that was not in the original DAN implementation.                                                                                           |
+| Documentation                | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.ToGray). |
+| Examples                     | ![](../../assets/augmentations/line_grayscale.png) ![](../../assets/augmentations/document_grayscale.png)                                                        |
+| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.02 (1116x581 pixels)                                                                                                                  |
+
+## Full augmentation pipeline
+
+* Data augmentation is applied with a probability of 0.9.
+* When it is applied, two transformations are randomly selected.
+* `ElasticTransform` and `PiecewiseAffine` cannot be applied to the same image.
+* Results are reproducible when `random.seed` and `np.random.seed` are set (already done in `dan/ocr/document/train.py`).
+* Examples with the new pipeline:
+
+![](../../assets/augmentations/line_full_pipeline.png)
+![](../../assets/augmentations/document_full_pipeline.png)
+![](../../assets/augmentations/document_full_pipeline_2.png)
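
The selection rules listed under "Full augmentation pipeline" can be sketched in plain Python. This is a minimal illustration, not the code added by this change: the transform names are labels taken from the section headings, `select_transforms` is a hypothetical helper, and rejection sampling is only one assumed way to keep `ElasticTransform` and `PiecewiseAffine` from being picked together.

```python
import random

# Labels for the transforms documented on the page (illustrative names only).
TRANSFORMS = [
    "ElasticTransform", "PiecewiseAffine", "DilationErosion", "Sharpen",
    "ColorJitter", "GaussianNoise", "GaussianBlur", "Perspective",
    "ShearX", "CoarseDropout", "Downscale", "Grayscale",
]
# The two distortions that must not be applied to the same image.
EXCLUSIVE = {"ElasticTransform", "PiecewiseAffine"}


def select_transforms(rng, p=0.9, n=2):
    """Return the names of the transforms to apply to one image.

    Hypothetical helper: with probability `p`, draw `n` distinct transforms,
    redrawing whenever both mutually exclusive transforms are selected.
    """
    if rng.random() >= p:
        return []  # this image is left unaugmented
    while True:
        chosen = rng.sample(TRANSFORMS, n)
        if not EXCLUSIVE.issubset(chosen):
            return chosen


# A seeded generator makes the sequence of selections reproducible.
rng = random.Random(0)
selections = [select_transforms(rng) for _ in range(1000)]
```

Passing a seeded `random.Random` mirrors the reproducibility point in the list: the same seed yields the same sequence of selections.
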
--- a/docs/usage/train/index.md
+++ b/docs/usage/train/index.md
@@ -21,3 +21,4 @@ To train DAN on lines, run `teklia-dan train document` with a line dataset.
 ## Additional page

 * [Jean Zay tutorial](jeanzay.md)
+* [Data augmentation](augmentation.md)
--- a/docs/usage/train/parameters.md
+++ b/docs/usage/train/parameters.md
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -62,6 +62,7 @@ nav:
    - Training:
      - usage/train/index.md
      - Parameters: usage/train/parameters.md
+      - Data augmentation: usage/train/augmentation.md
      - Jean Zay tutorial: usage/train/jeanzay.md
    - Predict: usage/predict.md
  - Documentation development: dev/build_docs.md

--- a/requirements.txt
+++ b/requirements.txt
+albumentations==1.3.1
 arkindex-export==0.1.3
 boto3==1.26.124
 editdistance==0.6.2

--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -9,7 +9,7 @@ from torch.optim import Adam
 from dan.decoder import GlobalHTADecoder
 from dan.encoder import FCN_Encoder
 from dan.schedulers import exponential_dropout_scheduler
-from dan.transforms import aug_config
+from dan.transforms import Preprocessing

 FIXTURES = Path(__file__).resolve().parent / "data"

@@ -70,11 +70,12 @@ def training_config():
                "load_in_memory": True,  # Load all images in CPU memory
                "preprocessings": [
                    {
-                        "type": "to_RGB",
-                        # if grayscaled image, produce RGB one (3 channels with same value) otherwise do nothing
+                        "type": Preprocessing.MaxResize,
+                        "max_width": 2000,
+                        "max_height": 2000,
                    },
                ],
-                "augmentation": aug_config(0.9, 0.1),
+                "augmentation": True,
            },
        },
        "model_params": {