Compare revisions

Target project: atr/dan

Showing changes with 262 additions and 145 deletions
docs/assets/augmentations/document_shearx.png

374 KiB

docs/assets/augmentations/line_color_jitter.png

85.9 KiB

docs/assets/augmentations/line_downscale.png

86.9 KiB

docs/assets/augmentations/line_dropout.png

83.5 KiB

docs/assets/augmentations/line_elastic.png

113 KiB

docs/assets/augmentations/line_erosion_dilation.png

90.2 KiB

docs/assets/augmentations/line_full_pipeline.png

329 KiB

docs/assets/augmentations/line_gaussian_blur.png

86.1 KiB

docs/assets/augmentations/line_gaussian_noise.png

105 KiB

docs/assets/augmentations/line_grayscale.png

30 KiB

docs/assets/augmentations/line_perspective.png

82 KiB

docs/assets/augmentations/line_piecewise.png

86.9 KiB

docs/assets/augmentations/line_sharpen.png

90 KiB

docs/assets/augmentations/line_shearx.png

84.6 KiB

# Data augmentation transforms
This page lists data augmentation transforms used in DAN.
## Individual augmentation transforms
### Elastic Transform
| | Elastic Transform |
| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation applies local distortions that rotate individual characters. |
| Comments | The impact of this transformation is mostly visible on documents, not so much on lines. Results are comparable to the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.ElasticTransform). |
| Examples | ![](../../assets/augmentations/line_elastic.png) ![](../../assets/augmentations/document_elastic.png) |
| CPU time (seconds/10 images) | 0.44 (3013x128 pixels) / 0.86 (1116x581 pixels) |
### PieceWise Affine
| | PieceWise Affine |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation also applies local distortions but with a larger grid than ElasticTransform. |
| Comments | This transformation is very slow, taking about 3 to 4 seconds per 10 images on CPU. It is a new transform that was not in the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.PiecewiseAffine). |
| Examples | ![](../../assets/augmentations/line_piecewise.png) ![](../../assets/augmentations/document_piecewise.png) |
| CPU time (seconds/10 images) | 2.92 (3013x128 pixels) / 3.76 (1116x581 pixels) |
### Dilation & Erosion
| | Dilation & Erosion |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Description | This transformation makes the pen stroke thicker or thinner. |
| Comments | The `RandomDilationErosion` class randomly selects a kernel size and applies a dilation or an erosion to the image. It relies on opencv and is similar to the original DAN implementation. |
| Documentation | See the [`opencv` documentation](https://docs.opencv.org/3.4/db/df6/tutorial_erosion_dilatation.html). |
| Examples | ![](../../assets/augmentations/line_erosion_dilation.png) ![](../../assets/augmentations/document_erosion_dilation.png) |
| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.03 (1116x581 pixels) |
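
To make the effect of dilation and erosion concrete, here is a minimal dependency-free sketch. It is not the `RandomDilationErosion` class itself, which relies on opencv; the function names, the kernel sizes and the list-of-lists image format are assumptions made for illustration. With dark ink on a light background, an erosion (minimum filter) thickens the strokes and a dilation (maximum filter) thins them.

```python
import random


def morph(image, kernel_size, op):
    """Apply a square min/max filter to a 2D list of pixel values.

    op=max is a dilation (bright regions grow),
    op=min is an erosion (dark regions grow).
    kernel_size is expected to be odd; the window is clipped at the borders.
    """
    height, width = len(image), len(image[0])
    radius = kernel_size // 2
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(op(window))
        output.append(row)
    return output


def random_dilation_erosion(image, kernel_sizes=(1, 3, 5), rng=random):
    """Hypothetical helper: pick a kernel size and an operation at random."""
    kernel_size = rng.choice(kernel_sizes)
    op = rng.choice([max, min])
    return morph(image, kernel_size, op)


# A white 5x5 image with a single black "ink" pixel in the middle.
image = [[255] * 5 for _ in range(5)]
image[2][2] = 0
thicker = morph(image, 3, min)  # the ink pixel grows into a 3x3 block
thinner = morph(image, 3, max)  # the ink pixel disappears
```
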
### Sharpen
| | Sharpen |
| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation makes the image sharper. |
| Comments | Similar to the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Sharpen). |
| Examples | ![](../../assets/augmentations/line_sharpen.png) ![](../../assets/augmentations/document_sharpen.png) |
| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.04 (1116x581 pixels) |
### Color Jittering
| | Color Jittering |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation alters the colors of the image. |
| Comments | Similar to the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.ColorJitter). |
| Examples | ![](../../assets/augmentations/line_color_jitter.png) ![](../../assets/augmentations/document_color_jitter.png) |
| CPU time (seconds/10 images) | 0.03 (3013x128 pixels) / 0.04 (1116x581 pixels) |
### Gaussian Noise
| | Gaussian Noise |
| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation adds Gaussian noise to the image. |
| Comments | The noise from the original DAN implementation is more uniform. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.GaussianNoise). |
| Examples | ![](../../assets/augmentations/line_gaussian_noise.png) ![](../../assets/augmentations/document_gaussian_noise.png) |
| CPU time (seconds/10 images) | 0.29 (3013x128 pixels) / 0.53 (1116x581 pixels) |
### Gaussian Blur
| | Gaussian Blur |
| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation blurs the image. |
| Comments | Similar to the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.GaussianBlur). |
| Examples | ![](../../assets/augmentations/line_gaussian_blur.png) ![](../../assets/augmentations/document_gaussian_blur.png) |
| CPU time (seconds/10 images) | 0.01 (3013x128 pixels) / 0.02 (1116x581 pixels) |
### Random Perspective
| | Random Perspective |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation changes the perspective from which the photo is taken. |
| Comments | Similar to the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Perspective). |
| Examples | ![](../../assets/augmentations/line_perspective.png) ![](../../assets/augmentations/document_perspective.png) |
| CPU time (seconds/10 images) | 0.05 (3013x128 pixels) / 0.05 (1116x581 pixels) |
### Shearing (x-axis)
| | Shearing (x-axis) |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Description | This transformation changes the slant of the text on the image. |
| Comments | It is a new transform that was not in the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.Affine). |
| Examples | ![](../../assets/augmentations/line_shearx.png) ![](../../assets/augmentations/document_shearx.png) |
| CPU time (seconds/10 images) | 0.05 (3013x128 pixels) / 0.04 (1116x581 pixels) |
### Coarse Dropout
| | Coarse Dropout |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation applies dropout to the image, turning small patches black. |
| Comments | It is a new transform that was not in the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/dropout/coarse_dropout/#coarsedropout-augmentation-augmentationsdropoutcoarse_dropout). |
| Examples | ![](../../assets/augmentations/line_dropout.png) ![](../../assets/augmentations/document_dropout.png) |
| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.02 (1116x581 pixels) |
### Downscale
| | Downscale |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation downscales the image by a random factor. |
| Comments | It is a new transform that was not in the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.Downscale). |
| Examples | ![](../../assets/augmentations/line_downscale.png) ![](../../assets/augmentations/document_downscale.png) |
| CPU time (seconds/10 images) | 0.03 (3013x128 pixels) / 0.03 (1116x581 pixels) |
### Grayscale
| | Grayscale |
| ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Description | This transformation converts an RGB image to grayscale. |
| Comments | It is a new transform that was not in the original DAN implementation. |
| Documentation | See the [`albumentations` documentation](https://albumentations.ai/docs/api_reference/augmentations/transforms/#albumentations.augmentations.transforms.ToGray). |
| Examples | ![](../../assets/augmentations/line_grayscale.png) ![](../../assets/augmentations/document_grayscale.png) |
| CPU time (seconds/10 images) | 0.02 (3013x128 pixels) / 0.02 (1116x581 pixels) |
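
The "CPU time (seconds/10 images)" rows in the tables above could be measured with a small harness along these lines. This is only a sketch: the page does not say how the timings were obtained, so the function name, the use of `time.perf_counter` and the best-of-N strategy are all assumptions.

```python
import time


def seconds_per_batch(transform, images, repeats=3, clock=time.perf_counter):
    """Return the best time (in seconds) needed to apply `transform` to all `images`.

    `transform` is any callable taking one image; `clock` is injectable
    so that the measurement can be tested deterministically.
    """
    best = float("inf")
    for _ in range(repeats):
        start = clock()
        for image in images:
            transform(image)
        best = min(best, clock() - start)
    return best


# Usage with 10 dummy "images" and an identity transform (placeholders only).
dummy_images = [[0] * 16 for _ in range(10)]
elapsed = seconds_per_batch(lambda image: image, dummy_images)
print(f"{elapsed:.4f} seconds / 10 images")
```

Taking the best of several runs reduces the influence of unrelated background load on the reported figure.
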
## Full augmentation pipeline
* Data augmentation is applied with a probability of 0.9.
* When augmentation is applied, two transformations are randomly selected.
* `ElasticTransform` and `PiecewiseAffine` cannot be applied to the same image.
* Results can be made reproducible by calling `random.seed` and `np.random.seed` (already done in `dan/ocr/document/train.py`).
* Examples with the new pipeline:
![](../../assets/augmentations/line_full_pipeline.png)
![](../../assets/augmentations/document_full_pipeline.png)
![](../../assets/augmentations/document_full_pipeline_2.png)
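
The selection rules listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not DAN's actual implementation: the transform labels are plain strings rather than real classes, and the rejection-sampling loop is just one possible way to enforce the `ElasticTransform` / `PiecewiseAffine` exclusion.

```python
import random

# Illustrative labels only, one per transform described on this page.
TRANSFORMS = [
    "ElasticTransform", "PiecewiseAffine", "DilationErosion", "Sharpen",
    "ColorJitter", "GaussianNoise", "GaussianBlur", "Perspective",
    "ShearX", "CoarseDropout", "Downscale", "Grayscale",
]
# These two transforms must never be applied to the same image.
EXCLUSIVE = {"ElasticTransform", "PiecewiseAffine"}


def select_transforms(rng, proba=0.9, count=2):
    """Return the transforms to apply to one image (possibly none)."""
    # Augmentation is skipped with probability 1 - proba.
    if rng.random() >= proba:
        return []
    # Draw `count` distinct transforms, retrying if both exclusive ones appear.
    while True:
        picked = rng.sample(TRANSFORMS, count)
        if not EXCLUSIVE.issubset(picked):
            return picked


# Seeding the generator makes the selection reproducible.
rng = random.Random(42)
for _ in range(3):
    print(select_transforms(rng))
```

Passing an explicit `random.Random` instance keeps the sketch self-contained; seeding the global generators, as the training script does, has the same effect on reproducibility.
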
@@ -21,3 +21,4 @@ To train DAN on lines, run `teklia-dan train document` with a line dataset.
 ## Additional page
 * [Jean Zay tutorial](jeanzay.md)
+* [Data augmentation](augmentation.md)
@@ -62,6 +62,7 @@ nav:
     - Training:
       - usage/train/index.md
       - Parameters: usage/train/parameters.md
+      - Data augmentation: usage/train/augmentation.md
       - Jean Zay tutorial: usage/train/jeanzay.md
     - Predict: usage/predict.md
   - Documentation development: dev/build_docs.md
......
albumentations==1.3.1
arkindex-export==0.1.3
boto3==1.26.124
editdistance==0.6.2
......
@@ -9,7 +9,7 @@ from torch.optim import Adam
 from dan.decoder import GlobalHTADecoder
 from dan.encoder import FCN_Encoder
 from dan.schedulers import exponential_dropout_scheduler
-from dan.transforms import aug_config
+from dan.transforms import Preprocessing

 FIXTURES = Path(__file__).resolve().parent / "data"
@@ -70,11 +70,12 @@ def training_config():
             "load_in_memory": True,  # Load all images in CPU memory
             "preprocessings": [
                 {
-                    "type": "to_RGB",
-                    # if grayscaled image, produce RGB one (3 channels with same value) otherwise do nothing
+                    "type": Preprocessing.MaxResize,
+                    "max_width": 2000,
+                    "max_height": 2000,
                 },
             ],
-            "augmentation": aug_config(0.9, 0.1),
+            "augmentation": True,
         },
     },
     "model_params": {
......