
Data augmentation transforms

This page lists data augmentation transforms used in DAN.

Individual augmentation transforms

Elastic Transform

Description: This transformation applies local distortions that rotate individual characters.
Comments: The effect of this transformation is mostly visible on full-page documents, and much less on single text lines. Results are comparable to those of the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.44 (3013x128 pixels) / 0.86 (1116x581 pixels)
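The NumPy sketch below illustrates what "local distortions" means. It builds a random displacement field, smooths it so that neighbouring pixels move together, scales it, and resamples the image along it. This is not DAN's or albumentations' code. The names `elastic_distort`, `smooth`, `alpha` and `sigma` are hypothetical, and nearest-neighbour sampling is used only to keep the example dependency-free.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def smooth(field: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing with edge padding (output keeps the input shape)."""
    kernel = gaussian_kernel1d(sigma)
    radius = len(kernel) // 2

    def blur_1d(v: np.ndarray) -> np.ndarray:
        return np.convolve(np.pad(v, radius, mode="edge"), kernel, mode="valid")

    field = np.apply_along_axis(blur_1d, 1, field)  # along rows
    return np.apply_along_axis(blur_1d, 0, field)   # along columns


def elastic_distort(image, alpha=30.0, sigma=4.0, rng=None):
    """Hypothetical elastic distortion: resample `image` along a smooth random displacement field."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Smoothing keeps the distortion local and coherent; `alpha` scales it (in pixels).
    dx = smooth(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = smooth(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Nearest-neighbour lookup, clamped to the image borders.
    src_x = np.clip(np.rint(xs + dx), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + dy), 0, h - 1).astype(int)
    return image[src_y, src_x]
```

A larger `sigma` gives smoother, broader deformations, and a larger `alpha` gives stronger ones.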

PieceWise Affine

!!! warning
    This transform has been temporarily removed from the pipeline until this issue is fixed.

Description: This transformation also applies local distortions, but on a larger grid than ElasticTransform.
Comments: This transformation is very slow. It is a new transform that was not part of the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 2.92 (3013x128 pixels) / 3.76 (1116x581 pixels)

Dilation Erosion

Description: This transformation makes the pen strokes thicker or thinner.
Comments: The RandomDilationErosion class randomly selects a kernel size and applies either a dilation or an erosion to the image. It relies on OpenCV and is similar to the original DAN implementation.
Documentation: See the OpenCV documentation.
CPU time (seconds/10 images): 0.02 (3013x128 pixels) / 0.03 (1116x581 pixels)
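The NumPy sketch below shows what "randomly select a kernel size, then dilate or erode" amounts to on a 2D grayscale image. It is not the RandomDilationErosion class, which relies on OpenCV. The names `morph` and `random_dilation_erosion`, and the kernel sizes (3, 5), are assumptions made for illustration. It uses Python's `random` module, which the seeding described in the pipeline section covers.

```python
import random

import numpy as np


def morph(image: np.ndarray, ksize: int, reduce) -> np.ndarray:
    """Grayscale morphology with a ksize x ksize square kernel (ksize must be odd)."""
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    # Stack every shifted copy of the image covered by the kernel, then reduce across them.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(ksize) for dx in range(ksize)]
    return reduce(np.stack(windows), axis=0)


def random_dilation_erosion(image: np.ndarray, kernel_sizes=(3, 5)) -> np.ndarray:
    """Hypothetical stand-in for RandomDilationErosion on a 2D grayscale image."""
    ksize = random.choice(kernel_sizes)
    # np.max is a dilation and np.min an erosion. With dark ink on a light background,
    # erosion spreads the dark strokes (thicker) and dilation shrinks them (thinner).
    reduce = random.choice([np.max, np.min])
    return morph(image, ksize, reduce)
```

Note that the effect on stroke width is inverted for dark text on a light background: eroding the image is what thickens the ink.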

Sharpen

Description: This transformation makes the image sharper.
Comments: Similar to the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.02 (3013x128 pixels) / 0.04 (1116x581 pixels)
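Sharpening is commonly done by blending the identity kernel with a Laplacian-based kernel. The NumPy sketch below shows that idea. It is an assumption for illustration and not necessarily how albumentations builds its kernel. `filter2d`, `sharpen` and the blend factor `alpha` are hypothetical names.

```python
import numpy as np


def filter2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Correlate a 2D image with a small odd-sized kernel, using edge padding."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(np.float64), ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sharpen(image: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend the identity kernel with a Laplacian-based sharpening kernel."""
    identity = np.zeros((3, 3))
    identity[1, 1] = 1.0
    laplacian_sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float64)
    # Both kernels sum to 1, so flat regions keep their intensity; only edges are boosted.
    kernel = (1 - alpha) * identity + alpha * laplacian_sharpen
    return np.clip(np.rint(filter2d(image, kernel)), 0, 255).astype(np.uint8)
```

With `alpha=0` the image is returned unchanged. Values closer to 1 darken the dark side of an edge and brighten the light side more strongly.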

Color Jittering

Description: This transformation alters the colors of the image.
Comments: Similar to the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.03 (3013x128 pixels) / 0.04 (1116x581 pixels)
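As a rough illustration of color jittering, the NumPy sketch below applies a random brightness, contrast and saturation change to an RGB image. This is a simplified assumption, not albumentations' ColorJitter. In particular, it omits the hue shift. `color_jitter` and its parameters are hypothetical, and each strength gives the range [1 - s, 1 + s] from which a factor is drawn.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 weights


def color_jitter(image, brightness=0.2, contrast=0.2, saturation=0.2, rng=None):
    """Hypothetical RGB color jitter (no hue shift): random brightness, contrast, saturation."""
    rng = np.random.default_rng() if rng is None else rng
    img = image.astype(np.float64)
    # Brightness: scale all intensities by one random factor.
    img = img * rng.uniform(1 - brightness, 1 + brightness)
    # Contrast: stretch or compress intensities around the global mean.
    mean = img.mean()
    img = (img - mean) * rng.uniform(1 - contrast, 1 + contrast) + mean
    # Saturation: move each pixel towards or away from its own gray value.
    gray = (img @ LUMA)[..., None]
    img = (img - gray) * rng.uniform(1 - saturation, 1 + saturation) + gray
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)
```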

Gaussian Noise

Description: This transformation adds Gaussian noise to the image.
Comments: The noise from the original DAN implementation is more uniform.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.29 (3013x128 pixels) / 0.53 (1116x581 pixels)
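Adding Gaussian noise means drawing an independent, zero-mean normal value for every pixel, adding it, and clipping back to the valid range. The minimal NumPy sketch below does exactly that. `add_gaussian_noise` and its `sigma` (standard deviation in intensity levels) are hypothetical and are not the albumentations parameters.

```python
import numpy as np


def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (in intensity levels)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    # Round and clip so the result is a valid 8-bit image again.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```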

Gaussian Blur

Description: This transformation blurs the image.
Comments: Similar to the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.01 (3013x128 pixels) / 0.02 (1116x581 pixels)
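A Gaussian blur is a convolution with a Gaussian kernel. Because that kernel is separable, it can be applied as two 1D passes. The NumPy sketch below works on a 2D grayscale image and is only an illustration. `gaussian_blur` and `sigma` are hypothetical names, and the 3-sigma truncation is an assumption.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur of a 2D grayscale image, with edge padding."""
    kernel = gaussian_kernel1d(sigma)
    r = len(kernel) // 2
    img = image.astype(np.float64)
    h, w = img.shape
    # Blur along rows, then along columns.
    padded = np.pad(img, ((0, 0), (r, r)), mode="edge")
    img = sum(k * padded[:, i:i + w] for i, k in enumerate(kernel))
    padded = np.pad(img, ((r, r), (0, 0)), mode="edge")
    img = sum(k * padded[i:i + h, :] for i, k in enumerate(kernel))
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)
```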

Random Perspective

Description: This transformation simulates a change in the perspective from which the photo was taken.
Comments: Similar to the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.05 (3013x128 pixels) / 0.05 (1116x581 pixels)
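One way to simulate a change of viewpoint is to move each image corner by a small random offset, solve for the homography that maps the corners, and resample. The NumPy sketch below takes this approach for a 2D image. It is an assumption for illustration, not albumentations' implementation. `homography`, `random_perspective`, `scale` and `fill` are hypothetical names.

```python
import numpy as np


def homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """3x3 matrix H (with H[2, 2] = 1) mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=np.float64), np.array(rhs, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)


def random_perspective(image, scale=0.05, fill=255, rng=None):
    """Hypothetical perspective warp of a 2D image: jitter each corner by up to `scale` x size."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    corners = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float64)
    moved = corners + rng.uniform(-scale, scale, size=(4, 2)) * [w, h]
    # Inverse mapping: for every output pixel, find the input pixel it comes from.
    to_source = homography(moved, corners)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    sx, sy, sw = to_source @ pts
    sx = np.rint(sx / sw).astype(int)
    sy = np.rint(sy / sw).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(h * w, fill, dtype=image.dtype)  # pixels mapped from outside get `fill`
    out[inside] = image[sy[inside], sx[inside]]
    return out.reshape(h, w)
```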

Shearing (x-axis)

Description: This transformation changes the slant of the text in the image.
Comments: It is a new transform that was not part of the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.05 (3013x128 pixels) / 0.04 (1116x581 pixels)
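An x-axis shear shifts each row horizontally by an amount proportional to its distance from the centre line, which is what changes the apparent slant of handwriting. The minimal NumPy sketch below applies it to a 2D image. `shear_x`, `angle_deg` and `fill` are hypothetical names, not the albumentations API.

```python
import numpy as np


def shear_x(image: np.ndarray, angle_deg: float = 10.0, fill: int = 255) -> np.ndarray:
    """Shear a 2D image along the x-axis around its horizontal centre line."""
    h, w = image.shape
    k = np.tan(np.deg2rad(angle_deg))
    ys, xs = np.mgrid[0:h, 0:w]
    # Output pixel (y, x) reads input pixel (y, x + k * (y - cy)): for positive angles,
    # rows above the centre move right and rows below move left (a forward slant).
    src_x = np.rint(xs + k * (ys - (h - 1) / 2)).astype(int)
    inside = (src_x >= 0) & (src_x < w)
    out = np.full_like(image, fill)
    out[inside] = image[ys[inside], src_x[inside]]
    return out
```

For example, a vertical stroke sheared by 45 degrees becomes a diagonal running from the top-right to the bottom-left corner.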

Coarse Dropout

Description: This transformation applies dropout to the image, turning small rectangular patches black.
Comments: It is a new transform that was not part of the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.02 (3013x128 pixels) / 0.02 (1116x581 pixels)
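The NumPy sketch below shows the dropout idea: a random number of rectangles, each with a random size and position, are overwritten with black pixels. It is illustrative only. `coarse_dropout`, `max_holes`, `max_size` and `fill` are hypothetical names and may not match albumentations' CoarseDropout arguments.

```python
import numpy as np


def coarse_dropout(image, max_holes=8, max_size=8, fill=0, rng=None):
    """Hypothetical coarse dropout: set up to `max_holes` random rectangles to `fill`."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()  # leave the input untouched
    h, w = image.shape[:2]
    for _ in range(rng.integers(1, max_holes + 1)):
        hole_h = int(rng.integers(1, min(max_size, h) + 1))
        hole_w = int(rng.integers(1, min(max_size, w) + 1))
        y = int(rng.integers(0, h - hole_h + 1))
        x = int(rng.integers(0, w - hole_w + 1))
        out[y:y + hole_h, x:x + hole_w] = fill  # black patch when fill=0
    return out
```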

Random Scale

Description: This transformation downscales the image by a random factor.
Comments: The original DAN implementation had its own version of this transform, named DPIAdjusting.
Documentation: See the albumentations documentation.
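Downscaling by a random factor can be sketched as drawing the factor, computing the new size, and sampling rows and columns with nearest-neighbour lookup. The NumPy example below does this. `random_downscale`, `min_factor` and `max_factor` are hypothetical names, and the nearest-neighbour sampling is a simplification. Real implementations usually interpolate.

```python
import numpy as np


def random_downscale(image, min_factor=0.5, max_factor=1.0, rng=None):
    """Hypothetical random downscaling by a factor in [min_factor, max_factor] (nearest neighbour)."""
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(min_factor, max_factor)
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    # For each output row/column, pick the input row/column it falls on.
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return image[rows[:, None], cols]
```

With a fixed factor of 0.5, this keeps every other row and column.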

To Gray

Description: This transformation converts an RGB image to grayscale.
Comments: It is a new transform that was not part of the original DAN implementation.
Documentation: See the albumentations documentation.
CPU time (seconds/10 images): 0.02 (3013x128 pixels) / 0.02 (1116x581 pixels)
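Grayscale conversion is a weighted sum of the R, G and B channels. The sketch below assumes the common ITU-R BT.601 luma weights, which are the ones OpenCV uses for RGB to gray. It keeps three identical channels so the image shape stays unchanged. `to_gray` is a hypothetical name.

```python
import numpy as np


def to_gray(image: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB uint8 image to gray, keeping three identical channels."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    gray = np.clip(np.rint(image.astype(np.float64) @ weights), 0, 255).astype(np.uint8)
    return np.repeat(gray[..., None], 3, axis=-1)
```

For example, a pure red pixel (255, 0, 0) becomes (76, 76, 76).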

Full augmentation pipeline

  • Data augmentation is applied with a probability of 0.9.
  • When it is applied, two transformations are randomly selected and applied to the image.
  • Results are reproducible when random.seed and np.random.seed are set (this is already done in dan/ocr/document/train.py).
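The selection logic described in the list above can be sketched as follows. With probability 0.9, two transforms are picked at random and applied one after the other. This is a hypothetical sketch, not DAN's pipeline code. The name `augment` and the toy transforms `noise`, `brighten` and `darken` are placeholders. Picking two *distinct* transforms (sampling without replacement) is an assumption the text does not state. Because the sketch draws only from `random` and `np.random`, seeding both makes the output reproducible.

```python
import random

import numpy as np


def augment(image, transforms, p=0.9, n=2):
    """With probability `p`, apply `n` distinct transforms picked at random, in random order."""
    if random.random() >= p:
        return image
    for transform in random.sample(transforms, n):
        image = transform(image)
    return image


# Toy placeholder transforms; a real pipeline would use the transforms described above.
def noise(img):
    return np.clip(img + np.random.normal(0, 10, img.shape), 0, 255).astype(np.uint8)


def brighten(img):
    return np.clip(img.astype(np.int16) + 20, 0, 255).astype(np.uint8)


def darken(img):
    return np.clip(img.astype(np.int16) - 20, 0, 255).astype(np.uint8)


TRANSFORMS = [noise, brighten, darken]
```

Calling `random.seed(42)` and `np.random.seed(42)` before `augment(img, TRANSFORMS)` yields the same augmented image on every run.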