
Training configuration

To train a model, you need to write a JSON configuration file. The list of fields is described in the sections below. An empty configuration file is available at configs/quickstart.json. You will need to fill in the paths.
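
As a sketch, such a configuration file typically nests the dotted parameter names documented in the sections below into JSON objects. The skeleton below only illustrates that assumed structure with one representative field per section and placeholder values; the exact contents of configs/quickstart.json may differ.

{
    "dataset": {
        "max_char_prediction": 1000
    },
    "model": {
        "h_max": 500
    },
    "training": {
        "output_folder": "path/to/output_folder"
    }
}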

Dataset parameters

| Parameter | Description | Type | Default |
| --------- | ----------- | ---- | ------- |
| `dataset.max_char_prediction` | Maximum number of characters to predict. | int | 1000 |
| `dataset.tokens` | Path to a NER tokens configuration file similar to the one used for extraction. | pathlib.Path | |

To determine the value to use for dataset.max_char_prediction, you can use the analyze command to find the maximum number of characters in a label of the dataset.

!!! note
    You must replace the pseudo-variables `$dataset_name` and `$dataset_path` with, respectively, the name and the relative/absolute path to your dataset.
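
For example, a dataset section using the parameters above could be written as follows; the tokens path is a placeholder to replace with your own file:

{
    "dataset": {
        "max_char_prediction": 1000,
        "tokens": "path/to/tokens_file"
    }
}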

Model parameters

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `model.transfered_charset` | Transfer learning of the decision layer based on the charset of the model to transfer. | bool | True |
| `model.additional_tokens` | For decision layer = [`<eot>`, `<pad>`], only for transferred charset. | int | 1 |
| `model.h_max` | Maximum height for encoder output (for 2D positional embedding). | int | 500 |
| `model.w_max` | Maximum width for encoder output (for 2D positional embedding). | int | 1000 |

Encoder

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `model.encoder.dropout` | Dropout probability in the encoder. | float | 0.5 |
| `model.encoder.nb_layers` | Number of layers in the encoder. | int | 5 |

Decoder

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `model.decoder.enc_dim` | Dimension of features extracted by the encoder. | int | 256 |
| `model.decoder.l_max` | Maximum predicted sequence length (for 1D positional embedding). | int | 15000 |
| `model.decoder.dec_num_layers` | Number of transformer decoder layers. | int | 8 |
| `model.decoder.dec_num_heads` | Number of heads in transformer decoder layers. | int | 4 |
| `model.decoder.dec_res_dropout` | Dropout probability in transformer decoder layers. | float | 0.1 |
| `model.decoder.dec_pred_dropout` | Dropout rate before the decision layer. | float | 0.1 |
| `model.decoder.dec_att_dropout` | Dropout rate in multi-head attention. | float | 0.1 |
| `model.decoder.dec_dim_feedforward` | Dimension of the feedforward layer in transformer decoder layers. | int | 256 |
| `model.decoder.attention_win` | Length of the attention window. | int | 100 |
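
Putting the model-level, encoder and decoder parameters together, a model section using the default values above could be sketched as follows (note that booleans are written lowercase in the JSON file):

{
    "model": {
        "transfered_charset": true,
        "additional_tokens": 1,
        "h_max": 500,
        "w_max": 1000,
        "encoder": {
            "dropout": 0.5,
            "nb_layers": 5
        },
        "decoder": {
            "enc_dim": 256,
            "l_max": 15000,
            "dec_num_layers": 8,
            "dec_num_heads": 4,
            "dec_res_dropout": 0.1,
            "dec_pred_dropout": 0.1,
            "dec_att_dropout": 0.1,
            "dec_dim_feedforward": 256,
            "attention_win": 100
        }
    }
}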

Language model

This assumes that you have already trained a language model.

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `model.lm.path` | Path to the language model. | str | |
| `model.lm.weight` | How much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0) as it will affect the quality of the predictions. | float | |

!!! note
    Line breaks are treated as spaces by language models; as a result, predictions will not include line breaks.

The `model.lm.path` argument expects a path to the language model, but the parent folder should also contain:

  • a lexicon.txt file,
  • a tokens.txt file.

You should get the following tree structure:

folder/
├── <model.lm.path> # Path to the language model
├── lexicon.txt
└── tokens.txt
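
In the configuration file, this corresponds to a sketch like the one below; the path is a placeholder pointing to the language model inside that folder, and the weight of 1.0 is only an example within the suggested 0.5-2.0 range:

{
    "model": {
        "lm": {
            "path": "path/to/folder/language_model",
            "weight": 1.0
        }
    }
}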

Training parameters

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `training.output_folder` | Directory for checkpoints and results. | str | |
| `training.max_nb_epochs` | Maximum number of epochs before stopping training. | int | 800 |
| `training.load_epoch` | Model to load. Should be either "best" (evaluation) or "last" (training). | str | "last" |
| `training.lr_schedulers` | Learning rate schedulers. | custom class | |

Device

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `training.device.use_ddp` | Whether to use DistributedDataParallel. | bool | False |
| `training.device.ddp_port` | DDP port. | int | 20027 |
| `training.device.use_amp` | Whether to enable automatic mixed precision. | bool | True |
| `training.device.nb_gpu` | Number of GPUs to train DAN. Set to null to use all available GPUs. | int | |
| `training.device.force` | Use a specific device if available. Use `cpu` to train on CPU (for debugging) or `cuda`/`cuda:$gpu_device` to train on GPU. | str | |

To train on several GPUs, simply set the training.device.use_ddp parameter to True. By default, the model will use all available GPUs. To restrict training to fewer GPUs, set the training.device.nb_gpu parameter.
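
For example, to train with DistributedDataParallel restricted to two GPUs, with mixed precision enabled, the device section could be sketched as follows (values taken from the table above, except nb_gpu, which is set to 2 purely as an example):

{
    "training": {
        "device": {
            "use_ddp": true,
            "ddp_port": 20027,
            "use_amp": true,
            "nb_gpu": 2
        }
    }
}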

Optimizers

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `training.optimizers.all.args.lr` | Learning rate for the optimizer. | float | 0.0001 |
| `training.optimizers.all.args.amsgrad` | Whether to use AMSGrad optimization. | bool | False |
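
The dotted names above translate into a nested structure; with the default values it would be written as in the sketch below (the actual configuration may contain additional optimizer fields not documented here):

{
    "training": {
        "optimizers": {
            "all": {
                "args": {
                    "lr": 0.0001,
                    "amsgrad": false
                }
            }
        }
    }
}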

Validation

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `training.validation.eval_on_valid` | Whether to evaluate and log metrics on the validation set during training. | bool | True |
| `training.validation.eval_on_valid_interval` | Interval (in epochs) to evaluate during training. | int | 5 |
| `training.validation.set_name_focus_metric` | Dataset to focus on to select best weights. | str | |

During the validation stage, the batch size is set to 1. This avoids issues caused by images of very different sizes within a batch, which would require significant padding and degrade performance.

Metrics

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `training.metrics.train` | List of metrics to compute during training. | list | ["loss_ce", "cer", "cer_no_token", "wer", "wer_no_punct", "wer_no_token"] |
| `training.metrics.eval` | List of metrics to compute during validation. | list | ["cer", "cer_no_token", "wer", "wer_no_punct", "wer_no_token"] |

Label noise scheduler

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `training.label_noise_scheduler.min_error_rate` | Minimum ratio of teacher forcing. | float | 0.2 |
| `training.label_noise_scheduler.max_error_rate` | Maximum ratio of teacher forcing. | float | 0.2 |
| `training.label_noise_scheduler.total_num_steps` | Number of steps before stopping teacher forcing. | float | 5e4 |
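
As a sketch, the scheduler section with the default values would read:

{
    "training": {
        "label_noise_scheduler": {
            "min_error_rate": 0.2,
            "max_error_rate": 0.2,
            "total_num_steps": 50000
        }
    }
}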

Transfer learning

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `training.transfer_learning.encoder` | Model to load for the encoder [state_dict_name, checkpoint_path, learnable, strict]. | list | ["encoder", "pretrained_models/dan_rimes_page.pt", True, True] |
| `training.transfer_learning.decoder` | Model to load for the decoder [state_dict_name, checkpoint_path, learnable, strict]. | list | ["decoder", "pretrained_models/dan_rimes_page.pt", True, False] |
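
In the JSON file, these lists keep the [state_dict_name, checkpoint_path, learnable, strict] order, with booleans written lowercase; for example, with the default values above:

{
    "training": {
        "transfer_learning": {
            "encoder": ["encoder", "pretrained_models/dan_rimes_page.pt", true, true],
            "decoder": ["decoder", "pretrained_models/dan_rimes_page.pt", true, false]
        }
    }
}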

Data

| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `training.data.batch_size` | Mini-batch size for the training loop. | int | 2 |
| `training.data.load_in_memory` | Load all images in CPU memory. | bool | True |
| `training.data.worker_per_gpu` | Number of parallel processes per GPU for data loading. | int | 4 |
| `training.data.preprocessings` | List of pre-processing functions to apply to input images. | list | (see dedicated section) |
| `training.data.augmentation` | Whether to use data augmentation on the training set. | bool | True (see dedicated section) |
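
A data section combining these parameters could be sketched as follows; the preprocessings entry uses the format described in the next section:

{
    "training": {
        "data": {
            "batch_size": 2,
            "load_in_memory": true,
            "worker_per_gpu": 4,
            "preprocessings": [
                {
                    "type": "max_resize",
                    "max_height": 2000,
                    "max_width": 2000
                }
            ],
            "augmentation": true
        }
    }
}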

Preprocessing

Preprocessing is applied before training the network. The list of accepted transforms is described in the dedicated references.

Usage:

  • Resize to a fixed height
[
    {
        "type": "fixed_height_resize",
        "fixed_height": 1500
    }
]
  • Resize to a fixed width
[
    {
        "type": "fixed_width_resize",
        "fixed_width": 1500
    }
]
  • Resize to a maximum size (only if the image is bigger than the given size)
[
    {
        "type": "max_resize",
        "max_height": 2000,
        "max_width": 2000
    }
]
  • Combine these pre-processings
[
    {
        "type": "fixed_height_resize",
        "fixed_height": 2000
    },
    {
        "type": "fixed_width_resize",
        "fixed_width": 2000
    }
]

Augmentation

Augmentation transformations are applied on-the-fly during training to artificially increase data variability.

DAN takes advantage of transforms from albumentations. The following configuration is used by default when using the teklia-dan train command. Data augmentation is applied with a probability of 0.9; when it is, two of the transformations listed below are randomly selected and applied.

import cv2
import albumentations as A
from albumentations import (
    Affine,
    CoarseDropout,
    ColorJitter,
    ElasticTransform,
    GaussianBlur,
    GaussNoise,
    Perspective,
    RandomScale,
    Sharpen,
    ToGray,
)

# ErosionDilation is a custom transform from the DAN code base,
# not part of albumentations.

transforms = A.Compose(
    [
        # Scale between 0.75 and 1.0
        RandomScale(scale_limit=[-0.25, 0], always_apply=True, interpolation=cv2.INTER_AREA),
        # Randomly apply two of the following transforms
        A.SomeOf(
            [
                ErosionDilation(min_kernel=1, max_kernel=4, iterations=1),
                Perspective(scale=(0.05, 0.09), fit_output=True, p=0.4),
                GaussianBlur(sigma_limit=2.5, p=1),
                GaussNoise(var_limit=50**2, p=1),
                ColorJitter(
                    contrast=0.2, brightness=0.2, saturation=0.2, hue=0.2, p=1
                ),
                ElasticTransform(
                    alpha=20.0, sigma=5.0, alpha_affine=1.0, border_mode=0, p=1
                ),
                Sharpen(alpha=(0.0, 1.0), p=1),
                Affine(shear={"x": (-20, 20), "y": (0, 0)}, p=1),
                CoarseDropout(p=1),
                ToGray(p=0.5),
            ],
            n=2,
            p=0.9,
        ),
    ],
    p=0.9,
)

For a detailed description of all augmentation transforms, see the dedicated page.

MLflow logging

To log your experiment to MLflow, you need to:

  • install the extra requirements via
$ pip install --index-url https://gitlab.teklia.com/api/v4/projects/210/packages/pypi/simple .[mlflow]

The --index-url argument is required to find the nerval package.

  • update the following arguments:
| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| `mlflow.run_id` | ID of the current run in MLflow. | int | |
| `mlflow.run_name` | Name of the current run in MLflow. | str | |
| `mlflow.s3_endpoint_url` | URL of the S3 endpoint. | str | |
| `mlflow.tracking_uri` | URI of the tracking server. | str | |
| `mlflow.experiment_id` | ID of the current experiment in MLflow. | str | |
| `mlflow.aws_access_key_id` | Access key ID for the AWS server. | str | |
| `mlflow.aws_secret_access_key` | Secret access key for the AWS server. | str | |
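
For example, an mlflow section could be sketched as follows; every value is a placeholder to replace with your own MLflow and S3 settings:

{
    "mlflow": {
        "run_name": "dan-training-example",
        "s3_endpoint_url": "https://s3.example.com",
        "tracking_uri": "https://mlflow.example.com",
        "experiment_id": "0",
        "aws_access_key_id": "your_access_key_id",
        "aws_secret_access_key": "your_secret_access_key"
    }
}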