Commit 7030e6e9 authored by Manon Blanco

Split training parameters in documentation

parent 10dea026
This commit is part of merge request !347: Load a language model and decode with it during evaluation.
## Model parameters

| Name                       | Description                                                                         | Type   | Default |
| -------------------------- | ----------------------------------------------------------------------------------- | ------ | ------- |
| `model.transfered_charset` | Transfer learning of the decision layer based on charset of the model to transfer. | `bool` | `True`  |
| `model.additional_tokens`  | For decision layer = \[<eot>, \], only for transferred charset.                     | `int`  | `1`     |
| `model.h_max`              | Maximum height for encoder output (for 2D positional embedding).                    | `int`  | `500`   |
| `model.w_max`              | Maximum width for encoder output (for 2D positional embedding).                     | `int`  | `1000`  |

### Encoder

| Name                      | Description                         | Type    | Default |
| ------------------------- | ----------------------------------- | ------- | ------- |
| `model.encoder.dropout`   | Dropout probability in the encoder. | `float` | `0.5`   |
| `model.encoder.nb_layers` | Number of layers in the encoder.    | `int`   | `5`     |

### Decoder

| Name                                | Description                                                                | Type    | Default |
| ----------------------------------- | --------------------------------------------------------------------------- | ------- | ------- |
| `model.decoder.enc_dim`             | Dimension of features extracted by the encoder.                           | `int`   | `256`   |
| `model.decoder.l_max`               | Maximum predicted sequence length (for 1D positional embedding).          | `int`   | `15000` |
| `model.decoder.dec_num_layers`      | Number of transformer decoder layers.                                     | `int`   | `8`     |
| `model.decoder.dec_num_heads`       | Number of heads in transformer decoder layers.                            | `int`   | `4`     |
| `model.decoder.dec_res_dropout`     | Dropout probability in transformer decoder layers.                        | `float` | `0.1`   |
| `model.decoder.dec_pred_dropout`    | Dropout rate before decision layer.                                       | `float` | `0.1`   |
| `model.decoder.dec_att_dropout`     | Dropout rate in multi-head attention.                                     | `float` | `0.1`   |
| `model.decoder.dec_dim_feedforward` | Number of dimensions for feedforward layer in transformer decoder layers. | `int`   | `256`   |
| `model.decoder.attention_win`       | Length of attention window.                                                | `int`   | `100`   |
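
For intuition only, here is a sketch of how the main decoder hyperparameters would map onto a standard PyTorch transformer decoder layer. This is an analogy under assumptions, not the actual DAN decoder, which is a custom implementation (notably adding the attention window and the 1D positional embedding controlled by `model.decoder.attention_win` and `model.decoder.l_max`):

```python
from torch import nn

# Rough analogue of the default decoder hyperparameters above;
# the real DAN decoder is its own implementation, so this is illustration only.
layer = nn.TransformerDecoderLayer(
    d_model=256,          # model.decoder.enc_dim: feature dimension coming from the encoder
    nhead=4,              # model.decoder.dec_num_heads
    dim_feedforward=256,  # model.decoder.dec_dim_feedforward
    dropout=0.1,          # model.decoder.dec_res_dropout
)
decoder = nn.TransformerDecoder(layer, num_layers=8)  # model.decoder.dec_num_layers
```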
### Language model
## Training parameters

| Name                     | Description                                                                   | Type         | Default  |
| ------------------------ | ------------------------------------------------------------------------------ | ------------ | -------- |
| `training.output_folder` | Directory for checkpoint and results.                                         | `str`        |          |
| `training.max_nb_epochs` | Maximum number of epochs before stopping training.                            | `int`        | `800`    |
| `training.load_epoch`    | Model to load. Should be either `"best"` (evaluation) or `"last"` (training). | `str`        | `"last"` |
| `training.lr_schedulers` | Learning rate schedulers.                                                      | custom class |          |

### Device

| Name                       | Description                                                                                                                  | Type   | Default |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | ------ | ------- |
| `training.device.use_ddp`  | Whether to use DistributedDataParallel.                                                                                      | `bool` | `False` |
| `training.device.ddp_port` | DDP port.                                                                                                                    | `int`  | `20027` |
| `training.device.use_amp`  | Whether to enable automatic mixed precision.                                                                                 | `bool` | `True`  |
| `training.device.nb_gpu`   | Number of GPUs to train DAN. Set to `null` to use all GPUs available.                                                        | `int`  |         |
| `training.device.force`    | Use a specific device if available. Use `cpu` to train on CPU (for debugging) or `cuda`/`cuda:$gpu_device` to train on GPU. | `str`  |         |

To train on several GPUs, simply set the `training.device.use_ddp` parameter to `True`. By default, the model will use all available GPUs. To restrict access to fewer GPUs, one can modify the `training.device.nb_gpu` parameter.
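
As an illustration, a minimal sketch of a run restricted to two GPUs with DDP enabled is shown below. The nested dictionary layout is an assumption derived from the dotted parameter names; only the keys and values come from the table above:

```python
# Hypothetical `training.device` block; the surrounding config structure is assumed.
device_config = {
    "use_ddp": True,    # enable DistributedDataParallel for multi-GPU training
    "ddp_port": 20027,
    "use_amp": True,    # automatic mixed precision
    "nb_gpu": 2,        # restrict training to 2 GPUs (null/None means "use all")
    "force": None,      # or "cpu" for debugging, "cuda:0" for a specific GPU
}
```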
### Optimizers

| Name                                   | Description                          | Type    | Default  |
| -------------------------------------- | ------------------------------------ | ------- | -------- |
| `training.optimizers.all.args.lr`      | Learning rate for the optimizer.     | `float` | `0.0001` |
| `training.optimizers.all.args.amsgrad` | Whether to use AMSGrad optimization. | `bool`  | `False`  |
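
For reference, here is a minimal sketch of the equivalent optimizer construction, assuming the optimizer behind `training.optimizers.all` is Adam (the `amsgrad` flag suggests an Adam-family optimizer; the table itself does not name it):

```python
import torch
from torch import nn

model = nn.Linear(4, 4)  # stand-in module, only here to have parameters to optimize
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.0001,      # training.optimizers.all.args.lr
    amsgrad=False,  # training.optimizers.all.args.amsgrad
)
```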
### Validation

| Name                                         | Description                                                                 | Type   | Default |
| -------------------------------------------- | ----------------------------------------------------------------------------- | ------ | ------- |
| `training.validation.eval_on_valid`          | Whether to evaluate and log metrics on the validation set during training.  | `bool` | `True`  |
| `training.validation.eval_on_valid_interval` | Interval (in epochs) to evaluate during training.                            | `int`  | `5`     |
| `training.validation.set_name_focus_metric`  | Dataset to focus on to select best weights.                                  | `str`  |         |

During the validation stage, the batch size is set to 1. This avoids problems associated with image sizes that can be very different inside batches and lead to significant padding, resulting in performance degradations.
### Metrics
| Name | Description | Type | Default |
| ------------------------ | --------------------------------------------- | ------ | --------------------------------------------------------------------------- |
| `training.metrics.train` | List of metrics to compute during training. | `list` | `["loss_ce", "cer", "cer_no_token", "wer", "wer_no_punct", "wer_no_token"]` |
| `training.metrics.eval` | List of metrics to compute during validation. | `list` | `["cer", "cer_no_token", "wer", "wer_no_punct", "wer_no_token"]` |
### Label noise scheduler
| Name | Description | Type | Default |
| ------------------------------------------------ | ------------------------------------------------ | ------- | ------- |
| `training.label_noise_scheduler.min_error_rate` | Minimum ratio of teacher forcing. | `float` | `0.2` |
| `training.label_noise_scheduler.max_error_rate` | Maximum ratio of teacher forcing. | `float` | `0.2` |
| `training.label_noise_scheduler.total_num_steps` | Number of steps before stopping teacher forcing. | `float` | `5e4` |
### Transfer learning
| Name | Description | Type | Default |
| ------------------------------------ | -------------------------------------------------------------------------------------- | ------ | ----------------------------------------------------------------- |
| `training.transfer_learning.encoder` | Model to load for the encoder \[state_dict_name, checkpoint_path, learnable, strict\]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, True]` |
| `training.transfer_learning.decoder` | Model to load for the decoder \[state_dict_name, checkpoint_path, learnable, strict\]. | `list` | `["decoder", "pretrained_models/dan_rimes_page.pt", True, False]` |
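
Each of these values is a 4-element list. The sketch below simply restates the defaults with the positional fields named; the dictionary wrapper is an assumption about how the dotted names nest:

```python
# Each entry is [state_dict_name, checkpoint_path, learnable, strict].
transfer_learning = {
    "encoder": ["encoder", "pretrained_models/dan_rimes_page.pt", True, True],
    "decoder": ["decoder", "pretrained_models/dan_rimes_page.pt", True, False],
}
# state_dict_name: sub-module whose weights are loaded ("encoder" or "decoder")
# checkpoint_path: path to the pretrained checkpoint
# learnable:       whether the loaded weights keep being updated during training
# strict:          whether the checkpoint must match the module's state dict exactly
```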
### Data
| Name | Description | Type | Default |
| ------------------------------ | ---------------------------------------------------------- | ------ | ----------------------------------------------- |
| `training.data.batch_size` | Mini-batch size for the training loop. | `int` | `2` |
| `training.data.load_in_memory` | Load all images in CPU memory. | `bool` | `True` |
| `training.data.worker_per_gpu` | Number of parallel processes per GPU for data loading. | `int` | `4` |
| `training.data.preprocessings` | List of pre-processing functions to apply to input images. | `list` | (see [dedicated section](#preprocessing)) |
| `training.data.augmentation` | Whether to use data augmentation on the training set. | `bool` | `True` (see [dedicated section](#augmentation)) |
#### Preprocessing
Preprocessing is applied before training the network (see the [dedicated references](../../ref/ocr/managers/dataset.md)). The list of accepted transforms is defined in the [dedicated references](../../ref/ocr/transforms.md#dan.ocr.transforms.Preprocessing).
#### Augmentation
Augmentation transformations are applied on-the-fly during training to artificially increase data variability.