Commit f4650117 authored by Manon Blanco

Update documentation

parent b7e519cb
This commit is part of merge request !347.
@@ -166,14 +166,13 @@ It will create the following JSON file named after the image and a GIF showing a
This example assumes that you have already [trained a language model](../train/language_model.md).
!!! note
    - the `weight` parameter defines how much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0) as it will affect the quality of the predictions.
    - linebreaks are treated as spaces by language models; as a result, predictions will not include linebreaks.
#### Language model at character level
Update the `parameters.yml` file obtained during DAN training.
```yaml
parameters:
@@ -185,8 +184,6 @@ parameters:
weight: 0.5
```
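For orientation, here is a minimal sketch of what the full language-model block in `parameters.yml` could look like. The key names other than `weight`, and all paths, are assumptions for illustration; they are not taken from this diff.

```yaml
parameters:
  # ... other prediction parameters (collapsed in the diff above) ...
  language_model:                                              # hypothetical key name
    model: my_dataset/language_model/model_characters.arpa     # hypothetical path to the character-level model
    lexicon: my_dataset/language_model/lexicon_characters.txt  # hypothetical path
    tokens: my_dataset/language_model/tokens.txt               # hypothetical path
    weight: 0.5  # documented above: how much weight to give to the language model
```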
Then, run this command:
```shell
...
```
@@ -35,6 +35,32 @@ To determine the value to use for `dataset.max_char_prediction`, you can use the
| `model.decoder.dec_dim_feedforward` | Number of dimensions for feedforward layer in transformer decoder layers. | `int` | `256` |
| `model.decoder.attention_win` | Length of attention window. | `int` | `100` |
### Language model
This assumes that you have already [trained a language model](../train/language_model.md).
| Name | Description | Type | Default |
| ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ------- |
| `model.lm.path` | Path to the language model. | `str` | |
| `model.lm.weight` | How much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0) as it will affect the quality of the predictions. | `float` | |
!!! note
    - linebreaks are treated as spaces by language models; as a result, predictions will not include linebreaks.
The `model.lm.path` argument expects a path to the language model, but the parent folder should also contain:
- a `lexicon.txt` file,
- a `tokens.txt` file.
You should get the following tree structure:
```
folder/
├── `model.lm.path` # Path to the language model
├── lexicon.txt
└── tokens.txt
```
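To illustrate how these options fit together, here is a minimal sketch using the dotted parameter names from the table above as a nesting guide. The YAML layout and the model filename are assumptions, not taken from this diff.

```yaml
model:
  lm:
    path: folder/model.arpa  # hypothetical language-model file; lexicon.txt and tokens.txt sit next to it
    weight: 1.0              # set carefully, usually between 0.5 and 2.0
```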
## Training parameters
| Name | Description | Type | Default |
@@ -64,7 +90,7 @@ To determine the value to use for `dataset.max_char_prediction`, you can use the
| `training.label_noise_scheduler.max_error_rate` | Maximum ratio of teacher forcing. | `float` | `0.2` |
| `training.label_noise_scheduler.total_num_steps` | Number of steps before stopping teacher forcing. | `float` | `5e4` |
| `training.transfer_learning.encoder` | Model to load for the encoder \[state_dict_name, checkpoint_path, learnable, strict\]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, True]` |
| `training.transfer_learning.decoder` | Model to load for the decoder \[state_dict_name, checkpoint_path, learnable, strict\]. | `list` | `["decoder", "pretrained_models/dan_rimes_page.pt", True, False]` |
- To train on several GPUs, simply set the `training.use_ddp` parameter to `True`. By default, the model will use all available GPUs. To restrict training to fewer GPUs, set the `training.nb_gpu` parameter (see the sketch after this list).
- During the validation stage, the batch size is set to 1. This avoids problems with images of very different sizes within a batch, which would require significant padding and degrade performance.
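Here is a sketch combining the multi-GPU and transfer-learning options above, again using the dotted parameter names as a nesting guide. The YAML layout and the GPU count are illustrative assumptions; the transfer-learning values are the defaults from the table.

```yaml
training:
  use_ddp: True  # train on several GPUs with distributed data parallelism
  nb_gpu: 2      # illustrative: restrict training to 2 GPUs instead of all available ones
  transfer_learning:
    # [state_dict_name, checkpoint_path, learnable, strict]
    encoder: ["encoder", "pretrained_models/dan_rimes_page.pt", True, True]
    decoder: ["decoder", "pretrained_models/dan_rimes_page.pt", True, False]
```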