diff --git a/docs/usage/predict/index.md b/docs/usage/predict/index.md
index 8654a88dc48da402206ecd0d5ef5f59ff9b6697a..656011b1966cefc03de25ebbe99448ded57feeab 100644
--- a/docs/usage/predict/index.md
+++ b/docs/usage/predict/index.md
@@ -166,14 +166,13 @@ It will create the following JSON file named after the image and a GIF showing a
 This example assumes that you have already [trained a language model](../train/language_model.md).
 
-Note that:
-
-- the `weight` parameter defines how much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0) as it will affect the quality of the predictions.
-- linebreaks are treated as spaces by language models, as a result predictions will not include linebreaks.
+!!! note
+    - the `weight` parameter defines how much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0) as it will affect the quality of the predictions.
+    - linebreaks are treated as spaces by language models; as a result, predictions will not include linebreaks.
 
 #### Language model at character level
 
-First, update the `parameters.yml` file obtained during DAN training.
+Update the `parameters.yml` file obtained during DAN training.
 
 ```yaml
 parameters:
@@ -185,8 +184,6 @@ parameters:
     weight: 0.5
 ```
 
-Note that the `weight` parameter defines how much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0) as it will affect the quality of the predictions.
-
 Then, run this command:
 
 ```shell
diff --git a/docs/usage/train/config.md b/docs/usage/train/config.md
index ac7e1d60102bfcfc5bce97b6e24124590bdce402..8384b637cd4ebebb95c8753b21f003c9cfcb394b 100644
--- a/docs/usage/train/config.md
+++ b/docs/usage/train/config.md
@@ -35,6 +35,32 @@ To determine the value to use for `dataset.max_char_prediction`, you can use the
 | `model.decoder.dec_dim_feedforward` | Number of dimensions for feedforward layer in transformer decoder layers. | `int` | `256` |
 | `model.decoder.attention_win` | Length of attention window. | `int` | `100` |
 
+### Language model
+
+This assumes that you have already [trained a language model](../train/language_model.md).
+
+| Name | Description | Type | Default |
+| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ------- |
+| `model.lm.path` | Path to the language model. | `str` | |
+| `model.lm.weight` | How much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0) as it will affect the quality of the predictions. | `float` | |
+
+!!! note
+    - linebreaks are treated as spaces by language models; as a result, predictions will not include linebreaks.
+
+The `model.lm.path` argument expects a path to the language model, but the parent folder should also contain:
+
+- a `lexicon.txt` file,
+- a `tokens.txt` file.
+
+You should get the following tree structure:
+
+```
+folder/
+├── `model.lm.path` # Path to the language model
+├── lexicon.txt
+└── tokens.txt
+```
+
 ## Training parameters
 
 | Name | Description | Type | Default |
@@ -64,7 +90,7 @@ To determine the value to use for `dataset.max_char_prediction`, you can use the
 | `training.label_noise_scheduler.max_error_rate` | Maximum ratio of teacher forcing. | `float` | `0.2` |
 | `training.label_noise_scheduler.total_num_steps` | Number of steps before stopping teacher forcing. | `float` | `5e4` |
 | `training.transfer_learning.encoder` | Model to load for the encoder \[state_dict_name, checkpoint_path, learnable, strict\]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, True]` |
-| `training.transfer_learning.decoder` | Model to load for the decoder \[state_dict_name, checkpoint_path, learnable, strict\]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, False]` |
+| `training.transfer_learning.decoder` | Model to load for the decoder \[state_dict_name, checkpoint_path, learnable, strict\]. | `list` | `["decoder", "pretrained_models/dan_rimes_page.pt", True, False]` |
 
 - To train on several GPUs, simply set the `training.use_ddp` parameter to `True`. By default, the model will use all available GPUs. To restrict access to fewer GPUs, one can modify the `training.nb_gpu` parameter.
 - During the validation stage, the batch size is set to 1. This avoids problems associated with image sizes that can be very different inside batches and lead to significant padding, resulting in performance degradations.
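
For reference, the dotted names in the tables touched by this patch (`model.lm.path`, `model.lm.weight`, `training.transfer_learning.encoder`, ...) refer to nested configuration keys. Below is a minimal sketch of the layout they imply, assuming a nested YAML structure along the lines of the `parameters.yml` example in the prediction docs; the file paths are placeholders, not values shipped with DAN.

```yaml
# Hypothetical sketch of the nesting implied by the dotted option names.
# Paths are placeholders; the language model, lexicon.txt and tokens.txt are
# produced as described in docs/usage/train/language_model.md.
model:
  lm:
    path: language_model/model.arpa  # parent folder must also contain lexicon.txt and tokens.txt
    weight: 1.0                      # usually between 0.5 and 2.0
training:
  transfer_learning:
    # [state_dict_name, checkpoint_path, learnable, strict]
    encoder: ["encoder", "pretrained_models/dan_rimes_page.pt", True, True]
    decoder: ["decoder", "pretrained_models/dan_rimes_page.pt", True, False]
```

Whichever format the training entry point actually reads, `lexicon.txt` and `tokens.txt` must sit next to the file referenced by `model.lm.path`, as shown in the tree structure of the new "Language model" section.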