diff --git a/README.md b/README.md
index c382b4853e28bfddd6c2acbf51f18dcdd4724c18..a9d524ac7de6b38a5ff8f2b1f25faeca7c8824aa 100644
--- a/README.md
+++ b/README.md
@@ -13,12 +13,14 @@ pip install -e .
 For more details about this package, make sure to see the documentation available at https://teklia.gitlab.io/atr/dan/.
 ## Development
+
 For development and tests purpose it may be useful to install the project as a editable package with pip.
 * Use a virtualenv (e.g. with virtualenvwrapper `mkvirtualenv -a . dan`)
 * Install `dan` as a package (e.g. `pip install -e .`)
 ### Linter
+
 Code syntax is analyzed before submitting the code.\
 To run the linter tools suite you may use pre-commit.
 ```shell
@@ -27,6 +29,7 @@ pre-commit run -a
 ```
 ### Run tests
+
 Tests are executed with `tox` using [pytest](https://pytest.org). To install `tox`,
 ```shell
@@ -41,6 +44,16 @@ Run a single test: `tox -- <test_path>::<test_function>`
 The tests use a large file stored via [Git-LFS](https://docs.gitlab.com/ee/topics/git/lfs/). Make sure to run `git-lfs pull` before running them.
+### Update documentation
+
+Please keep the documentation updated when modifying or adding features.
+It's pretty easy to do:
+```shell
+pip install -r doc-requirements.txt
+mkdocs serve
+```
+
+You can then write in Markdown in the relevant `docs/*.md` files, and see the live output at http://localhost:8000.
 ## Inference
@@ -71,6 +84,10 @@ text, confidence_scores = model.predict(image, confidences=True)
 This package provides three subcommands. To get more information about any subcommand, use the `--help` option.
+### Get started
+
+See the [dedicated section](https://teklia.gitlab.io/atr/dan/get_started/training/) on the official DAN documentation.
+
 ### Data extraction from Arkindex
 See the [dedicated section](https://teklia.gitlab.io/atr/dan/usage/datasets/extract/) on the official DAN documentation.
diff --git a/dan/ocr/document/train.py b/dan/ocr/document/train.py
index 7ba39de33f2043016653f88fe4f92de2bcd43562..63b5080a38857a483c3d205296a828b1cfc8d082 100644
--- a/dan/ocr/document/train.py
+++ b/dan/ocr/document/train.py
@@ -163,7 +163,7 @@ def get_config():
         },
         "training_params": {
             "output_folder": "outputs/dan_esposalles_record",  # folder name for checkpoint and results
-            "max_nb_epochs": 710,  # maximum number of epochs before to stop
+            "max_nb_epochs": 800,  # maximum number of epochs before to stop
             "max_training_time": 3600 * 24 * 1.9,  # maximum time before to stop (in seconds)
diff --git a/docs/get_started/development.md b/docs/get_started/development.md
new file mode 100644
index 0000000000000000000000000000000000000000..0e26ccd42725b9a27737b537c95da8ec8347334a
--- /dev/null
+++ b/docs/get_started/development.md
@@ -0,0 +1,36 @@
+# Development
+
+DAN uses several tools during its development.
+
+## Linter
+
+Code syntax is analyzed before submitting the code.
+
+To run the linting tools suite, you may use [pre-commit](https://pre-commit.com).
+
+```shell
+pip install pre-commit
+pre-commit run -a
+```
+
+## Run tests
+
+Tests are executed with [tox](https://tox.wiki) using [pytest](https://pytest.org).
+
+```shell
+pip install tox
+tox
+```
+
+To recreate the tox virtual environment (e.g. after a dependency update), you may run `tox -r`.
+
+## Update documentation
+
+Documentation is built with [MkDocs](https://www.mkdocs.org/).
+
+```shell
+pip install -r doc-requirements.txt
+mkdocs serve
+```
+
+You can then write in Markdown in the relevant `docs/*.md` files, and see the live output at http://localhost:8000.
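+
+If you only need to check that the documentation builds correctly (a suggested extra step, not required by the workflow above), you can also produce a one-off static build:
+
+```shell
+# Build the static site into the default site/ directory
+mkdocs build
+```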
diff --git a/docs/get_started/index.md b/docs/get_started/index.md
new file mode 100644
index 0000000000000000000000000000000000000000..1eba3b4f21f6ff76c1c714be5c423ca4c7b64c00
--- /dev/null
+++ b/docs/get_started/index.md
@@ -0,0 +1,17 @@
+# Get started
+
+To use DAN in your own environment, install it using pip:
+
+```shell
+pip install -e .
+```
+
+To learn more about the newly installed `teklia-dan` command, make sure to run:
+```shell
+teklia-dan --help
+```
+
+Get started with:
+
+* [Development](development.md)
+* [Training workflow](training.md)
diff --git a/docs/get_started/training.md b/docs/get_started/training.md
new file mode 100644
index 0000000000000000000000000000000000000000..8eb5dc0c2c08b8f63d489cedfce0fc8962b717f6
--- /dev/null
+++ b/docs/get_started/training.md
@@ -0,0 +1,74 @@
+# Training workflow
+
+There are several steps to follow when training a DAN model.
+
+## 1. Extract data
+
+The data must be extracted and formatted for training. To extract the data, DAN uses an Arkindex export database in SQLite format. You will need to:
+
+1. Structure the data into folders (`train` / `val` / `test`) in [Arkindex](https://arkindex.teklia.com/).
+2. [Export the project](https://doc.arkindex.org/howto/export/) in SQLite format.
+3. Extract the data with the [extract command](../usage/datasets/extract.md).
+4. Format the data with the [format command](../usage/datasets/format.md).
+
+At the end, you should have a tree structure like this:
+```
+output/
+├── charset.pkl
+├── labels.json
+├── split.json
+├── images
+│   ├── train
+│   ├── val
+│   └── test
+└── labels
+    ├── train
+    ├── val
+    └── test
+```
+
+## 2. Train
+
+The training command does not take any input parameters for now. To train a DAN model, you will therefore need to:
+
+1. Update the parameters listed on the [dedicated page](../usage/train/parameters.md). You will always need to update at least these variables:
+
+   - `dataset_name`, `dataset_level`, `dataset_variant` and `dataset_path`,
+   - `model_params.transfer_learning.*[checkpoint_path]` to finetune an existing model,
+   - `training_params.output_folder`.
+
+2. Train a DAN model with the [train command](../usage/train/index.md).
+
+## 3. Predict
+
+Once the training is complete, you can apply a trained DAN model to an image.
+
+To do this, you will need to:
+
+1. Create a `parameters.yml` file using the parameters saved during training in the `params` file, located in `{training_params.output_folder}/results`. This file should have the following format:
+```yml
+version: 0.0.1
+parameters:
+  mean: [float, float, float]
+  std: [float, float, float]
+  max_char_prediction: int
+  encoder:
+    input_channels: int
+    dropout: float
+  decoder:
+    enc_dim: int
+    l_max: int
+    dec_pred_dropout: float
+    attention_win: int
+    use_1d_pe: bool
+    use_lstm: bool
+    vocab_size: int
+    h_max: int
+    w_max: int
+    dec_num_layers: int
+    dec_dim_feedforward: int
+    dec_num_heads: int
+    dec_att_dropout: float
+    dec_res_dropout: float
+```
+2. Apply a trained DAN model to an image using the [predict command](../usage/predict.md), as in the example below.
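+
+For reference, a minimal invocation might look like the sketch below. The option names are those documented for the predict command; the file paths are only placeholders to replace with your own model, charset and `parameters.yml`:
+
+```shell
+teklia-dan predict \
+    --image path/to/image.jpg \
+    --model path/to/model.pt \
+    --parameters path/to/parameters.yml \
+    --charset path/to/charset.pkl \
+    --output path/to/output/
+```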
diff --git a/docs/index.md b/docs/index.md
index 925c2f15a016e517a35b053e857686a6121e4244..9aef9d1a4782ab746be76f990c7553ee1982713e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -9,36 +9,4 @@ The model uses a character-level attention to handle slanted lines:
 Click [here](original_paper.md) to learn more about the model and how it fares against SOTA models.
-## Getting started
-
-To use DAN in your own environment, install it using pip:
-
-```shell
-pip install -e .
-```
-
-To learn more about the newly installed `teklia-dan` command, make sure to run:
-```shell
-teklia-dan --help
-```
-
-## Linter
-
-Code syntax is analyzed before submitting the code.\
-To run the linter tools suite you may use pre-commit.
-
-```shell
-pip install pre-commit
-pre-commit run -a
-```
-
-## Run tests
-
-Tests are executed with tox using [pytest](https://pytest.org).
-
-```shell
-pip install tox
-tox
-```
-
-To recreate tox virtual environment (e.g. a dependencies update), you may run `tox -r`
+[Get started with DAN](get_started/index.md) now!
diff --git a/docs/usage/predict.md b/docs/usage/predict.md
index a15d425ce7392fa8cc8d4b0c9d6f833d3e3443fd..0532c603a85179d98345c59a279cb85a5d3db425 100644
--- a/docs/usage/predict.md
+++ b/docs/usage/predict.md
@@ -1,29 +1,29 @@
 # Predict
-Use the `teklia-dan predict` command to predict a trained DAN model on an image.
+Use the `teklia-dan predict` command to apply a trained DAN model to an image.
 ## Description of parameters
-| Parameter | Description | Type | Default |
-| --------------------------- | ---------------------------------------------------------------------------------------------- | ------- | ------------- |
-| `--image` | Path to the image to predict. Must not be provided with `--image-dir`. | `Path` | |
-| `--image-dir` | Path to the folder where the images to predict are stored. Must not be provided with `--image`. | `Path` | |
-| `--image-extension` | The extension of the images in the folder. Ignored if `--image-dir` is not provided. | `str` | .jpg |
-| `--model` | Path to the model to use for prediction | `Path` | |
-| `--parameters` | Path to the YAML parameters file. | `Path` | |
-| `--charset` | Path to the charset file. | `Path` | |
-| `--output` | Path to the output folder. Results will be saved in this directory. | `Path` | |
-| `--scale` | Image scaling factor before feeding it to DAN. | `float` | `1.0` |
-| `--confidence-score` | Whether to return confidence scores. | `bool` | `False` |
-| `--confidence-score-levels` | Level to return confidence scores. Should be any combination of `["line", "word", "char"]`. | `str` | |
-| `--attention-map` | Whether to plot attention maps. | `bool` | `False` |
-| `--attention-map-scale` | Image scaling factor before creating the GIF. | `float` | `0.5` |
-| `--attention-map-level` | Level to plot the attention maps. Should be in `["line", "word", "char"]`. | `str` | `"line"` |
-| `--predict-objects` | Whether to return polygons coordinates. | `bool` | `False` |
-| `--word-separators` | List of word separators. | `list` | `[" ", "\n"]` |
-| `--line-separators` | List of line separators. | `list` | `["\n"]` |
-| `--threshold-method` | Method to use for attention mask thresholding. Should be in `["otsu", "simple"]`. | `str` | `"otsu"` |
-| `--threshold-value ` | Threshold to use for the "simple" thresholding method. | `int` | `0` |
+| Parameter | Description | Type | Default |
+| --------------------------- | ---------------------------------------------------------------------------------------------- | ------- | ------------- |
+| `--image` | Path to the image to predict. Must not be provided with `--image-dir`. | `Path` | |
+| `--image-dir` | Path to the folder where the images to predict are stored. Must not be provided with `--image`. | `Path` | |
+| `--image-extension` | The extension of the images in the folder. Ignored if `--image-dir` is not provided. | `str` | .jpg |
+| `--model` | Path to the model to use for prediction. | `Path` | |
+| `--parameters` | Path to the YAML parameters file. | `Path` | |
+| `--charset` | Path to the charset file. | `Path` | |
+| `--output` | Path to the output folder. Results will be saved in this directory. | `Path` | |
+| `--scale` | Image scaling factor before feeding it to DAN. | `float` | `1.0` |
+| `--confidence-score` | Whether to return confidence scores. | `bool` | `False` |
+| `--confidence-score-levels` | Level to return confidence scores. Should be any combination of `["line", "word", "char"]`. | `str` | |
+| `--attention-map` | Whether to plot attention maps. | `bool` | `False` |
+| `--attention-map-scale` | Image scaling factor before creating the GIF. | `float` | `0.5` |
+| `--attention-map-level` | Level to plot the attention maps. Should be in `["line", "word", "char"]`. | `str` | `"line"` |
+| `--predict-objects` | Whether to return polygon coordinates. | `bool` | `False` |
+| `--word-separators` | List of word separators. | `list` | `[" ", "\n"]` |
+| `--line-separators` | List of line separators. | `list` | `["\n"]` |
+| `--threshold-method` | Method to use for attention mask thresholding. Should be in `["otsu", "simple"]`. | `str` | `"otsu"` |
+| `--threshold-value` | Threshold to use for the "simple" thresholding method. | `int` | `0` |
 ## Examples
diff --git a/docs/usage/train/index.md b/docs/usage/train/index.md
index 2627b06902f5f353e7a05eb497550767edb5925d..2f316e7f1a01c24773d074e3a2bb3ee5ca713b1c 100644
--- a/docs/usage/train/index.md
+++ b/docs/usage/train/index.md
@@ -18,7 +18,7 @@ To train DAN on documents:
 To train DAN on lines, run `teklia-dan train document` with a line dataset.
-## Additional page
+## Additional pages
 * [Jean Zay tutorial](jeanzay.md)
 * [Data augmentation](augmentation.md)
diff --git a/docs/usage/train/parameters.md b/docs/usage/train/parameters.md
index ce10c3299fd5d0ab8624cf15e30a668246be295f..e2aad9531cad0f0775e33fa93ae9d9ac5283a6a2 100644
--- a/docs/usage/train/parameters.md
+++ b/docs/usage/train/parameters.md
@@ -1,22 +1,23 @@
 # Training configuration
-All hyperparameters are specified and editable in the training scripts (meaning are in comments). This page introduces some useful keys and their description.
+All hyperparameters are specified and editable in the training script `dan/ocr/document/train.py::get_config` (descriptions are in comments). This page introduces some useful keys and their descriptions.
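+
+In practice, this means opening `dan/ocr/document/train.py` and editing the configuration inside `get_config()`. The sketch below is schematic rather than a verbatim excerpt of that file, and the values shown are only placeholders:
+
+```python
+# Schematic excerpt of get_config() (placeholder values, not the verbatim file)
+"training_params": {
+    "output_folder": "outputs/dan_esposalles_record",  # folder name for checkpoint and results
+    "max_nb_epochs": 800,  # maximum number of epochs before stopping
+    "max_training_time": 3600 * 24 * 1.9,  # maximum training time, in seconds
+},
+```
+
+The dataset keys described below (`dataset_name`, `dataset_level`, `dataset_variant`, `dataset_path`) are edited in the same way.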
 ## Dataset parameters
-| Parameter | Description | Type | Default |
-| --------------------------------------- | ---------------------------------------------------------------------------------------- | ------------ | ---------------------------------------------------- |
-| `dataset_name` | Name of the dataset. | `str` | |
-| `dataset_level` | Level of the dataset. Should be named after the element type. | `str` | |
-| `dataset_variant` | Variant of the dataset. Usually empty for HTR datasets, `"_sem"` for HTR+NER datasets. | `str` | |
-| `dataset_path` | Path to the dataset. | `str` | |
-| `dataset_params.config.dataset_manager` | Dataset manager class. | custom class | `OCRDatasetManager` |
-| `dataset_params.config.dataset_class` | Dataset class. | custom class | `OCRDataset` |
-| `dataset_params.config.datasets` | Dataset dictionary with the dataset name as key and dataset path as value. | `dict` | |
-| `dataset_params.config.load_in_memory` | Load all images in CPU memory. | `str` | `True` |
-| `dataset_params.config.worker_per_gpu` | Number of parallel processes per gpu for data loading. | `int` | `4` |
-| `dataset_params.config.preprocessings` | List of pre-processing functions to apply to input images. | `list` | (see [dedicated section](#data-preprocessing)) |
-| `dataset_params.config.augmentation` | Whether to use data augmentation on the training set. | `bool` | `True` (see [dedicated section](#data-augmentation)) |
+| Parameter | Description | Type | Default |
+| -------------------------------------- | ---------------------------------------------------------------------------------------- | ------ | ---------------------------------------------------- |
+| `dataset_name` | Name of the dataset. | `str` | |
+| `dataset_level` | Level of the dataset. Should be named after the element type. | `str` | |
+| `dataset_variant` | Variant of the dataset. Usually empty for HTR datasets, `"_sem"` for HTR+NER datasets. | `str` | |
+| `dataset_path` | Path to the dataset. | `str` | |
+| `dataset_params.config.load_in_memory` | Load all images in CPU memory. | `bool` | `True` |
+| `dataset_params.config.worker_per_gpu` | Number of parallel processes per gpu for data loading. | `int` | `4` |
+| `dataset_params.config.preprocessings` | List of pre-processing functions to apply to input images. | `list` | (see [dedicated section](#data-preprocessing)) |
+| `dataset_params.config.augmentation` | Whether to use data augmentation on the training set. | `bool` | `True` (see [dedicated section](#data-augmentation)) |
+
+!!! warning
+    The variables `dataset_name`, `dataset_level`, `dataset_variant` and `dataset_path` must have values such that the data is located in `{dataset_path}/{dataset_name}_{dataset_level}{dataset_variant}`.
+
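+For example, with the hypothetical values `dataset_name = "esposalles"`, `dataset_level = "record"` and an empty `dataset_variant`, the data would be expected under `{dataset_path}/esposalles_record`, typically the tree produced by the extraction step:
+
+```
+{dataset_path}/
+└── esposalles_record/
+    ├── charset.pkl
+    ├── labels.json
+    ├── split.json
+    ├── images/
+    └── labels/
+```
+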
 ### Data preprocessing
@@ -123,33 +124,33 @@ For a detailed description of all augmentation transforms, see the [dedicated pa
 ## Model parameters
-| Name | Description | Type | Default |
-| ----------------------------------------- | ------------------------------------------------------------------------------------ | ------------- | ----------------------------------------------------------------- |
-| `model_params.models.encoder` | Encoder class. | custom class | `FCN_encoder` |
-| `model_params.models.decoder` | Decoder class. | custom class | `GlobalHTADecoder` |
-| `model_params.transfer_learning.encoder` | Model to load for the encoder [state_dict_name, checkpoint_path, learnable, strict]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, True]` |
-| `model_params.transfer_learning.decoder` | Model to load for the decoder [state_dict_name, checkpoint_path, learnable, strict]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, False]` |
-| `model_params.transfered_charset` | Transfer learning of the decision layer based on charset of the model to transfer. | `bool` | `True` |
-| `model_params.additional_tokens` | For decision layer = [<eot>, ], only for transferred charset. | `int` | `1` |
-| `model_params.input_channels` | Number of channels of input image. | `int` | `3` |
-| `model_params.dropout` | Dropout probability in the encoder. | `float` | `0.5` |
-| `model_params.enc_dim` | Dimension of features extracted by the encoder. | `int` | `256` |
-| `model_params.nb_layers` | Number of layers in the encoder. | `int` | `5` |
-| `model_params.h_max` | Maximum height for encoder output (for 2D positional embedding). | `int` | `500` |
-| `model_params.w_max` | Maximum width for encoder output (for 2D positional embedding). | `int` | `1000` |
-| `model_params.l_max` | Maximum predicted sequence length (for 1D positional embedding). | `int` | `15000` |
-| `model_params.dec_num_layers` | Number of transformer decoder layers. | `int` | `8` |
-| `model_params.dec_num_heads` | Number of heads in transformer decoder layers. | `int` | `4` |
-| `model_params.dec_res_dropout` | Dropout probability in transformer decoder layers. | `int` | `0.1` |
-| `model_params.dec_pred_dropout` | Dropout rate before decision layer. | `float` | `0.1` |
-| `model_params.dec_att_dropout` | Dropout rate in multi head attention. | `float` | `0.1` |
-| `model_params.dec_dim_feedforward` | Number of dimensions for feedforward layer in transformer decoder layers. | `int` | `256` |
-| `model_params.use_2d_pe` | Whether to use 2D positional embedding. | `bool` | `True` |
-| `model_params.use_1d_pe` | Whether to use 1D positional embedding. | `bool` | `True` |
-| `model_params.use_lstm` | Whether to use a LSTM layer in the decoder. | `bool` | `False` |
-| `model_params.attention_win` | Length of attention window. | `int` | `100` |
-| `model_params.dropout_scheduler.function` | Curriculum dropout scheduler. | custom class. | `100` |
-| `model_params.dropout_scheduler.T` | Exponential factor. | `float` | `5e4` |
+| Name | Description | Type | Default |
+| ----------------------------------------- | ------------------------------------------------------------------------------------ | ------------ | ----------------------------------------------------------------- |
+| `model_params.models.encoder` | Encoder class. | custom class | `FCN_encoder` |
+| `model_params.models.decoder` | Decoder class. | custom class | `GlobalHTADecoder` |
+| `model_params.transfer_learning.encoder` | Model to load for the encoder [state_dict_name, checkpoint_path, learnable, strict]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, True]` |
+| `model_params.transfer_learning.decoder` | Model to load for the decoder [state_dict_name, checkpoint_path, learnable, strict]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, False]` |
+| `model_params.transfered_charset` | Transfer learning of the decision layer based on charset of the model to transfer. | `bool` | `True` |
+| `model_params.additional_tokens` | For decision layer = [<eot>, ], only for transferred charset. | `int` | `1` |
+| `model_params.input_channels` | Number of channels of input image. | `int` | `3` |
+| `model_params.dropout` | Dropout probability in the encoder. | `float` | `0.5` |
+| `model_params.enc_dim` | Dimension of features extracted by the encoder. | `int` | `256` |
+| `model_params.nb_layers` | Number of layers in the encoder. | `int` | `5` |
+| `model_params.h_max` | Maximum height for encoder output (for 2D positional embedding). | `int` | `500` |
+| `model_params.w_max` | Maximum width for encoder output (for 2D positional embedding). | `int` | `1000` |
+| `model_params.l_max` | Maximum predicted sequence length (for 1D positional embedding). | `int` | `15000` |
+| `model_params.dec_num_layers` | Number of transformer decoder layers. | `int` | `8` |
+| `model_params.dec_num_heads` | Number of heads in transformer decoder layers. | `int` | `4` |
+| `model_params.dec_res_dropout` | Dropout probability in transformer decoder layers. | `float` | `0.1` |
+| `model_params.dec_pred_dropout` | Dropout rate before decision layer. | `float` | `0.1` |
+| `model_params.dec_att_dropout` | Dropout rate in multi-head attention. | `float` | `0.1` |
+| `model_params.dec_dim_feedforward` | Number of dimensions for feedforward layer in transformer decoder layers. | `int` | `256` |
+| `model_params.use_2d_pe` | Whether to use 2D positional embedding. | `bool` | `True` |
+| `model_params.use_1d_pe` | Whether to use 1D positional embedding. | `bool` | `True` |
+| `model_params.use_lstm` | Whether to use an LSTM layer in the decoder. | `bool` | `False` |
+| `model_params.attention_win` | Length of attention window. | `int` | `100` |
+| `model_params.dropout_scheduler.function` | Curriculum dropout scheduler. | custom class | `exponential_dropout_scheduler` |
+| `model_params.dropout_scheduler.T` | Exponential factor. | `float` | `5e4` |
 ## Training parameters
@@ -158,17 +159,17 @@ For a detailed description of all augmentation transforms, see the [dedicated pa
 | ------------------------------------------------------- | --------------------------------------------------------------------------- | ------------ | ------------------------------------------- |
 | `training_params.output_folder` | Directory for checkpoint and results. | `str` | |
 | `training_params.max_nb_epochs` | Maximum number of epochs before stopping training. | `int` | `800` |
-| `training_params.max_training_time` | Maximum time (in seconds) before stopping training. | `int` | `350000` |
+| `training_params.max_training_time` | Maximum time (in seconds) before stopping training. | `int` | `164160` |
 | `training_params.load_epoch` | Model to load. Should be either `"best"` (evaluation) or `last` (training). | `str` | `"last"` |
 | `training_params.interval_save_weights` | Step to save weights. Set to `None` to keep only best and last epochs. | `int` | `None` |
 | `training_params.batch_size` | Mini-batch size for the training loop. | `int` | `2` |
 | `training_params.use_ddp` | Whether to use DistributedDataParallel. | `bool` | `False` |
 | `training_params.ddp_port` | DDP port. | `int` | `20027` |
-| `training_params.use_amp` | Whether to enable automatic mix-precision. | `int` | `torch.cuda.device_count()` |
-| `training_params.nb_gpu` | Number of GPUs to train DAN. | `str` | |
+| `training_params.use_amp` | Whether to enable automatic mixed precision. | `bool` | `True` |
+| `training_params.nb_gpu` | Number of GPUs to train DAN. | `int` | `torch.cuda.device_count()` |
 | `training_params.optimizers.all.class` | Optimizer class. | custom class | `Adam` |
 | `training_params.optimizers.all.args.lr` | Learning rate for the optimizer. | `float` | `0.0001` |
-| `training_params.optimizers.all.args.amsgrad` | Whether to use AMSGrad optimization. | custom class | `False` |
+| `training_params.optimizers.all.args.amsgrad` | Whether to use AMSGrad optimization. | `bool` | `False` |
 | `training_params.lr_schedulers` | Learning rate schedulers. | custom class | `None` |
 | `training_params.eval_on_valid` | Whether to evaluate and log metrics on the validation set during training. | `bool` | `True` |
 | `training_params.eval_on_valid_interval` | Interval (in epochs) to evaluate during training. | `int` | `5` |
@@ -176,7 +177,7 @@ For a detailed description of all augmentation transforms, see the [dedicated pa
 | `training_params.expected_metric_value` | Best value for the focus metric. Should be either `"high"` or `"low"`. | `low` | `cer` |
 | `training_params.set_name_focus_metric` | Dataset to focus on to select best weights. | `str` | |
 | `training_params.train_metrics` | List of metrics to compute during training. | `list` | `["loss_ce", "cer", "wer", "wer_no_punct"]` |
-| `training_params.train_metrics` | List of metrics to compute during validation. | `list` | `["cer", "wer", "wer_no_punct"]` |
+| `training_params.eval_metrics` | List of metrics to compute during validation. | `list` | `["cer", "wer", "wer_no_punct"]` |
 | `training_params.force_cpu` | Whether to train on CPU (for debugging). | `bool` | `False` |
 | `training_params.max_char_prediction` | Maximum number of characters to predict. | `int` | `1000` |
 | `training_params.label_noise_scheduler.min_error_rate` | Minimum ratio of teacher forcing. | `float` | `0.2` |
diff --git a/mkdocs.yml b/mkdocs.yml
index 3c7787d4cb9616024eb7203881731e841bf4ad30..79cc177cf8abda07c90289b4c3457208be422780 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -53,6 +53,10 @@ plugins:
 nav:
   - Home: index.md
   - Original implementation: original_paper.md
+  - Get started:
+      - get_started/index.md
+      - Development: get_started/development.md
+      - Training: get_started/training.md
   - Usage:
       - usage/index.md
       - Datasets: