diff --git a/README.md b/README.md
index c382b4853e28bfddd6c2acbf51f18dcdd4724c18..a9d524ac7de6b38a5ff8f2b1f25faeca7c8824aa 100644
--- a/README.md
+++ b/README.md
@@ -13,12 +13,14 @@ pip install -e .
 For more details about this package, make sure to see the documentation available at https://teklia.gitlab.io/atr/dan/.
 ## Development
+
 For development and tests purpose it may be useful to install the project as a editable package with pip.
 * Use a virtualenv (e.g. with virtualenvwrapper `mkvirtualenv -a . dan`)
 * Install `dan` as a package (e.g. `pip install -e .`)
 ### Linter
+
 Code syntax is analyzed before submitting the code.\
 To run the linter tools suite you may use pre-commit.
 ```shell
@@ -27,6 +29,7 @@ pre-commit run -a
 ```
 ### Run tests
+
 Tests are executed with `tox` using [pytest](https://pytest.org). To install `tox`,
 ```shell
@@ -41,6 +44,16 @@ Run a single test: `tox -- <test_path>::<test_function>`
 The tests use a large file stored via [Git-LFS](https://docs.gitlab.com/ee/topics/git/lfs/). Make sure to run `git-lfs pull` before running them.
+### Update documentation
+
+Please keep the documentation updated when modifying or adding features.
+It's pretty easy to do:
+```shell
+pip install -r doc-requirements.txt
+mkdocs serve
+```
+
+You can then write in Markdown in the relevant `docs/*.md` files, and see the live output at http://localhost:8000.
 ## Inference
@@ -71,6 +84,10 @@ text, confidence_scores = model.predict(image, confidences=True)
 This package provides three subcommands. To get more information about any subcommand, use the `--help` option.
+### Get started
+
+See the [dedicated section](https://teklia.gitlab.io/atr/dan/get_started/training/) on the official DAN documentation.
+
 ### Data extraction from Arkindex
 See the [dedicated section](https://teklia.gitlab.io/atr/dan/usage/datasets/extract/) on the official DAN documentation.
diff --git a/dan/ocr/document/train.py b/dan/ocr/document/train.py
index 7ba39de33f2043016653f88fe4f92de2bcd43562..63b5080a38857a483c3d205296a828b1cfc8d082 100644
--- a/dan/ocr/document/train.py
+++ b/dan/ocr/document/train.py
@@ -163,7 +163,7 @@ def get_config():
         },
         "training_params": {
             "output_folder": "outputs/dan_esposalles_record",  # folder name for checkpoint and results
-            "max_nb_epochs": 710,  # maximum number of epochs before to stop
+            "max_nb_epochs": 800,  # maximum number of epochs before to stop
             "max_training_time": 3600 * 24 * 1.9,  # maximum time before to stop (in seconds)
diff --git a/docs/get_started/development.md b/docs/get_started/development.md
new file mode 100644
index 0000000000000000000000000000000000000000..0e26ccd42725b9a27737b537c95da8ec8347334a
--- /dev/null
+++ b/docs/get_started/development.md
@@ -0,0 +1,36 @@
+# Development
+
+DAN uses several tools during its development.
+
+## Linter
+
+Code syntax is analyzed before submitting the code.
+
+To run the linting tools suite, you may use [pre-commit](https://pre-commit.com).
+
+```shell
+pip install pre-commit
+pre-commit run -a
+```
+
+## Run tests
+
+Tests are executed with [tox](https://tox.wiki) using [pytest](https://pytest.org).
+
+```shell
+pip install tox
+tox
+```
+
+To recreate the tox virtual environment (e.g. after a dependency update), you may run `tox -r`.
+
+## Update documentation
+
+Documentation is built with [MkDocs](https://www.mkdocs.org/).
+
+```shell
+pip install -r doc-requirements.txt
+mkdocs serve
+```
+
+You can then write in Markdown in the relevant `docs/*.md` files, and see the live output at http://localhost:8000.
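+
+If you only need to check that the documentation builds correctly (a suggested extra step, not required by the workflow above), you can also produce a one-off static build:
+
+```shell
+# Build the static site into the default site/ directory
+mkdocs build
+```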
diff --git a/docs/get_started/index.md b/docs/get_started/index.md
new file mode 100644
index 0000000000000000000000000000000000000000..1eba3b4f21f6ff76c1c714be5c423ca4c7b64c00
--- /dev/null
+++ b/docs/get_started/index.md
@@ -0,0 +1,17 @@
+# Get started
+
+To use DAN in your own environment, install it using pip:
+
+```shell
+pip install -e .
+```
+
+To learn more about the newly installed `teklia-dan` command, make sure to run:
+```shell
+teklia-dan --help
+```
+
+Get started with:
+
+* [Development](development.md)
+* [Training workflow](training.md)
diff --git a/docs/get_started/training.md b/docs/get_started/training.md
new file mode 100644
index 0000000000000000000000000000000000000000..8eb5dc0c2c08b8f63d489cedfce0fc8962b717f6
--- /dev/null
+++ b/docs/get_started/training.md
@@ -0,0 +1,74 @@
+# Training workflow
+
+There are several steps to follow when training a DAN model.
+
+## 1. Extract data
+
+The data must be extracted and formatted for training. To extract the data, DAN uses an Arkindex export database in SQLite format. You will need to:
+
+1. Structure the data into folders (`train` / `val` / `test`) in [Arkindex](https://arkindex.teklia.com/).
+2. [Export the project](https://doc.arkindex.org/howto/export/) in SQLite format.
+3. Extract the data with the [extract command](../usage/datasets/extract.md).
+4. Format the data with the [format command](../usage/datasets/format.md).
+
+At the end, you should have a tree structure like this:
+```
+output/
+├── charset.pkl
+├── labels.json
+├── split.json
+├── images
+│   ├── train
+│   ├── val
+│   └── test
+└── labels
+    ├── train
+    ├── val
+    └── test
+```
+
+## 2. Train
+
+The training command does not take any input parameters for now. To train a DAN model, you will therefore need to:
+
+1. Update the parameters listed on the [dedicated page](../usage/train/parameters.md). You will always need to update at least these variables:
+
+   - `dataset_name`, `dataset_level`, `dataset_variant` and `dataset_path`,
+   - `model_params.transfer_learning.*[checkpoint_path]` to finetune an existing model,
+   - `training_params.output_folder`.
+
+2. Train a DAN model with the [train command](../usage/train/index.md).
+
+## 3. Predict
+
+Once the training is complete, you can apply a trained DAN model to an image.
+
+To do this, you will need to:
+
+1. Create a `parameters.yml` file using the parameters saved during training in the `params` file, located in `{training_params.output_folder}/results`. This file should have the following format:
+```yml
+version: 0.0.1
+parameters:
+  mean: [float, float, float]
+  std: [float, float, float]
+  max_char_prediction: int
+  encoder:
+    input_channels: int
+    dropout: float
+  decoder:
+    enc_dim: int
+    l_max: int
+    dec_pred_dropout: float
+    attention_win: int
+    use_1d_pe: bool
+    use_lstm: bool
+    vocab_size: int
+    h_max: int
+    w_max: int
+    dec_num_layers: int
+    dec_dim_feedforward: int
+    dec_num_heads: int
+    dec_att_dropout: float
+    dec_res_dropout: float
+```
+2. Apply a trained DAN model to an image using the [predict command](../usage/predict.md), as in the example below.
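+
+For reference, a minimal invocation might look like the sketch below. The option names are those documented for the predict command; the file paths are only placeholders to replace with your own model, charset and `parameters.yml`:
+
+```shell
+teklia-dan predict \
+    --image path/to/image.jpg \
+    --model path/to/model.pt \
+    --parameters path/to/parameters.yml \
+    --charset path/to/charset.pkl \
+    --output path/to/output/
+```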
diff --git a/docs/index.md b/docs/index.md
index 925c2f15a016e517a35b053e857686a6121e4244..9aef9d1a4782ab746be76f990c7553ee1982713e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -9,36 +9,4 @@ The model uses a character-level attention to handle slanted lines:
 Click [here](original_paper.md) to learn more about the model and how it fares against SOTA models.
-## Getting started
-
-To use DAN in your own environment, install it using pip:
-
-```shell
-pip install -e .
-```
-
-To learn more about the newly installed `teklia-dan` command, make sure to run:
-```shell
-teklia-dan --help
-```
-
-## Linter
-
-Code syntax is analyzed before submitting the code.\
-To run the linter tools suite you may use pre-commit.
-
-```shell
-pip install pre-commit
-pre-commit run -a
-```
-
-## Run tests
-
-Tests are executed with tox using [pytest](https://pytest.org).
-
-```shell
-pip install tox
-tox
-```
-
-To recreate tox virtual environment (e.g. a dependencies update), you may run `tox -r`
+[Get started with DAN](get_started/index.md) now!
diff --git a/docs/usage/predict.md b/docs/usage/predict.md
index a15d425ce7392fa8cc8d4b0c9d6f833d3e3443fd..0532c603a85179d98345c59a279cb85a5d3db425 100644
--- a/docs/usage/predict.md
+++ b/docs/usage/predict.md
@@ -1,29 +1,29 @@
 # Predict
-Use the `teklia-dan predict` command to predict a trained DAN model on an image.
+Use the `teklia-dan predict` command to apply a trained DAN model to an image.
 ## Description of parameters
-| Parameter | Description | Type | Default |
-| --------------------------- | ---------------------------------------------------------------------------------------------- | ------- | ------------- |
-| `--image` | Path to the image to predict. Must not be provided with `--image-dir`. | `Path` | |
-| `--image-dir` | Path to the folder where the images to predict are stored. Must not be provided with `--image`. | `Path` | |
-| `--image-extension` | The extension of the images in the folder. Ignored if `--image-dir` is not provided. | `str` | .jpg |
-| `--model` | Path to the model to use for prediction | `Path` | |
-| `--parameters` | Path to the YAML parameters file. | `Path` | |
-| `--charset` | Path to the charset file. | `Path` | |
-| `--output` | Path to the output folder. Results will be saved in this directory. | `Path` | |
-| `--scale` | Image scaling factor before feeding it to DAN. | `float` | `1.0` |
-| `--confidence-score` | Whether to return confidence scores. | `bool` | `False` |
-| `--confidence-score-levels` | Level to return confidence scores. Should be any combination of `["line", "word", "char"]`. | `str` | |
-| `--attention-map` | Whether to plot attention maps. | `bool` | `False` |
-| `--attention-map-scale` | Image scaling factor before creating the GIF. | `float` | `0.5` |
-| `--attention-map-level` | Level to plot the attention maps. Should be in `["line", "word", "char"]`. | `str` | `"line"` |
-| `--predict-objects` | Whether to return polygons coordinates. | `bool` | `False` |
-| `--word-separators` | List of word separators. | `list` | `[" ", "\n"]` |
-| `--line-separators` | List of line separators. | `list` | `["\n"]` |
-| `--threshold-method` | Method to use for attention mask thresholding. Should be in `["otsu", "simple"]`. | `str` | `"otsu"` |
-| `--threshold-value ` | Threshold to use for the "simple" thresholding method. | `int` | `0` |
+| Parameter | Description | Type | Default |
+| --------------------------- | ---------------------------------------------------------------------------------------------- | ------- | ------------- |
+| `--image` | Path to the image to predict. Must not be provided with `--image-dir`. | `Path` | |
+| `--image-dir` | Path to the folder where the images to predict are stored. Must not be provided with `--image`. | `Path` | |
+| `--image-extension` | The extension of the images in the folder. Ignored if `--image-dir` is not provided. | `str` | .jpg |
+| `--model` | Path to the model to use for prediction. | `Path` | |
+| `--parameters` | Path to the YAML parameters file. | `Path` | |
+| `--charset` | Path to the charset file. | `Path` | |
+| `--output` | Path to the output folder. Results will be saved in this directory. | `Path` | |
+| `--scale` | Image scaling factor before feeding it to DAN. | `float` | `1.0` |
+| `--confidence-score` | Whether to return confidence scores. | `bool` | `False` |
+| `--confidence-score-levels` | Level to return confidence scores. Should be any combination of `["line", "word", "char"]`. | `str` | |
+| `--attention-map` | Whether to plot attention maps. | `bool` | `False` |
+| `--attention-map-scale` | Image scaling factor before creating the GIF. | `float` | `0.5` |
+| `--attention-map-level` | Level to plot the attention maps. Should be in `["line", "word", "char"]`. | `str` | `"line"` |
+| `--predict-objects` | Whether to return polygon coordinates. | `bool` | `False` |
+| `--word-separators` | List of word separators. | `list` | `[" ", "\n"]` |
+| `--line-separators` | List of line separators. | `list` | `["\n"]` |
+| `--threshold-method` | Method to use for attention mask thresholding. Should be in `["otsu", "simple"]`. | `str` | `"otsu"` |
+| `--threshold-value` | Threshold to use for the "simple" thresholding method. | `int` | `0` |
 ## Examples
diff --git a/docs/usage/train/index.md b/docs/usage/train/index.md
index 2627b06902f5f353e7a05eb497550767edb5925d..2f316e7f1a01c24773d074e3a2bb3ee5ca713b1c 100644
--- a/docs/usage/train/index.md
+++ b/docs/usage/train/index.md
@@ -18,7 +18,7 @@ To train DAN on documents:
 To train DAN on lines, run `teklia-dan train document` with a line dataset.
-## Additional page
+## Additional pages
 * [Jean Zay tutorial](jeanzay.md)
 * [Data augmentation](augmentation.md)
diff --git a/docs/usage/train/parameters.md b/docs/usage/train/parameters.md
index ce10c3299fd5d0ab8624cf15e30a668246be295f..e2aad9531cad0f0775e33fa93ae9d9ac5283a6a2 100644
--- a/docs/usage/train/parameters.md
+++ b/docs/usage/train/parameters.md
@@ -1,22 +1,23 @@
 # Training configuration
-All hyperparameters are specified and editable in the training scripts (meaning are in comments). This page introduces some useful keys and their description.
+All hyperparameters are specified and editable in the training script `dan/ocr/document/train.py::get_config` (descriptions are in comments). This page introduces some useful keys and their descriptions.
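+
+In practice, this means opening `dan/ocr/document/train.py` and editing the configuration inside `get_config()`. The sketch below is schematic rather than a verbatim excerpt of that file, and the values shown are only placeholders:
+
+```python
+# Schematic excerpt of get_config() (placeholder values, not the verbatim file)
+"training_params": {
+    "output_folder": "outputs/dan_esposalles_record",  # folder name for checkpoint and results
+    "max_nb_epochs": 800,  # maximum number of epochs before stopping
+    "max_training_time": 3600 * 24 * 1.9,  # maximum training time, in seconds
+},
+```
+
+The dataset keys described below (`dataset_name`, `dataset_level`, `dataset_variant`, `dataset_path`) are edited in the same way.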
 ## Dataset parameters
-| Parameter | Description | Type | Default |
-| --------------------------------------- | ---------------------------------------------------------------------------------------- | ------------ | ---------------------------------------------------- |
-| `dataset_name` | Name of the dataset. | `str` | |
-| `dataset_level` | Level of the dataset. Should be named after the element type. | `str` | |
-| `dataset_variant` | Variant of the dataset. Usually empty for HTR datasets, `"_sem"` for HTR+NER datasets. | `str` | |
-| `dataset_path` | Path to the dataset. | `str` | |
-| `dataset_params.config.dataset_manager` | Dataset manager class. | custom class | `OCRDatasetManager` |
-| `dataset_params.config.dataset_class` | Dataset class. | custom class | `OCRDataset` |
-| `dataset_params.config.datasets` | Dataset dictionary with the dataset name as key and dataset path as value. | `dict` | |
-| `dataset_params.config.load_in_memory` | Load all images in CPU memory. | `str` | `True` |
-| `dataset_params.config.worker_per_gpu` | Number of parallel processes per gpu for data loading. | `int` | `4` |
-| `dataset_params.config.preprocessings` | List of pre-processing functions to apply to input images. | `list` | (see [dedicated section](#data-preprocessing)) |
-| `dataset_params.config.augmentation` | Whether to use data augmentation on the training set. | `bool` | `True` (see [dedicated section](#data-augmentation)) |
+| Parameter | Description | Type | Default |
+| -------------------------------------- | ---------------------------------------------------------------------------------------- | ------ | ---------------------------------------------------- |
+| `dataset_name` | Name of the dataset. | `str` | |
+| `dataset_level` | Level of the dataset. Should be named after the element type. | `str` | |
+| `dataset_variant` | Variant of the dataset. Usually empty for HTR datasets, `"_sem"` for HTR+NER datasets. | `str` | |
+| `dataset_path` | Path to the dataset. | `str` | |
+| `dataset_params.config.load_in_memory` | Load all images in CPU memory. | `bool` | `True` |
+| `dataset_params.config.worker_per_gpu` | Number of parallel processes per gpu for data loading. | `int` | `4` |
+| `dataset_params.config.preprocessings` | List of pre-processing functions to apply to input images. | `list` | (see [dedicated section](#data-preprocessing)) |
+| `dataset_params.config.augmentation` | Whether to use data augmentation on the training set. | `bool` | `True` (see [dedicated section](#data-augmentation)) |
+
+!!! warning
+    The variables `dataset_name`, `dataset_level`, `dataset_variant` and `dataset_path` must have values such that the data is located in `{dataset_path}/{dataset_name}_{dataset_level}{dataset_variant}`.
+
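+For example, with the hypothetical values `dataset_name = "esposalles"`, `dataset_level = "record"` and an empty `dataset_variant`, the data would be expected under `{dataset_path}/esposalles_record`, typically the tree produced by the extraction step:
+
+```
+{dataset_path}/
+└── esposalles_record/
+    ├── charset.pkl
+    ├── labels.json
+    ├── split.json
+    ├── images/
+    └── labels/
+```
+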
 ### Data preprocessing
@@ -123,33 +124,33 @@ For a detailed description of all augmentation transforms, see the [dedicated pa
 ## Model parameters
-| Name | Description | Type | Default |
-| ----------------------------------------- | ------------------------------------------------------------------------------------ | ------------- | ----------------------------------------------------------------- |
-| `model_params.models.encoder` | Encoder class. | custom class | `FCN_encoder` |
-| `model_params.models.decoder` | Decoder class. | custom class | `GlobalHTADecoder` |
-| `model_params.transfer_learning.encoder` | Model to load for the encoder [state_dict_name, checkpoint_path, learnable, strict]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, True]` |
-| `model_params.transfer_learning.decoder` | Model to load for the decoder [state_dict_name, checkpoint_path, learnable, strict]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, False]` |
-| `model_params.transfered_charset` | Transfer learning of the decision layer based on charset of the model to transfer. | `bool` | `True` |
-| `model_params.additional_tokens` | For decision layer = [<eot>, ], only for transferred charset. | `int` | `1` |
-| `model_params.input_channels` | Number of channels of input image. | `int` | `3` |
-| `model_params.dropout` | Dropout probability in the encoder. | `float` | `0.5` |
-| `model_params.enc_dim` | Dimension of features extracted by the encoder. | `int` | `256` |
-| `model_params.nb_layers` | Number of layers in the encoder. | `int` | `5` |
-| `model_params.h_max` | Maximum height for encoder output (for 2D positional embedding). | `int` | `500` |
-| `model_params.w_max` | Maximum width for encoder output (for 2D positional embedding). | `int` | `1000` |
-| `model_params.l_max` | Maximum predicted sequence length (for 1D positional embedding). | `int` | `15000` |
-| `model_params.dec_num_layers` | Number of transformer decoder layers. | `int` | `8` |
-| `model_params.dec_num_heads` | Number of heads in transformer decoder layers. | `int` | `4` |
-| `model_params.dec_res_dropout` | Dropout probability in transformer decoder layers. | `int` | `0.1` |
-| `model_params.dec_pred_dropout` | Dropout rate before decision layer. | `float` | `0.1` |
-| `model_params.dec_att_dropout` | Dropout rate in multi head attention. | `float` | `0.1` |
-| `model_params.dec_dim_feedforward` | Number of dimensions for feedforward layer in transformer decoder layers. | `int` | `256` |
-| `model_params.use_2d_pe` | Whether to use 2D positional embedding. | `bool` | `True` |
-| `model_params.use_1d_pe` | Whether to use 1D positional embedding. | `bool` | `True` |
-| `model_params.use_lstm` | Whether to use a LSTM layer in the decoder. | `bool` | `False` |
-| `model_params.attention_win` | Length of attention window. | `int` | `100` |
-| `model_params.dropout_scheduler.function` | Curriculum dropout scheduler. | custom class. | `100` |
-| `model_params.dropout_scheduler.T` | Exponential factor. | `float` | `5e4` |
+| Name | Description | Type | Default |
+| ----------------------------------------- | ------------------------------------------------------------------------------------ | ------------ | ----------------------------------------------------------------- |
+| `model_params.models.encoder` | Encoder class. | custom class | `FCN_encoder` |
+| `model_params.models.decoder` | Decoder class. | custom class | `GlobalHTADecoder` |
+| `model_params.transfer_learning.encoder` | Model to load for the encoder [state_dict_name, checkpoint_path, learnable, strict]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, True]` |
+| `model_params.transfer_learning.decoder` | Model to load for the decoder [state_dict_name, checkpoint_path, learnable, strict]. | `list` | `["encoder", "pretrained_models/dan_rimes_page.pt", True, False]` |
+| `model_params.transfered_charset` | Transfer learning of the decision layer based on charset of the model to transfer. | `bool` | `True` |
+| `model_params.additional_tokens` | For decision layer = [<eot>, ], only for transferred charset. | `int` | `1` |
+| `model_params.input_channels` | Number of channels of input image. | `int` | `3` |
+| `model_params.dropout` | Dropout probability in the encoder. | `float` | `0.5` |
+| `model_params.enc_dim` | Dimension of features extracted by the encoder. | `int` | `256` |
+| `model_params.nb_layers` | Number of layers in the encoder. | `int` | `5` |
+| `model_params.h_max` | Maximum height for encoder output (for 2D positional embedding). | `int` | `500` |
+| `model_params.w_max` | Maximum width for encoder output (for 2D positional embedding). | `int` | `1000` |
+| `model_params.l_max` | Maximum predicted sequence length (for 1D positional embedding). | `int` | `15000` |
+| `model_params.dec_num_layers` | Number of transformer decoder layers. | `int` | `8` |
+| `model_params.dec_num_heads` | Number of heads in transformer decoder layers. | `int` | `4` |
+| `model_params.dec_res_dropout` | Dropout probability in transformer decoder layers. | `float` | `0.1` |
+| `model_params.dec_pred_dropout` | Dropout rate before decision layer. | `float` | `0.1` |
+| `model_params.dec_att_dropout` | Dropout rate in multi-head attention. | `float` | `0.1` |
+| `model_params.dec_dim_feedforward` | Number of dimensions for feedforward layer in transformer decoder layers. | `int` | `256` |
+| `model_params.use_2d_pe` | Whether to use 2D positional embedding. | `bool` | `True` |
+| `model_params.use_1d_pe` | Whether to use 1D positional embedding. | `bool` | `True` |
+| `model_params.use_lstm` | Whether to use an LSTM layer in the decoder. | `bool` | `False` |
+| `model_params.attention_win` | Length of attention window. | `int` | `100` |
+| `model_params.dropout_scheduler.function` | Curriculum dropout scheduler. | custom class | `exponential_dropout_scheduler` |
+| `model_params.dropout_scheduler.T` | Exponential factor. | `float` | `5e4` |
 ## Training parameters
@@ -158,17 +159,17 @@ For a detailed description of all augmentation transforms, see the [dedicated pa
 | ------------------------------------------------------- | --------------------------------------------------------------------------- | ------------ | ------------------------------------------- |
 | `training_params.output_folder` | Directory for checkpoint and results. | `str` | |
 | `training_params.max_nb_epochs` | Maximum number of epochs before stopping training. | `int` | `800` |
-| `training_params.max_training_time` | Maximum time (in seconds) before stopping training. | `int` | `350000` |
+| `training_params.max_training_time` | Maximum time (in seconds) before stopping training. | `int` | `164160` |
 | `training_params.load_epoch` | Model to load. Should be either `"best"` (evaluation) or `last` (training). | `str` | `"last"` |
 | `training_params.interval_save_weights` | Step to save weights. Set to `None` to keep only best and last epochs. | `int` | `None` |
 | `training_params.batch_size` | Mini-batch size for the training loop. | `int` | `2` |
 | `training_params.use_ddp` | Whether to use DistributedDataParallel. | `bool` | `False` |
 | `training_params.ddp_port` | DDP port. | `int` | `20027` |
-| `training_params.use_amp` | Whether to enable automatic mix-precision. | `int` | `torch.cuda.device_count()` |
-| `training_params.nb_gpu` | Number of GPUs to train DAN. | `str` | |
+| `training_params.use_amp` | Whether to enable automatic mixed precision. | `bool` | `True` |
+| `training_params.nb_gpu` | Number of GPUs to train DAN. | `int` | `torch.cuda.device_count()` |
 | `training_params.optimizers.all.class` | Optimizer class. | custom class | `Adam` |
 | `training_params.optimizers.all.args.lr` | Learning rate for the optimizer. | `float` | `0.0001` |
-| `training_params.optimizers.all.args.amsgrad` | Whether to use AMSGrad optimization. | custom class | `False` |
+| `training_params.optimizers.all.args.amsgrad` | Whether to use AMSGrad optimization. | `bool` | `False` |
 | `training_params.lr_schedulers` | Learning rate schedulers. | custom class | `None` |
 | `training_params.eval_on_valid` | Whether to evaluate and log metrics on the validation set during training. | `bool` | `True` |
 | `training_params.eval_on_valid_interval` | Interval (in epochs) to evaluate during training. | `int` | `5` |
@@ -176,7 +177,7 @@ For a detailed description of all augmentation transforms, see the [dedicated pa
 | `training_params.expected_metric_value` | Best value for the focus metric. Should be either `"high"` or `"low"`. | `low` | `cer` |
 | `training_params.set_name_focus_metric` | Dataset to focus on to select best weights. | `str` | |
 | `training_params.train_metrics` | List of metrics to compute during training. | `list` | `["loss_ce", "cer", "wer", "wer_no_punct"]` |
-| `training_params.train_metrics` | List of metrics to compute during validation. | `list` | `["cer", "wer", "wer_no_punct"]` |
+| `training_params.eval_metrics` | List of metrics to compute during validation. | `list` | `["cer", "wer", "wer_no_punct"]` |
 | `training_params.force_cpu` | Whether to train on CPU (for debugging). | `bool` | `False` |
 | `training_params.max_char_prediction` | Maximum number of characters to predict. | `int` | `1000` |
 | `training_params.label_noise_scheduler.min_error_rate` | Minimum ratio of teacher forcing. | `float` | `0.2` |
diff --git a/mkdocs.yml b/mkdocs.yml
index 3c7787d4cb9616024eb7203881731e841bf4ad30..79cc177cf8abda07c90289b4c3457208be422780 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -53,6 +53,10 @@ plugins:
 nav:
   - Home: index.md
   - Original implementation: original_paper.md
+  - Get started:
+      - get_started/index.md
+      - Development: get_started/development.md
+      - Training: get_started/training.md
   - Usage:
       - usage/index.md
       - Datasets: