# DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition

This repository is a public implementation of the paper: "DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition".

The model uses character-level attention to handle slanted lines.

The paper is available at https://arxiv.org/abs/2203.12273.

To discover my other works, here is my [academic page](https://factodeeplearning.github.io/).

[Click to see the demo](https://www.youtube.com/watch?v=HrrUsQfW66E)

This work focuses on handwritten text and layout recognition through the use of an end-to-end segmentation-free attention-based network.
We evaluate the DAN on two public datasets, RIMES and READ 2016, at single-page and double-page levels.

We obtained the following results:

| Dataset                 | CER (%) | WER (%) | LOER (%) | mAP_cer (%) |
|:-----------------------:|---------|:-------:|:--------:|-------------|
| RIMES (single page)     | 4.54    | 11.85   | 3.82     | 93.74       |
| READ 2016 (single page) | 3.53    | 13.33   | 5.94     | 92.57       |
| READ 2016 (double page) | 3.69    | 14.20   | 4.60     | 93.92       |

Pretrained model weights are available [here](https://git.litislab.fr/dcoquenet/dan).

Table of contents:
1. [Getting Started](#Getting-Started)
2. [Datasets](#Datasets)
3. [Training And Evaluation](#Training-and-evaluation)

## Getting Started

We used Python 3.9.1, PyTorch 1.8.2 and CUDA 10.2 for the scripts.

Clone the repository:

```
git clone https://github.com/FactoDeepLearning/DAN.git
```

Install the dependencies:

```
pip install -r requirements.txt
```

### Remarks (for pre-training and training)

All hyperparameters are specified and editable in the training scripts (their meanings are given in comments).\
Evaluation is performed right after training ends (training stops when the maximum elapsed time is reached or after the maximum number of epochs specified in the training script).\
The output files are split into two subfolders: "checkpoints" and "results".\
"checkpoints" contains the model weights for the last trained epoch and for the epoch giving the best validation CER.\
"results" contains the TensorBoard logs for the loss and metrics, as well as text files listing the hyperparameters used and the evaluation results.

## `Predict` module

This repository also contains a package to run a pre-trained model on an image.

### Installation

To use DAN in your own scripts, install it using pip:

```console
pip install -e .
```

### Usage

To apply DAN to an image, first add a few imports and load the image. Note that the image should be in RGB.

```python
import cv2
from dan.predict import DAN

# IMAGE_PATH is the path to the input image; OpenCV loads it as BGR, so convert to RGB.
image = cv2.cvtColor(cv2.imread(IMAGE_PATH), cv2.COLOR_BGR2RGB)
```

Then initialize the model and load the trained weights with the parameters used during training.

```python
model_path = 'model.pt'
params_path = 'parameters.yml'
charset_path = 'charset.pkl'

model = DAN('cpu')
model.load(model_path, params_path, charset_path, mode="eval")
```

To run inference on a GPU, replace `cpu` with the name of the GPU.

Finally, run the prediction:

```python
text, confidence_scores = model.predict(image, confidences=True)
```
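The CER and WER figures reported in this README are edit-distance-based error rates. To illustrate how a predicted `text` could be scored against a ground-truth transcription, here is a minimal pure-Python sketch. It assumes the usual definition (Levenshtein distance divided by the length of the reference). The names `edit_distance`, `cer` and `wer` are hypothetical and not part of the `dan` package, and the repository's own evaluation code may normalize text or aggregate over pages differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example strings; in practice `prediction` would be the `text`
# returned by model.predict and `ground_truth` the annotated transcription.
ground_truth = "handwritten text"
prediction = "handwriten text"
print(f"CER: {100 * cer(ground_truth, prediction):.2f}%")  # CER: 6.25%
print(f"WER: {100 * wer(ground_truth, prediction):.2f}%")  # WER: 50.00%
```

Whitespace splitting is used for words here only for simplicity; a different tokenization (for example, treating punctuation as separate tokens) would change the WER.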