# DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition
This repository is a public implementation of the paper: "DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition".

The model uses character-level attention to handle slanted lines.

The paper is available at https://arxiv.org/abs/2203.12273.
To discover my other works, here is my [academic page](https://factodeeplearning.github.io/).
A video presentation of this work is available on [YouTube](https://www.youtube.com/watch?v=HrrUsQfW66E).
This work focuses on handwritten text and layout recognition using an end-to-end, segmentation-free, attention-based network.
We evaluate the DAN on two public datasets, RIMES and READ 2016, at single-page and double-page levels.
We obtained the following results:
| Dataset                 | CER (%) | WER (%) | LOER (%) | mAP_cer (%) |
|:------------------------|:-------:|:-------:|:--------:|:-----------:|
| RIMES (single page)     | 4.54    | 11.85   | 3.82     | 93.74       |
| READ 2016 (single page) | 3.53    | 13.33   | 5.94     | 92.57       |
| READ 2016 (double page) | 3.69    | 14.20   | 4.60     | 93.92       |
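CER and WER follow the standard edit-distance definitions. As an illustration only (not the repository's evaluation code, and assuming the third-party `editdistance` package), CER can be computed as:
```python
# Illustrative CER: total character edit distance divided by the total number
# of ground-truth characters, expressed in percent.
import editdistance

def cer(predictions, ground_truths):
    errors = sum(editdistance.eval(p, g) for p, g in zip(predictions, ground_truths))
    total_chars = sum(len(g) for g in ground_truths)
    return 100 * errors / total_chars
```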
Pretrained model weights are available [here](https://git.litislab.fr/dcoquenet/dan).
Table of contents:
1. [Getting Started](#getting-started)
2. [Datasets](#datasets)
3. [Training And Evaluation](#training-and-evaluation)
## Getting Started
We used Python 3.9.1, PyTorch 1.8.2, and CUDA 10.2 for the scripts.
Clone the repository:
```
git clone https://github.com/FactoDeepLearning/DAN.git
```
Install the dependencies:
```
pip install -r requirements.txt
```
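As an optional sanity check (not part of the repository), you can verify the PyTorch install and CUDA visibility before running the scripts:
```python
# Check the installed PyTorch version and whether CUDA is visible.
import torch
print(torch.__version__)          # expected: 1.8.2
print(torch.cuda.is_available())  # True if CUDA 10.2 is set up correctly
```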
## Datasets
This section is dedicated to the datasets used in the paper: download and formatting instructions are provided
for experiment replication purposes.
The RIMES dataset at page level was distributed during the [2009 evaluation campaign](https://ieeexplore.ieee.org/document/5277557).
The READ 2016 dataset corresponds to the one used in the [ICFHR 2016 competition on handwritten text recognition](https://ieeexplore.ieee.org/document/7814136).
It can be found [here](https://zenodo.org/record/1164045#.YiINkBvjKEA).
Raw dataset files must be placed in `Datasets/raw/{dataset_name}`, where `{dataset_name}` is "READ 2016" or "RIMES".
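For example, the expected layout can be created with:
```
mkdir -p "Datasets/raw/READ 2016" "Datasets/raw/RIMES"
```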
## Training And Evaluation
### Step 1: Download the dataset
### Step 2: Format the dataset
```
python3 Datasets/dataset_formatters/read2016_formatter.py
python3 Datasets/dataset_formatters/rimes_formatter.py
```
### Step 3: Add any fonts you want as .ttf files in the Fonts folder
### Step 4: Generate the synthetic line dataset for pre-training
```
python3 OCR/line_OCR/ctc/main_syn_line.py
```
Two lines in this script must be adapted to the dataset used:
```
model.generate_syn_line_dataset("READ_2016_syn_line")
dataset_name = "READ_2016"
```
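If pre-training on RIMES instead, these two lines would presumably become the following (the synthetic dataset name here is an assumption based on the READ naming pattern):
```
model.generate_syn_line_dataset("RIMES_syn_line")
dataset_name = "RIMES"
```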
### Step 5: Pre-training on synthetic lines
```
python3 OCR/line_OCR/ctc/main_line_ctc.py
```
Two lines in this script must be adapted to the dataset used:
```
dataset_name = "READ_2016"
"output_folder": "FCN_read_line_syn"
```
Weights and evaluation results are stored in `OCR/line_OCR/ctc/outputs`.
### Step 6: Training the DAN
```
python3 OCR/document_OCR/dan/main_dan.py
```
The following lines must be adapted to the dataset used and pre-training folder names:
```
dataset_name = "READ_2016"
"transfer_learning": {
# model_name: [state_dict_name, checkpoint_path, learnable, strict]
"encoder": ["encoder", "../../line_OCR/ctc/outputs/FCN_read_2016_line_syn/checkpoints/best.pt", True, True],
"decoder": ["decoder", "../../line_OCR/ctc/outputs/FCN_read_2016_line_syn/best.pt", True, False],
},
```
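As a rough illustration only (this is an assumption, not the repository's actual loading code), an entry of the form `[state_dict_name, checkpoint_path, learnable, strict]` could be consumed roughly like this:
```python
# Hedged sketch of applying one "transfer_learning" entry; the checkpoint is
# assumed to be a dict keyed by the state_dict name ("encoder"/"decoder").
import torch
import torch.nn as nn

def load_pretrained_part(module: nn.Module, state_dict_name: str,
                         checkpoint_path: str, learnable: bool, strict: bool):
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # strict=False tolerates layers that differ between the line-level CTC model and the DAN.
    module.load_state_dict(checkpoint[state_dict_name], strict=strict)
    # The "learnable" flag decides whether the transferred weights keep training.
    for p in module.parameters():
        p.requires_grad = learnable
```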
Weights and evaluation results are stored in `OCR/document_OCR/dan/outputs`.
### Remarks (for pre-training and training)
All hyperparameters are specified and editable in the training scripts (their meaning is explained in comments).\
Evaluation is performed just after training ends (training stops when the maximum elapsed time is reached or after the maximum number of epochs specified in the training script).\
The output files are split into two subfolders: "checkpoints" and "results". \
"checkpoints" contains the model weights for the last trained epoch and for the epoch with the best validation CER. \
"results" contains TensorBoard logs for the loss and metrics, as well as a text file listing the hyperparameters used and the evaluation results.
## `Predict` module
This repository also contains a package to run a pre-trained model on an image.
### Installation
To use DAN in your own scripts, install it using pip:
```console
pip install -e .
```
### Usage
To apply DAN to an image, first add a few imports and load the image. Note that the image must be in RGB.
```python
import cv2
from dan.predict import DAN
image = cv2.cvtColor(cv2.imread(IMAGE_PATH), cv2.COLOR_BGR2RGB)
```
Then one can initialize and load the trained model with the parameters used during training.
```python
model_path = 'model.pt'
params_path = 'parameters.yml'
charset_path = 'charset.pkl'
model = DAN('cpu')
model.load(model_path, params_path, charset_path, mode="eval")
```
To run inference on a GPU, replace `cpu` with the name of the GPU device. Finally, run the prediction:
```python
text, confidence_scores = model.predict(image, confidences=True)
```
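For example, a GPU run might look like the following (the device string `cuda:0` is an assumption that depends on your setup):
```python
model = DAN('cuda:0')  # assumed device name; use the GPU visible on your machine
model.load(model_path, params_path, charset_path, mode="eval")
text, confidence_scores = model.predict(image, confidences=True)
```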
## Citation

```
@misc{Coquenet2022b,
  author    = {Coquenet, Denis and Chatelain, Clément and Paquet, Thierry},
  title     = {DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition},
  doi       = {10.48550/ARXIV.2203.12273},
  url       = {https://arxiv.org/abs/2203.12273},
  publisher = {arXiv},
  year      = {2022},
}
```