Training workflow
There are several steps to follow when training a DAN model.
1. Extract data
To extract the data, DAN uses an Arkindex export database in SQLite format. You will need to:
- Structure the data into folders (train/val/test) in Arkindex.
- Export the project in SQLite format.
- Extract the data with the extract command.
This command will extract and format the images and labels needed to train DAN. It will also tokenize the training corpus at character, subword, and word levels, allowing you to combine DAN with an explicit statistical language model to improve performance.
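To make the character and word levels concrete, here is a minimal sketch of what tokenizing one transcription line at those two levels could look like. It is an illustration only, not the code used by the extract command: the function names and the `<space>` placeholder are assumptions, and the subword level is left out because it depends on a trained subword tokenizer.

```python
def char_tokens(text, space="<space>"):
    """Return one token per character.

    Spaces are replaced by a visible placeholder (hypothetical symbol)
    so that they survive as tokens in a space-separated corpus file.
    """
    return [space if ch == " " else ch for ch in text]


def word_tokens(text):
    """Return whitespace-separated words, ignoring repeated spaces."""
    return text.split()


line = "the cat"
# A corpus line is typically the tokens joined by single spaces.
print(" ".join(char_tokens(line)))
print(" ".join(word_tokens(line)))
```

A language model trained on such a corpus sees the same text at different granularities, which is why the corpus and lexicon files come in one variant per level.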
At the end, you should get the following tree structure:
output/
├── charset.pkl
├── labels.json
├── images
│ ├── train
│ ├── val
│ └── test
├── language_model
│ ├── corpus_characters.txt
│ ├── lexicon_characters.txt
│ ├── corpus_subwords.txt
│ ├── lexicon_subwords.txt
│ ├── corpus_words.txt
│ ├── lexicon_words.txt
│ ├── subword_tokenizer.model
│ ├── subword_tokenizer.vocab
│ └── tokens.txt
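
Before moving on to training, it can help to confirm that the extraction produced every entry of this tree. The following is a small sketch that does so with the standard library only. The `missing_entries` helper is hypothetical and not part of DAN, and the file names are copied from the tree above.

```python
from pathlib import Path

# Files listed in the tree above, relative to the extraction output folder.
EXPECTED_FILES = [
    "charset.pkl",
    "labels.json",
    "language_model/corpus_characters.txt",
    "language_model/lexicon_characters.txt",
    "language_model/corpus_subwords.txt",
    "language_model/lexicon_subwords.txt",
    "language_model/corpus_words.txt",
    "language_model/lexicon_words.txt",
    "language_model/subword_tokenizer.model",
    "language_model/subword_tokenizer.vocab",
    "language_model/tokens.txt",
]

# Image folders, one per split.
EXPECTED_DIRS = ["images/train", "images/val", "images/test"]


def missing_entries(output_dir):
    """Return the expected files and folders that are absent from output_dir."""
    root = Path(output_dir)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    # "output" matches the root folder name shown in the tree.
    for entry in missing_entries("output"):
        print(f"missing: {entry}")
```

An empty result means the folder matches the layout shown above. It does not check the contents of the files.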
2. Train
To train a DAN model, please refer to the documentation of the training command.
3. Predict
Once training is complete, you can apply a trained DAN model to an image using the predict command and the inference_parameters.yml file, located in {training.output_folder}/results.