# Training workflow

There are several steps to follow when training a DAN model.

## 1. Extract data

To extract the data, DAN uses an Arkindex export database in SQLite format. You will need to:

1. Structure the data into folders (`train` / `val` / `test`) in [Arkindex](https://demo.arkindex.org/).
1. [Export the project](https://doc.arkindex.org/howto/export/) in SQLite format.
1. Extract the data with the [extract command](../usage/datasets/extract.md).

This command will extract and format the images and labels needed to train DAN. It will also tokenize the training corpus at character, subword, and word levels, allowing you to combine DAN with an explicit statistical language model to improve performance.

At the end, you should get the following tree structure:

```
output/
├── charset.pkl
├── labels.json
├── images
│   ├── train
│   ├── val
│   └── test
└── language_model
    ├── corpus_characters.txt
    ├── lexicon_characters.txt
    ├── corpus_subwords.txt
    ├── lexicon_subwords.txt
    ├── corpus_words.txt
    ├── lexicon_words.txt
    ├── subword_tokenizer.model
    ├── subword_tokenizer.vocab
    └── tokens.txt
```

## 2. Train

To train a DAN model, please refer to the [documentation of the training command](../usage/train/index.md).

## 3. Predict

Once the training is complete, you can apply a trained DAN model to an image using the [predict command](../usage/predict/index.md) and the `inference_parameters.yml` file, located in `{training.output_folder}/results`.
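
Between these steps, it can be useful to sanity-check the files produced by extraction and training before launching the next command. The sketch below is only illustrative: it assumes the tree shown in step 1 and a `{training.output_folder}/results/inference_parameters.yml` file, and the exact contents of these files depend on your DAN version, so adapt the paths and printed fields to your setup.

```python
import json
import pickle
from pathlib import Path

import yaml  # requires PyYAML

# Paths follow the tree from step 1 and the results folder from step 3;
# adjust them to your own extraction output and training output folders.
extract_dir = Path("output")
results_dir = Path("training_output/results")  # i.e. {training.output_folder}/results

# Character set produced by the extract command.
with (extract_dir / "charset.pkl").open("rb") as f:
    charset = pickle.load(f)
print(f"{len(charset)} characters in the charset")

# Labels for the train / val / test splits.
labels = json.loads((extract_dir / "labels.json").read_text())
for split, examples in labels.items():
    print(f"{split}: {len(examples)} labelled images")

# Inference parameters written during training and used by the predict command.
params = yaml.safe_load((results_dir / "inference_parameters.yml").read_text())
print(sorted(params))  # top-level parameter names only
```

If the charset, the split sizes, and the inference parameters all look as expected, you can move on to the [predict command](../usage/predict/index.md).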