# Training workflow

There are several steps to follow when training a DAN model.

## 1. Extract data

The data must be extracted and formatted for training. To extract the data, DAN uses an Arkindex export database in SQLite format. You will need to:

1. Structure the data into folders (`train` / `val` / `test`) in [Arkindex](https://arkindex.teklia.com/).
2. [Export the project](https://doc.arkindex.org/howto/export/) in SQLite format.
3. Extract the data with the [extract command](../usage/datasets/extract.md).
4. Format the data with the [format command](../usage/datasets/format.md).

At the end, you should have a tree structure like this:

```
output/
├── charset.pkl
├── labels.json
├── split.json
├── images
│   ├── train
│   ├── val
│   └── test
└── labels
    ├── train
    ├── val
    └── test
```
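
As a quick sanity check before training, the layout above can be verified with a few lines of Python. This is a minimal sketch and not part of DAN: the `missing_entries` helper and the `output` path are assumptions made for illustration.

```python
from pathlib import Path

# Entries expected in the output folder, taken from the tree above.
EXPECTED_FILES = ["charset.pkl", "labels.json", "split.json"]
EXPECTED_DIRS = [
    f"{kind}/{split}"
    for kind in ("images", "labels")
    for split in ("train", "val", "test")
]


def missing_entries(output_dir):
    """Return the expected files and folders that are absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # "output" is a hypothetical path: use the folder you extracted the data to.
    for name in missing_entries("output"):
        print(f"Missing: {name}")
```

An empty result means that every file and folder shown in the tree is present. The check does not look at the content of the files.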

## 2. Train

The training command does not take any input parameters for now. To train a DAN model, you will therefore need to:

1. Update the parameters listed on the [dedicated page](../usage/train/parameters.md). You will always need to update at least the following:

    - `dataset_name`, `dataset_level`, `dataset_variant` and `dataset_path`,
    - `model_params.transfer_learning.*[checkpoint_path]` to finetune an existing model,
    - `training_params.output_folder`.

2. Train a DAN model with the [train command](../usage/train/index.md).
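
The list of mandatory parameters above can be turned into a small check to run before launching a training. The snippet below is an illustration only: the layout of the `config` dictionary, the example values and the `get_nested` and `unset_keys` helpers are assumptions, so refer to the parameters page for where each value actually lives in DAN.

```python
# Parameters that must always be updated, according to the list above.
# Their position in the dictionary is an assumption made for this example.
REQUIRED_KEYS = [
    "dataset_name",
    "dataset_level",
    "dataset_variant",
    "dataset_path",
    "training_params.output_folder",
]


def get_nested(config, dotted_key):
    """Follow a dotted key such as "a.b" through nested dictionaries."""
    value = config
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def unset_keys(config):
    """Return the required keys that are missing from config."""
    return [key for key in REQUIRED_KEYS if get_nested(config, key) is None]


# Hypothetical values, for illustration only; dataset_path is left out on purpose.
config = {
    "dataset_name": "my_dataset",
    "dataset_level": "page",
    "dataset_variant": "",
    "training_params": {"output_folder": "outputs/my_run"},
}
print(unset_keys(config))  # prints ['dataset_path']
```

The checkpoint path for transfer learning is not part of the check because it is only needed when finetuning an existing model.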

## 3. Predict

Once training is complete, you can apply the trained DAN model to an image.

To do this, you will need to:

1. Create a `parameters.yml` file using the parameters saved during training in the `params` file, located in `{training_params.output_folder}/results`. This file should have the following format:
```yml
version: 0.0.1
parameters:
  max_char_prediction: int
  encoder:
    dropout: float
  decoder:
    enc_dim: int
    l_max: int
    dec_pred_dropout: float
    attention_win: int
    vocab_size: int
    h_max: int
    w_max: int
    dec_num_layers: int
    dec_dim_feedforward: int
    dec_num_heads: int
    dec_att_dropout: float
    dec_res_dropout: float
  preprocessings:
    - type: str
      max_height: int
      max_width: int
      fixed_height: int
      fixed_width: int
```
2. Apply the trained DAN model to an image using the [predict command](../usage/predict.md).
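
Since `parameters.yml` is assembled by hand, a structural check can catch a forgotten key before running the prediction. The sketch below is not part of DAN: it assumes the file has already been loaded into a Python dictionary with the YAML parser of your choice, and `missing_parameters` is a hypothetical helper. The key names are copied from the format shown above, and the entries of `preprocessings` are left unchecked.

```python
# Keys copied from the parameters.yml format above.
# None means that only the presence of the key itself is checked.
EXPECTED = {
    "max_char_prediction": None,
    "encoder": ["dropout"],
    "decoder": [
        "enc_dim", "l_max", "dec_pred_dropout", "attention_win", "vocab_size",
        "h_max", "w_max", "dec_num_layers", "dec_dim_feedforward",
        "dec_num_heads", "dec_att_dropout", "dec_res_dropout",
    ],
    "preprocessings": None,
}


def missing_parameters(document):
    """Return dotted names of expected keys absent from a loaded parameters.yml."""
    parameters = document.get("parameters") or {}
    missing = []
    for section, children in EXPECTED.items():
        if section not in parameters:
            missing.append(section)
            continue
        block = parameters[section] if isinstance(parameters[section], dict) else {}
        for child in children or []:
            if child not in block:
                missing.append(f"{section}.{child}")
    return missing


# Hypothetical, incomplete document used for illustration only.
document = {
    "version": "0.0.1",
    "parameters": {"max_char_prediction": 1000, "encoder": {"dropout": 0.5}},
}
print(missing_parameters(document))  # prints ['decoder', 'preprocessings']
```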