@@ -4,13 +4,15 @@ There are a several steps to follow when training a DAN model.
## 1. Extract data
The data must be extracted and formatted for training. To extract the data, DAN uses an Arkindex export database in SQLite format. You will need to:
To extract the data, DAN uses an Arkindex export database in SQLite format. You will need to:
1. Structure the data into folders (`train` / `val` / `test`) in [Arkindex](https://demo.arkindex.org/).
1.[Export the project](https://doc.arkindex.org/howto/export/) in SQLite format.
1. Extract the data with the [extract command](../usage/datasets/extract.md).
At the end, you should have a tree structure like this:
This command will extract and format the images and labels needed to train DAN. It will also tokenize the training corpus at character, subword, and word levels, allowing you to combine DAN with an explicit statistical language model to improve performance.
At the end, you should get the following tree structure: