Unit tests for the extraction command
We need a better test suite for the extraction command. The SQLite database should be created as a fixture during tests, using arkindex-export.
We need a clear schema of the data inside this DB.
We can start with something simple:
- root dataset folder with train/val/test subfolders
- 2 pages per split, with 5 lines on each page
- 1 manual transcription and 1 transcription with a dedicated worker run as its source, on both the pages and the lines
- 2 entities per transcription, on the pages and on one or two of the lines on each page
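
To make the layout above concrete, here is a minimal sketch that builds it with Python's standard `sqlite3` module. It does not use arkindex-export: the table and column names (`element`, `element_path`, `transcription`, `transcription_entity`), the `WORKER_RUN_ID` value and the `build_fixture_db` helper are all hypothetical and would need to be mapped onto the real export schema. It also assumes that entities are attached to the first two lines of each page.

```python
import sqlite3
import uuid

SPLITS = ("train", "val", "test")
PAGES_PER_SPLIT = 2
LINES_PER_PAGE = 5
LINES_WITH_ENTITIES = 2  # assumption: the first two lines of each page carry entities
WORKER_RUN_ID = "worker-run-1"  # hypothetical identifier for the worker run source

# Hypothetical, simplified schema: NOT the actual arkindex-export schema.
SCHEMA = """
CREATE TABLE element (id TEXT PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE element_path (
    parent_id TEXT NOT NULL REFERENCES element(id),
    child_id TEXT NOT NULL REFERENCES element(id),
    ordering INTEGER NOT NULL
);
CREATE TABLE transcription (
    id TEXT PRIMARY KEY,
    element_id TEXT NOT NULL REFERENCES element(id),
    text TEXT NOT NULL,
    worker_run_id TEXT  -- NULL means a manual transcription
);
CREATE TABLE transcription_entity (
    id TEXT PRIMARY KEY,
    transcription_id TEXT NOT NULL REFERENCES transcription(id),
    name TEXT NOT NULL,
    start_offset INTEGER NOT NULL,
    length INTEGER NOT NULL
);
"""


def build_fixture_db(path=":memory:"):
    """Create the test database and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)

    def add_element(type_, name, parent_id=None, ordering=0):
        element_id = str(uuid.uuid4())
        conn.execute("INSERT INTO element VALUES (?, ?, ?)", (element_id, type_, name))
        if parent_id is not None:
            conn.execute(
                "INSERT INTO element_path VALUES (?, ?, ?)",
                (parent_id, element_id, ordering),
            )
        return element_id

    def add_transcriptions(element_id, name, with_entities):
        # One manual transcription (no worker run) and one from the worker run.
        for worker_run_id in (None, WORKER_RUN_ID):
            transcription_id = str(uuid.uuid4())
            source = "manual" if worker_run_id is None else "worker"
            conn.execute(
                "INSERT INTO transcription VALUES (?, ?, ?, ?)",
                (transcription_id, element_id, f"{name} {source} text", worker_run_id),
            )
            if with_entities:
                # Two entities per transcription, at placeholder offsets.
                for index in range(2):
                    conn.execute(
                        "INSERT INTO transcription_entity VALUES (?, ?, ?, ?, ?)",
                        (str(uuid.uuid4()), transcription_id, f"entity-{index}", index * 2, 1),
                    )

    root_id = add_element("folder", "dataset")
    for split in SPLITS:
        split_id = add_element("folder", split, root_id)
        for page_index in range(PAGES_PER_SPLIT):
            page_name = f"{split}-page_{page_index + 1}"
            page_id = add_element("page", page_name, split_id, page_index)
            add_transcriptions(page_id, page_name, with_entities=True)
            for line_index in range(LINES_PER_PAGE):
                line_name = f"{page_name}-line_{line_index + 1}"
                line_id = add_element("text_line", line_name, page_id, line_index)
                add_transcriptions(
                    line_id, line_name, with_entities=line_index < LINES_WITH_ENTITIES
                )

    conn.commit()
    return conn
```

In a pytest suite, a helper like this could be wrapped in a fixture that writes the database under `tmp_path`, so that each test run starts from the same known content.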