Support multiple datasets from Arkindex as input
We should support parsing multiple datasets from Arkindex, i.e. passing --dataset multiple times. All data will be aggregated into a single dataset.
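A minimal sketch of the command-line side is below. It assumes the standard library argparse, with the repeatable --dataset flag collected into a hypothetical dataset_ids attribute. The hypothetical aggregate helper merges per-dataset splits into one dataset and tags each image with its dataset ID so the images can still be placed in per-dataset folders. None of these names come from the current ArkindexExtractor code.

```python
import argparse
import uuid
from collections import defaultdict


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Extract data from one or more Arkindex datasets."
    )
    # action="append" allows repeating the flag: --dataset A --dataset B
    # "dataset_ids" is a hypothetical attribute name used for illustration.
    parser.add_argument(
        "--dataset",
        dest="dataset_ids",
        type=uuid.UUID,
        action="append",
        required=True,
        help="ID of an Arkindex dataset (can be passed several times).",
    )
    return parser.parse_args(argv)


def aggregate(per_dataset):
    """Merge {dataset_id: {split: [image_id, ...]}} into a single
    {split: [(dataset_id, image_id), ...]} mapping (hypothetical helper)."""
    merged = defaultdict(list)
    for dataset_id, splits in per_dataset.items():
        for split, image_ids in splits.items():
            # Keep the dataset ID next to each image so it can later be
            # written to its own dataset folder.
            merged[split].extend((dataset_id, image_id) for image_id in image_ids)
    return dict(merged)
```

With this approach, argparse already handles the repetition, so the extractor only has to loop over args.dataset_ids instead of handling a single dataset.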
This is a significant amount of work because of how the ArkindexExtractor class is structured.
The idea is to keep the images of each dataset in a dedicated folder. Since we don't want to specify dataset names, we'll use the dataset ID as the folder name. This gives the following structure:
images/
    <dataset_id1>/
        dataset_1_image_1.jpg  # image names are unchanged from the current layout
        dataset_1_image_2.jpg
        ...
    <dataset_id2>/
        dataset_2_image_1.jpg
        dataset_2_image_2.jpg
        ...
With the split level included, a training image will be stored at images/train/<dataset_id>/<image_id>.jpg.
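A small sketch of how the destination path could be built with pathlib, following the images/train/<dataset_id>/<image_id>.jpg layout described here. The function name image_path and the output_dir parameter are hypothetical, and passing other split names (e.g. val or test) assumes the same layout applies to every split, which this issue only states for train.

```python
from pathlib import Path


def image_path(output_dir, split, dataset_id, image_id, extension=".jpg"):
    """Return <output_dir>/images/<split>/<dataset_id>/<image_id><extension>.

    Hypothetical helper: one sub-folder per dataset ID under each split,
    while the image file name itself stays the image ID.
    """
    return Path(output_dir) / "images" / split / str(dataset_id) / f"{image_id}{extension}"
```

Because each dataset gets its own folder, two datasets can never write to the same file, and the image names can stay as they are today.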
Edited by Yoann Schneider