Support multiple datasets from Arkindex as input

We should support parsing multiple datasets from Arkindex, i.e. accepting multiple --dataset arguments. All data will be aggregated into a single dataset.
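
A minimal sketch of how a repeatable --dataset flag could be parsed, assuming an argparse-based command line and UUID dataset IDs. The `parse_datasets` helper and the `datasets` destination name are hypothetical and are not taken from the existing code.

```python
import argparse
import uuid


def parse_datasets(argv):
    """Parse a repeatable --dataset flag (hypothetical helper, not the real CLI)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--dataset",
        type=uuid.UUID,        # assumption: dataset IDs are UUIDs
        action="append",       # each occurrence of the flag adds one ID
        required=True,
        dest="datasets",
        help="ID of a dataset to extract; may be given several times.",
    )
    args = parser.parse_args(argv)
    # Drop duplicate IDs while keeping the order given on the command line.
    return list(dict.fromkeys(args.datasets))
```

Deduplicating up front would avoid extracting the same dataset twice when the aggregated dataset is built.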

This is a significant amount of work because of how the ArkindexExtractor class is structured.

The idea is to still separate the images of each dataset in a dedicated folder. We don't want to specify dataset names, so we'll stick with the ID of the dataset. We will have the following structure:

images/
  <dataset_id1>/
    dataset_1_image_1.jpg # the image name doesn't change from the current one
    dataset_1_image_2.jpg
    ...
  <dataset_id2>/
    dataset_2_image_1.jpg
    dataset_2_image_2.jpg
    ...

Each image path will have the form images/train/<dataset_id>/<image_id>.jpg.
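
The path layout can be sketched as a small helper. This is only an illustration: the `image_path` function, its parameter names, and the `output` root directory are assumptions, not part of ArkindexExtractor.

```python
from pathlib import Path


def image_path(output: Path, split: str, dataset_id: str, image_id: str) -> Path:
    """Build the destination path of an image (hypothetical helper).

    Follows the images/<split>/<dataset_id>/<image_id>.jpg layout, so images
    from different datasets never end up in the same folder.
    """
    return output / "images" / split / dataset_id / f"{image_id}.jpg"


def ensure_parent(path: Path) -> Path:
    """Create the dataset folder if needed and return the path unchanged."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

For example, `image_path(Path("out"), "train", dataset_id, image_id)` gives a path under `out/images/train/`, with one subfolder per dataset ID.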

Edited by Yoann Schneider