Update DatasetWorker argument and use ListProcessSets
The --dataset argument will be removed in favor of --set. The new argument has a specific format, <dataset_id>:<set_name>, which should be checked during argument parsing.
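A minimal sketch of how the format check could be wired into argparse is shown here. The function names `set_argument` and `build_parser` are hypothetical, and the check that the dataset ID is a UUID is an assumption, not something this issue states. The validated value is returned unchanged, so each entry stays a <dataset_id>:<set_name> string.

```python
import argparse
import uuid


def set_argument(value: str) -> str:
    """Validate a `<dataset_id>:<set_name>` string and return it unchanged."""
    # Split on the first colon only; the set name keeps any later colons.
    dataset_id, sep, set_name = value.partition(":")
    if not sep or not set_name:
        raise argparse.ArgumentTypeError(
            f"'{value}' is not in the form <dataset_id>:<set_name>"
        )
    try:
        # Assumption: dataset IDs are UUIDs.
        uuid.UUID(dataset_id)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"'{dataset_id}' is not a valid dataset ID"
        ) from None
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the --set argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--set",
        type=set_argument,
        nargs="+",
        default=[],
        help="One or more sets, each given as <dataset_id>:<set_name>",
    )
    return parser
```

With a `type` callable, argparse reports a malformed value as a usage error before any worker code runs.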
- Remove DatasetWorker.list_datasets
- DatasetMixin.list_process_datasets is renamed to DatasetMixin.list_process_sets (called even in read-only mode)
Create a new model in arkindex_worker.models, arkindex_worker.models.Set:
- name: str, name of the set
- dataset, dataset of the set
- dataset_path, property (port of Dataset.filepath)
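As a rough illustration of the shape of this model, the sketch below uses plain dataclasses. The `Dataset` class here is a stand-in, not the real arkindex_worker.models.Dataset, and the file naming inside `dataset_path` is a placeholder: the actual value should be whatever Dataset.filepath currently returns.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Dataset:
    """Stand-in for arkindex_worker.models.Dataset (only the ID is modelled)."""

    id: str


@dataclass
class Set:
    """A named set belonging to a dataset."""

    name: str
    dataset: Dataset

    @property
    def dataset_path(self) -> Path:
        # Placeholder naming scheme, assumed for this sketch only;
        # the real logic would be ported from Dataset.filepath.
        return Path(f"{self.dataset.id}.tar.zst")
```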
In read-only mode, information about each set will be stored in an iterator. Each value in self.args.set is a string <dataset_id>:<set_name>, and each one should be turned into an arkindex_worker.models.Set.
To minimize API calls, we should call RetrieveDataset once per provided dataset ID and store the results in a datasets: dict[str, Dataset] instance attribute (mapping IDs to datasets). You will have to implement a proper generator to handle that logic.
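One possible shape for that generator is sketched below. The names `SetLister`, `list_sets` and `retrieve_dataset` are hypothetical: `retrieve_dataset` stands in for the RetrieveDataset API call, and `Dataset` and `Set` are simplified stand-ins for the real models.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Stand-in for arkindex_worker.models.Dataset."""

    id: str


@dataclass
class Set:
    """Stand-in for the proposed arkindex_worker.models.Set."""

    name: str
    dataset: Dataset


@dataclass
class SetLister:
    # Hypothetical callable wrapping the RetrieveDataset API call.
    retrieve_dataset: Callable[[str], Dataset]
    # Cache of already retrieved datasets, keyed by dataset ID.
    datasets: dict[str, Dataset] = field(default_factory=dict)

    def list_sets(self, values: Iterable[str]) -> Iterator[Set]:
        """Lazily yield a Set for each `<dataset_id>:<set_name>` value."""
        for value in values:
            dataset_id, _, set_name = value.partition(":")
            # Only hit the API the first time a dataset ID is seen.
            if dataset_id not in self.datasets:
                self.datasets[dataset_id] = self.retrieve_dataset(dataset_id)
            yield Set(name=set_name, dataset=self.datasets[dataset_id])
```

Because this is a generator, a dataset is only retrieved when the first set that references it is consumed, and several sets of the same dataset share one Dataset instance.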
Edited by Yoann Schneider