Update DatasetWorker argument and use ListProcessSets
The --dataset argument will be removed in favor of --set. The new --set argument has a specific format: <dataset_id>:<set_name>.
The format should be checked during argument parsing.
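One way to check the format at parsing time is an argparse type callable. The sketch below is only illustrative: the helper name set_argument is hypothetical, and it assumes that dataset IDs are UUIDs. It returns the original string unchanged, so each parsed value stays a <dataset_id>:<set_name> string.

```python
import argparse
import uuid


def set_argument(value: str) -> str:
    """Validate a `<dataset_id>:<set_name>` string and return it unchanged.

    Hypothetical helper: it assumes the dataset ID is a UUID.
    """
    dataset_id, sep, set_name = value.partition(":")
    if not sep or not set_name:
        raise argparse.ArgumentTypeError(
            f"'{value}' is not in the form <dataset_id>:<set_name>"
        )
    try:
        uuid.UUID(dataset_id)
    except ValueError as e:
        raise argparse.ArgumentTypeError(
            f"'{dataset_id}' is not a valid UUID"
        ) from e
    return value


parser = argparse.ArgumentParser()
# argparse reports an ArgumentTypeError as a usage error and exits
parser.add_argument("--set", type=set_argument, nargs="+", default=[])
```

With this in place, a malformed value is rejected before the worker starts processing anything.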
Remove:
- DatasetWorker.list_datasets
- DatasetMixin.list_process_datasets is renamed to DatasetMixin.list_process_sets (called even in read-only mode)
Create a new model in arkindex_worker.models, arkindex_worker.models.Set:
- name: str, name of the set,
- dataset, dataset of the set,
- dataset_path, property (port of Dataset.filepath).
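A rough sketch of the model's shape, using plain dataclasses as stand-ins rather than the actual arkindex_worker base classes. The Dataset fields shown here and the way dataset_path forwards to Dataset.filepath are assumptions made for illustration only.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Dataset:
    """Stand-in for arkindex_worker.models.Dataset (fields are assumed)."""

    id: str
    # Assumed to hold whatever Dataset.filepath returns today
    filepath: Path


@dataclass
class Set:
    """Sketch of the proposed arkindex_worker.models.Set."""

    name: str
    dataset: Dataset

    @property
    def dataset_path(self) -> Path:
        # Port of Dataset.filepath, simply forwarded in this sketch
        return self.dataset.filepath
```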
In read-only mode, information about each set will be stored in an iterator.
Each value in self.args.set is a string <dataset_id>:<set_name>, and the result should be an arkindex_worker.models.Set.
To minimize API calls, we should call RetrieveDataset once per provided dataset ID and store the results in a datasets: dict[str, Dataset] instance attribute, keyed by dataset ID. This logic will need a proper generator that retrieves a dataset only the first time its ID is seen.
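The caching generator could look roughly like the sketch below. Everything except the RetrieveDataset operation name and the datasets attribute is hypothetical: SetLister, list_sets_read_only, and FakeClient are illustrative names, the request(operation, id=...) call is an assumption about the client, and the models are minimal stand-ins.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class Dataset:
    id: str
    name: str


@dataclass
class Set:
    name: str
    dataset: Dataset


class FakeClient:
    """Hypothetical API client that records every request it receives."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def request(self, operation: str, id: str) -> dict:
        self.calls.append((operation, id))
        return {"id": id, "name": f"dataset-{id}"}


class SetLister:
    """Illustrative holder for the read-only listing logic."""

    def __init__(self, api_client, set_args: list[str]) -> None:
        self.api_client = api_client
        self.set_args = set_args
        # Cache of already retrieved datasets, keyed by dataset ID
        self.datasets: dict[str, Dataset] = {}

    def list_sets_read_only(self) -> Iterator[Set]:
        for value in self.set_args:
            dataset_id, _, set_name = value.partition(":")
            # Call the API only the first time a dataset ID is seen
            if dataset_id not in self.datasets:
                payload = self.api_client.request("RetrieveDataset", id=dataset_id)
                self.datasets[dataset_id] = Dataset(**payload)
            yield Set(name=set_name, dataset=self.datasets[dataset_id])
```

Two sets from the same dataset then share a single Dataset instance and trigger a single RetrieveDataset call.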
Edited by Yoann Schneider