
Download the dataset's artifacts during processing

Depends on #239 (closed) and #238 (closed)

DatasetWorker will inherit the new TaskMixin.

Processing a dataset requires downloading its data first. Implement a new method that does the following:

  • call list_artifacts with dataset.task_id to list artifacts of the task that generated the dataset
  • find the artifact named <dataset.id>.zstd and download it using download_artifacts
  • extract it into self.extra_dir = self.task_data_dir / "extra_files" (this path is also used in BaseWorker.find_model_directory, so we could store it as an attribute). This is done only if the worker is not a generator.
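
The steps above could be sketched roughly as follows. This is only an illustration, not the final implementation: the function names, the `path` attribute on artifacts, the `is_generator` flag, the arguments passed to `download_artifacts`, and the `extract` callable are all assumptions made for the example. Decompression is left to the caller because the archive layout is not specified here. The sketch also assumes the whole step is skipped for generators.

```python
from pathlib import Path


def find_dataset_artifact(artifacts, dataset_id):
    """Return the artifact named '<dataset_id>.zstd' from an iterable of artifacts."""
    expected = f"{dataset_id}.zstd"
    for artifact in artifacts:
        # Assumption: each artifact exposes its file name as a `path` attribute.
        if artifact.path == expected:
            return artifact
    raise FileNotFoundError(f"No artifact named {expected}")


def download_dataset_artifacts(worker, dataset, extract):
    """Download the dataset archive and extract it into worker.extra_dir.

    `extract(archive, destination)` is a hypothetical callable that
    decompresses the downloaded archive into the destination directory.
    """
    # Assumption: `is_generator` is how a generator worker is identified,
    # and generators skip the whole download.
    if worker.is_generator:
        return None

    # List the artifacts of the task that generated the dataset.
    artifacts = worker.list_artifacts(dataset.task_id)
    artifact = find_dataset_artifact(artifacts, dataset.id)

    # Assumption: download_artifacts takes the task ID and the artifact.
    archive = worker.download_artifacts(dataset.task_id, artifact)

    # Store the path as an attribute so other code can reuse it.
    worker.extra_dir = Path(worker.task_data_dir) / "extra_files"
    worker.extra_dir.mkdir(parents=True, exist_ok=True)
    extract(archive, worker.extra_dir)
    return worker.extra_dir
```

Raising an error when no matching artifact exists is a design choice of this sketch; logging and skipping the dataset would be another option.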