Download the dataset's artifacts during processing
Depends on #239 (closed) and #238 (closed)
`DatasetWorker` will inherit the new `TaskMixin`.
When processing a dataset, we need to download its data. You need to implement a new method that does that:
- Call `list_artifacts` with `dataset.task_id` to list the artifacts of the task that generated the dataset
- Find the artifact named `<dataset.id>.zstd` and download it using `download_artifacts`
- Extract it in `self.extra_dir = self.task_data_dir / "extra_files"` (this is also used in `BaseWorker.find_model_directory`, we could store the path as an attribute)

This will be done only if the Worker is not a `generator`.
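
A rough sketch of how the steps above could fit together, written as a standalone function so it can be read without the worker classes. Everything about the signatures is an assumption made for illustration: the function name `download_dataset_artifact`, artifacts being dicts with a `name` key, `download_artifacts` returning raw bytes, and the `extract` callable that stands in for the actual zstd extraction. The real `TaskMixin` helpers may look different.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def download_dataset_artifact(
    dataset_id: str,
    task_id: str,
    task_data_dir: Path,
    list_artifacts: Callable[[str], Iterable[dict]],
    download_artifacts: Callable[[str, dict], bytes],
    extract: Callable[[bytes, Path], None],
    is_generator: bool = False,
) -> Optional[Path]:
    """Hypothetical helper: fetch and extract a dataset's archive.

    The callables are stand-ins for the TaskMixin methods and for the
    zstd extraction step; their signatures are assumed, not confirmed.
    """
    # Generators produce the dataset, so they have nothing to download.
    if is_generator:
        return None

    # Look for the archive among the artifacts of the generating task.
    expected_name = f"{dataset_id}.zstd"
    artifact = next(
        (a for a in list_artifacts(task_id) if a["name"] == expected_name),
        None,
    )
    if artifact is None:
        raise FileNotFoundError(
            f"No artifact named {expected_name} on task {task_id}"
        )

    # Same location as the one mentioned for find_model_directory.
    extra_dir = task_data_dir / "extra_files"
    extra_dir.mkdir(parents=True, exist_ok=True)

    # Download the archive, then hand it to the extraction step.
    archive = download_artifacts(task_id, artifact)
    extract(archive, extra_dir)
    return extra_dir
```

In the worker itself this would more likely be a method that uses `self.list_artifacts`, `self.download_artifacts` and `self.task_data_dir` directly and stores the result in `self.extra_dir`. Passing the callables in here only keeps the sketch testable without API access.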