Model versions are not downloaded for training processes

Training processes require a model to train and, since #1457 (closed), also accept an existing model version to fine-tune. The model version is set on the WorkerRun, but Ponos agents do not download it automatically through Task.extra_files as they do for Workers processes.

This is caused by the handling of the Training mode in Process.build_workflow, which duplicates the task creation instead of calling WorkerRun.build_task, which already handles model versions, GPU support, and other details.
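The difference between duplicating task creation and delegating to WorkerRun.build_task can be illustrated with a minimal, self-contained sketch. This is not the actual Arkindex code: the dataclasses, the `model_version_url` and `use_gpu` fields, the `"model"` key, and both `build_training_task_*` helpers are hypothetical stand-ins, and only the names `WorkerRun.build_task` and `Task.extra_files` come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    slug: str
    requires_gpu: bool = False
    # File name -> download URL; assumed to be what the agent fetches before running
    extra_files: dict = field(default_factory=dict)


@dataclass
class WorkerRun:
    slug: str
    use_gpu: bool = False  # hypothetical field
    model_version_url: Optional[str] = None  # hypothetical field

    def build_task(self) -> Task:
        # Single place that knows about GPU support and model versions
        task = Task(slug=self.slug, requires_gpu=self.use_gpu)
        if self.model_version_url:
            task.extra_files["model"] = self.model_version_url
        return task


def build_training_task_duplicated(run: WorkerRun) -> Task:
    # Hypothetical stand-in for the Training branch: it rebuilds the task
    # by hand, so the model version never reaches extra_files
    return Task(slug=run.slug, requires_gpu=run.use_gpu)


def build_training_task_delegated(run: WorkerRun) -> Task:
    # Possible fix: reuse the worker run's own task builder
    return run.build_task()


run = WorkerRun(
    slug="training",
    use_gpu=True,
    model_version_url="https://example.com/model.tar.zst",
)
duplicated = build_training_task_duplicated(run)
delegated = build_training_task_delegated(run)

print(duplicated.extra_files)
print(delegated.extra_files)
```

In this sketch only the delegated task carries the model version in `extra_files`, which is the behaviour the Training mode is currently missing.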

Dataset processes, meant to replace Training processes in the future, are not affected because they handle their worker runs just like a Workers process. I am labeling this as P4, since Training processes are meant to be replaced.