Restarting a single task in a process does not reset the assigned GPU
- Start a process with workers that require a GPU, enough that every GPU available on the instance will be used.
- Wait for a task to be assigned an agent and a GPU.
- Stop it, or wait for it to finish.
- As an instance admin, use Restart task to restart only this task, rather than retrying the whole process.
- The task stays in a pending state indefinitely, even though a GPU is available for it to use.
When restarting a single task, Task.agent is reset, but Task.gpu is not. Since the task is updated to pending, it is seen as an active task that is still using this GPU, even though it is not running at all. No other task will ever be assigned to this GPU until the task is stopped, the process is stopped, or an admin intervenes manually.
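The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the project's actual code: the Task dataclass, the state names, the ACTIVE_STATES set, and the helpers restart_buggy, restart_fixed, and busy_gpus are all hypothetical stand-ins. Only the agent and gpu attributes and the pending state come from this report. The sketch assumes that a GPU counts as busy whenever a task in an active state still references it, and shows one possible fix: clearing gpu together with agent on restart.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: tasks in these states are treated as still holding their GPU.
ACTIVE_STATES = {"pending", "running"}


@dataclass
class Task:
    # Hypothetical stand-in for the real Task model.
    state: str = "pending"
    agent: Optional[str] = None
    gpu: Optional[str] = None


def restart_buggy(task: Task) -> None:
    """Mimics the reported behaviour: agent is cleared, gpu is left as-is."""
    task.agent = None
    task.state = "pending"


def restart_fixed(task: Task) -> None:
    """Possible fix: clear the GPU assignment together with the agent."""
    task.agent = None
    task.gpu = None
    task.state = "pending"


def busy_gpus(tasks: list[Task]) -> set[str]:
    """GPUs referenced by at least one task in an active state."""
    return {t.gpu for t in tasks if t.gpu is not None and t.state in ACTIVE_STATES}


if __name__ == "__main__":
    finished = Task(state="completed", agent="agent-1", gpu="gpu-0")
    restart_buggy(finished)
    print(busy_gpus([finished]))  # gpu-0 is reported busy, yet no agent runs the task

    finished = Task(state="completed", agent="agent-1", gpu="gpu-0")
    restart_fixed(finished)
    print(busy_gpus([finished]))  # no GPU is held by the restarted task
```

With restart_buggy, the restarted task has no agent but still pins its old GPU, which matches the stuck pending state in the reproduction steps. With restart_fixed, the GPU is released and can be assigned again, to this task or to another one.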