Stale read when starting a training process
Sentry Issue: ARKINDEX-BACKEND-1RP
WorkerRun.DoesNotExist: WorkerRun matching query does not exist.
(12 additional frame(s) were not displayed)
...
File "arkindex/process/api.py", line 2182, in create
File "arkindex/process/serializers/training.py", line 224, in create
File "arkindex/process/models.py", line 750, in run
File "arkindex/process/models.py", line 596, in build_workflow
To start a training process, the CreateTrainingProcess endpoint creates a process, creates a WorkerRun on it with the options set by the user, then immediately runs the process. Running the process queries the WorkerRun again right away in order to create a task from it. If that query is served before the newly written row is visible, it raises WorkerRun.DoesNotExist, which is the stale read reported above.
Since stale reads only occur in production, we are seeing this on 1.5.1, where the code still lived in Process.build_workflow. !2122 (merged) moved it to ProcessBuilder.build_training, but the stale read can still happen there.
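
To make the failure mode concrete, here is a minimal, framework-free sketch of the create-then-read sequence described above. It is not Arkindex code: `LaggingStore`, `build_training_by_lookup` and `build_training_with_instance` are hypothetical names, and the "replica that lags behind the primary" is only an assumption about why the read is stale. The sketch shows one possible way to avoid the lookup, by handing the freshly created object to the builder instead of querying for it again.

```python
class DoesNotExist(Exception):
    """Stand-in for Django's Model.DoesNotExist (hypothetical, for this sketch)."""


class LaggingStore:
    """Toy primary/replica pair. Writes land on the primary and only reach
    the replica when sync() is called, which mimics replication lag."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def create(self, key, value):
        self.primary[key] = value
        return value

    def get(self, key):
        # Reads are served by the replica, which may not have the row yet.
        try:
            return self.replica[key]
        except KeyError:
            raise DoesNotExist(f"{key} matching query does not exist.") from None

    def sync(self):
        self.replica.update(self.primary)


def build_training_by_lookup(store, run_id):
    # Pattern from the report: re-read the row that was just written.
    worker_run = store.get(run_id)
    return {"task_for": worker_run["id"]}


def build_training_with_instance(worker_run):
    # Alternative: reuse the object returned by create(), so no read happens.
    return {"task_for": worker_run["id"]}


store = LaggingStore()
created = store.create("run-1", {"id": "run-1", "options": {"epochs": 3}})

try:
    build_training_by_lookup(store, "run-1")
    lookup_failed = False
except DoesNotExist:
    lookup_failed = True

task = build_training_with_instance(created)
```

In this sketch the lookup fails until `store.sync()` is called, while the variant that receives the instance succeeds immediately. Whether the real builder can accept the WorkerRun instance directly, or should instead force the read onto the primary database, depends on the actual code and is left open here.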