Stale read in the check_parents signal after creating a DataImport with a WorkerRun
Sentry Issue: ARKINDEX-BACKEND-19S
KeyError: 'dataimport'
File "django/db/models/fields/related_descriptors.py", line 187, in __get__
rel_obj = self.field.get_cached_value(instance)
File "django/db/models/fields/mixins.py", line 15, in get_cached_value
return instance._state.fields_cache[cache_name]
DataImport.DoesNotExist: DataImport matching query does not exist.
(21 additional frame(s) were not displayed)
...
File "django/dispatch/dispatcher.py", line 177, in <listcomp>
(receiver, receiver(signal=self, sender=sender, **named))
File "arkindex/dataimport/signals.py", line 29, in check_parents
File "django/db/models/fields/related_descriptors.py", line 205, in __get__
rel_obj = self.get_object(instance)
File "django/db/models/fields/related_descriptors.py", line 168, in get_object
return qs.get(self.field.get_reverse_related_filter(instance))
File "django/db/models/query.py", line 496, in get
raise self.model.DoesNotExist(
The CreateTrainingProcess endpoint creates a DataImport, then immediately starts it, which creates a WorkerRun on it. The check_parents signal runs on the new WorkerRun to check that we are not creating a WorkerRun with parents that do not exist or that would lead to a cycle. It uses instance.dataimport, which causes Django to query for the WorkerRun's DataImport. The DataImport is not yet available in the replica because it was only just created, so the query fails.
In this specific case, the entire signal could be skipped, because parents is an empty array.
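A minimal sketch of that early return, assuming the signal only needs instance.dataimport to validate parents. FakeWorkerRun, DoesNotExist and known_run_ids are hypothetical stand-ins so the snippet runs without Django; the real check_parents in arkindex/dataimport/signals.py also checks for cycles, which is left out here.

```python
from types import SimpleNamespace


class DoesNotExist(Exception):
    """Hypothetical stand-in for DataImport.DoesNotExist."""


class FakeWorkerRun:
    """Hypothetical stand-in for WorkerRun; not the real model."""

    def __init__(self, parents, replica_has_dataimport):
        self.parents = parents
        self._replica_has_dataimport = replica_has_dataimport

    @property
    def dataimport(self):
        # Mimics the lazy foreign key lookup that fails on a stale replica.
        if not self._replica_has_dataimport:
            raise DoesNotExist("DataImport matching query does not exist.")
        # known_run_ids is an assumption made for this sketch only.
        return SimpleNamespace(known_run_ids={"a", "b"})


def check_parents(sender, instance, **kwargs):
    # With no parents there is nothing to validate, so return before
    # instance.dataimport is ever read.
    if not instance.parents:
        return
    known = instance.dataimport.known_run_ids
    missing = [p for p in instance.parents if p not in known]
    if missing:
        raise ValueError(f"Unknown parents: {missing}")
```

With this guard, a WorkerRun created with an empty parents array never triggers the DataImport lookup, so the replica lag no longer matters for that case. A WorkerRun with parents would still hit the stale read and needs a separate fix.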
This never happened before because we never started a DataImport that uses WorkerRuns immediately after creating it: creation and start always happened in two separate API calls, giving the replica enough time to catch up.