
Stale read in the check_parents signal after creating a DataImport with a WorkerRun

Sentry Issue: ARKINDEX-BACKEND-19S

KeyError: 'dataimport'
  File "django/db/models/fields/related_descriptors.py", line 187, in __get__
    rel_obj = self.field.get_cached_value(instance)
  File "django/db/models/fields/mixins.py", line 15, in get_cached_value
    return instance._state.fields_cache[cache_name]

DataImport.DoesNotExist: DataImport matching query does not exist.
(21 additional frame(s) were not displayed)
...
  File "django/dispatch/dispatcher.py", line 177, in <listcomp>
    (receiver, receiver(signal=self, sender=sender, **named))
  File "arkindex/dataimport/signals.py", line 29, in check_parents
  File "django/db/models/fields/related_descriptors.py", line 205, in __get__
    rel_obj = self.get_object(instance)
  File "django/db/models/fields/related_descriptors.py", line 168, in get_object
    return qs.get(self.field.get_reverse_related_filter(instance))
  File "django/db/models/query.py", line 496, in get
    raise self.model.DoesNotExist(

The CreateTrainingProcess endpoint creates a DataImport, then immediately starts it, which creates a WorkerRun on it. The check_parents signal runs on the new WorkerRun to check that we are not creating a WorkerRun with parents that don't exist, or that would lead to a cycle. It uses instance.dataimport, which causes Django to query the database for the WorkerRun's DataImport. The DataImport is not yet available in the replica because it was only just created, so the query fails with DataImport.DoesNotExist.

In this specific case, the signal could be skipped entirely, because parents is an empty array and there is nothing to validate.
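A minimal sketch of that early return, written without Django so it runs on its own. FakeWorkerRun, its dataimport_lookups counter and the local DoesNotExist class are hypothetical stand-ins, not the real models; the actual body of check_parents in arkindex/dataimport/signals.py is not shown in this issue, so the checks after the guard are left out.

```python
class DoesNotExist(Exception):
    """Hypothetical stand-in for DataImport.DoesNotExist."""


class FakeWorkerRun:
    """Hypothetical stand-in for a WorkerRun whose DataImport
    has not reached the replica yet."""

    def __init__(self, parents):
        self.parents = parents
        self.dataimport_lookups = 0

    @property
    def dataimport(self):
        # Simulates the stale read: the related row is not visible yet.
        self.dataimport_lookups += 1
        raise DoesNotExist("DataImport matching query does not exist.")


def check_parents(sender, instance, **kwargs):
    # With no parents there is nothing to validate, so return
    # before touching instance.dataimport at all.
    if not instance.parents:
        return
    dataimport = instance.dataimport  # in Django, this would hit the database
    # The existence and cycle checks would follow here (omitted).


run = FakeWorkerRun(parents=[])
check_parents(sender=FakeWorkerRun, instance=run)
```

With the guard in place, a WorkerRun created without parents never triggers the related-object lookup, so the stale replica is not queried. A WorkerRun that does have parents would still hit the same lookup, so this only covers the case described here.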

This never happened before because, until now, a DataImport that uses WorkerRuns was never started immediately after being created: creation and start always happened in two separate API calls, giving the replica enough time to catch up.