
Duplicate WorkerActivities are created when an element has multiple parent paths

Sentry Issue: ARKINDEX-BACKEND-1CS

CardinalityViolation: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

  File "django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)

ProgrammingError: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

(5 additional frame(s) were not displayed)
...
  File "django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)

This process on demo was looking for single_page elements in three folders. 24 of those single pages had two parent paths: one in one of the three folders, and one in this sample folder (example). Process.list_elements() therefore returns each of these elements twice. That is expected, because deduplicating in that query would be very costly and is unnecessary: arkindex_tasks.init_elements deduplicates the elements itself before writing the elements.json file. WorkerActivity creation, however, does need a .distinct(). Without it, the bulk insert proposes the same element twice within a single ON CONFLICT DO UPDATE command, which PostgreSQL rejects with the error above.
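The sketch below shows, in plain Python, what the `.distinct()` has to achieve: an element listed twice (once per parent path) must be proposed only once per bulk upsert. It is not the Arkindex implementation. The helper names `unique_activity_rows` and `has_duplicate_keys` are hypothetical, and the assumption that the ON CONFLICT target is `(element_id, worker_version_id)` is for illustration only.

```python
from typing import Iterable, List, Tuple
from uuid import UUID, uuid4

# Hypothetical row shape: (element_id, worker_version_id), assumed here to be
# the columns of the unique constraint targeted by ON CONFLICT DO UPDATE.
ActivityKey = Tuple[UUID, UUID]


def unique_activity_rows(element_ids: Iterable[UUID], worker_version_id: UUID) -> List[ActivityKey]:
    """Build one row per element, dropping repeats and keeping first-seen order.

    An element reachable through two parent paths appears twice in element_ids;
    dict.fromkeys keeps only its first occurrence, like a DISTINCT would.
    """
    return [(element_id, worker_version_id) for element_id in dict.fromkeys(element_ids)]


def has_duplicate_keys(rows: Iterable[ActivityKey]) -> bool:
    """Return True if a batch repeats a constrained key, which PostgreSQL rejects."""
    keys = list(rows)
    return len(keys) != len(set(keys))


if __name__ == "__main__":
    version = uuid4()
    a, b = uuid4(), uuid4()
    # Element a is listed twice, as it would be with two parent paths.
    listed = [a, b, a]
    naive = [(e, version) for e in listed]
    print(has_duplicate_keys(naive))  # True
    rows = unique_activity_rows(listed, version)
    print(len(rows), has_duplicate_keys(rows))  # 2 False
```

Deduplicating on the activity side only is consistent with the issue: the expensive listing query stays as it is, and only the batch handed to the bulk insert has to be free of repeated keys.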