
Optimize AddSelection

AddSelection can easily be optimized to avoid timeouts when the user has already selected a large number of elements.

The current endpoint manually filters out elements that are already selected, but the resulting SQL query uses an EXISTS clause, so a subquery is executed once for each newly selected element. On top of that, Django's implementation already checks for possible conflicts before inserting. The most efficient approach would be to call Selection.objects.bulk_create(…, ignore_conflicts=True) directly, since Django cannot detect on its own that ignore_conflicts can be used here; this lets PostgreSQL deal with the conflicts in a single query and removes two expensive SQL queries.

Additionally, we run a COUNT() even though we end up fetching all the element IDs into memory anyway; we could just fetch them and use len() to remove another query.
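A minimal sketch of what the optimized logic could look like, assuming a Selection model with user and element foreign keys and an element_ids list already parsed from the request (import paths, model and field names are assumptions, not the actual Arkindex code):

```python
# Sketch only: model names, fields and import paths are hypothetical.
from django.db import transaction

from arkindex.documents.models import Element   # assumed import path
from arkindex.documents.models import Selection  # assumed import path


def add_selection(user, corpus, element_ids):
    # Fetch the matching element IDs once and use len(), instead of running
    # a COUNT() query first and then fetching the same rows again.
    found_ids = list(
        Element.objects.filter(corpus=corpus, id__in=element_ids)
        .values_list("id", flat=True)
    )
    if len(found_ids) != len(set(element_ids)):
        raise ValueError("Some elements do not exist in this corpus")

    # Let PostgreSQL ignore rows that are already selected (ON CONFLICT DO NOTHING)
    # instead of excluding them with an EXISTS subquery per element and letting
    # Django run its own conflict check before the INSERT.
    with transaction.atomic():
        Selection.objects.bulk_create(
            [Selection(user=user, element_id=element_id) for element_id in found_ids],
            ignore_conflicts=True,
        )
```

With ignore_conflicts=True (available since Django 2.2), the whole insert becomes a single INSERT … ON CONFLICT DO NOTHING statement, so the duplicate filtering and Django's pre-insert conflict check both disappear.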

Sentry Issue: ARKINDEX-BACKEND-R6

KeyError: 140176347124752
  File "copy.py", line 264, in _keep_alive
    memo[id(memo)].append(x)

SystemExit: 1
(36 additional frame(s) were not displayed)
...
  File "rest_framework/serializers.py", line 349, in fields
    for key, value in self.get_fields().items():
  File "rest_framework/serializers.py", line 1019, in get_fields
    declared_fields = copy.deepcopy(self._declared_fields)
  File "copy.py", line 185, in deepcopy
    _keep_alive(x, memo) # Make sure x lives at least as long as d
  File "copy.py", line 264, in _keep_alive
    memo[id(memo)].append(x)
  File "gunicorn/workers/base.py", line 203, in handle_abort
    sys.exit(1)