
Optimize element deletion using raw SQL

Erwan Rouchet requested to merge optimize-delete-element into master

Deleting a single element could require 26 SQL queries, including some SELECT queries that could fill up the RAM. This merge request uses raw SQL to perform the deletion in 13 queries without loading anything into RAM, halving deletion time.
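The raw-SQL approach can be sketched as follows. This is a minimal, self-contained illustration, not the code from this merge request. It uses SQLite instead of PostgreSQL, and the `element`, `transcription` and `element_path` tables and the `delete_element` helper are hypothetical. Each dependent table is cleared with a single DELETE, children before parents, so foreign keys stay valid and no rows are ever loaded into Python.

```python
import sqlite3

# Hypothetical schema for illustration only; the real project tables differ.
SCHEMA = """
CREATE TABLE element (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE transcription (
    id INTEGER PRIMARY KEY,
    element_id TEXT NOT NULL REFERENCES element (id),
    text TEXT NOT NULL
);
CREATE TABLE element_path (
    id INTEGER PRIMARY KEY,
    element_id TEXT NOT NULL REFERENCES element (id),
    parent_id TEXT REFERENCES element (id)
);
"""

# One DELETE per table, dependent rows first, so no foreign key is violated.
DELETE_ELEMENT_QUERIES = [
    "DELETE FROM transcription WHERE element_id = :id",
    # Drop the element's own paths and any path that uses it as a parent.
    "DELETE FROM element_path WHERE element_id = :id OR parent_id = :id",
    "DELETE FROM element WHERE id = :id",
]


def delete_element(conn: sqlite3.Connection, element_id: str) -> int:
    """Delete one element and its dependent rows; return the number of queries run."""
    with conn:  # one transaction: commit on success, roll back on error
        for query in DELETE_ELEMENT_QUERIES:
            conn.execute(query, {"id": element_id})
    return len(DELETE_ELEMENT_QUERIES)


# Usage: a page containing one line that has a transcription.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO element VALUES (?, ?)", [("page", "Page 1"), ("line", "Line 1")])
conn.execute("INSERT INTO element_path (element_id, parent_id) VALUES ('line', 'page')")
conn.execute("INSERT INTO transcription (element_id, text) VALUES ('line', 'hello')")
conn.commit()
delete_element(conn, "page")
```

The query count is fixed by the number of tables involved, not by how many related rows exist, which is what keeps memory usage flat.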

This overrides .delete on the Element model itself to bypass Django's own cascading, which cost 9 extra queries. Since this bypasses any deletion signals, the deletion signal handler has been replaced with an error, to avoid future issues with queryset deletion: Element.objects.all().delete() would otherwise run a SELECT loading every element into RAM, then 13 queries per element.

Corpus.delete, a method that avoided a ProtectedError when deleting a corpus whose elements had not been deleted first, has been removed: it triggered the signal and is no longer used anywhere. Its only caller was the delete button on the corpus admin, which has also been removed.

Follow-ups:

  • Extend the raw SQL to use VALUES () (as for has_children) so it can handle a large number of element IDs at once, then override .delete on element querysets to use it, enabling a new "delete this selection" option #518 (closed)
  • Rewrite CorpusConsumer to delete a whole corpus with similar queries #519 (closed)
  • Use assertExactQueries to make the query assertions stricter #516 (closed)
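The VALUES () follow-up from #518 could look roughly like this. It is a sketch under assumptions, not the project's implementation: it uses SQLite rather than PostgreSQL, a hypothetical two-table schema, and hypothetical names (`delete_elements`, `chunk_size`). The IDs are passed as a VALUES list inside a CTE, so the number of queries depends on the number of tables and chunks rather than on the number of elements. Chunking keeps each statement under SQLite's bound-parameter limit.

```python
import sqlite3
from typing import Iterable, List

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE element (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE transcription (
    id INTEGER PRIMARY KEY,
    element_id TEXT NOT NULL REFERENCES element (id),
    text TEXT NOT NULL
);
"""

# Dependent rows first, then the elements themselves.
DELETE_QUERIES = [
    "DELETE FROM transcription WHERE element_id IN (SELECT id FROM ids)",
    "DELETE FROM element WHERE id IN (SELECT id FROM ids)",
]


def delete_elements(conn: sqlite3.Connection, element_ids: Iterable[str], chunk_size: int = 500) -> int:
    """Delete many elements with a fixed number of queries per chunk; return the query count."""
    ids: List[str] = list(element_ids)
    query_count = 0
    with conn:  # all chunks share one transaction
        for start in range(0, len(ids), chunk_size):
            chunk = ids[start:start + chunk_size]
            # Builds: WITH ids(id) AS (VALUES (?), (?), ...) with one row per ID
            cte = "WITH ids(id) AS (VALUES " + ", ".join("(?)" for _ in chunk) + ") "
            for query in DELETE_QUERIES:
                conn.execute(cte + query, chunk)
                query_count += 1
    return query_count


# Usage: delete 1000 of 1200 elements in two chunks of 500.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO element VALUES (?, ?)", [(f"e{i}", f"Element {i}") for i in range(1200)])
conn.executemany(
    "INSERT INTO transcription (element_id, text) VALUES (?, ?)",
    [(f"e{i}", "text") for i in range(1200)],
)
conn.commit()
queries = delete_elements(conn, [f"e{i}" for i in range(1000)])
```

Here 1000 IDs produce 2 chunks × 2 tables = 4 queries, however many rows each table holds.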
Edited by Bastien Abadie
