Optimize element deletion using raw SQL
Deleting a single element could run 26 SQL queries, including some `SELECT` queries that could fill up the RAM. This uses raw SQL to perform the deletion in 13 queries without loading anything in RAM, saving 50% of deletion time.
This overrides `.delete` on the model itself to avoid Django's own cascading (9 extra queries). The raw SQL bypasses all signals, if there are any, so the deletion signal has been replaced with an error to avoid future issues with queryset deletion. `Element.objects.all().delete()` would cause a `SELECT` that loads all elements in RAM, followed by 13 queries per element.
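
The approach can be sketched roughly as follows. This is a minimal illustration using Python's standard `sqlite3` module instead of Django so that it runs on its own. The table names (`element`, `element_path`, `transcription`) and the `delete_element` and `build_demo_db` helpers are hypothetical, and the real schema and its 13 queries differ. The point is only the shape: one `DELETE` per dependent table, then the element row itself, with no `SELECT` in between.

```python
import sqlite3

# Hypothetical schema: the real tables, and how many there are, differ.
SCHEMA = """
CREATE TABLE element (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE element_path (
    id INTEGER PRIMARY KEY,
    element_id TEXT REFERENCES element(id)
);
CREATE TABLE transcription (
    id INTEGER PRIMARY KEY,
    element_id TEXT REFERENCES element(id),
    text TEXT
);
"""

# Dependent tables are cleared first so no foreign key is left dangling.
DELETE_QUERIES = [
    "DELETE FROM transcription WHERE element_id = ?",
    "DELETE FROM element_path WHERE element_id = ?",
    "DELETE FROM element WHERE id = ?",
]


def delete_element(conn, element_id):
    """Delete one element and its dependent rows without fetching any row."""
    with conn:  # commit on success, roll back on error
        for query in DELETE_QUERIES:
            conn.execute(query, (element_id,))
    # The query count is fixed, whatever the number of dependent rows.
    return len(DELETE_QUERIES)


def build_demo_db():
    """Create an in-memory database with two elements and a few dependent rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO element VALUES (?, ?)", [("a", "page 1"), ("b", "page 2")]
    )
    conn.executemany(
        "INSERT INTO element_path (element_id) VALUES (?)", [("a",), ("a",), ("b",)]
    )
    conn.executemany(
        "INSERT INTO transcription (element_id, text) VALUES (?, ?)",
        [("a", "hello"), ("b", "world")],
    )
    conn.commit()
    return conn
```

Calling `delete_element(build_demo_db(), "a")` removes element `a` and its dependent rows in three statements and leaves element `b` untouched.
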
`Corpus.delete`, a method that used to avoid a `ProtectedError` when deleting a corpus without deleting its elements first, has been removed: it triggered the signal and is no longer used anywhere. It was only used for the delete button on the corpus admin, which has been removed.
Follow-ups:
- Augment the raw SQL to use `VALUES ()` (like for `has_children`) and handle a large number of element IDs at once, then override `.delete` on element querysets to use that, allowing a new "delete this selection" option #518 (closed)
- Rewrite `CorpusConsumer` to delete a whole corpus with similar queries #519 (closed)
- Use `assertExactQueries` to make the query assertions stricter #516 (closed)
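
The first follow-up could look something like the sketch below, again on the standard `sqlite3` module with hypothetical table names and a hypothetical `delete_elements` helper. It passes the IDs as a `VALUES` list so that each `DELETE` covers the whole batch. This is an assumption about the shape of the query, not the query used for `has_children`.

```python
import sqlite3

# Hypothetical tables, listed with the column that holds the element ID.
# Dependent tables come first so the element rows are removed last.
TABLES = [
    ("transcription", "element_id"),
    ("element_path", "element_id"),
    ("element", "id"),
]


def delete_elements(conn, element_ids):
    """Delete many elements with one DELETE per table, whatever the batch size."""
    ids = list(element_ids)
    if not ids:
        return 0
    # One "(?)" row per ID; only placeholders are interpolated, never values.
    values = ", ".join(["(?)"] * len(ids))
    with conn:  # commit on success, roll back on error
        for table, column in TABLES:
            conn.execute(
                f"DELETE FROM {table} WHERE {column} IN (VALUES {values})",
                ids,
            )
    return len(TABLES)


def build_db():
    """Create an in-memory database with three elements and dependent rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE element (id TEXT PRIMARY KEY);
        CREATE TABLE element_path (id INTEGER PRIMARY KEY, element_id TEXT);
        CREATE TABLE transcription (id INTEGER PRIMARY KEY, element_id TEXT);
        """
    )
    conn.executemany("INSERT INTO element VALUES (?)", [("a",), ("b",), ("c",)])
    conn.executemany(
        "INSERT INTO element_path (element_id) VALUES (?)", [("a",), ("b",), ("c",)]
    )
    conn.executemany(
        "INSERT INTO transcription (element_id) VALUES (?)", [("a",), ("b",)]
    )
    conn.commit()
    return conn
```

With this shape, the number of statements depends on the number of tables, not on the number of elements in the selection.
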