Optimize corpus deletion

Erwan Rouchet requested to merge optimize-corpus-deletion into master

This completely removes the notion of batches from corpus deletion. Any corpus is now deleted with a fixed set of 24 SQL queries, regardless of its size and without loading anything into RAM. It is possible to get down to 21 queries, but keeping the three duplicate queries saves a few minutes when deleting large corpora. Closes #519 (closed).
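To illustrate the idea of a fixed query count, here is a minimal sketch using SQLite and a hypothetical three-table schema (`corpus`, `element`, `transcription`). These tables, the `delete_corpus` helper and the `build_db` helper are assumptions made for the example, not the actual Arkindex models or the 24 queries in this merge request. Each table gets one `DELETE` filtered by a subquery, children first, so the number of queries depends on the number of tables and not on the number of rows.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real Arkindex models.
SCHEMA = """
CREATE TABLE corpus (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE element (
    id INTEGER PRIMARY KEY,
    corpus_id INTEGER NOT NULL REFERENCES corpus(id)
);
CREATE TABLE transcription (
    id INTEGER PRIMARY KEY,
    element_id INTEGER NOT NULL REFERENCES element(id)
);
"""

# One query per table, children first, so that no row ever points
# at a parent that has already been deleted.
DELETE_QUERIES = [
    "DELETE FROM transcription WHERE element_id IN "
    "(SELECT id FROM element WHERE corpus_id = ?)",
    "DELETE FROM element WHERE corpus_id = ?",
    "DELETE FROM corpus WHERE id = ?",
]


def delete_corpus(conn, corpus_id):
    """Delete a corpus in one transaction; return the number of queries run."""
    with conn:  # commits on success, rolls back on error
        for query in DELETE_QUERIES:
            conn.execute(query, (corpus_id,))
    return len(DELETE_QUERIES)


def build_db(n_elements):
    """Create an in-memory database holding one corpus with n_elements elements."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO corpus (id, name) VALUES (1, 'demo')")
    ids = [(i,) for i in range(1, n_elements + 1)]
    conn.executemany("INSERT INTO element (id, corpus_id) VALUES (?, 1)", ids)
    conn.executemany("INSERT INTO transcription (element_id) VALUES (?)", ids)
    conn.commit()
    return conn
```

Calling `delete_corpus(build_db(10), 1)` and `delete_corpus(build_db(1000), 1)` runs the same three queries in both cases. The rows are never fetched into Python, which is the property the description refers to as not loading anything in RAM.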

If the models change and the fixtures in manage.py build_fixtures or arkindex.documents.tests.tasks.test_corpus_delete.setUpTestData are not updated accordingly, this deletion can break without the unit tests detecting it.
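The sketch below shows one plausible way this can go unnoticed, again with SQLite and a hypothetical schema rather than the real Arkindex code. A `classification` table (an assumed name) is added after the deletion queries were written. When the fixture contains no rows for it, the deletion succeeds and a test would pass. Once real data has such rows, the same queries fail on a foreign key constraint.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE corpus (id INTEGER PRIMARY KEY);
CREATE TABLE element (
    id INTEGER PRIMARY KEY,
    corpus_id INTEGER NOT NULL REFERENCES corpus(id)
);
-- Assumed new model, added after the deletion queries were written.
CREATE TABLE classification (
    id INTEGER PRIMARY KEY,
    element_id INTEGER NOT NULL REFERENCES element(id)
);
"""

# Hand-written deletion queries that predate the classification table.
DELETE_QUERIES = [
    "DELETE FROM element WHERE corpus_id = ?",
    "DELETE FROM corpus WHERE id = ?",
]


def try_delete(with_classification):
    """Return True if the corpus deletion succeeds, False on a constraint error."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO corpus (id) VALUES (1)")
    conn.execute("INSERT INTO element (id, corpus_id) VALUES (1, 1)")
    if with_classification:
        # Simulates data that the outdated fixture does not contain.
        conn.execute("INSERT INTO classification (element_id) VALUES (1)")
    conn.commit()
    try:
        with conn:  # rolls back if any query fails
            for query in DELETE_QUERIES:
                conn.execute(query, (1,))
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()
    return True
```

Here `try_delete(False)` plays the role of a test running against an outdated fixture, and `try_delete(True)` plays the role of a database that actually uses the new table. Only the second call exposes the missing query, which is why the fixtures need a row for every model the deletion has to cover.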

Edited by Erwan Rouchet
