Assign corpus categories automatically

https://redmine.teklia.com/issues/11416

Requires #1943 (closed)

A new setting arkindex.project.settings.DEFAULT_CORPUS_CATEGORY stores the slug of the CorpusCategory that should be assigned by default.

This setting takes its value from a new default_corpus_category option of the YAML configuration, to be added in arkindex.project.config. The option's default value is the slug "default".
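A minimal sketch of the fallback behaviour, assuming the parsed YAML configuration is available as a plain dictionary. The real parser in arkindex.project.config is not shown here, and get_default_corpus_category is a hypothetical helper name used only for illustration:

```python
from typing import Any, Mapping


def get_default_corpus_category(config: Mapping[str, Any]) -> str:
    """Return the slug of the default corpus category.

    Hypothetical helper: `config` stands in for the parsed YAML
    configuration. A missing option falls back to the slug "default".
    """
    return str(config.get("default_corpus_category", "default"))


# The settings module would then expose the value under this name.
DEFAULT_CORPUS_CATEGORY = get_default_corpus_category({})
```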

A custom manager, arkindex.documents.managers.CorpusCategoryManager, should be assigned to CorpusCategory. It adds one extra cached_property named default, which should return the CorpusCategory whose slug is set to DEFAULT_CORPUS_CATEGORY. It should not handle any errors in case the category does not exist.
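The caching and error behaviour described above can be sketched in plain Python. This is a stand-in, not the Django implementation: FakeCategoryManager, its in-memory get method and the module-level DEFAULT_CORPUS_CATEGORY constant are assumptions made so the example runs without a database. The real manager would presumably subclass django.db.models.Manager and use django.utils.functional.cached_property.

```python
from functools import cached_property

DEFAULT_CORPUS_CATEGORY = "default"  # stand-in for the Django setting


class DoesNotExist(Exception):
    """Stand-in for CorpusCategory.DoesNotExist."""


class FakeCategoryManager:
    """Plain-Python stand-in for a Django manager (no database involved)."""

    def __init__(self, slugs):
        self._slugs = set(slugs)
        self.lookups = 0  # counts how many times the fake "database" is queried

    def get(self, slug):
        self.lookups += 1
        if slug not in self._slugs:
            raise DoesNotExist(slug)
        return {"slug": slug}

    @cached_property
    def default(self):
        # No try/except on purpose: a missing category raises DoesNotExist.
        return self.get(slug=DEFAULT_CORPUS_CATEGORY)


manager = FakeCategoryManager(["default", "public"])
first = manager.default
second = manager.default  # served from the cache, no second lookup
```

Because the value is cached on the manager instance, code that changes the setting at runtime (tests, for example) may need to clear the cached attribute. A failed lookup is not cached, so the next access retries.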

The Corpus.category foreign key is now non-nullable, and its default is set to a function that returns CorpusCategory.objects.default. You must use a function to avoid issues with Python module imports, and cannot use lambda: because Django needs a named and serializable function to make database migrations work.
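To illustrate why a lambda is not suitable, here is a simplified sketch. Migrations need to refer to the default callable by an importable name, and every lambda carries the same placeholder name. The function name default_corpus_category, its placeholder return value and the has_importable_name helper are assumptions for this example. The check is a simplification, not Django's actual serializer.

```python
def default_corpus_category():
    """Named, module-level callable suitable as a field default.

    In the real code this would return CorpusCategory.objects.default,
    resolved at call time rather than at import time.
    """
    return "default"  # placeholder return value for this sketch


def has_importable_name(func) -> bool:
    """Simplified check: every lambda is named "<lambda>", so it cannot be
    written into a migration file as a dotted import path."""
    return func.__name__ != "<lambda>"


named_ok = has_importable_name(default_corpus_category)
lambda_ok = has_importable_name(lambda: "default")
```

The field would then be declared with default=default_corpus_category, passing the function itself rather than calling it.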

A second, separate database migration will be needed to make this FK non-nullable. Due to a PostgreSQL limitation, a single migration cannot both update all corpora to set a default category and then alter the table's structure; attempting this fails with OperationalError: cannot ALTER TABLE "documents_corpus" because it has pending trigger events.

A new arkindex.project.checks.default_corpus_category_check system check should attempt to call CorpusCategory.objects.default. If it fails with CorpusCategory.DoesNotExist, it must return a new arkindex.W016 warning. The warning message must explicitly state that creating a corpus will fail.
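The logic of this check can be sketched without Django as follows. CheckWarning stands in for django.core.checks.Warning, DoesNotExist for CorpusCategory.DoesNotExist, and the get_default parameter is a hypothetical callable that replaces the access to CorpusCategory.objects.default. The message wording is illustrative only. A real system check would instead take (app_configs, **kwargs) and be registered with django.core.checks.register.

```python
from dataclasses import dataclass


class DoesNotExist(Exception):
    """Stand-in for CorpusCategory.DoesNotExist."""


@dataclass
class CheckWarning:
    """Stand-in for django.core.checks.Warning."""

    msg: str
    id: str


def default_corpus_category_check(get_default, slug="default"):
    """Return a list of warnings, as Django system checks do.

    Only DoesNotExist is caught; any other error propagates unchanged.
    """
    try:
        get_default()
    except DoesNotExist:
        return [
            CheckWarning(
                msg=(
                    f"The default corpus category with slug {slug!r} does not "
                    "exist; creating a corpus will fail."
                ),
                id="arkindex.W016",
            )
        ]
    return []


def missing_category():
    """Simulates a database that has no default category."""
    raise DoesNotExist("default")


warnings_found = default_corpus_category_check(missing_category)
```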