Add WorkerConfiguration fields required for validation
https://redmine.teklia.com/issues/11352
With the new configuration format, we want to have some stricter backend-side validation of the keys and values set in each WorkerConfiguration against the WorkerConfigurationFields. However, WorkerConfigurations are linked to Workers, and WorkerConfigurationFields are linked to WorkerVersions. This means there could be different sets of fields to validate against for the same configuration. Additionally, some field types like element_type will require a link to a Corpus to validate, since type slugs are not unique between corpora.
Because a WorkerConfiguration may be created on a process in a specific corpus and for a specific worker version, but later reused in another process with a different corpus and a different version, we need a way to keep track of which corpus and which worker version were involved in the validation of each WorkerConfiguration. Two new FKs should be added:
- WorkerConfiguration.initial_worker_version: Nullable foreign key to a WorkerVersion that belongs to the WorkerConfiguration.worker.
- WorkerConfiguration.initial_corpus: Nullable foreign key to a Corpus.
A check constraint should test for ~Q(initial_worker_version_id__isnull=False, initial_corpus_id__isnull=True). This means initial_corpus is required when initial_worker_version is set.
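As a rough illustration of what this constraint accepts and rejects, here is a minimal sketch using the standard library's sqlite3 instead of Django. The table name, the TEXT column types, and the sample IDs are assumptions made for the example; only the condition itself comes from the Q expression above.

```python
import sqlite3

# In-memory database with a simplified, hypothetical table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE worker_configuration (
        id INTEGER PRIMARY KEY,
        initial_worker_version_id TEXT,
        initial_corpus_id TEXT,
        -- SQL reading of ~Q(initial_worker_version_id__isnull=False, initial_corpus_id__isnull=True)
        CHECK (NOT (initial_worker_version_id IS NOT NULL AND initial_corpus_id IS NULL))
    )
    """
)


def try_insert(version_id, corpus_id):
    """Return True if the row is accepted, False if the check constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO worker_configuration"
                " (initial_worker_version_id, initial_corpus_id) VALUES (?, ?)",
                (version_id, corpus_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "both unset": try_insert(None, None),
    "both set": try_insert("version-1", "corpus-1"),
    "corpus only": try_insert(None, "corpus-1"),
    "version only": try_insert("version-1", None),
}
print(results)
```

Only the "version only" row is rejected. The constraint by itself still accepts a corpus without a worker version; the stricter "if and only if" rule is part of the serializer validation described further down.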
Both foreign keys should use on_delete=models.DO_NOTHING, because we want to handle the deletion manually to avoid Django's simple but inefficient cascade deletion. To handle the deletions manually:
- The corpus_delete RQ task must update both initial_worker_version and initial_corpus to None on all configurations that have this corpus set, in one query, before deleting the corpus;
- The arkindex cleanup command must do the same update before deleting archived workers, in one query.
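The ordering described above (one UPDATE that clears both fields, then the DELETE) can be sketched with sqlite3 as follows. This is not the corpus_delete task itself: the table names, the sample rows, and the delete_corpus helper are assumptions for illustration. With DO_NOTHING, Django does not touch the referencing rows, so it is the database's own foreign key constraint that would reject a deletion done without the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript(
    """
    CREATE TABLE corpus (id TEXT PRIMARY KEY);
    CREATE TABLE worker_version (id TEXT PRIMARY KEY);
    CREATE TABLE worker_configuration (
        id INTEGER PRIMARY KEY,
        initial_worker_version_id TEXT REFERENCES worker_version (id),
        initial_corpus_id TEXT REFERENCES corpus (id),
        CHECK (NOT (initial_worker_version_id IS NOT NULL AND initial_corpus_id IS NULL))
    );
    INSERT INTO corpus VALUES ('corpus-a'), ('corpus-b');
    INSERT INTO worker_version VALUES ('version-1');
    INSERT INTO worker_configuration (initial_worker_version_id, initial_corpus_id) VALUES
        ('version-1', 'corpus-a'),
        (NULL, 'corpus-a'),
        ('version-1', 'corpus-b');
    """
)

# Deleting the corpus directly is rejected while configurations still point at it.
try:
    with conn:
        conn.execute("DELETE FROM corpus WHERE id = 'corpus-a'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True


def delete_corpus(corpus_id):
    """Hypothetical helper: clear both fields in one UPDATE, then delete the corpus."""
    with conn:
        # Both columns are cleared together: setting only initial_corpus_id to NULL
        # would violate the check constraint on rows that have a worker version set.
        conn.execute(
            "UPDATE worker_configuration"
            " SET initial_worker_version_id = NULL, initial_corpus_id = NULL"
            " WHERE initial_corpus_id = ?",
            (corpus_id,),
        )
        conn.execute("DELETE FROM corpus WHERE id = ?", (corpus_id,))


delete_corpus("corpus-a")
rows = conn.execute(
    "SELECT initial_worker_version_id, initial_corpus_id"
    " FROM worker_configuration ORDER BY id"
).fetchall()
remaining_corpora = [row[0] for row in conn.execute("SELECT id FROM corpus ORDER BY id")]
```

The configurations themselves are kept; only their two tracking fields are reset, and configurations attached to other corpora are left untouched.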
Both of those fields should be available in the Django admin for worker configurations, but read-only. They should not be included in the list, only the details page, and should not cause any additional SQL queries. This may require customizing the admin's queries to add a .select_related().
Both fields should be available in the WorkerConfigurationListSerializer as initial_worker_version_id and initial_corpus_id, and should be made read-only in the WorkerConfigurationSerializer that inherits from it. This will expose the fields in:
- ListWorkerConfigurations (response only)
- CreateWorkerConfiguration
- RetrieveWorkerConfiguration (response only)
- UpdateWorkerConfiguration (response only)
- PartialUpdateWorkerConfiguration (response only)
The WorkerConfigurationListSerializer should validate both fields:
- The initial_worker_version must be a WorkerVersion that belongs to the current worker.
- Using an initial_worker_version from a worker that you do not have any access to should not reveal the existence of this worker version in any error.
- A WorkerVersion that does not have modern_configuration set cannot be used, since those cannot perform any validation.
- A WorkerVersion that has no WorkerConfigurationFields cannot be used, since those are known to not be capable of having any configurations.
- The initial_corpus is required if and only if an initial_worker_version is set.
- Using an initial_corpus that you do not have any access to should not reveal the existence of the corpus in any error.
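The rules above can be sketched in plain Python to make the "do not reveal existence" requirement concrete. Everything here is hypothetical: the Version dataclass, the validate_initial_fields function, the in-memory lookups, and the error messages are stand-ins, not the actual serializer code. The point of the sketch is that lookups are scoped to what the request can legitimately see, so a missing object and an inaccessible one produce the same error.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Version:
    """Hypothetical stand-in for a WorkerVersion."""
    id: str
    worker_id: str
    modern_configuration: bool = True
    field_names: List[str] = field(default_factory=list)


def validate_initial_fields(
    worker_id: str,
    version_id: Optional[str],
    corpus_id: Optional[str],
    versions: Dict[str, Version],
    readable_corpus_ids: Set[str],
) -> Dict[str, str]:
    """Return a dict of field name to error message; empty when the input is valid."""
    errors: Dict[str, str] = {}
    if version_id is not None:
        version = versions.get(version_id)
        # A version of another worker is treated exactly like a missing one.
        if version is None or version.worker_id != worker_id:
            errors["initial_worker_version_id"] = "This worker version does not exist."
        elif not version.modern_configuration:
            errors["initial_worker_version_id"] = "This worker version cannot validate configurations."
        elif not version.field_names:
            errors["initial_worker_version_id"] = "This worker version has no configuration fields."
    if corpus_id is not None and corpus_id not in readable_corpus_ids:
        # Same message whether the corpus is missing or merely not accessible.
        errors["initial_corpus_id"] = "This corpus does not exist."
    elif version_id is not None and corpus_id is None:
        errors["initial_corpus_id"] = "Required when initial_worker_version_id is set."
    elif version_id is None and corpus_id is not None:
        errors["initial_corpus_id"] = "Only allowed when initial_worker_version_id is set."
    return errors


# Sample data: v2 belongs to another worker, v3 is not modern, v4 has no fields.
versions = {
    "v1": Version("v1", "w1", True, ["model"]),
    "v2": Version("v2", "w2", True, ["model"]),
    "v3": Version("v3", "w1", False, ["model"]),
    "v4": Version("v4", "w1", True, []),
}
readable = {"c1"}

ok = validate_initial_fields("w1", "v1", "c1", versions, readable)
other_worker = validate_initial_fields("w1", "v2", "c1", versions, readable)
missing = validate_initial_fields("w1", "nope", "c1", versions, readable)
```

Here other_worker and missing contain the same error, which is the behaviour the unit tests listed below check with a "does not exist" message.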
There should be unit tests for:
- ListWorkerConfigurations with a configuration with both fields set, to check that there are no extra queries;
- RetrieveWorkerConfiguration with a configuration with both fields set, to check that there are no extra queries;
- Attempting to update both fields with UpdateWorkerConfiguration, which should be ignored because they are read-only;
- Attempting to update both fields with PartialUpdateWorkerConfiguration, which should be ignored because they are read-only;
- CreateWorkerConfiguration with valid values for both fields;
- CreateWorkerConfiguration with both fields explicitly set to None;
- CreateWorkerConfiguration with initial_worker_version set without an initial_corpus, which should fail;
- CreateWorkerConfiguration with a worker version from a worker that the user does not have access to, which should fail with a "does not exist" error;
- CreateWorkerConfiguration with a worker version that does not exist, which should fail with the same error;
- CreateWorkerConfiguration with a worker version without modern_configuration set, which should fail;
- CreateWorkerConfiguration with a worker version without any WorkerConfigurationFields set, which should fail;
- CreateWorkerConfiguration with a corpus that the user does not have access to, which should fail with a "does not exist" error;
- CreateWorkerConfiguration with a corpus that does not exist, which should fail with the same error.
The existing unit tests for corpus_delete and arkindex cleanup should be updated to include at least one WorkerConfiguration with both fields set, to verify that they both handle the new foreign keys properly.