Add WorkerConfiguration fields required for validation

https://redmine.teklia.com/issues/11352

With the new configuration format, we want to have some stricter backend-side validation of the keys and values set in each WorkerConfiguration against the WorkerConfigurationFields. However, WorkerConfigurations are linked to Workers, and WorkerConfigurationFields are linked to WorkerVersions. This means there could be different sets of fields to validate against for the same configuration. Additionally, some field types like element_type will require a link to a Corpus to validate, since type slugs are not unique between corpora.

Because a WorkerConfiguration may be created on a process in a specific corpus and for a specific worker version, but later reused in another process with a different corpus and a different version, we need a way to keep track of which corpus and which worker version were involved in the validation of each WorkerConfiguration. Two new FKs should be added:

  • WorkerConfiguration.initial_worker_version: Nullable foreign key to a WorkerVersion that belongs to the WorkerConfiguration.worker.
  • WorkerConfiguration.initial_corpus: Nullable foreign key to a Corpus.

A check constraint should test for ~Q(initial_worker_version_id__isnull=False, initial_corpus_id__isnull=True). This means initial_corpus is required when initial_worker_version is set.
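To make the accepted and rejected combinations explicit, here is a minimal pure-Python mirror of the constraint expression. `constraint_holds` is a hypothetical helper written only for illustration; it is not part of the codebase and does not use Django.

```python
from itertools import product


def constraint_holds(initial_worker_version_id, initial_corpus_id):
    """Mirror of ~Q(initial_worker_version_id__isnull=False, initial_corpus_id__isnull=True)."""
    version_is_set = initial_worker_version_id is not None
    corpus_is_missing = initial_corpus_id is None
    # The row is rejected only when a version is set without a corpus.
    return not (version_is_set and corpus_is_missing)


# Enumerate the four null / non-null combinations of the two foreign keys.
for version_id, corpus_id in product([None, "version"], [None, "corpus"]):
    print(version_id, corpus_id, constraint_holds(version_id, corpus_id))
```

Note that the constraint as written still accepts a corpus without a worker version. The stricter "if and only if" rule described further down is enforced by the serializer, not by the database.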

Both foreign keys should use on_delete=models.DO_NOTHING, because we want to handle the deletion manually to avoid Django's simple but inefficient cascade deletion. To handle the deletions manually:

  • The corpus_delete RQ task must update both initial_worker_version and initial_corpus to None on all configurations that have this corpus set, in one query, before deleting the corpus;
  • The arkindex cleanup command must do the same update before deleting archived workers, in one query.
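The ordering requirement above can be illustrated with a small self-contained sketch. It uses SQLite from the standard library as a stand-in for the real database, with simplified, hypothetical table names and a hypothetical `delete_corpus` helper. It only aims to show why both columns must be cleared together, in one query, before the corpus row is deleted. In the ORM, the equivalent would presumably be a single `QuerySet.update()` call on the configurations filtered by corpus.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE corpus (id INTEGER PRIMARY KEY);
CREATE TABLE worker_version (id INTEGER PRIMARY KEY);
CREATE TABLE worker_configuration (
    id INTEGER PRIMARY KEY,
    -- No ON DELETE action: the database will not cascade, like DO_NOTHING.
    initial_worker_version_id INTEGER REFERENCES worker_version (id),
    initial_corpus_id INTEGER REFERENCES corpus (id),
    CHECK (NOT (initial_worker_version_id IS NOT NULL AND initial_corpus_id IS NULL))
);
INSERT INTO corpus VALUES (1);
INSERT INTO worker_version VALUES (10);
INSERT INTO worker_configuration VALUES (100, 10, 1);
""")


def is_rejected(sql):
    """Return True if the statement violates a constraint; always roll back."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    else:
        return False
    finally:
        conn.rollback()


def delete_corpus(corpus_id):
    """Clear both foreign keys in one query, then delete the corpus."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE worker_configuration "
            "SET initial_worker_version_id = NULL, initial_corpus_id = NULL "
            "WHERE initial_corpus_id = ?",
            (corpus_id,),
        )
        conn.execute("DELETE FROM corpus WHERE id = ?", (corpus_id,))


# Deleting the corpus directly leaves a dangling reference, so it is refused.
blocked_delete = is_rejected("DELETE FROM corpus WHERE id = 1")
# Clearing only initial_corpus_id would violate the check constraint.
blocked_partial = is_rejected(
    "UPDATE worker_configuration SET initial_corpus_id = NULL "
    "WHERE initial_corpus_id = 1"
)

delete_corpus(1)
remaining = conn.execute(
    "SELECT initial_worker_version_id, initial_corpus_id "
    "FROM worker_configuration WHERE id = 100"
).fetchone()
```

The configuration row itself survives the deletion; only its two initial_* references are reset.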

Both of those fields should be available in the Django admin for worker configurations, but read-only. They should not be included in the list, only the details page, and should not cause any additional SQL queries. This may require customizing the admin's queries to add a .select_related().

Both fields should be available in the WorkerConfigurationListSerializer as initial_worker_version_id and initial_corpus_id, and should be made read-only in the WorkerConfigurationSerializer that inherits from it. This will expose the fields in:

  • ListWorkerConfigurations (response only)
  • CreateWorkerConfiguration
  • RetrieveWorkerConfiguration (response only)
  • UpdateWorkerConfiguration (response only)
  • PartialUpdateWorkerConfiguration (response only)

The WorkerConfigurationListSerializer should validate both fields:

  • The initial_worker_version must be a WorkerVersion that belongs to the current worker.
  • Using an initial_worker_version from a worker that you do not have any access to should not reveal the existence of this worker version in any error.
  • A WorkerVersion that does not have modern_configuration set cannot be used, since those cannot perform any validation.
  • A WorkerVersion that has no WorkerConfigurationFields cannot be used, since such versions are known not to support any configuration.
  • The initial_corpus is required if and only if an initial_worker_version is set.
  • Using an initial_corpus that you do not have any access to should not reveal the existence of the corpus in any error.
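The rules above can be sketched as a plain Python function, independent of Django REST Framework. Everything here is an assumption made for illustration: the `VersionInfo` type, the `validate_initial_fields` helper, the lookup arguments, and the error messages are all hypothetical. In particular, the sketch answers "does not exist" for any version that is unknown or belongs to another worker, which is one simple way to avoid revealing existence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionInfo:
    """Hypothetical stand-in for the parts of a WorkerVersion that matter here."""
    worker_id: str
    modern_configuration: bool
    field_count: int  # number of WorkerConfigurationFields


def validate_initial_fields(worker_id, version_id, corpus_id, versions, accessible_corpus_ids):
    """Return a dict of errors keyed by field name; empty when the input is valid."""
    errors = {}

    if version_id is not None:
        version = versions.get(version_id)
        if version is None or version.worker_id != worker_id:
            # Same message for unknown versions and versions of other workers,
            # so the response never confirms that a version exists.
            errors["initial_worker_version_id"] = f'Worker version "{version_id}" does not exist.'
        elif not version.modern_configuration:
            errors["initial_worker_version_id"] = "This worker version does not use the modern configuration."
        elif version.field_count == 0:
            errors["initial_worker_version_id"] = "This worker version has no configuration fields."

    if corpus_id is not None and corpus_id not in accessible_corpus_ids:
        # Same message whether the corpus is missing or merely not accessible.
        errors["initial_corpus_id"] = f'Corpus "{corpus_id}" does not exist.'
    elif (version_id is None) != (corpus_id is None):
        errors["initial_corpus_id"] = "This field is required if and only if initial_worker_version_id is set."

    return errors
```

For example, calling it with a version that belongs to another worker and with a version ID that is not known at all yields the same kind of message, differing only in the ID that the caller supplied.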

There should be unit tests for:

  • ListWorkerConfigurations with a configuration with both fields set, to check that there are no extra queries;
  • RetrieveWorkerConfiguration with a configuration with both fields set, to check that there are no extra queries;
  • Attempting to update both fields with UpdateWorkerConfiguration, which should be ignored because they are read-only;
  • Attempting to update both fields with PartialUpdateWorkerConfiguration, which should be ignored because they are read-only;
  • CreateWorkerConfiguration with valid values for both fields;
  • CreateWorkerConfiguration with both fields explicitly set to None;
  • CreateWorkerConfiguration with initial_worker_version set without an initial_corpus, which should fail;
  • CreateWorkerConfiguration with a worker version from a worker that the user does not have access to, which should fail with a "does not exist" error;
  • CreateWorkerConfiguration with a worker version that does not exist, which should fail with the same error;
  • CreateWorkerConfiguration with a worker version without modern_configuration set, which should fail;
  • CreateWorkerConfiguration with a worker version without any WorkerConfigurationFields set, which should fail;
  • CreateWorkerConfiguration with a corpus that the user does not have access to, which should fail with a "does not exist" error;
  • CreateWorkerConfiguration with a corpus that does not exist, which should fail with the same error.

The existing unit tests for corpus_delete and arkindex cleanup should be updated to include at least one WorkerConfiguration with both fields set, to verify that they both handle the new foreign keys properly.