
Validate against fields in CreateWorkerConfiguration

https://redmine.teklia.com/issues/11352

Requires #1965

To validate WorkerConfigurations the same way we validate regular API payloads, we will construct a DRF serializer from the WorkerConfigurationFields. This lets us reuse DRF's existing validation code instead of reimplementing most of it ourselves.

A new WorkerConfigurationField.as_serializer_field() method returns a DRF field that could be used to validate a value for this WorkerConfigurationField.

  • When called on a group field, which is only an abstract container with no direct representation in a configuration, it must raise an error.

  • Otherwise, it should use an appropriate field class and set its parameters to match the WorkerConfigurationField's:

    • default is set exactly as our model's default, but only when it is not None, because DRF treats a missing default and a default of None differently.
    • required is set to our model's required.
    • allow_null should be the opposite of required, because null is always an acceptable value for an optional field and RetrieveWorkerRunConfiguration actually adds null on its own.
    • choices must be set on enum fields and no others.
    • read_only is the opposite of editable. This causes any non-editable field to be ignored during validation. We still need those fields defined rather than skipped entirely: otherwise we could not distinguish a non-editable field from an unexpected dict key, which should produce an error.

    For fields with many set to True, the parameters need to be set differently: first create a child field with the type of an individual item. This child field only has choices set, and no other parameter, because the rest applies to the whole list and not to individual list items. A ListField(child=child, …) should then be returned, with allow_null, default and required set on it using the same rules as above.
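As a rough illustration of the parameter rules above, the following sketch computes the DRF field class name and keyword arguments for a field instead of instantiating DRF classes, so it runs without Django. ConfigField, FIELD_CLASSES, the type names and the FiniteFloatField/StringDictField class names are all hypothetical. Passing read_only to the ListField is an assumption, since the text only lists allow_null, default and required for lists.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ConfigField:
    """Hypothetical stand-in for WorkerConfigurationField (names assumed)."""
    key: str
    type: str
    required: bool = False
    default: Any = None
    editable: bool = True
    many: bool = False
    choices: Optional[List[Any]] = None


# Hypothetical type -> DRF field class name mapping. Related types are plain
# scalar fields here; their existence is checked separately, in bulk.
FIELD_CLASSES = {
    "int": "IntegerField",
    "float": "FiniteFloatField",  # hypothetical FloatField subclass
    "string": "CharField",
    "boolean": "BooleanField",
    "enum": "ChoiceField",
    "dict": "StringDictField",  # hypothetical DictField subclass
    "model": "UUIDField",
    "worker_version": "UUIDField",
    "secret": "CharField",
    "element_type": "CharField",
}


def serializer_field_spec(f: ConfigField) -> Tuple[str, Dict[str, Any]]:
    """Return (DRF field class name, constructor kwargs) for one field."""
    if f.type == "group":
        raise ValueError(f"Group field {f.key!r} has no serializer field")

    # choices describe a single value, and only enum fields have them.
    item_class = FIELD_CLASSES[f.type]
    item_kwargs = {"choices": f.choices} if f.type == "enum" else {}

    # These describe the value as a whole (the whole list when many=True).
    outer_kwargs: Dict[str, Any] = {
        "required": f.required,
        "allow_null": not f.required,
        # Assumption: read_only is applied to the list as a whole as well.
        "read_only": not f.editable,
    }
    # DRF treats a missing default and default=None differently, so only pass
    # a default when the model actually defines one.
    if f.default is not None:
        outer_kwargs["default"] = f.default

    if f.many:
        # The child only carries choices; everything else goes on the ListField.
        return "ListField", {"child": (item_class, item_kwargs), **outer_kwargs}
    return item_class, {**item_kwargs, **outer_kwargs}
```

Instantiating real DRF fields from these kwargs would hit two assertions in DRF's Field constructor: read_only and required may not both be true, and required may not be combined with a default. The model presumably needs to rule out those combinations, or the method needs to adjust them, for example by not passing required=True on non-editable fields.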

The various RelatedField classes provided by DRF must not be used in this method. Those would cause one SQL query per field. We will instead validate related data in bulk separately.

For dict fields, a custom DictField subclass can be used to always validate each child as a CharField, because we only allow dicts of strings. For float fields, a FloatField subclass can add a validation step that uses math.isfinite to reject infinity and NaN.
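Python's float() accepts strings such as "nan" and "inf", and the standard json module decodes NaN and Infinity literals by default, so such values can reach a float validator. Below is a minimal framework-free sketch of the extra check. to_finite_float is a hypothetical name. In a real FloatField subclass the check would presumably run after the parent's conversion and raise a DRF ValidationError instead of ValueError.

```python
import math


def to_finite_float(value) -> float:
    """Convert a value to float, rejecting infinities and NaN.

    Sketch of the extra step a FloatField subclass could add; a DRF
    implementation would raise serializers.ValidationError instead.
    """
    number = float(value)  # accepts "nan", "inf", "-Infinity", ...
    if not math.isfinite(number):
        raise ValueError("Infinity and NaN are not allowed.")
    return number
```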

A new WorkerVersion.get_configuration_serializer() method constructs a Serializer by calling WorkerConfigurationField.as_serializer_field for all of the fields for this WorkerVersion:

  • When modern_configuration is not set, an error occurs.
  • When there are zero WorkerConfigurationFields on this worker version, an empty serializer with no fields set should be returned.
  • Group fields should be ignored, since they are not in the actual configuration payload.
  • Fields with a parent should have the parent key prepended to their key, so a field keyed field inside a parent keyed group should be named group.field in the resulting serializer.
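The naming rules above can be sketched without DRF as a function that maps each serializer field name to the field it validates. The real method would then call as_serializer_field() on each value. ConfigField and serializer_field_names are hypothetical names. A name like group.field is not a valid Python identifier, so the serializer class would presumably be created dynamically (for instance with type() and a dict of fields) rather than declared with a class body.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ConfigField:
    """Hypothetical stand-in for WorkerConfigurationField."""
    key: str
    type: str
    parent: Optional["ConfigField"] = None


def serializer_field_names(
    fields: List[ConfigField], modern_configuration: bool
) -> Dict[str, ConfigField]:
    """Map serializer field names to the fields they validate."""
    if not modern_configuration:
        raise ValueError("This worker version does not use modern configuration")
    named: Dict[str, ConfigField] = {}
    for f in fields:
        # Groups only organise other fields; they have no value of their own.
        if f.type == "group":
            continue
        # Prefix the parent's key so a child is named "group.field".
        name = f"{f.parent.key}.{f.key}" if f.parent is not None else f.key
        named[name] = f
    return named
```

With zero fields the mapping is empty, which corresponds to a serializer with no fields.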

A new WorkerVersion.validate_configuration(configuration: dict, corpus: Corpus) -> None validates a WorkerConfiguration's configuration against a WorkerVersion's fields, and raises a ValidationError if anything goes wrong. It must validate using the serializer returned by get_configuration_serializer(), but it also needs to perform custom validation that the serializer cannot do on its own:

  • When there are any element_type fields, fetch all types on the corpus in one query, then verify that the specified type slugs exist for each field.
  • When there are any model fields, verify in one query that all the model IDs exist among the models readable to the user, and return errors for the specific models that don't exist.
  • When there are any secret fields, verify in one query that all the secret names exist among the secrets readable to the user, and return errors for the specific secrets that don't exist.
  • When there are any worker_version fields, verify in one query that all the worker version IDs exist among the workers executable by the user, and return errors for the specific worker versions that don't exist.
  • Check that the configuration does not contain any unknown keys, meaning keys that are not in the serializer.fields dict and therefore do not correspond to any WorkerConfigurationField.
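The bulk checks and the unknown-key check could be organised as in the sketch below. It first gathers every referenced value per field type, then performs one lookup per type, so the number of queries does not grow with the number of fields. check_references is hypothetical. Each lookup callable stands in for a single filtered queryset, and for element types it could just as well return all slugs on the corpus. The unknown-key check is needed because a DRF Serializer silently ignores input keys it has no field for.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# A lookup takes the set of referenced values and returns those that exist
# and are accessible: a stand-in for one database query per field type.
Lookup = Callable[[Set[str]], Set[str]]


def check_references(
    configuration: dict, field_types: Dict[str, str], lookups: Dict[str, Lookup]
) -> Dict[str, List[str]]:
    """Return per-key errors for unknown keys and missing referenced objects.

    Assumes values already passed serializer validation, so they are
    hashable scalars or lists of them.
    """
    errors: Dict[str, List[str]] = defaultdict(list)
    for key in configuration:
        if key not in field_types:
            errors[key].append("This key is not a configuration field.")

    # Collect every referenced value per type, so each type costs one lookup.
    wanted: Dict[str, Set[str]] = defaultdict(set)
    for key, value in configuration.items():
        field_type = field_types.get(key)
        if field_type in lookups and value is not None:
            wanted[field_type].update(value if isinstance(value, list) else [value])

    existing = {t: lookups[t](values) for t, values in wanted.items()}

    # Report each missing value on the specific key that referenced it.
    for key, value in configuration.items():
        field_type = field_types.get(key)
        if field_type not in existing or value is None:
            continue
        for item in value if isinstance(value, list) else [value]:
            if item not in existing[field_type]:
                errors[key].append(f"{item!r} does not exist or is not accessible.")
    return dict(errors)
```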

All errors from both the serializer and the additional validation of this method should be combined before raising a ValidationError, so that a single explicit and detailed error report can be sent back to be shown in the frontend, or used for troubleshooting by advanced users in the API.
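Combining errors can be as simple as merging per-key message lists, as in this sketch. The merged dict would then be passed to a single serializers.ValidationError. merge_errors is a hypothetical helper, and it assumes flat lists of messages. DRF may report nested errors for some fields, such as per-item errors on a ListField, which would need extra handling.

```python
from typing import Dict, List


def merge_errors(*sources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge several {key: [messages]} dicts without mutating the inputs."""
    merged: Dict[str, List[str]] = {}
    for source in sources:
        for key, messages in source.items():
            # setdefault creates a fresh list, so input lists are never extended.
            merged.setdefault(key, []).extend(messages)
    return merged
```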

The WorkerConfigurationListSerializer.validate method should be updated to call validate_configuration when an initial_worker_version_id and initial_corpus_id are set.

Unit tests will need to cover many cases, but some can likely be grouped into a single test by defining multiple fields at once. The cases should include:

  • Every field type;
  • The various options on fields that can change how the DRF fields are defined;
  • A non-modern version;
  • A version with zero fields;
  • A configuration without any of the field types that cause extra queries, such as element_type;
  • A configuration with multiple fields of the same type that would cause extra queries, to check that they only cause one extra query per type;
  • Non-existent keys.