Validate against fields in CreateWorkerConfiguration
https://redmine.teklia.com/issues/11352
Requires #1965
To validate WorkerConfigurations in a way that resembles how we validate normal API payloads, we will construct a DRF serializer from WorkerConfigurationFields. This lets us reuse the existing DRF validation code instead of reinventing the wheel.
A new WorkerConfigurationField.as_serializer_field() method returns a DRF field that could be used to validate a value for this WorkerConfigurationField.
- When called on a `group` field, which is only an abstract field that does not have a direct representation in a configuration, it must cause an error.
- Otherwise, it should use an appropriate field class and set its parameters to match the WorkerConfigurationField's:
  - `default` is set exactly as our model's `default`, but only when it is not `None`, because DRF treats a missing default and a default of `None` differently.
  - `required` is set to our model's `required`.
  - `allow_null` should be the opposite of `required`, because `null` is always an acceptable value for an optional field and `RetrieveWorkerRunConfiguration` actually adds `null` on its own.
  - `choices` must be set on `enum` fields and no others.
  - `read_only` is the opposite of `editable`. This will cause any non-editable field to be ignored during validation. Skipping those fields entirely would prevent us from distinguishing non-editable fields from unexpected dict keys, for which we do want errors instead, so we still need those fields defined.

  For fields with `many` set to `True`, the parameters need to be set differently, by first creating a `child` field with the type of an individual field. This child field only has `choices` and no other parameter set, because the rest applies to the whole list and not to individual list items. A `ListField(child=child, …)` should then be returned, to which `allow_null`, `default` and `required` are applied with the same rules as above.
- The various RelatedField classes provided by DRF must not be used in this method. Those would cause one SQL query per field. We will instead validate related data in bulk separately.
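The parameter rules above can be sketched without DRF as a function that only computes the keyword arguments a field would receive. This is a minimal illustration, not the actual implementation: `FieldSpec` and `serializer_field_kwargs` are hypothetical names, the `child` entry holds the child's keyword arguments rather than a real field instance, and keeping `read_only` on the list itself for `many` fields is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a WorkerConfigurationField."""

    type: str
    required: bool = False
    editable: bool = True
    many: bool = False
    default: Any = None
    choices: Optional[list] = None


def serializer_field_kwargs(spec: FieldSpec) -> dict:
    """Return the keyword arguments a DRF field would receive for this spec."""
    if spec.type == "group":
        # Group fields have no representation in a configuration payload.
        raise ValueError("group fields cannot be turned into serializer fields")

    kwargs = {
        "required": spec.required,
        "allow_null": not spec.required,
        "read_only": not spec.editable,  # assumption: also kept on lists
    }
    # Only pass a default when there is one, since DRF distinguishes
    # a missing default from default=None.
    if spec.default is not None:
        kwargs["default"] = spec.default

    # choices only ever applies to enum fields.
    item_kwargs = {"choices": spec.choices} if spec.type == "enum" else {}

    if spec.many:
        # List-level parameters stay on the list; the child only gets choices.
        kwargs["child"] = item_kwargs
    else:
        kwargs.update(item_kwargs)
    return kwargs
```

Note that DRF itself asserts against some combinations, such as `required=True` together with a `default`, or `read_only=True` together with `required=True`. This sketch does not handle those cases.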
For dict fields, a custom DictField subclass can be used to always validate each child as being a CharField, because we only allow dicts of strings. For float fields, a FloatField subclass can be used to define an additional validation step to ensure floating-point values are not infinity or NaN using math.isfinite.
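As a rough sketch of the extra float check, the function below shows only the `math.isfinite` test. The name `validate_finite` is hypothetical, and it raises a plain `ValueError` to stay self-contained; in the FloatField subclass this would presumably run as an additional validation step and raise DRF's `ValidationError` instead.

```python
import math


def validate_finite(value: float) -> float:
    """Reject infinity and NaN, which are valid Python floats."""
    if not math.isfinite(value):
        raise ValueError("Floating-point values must be finite.")
    return value
```

The check is needed because `float("inf")` and `float("nan")` both parse successfully, so a type check alone would let them through.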
A new WorkerVersion.get_configuration_serializer() method constructs a Serializer by calling WorkerConfigurationField.as_serializer_field for all of the fields for this WorkerVersion:
- When `modern_configuration` is not set, an error occurs.
- When there are zero WorkerConfigurationFields on this worker version, an empty serializer with no fields set should be returned.
- Group fields should be ignored, since they are not in the actual configuration payload.
- Fields with a parent should have the parent key prepended to their key, so a `thing` field within a `group` parent should be named `group.thing` in the resulting serializer.
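The naming rules in this list can be illustrated with a small sketch that only computes the serializer field names. `FieldRow` and `serializer_field_names` are hypothetical names, and the `parent_key` attribute is an assumption about how a field's parent is exposed.

```python
from typing import List, NamedTuple, Optional


class FieldRow(NamedTuple):
    """Hypothetical stand-in for a WorkerConfigurationField."""

    key: str
    type: str
    parent_key: Optional[str] = None


def serializer_field_names(rows: List[FieldRow], modern_configuration: bool = True) -> List[str]:
    """Return the names the serializer fields would be registered under."""
    if not modern_configuration:
        raise ValueError("This worker version does not use a modern configuration")
    names = []
    for row in rows:
        if row.type == "group":
            # Groups never appear in the configuration payload.
            continue
        # Prefix the key with the parent group's key, if any.
        names.append(f"{row.parent_key}.{row.key}" if row.parent_key else row.key)
    return names
```

With a `group` field and a `thing` field inside it, only `group.thing` would be produced; an empty list of fields yields an empty list of names, matching the empty serializer case.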
A new WorkerVersion.validate_configuration(configuration: dict, corpus: Corpus) -> None validates a WorkerConfiguration's configuration against a WorkerVersion's fields, and raises a ValidationError if anything goes wrong. It must validate using the serializer of get_configuration_serializer(), but it also needs to perform custom validation that the serializer cannot do on its own:
- When there are any `element_type` fields, fetch all types on the corpus in one query, then verify that the specified type slugs exist for each field.
- When there are any `model` fields, verify in one query that all the model IDs exist among the models readable to the user, and return errors for the specific models that don't exist.
- When there are any `secret` fields, verify in one query that all the secret names exist among the secrets readable to the user, and return errors for the specific secrets that don't exist.
- When there are any `worker_version` fields, verify in one query that all the worker version IDs exist among the workers executable by the user, and return errors for the specific worker versions that don't exist.
- Check that the user did not define any non-existent configuration keys, those that are not in the `serializer.fields` dict (thus are not WorkerConfigurationFields).
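The bulk checks above share one shape: gather the values for every field of a given type, compare them against a set obtained from a single query, and report the missing ones per field. The sketch below assumes that set has already been fetched; `missing_references`, `unexpected_keys` and the error messages are hypothetical, and `field_types` stands in for a mapping of serializer field names to field types.

```python
from typing import Any, Dict, List, Set


def missing_references(
    configuration: Dict[str, Any],
    field_types: Dict[str, str],
    ref_type: str,
    existing: Set[Any],
) -> Dict[str, List[str]]:
    """Report, per field of ref_type, the values absent from `existing`.

    `existing` is assumed to come from one query covering all fields.
    """
    errors: Dict[str, List[str]] = {}
    for key, field_type in field_types.items():
        value = configuration.get(key)
        if field_type != ref_type or value is None:
            continue
        # Fields with many=True hold a list of references.
        values = value if isinstance(value, list) else [value]
        missing = [v for v in values if v not in existing]
        if missing:
            errors[key] = [f"{ref_type} {v!r} does not exist." for v in missing]
    return errors


def unexpected_keys(
    configuration: Dict[str, Any], field_types: Dict[str, str]
) -> Dict[str, List[str]]:
    """Report configuration keys that match no known field."""
    return {
        key: ["This key is not a known configuration field."]
        for key in configuration
        if key not in field_types
    }
```

Because the lookup set is built once per type, adding more fields of the same type does not add queries.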
All errors from both the serializer and the additional validation of this method should be combined before raising a ValidationError, so that a single explicit and detailed error report can be sent back to be shown in the frontend, or used for troubleshooting by advanced users in the API.
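One possible way to combine the errors is to merge per-field message lists, since DRF's `serializer.errors` maps field names to lists of messages and `ValidationError` accepts such a dict. The helper name `combine_errors` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def combine_errors(*error_dicts: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge several {field: [messages]} dicts, keeping every message."""
    combined = defaultdict(list)
    for errors in error_dicts:
        for key, messages in errors.items():
            combined[key].extend(messages)
    return dict(combined)
```

The merged dict would only be raised as a single `ValidationError` when it is non-empty.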
The WorkerConfigurationListSerializer.validate method should be updated to call validate_configuration when an initial_worker_version_id and initial_corpus_id are set.
Unit tests will need to cover many cases, but some can likely be grouped into the same test by defining multiple fields at once. The cases should include:
- Every field type;
- The various options on fields that can change how the DRF fields are defined;
- A non-modern version;
- A version with zero fields;
- A configuration without the field types that cause extra queries like `element_type`;
- A configuration with multiple fields of the same type that would cause extra queries, to check that they only cause one extra query per type;
- Non-existent keys.