Optional dataset element uniqueness
https://redmine.teklia.com/issues/6264
Requires #1712 (closed)
In some datasets, but not all of them, we need to ensure that an element is only present in a single set at a time. This is a first step towards preventing data leakage. Since this does not apply to every dataset, we cannot just use a unique constraint.
A new Dataset.unique_elements boolean field should be added, defaulting to True. It should be exposed in ListCorpusDatasets and RetrieveDataset, and editable in CreateDataset, UpdateDataset and PartialUpdateDataset. This can be made visible in the Django admin, but it must not be editable there.
When this is enabled, CreateDatasetElement should return an HTTP 400 if the element is already present in another set, mentioning the set's name.
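The check could look roughly like the following plain-Python sketch. It does not use the actual Django models: the sets of one dataset are modeled as a dict mapping a set name to element IDs, and `ValidationError`, `find_conflicting_set` and `add_element` are hypothetical names used for illustration only. It also assumes that "another set" means another set of the same dataset.

```python
class ValidationError(Exception):
    """Stand-in for the error that the API would turn into an HTTP 400."""


def find_conflicting_set(sets, target_set, element_id):
    """Return the name of another set that already holds the element, or None."""
    for name, element_ids in sets.items():
        if name != target_set and element_id in element_ids:
            return name
    return None


def add_element(sets, target_set, element_id, unique_elements=True):
    """Add an element to a set, refusing duplicates across sets when enabled."""
    if unique_elements:
        conflict = find_conflicting_set(sets, target_set, element_id)
        if conflict is not None:
            # The message names the set that already contains the element.
            raise ValidationError(
                f"The element is already present in set {conflict!r}."
            )
    sets.setdefault(target_set, set()).add(element_id)
```

With `unique_elements=False`, the same call succeeds and the element ends up in both sets, which matches the behavior for datasets that opt out.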
When updating this field to True using UpdateDataset or PartialUpdateDataset, an HTTP 400 error should be returned if the dataset currently contains elements that are in multiple sets at once.
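As a rough illustration of this validation, the sketch below counts in how many sets each element appears and rejects the update when any element appears more than once. It is plain Python rather than serializer code; `elements_in_multiple_sets` and `validate_unique_elements_update` are hypothetical helpers, and `ValueError` stands in for whatever error the API maps to an HTTP 400.

```python
from collections import Counter


def elements_in_multiple_sets(sets):
    """Return the IDs of elements that appear in more than one set."""
    counts = Counter(
        element_id
        for element_ids in sets.values()
        # Deduplicate within a set so that only cross-set duplicates count.
        for element_id in set(element_ids)
    )
    return {element_id for element_id, n in counts.items() if n > 1}


def validate_unique_elements_update(sets, new_value):
    """Reject enabling uniqueness while the dataset still has duplicates."""
    if new_value and elements_in_multiple_sets(sets):
        raise ValueError(
            "Elements are currently present in multiple sets of this dataset."
        )
    return new_value
```

Setting the field to False is always accepted in this sketch, since disabling the constraint cannot conflict with existing data.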
A data migration should set unique_elements to False on existing datasets that have elements in multiple sets at once. Every dataset that can be made unique should be made unique, since we assume that enforcing uniqueness is the preferred option for most datasets.
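The decision the migration has to make for each dataset can be sketched in plain Python as follows. This is not the migration itself: `compute_unique_elements` is a hypothetical helper, and the input structure (dataset name, then set name, then element IDs) is an assumption made to keep the example self-contained.

```python
def compute_unique_elements(datasets):
    """Map each dataset name to the unique_elements value it should receive.

    A dataset gets False only if some element appears in two or more of its
    sets; every other dataset, including an empty one, gets True.
    """
    result = {}
    for dataset_name, sets in datasets.items():
        seen = set()
        unique = True
        for element_ids in sets.values():
            current = set(element_ids)
            if seen & current:
                # At least one element was already seen in a previous set.
                unique = False
                break
            seen |= current
        result[dataset_name] = unique
    return result
```

In a real migration the same result would more likely be computed with a single aggregate query rather than by loading every element into memory.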