
Share the same serializer on dataset APIs

To avoid issues like frontend#1323 (closed), where datasets are missing some attributes in some APIs and the frontend has to detect this in every component and make many RetrieveDataset calls, we can simplify the dataset APIs so that their output always includes the same attributes.

Most of the attributes are already retrieved by Django when calling Dataset.objects.get(). The only exception is creator.email, which takes just one .select_related('creator') to fetch, so there are no performance concerns with returning all of the attributes all of the time.

  • The DatasetLightSerializer should no longer exist.
  • ListCorpusDatasets should return a list of DatasetSerializer.
  • CreateDataset should return a DatasetSerializer.
  • ListElementDatasets should return { dataset: DatasetSerializer, set: string }.
  • ListProcessDatasets, CreateProcessDataset, DestroyProcessDataset and ListDatasetElements are unchanged.
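
As a rough illustration of the response shapes listed above, here is a framework-free Python sketch. It is not the actual backend code: the Dataset fields (id, name, creator_email), the serialize_dataset helper and the three endpoint functions are hypothetical stand-ins for the real model, DatasetSerializer and API views. The only point is that every endpoint goes through the same serialization function.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical fields for illustration; the real model has more attributes.
    id: str
    name: str
    creator_email: str


def serialize_dataset(dataset: Dataset) -> dict:
    """Stand-in for DatasetSerializer: the one place that defines the output."""
    return {
        "id": dataset.id,
        "name": dataset.name,
        "creator": dataset.creator_email,
    }


def list_corpus_datasets(datasets: list[Dataset]) -> list[dict]:
    # ListCorpusDatasets: a list of fully serialized datasets.
    return [serialize_dataset(d) for d in datasets]


def create_dataset(dataset: Dataset) -> dict:
    # CreateDataset: the same representation as everywhere else.
    return serialize_dataset(dataset)


def list_element_datasets(memberships: list[tuple[Dataset, str]]) -> list[dict]:
    # ListElementDatasets: the full dataset nested next to the set name.
    return [
        {"dataset": serialize_dataset(d), "set": set_name}
        for d, set_name in memberships
    ]
```

With this layout, adding an attribute to the shared function makes it appear in all three responses at once, so the frontend never has to check which endpoint a dataset came from.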

Only ListElementDatasets can be considered a breaking change, as all of the other changes only add attributes and do not remove or change any existing ones. This does not affect any worker: according to Yoann, none of them use ListElementDatasets. It will affect the frontend, but that is intended.

I might need to add some best practices regarding consistent API responses: we already have some for error responses or endpoint naming but nothing says "please reuse the same serializer everywhere".