New bulk endpoint to create transcription entities

We are publishing a lot more entities on transcriptions due to the amazing power of DAN (for example, on socface, with hundreds of entities on a single transcription).

Publishing these entities is currently slow because it relies on atomic endpoints: one request to create each entity, then one request to create each transcription entity (which is fine for smaller projects / pages).

We need a new bulk endpoint, CreateTranscriptionEntities, that allows an API user to create many entities on a single transcription in one request.

The payload would be (all fields are required by default):

transcription_id: UUID of the transcription with which all entities will be associated
worker_run_id: UUID of the worker run publishing the data (no manual access allowed)
entities: list of entities and their positions:
- type_id: UUID of the entity type
- name: name of the entity
- offset: offset of the entity on the transcription
- length: length of the entity on the transcription
- confidence: confidence score of the transcription entity
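As a rough illustration of the payload above, here is a minimal Python sketch that checks every field is present (all are required) and coerces the types, assuming the request body arrives as a plain dict. The names `EntityItem`, `Payload` and `parse_payload`, and the range checks on offset, length and confidence, are hypothetical and not part of the Arkindex API.

```python
import uuid
from dataclasses import dataclass
from typing import Any

TOP_LEVEL_FIELDS = ("transcription_id", "worker_run_id", "entities")
ENTITY_FIELDS = ("type_id", "name", "offset", "length", "confidence")


@dataclass(frozen=True)
class EntityItem:
    type_id: uuid.UUID
    name: str
    offset: int
    length: int
    confidence: float


@dataclass(frozen=True)
class Payload:
    transcription_id: uuid.UUID
    worker_run_id: uuid.UUID
    entities: list[EntityItem]


def parse_payload(data: dict[str, Any]) -> Payload:
    """Check that every field is present (all are required) and coerce types."""
    missing = [f for f in TOP_LEVEL_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    items = []
    for i, raw in enumerate(data["entities"]):
        missing = [f for f in ENTITY_FIELDS if f not in raw]
        if missing:
            raise ValueError(f"entities[{i}] is missing {missing}")
        item = EntityItem(
            type_id=uuid.UUID(str(raw["type_id"])),
            name=str(raw["name"]),
            offset=int(raw["offset"]),
            length=int(raw["length"]),
            confidence=float(raw["confidence"]),
        )
        # Hypothetical sanity checks: the issue does not specify these ranges
        if item.offset < 0 or item.length <= 0:
            raise ValueError(f"entities[{i}] has an invalid offset or length")
        if not 0.0 <= item.confidence <= 1.0:
            raise ValueError(f"entities[{i}] has a confidence outside [0, 1]")
        items.append(item)
    return Payload(
        transcription_id=uuid.UUID(str(data["transcription_id"])),
        worker_run_id=uuid.UUID(str(data["worker_run_id"])),
        entities=items,
    )
```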

Fields we do not support:

CreateEntity.validated: always set to True, so there is no need to expose it here
CreateEntity.corpus: derived from the transcription's element
CreateEntity.metas: not needed for socface, and probably will not be used much now that entity types are available
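To show where the unsupported fields would come from instead, here is a hypothetical Python sketch in which plain dicts stand in for the database: the corpus is looked up through the transcription's element, `validated` is forced to True, and `metas` stays empty. `EntityRow`, `corpus_of` and `build_entity_row` are illustrative names only.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class EntityRow:
    """Hypothetical Entity row, ready to be passed to a bulk insert."""
    name: str
    type_id: uuid.UUID
    corpus_id: uuid.UUID
    validated: bool = True  # always True; never read from the payload
    metas: dict = field(default_factory=dict)  # not accepted by the endpoint, left empty
    id: uuid.UUID = field(default_factory=uuid.uuid4)


def corpus_of(transcription_id, transcriptions, elements):
    """Hypothetical lookup: transcription -> element -> corpus."""
    element_id = transcriptions[transcription_id]["element_id"]
    return elements[element_id]["corpus_id"]


def build_entity_row(item, corpus_id):
    # Only name and type_id come from the payload item; offset, length and
    # confidence belong to the TranscriptionEntity, not the Entity.
    return EntityRow(name=item["name"], type_id=item["type_id"], corpus_id=corpus_id)
```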

The endpoint workflow would be:

rights check (the user has contributor access to the corpus) and check that the worker run exists
check whether any entities already exist on that transcription for this worker run:
- raise a 400 if any do: we do not want to manage conflicts here, only to be as fast as possible
open db transaction:
1. create all Entity instances through bulk create
2. create all TranscriptionEntity instances through bulk create
return the list of IDs of the created instances:

{
   "entities": [
     {
       "transcription_entity_id": "...",
       "entity_id": "..."
     }
   ]
}
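The workflow above could be sketched end to end with Python's built-in `sqlite3` module standing in for the real database. This is an illustration under assumptions, not the Arkindex implementation: the table layout, `BadRequest` and `create_transcription_entities` are hypothetical, the rights check and worker run lookup (step 1) are assumed to have been done by the caller, and IDs are generated as UUIDs before insertion so they can be returned directly.

```python
import sqlite3
import uuid


class BadRequest(Exception):
    """Hypothetical stand-in for an HTTP 400 response."""
    status_code = 400


# Hypothetical schema; "offset" is quoted because OFFSET is an SQL keyword
SCHEMA = """
CREATE TABLE entity (id TEXT PRIMARY KEY, name TEXT NOT NULL, type_id TEXT NOT NULL,
                     corpus_id TEXT NOT NULL, validated INTEGER NOT NULL);
CREATE TABLE transcription_entity (id TEXT PRIMARY KEY, transcription_id TEXT NOT NULL,
                                   entity_id TEXT NOT NULL, worker_run_id TEXT NOT NULL,
                                   "offset" INTEGER NOT NULL, length INTEGER NOT NULL,
                                   confidence REAL NOT NULL);
"""


def create_transcription_entities(conn, transcription_id, worker_run_id, corpus_id, entities):
    # Refuse any transcription that already has entities from this worker run
    exists = conn.execute(
        "SELECT 1 FROM transcription_entity WHERE transcription_id = ? AND worker_run_id = ? LIMIT 1",
        (transcription_id, worker_run_id),
    ).fetchone()
    if exists:
        raise BadRequest("Entities already exist on this transcription for this worker run")

    entity_rows, te_rows, result = [], [], []
    for item in entities:
        entity_id, te_id = str(uuid.uuid4()), str(uuid.uuid4())
        # validated is always 1 (True); corpus_id comes from the transcription's element
        entity_rows.append((entity_id, item["name"], item["type_id"], corpus_id, 1))
        te_rows.append((te_id, transcription_id, entity_id, worker_run_id,
                        item["offset"], item["length"], item["confidence"]))
        result.append({"transcription_entity_id": te_id, "entity_id": entity_id})

    # One transaction: `with conn` commits on success and rolls back on any exception
    with conn:
        conn.executemany("INSERT INTO entity VALUES (?, ?, ?, ?, ?)", entity_rows)
        conn.executemany("INSERT INTO transcription_entity VALUES (?, ?, ?, ?, ?, ?, ?)", te_rows)
    return {"entities": result}
```

Running both `executemany` calls inside `with conn:` mirrors the two bulk creates in a single transaction: if the second insert fails, the entities inserted by the first are rolled back too. As in the proposed workflow, the duplicate check runs before the transaction opens, so two concurrent calls could still both pass it.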

Edited Feb 14, 2023 by Bastien Abadie