New bulk endpoint to create transcription entities
We are publishing a lot more entities on transcriptions due to the amazing power of DAN (for example on socface, with hundreds of entities on a single transcription).
The current publication of these entities is slow, because it uses atomic endpoints to create each entity then each transcription entity (which is OK for smaller projects / pages).
We need a new bulk endpoint CreateTranscriptionEntities which would allow an API user to create a lot of entities on a single transcription.
The payload would be (all fields are required by default):
- `transcription_id`: UUID of the transcription that all entities will be associated with
- `worker_run_id`: UUID of the worker run that publishes data (no manual access allowed)
- `entities`: list of entities and their positions:
  - `type_id`: UUID of the entity type
  - `name`: name of the entity
  - `offset`: offset of the entity on the transcription
  - `length`: length of the entity on the transcription
  - `confidence`: confidence score of the transcription entity
Fields we do not support:
- `CreateEntity.validated`: is always set to True, no need to use that here
- `CreateEntity.corpus`: we have that information from the transcription's element
- `CreateEntity.metas`: we do not need it for socface, and we probably won't use it a lot with entity types available
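To make the proposed payload concrete, here is a minimal sketch of how a client could assemble it. The helper `build_payload` and the sample entity values are hypothetical and only illustrate the field list above; they are not part of any existing client.

```python
import json
import uuid

# Fields the proposal lists as required for each item of "entities".
REQUIRED_ENTITY_FIELDS = {"type_id", "name", "offset", "length", "confidence"}


def build_payload(transcription_id, worker_run_id, entities):
    """Assemble a request body (hypothetical helper), rejecting incomplete entities."""
    for index, entity in enumerate(entities):
        missing = REQUIRED_ENTITY_FIELDS - entity.keys()
        if missing:
            raise ValueError(f"entity {index} is missing fields: {sorted(missing)}")
    return {
        "transcription_id": str(transcription_id),
        "worker_run_id": str(worker_run_id),
        "entities": entities,
    }


# Made-up sample data: one shared entity type, two entities on the same transcription.
person_type = str(uuid.uuid4())
payload = build_payload(
    transcription_id=uuid.uuid4(),
    worker_run_id=uuid.uuid4(),
    entities=[
        {"type_id": person_type, "name": "Jean Dupont", "offset": 0, "length": 11, "confidence": 0.97},
        {"type_id": person_type, "name": "Marie", "offset": 20, "length": 5, "confidence": 0.88},
    ],
)
print(json.dumps(payload, indent=2))
```

Note that `validated`, `corpus` and `metas` do not appear anywhere in the body, in line with the unsupported fields listed above.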
The endpoint workflow would be:
- rights check (user has contributor access on the corpus) + worker run exists
- check if any entities exist on that transcription for this worker run:
  - raise a 400 if any entities already exist: we do not want to manage conflicts here, just to be as fast as possible
- open a DB transaction:
  - create all `Entity` instances through bulk create
  - create all `TranscriptionEntity` instances through bulk create
- return the list of IDs for the created instances:
{
  "entities": [
    {
      "transcription_entity_id": "...",
      "entity_id": "..."
    }
  ]
}
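The workflow can be sketched without any framework as follows. This is only an illustration under stated assumptions: `FakeStore` is an in-memory stand-in for the database, `BadRequest` stands in for the HTTP 400 response, and the rights and worker run checks are left out. None of these names come from the actual codebase.

```python
import uuid
from dataclasses import dataclass, field


class BadRequest(Exception):
    """Stands in for an HTTP 400 response (hypothetical)."""


@dataclass
class FakeStore:
    """In-memory stand-in for the two database tables (hypothetical)."""
    entities: dict = field(default_factory=dict)
    transcription_entities: dict = field(default_factory=dict)


def create_transcription_entities(store, payload):
    transcription_id = payload["transcription_id"]
    worker_run_id = payload["worker_run_id"]

    # No conflict management: any existing link for this worker run is rejected.
    for link in store.transcription_entities.values():
        if (link["transcription_id"], link["worker_run_id"]) == (transcription_id, worker_run_id):
            raise BadRequest("entities already exist on this transcription for this worker run")

    # Build every row first, then write each batch in one go. A real
    # implementation would perform both writes inside one database transaction.
    new_entities, new_links, created = {}, {}, []
    for item in payload["entities"]:
        entity_id, link_id = str(uuid.uuid4()), str(uuid.uuid4())
        new_entities[entity_id] = {
            "type_id": item["type_id"],
            "name": item["name"],
            "worker_run_id": worker_run_id,
        }
        new_links[link_id] = {
            "transcription_id": transcription_id,
            "entity_id": entity_id,
            "worker_run_id": worker_run_id,
            "offset": item["offset"],
            "length": item["length"],
            "confidence": item["confidence"],
        }
        created.append({"transcription_entity_id": link_id, "entity_id": entity_id})

    store.entities.update(new_entities)                # bulk create of entities
    store.transcription_entities.update(new_links)     # bulk create of transcription entities
    return {"entities": created}


store = FakeStore()
request = {
    "transcription_id": str(uuid.uuid4()),
    "worker_run_id": str(uuid.uuid4()),
    "entities": [
        {"type_id": str(uuid.uuid4()), "name": "Jean Dupont", "offset": 0, "length": 11, "confidence": 0.97},
    ],
}
response = create_transcription_entities(store, request)
```

Calling the function a second time with the same `request` raises `BadRequest`, which mirrors the "raise a 400 if any entities already exist" rule: the endpoint stays fast because it never has to merge with existing data.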
Edited by Bastien Abadie