New bulk endpoint to create transcription entities
We are publishing a lot more entities on transcriptions thanks to the amazing power of DAN (for example on socface, with hundreds of entities on a single transcription).
Publishing these entities is currently slow, because it uses atomic endpoints to create each entity, then each transcription entity (which is OK for smaller projects / pages).
We need a new bulk endpoint, CreateTranscriptionEntities, which would allow an API user to create a lot of entities on a single transcription.
The payload would be (all fields are required by default):
- transcription_id: UUID of the transcription that all entities will be associated with
- worker_run_id: UUID of the worker run that publishes the data (no manual access allowed)
- entities: list of entities and their positions:
  - type_id: UUID of the entity type
  - name: name of the entity
  - offset: offset of the entity on the transcription
  - length: length of the entity on the transcription
  - confidence: confidence score of the transcription entity
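As an illustration of the payload described above, here is a minimal client-side sketch that assembles and sanity-checks the request body. The names `EntityItem` and `build_payload` are hypothetical, and the range checks (non-negative offset, length of at least 1, confidence within [0, 1]) are assumptions that this issue does not specify.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class EntityItem:
    """One item of the `entities` list (hypothetical helper class)."""
    type_id: str       # UUID of the entity type
    name: str          # name of the entity
    offset: int        # offset of the entity on the transcription
    length: int        # length of the entity on the transcription
    confidence: float  # confidence score of the transcription entity


def build_payload(transcription_id: str, worker_run_id: str,
                  entities: list[EntityItem]) -> dict:
    """Assemble the request body; all fields are required."""
    # uuid.UUID raises ValueError on a malformed UUID string.
    for value in [transcription_id, worker_run_id, *(e.type_id for e in entities)]:
        uuid.UUID(value)
    for e in entities:
        if not e.name:
            raise ValueError("name is required")
        # Assumed constraints, not stated in the issue.
        if e.offset < 0 or e.length < 1:
            raise ValueError("offset must be >= 0 and length >= 1")
        if not 0.0 <= e.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
    return {
        "transcription_id": transcription_id,
        "worker_run_id": worker_run_id,
        "entities": [asdict(e) for e in entities],
    }


# Example: a single entity covering the first 11 characters of the text.
example = build_payload(
    str(uuid.uuid4()),
    str(uuid.uuid4()),
    [EntityItem(type_id=str(uuid.uuid4()), name="Jean Dupont",
                offset=0, length=11, confidence=0.93)],
)
print(json.dumps(example, indent=2))
```

Validating on the client is optional; the endpoint itself would still have to enforce whatever constraints are finally agreed on.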
Fields we do not support:
- CreateEntity.validated: is always set to True, so there is no need to expose it here
- CreateEntity.corpus: we already have that information from the transcription's element
- CreateEntity.metas: we do not need it for socface, and we probably won't use it much now that entity types are available
The endpoint workflow would be:
- rights check (user has contributor access on the corpus) + worker run exists
- check if any entities exist on that transcription for this worker run:
  - raise a 400 if any entities already exist: we do not want to manage conflicts here, just be as fast as possible
- open a DB transaction:
  - create all Entity instances through bulk create
  - create all TranscriptionEntity instances through bulk create
- return the list of IDs for the created instances:
{
  "entities": [
    {
      "transcription_entity_id": "...",
      "entity_id": "..."
    }
  ]
}
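The workflow above can be sketched with an in-memory store standing in for the database. This is only an illustration under stated assumptions, not the actual implementation: `InMemoryStore`, `BadRequest` and `create_transcription_entities` are hypothetical names, the rights check and worker run lookup are left out, and the corpus is passed in directly instead of being read from the transcription's element.

```python
import uuid


class BadRequest(Exception):
    """Stands in for an HTTP 400 response (hypothetical)."""


class InMemoryStore:
    """Toy replacement for the Entity and TranscriptionEntity tables."""

    def __init__(self):
        self.entities = {}                # entity_id -> row
        self.transcription_entities = {}  # transcription_entity_id -> row


def create_transcription_entities(store, corpus_id, payload):
    transcription_id = payload["transcription_id"]
    worker_run_id = payload["worker_run_id"]

    # No conflict management: refuse as soon as this worker run already
    # published any entity on this transcription.
    for row in store.transcription_entities.values():
        if (row["transcription_id"] == transcription_id
                and row["worker_run_id"] == worker_run_id):
            raise BadRequest("entities already exist for this worker run")

    # Build every row before storing anything, so a failure while building
    # leaves the store untouched (a rough stand-in for the DB transaction).
    new_entities, new_links, created = {}, {}, []
    for item in payload["entities"]:
        entity_id = str(uuid.uuid4())
        link_id = str(uuid.uuid4())
        new_entities[entity_id] = {
            "name": item["name"],
            "type_id": item["type_id"],
            "corpus_id": corpus_id,
            "validated": True,  # always True, as stated in the issue
            "worker_run_id": worker_run_id,
        }
        new_links[link_id] = {
            "transcription_id": transcription_id,
            "entity_id": entity_id,
            "offset": item["offset"],
            "length": item["length"],
            "confidence": item["confidence"],
            "worker_run_id": worker_run_id,
        }
        created.append({"transcription_entity_id": link_id,
                        "entity_id": entity_id})

    # Two bulk writes: all Entity rows, then all TranscriptionEntity rows.
    store.entities.update(new_entities)
    store.transcription_entities.update(new_links)
    return {"entities": created}
```

Calling the function a second time with the same transcription and worker run raises `BadRequest`, which mirrors the "raise a 400" step.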
Edited by Bastien Abadie