Faster ALTO publication
Refs https://redmine.teklia.com/issues/3512
This MR brings bulk endpoint publication to ALTO files. It will eventually fully replace the `create_elements` method.
There are a few constraints:
- bulk endpoints are only available with a `worker_run_id`, but not all users will have access to one: this means a slow mode must remain, publishing elements & transcriptions one by one
- a new CLI option `--worker-run-id` is added
- there is no bulk endpoint for publishing metadata across a range of element IDs (only multiple metadata on a single element)
- this means publishing ALTO IDs becomes optional, as it's rarely needed for production corpora (but useful for debug ones)
- a new CLI option `--skip-metadatas` is added
- `CreateElements` only supports elements linked to an image with a polygon, so structural elements still need to be published one by one
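The publication constraints above can be sketched as a small dispatch: without a worker run ID everything goes through the slow one-by-one path, and with one only elements that have an image and a polygon go through the bulk path. This is a minimal sketch, not the MR's implementation: `AltoElementSketch`, `split_for_bulk`, `publish_sketch`, `create_bulk` and `create_one` are all hypothetical names, and the two callables stand in for the real API calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


# Hypothetical, simplified stand-in for a parsed ALTO element.
@dataclass
class AltoElementSketch:
    name: str
    image_id: Optional[str] = None
    polygon: Optional[List[Tuple[int, int]]] = None


def split_for_bulk(elements):
    """Split elements into (bulk-eligible, one-by-one) lists.

    Only elements linked to an image with a non-empty polygon are
    considered eligible for bulk creation.
    """
    bulk, single = [], []
    for element in elements:
        if element.image_id is not None and element.polygon:
            bulk.append(element)
        else:
            single.append(element)
    return bulk, single


def publish_sketch(
    elements,
    worker_run_id: Optional[str],
    create_bulk: Callable,
    create_one: Callable,
):
    """Publish elements, using the bulk path only when a worker run ID is set."""
    if worker_run_id is None:
        # Slow mode: no worker run available, publish everything one by one.
        for element in elements:
            create_one(element)
        return

    bulk, single = split_for_bulk(elements)
    # Structural elements (no image/polygon) are still published one by one.
    # Assumption: doing them first lets bulk-created children reference them.
    for element in single:
        create_one(element)
    if bulk:
        create_bulk(bulk, worker_run_id)
```

Passing the API calls in as callables keeps the mode selection testable without a server; in the real tool they would be the client calls used for single and bulk creation.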
Remaining steps:
- use `create_elements_fast` instead of `create_elements` in ALTO publication
- add both new CLI options to the alto tool
- fix unit tests, do not introduce new ones (some may even disappear)
- remove `AltoElement.serialize`, as it will be unused
- document CLI options
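Wiring the two new options into the alto tool could look roughly like the following. This sketch assumes an `argparse`-based CLI; the actual tool may use a different CLI framework, and `build_alto_parser` and the `prog` name are hypothetical. Only the flag names `--worker-run-id` and `--skip-metadatas` come from this MR.

```python
import argparse


def build_alto_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the two options added by this MR."""
    parser = argparse.ArgumentParser(prog="alto")  # hypothetical prog name
    parser.add_argument(
        "--worker-run-id",
        default=None,
        help="Worker run ID enabling bulk publication; "
        "without it, elements are published one by one (slow mode)",
    )
    parser.add_argument(
        "--skip-metadatas",
        action="store_true",
        help="Do not publish ALTO IDs as element metadata",
    )
    return parser
```

For example, `build_alto_parser().parse_args(["--worker-run-id", "abc", "--skip-metadatas"])` yields `worker_run_id == "abc"` and `skip_metadatas is True`, while the defaults keep the slow mode and publish metadata.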
Edited by Bastien Abadie