Use bulk CreateElements endpoint on PageXmlParser
We can change the implementation of the PageXmlParser.save method:
- build an element (as a dict) for every text region of the page, and keep each one paired with its XML region:
    regions = [
        (region, {"type": "paragraph", ...})
        for region in self.pagexml_page.page.text_regions
    ]
- create all relevant elements through CreateElements and store their IDs against the regions (so we can later find the paragraph created for each region)
- list all paragraph transcriptions for ALL regions, and store them in a generic list (with the element ID)
- iterate over all lines in a paragraph and build a list of elements to create through a single CreateElements call per paragraph
- list all line transcriptions for ALL lines, and store them in a generic list (with the element ID)
- create all transcriptions at the end of execution (for the page), by calling CreateTranscriptions
- remove the save_element_transcriptions and save_element methods
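
A rough sketch of the flow described above, to make the call pattern concrete. It is not the real implementation: FakeClient, Region, Line, save_page, the request() signature, the payload shapes, the "text_line" type and the "element_id"/"text" keys are all assumptions made for illustration. Only the CreateElements and CreateTranscriptions endpoint names and the "paragraph" type come from this issue.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Line:
    text: str


@dataclass
class Region:
    text: str
    lines: list = field(default_factory=list)


class FakeClient:
    """Hypothetical stand-in for the API client: records calls, returns fake IDs."""

    def __init__(self):
        self.calls = []
        self._ids = count(1)

    def request(self, operation, parent=None, body=None):
        self.calls.append(operation)
        if operation == "CreateElements":
            # Assumption: one created element per payload entry, in the same order.
            return [{"id": f"el-{next(self._ids)}"} for _ in body]
        return body  # CreateTranscriptions: nothing useful to return here


def save_page(client, page_id, text_regions):
    transcriptions = []  # generic list shared by paragraphs and lines

    # 1. Pair each XML region with its element dict, then create all
    #    paragraphs of the page in one bulk call.
    #    (The real dict carries more fields, elided in the issue.)
    regions = [(region, {"type": "paragraph"}) for region in text_regions]
    created = client.request(
        "CreateElements", parent=page_id, body=[element for _, element in regions]
    )

    for (region, _), paragraph in zip(regions, created):
        transcriptions.append({"element_id": paragraph["id"], "text": region.text})
        if not region.lines:
            continue
        # 2. One bulk call per paragraph for all of its lines.
        payload = [{"type": "text_line"} for _ in region.lines]  # assumed type name
        created_lines = client.request(
            "CreateElements", parent=paragraph["id"], body=payload
        )
        for line, element in zip(region.lines, created_lines):
            transcriptions.append({"element_id": element["id"], "text": line.text})

    # 3. A single transcription call at the end, for the whole page.
    client.request("CreateTranscriptions", body=transcriptions)
    return transcriptions
```

With this shape, a page with N paragraphs costs N + 2 API calls (one for the paragraphs, one per paragraph for its lines, one for all transcriptions), instead of one call per element and per transcription.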