ALTO upload: Publish transcription on parents as well
Depends on #148 (closed)
We currently build a node's transcription by aggregating the text of its CONTENT attributes. When that yields no text, we should fall back to the text attribute of the node's children. In that case, the confidence score of the aggregated transcription should be the micro-average of the children's scores (just like in #148 (closed)). We should also resolve hyphenation when it is available in SUBS_CONTENT attributes. Hyphenation should be resolved only when porting the transcription to the parent.
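To make the two rules above concrete, here is a minimal sketch using only the standard library. It is not the arkindex_cli implementation: the helper names (`line_words`, `parent_text`, `micro_average_confidence`) and the sample XML are hypothetical. It assumes the usual ALTO convention where both halves of a hyphenated word carry the full word in SUBS_CONTENT, tagged with SUBS_TYPE "HypPart1" and "HypPart2", and it assumes "micro-average" means averaging the WC scores of all strings pooled across children, rather than averaging per-child means.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real ALTO files are namespaced, which this sketch ignores.
ALTO_SAMPLE = """
<TextBlock>
  <TextLine>
    <String CONTENT="An" WC="0.9"/>
    <String CONTENT="exam" WC="0.8" SUBS_TYPE="HypPart1" SUBS_CONTENT="example"/>
    <HYP CONTENT="-"/>
  </TextLine>
  <TextLine>
    <String CONTENT="ple" WC="0.6" SUBS_TYPE="HypPart2" SUBS_CONTENT="example"/>
    <String CONTENT="line" WC="0.7"/>
  </TextLine>
</TextBlock>
""".strip()


def line_words(line, resolve_hyphenation):
    """Return the words of a TextLine, optionally resolving hyphenation."""
    words = []
    for string in line.iter("String"):
        subs_type = string.get("SUBS_TYPE")
        subs_content = string.get("SUBS_CONTENT")
        if resolve_hyphenation and subs_content:
            if subs_type == "HypPart1":
                # Emit the full word once, in place of its first half
                words.append(subs_content)
                continue
            if subs_type == "HypPart2":
                # The second half is already covered by the first one
                continue
        words.append(string.get("CONTENT", ""))
    return words


def parent_text(block):
    """Aggregate the lines of a block, resolving hyphenation at this level only."""
    lines = (" ".join(line_words(line, True)) for line in block.iter("TextLine"))
    return "\n".join(line for line in lines if line)


def micro_average_confidence(block):
    """Average the WC scores of all strings pooled across the block's lines."""
    scores = [float(s.get("WC")) for s in block.iter("String") if s.get("WC") is not None]
    return sum(scores) / len(scores) if scores else None


block = ET.fromstring(ALTO_SAMPLE)
first_line = block.find("TextLine")
line_level = " ".join(line_words(first_line, False))  # hyphenation left untouched
block_level = parent_text(block)  # hyphenation resolved
confidence = micro_average_confidence(block)
```

With this sample, the first line keeps its raw content ("An exam") while the block-level transcription reads "An example" followed by "line", and the pooled confidence is about 0.75.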
# arkindex_cli.commands.upload.alto.parser.AltoElement.text
from operator import attrgetter
@property
def text(self):
    """
    Easy access to the node's transcription
    """
    if not len(self.strings):
        return
    content = " ".join(string.attrib["CONTENT"] for string in self.strings).strip()
    if content:
        return content
    # Aggregate children `text` attribute
    # TODO: skip children with no transcription
    return "\n".join(map(attrgetter("text"), self.children))
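The TODO above matters because `text` can return `None`, and `"\n".join` raises a `TypeError` on `None` items. One possible way to skip children with no transcription is sketched below. The `Node` class is a hypothetical stand-in for `AltoElement`, reduced to the two attributes needed here. Returning `None` when no child has text is an assumption, not something the snippet above specifies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for AltoElement, for illustration only."""

    content: str = ""
    children: List["Node"] = field(default_factory=list)

    @property
    def text(self) -> Optional[str]:
        content = self.content.strip()
        if content:
            return content
        # Evaluate each child's text once, then drop empty or missing ones
        child_texts = [child.text for child in self.children]
        child_texts = [text for text in child_texts if text]
        # Assumption: a node without any transcription yields None
        return "\n".join(child_texts) or None


block = Node(children=[Node("first line"), Node(), Node("second line")])
aggregated = block.text
```

Here the empty middle child is ignored, so `aggregated` holds the two remaining lines joined by a newline.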
Edited by Yoann Schneider