
Process subelements + add 'page-type' and 'resize-image' arguments

Mélodie Boillet requested to merge process-subelements into main
  • Update pre-commit versions
  • Iterate over pages of type args.page_type
  • Remove offset when dealing with sub-elements
  • Rename generated files by page ids (instead of page names)
  • Download the image at a reduced resolution set by args.resize_image
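
When an image is downloaded at a reduced resolution, the annotation polygons have to be scaled by the same factor so they still line up with the pixels. The sketch below illustrates that idea only; it is not the code from this merge request. The helper names `resize_factor` and `scale_polygon` are hypothetical, and it assumes the resize value is the target length of the longest image side, which may not match how `args.resize_image` is actually interpreted.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def resize_factor(width: int, height: int, max_side: int) -> float:
    """Return the factor that brings the longest side down to max_side.

    Assumption: the image is never upscaled, so the factor is capped at 1.0.
    """
    return min(1.0, max_side / max(width, height))


def scale_polygon(polygon: List[Point], factor: float) -> List[Point]:
    """Scale every (x, y) point of a polygon by the same factor."""
    return [(round(x * factor), round(y * factor)) for x, y in polygon]


# Hypothetical page of 3072x2048 pixels, downloaded with a longest side of 768.
factor = resize_factor(3072, 2048, 768)

# Hypothetical text_line polygon in full-resolution coordinates.
line = [(400, 800), (2000, 800), (2000, 900), (400, 900)]
scaled = scale_polygon(line, factor)
```

Computing the factor once per page and applying it to every polygon keeps the mask and the downloaded image consistent, whatever rule is used to pick the target size.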

Tested on this folder: https://demo.arkindex.org/element/dfbf0e06-e42a-4d08-a4bf-190bc96f2fd3

Example for this image, https://demo.arkindex.org/element/b6736d0f-8062-4298-bae0-286f6070c161, using the following command:

run-extraction --corpus "Nationaal Archief Nederland" \
    --classes text_line \
    --colors 255,0,0 \
    --parents-types folder \
    --parents-names Annotation_set \
    -s -i 768 \
    --page-type single_page \
    -r

2016ebdf-4dc7-4e21-b300-c75f166e2512
e4e90766-bc9a-41fa-9390-8bdcbc46cb61
Edited by Yoann Schneider
