METS import for logical structure
Refs https://redmine.teklia.com/issues/2909
We need limited support for METS files in order to import the Gallica structure for newspapers. We are only interested in the `toc.xml` files and their inner `<structMap LABEL="Logical Structure" TYPE="logical">` description.
The CLI must be extended with a new command `arkindex upload mets <path/to/toc.xml> <corpus_id> --element-id=<ELEMENT_ID>`, which will parse:
- all the `<file>` elements in any `<fileSec>`: locate their inner `<FLocat xlink:type="simple" LOCTYPE="URL">` and check that the file referenced by `href` exists
  - for each of these files, try to find a matching JSON file (same name as the XML file, but with a `.json` extension) to get the summaries built from #99
  - when an XML or JSON file is missing, simply crash with an explicit message
- parse all `<structMap TYPE="logical">` recursively to build a list of trees (hierarchies to build on Arkindex)
  - each `<div>` with a `type` will generate a new Arkindex element
  - only `<div>` with a `<fptr>` child will map to an element previously created
  - if the `LABEL` attribute is set, it must be used as the element name
- check that every type used by the trees is available on the corpus
- build the hierarchy, traversing it breadth-first
  - if `--element-id` is provided, top elements (most likely newspapers) must be created under this parent
  - only `CreateElement` (without image) and `CreateElementParent` calls should be made
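
The parsing steps above could be sketched roughly as follows, using only the Python standard library. This is a minimal sketch under several assumptions, not the actual CLI implementation: `Node`, `check_files`, `build_nodes`, `parse_mets`, `breadth_first` and `check_types` are hypothetical names; `href` values are assumed to be paths relative to the `toc.xml` directory; a `<div>` without a `TYPE` is assumed to be skipped with its children attached to the nearest typed ancestor; and the type is used as a fallback name when `LABEL` is absent. The Arkindex API calls themselves are left out: the sketch only yields `(parent, node)` pairs in the order in which `CreateElement` and `CreateElementParent` would be issued.

```python
import xml.etree.ElementTree as ET
from collections import deque
from dataclasses import dataclass, field
from pathlib import Path

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}
METS_FILE = "{http://www.loc.gov/METS/}file"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


@dataclass
class Node:
    """One typed <div> of the logical structMap (hypothetical helper type)."""
    type: str
    name: str
    file_ids: list = field(default_factory=list)
    children: list = field(default_factory=list)


def check_files(root, base_dir):
    """Map each <file> ID to its XML path; raise if the XML or JSON file is missing."""
    paths = {}
    for file_sec in root.iterfind(".//mets:fileSec", NS):
        for file_el in file_sec.iter(METS_FILE):
            flocat = file_el.find("mets:FLocat[@LOCTYPE='URL']", NS)
            if flocat is None or flocat.get(XLINK_HREF) is None:
                raise ValueError(f"<file> {file_el.get('ID')} has no usable <FLocat>")
            # Assumption: href is a path relative to the toc.xml directory.
            xml_path = Path(base_dir) / flocat.get(XLINK_HREF)
            for path in (xml_path, xml_path.with_suffix(".json")):
                if not path.is_file():
                    raise FileNotFoundError(f"Missing file referenced by METS: {path}")
            paths[file_el.get("ID")] = xml_path
    return paths


def build_nodes(div):
    """Recursively turn a <div> into nodes; untyped divs only pass their children up."""
    children = [n for child in div.findall("mets:div", NS) for n in build_nodes(child)]
    div_type = div.get("TYPE")
    if div_type is None:
        return children  # assumption: no TYPE means no Arkindex element
    file_ids = [f.get("FILEID") for f in div.findall("mets:fptr", NS) if f.get("FILEID")]
    return [Node(div_type, div.get("LABEL") or div_type, file_ids, children)]


def parse_mets(toc_path):
    """Return (file ID -> XML path, list of top-level trees) for a toc.xml file."""
    toc_path = Path(toc_path)
    root = ET.parse(toc_path).getroot()
    files = check_files(root, toc_path.parent)
    trees = []
    for struct_map in root.iterfind(".//mets:structMap[@TYPE='logical']", NS):
        for div in struct_map.findall("mets:div", NS):
            trees.extend(build_nodes(div))
    return files, trees


def breadth_first(trees):
    """Yield (parent, node) pairs level by level; parent is None for top elements."""
    queue = deque((None, node) for node in trees)
    while queue:
        parent, node = queue.popleft()
        yield parent, node
        queue.extend((node, child) for child in node.children)


def check_types(trees, corpus_types):
    """Raise if any type used by the trees is not available on the corpus."""
    missing = {node.type for _, node in breadth_first(trees)} - set(corpus_types)
    if missing:
        raise ValueError(f"Types missing from corpus: {sorted(missing)}")
```

A caller would run `parse_mets`, then `check_types` against the corpus types, and finally iterate over `breadth_first(trees)`, creating each node and linking it to its parent (or to the `--element-id` element when the parent is `None`). Walking breadth-first guarantees that a parent has already been created by the time its children are reached.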