METS Upload: Publish `DC` metadata on METS elements
Depends on #150 (closed)
Some metadata are defined in `dmdSec` nodes. An element is linked to its metadata through its `DMDID` attribute, which holds the `ID` of the matching `dmdSec` node.
Sample
<mets>
  <dmdSec ID="DMD.1">
    <mdWrap MDTYPE="DC" MIMETYPE="text/xml">
      <xmlData>
        <spar_dc:spar_dc>
          <dc:title>1926-07-11 (Année 61, Numéro 22718)</dc:title>
          <dc:description xsi:type="spar_dc:sequentialDesignation1">Année 61</dc:description>
          <dc:description xsi:type="spar_dc:sequentialDesignation2">Numéro 22718</dc:description>
          <dc:type>periodical</dc:type>
          <dc:date>1926-07-11</dc:date>
        </spar_dc:spar_dc>
      </xmlData>
    </mdWrap>
  </dmdSec>
  ...
  <structMap>
    <div ID="DIV.13" TYPE="ISSUE" DMDID="DMD.1"></div>
  </structMap>
</mets>
We will only support metadata where `MDTYPE="DC"` on the `mdWrap` node. All other types will be ignored.
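As a rough illustration of that filter, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without dependencies (`iter`, `find` and `get` behave the same way in both). The helper name `iter_dc_sections` and the extra `DMD.2` / `DMD.3` sections are made up for the example, and the elements are assumed to be un-namespaced as in the sample above; files that declare a METS namespace would need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: only DMD.1 carries Dublin Core metadata
SAMPLE = """
<mets>
  <dmdSec ID="DMD.1"><mdWrap MDTYPE="DC" MIMETYPE="text/xml"><xmlData/></mdWrap></dmdSec>
  <dmdSec ID="DMD.2"><mdWrap MDTYPE="MODS" MIMETYPE="text/xml"><xmlData/></mdWrap></dmdSec>
  <dmdSec ID="DMD.3"/>
</mets>
"""


def iter_dc_sections(root):
    """Yield (ID, mdWrap) for every dmdSec whose mdWrap has MDTYPE="DC"."""
    for dmd_sec in root.iter("dmdSec"):
        md_wrap = dmd_sec.find("mdWrap")
        # Skip sections without an mdWrap, or with an unsupported metadata type
        if md_wrap is None or md_wrap.get("MDTYPE") != "DC":
            continue
        yield dmd_sec.get("ID"), md_wrap


root = ET.fromstring(SAMPLE)
dc_ids = [dmd_id for dmd_id, _ in iter_dc_sections(root)]
print(dc_ids)
```

Silently skipping unsupported sections keeps the behaviour described above; logging the skipped `ID` values could be added if that turns out to be useful.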
We should store the mapping between each `DMDID` and its list of metadata at the beginning of the command, right after the XML is loaded, in `RootMetsElement.__init__`. The metadata should already be formatted at that point, to make the later Arkindex publication easier. These metadata are stored as children of the `<spar_dc:spar_dc>` node (there should be only one under `<xmlData>`).
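The indexing step could look roughly like the sketch below. It is not the actual `RootMetsElement` code: `MetadataIndex`, `metadata_by_dmdid`, `for_element` and `local_name` are hypothetical names, the formatting is reduced to a name/value pair, and the `spar_dc` namespace URI is a placeholder. It again relies on the standard library's `xml.etree.ElementTree` rather than lxml to stay dependency-free.

```python
import xml.etree.ElementTree as ET

# `dc` and `xsi` use their standard URIs; the `spar_dc` URI is a placeholder
SAMPLE = """
<mets xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:spar_dc="http://example.com/spar_dc">
  <dmdSec ID="DMD.1">
    <mdWrap MDTYPE="DC" MIMETYPE="text/xml">
      <xmlData>
        <spar_dc:spar_dc>
          <dc:title>1926-07-11 (Année 61, Numéro 22718)</dc:title>
          <dc:type>periodical</dc:type>
          <dc:date>1926-07-11</dc:date>
        </spar_dc:spar_dc>
      </xmlData>
    </mdWrap>
  </dmdSec>
  <structMap>
    <div ID="DIV.13" TYPE="ISSUE" DMDID="DMD.1"></div>
  </structMap>
</mets>
"""


def local_name(tag):
    """Strip the `{namespace}` part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


class MetadataIndex:
    """Hypothetical stand-in for the indexing done in RootMetsElement.__init__."""

    def __init__(self, root):
        self.metadata_by_dmdid = {}
        for dmd_sec in root.iter("dmdSec"):
            md_wrap = dmd_sec.find("mdWrap")
            if md_wrap is None or md_wrap.get("MDTYPE") != "DC":
                continue
            xml_data = md_wrap.find("xmlData")
            if xml_data is None or len(xml_data) == 0:
                continue
            # Only one container node is expected under xmlData
            container = xml_data[0]
            self.metadata_by_dmdid[dmd_sec.get("ID")] = [
                {"name": local_name(child.tag), "value": child.text or ""}
                for child in container
            ]

    def for_element(self, element):
        """Return the metadata linked to a structMap element (empty list if none)."""
        metadata = []
        # METS declares DMDID as IDREFS, so it may hold several space-separated IDs
        for dmd_id in element.get("DMDID", "").split():
            metadata.extend(self.metadata_by_dmdid.get(dmd_id, []))
        return metadata


root = ET.fromstring(SAMPLE)
index = MetadataIndex(root)
issue = root.find("structMap/div")
issue_metadata = index.for_element(issue)
print(issue_metadata)
```

Building the dictionary once up front means each element only needs a lookup at publication time instead of a new search through the XML tree.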
Here is sample code to parse the metadata.
Metadata parsing
import re

from lxml import etree as ET

# lxml exposes namespaced attributes in Clark notation, so `xsi:type` is stored under this key
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def _build_meta(node: ET.Element) -> dict:
    # Name without namespace
    name = ET.QName(node.tag).localname
    # If the node has an `xsi:type` attribute, use it instead
    if subname := node.attrib.get(XSI_TYPE):
        name = subname

    # Value (`node.text` is None for empty elements)
    value = node.text or ""

    # MetaType
    if value.startswith("https://"):
        meta_type = "url"
    elif re.match(r"^\d{4}(?:-\d{2})?(?:-\d{2})?$", value):
        meta_type = "date"
    else:
        meta_type = "text"

    return {
        "type": meta_type,
        "name": name.capitalize(),
        "value": value,
    }
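A quick check of the two lookups this function depends on, as a hedged sketch rather than project code. It uses the standard library's `xml.etree.ElementTree` (attribute handling is the same as in lxml here) so that it runs without dependencies. It shows that a namespaced attribute such as `xsi:type` is exposed under a `{namespace}type` key rather than under its prefixed name, and how the URL and date rules classify a few values. The `classify` helper and the test values are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Clark-notation key for the `xsi:type` attribute
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
DATE_RE = re.compile(r"^\d{4}(?:-\d{2})?(?:-\d{2})?$")

node = ET.fromstring(
    '<dc:description xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xsi:type="spar_dc:sequentialDesignation1">Année 61</dc:description>'
)

# The prefixed name is not a valid key once the document is parsed
prefixed = node.attrib.get("xsi:type")
qualified = node.attrib.get(XSI_TYPE)


def classify(value):
    """Hypothetical helper mirroring the url / date / text rules."""
    if value.startswith("https://"):
        return "url"
    if DATE_RE.match(value):
        return "date"
    return "text"


values = ["1926-07-11", "1926-07", "1926", "Année 61", "https://example.com"]
kinds = {value: classify(value) for value in values}

# str.capitalize() upper-cases the first character and lower-cases the rest
display_name = qualified.capitalize()
print(prefixed, qualified, kinds, display_name)
```

Since `str.capitalize()` lowercases everything after the first character, a name taken from `xsi:type` loses its camel case and keeps its `spar_dc:` prefix; whether that is the desired display name is worth confirming.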
Edited by Yoann Schneider