Support Ceph ingestion
A new task is required to ingest files from Ceph buckets.
The task is named arkindex_tasks.import_ceph and supports these CLI parameters:
- --corpus (required) to use an Arkindex corpus as destination
- --element (optional) to use as a top folder for ingestion
- --bucket (required) to use as data source on a Ceph server
- --path (optional) to use as a prefix to list all objects from
- --iiif-url-base (optional), defaults to https://europe-gamma.iiif.teklia.com/iiif/2/
- --type (optional), defaults to page
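A minimal sketch of how these parameters could be declared with argparse. The function name build_parser, the help texts, and the use of plain strings for every value are assumptions made for illustration; only the flag names, their required/optional status, and the two defaults come from the list above.

```python
import argparse

# Default taken from the parameter list above.
DEFAULT_IIIF_URL_BASE = "https://europe-gamma.iiif.teklia.com/iiif/2/"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the arkindex_tasks.import_ceph task."""
    parser = argparse.ArgumentParser(
        prog="arkindex_tasks.import_ceph",
        description="Ingest files from a Ceph bucket into an Arkindex corpus.",
    )
    parser.add_argument(
        "--corpus", required=True,
        help="Arkindex corpus used as destination",
    )
    parser.add_argument(
        "--element", default=None,
        help="Element used as the top folder for ingestion",
    )
    parser.add_argument(
        "--bucket", required=True,
        help="Bucket used as data source on the Ceph server",
    )
    parser.add_argument(
        "--path", default=None,
        help="Prefix used to list objects in the bucket",
    )
    parser.add_argument(
        "--iiif-url-base", default=DEFAULT_IIIF_URL_BASE,
        help="Base URL of the IIIF server exposing the bucket",
    )
    parser.add_argument(
        "--type", default="page",
        help="Type of the elements created for each image",
    )
    return parser
```

With only the two required flags given, the optional ones fall back to their defaults (argparse exposes --iiif-url-base as args.iiif_url_base).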
Four environment variables will also be provided to give read access to the bucket:
- INGEST_AWS_ENDPOINT
- INGEST_AWS_ACCESS_KEY
- INGEST_AWS_SECRET_KEY
- INGEST_AWS_REGION
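One possible way to read these variables and fail early when one is missing. The IngestSettings class and the load_ingest_settings function are hypothetical names introduced for this sketch; only the four variable names come from this issue. The resulting values would then be handed to whichever S3-compatible client the task uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names as listed in this issue.
REQUIRED_VARIABLES = (
    "INGEST_AWS_ENDPOINT",
    "INGEST_AWS_ACCESS_KEY",
    "INGEST_AWS_SECRET_KEY",
    "INGEST_AWS_REGION",
)


@dataclass(frozen=True)
class IngestSettings:
    """Hypothetical container for the bucket read credentials."""
    endpoint: str
    access_key: str
    secret_key: str
    region: str


def load_ingest_settings(env: Mapping[str, str] = os.environ) -> IngestSettings:
    """Read the four variables, raising if any is unset or empty."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return IngestSettings(
        endpoint=env["INGEST_AWS_ENDPOINT"],
        access_key=env["INGEST_AWS_ACCESS_KEY"],
        secret_key=env["INGEST_AWS_SECRET_KEY"],
        region=env["INGEST_AWS_REGION"],
    )
```

Accepting the environment as a parameter keeps the function easy to test without touching the real process environment.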
The workflow is:
- List all files in the bucket / path, using the recursive option of the minio client
- Build an in-memory tree to represent the elements required for folders (like here)
- Use CreateElement to build the whole hierarchy, and keep in memory the Arkindex element IDs linked to their paths on the bucket
- Iterate on all files, and create their image on the Arkindex instance using CreateIIIFURL
  - the IIIF URL is iiif_url_base / bucket_name / path / file
- For each image created, create an element under the right parent previously created
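The folder and URL steps of this workflow could be sketched as follows. All function names here are hypothetical, and create_folder is a stand-in callable for the CreateElement API call, so the sketch makes no assumption about the API client's signature. The IIIF URL is built by plain concatenation as described above; whether the path needs percent-encoding for the IIIF server is not specified in this issue.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, List, Optional


def folder_paths(keys: Iterable[str]) -> List[str]:
    """Return every folder implied by the object keys, parents before children."""
    folders = set()
    for key in keys:
        for parent in PurePosixPath(key.lstrip("/")).parents:
            if str(parent) != ".":
                folders.add(str(parent))
    # Sorting by depth guarantees a parent is created before its children.
    return sorted(folders, key=lambda path: (path.count("/"), path))


def create_hierarchy(
    keys: Iterable[str],
    create_folder: Callable[[str, Optional[str]], str],
    top_element_id: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Create one folder element per bucket folder and map paths to element IDs.

    create_folder(name, parent_id) is a placeholder for the CreateElement
    call and must return the ID of the new element.
    """
    # The empty path stands for the bucket root, i.e. the --element folder if any.
    element_ids: Dict[str, Optional[str]] = {"": top_element_id}
    for folder in folder_paths(keys):
        parent, _, name = folder.rpartition("/")
        element_ids[folder] = create_folder(name, element_ids[parent])
    return element_ids


def build_iiif_url(iiif_url_base: str, bucket: str, key: str) -> str:
    """Join iiif_url_base / bucket_name / path / file with single slashes."""
    return f"{iiif_url_base.rstrip('/')}/{bucket}/{key.lstrip('/')}"
```

The returned mapping gives, for each file, the parent element ID to use when creating its element: look up the file's folder path, or the empty string for files at the bucket root.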