Use a worker for S3 imports

https://redmine.teklia.com/issues/6067

Just like with the init_elements task (#1717), the arkindex_tasks.import_s3 task is being moved to a worker.

A new INGEST_DOCKER_IMAGE Django setting should be introduced, from docker.ingest_image in the YAML configuration. It defaults to registry.gitlab.teklia.com/arkindex/workers/import:latest.
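As a rough sketch of the fallback behaviour, assuming the YAML configuration has already been parsed into a plain dict (the real Arkindex configuration parser is not shown in this issue, and `get_ingest_docker_image` is a hypothetical helper name):

```python
# Default taken from the issue description.
DEFAULT_INGEST_IMAGE = "registry.gitlab.teklia.com/arkindex/workers/import:latest"


def get_ingest_docker_image(config: dict) -> str:
    """Return docker.ingest_image from a parsed YAML config, or the default.

    Hypothetical helper: it only illustrates the lookup and its fallback.
    """
    docker_section = config.get("docker") or {}
    return docker_section.get("ingest_image") or DEFAULT_INGEST_IMAGE


# In settings.py this could then be exposed as the new Django setting,
# where `conf` stands for the parsed YAML configuration (an assumption):
# INGEST_DOCKER_IMAGE = get_ingest_docker_image(conf)
```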

A new WorkerVersion.objects.ingest_version cached property should be introduced, which works like init_elements_version, but with that new setting.

A new system check should access this cached property at startup, so that its value is actually cached, and should catch any error it raises to emit a new warning. Please update the system checks wiki page to document this new warning. If this warning appears, admins should expect HTTP 500 errors when trying to start an S3 import.
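The interaction between the cached property and the check could look roughly like the sketch below. It uses only the standard library so that it runs on its own: `WorkerVersionManager` is a hypothetical stand-in for the real manager, a `KeyError` stands in for whatever exception the real lookup raises, and the warning is a plain string, whereas a real Django system check would return message objects from `django.core.checks`.

```python
from functools import cached_property


class WorkerVersionManager:
    """Hypothetical stand-in for the WorkerVersion manager."""

    def __init__(self, versions_by_image: dict, ingest_image: str):
        self._versions_by_image = versions_by_image
        self._ingest_image = ingest_image

    @cached_property
    def ingest_version(self):
        # Raises KeyError when no worker version matches the configured image.
        # When the lookup raises, nothing is cached, so a later access retries.
        return self._versions_by_image[self._ingest_image]


def check_ingest_version(manager: WorkerVersionManager) -> list:
    """Access the property once so it gets cached; report failures as warnings."""
    try:
        manager.ingest_version
    except Exception as exc:
        return [f"Ingest worker version is unavailable, S3 imports will fail: {exc!r}"]
    return []
```

Catching the error in the check keeps startup from failing, at the cost of deferring the failure to the moment an S3 import is started.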

When starting or retrying an S3 import, the ProcessBuilder should now do the following:

  1. If a WorkerRun for the ingest worker version does not exist:
    1. Create a WorkerConfiguration on the worker of the ingest worker version, or use an existing one, which contains the following fields:
      • bucket: the name of the bucket as specified in the process;
      • bucket_prefix: the value of the INGEST_PREFIX_BY_BUCKET_NAME Django setting, a boolean;
      • iiif_base_url: the URL of the ImageServer used for S3 ingest.
    2. Create a WorkerRun on the process, with the ingest worker version and that configuration.
  2. Start a new task from that WorkerRun, with the ARKINDEX_WORKER_RUN_ID environment variable set, along with the INGEST_S3_* environment variables that provide authentication credentials.
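The steps above can be sketched with in-memory objects. This is not the actual ProcessBuilder: the dataclasses, the `start_s3_import` function and its parameters are all hypothetical stand-ins for the Django models and settings, and the task is modelled as a plain dict.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class WorkerConfiguration:
    worker: str
    configuration: dict


@dataclass
class WorkerRun:
    version: str
    configuration: WorkerConfiguration
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Process:
    bucket_name: str
    worker_runs: list = field(default_factory=list)
    tasks: list = field(default_factory=list)


def start_s3_import(process, ingest_version, ingest_worker, configurations,
                    prefix_by_bucket_name, iiif_base_url, environ):
    # Step 1: look for an existing WorkerRun for the ingest worker version.
    run = next((r for r in process.worker_runs if r.version == ingest_version), None)
    if run is None:
        # Step 1.1: reuse a matching configuration on the worker, or create one.
        wanted = {
            "bucket": process.bucket_name,
            "bucket_prefix": prefix_by_bucket_name,
            "iiif_base_url": iiif_base_url,
        }
        config = next(
            (c for c in configurations
             if c.worker == ingest_worker and c.configuration == wanted),
            None,
        )
        if config is None:
            config = WorkerConfiguration(ingest_worker, wanted)
            configurations.append(config)
        # Step 1.2: create the WorkerRun on the process.
        run = WorkerRun(ingest_version, config)
        process.worker_runs.append(run)
    # Step 2: start a task with the worker run ID and the INGEST_S3_* variables.
    env = {k: v for k, v in environ.items() if k.startswith("INGEST_S3_")}
    env["ARKINDEX_WORKER_RUN_ID"] = run.id
    task = {"worker_run_id": run.id, "env": env}
    process.tasks.append(task)
    return task
```

In this sketch, calling the function a second time on the same process, as a retry would, reuses the existing WorkerRun and configuration and only creates a new task.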