Support PDF in S3-compatible ingestion

Refs https://redmine.teklia.com/issues/2851

We need to support PDF files when ingesting from Ceph/S3-compatible buckets on Arkindex.

The files are currently listed, but only the first page is ingested (since Cantaloupe can serve it), which leads to confusing results for the end user.
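Handling PDFs separately first requires telling them apart from images in the bucket listing. Below is a minimal sketch. It assumes we have each object's key and its first few bytes (for instance from a ranged GET on the object). The function names are hypothetical and not part of the current ingestion code:

```python
# Every PDF file starts with the "%PDF-" header (e.g. b"%PDF-1.7").
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(key: str, head: bytes) -> bool:
    """Return True if a listed object looks like a PDF.

    `head` holds the first bytes of the object; the extension check is a
    fallback for when those bytes are not available.
    """
    return head.startswith(PDF_MAGIC) or key.lower().endswith(".pdf")


def split_listing(objects: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Split {key: first bytes} into (PDF keys, other keys), keeping order."""
    pdfs: list[str] = []
    others: list[str] = []
    for key, head in objects.items():
        (pdfs if looks_like_pdf(key, head) else others).append(key)
    return pdfs, others
```

The "other" keys can keep going through the current remote file + IIIF path, so only PDFs would need the new workflow.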

There are two possible implementations:

through Cantaloupe
by downloading the PDF and parsing it locally

Cantaloupe

The Cantaloupe server supports PDFs, and its source code mentions a page index for browsing the pages.

I was not able to find any reference to such a page index in the IIIF 3.0 spec, nor to access the other pages of a sample PDF.

There does not seem to be any page information in the related info.json either: its tiles are not the pages we are looking for.

If someone finds how to specify the page index, this option may be worth pursuing, but its big downside is that we would not get the potential transcriptions (we could live without them if this option turns out to be quick to implement).

Download & parse

The most feature-complete solution is therefore to:

download the file from the bucket
extract its images using poppler, as we already do in tasks
upload each image to the bucket
extract any text transcriptions
create the images on Arkindex
create the page elements on Arkindex
create the transcriptions and their elements on Arkindex
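The steps above could be chained roughly as follows. This is a minimal sketch. `Backend` is a hypothetical interface standing in for the bucket client and the Arkindex API calls, not an existing class. The `page-N.png` image keys are an assumed naming scheme:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol

Polygon = list[tuple[int, int]]


@dataclass
class ExtractedPage:
    """One rendered page and the words found in its text layer."""
    image_path: str
    words: list[tuple[str, Polygon]] = field(default_factory=list)


class Backend(Protocol):
    """Hypothetical interface over the bucket client and the Arkindex API."""

    def download(self, key: str) -> str: ...
    def extract(self, pdf_path: str) -> list[ExtractedPage]: ...
    def upload(self, path: str, key: str) -> str: ...
    def create_image(self, url: str) -> str: ...
    def create_page(self, image_id: str, name: str) -> str: ...
    def create_word(self, page_id: str, polygon: Polygon) -> str: ...
    def create_transcription(self, element_id: str, text: str) -> str: ...


def ingest_pdf(backend: Backend, key: str) -> list[str]:
    """Run the ingestion steps for one PDF and return the created page IDs."""
    pdf_path = backend.download(key)   # download the file from the bucket
    pages = backend.extract(pdf_path)  # page images + text layer (poppler)
    page_ids = []
    for number, page in enumerate(pages, start=1):
        # Hypothetical naming scheme for the extracted images in the bucket.
        url = backend.upload(page.image_path, f"{key}/page-{number}.png")
        image_id = backend.create_image(url)
        page_id = backend.create_page(image_id, name=str(number))
        # One child element per word, each carrying its transcription.
        for text, polygon in page.words:
            word_id = backend.create_word(page_id, polygon)
            backend.create_transcription(word_id, text)
        page_ids.append(page_id)
    return page_ids
```

Keeping the bucket and Arkindex calls behind one interface lets the ordering be tested with an in-memory fake. It also makes the new write access to the bucket (`upload`) explicit.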

We already have the code for PDF parsing/extraction, as well as for creating images and elements.
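The existing task code is not reproduced here. To illustrate what the poppler side involves, the sketch below does three things. It renders one page with `pdftoppm`, reads the word bounding boxes with `pdftotext -bbox`, and scales them from PDF points (1/72 inch) to pixels of the rendered image. The helper names and the 150 DPI value are assumptions. The scaling also assumes that both tools use the same page box and that the page is not rotated:

```python
from __future__ import annotations

import subprocess
from html.parser import HTMLParser
from pathlib import Path

DPI = 150  # example render resolution; PDF coordinates are in points (1/72 inch)


def render_command(pdf: Path, page: int, prefix: Path, dpi: int = DPI) -> list[str]:
    """pdftoppm call rendering a single page to `<prefix>.png`."""
    return ["pdftoppm", "-r", str(dpi), "-f", str(page), "-l", str(page),
            "-png", "-singlefile", str(pdf), str(prefix)]


def words_command(pdf: Path, page: int) -> list[str]:
    """pdftotext call printing per-word bounding boxes (XHTML) to stdout."""
    return ["pdftotext", "-bbox", "-f", str(page), "-l", str(page), str(pdf), "-"]


class WordBoxParser(HTMLParser):
    """Collect (text, xMin, yMin, xMax, yMax) tuples from `pdftotext -bbox` output."""

    def __init__(self) -> None:
        super().__init__()
        self.words: list[tuple[str, float, float, float, float]] = []
        self._box: tuple[float, ...] | None = None
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "word":
            a = dict(attrs)  # HTMLParser lowercases attribute names (xMin -> xmin)
            self._box = tuple(float(a[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
            self._text = ""

    def handle_data(self, data):
        if self._box is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "word" and self._box is not None:
            self.words.append((self._text, *self._box))
            self._box = None


def to_pixel_polygon(box, dpi: int = DPI) -> list[tuple[int, int]]:
    """Scale a box from PDF points to pixels; returns a closed rectangle."""
    x0, y0, x1, y1 = (round(v * dpi / 72) for v in box)
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]


def extract_page(pdf: Path, page: int, workdir: Path):
    """Render one page; return its PNG path and its words as pixel polygons."""
    prefix = workdir / f"page-{page}"
    subprocess.run(render_command(pdf, page, prefix), check=True)
    xhtml = subprocess.run(words_command(pdf, page), check=True,
                           capture_output=True, text=True).stdout
    parser = WordBoxParser()
    parser.feed(xhtml)
    parser.close()
    return Path(f"{prefix}.png"), [
        (text, to_pixel_polygon(box)) for text, *box in parser.words
    ]
```

Scaling by `dpi / 72` keeps the word polygons aligned with the uploaded image. At 150 DPI, for example, a word box starting at 72 points lands at pixel 150.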

However, this breaks the existing workflow, which currently relies only on remote files and IIIF.

The Ceph credentials would need to be read-write (they are read-only right now, but that is an infrastructure detail).