Support archive files in file import

Support .tar.gz and .zip archives in the file import, using the standard library's tarfile and zipfile modules. Each archive should be extracted, and every file it contains should then be imported as if it were a separate DataFile.
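A minimal sketch of the extraction step, to make the expected behaviour concrete. The function name `extract_archive`, dispatching on the file name rather than the MIME type, and skipping links and device nodes in tarballs are all assumptions for illustration, not existing code. Each returned path would then go through the normal DataFile import.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .zip or .tar.gz archive and return the extracted regular files.

    Hypothetical helper: the name and the suffix-based dispatch are assumptions.
    """
    archive_path = Path(archive_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    root = dest_dir.resolve()

    if archive_path.name.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as archive:
            # ZipFile.extractall strips absolute paths and ".." components.
            archive.extractall(root)
    elif archive_path.name.endswith(".tar.gz"):
        with tarfile.open(archive_path, "r:gz") as archive:
            # Keep only regular files and directories (assumption: links and
            # device nodes are never wanted in an import).
            members = [m for m in archive.getmembers() if m.isfile() or m.isdir()]
            for member in members:
                # Refuse members that would land outside the destination.
                target = (root / member.name).resolve()
                if target != root and root not in target.parents:
                    raise ValueError(f"unsafe path in archive: {member.name}")
            # Use the stricter "data" extraction filter when this Python has it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(root, members=members, **extra)
    else:
        raise ValueError(f"unsupported archive type: {archive_path.name}")

    # Every regular file found here would be imported like a separate DataFile.
    return sorted(p for p in root.rglob("*") if p.is_file())
```

Extracting into a temporary directory per archive keeps files from different archives apart and makes cleanup after the import straightforward.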

Bonus:

  • Add the zstandard package as a dependency and support .tar.zst, just so we can throw artifacts at the file import.
  • Add support for the MIME types specific to .tar.bz2 and .tar.xz, just because tarfile supports them natively.
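For the bonus items, one possible shape is a lookup from MIME type to tarfile mode, with .tar.zst handled by wrapping the file in a zstandard stream reader. This is only a sketch: the names `TAR_MODES`, `ZSTD_MIME` and `open_tar` are hypothetical, and the MIME strings are assumptions about what the detection library reports, so they should be checked against real files.

```python
import tarfile
from contextlib import contextmanager

# Assumed MIME types for compressed tarballs; verify against actual detection.
TAR_MODES = {
    "application/gzip": "r:gz",
    "application/x-bzip2": "r:bz2",
    "application/x-xz": "r:xz",
}
ZSTD_MIME = "application/zstd"  # assumption, older tools may report another value


@contextmanager
def open_tar(path, mime_type):
    """Open a compressed tarball for reading, based on its MIME type."""
    if mime_type in TAR_MODES:
        # gzip, bzip2 and xz are handled natively by tarfile.
        with tarfile.open(path, TAR_MODES[mime_type]) as archive:
            yield archive
    elif mime_type == ZSTD_MIME:
        # Imported lazily so the other formats work without the extra package.
        import zstandard

        with open(path, "rb") as raw:
            with zstandard.ZstdDecompressor().stream_reader(raw) as reader:
                # "r|" reads the tar as a forward-only stream, which is all a
                # decompression reader can provide.
                with tarfile.open(fileobj=reader, mode="r|") as archive:
                    yield archive
    else:
        raise ValueError(f"unsupported MIME type: {mime_type}")
```

Since the zstd case is stream-only, members have to be processed in order while iterating over the archive rather than looked up at random.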

If #158 (closed) is implemented, a ZIP archive should be treated as a Transkribus import if it contains a mets.xml file, and otherwise be extracted. Other file types should not get this special treatment.
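The ZIP check can be done without extracting anything, by looking at the member names. A rough sketch, where `is_transkribus_export` is a hypothetical name and matching mets.xml at any depth in the archive is an assumption (the rule above does not say where the file has to be):

```python
import zipfile
from pathlib import PurePosixPath


def is_transkribus_export(zip_file):
    """Return True if the ZIP archive contains a file named mets.xml.

    Accepts a path or a binary file object. Assumption: mets.xml may sit at
    any depth inside the archive.
    """
    with zipfile.ZipFile(zip_file) as archive:
        # ZIP member names always use "/" as separator, hence PurePosixPath.
        return any(
            PurePosixPath(name).name == "mets.xml"
            for name in archive.namelist()
        )
```

An archive for which this returns False would be extracted as usual, and tar archives would never go through this check.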