
Merge the Transkribus and file imports

Merged · Erwan Rouchet requested to merge import-files-transkribus into master

Closes #158 (closed)

  • I had to change the export archive detection logic so that it looks for a mets.xml file anywhere in the archive, not only at its root: a fresh export of an arbitrary Transkribus collection contained one mets.xml file per Transkribus document.

  • While looking up the MIME type of a ZIP archive, I found conflicting statements about Windows possibly using a different MIME type. I checked on a Windows 10 VM: Firefox, Chromium and Edge all upload archives as application/x-zip-compressed on Windows, and as application/zip on Linux. The import now accepts both MIME types.

  • It took me 5 hours to get my first successful import of ~90 pages, and I noticed that extracting a single image could take up to 30 minutes, even on master. It turns out that the file-like object returned by ZipFile.open() copes very poorly with Pillow's random accesses, so I now extract each file to disk before processing the image. This brought my test imports down from 5 hours to 10 to 20 minutes.

  • My test imports quickly filled up the disk because #153 (closed) still occurs: each test import copies the export as an artifact.
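The relaxed detection rule from the first point can be illustrated with a minimal Python sketch. It assumes detection only needs member names, and the helpers find_mets_files() and is_transkribus_export() are hypothetical names, not the project's actual code: an archive counts as a Transkribus export if any member, at any depth, is named mets.xml.

```python
import posixpath
import zipfile


def find_mets_files(archive):
    """Return every mets.xml member path in a ZIP archive, at any depth.

    Hypothetical helper. `archive` may be a path or a seekable file object.
    """
    with zipfile.ZipFile(archive) as zf:
        # ZIP member names always use "/" separators, so posixpath applies
        return [
            name for name in zf.namelist()
            if posixpath.basename(name) == "mets.xml"
        ]


def is_transkribus_export(archive):
    """Hypothetical check: one mets.xml per Transkribus document is enough."""
    return bool(find_mets_files(archive))
```

Returning the full list rather than a boolean leaves room to import each Transkribus document found in a collection export separately.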
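Accepting both ZIP MIME types observed on Windows and Linux could look like the following sketch. The is_zip_upload() helper is hypothetical, and stripping parameters such as "; charset=..." and comparing case-insensitively are assumptions about how the upload's content type reaches the check.

```python
# Observed browser behaviour: application/x-zip-compressed on Windows,
# application/zip on Linux
ZIP_MIME_TYPES = frozenset({"application/zip", "application/x-zip-compressed"})


def is_zip_upload(content_type):
    """Hypothetical check on an uploaded file's Content-Type value."""
    # Drop any parameters and normalise case before comparing
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in ZIP_MIME_TYPES
```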
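The extraction workaround from the third point can be sketched as below. The likely cause of the slowdown is that the object returned by ZipFile.open() is a compressed stream, so a backward seek has to decompress again from the start of the member, while a regular file on disk seeks cheaply. The extract_member() helper is hypothetical, and keeping only the member's base name assumes names do not collide within one extraction directory.

```python
import posixpath
import shutil
import zipfile
from pathlib import Path


def extract_member(archive, member, dest_dir):
    """Extract one archive member to a regular file and return its path.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Keep only the file name so the output cannot escape dest_dir
    # (assumption: base names are unique within dest_dir)
    target = Path(dest_dir) / posixpath.basename(member)
    with zipfile.ZipFile(archive) as zf:
        # Decompress once, sequentially, into a real seekable file
        with zf.open(member) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

The returned path can then be passed to PIL.Image.open(), so Pillow's random accesses hit a regular file on disk instead of the compressed stream.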

Edited by Erwan Rouchet


Activity

  • ml bonhomme mentioned in merge request !362 (merged)

  • Erwan Rouchet added 1 commit

  • Bastien Abadie resolved all threads

  • Bastien Abadie resolved all threads

  • Erwan Rouchet added 1 commit

  • merged

  • mentioned in issue #85 (closed)
