Merge the Transkribus and file imports
Closes #158 (closed)
- I had to change the export archive detection logic to look for a mets.xml file anywhere in the archive, not necessarily at the root. A fresh export from a random Transkribus collection had one mets.xml file per Transkribus document.
- While looking for the MIME type of a ZIP archive, I got some confusing statements about the possibility of another MIME type being used on Windows. I checked with a Windows 10 VM: Firefox, Chromium and Edge all upload their archives as application/x-zip-compressed on Windows, and as application/zip on Linux. I handled both MIME types.
- It took me 5 hours to get my first successful import of ~90 pages. I noticed that extracting an image could take up to 30 minutes, even on master. Turns out the file-like object of ZipFile.open() was really not happy with Pillow's random accesses, so I extracted the file first before editing the image, making my test imports take 10 to 20 minutes instead of 5 hours.
- My test imports were quickly filling up the disk, as #153 (closed) still occurs, causing each of my test imports to copy the export as an artifact.
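
For readers who want a concrete picture of the first three points, here is a minimal sketch using only the Python standard library. It is not the code from this merge request: the helper names (is_zip_upload, find_mets_files, extract_member) and the ZIP_MIME_TYPES constant are hypothetical, chosen for illustration only.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath

# Both MIME types observed for ZIP uploads (hypothetical constant name).
ZIP_MIME_TYPES = {"application/zip", "application/x-zip-compressed"}


def is_zip_upload(content_type: str) -> bool:
    """Hypothetical helper: accept either ZIP MIME type."""
    return content_type.strip().lower() in ZIP_MIME_TYPES


def find_mets_files(archive: zipfile.ZipFile) -> list[str]:
    """Return every mets.xml member, at any depth in the archive."""
    return [
        name
        for name in archive.namelist()
        # Skip directory entries, then compare the last path component only.
        if not name.endswith("/") and PurePosixPath(name).name == "mets.xml"
    ]


def extract_member(archive: zipfile.ZipFile, name: str, dest_dir: Path) -> Path:
    """Copy one member to a real file so that later reads can seek freely."""
    target = dest_dir / PurePosixPath(name).name
    with archive.open(name) as source, target.open("wb") as output:
        shutil.copyfileobj(source, output)
    return target
```

The path returned by extract_member can then be passed to Pillow's Image.open() instead of the object returned by ZipFile.open(), so that the image library reads from an ordinary file on disk. Note that this sketch flattens member paths to their file name, so two members with the same name would overwrite each other in dest_dir.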
Activity
changed milestone to %Arkindex 1.5.2
assigned to @erouchet
mentioned in merge request backend!2143 (merged)
added 2 commits
added 10 commits
- b9f48d97 - Remove unnecessary helpers
- d45c26c7 - Make the import 15 times faster
- 3941b38d - Support a parent element
- c74a8b44 - Fix unreachable exception handlers
- 39ffe401 - Properly build the elements.json file
- fd2fd9a7 - Update existing tests
- 9712b3d3 - Test the parent element corpus check
- 812bf52f - Test an import on a parent element
- e6cdc55c - Restore a test
- 88166730 - Add test for a ZIP file import
requested review from @babadie
added 18 commits
- 2d8865b9 - 1 commit from branch master
- 2d8865b9...882d6d84 - 7 earlier commits
- 3594f3bf - Remove unnecessary helpers
- 25280a9f - Make the import 15 times faster
- 14a14a0e - Support a parent element
- 1c7e490d - Fix unreachable exception handlers
- b594d301 - Properly build the elements.json file
- 27299811 - Update existing tests
- c1f26848 - Test the parent element corpus check
- f371d801 - Test an import on a parent element
- 1539534c - Restore a test
- 1a76c31c - Add test for a ZIP file import
I assigned @mlbonhomme on #153 (closed)
- Resolved by Erwan Rouchet
- Resolved by Bastien Abadie
mentioned in merge request !362 (merged)
- Resolved by Bastien Abadie
I was able to start the import with this code, but no files were ever imported (all were skipped):
- export.zip
- tasks log
mentioned in issue #85 (closed)