Page images can overwrite each other in the PDF export
- Create a corpus.
- Create a folder in that corpus.
- Add two pages with different images, but with the same name.
- Export that.
- Run the PDF export on that export.
- Get two duplicate images.
- Cry.
The resulting PDF for the folder will include the image of the second page twice. The first page's image is saved in the temporary directory as `{name}.jpg`, but the second page has the same name, so its image overwrites the first one and only the image of the second page gets used.
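One possible way to avoid the collision, as a minimal sketch: derive the temporary filename from the page's position as well as its name. The helper `image_paths` and the `{index}_{name}.jpg` scheme are hypothetical and not taken from the exporter's code. The sketch only addresses duplicate names; it does not sanitize them.

```python
import tempfile
from pathlib import Path


def image_paths(page_names, directory):
    """Return one distinct image path per page, even when names repeat.

    Hypothetical helper: the real exporter may name its files differently.
    """
    paths = []
    for index, name in enumerate(page_names):
        # Prefixing with the page's position keeps identically named pages apart.
        paths.append(Path(directory) / f"{index:04d}_{name}.jpg")
    return paths


with tempfile.TemporaryDirectory() as tmp:
    names = ["page", "page"]
    # Naive scheme from the report: both pages map to the same file.
    naive = [Path(tmp) / f"{name}.jpg" for name in names]
    unique = image_paths(names, tmp)
    print(len(set(naive)), len(set(unique)))  # prints "1 2"
```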
You can also use that for evil purposes:
- Create a corpus.
- Create a folder in that corpus.
- Add a random page named `../../../../../../../../../../../../lol`
- Export that.
- Run the PDF export on that export.
- `PermissionError: [Errno 13] Permission denied: '/tmp/tmpxet51my3/../../../../../../../../../../../../lol.jpg'`
- Cry.
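A hedged sketch of how the traversal could be blocked: reduce the page name to a conservative set of characters, then verify that the resolved path still sits inside the temporary directory. The function `safe_image_path` and the allowed character set are assumptions for illustration, not the exporter's actual code.

```python
import re
from pathlib import Path


def safe_image_path(directory, name):
    """Build an image path for `name` that cannot leave `directory`.

    Hypothetical helper: the allowed character set is an assumption.
    """
    # Replace every run of characters outside the allowed set (including
    # "/", "\\" and ".") with "_", then fall back to "page" if nothing is left.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_") or "page"
    base = Path(directory).resolve()
    candidate = (base / f"{cleaned}.jpg").resolve()
    # Defensive check: refuse any path that still resolves outside the directory.
    if base not in candidate.parents:
        raise ValueError(f"unsafe page name: {name!r}")
    return candidate
```

With this sketch, the page name from the steps above ends up as `lol.jpg` inside the temporary directory instead of escaping it. Different names can still collapse to the same cleaned name, so it would need to be combined with some uniqueness scheme.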
Note that since this is related to automatically generated filenames, this might also happen when generating the final PDFs (on folder names), or on ALTO exports if we implement #43.
Edited by Erwan Rouchet