Extract text from PDFs during import
This turned out to be easier than previously thought, thanks to pdfminer.six:
- load the PDF with the library
- the library returns a tree of layout objects (pages, text boxes, text lines, etc.)
- only keep the text lines (rectangles and other layout objects are too verbose)
- publish the non-empty lines as transcriptions, using a bulk endpoint
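A minimal sketch of these steps, assuming pdfminer.six's `extract_pages` high-level API (which runs layout analysis with default `LAParams`). The `Line` dataclass, the `extract_lines` and `build_bulk_payload` helpers, and the payload shape (`page`, `text`, `polygon`) are hypothetical illustrations. The actual bulk endpoint and its request format are not specified in this issue, so the sketch stops at building the list of transcriptions to send.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Line:
    """A text line from one PDF page (hypothetical container)."""
    page: int  # 1-based page number
    text: str
    # Bounding box in PDF points, origin at the top-left of the page
    x0: float
    y0: float
    x1: float
    y1: float


def extract_lines(path: str) -> Iterator[Line]:
    """Walk pdfminer.six's layout tree and yield only the text lines."""
    # Imported lazily so the payload helper works without pdfminer.six installed
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTContainer, LTTextLine

    def walk(node):
        for child in node:
            if isinstance(child, LTTextLine):
                yield child
            elif isinstance(child, LTContainer):
                # Text boxes, figures, etc.: descend into them.
                # Rectangles, curves and images are not containers and are skipped.
                yield from walk(child)

    for page_number, page in enumerate(extract_pages(path), start=1):
        for line in walk(page):
            x0, y0, x1, y1 = line.bbox
            # pdfminer's origin is bottom-left; flip the y axis to top-left
            yield Line(page_number, line.get_text(),
                       x0, page.height - y1, x1, page.height - y0)


def build_bulk_payload(lines: Iterable[Line]) -> list[dict]:
    """Turn lines into transcription dicts (hypothetical format), skipping empty ones."""
    payload = []
    for line in lines:
        text = line.text.strip()
        if not text:
            continue  # only publish lines with content
        payload.append({
            "page": line.page,
            "text": text,
            # Rectangle as a closed-order polygon: top-left, top-right, bottom-right, bottom-left
            "polygon": [[line.x0, line.y0], [line.x1, line.y0],
                        [line.x1, line.y1], [line.x0, line.y1]],
        })
    return payload
```

Usage would look like `payload = build_bulk_payload(extract_lines("document.pdf"))`, after which the list is sent in one request to the bulk endpoint instead of one request per line. Keeping extraction and payload building separate makes it easy to adapt the payload once the endpoint's real schema is known.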
Edited by Bastien Abadie