Format datasets
This is not an issue per se, but a recommendation.
1. How about adding a script that converts a dataset of line images + text transcriptions into a formatted PyLaia dataset?
2. How about adding a script that takes ground truth in PAGE XML and/or ALTO XML (image files + XML files), extracts the lines, crops them from the image files, and formats a PyLaia dataset?
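To make the first suggestion concrete, here is a rough sketch of what such a script could do. It assumes the PyLaia layout I am used to: a transcription table with one `<image id> <space-separated characters>` line per sample, a `<space>` token for whitespace, and a symbol table starting with `<ctc> 0`. The function names and the `train.txt` / `syms.txt` file names are my own choices for illustration, not something taken from the repository.

```python
from pathlib import Path

SPACE = "<space>"  # assumed token standing in for whitespace
CTC = "<ctc>"      # assumed CTC blank symbol, placed at index 0


def tokenize(text: str) -> str:
    """Split a transcription into space-separated characters."""
    return " ".join(SPACE if ch.isspace() else ch for ch in text.strip())


def build_tables(samples: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (transcription table lines, symbol table lines).

    `samples` maps an image id (file name without extension) to its text.
    """
    table = [f"{image_id} {tokenize(text)}"
             for image_id, text in sorted(samples.items())]
    symbols = sorted({tok for text in samples.values()
                      for tok in tokenize(text).split()})
    syms = [f"{CTC} 0"] + [f"{s} {i}" for i, s in enumerate(symbols, start=1)]
    return table, syms


def write_dataset(samples: dict[str, str], out_dir: str) -> None:
    """Write the two tables to out_dir (hypothetical file names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    table, syms = build_tables(samples)
    (out / "train.txt").write_text("\n".join(table) + "\n", encoding="utf-8")
    (out / "syms.txt").write_text("\n".join(syms) + "\n", encoding="utf-8")
```

Calling `write_dataset({"line_0001": "hello world"}, "dataset")` would write both tables into `dataset/`. Copying or resizing the line images is left out on purpose, since that depends on how the images are stored.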
Personally, I have written some scripts that do this (raw lines + transcriptions to PyLaia, and ALTO XML and PAGE XML to PyLaia).
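For the PAGE XML case, the extraction step could look roughly like the sketch below. It is not my actual script, only a minimal illustration using the standard library. It assumes each `TextLine` carries a `Coords` element with a `points="x,y x,y ..."` attribute and a line-level `TextEquiv/Unicode`; older PAGE files that store coordinates as `Point` child elements would be skipped. The helper names `local` and `page_lines` are made up for this example.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def page_lines(xml_text):
    """Yield (line_id, bounding_box, text) for each TextLine.

    bounding_box is (left, upper, right, lower) around the line polygon.
    """
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if local(el.tag) != "TextLine":
            continue
        coords = next((c for c in el if local(c.tag) == "Coords"), None)
        if coords is None or not coords.get("points"):
            continue  # no usable polygon on this line
        pts = [tuple(int(float(v)) for v in p.split(","))
               for p in coords.get("points").split()]
        xs, ys = zip(*pts)
        box = (min(xs), min(ys), max(xs), max(ys))
        text = ""
        for child in el:  # direct children only, so word-level text is ignored
            if local(child.tag) == "TextEquiv":
                uni = next((u for u in child if local(u.tag) == "Unicode"), None)
                text = (uni.text or "") if uni is not None else ""
        yield el.get("id"), box, text
```

The box is ordered the way Pillow's `Image.crop` expects, so `Image.open(path).crop(box)` would cut the line out of the page image, and the text can then be written to the transcription table. ALTO XML would need a different reader, since it stores line positions and text in other elements and attributes.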
Just a thought.
Edited by Teodor Bors