
Format datasets

This is not an issue per se, but a recommendation.

1 How about adding a script that converts a dataset of line images and text transcriptions into a formatted PyLaia dataset?

2 How about adding a script that takes ground truth in PAGE XML and/or ALTO XML (images plus XML files), extracts the line regions, crops the lines from the image files, and formats the result as a PyLaia dataset?
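To make the first suggestion concrete, here is a rough sketch of what such a script could look like. It is not an existing PyLaia tool. It assumes each line image `NAME.jpg` sits next to a transcription file `NAME.txt`, and that the target layout is a table of `image_id` followed by space-separated character tokens plus a symbol table with the CTC blank at index 0. The function names, the output file names (`lines.txt`, `syms.txt`) and the `<space>` and `<ctc>` tokens are assumptions for illustration and should be checked against the PyLaia documentation.

```python
from pathlib import Path

SPACE = "<space>"  # assumed token standing in for a blank between words
CTC = "<ctc>"      # assumed CTC blank symbol, placed at index 0


def tokenize(text):
    """Turn a transcription into character tokens, collapsing whitespace runs."""
    normalized = " ".join(text.split())
    return [SPACE if ch == " " else ch for ch in normalized]


def build_dataset(lines_dir, out_dir, image_ext=".jpg"):
    """Write lines.txt (image id + tokens) and syms.txt (symbol + index)."""
    lines_dir, out_dir = Path(lines_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    rows, symbols = [], set()
    for image in sorted(lines_dir.glob(f"*{image_ext}")):
        transcription = image.with_suffix(".txt")
        if not transcription.exists():
            continue  # skip images that have no transcription
        tokens = tokenize(transcription.read_text(encoding="utf-8"))
        if not tokens:
            continue  # skip empty transcriptions
        symbols.update(tokens)
        rows.append(f"{image.stem} {' '.join(tokens)}")

    table = [CTC] + sorted(symbols)
    (out_dir / "lines.txt").write_text(
        "".join(row + "\n" for row in rows), encoding="utf-8"
    )
    (out_dir / "syms.txt").write_text(
        "".join(f"{sym} {idx}\n" for idx, sym in enumerate(table)),
        encoding="utf-8",
    )
    return rows, table
```

Calling `build_dataset("raw_lines", "pylaia_data")` would then produce the two text files from a folder of paired images and transcriptions. Splitting into train/validation/test sets is left out of this sketch.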

Personally, I have written some scripts that do this (raw lines + transcriptions to PyLaia, and ALTO XML and PAGE XML to PyLaia).
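For the PAGE XML case, the extraction step could be sketched roughly as below. This is only an illustration, not the scripts mentioned above. It assumes the usual PAGE layout where each `TextLine` has a `Coords` child with a `points="x,y x,y ..."` attribute and a `TextEquiv/Unicode` child holding the transcription. Tags are matched by local name so that any PAGE schema version is accepted. The function name `page_lines` and the returned dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def page_lines(xml_data):
    """Return id, bounding box and text for every TextLine in a PAGE XML document."""
    root = ET.fromstring(xml_data)
    lines = []
    for elem in root.iter():
        if local(elem.tag) != "TextLine":
            continue
        coords = next((c for c in elem if local(c.tag) == "Coords"), None)
        if coords is None or not coords.get("points"):
            continue  # no polygon, nothing to crop
        points = [
            tuple(int(round(float(v))) for v in pair.split(","))
            for pair in coords.get("points").split()
        ]
        xs, ys = zip(*points)
        text = ""
        # Only direct children, so word-level TextEquiv elements are ignored.
        for child in elem:
            if local(child.tag) == "TextEquiv":
                uni = next((u for u in child if local(u.tag) == "Unicode"), None)
                if uni is not None and uni.text:
                    text = uni.text
        lines.append({
            "id": elem.get("id"),
            "box": (min(xs), min(ys), max(xs), max(ys)),  # left, upper, right, lower
            "text": text,
        })
    return lines
```

Each `box` could then be passed to an imaging library to cut the line out of the page image (for example Pillow's `Image.crop`, which takes a left, upper, right, lower tuple), and the resulting image and text pairs formatted as a PyLaia dataset. Reading the file as bytes, for example `page_lines(Path("page.xml").read_bytes())`, avoids problems with an XML encoding declaration.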

Just a thought.

Edited by Teodor Bors