
Format datasets and add datasets documentation for doc-ufcn

This is not an issue per se, but a recommendation.

  1. How about adding a script that converts a dataset of images + polygons into a formatted doc-ufcn dataset?

  2. How about adding a script that takes ground truth in PAGE XML and/or ALTO XML (images + XML files), extracts the lines, cuts them out of the images and builds a doc-ufcn dataset?

  3. Documentation on the doc-ufcn dataset format is unfortunately missing; in particular, it is not clear what classes_colors and classes_names should contain.
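For item 1, here is a minimal sketch of the polygon-to-label step. It rests on an assumption that the missing documentation would need to confirm: a doc-ufcn label is an RGB image the same size as the input, every pixel is painted with its class color, classes_colors[i] is the color of classes_names[i], and index 0 is the background. The function names render_label_mask and point_in_polygon are hypothetical. The sketch is pure Python (even-odd point-in-polygon test on pixel centres) so it has no dependencies. In practice, Pillow's ImageDraw.Draw(img).polygon(points, fill=color) does the same job far faster.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Color = Tuple[int, int, int]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def render_label_mask(
    width: int,
    height: int,
    polygons: Dict[str, List[Sequence[Point]]],
    classes_names: List[str],
    classes_colors: List[Color],
) -> List[List[Color]]:
    """Paint each polygon with its class color; other pixels keep the background color.

    ASSUMPTION: classes_names[0] is the background class and classes_names /
    classes_colors are aligned index by index (not confirmed by doc-ufcn docs here).
    """
    color_of = dict(zip(classes_names, classes_colors))
    background = classes_colors[0]
    mask = [[background for _ in range(width)] for _ in range(height)]
    for class_name, class_polygons in polygons.items():
        color = color_of[class_name]
        for polygon in class_polygons:
            for row in range(height):
                for col in range(width):
                    # Test the pixel centre rather than its corner
                    if point_in_polygon(col + 0.5, row + 0.5, polygon):
                        mask[row][col] = color
    return mask
```

For example, with classes_names = ["background", "text_line"] and classes_colors = [(0, 0, 0), (0, 0, 255)], render_label_mask(4, 4, {"text_line": [[(1, 1), (3, 1), (3, 3), (1, 3)]]}, classes_names, classes_colors) paints the 2x2 block of pixels inside the square blue and leaves the rest black.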

Personally, I wrote some scripts that do this (raw images + polygons to doc-ufcn, ALTO XML and PAGE XML to doc-ufcn), though I'm not sure I did it right.
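For the PAGE XML route, the first step is pulling the line polygons out of the XML. This sketch uses only the standard library and assumes each TextLine carries a Coords child with a points="x1,y1 x2,y2 ..." attribute. Tags are matched by local name so the namespace (which changes between PAGE schema versions) does not matter. The helper names page_text_lines, parse_points and local_name are hypothetical. The resulting polygons could then be rasterized into label masks or used to crop line images. ALTO would need a different parser.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Point = Tuple[int, int]


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so matching works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points: str) -> List[Point]:
    """Turn PAGE's 'x1,y1 x2,y2 ...' string into a list of (x, y) integer tuples."""
    result = []
    for pair in points.split():
        x, y = pair.split(",")
        # Some tools write fractional coordinates; truncate them to whole pixels
        result.append((int(float(x)), int(float(y))))
    return result


def page_text_lines(xml_text: str) -> List[List[Point]]:
    """Return the Coords polygon of every TextLine in a PAGE XML document string.

    For a file on disk, ET.parse(path).getroot() gives the same root element.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        for child in element:
            # Direct children only: skip Coords of nested Word/Glyph elements.
            # Coords without a points attribute are skipped.
            if local_name(child.tag) == "Coords" and "points" in child.attrib:
                lines.append(parse_points(child.attrib["points"]))
                break
    return lines
```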

Just a thought.

Edited by Teodor Bors