Consider dropping or loosening the Transkribus polygon simplification

Back when we had a concept of Zone and a B-Tree index on polygons, all polygons were restricted to at most 164 points. Polygon simplification was implemented in PAGE XML imports to avoid exceeding this limit (#90 (closed), #105 (closed)), and it is still in place today. We have since removed this limit, and the backend can safely handle polygons with a few hundred points. In theory, we could send millions of points, but in practice API requests will time out once a polygon becomes too large to send.

We can consider either removing the simplification entirely, or at least allowing larger polygons to be generated, for example with an arbitrary limit of 500 points. This would allow importing more PAGE XML files without altering their contents, and would avoid occasionally skipping regions that are too complex to be simplified down far enough.
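
To make the second option concrete, here is a minimal sketch of a simplification step that leaves polygons untouched when they already fit under a configurable limit, and only simplifies larger ones. This is not the current importer code: the names `rdp` and `limit_polygon`, the `epsilon` and `max_rounds` parameters, and the choice of the Ramer-Douglas-Peucker algorithm are all assumptions made for illustration. Only the 500 points default comes from the proposal above.

```python
from math import hypot
from typing import Optional

Point = tuple[float, float]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. closed ring where first == last).
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points: list[Point], epsilon: float) -> list[Point]:
    """Ramer-Douglas-Peucker simplification of a polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every intermediate point is close enough to the chord: drop them.
        return [first, last]
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right


def limit_polygon(
    points: list[Point],
    max_points: int = 500,  # hypothetical default, taken from the proposal
    epsilon: float = 1.0,   # hypothetical starting tolerance, in pixels
    max_rounds: int = 20,   # hypothetical cap on tolerance doublings
) -> Optional[list[Point]]:
    """Return the polygon unchanged if it fits, otherwise a simplified copy.

    Returns None when no usable polygon could be produced, so the caller
    can decide whether to skip the region.
    """
    if len(points) <= max_points:
        return list(points)
    for _ in range(max_rounds):
        simplified = rdp(points, epsilon)
        if len(simplified) < 3:
            # Tolerance is already too coarse; larger values will not help.
            return None
        if len(simplified) <= max_points:
            return simplified
        epsilon *= 2
    return None
```

With a limit like this, most PAGE XML regions would pass through unmodified, and only unusually dense polygons would be altered. Setting `max_points` very high would approximate removing the simplification entirely, at the cost of the request size concerns mentioned above.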