There are polygons with fewer than 4 points

Sentry Issue: ARKINDEX-BACKEND-5J

ValueError: At least 4 distinct points are required.
(20 additional frame(s) were not displayed)
...
  File "django/db/models/query.py", line 71, in __iter__
    obj = model_cls.from_db(db, init_list, row[model_fields_start:model_fields_end])
  File "django/db/models/base.py", line 513, in from_db
    new = cls(*values)
  File "django/db/models/base.py", line 435, in __init__
    _setattr(self, field.attname, val)
  File "arkindex/project/fields.py", line 143, in __set__
  File "arkindex/project/gis.py", line 32, in ensure_linear_ring

ensure_linear_ring also runs on polygons loaded from the database, not only on new input. Because some polygons in preprod have fewer than 4 distinct points, retrieving two particular elements there raises a ValueError.
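The traceback shows the validation being triggered from a field descriptor's `__set__` while Django builds the model instance in `from_db`. The following is a minimal, framework-free sketch of that mechanism. `ensure_ring`, `ValidatedRing` and `FakeZone` are hypothetical stand-ins, not the actual Arkindex code, and the "4 distinct points" rule is only assumed from the error message.

```python
def ensure_ring(points):
    """Hypothetical stand-in for ensure_linear_ring (assumed behaviour)."""
    points = [tuple(p) for p in points]
    # Assumption taken from the error message: require 4 distinct points.
    if len(set(points)) < 4:
        raise ValueError("At least 4 distinct points are required.")
    # Close the ring if the first and last points differ.
    if points[0] != points[-1]:
        points.append(points[0])
    return points


class ValidatedRing:
    """Descriptor that validates the value on every assignment."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        # Runs for user input AND for values read back from storage.
        instance.__dict__[self.name] = ensure_ring(value)


class FakeZone:
    """Hypothetical model-like class; not a Django model."""

    polygon = ValidatedRing()

    def __init__(self, polygon):
        self.polygon = polygon

    @classmethod
    def from_db(cls, row):
        # Mimics the traceback: from_db -> __init__ -> setattr -> __set__
        return cls(*row)


# A stored row with only 3 distinct points fails as soon as it is loaded.
try:
    FakeZone.from_db([[(0, 0), (0, 1), (1, 1)]])
except ValueError as exc:
    print("load failed:", exc)
```

Under these assumptions, merely instantiating an object from a stored row is enough to raise, which is why the search query in this issue defers the `polygon` column.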

To look for zones with fewer than 4 points:

from django.contrib.gis.db.models.functions import NumPoints
# Do not retrieve a zone's polygon here, because those would cause ValueError from ensure_linear_ring
zones = Zone.objects.annotate(n=NumPoints('polygon')).filter(n__lt=4).defer('polygon')

There are 1078 zones with invalid polygons in preprod and 1 in prod.

Two elements are affected in preprod (5cb84547-004f-4374-95cd-46728e8765fa and e5803e91-efa9-40ea-95bc-8ba778dbba36). Both use the same zone, on two corpora both named Transkribus collection n°44923, and both were created on September 8th, 20 days before PostGIS was merged in the backend. A third corpus exists with another Transkribus import that occurred in November, so it does not have this issue. I suggest deleting those two duplicate corpora, then just removing the affected zones.

I will inspect the single affected zone in prod later, as there is a release in progress and the queries can cause a high database read load.

It seems the existing database constraint on zones only enforces a maximum polygon size, not a minimum, whereas the backend rejects such invalid polygons.

Edited by Erwan Rouchet