There are polygons with less than 4 points
Sentry Issue: ARKINDEX-BACKEND-5J
ValueError: At least 4 distinct points are required.
(20 additional frame(s) were not displayed)
...
File "django/db/models/query.py", line 71, in __iter__
obj = model_cls.from_db(db, init_list, row[model_fields_start:model_fields_end])
File "django/db/models/base.py", line 513, in from_db
new = cls(*values)
File "django/db/models/base.py", line 435, in __init__
_setattr(self, field.attname, val)
File "arkindex/project/fields.py", line 143, in __set__
File "arkindex/project/gis.py", line 32, in ensure_linear_ring
ensure_linear_ring is also run on polygons loaded from the database. This raises a ValueError in preprod when retrieving two particular elements, because some polygons in preprod have fewer than 4 distinct points.
To look for zones with fewer than 4 points:
from django.contrib.gis.db.models.functions import NumPoints
# Do not retrieve a zone's polygon here, because those would cause ValueError from ensure_linear_ring
zones = Zone.objects.annotate(n=NumPoints('polygon')).filter(n__lt=4).defer('polygon')
There are 1078 zones with invalid polygons in preprod and 1 in prod.
Two elements are affected in preprod (5cb84547-004f-4374-95cd-46728e8765fa and e5803e91-efa9-40ea-95bc-8ba778dbba36), both using the same zone, in two corpora that are both named Transkribus collection n°44923. Both elements were created on September 8th, 20 days before PostGIS was merged in the backend. A third corpus exists with another Transkribus import that occurred in November, so it does not have this issue. I suggest deleting those two duplicate corpora, then just removing the affected zones.
I will inspect the single affected zone in prod later as there is a release in progress and the queries can cause a high database read load.
It seems the existing database constraint on zones only ensures a maximum polygon size, not a minimum, but the backend blocks those invalid polygons.
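For reference, here is a rough sketch of the kind of minimum-size check that appears to reject these polygons. It is not the actual code from arkindex/project/gis.py: the function name require_min_distinct_points and its minimum parameter are hypothetical, and it assumes that "distinct" means unique coordinate pairs, so that the closing point of a ring is not counted twice.

```python
def require_min_distinct_points(points, minimum=4):
    """Return the unique points in first-seen order, or raise ValueError.

    Hypothetical helper for illustration only; the real ensure_linear_ring
    may count or normalize points differently.
    """
    # dict.fromkeys removes duplicates while preserving insertion order
    unique = list(dict.fromkeys(tuple(point) for point in points))
    if len(unique) < minimum:
        raise ValueError(f"At least {minimum} distinct points are required.")
    return unique


# A closed square has 5 points but only 4 distinct ones, so it passes.
square = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]
print(require_min_distinct_points(square))

# A closed triangle has only 3 distinct points, so it is rejected.
try:
    require_min_distinct_points([(0, 0), (0, 10), (10, 10), (0, 0)])
except ValueError as error:
    print(error)
```

Under these assumptions, a polygon can pass a "maximum size" database constraint and still fail this check, which would match what we see in preprod.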