
Explicit error for null parents in ElementNeighbors

Erwan Rouchet requested to merge explicit-null-alert into master

Closes #301 (closed), resolves https://sentry.io/organizations/teklia/issues/1773113263/

The TypeError itself occurs because ElementHeader uses parent.id as a key to display the parent elements of a path, but in this case parent is null. The null values are sent by the ListElementNeighbors endpoint itself: https://arkindex.teklia.com/api/v1/elements/eb942042-e1db-4609-93cd-5219c061b7d8/neighbors/. Those null values indicate that some ElementPaths contain UUIDs that do not belong to any existing element.

The following query returns every "ghost element ID":

SELECT DISTINCT id
FROM (SELECT unnest(path) AS id FROM documents_elementpath) AS p
LEFT JOIN documents_element e USING (id) WHERE e.id IS NULL;
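
To make the anti-join concrete, here is a minimal Python sketch of the same logic on in-memory data. The sample IDs and the `find_ghost_ids` helper are hypothetical and exist only for this illustration; the only thing taken from the query is the idea of flattening every path and keeping the IDs that match no element.

```python
from typing import Iterable


def find_ghost_ids(paths: Iterable[list[str]], element_ids: set[str]) -> set[str]:
    """Return every ID found in a path that matches no existing element.

    Mirrors the SQL query: unnest all paths, then keep the IDs for which
    the left join on elements finds nothing.
    """
    referenced = {element_id for path in paths for element_id in path}
    return referenced - element_ids


# Hypothetical sample data: "ghost" appears in a path but is not an element.
paths = [["volume", "page"], ["ghost", "page"], ["volume"]]
elements = {"volume", "page", "section"}
ghosts = find_ghost_ids(paths, elements)
print(sorted(ghosts))
```

The set difference plays the role of `LEFT JOIN ... WHERE e.id IS NULL`, and building a set gives the same deduplication as `SELECT DISTINCT`.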

In preprod, there are no ghost elements. In prod, there are four of them. 284 child elements are affected, on 284 paths. The ghost elements are listed here along with the number of affected pages, sections and text segments, as those were the only three element types among their children.

ID                                    Corpus                              Pages  Sections  Text segments
98f680d7-11d8-4d9d-be14-e67956171333  HORAE | 790                             0       169              0
005afb6c-7f4b-4080-85c3-9f5be2ea20c3  HORAE | 790                            75        10              8
685ab9e8-2aa2-4826-aa33-d67ed2bf327a  HORAE | Interannotator agreement 3      0        21              0
ff9d65c5-f2ca-406a-bccd-cd2dda018205  HORAE | Interannotator agreement 4      0         1              0

All of those child elements have other parents. All pages are direct children of both a ghost element and a volume. All sections and text segments are direct children of pages that do not have the ghost elements as parents.

The exact cause of this issue is unknown, but we have a few theories:

  • Simultaneously running requests to add a parent and delete an element: this could cause some paths to be inserted with an element that gets deleted afterwards;
  • Running an API request to add or remove a parent, or to delete an element, when a deployment happens and causes the backend to be killed;
  • Cluster-related issues where some queries would use different servers and cause the path management algorithms to get wrong information.

Those theories are hard to test, reproduce, or fix, and occurrences of elements with invalid paths are rare. The last time this happened was five months ago, and only in preproduction: https://trello.com/c/ow1qiex3/

Getting those errors on the frontend is currently our only way to detect those ghost element IDs. This MR does nothing but add a more explicit error message and prevent the ElementHeader from displaying 'Loading…' forever. The offending element paths have been removed from the production database as they were useless.
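
As an illustration of what an explicit check can look like, here is a minimal Python sketch that fails fast when a path contains a null parent. The payload shape (a list of entries, each with a `path` list of parent objects or nulls) and the names `GhostParentError` and `check_neighbor_paths` are assumptions made for this example. The actual change lives in the frontend's ElementHeader and may look quite different.

```python
from typing import Any, Optional


class GhostParentError(ValueError):
    """Raised when a path contains a null parent (hypothetical error type)."""


def check_neighbor_paths(element_id: str, neighbors: list[dict[str, Any]]) -> None:
    """Raise an explicit error if any path holds a null parent.

    Assumed payload shape: each neighbor entry has a "path" key whose
    value is a list of parent objects, with None for missing elements.
    """
    for index, neighbor in enumerate(neighbors):
        path: list[Optional[dict[str, Any]]] = neighbor.get("path", [])
        null_positions = [i for i, parent in enumerate(path) if parent is None]
        if null_positions:
            raise GhostParentError(
                f"Element {element_id}: path {index} has null parents at "
                f"positions {null_positions}; the path likely references "
                "an element that no longer exists."
            )


# Hypothetical payload with one ghost parent at the end of the path.
sample = [{"path": [{"id": "volume"}, None]}]
try:
    check_neighbor_paths("some-element-id", sample)
except GhostParentError as error:
    print(error)
```

Checking before rendering turns an opaque TypeError on `parent.id` into a message that names the element and the broken path, which is what makes the ghost IDs traceable.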

Edited by Erwan Rouchet
