Ensure child path uniqueness in add_parent when there are multiple parents
Closes #778
Let's say you want to perform this operation:
[Figure: element tree before and after the operation]
The initial contents of `ElementPath` look like this (the `ordering` column is omitted here, as every value is `0`):
Element | Path |
---|---|
Le Element | Parent 1 |
Le Element | Parent 2 |
Le Child | Parent 1, Le Element |
Le Child | Parent 2, Le Element |
Le Grandchild | Parent 1, Le Element, Le Child |
Le Grandchild | Parent 2, Le Element, Le Child |
After just adding the Parent 3 path to Le Element, `Element.add_parent` handles child elements like so:
- For each child element path:
  - If the path starts with Le Element, nuke it (it will be replaced by one with Le Element + the new parent)
  - Else:
    - Strip off the other parent (remove everything before Le Element in the path)
    - For each of the new parents of the current element (Parent 1, 2, 3):
      - Create a new path with the parent + Le Element + whatever was after Le Element in the original path.
In the above example, this means the grandchild's paths will be edited like so:
Path | Strip the other parent | Add our parents |
---|---|---|
Parent 1, Le Element, Le Child | Le Element, Le Child | Parent 1, Le Element, Le Child |
— | — | Parent 2, Le Element, Le Child |
— | — | Parent 3, Le Element, Le Child |
Parent 2, Le Element, Le Child | Le Element, Le Child | Parent 1, Le Element, Le Child |
— | — | Parent 2, Le Element, Le Child |
— | — | Parent 3, Le Element, Le Child |
And there we go, we just duplicated every path!
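The duplication can be reproduced outside the database with a small sketch. This is not the actual `Element.add_parent` code: `naive_child_paths` is a hypothetical helper, paths are modelled as plain tuples of names, and the "path starts with Le Element" branch is simply skipped because the example never reaches it.

```python
from collections import Counter

ELEMENT = "Le Element"


def naive_child_paths(child_paths, parents):
    """Rebuild the paths of one child as described above, without deduplication.

    Hypothetical stand-in for the real logic: each path is a tuple of ancestor names.
    """
    new_paths = []
    for path in child_paths:
        if path[0] == ELEMENT:
            # Path starts at the element itself: dropped here (not exercised below).
            continue
        # Strip the other parent: keep everything from the element onwards.
        suffix = path[path.index(ELEMENT):]
        # Prefix the stripped path with every parent of the element.
        for parent in parents:
            new_paths.append((parent,) + suffix)
    return new_paths


grandchild_paths = [
    ("Parent 1", "Le Element", "Le Child"),
    ("Parent 2", "Le Element", "Le Child"),
]
result = naive_child_paths(grandchild_paths, ["Parent 1", "Parent 2", "Parent 3"])
counts = Counter(result)

for path, count in counts.items():
    print(count, path)
```

Each of the three distinct paths comes out twice: once for every pre-existing path the loop started from.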
This issue does not occur when adding the first or second parent, only when adding a third, fourth, … parent to an element that has a child. This is a rather rare case, but it might be the reason why we had such strange structures in corpora that used datasets or where elements were regularly moved around. We also did not cover this case in unit tests, so I added a case that goes up to 50 parents; we already know that going over 6 parents will cause issues in the frontend anyway (requests#1).
This merge request fixes the issue the hardcore way by using a `set`, which really ensures the entire algorithm, no matter how flawed, cannot create duplicate paths. Maybe, one day, I will be granted the right to finally work on this issue properly and turn everything into SQL or PL/SQL…
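As a rough illustration of the idea, and not the code from this merge request, collecting the rebuilt paths in a `set` makes duplicates impossible regardless of how many existing paths produce the same result. The helper name `unique_child_paths` and the tuple-based path representation are assumptions made for the example.

```python
def unique_child_paths(child_paths, parents, element="Le Element"):
    """Rebuild child paths for every parent, deduplicating through a set.

    Hypothetical sketch: paths are tuples of ancestor names, not ElementPath rows.
    """
    new_paths = set()
    for path in child_paths:
        if path[0] == element:
            # Path starts at the element itself: skipped in this sketch.
            continue
        # Keep everything from the element onwards.
        suffix = tuple(path[path.index(element):])
        for parent in parents:
            # Adding an already-present tuple to a set is a no-op.
            new_paths.add((parent, *suffix))
    return new_paths


grandchild_paths = [
    ("Parent 1", "Le Element", "Le Child"),
    ("Parent 2", "Le Element", "Le Child"),
]
unique = unique_child_paths(grandchild_paths, ["Parent 1", "Parent 2", "Parent 3"])

for path in sorted(unique):
    print(path)
```

With the same grandchild paths as in the table above, only three paths remain, one per parent.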