Improve ListElementNeighbors performances
Closes #1011 (closed)
Uses the real path ordering value (named position
, the frontend displays position + 1
).
Merge request reports
Activity
assigned to @vrigal
added 3 commits
I compared performances between https://gitlab.com/teklia/arkindex/backend/-/merge_requests/2038/diffs?commit_id=33db76e71d3982c926bf8b397a514eaccf107cc9 and
master
, on the three examples from https://gitlab.com/teklia/arkindex/backend/-/issues/1011#note_1456641279 (debug toolbar info against the preprod dump hosted on xenarque) :1st elt with 5k + neighbors
branch SQL query ms CPU ms 33db7 722 383 (1382) master 6367 272 (6682) nth elt with 5k + neighbors
branch SQL query ms CPU ms 33db7 695 184 (1130) master 684 83 (999) elt with 11 paths
branch SQL query ms CPU ms 33db7 597 159 (1062) master 615 125 (980) Not much improvement though, except for the first case that no longer explodes in terms of performance.
- Using the real row position does not alter performances very much.
- Dropping the element table join allows to gain up to 10% of perfs on those cases (requires to separate the top level case).
Edited by Valentin Rigalmentioned in issue #1545 (closed)
- Automatically resolved by Valentin Rigal
- Automatically resolved by Valentin Rigal
- Automatically resolved by Valentin Rigal
DRF's HTML view tries to find a filter form, but we do not have any filter backends on this endpoint so it's useless. To do so, it calls
get_queryset
, which in this endpoint triggers SQL queries, which violates the best practices since it is known to cause problems. This causes all timings found by the Django debug toolbar to be duplicated, because suddenly there is twice as much work to do. I did some other timings withtime curl
and with the dev tools opened in the frontend, and consistently got 50% lower execution times.By dropping the kernel pagecache, which Postgres uses to keep indices, tables or query results in memory, some ListElementNeighbors queries could take up to 14.6 seconds. They took at most 15.6 seconds on
master
. It is likely that we were struggling to reproduce some of the performance issues because of this; once everything is cached, it all appears to be running fine. The cache does have a size limit, and will never get filled if the endpoint times out, so on very large corpora that are accessed less often, or after a database restart, we might still see some lag.I tested this problem in production by running the query against a corpus I often use for path performance tests, GoLForDEEPN | Full. This corpus has often caused us trouble. This element showed some similar execution times of 3 to 4 seconds without the cache, and 150ms with cache.
Removing the join on
documents_element
does not improve this issue, even though the query plans seem to show that this join is taking up most of the time. The window aggregation takes most of that time instead because it still computes all neighbors over all the elements with the same path before we can filter. I don't think this can be avoided without already knowing which elements are the neighbors of our element!So I guess a quick way to fix the neighbors issue if we find it again is to download more RAM, which should increase the amount of caching we can do. We could also work on frontend#1102 (closed) to make the performance issue less visible — you don't get to see the neighbors, but you at least get access to the
Actions
andDisplay
menus for example.
I might end up creating a follow-up issue to simplify the serializer; we don't care at all about compatibility, because only the frontend uses this endpoint, so we can always update the frontend. The current API actually makes it really hard to handle in the frontend, and we can easily simplify it:
-
The pagination is completely unnecessary, since
get_neighbors
will make an API request before DRF paginates, so we could remove it, which also means the frontend can now handle more than 6 parents at once (at the seventh parent, if you have a previous and a next element for each parent, then you get 21 elements which goes beyond the first page of results). -
We can rename
position
toordering
, since we want to return the raw ordering and not aROW_NUMBER()
. -
We can switch from building three different objects, which then have to be re-combined by the frontend to detect which elements are the previous and next ones, into a single object with a
previous
and anext
attribute, which also simplifies the backend implementation.
Edited by Erwan Rouchet-
mentioned in issue #1553 (closed)
added 9 commits
Toggle commit listenabled an automatic merge when the pipeline for 59193dc8 succeeds
changed milestone to %Arkindex 1.5.0
mentioned in merge request frontend!1527 (merged)