Skip to content
Snippets Groups Projects

Improve ListElementNeighbors performances

Merged Valentin Rigal requested to merge elt-neighbors-perfs into master
All threads resolved!

Closes #1011 (closed)

Uses the real path ordering value (named position, the frontend displays position + 1).

Edited by Valentin Rigal

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Erwan Rouchet
  • Erwan Rouchet
  • DRF's HTML view tries to find a filter form, but we do not have any filter backends on this endpoint so it's useless. To do so, it calls get_queryset, which in this endpoint triggers SQL queries, which violates the best practices since it is known to cause problems. This causes all timings found by the Django debug toolbar to be duplicated, because suddenly there is twice as much work to do. I did some other timings with time curl and with the dev tools opened in the frontend, and consistently got 50% lower execution times.

    By dropping the kernel pagecache, which Postgres uses to keep indices, tables or query results in memory, some ListElementNeighbors queries could take up to 14.6 seconds. They took at most 15.6 seconds on master. It is likely that we were struggling to reproduce some of the performance issues because of this; once everything is cached, it all appears to be running fine. The cache does have a size limit, and will never get filled if the endpoint times out, so on very large corpora that are accessed less often, or after a database restart, we might still see some lag.

    I tested this problem in production by running the query against a corpus I often use for path performance tests, GoLForDEEPN | Full. This corpus has often caused us trouble. This element showed some similar execution times of 3 to 4 seconds without the cache, and 150ms with cache.

    Removing the join on documents_element does not improve this issue, even though the query plans seem to show that this join is taking up most of the time. The window aggregation takes most of that time instead because it still computes all neighbors over all the elements with the same path before we can filter. I don't think this can be avoided without already knowing which elements are the neighbors of our element!

    So I guess a quick way to fix the neighbors issue if we find it again is to download more RAM, which should increase the amount of caching we can do. We could also work on frontend#1102 (closed) to make the performance issue less visible — you don't get to see the neighbors, but you at least get access to the Actions and Display menus for example.


    I might end up creating a follow-up issue to simplify the serializer; we don't care at all about compatibility, because only the frontend uses this endpoint, so we can always update the frontend. The current API actually makes it really hard to handle in the frontend, and we can easily simplify it:

    • The pagination is completely unnecessary, since get_neighbors will make an API request before DRF paginates, so we could remove it, which also means the frontend can now handle more than 6 parents at once (at the seventh parent, if you have a previous and a next element for each parent, then you get 21 elements which goes beyond the first page of results).

    • We can rename position to ordering, since we want to return the raw ordering and not a ROW_NUMBER().

    • We can switch from building three different objects, which then have to be re-combined by the frontend to detect which elements are the previous and next ones, into a single object with a previous and a next attribute, which also simplifies the backend implementation.

    Edited by Erwan Rouchet
  • Valentin Rigal resolved all threads

    resolved all threads

  • Valentin Rigal added 8 commits

    added 8 commits

    Compare with previous version

  • Valentin Rigal added 1 commit

    added 1 commit

    Compare with previous version

  • mentioned in issue #1553 (closed)

  • Erwan Rouchet added 9 commits

    added 9 commits

    Compare with previous version

  • Erwan Rouchet enabled an automatic merge when the pipeline for 59193dc8 succeeds

    enabled an automatic merge when the pipeline for 59193dc8 succeeds

  • merged

  • changed milestone to %Arkindex 1.5.0

  • Erwan Rouchet mentioned in merge request frontend!1527 (merged)

    mentioned in merge request frontend!1527 (merged)

  • Please register or sign in to reply
    Loading