Skip to content
Snippets Groups Projects

Delete training links upon element deletion

Merged Valentin Rigal requested to merge delete-training-folder into master

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • assigned to @vrigal

  • Valentin Rigal added 7 commits

    added 7 commits

    Compare with previous version

  • Valentin Rigal added 1 commit

    added 1 commit

    Compare with previous version

  • Author Maintainer

    Tested both with the .delete() manager and in a async task from the frontend.

  • Valentin Rigal marked this merge request as ready

    marked this merge request as ready

  • Valentin Rigal requested review from @erwanrouchet

    requested review from @erwanrouchet

  • changed milestone to %Arkindex 1.5.0

  • The WHERE … OR … can be a risk for performance because, unlike SQLite, PostgreSQL does not apply the OR-by-UNION technique which turns the WHERE clause into a single WHERE train_folder_id IN (SELECT <id> UNION SELECT element_id FROM …);, which can then be optimized down to a more typical INNER JOIN. Using OR with Postgres often makes it give up on all indices instead and just use sequential scans everywhere. This is listed among the database performance best practices.

    I did some EXPLAIN locally and in preprod to see if this is affected here:

    Update on process_process  (cost=23792.10..23930.55 rows=0 width=0)
      ->  Seq Scan on process_process  (cost=23792.10..23930.55 rows=2082 width=22)
            Filter: ((train_folder_id = '37d3c46c-8e72-4217-b709-8c5ed8c847df'::uuid) OR (hashed SubPlan 1))
            SubPlan 1
              ->  Bitmap Heap Scan on documents_elementpath  (cost=296.02..23776.61 rows=6197 width=16)
                    Recheck Cond: (path && '{37d3c46c-8e72-4217-b709-8c5ed8c847df}'::uuid[])
                    ->  Bitmap Index Scan on documents_e_path_15a4b8_gin  (cost=0.00..294.48 rows=6197 width=0)
                          Index Cond: (path && '{37d3c46c-8e72-4217-b709-8c5ed8c847df}'::uuid[])

    It does use the GIN index, pfew!

    With an explicit UNION, the cost is the same:

    arkindex_dev=# EXPLAIN UPDATE process_process
    SET train_folder_id = NULL
    WHERE train_folder_id IN (
            SELECT '37d3c46c-8e72-4217-b709-8c5ed8c847df'::uuid
        UNION
            SELECT element_id FROM documents_elementpath WHERE path && ARRAY['37d3c46c-8e72-4217-b709-8c5ed8c847df']::uuid[]
    );
                                                                QUERY PLAN                                                             
    -----------------------------------------------------------------------------------------------------------------------------------
     Update on process_process  (cost=24086.52..24215.08 rows=0 width=0)
       ->  Hash Join  (cost=24086.52..24215.08 rows=93 width=62)
             Hash Cond: (process_process.train_folder_id = "ANY_subquery".uuid)
             ->  Seq Scan on process_process  (cost=0.00..117.63 rows=4163 width=22)
             ->  Hash  (cost=24009.04..24009.04 rows=6198 width=56)
                   ->  Subquery Scan on "ANY_subquery"  (cost=23885.08..24009.04 rows=6198 width=56)
                         ->  HashAggregate  (cost=23885.08..23947.06 rows=6198 width=16)
                               Group Key: ('37d3c46c-8e72-4217-b709-8c5ed8c847df'::uuid)
                               ->  Append  (cost=0.00..23869.59 rows=6198 width=16)
                                     ->  Result  (cost=0.00..0.01 rows=1 width=16)
                                     ->  Bitmap Heap Scan on documents_elementpath  (cost=296.02..23776.61 rows=6197 width=16)
                                           Recheck Cond: (path && '{37d3c46c-8e72-4217-b709-8c5ed8c847df}'::uuid[])
                                           ->  Bitmap Index Scan on documents_e_path_15a4b8_gin  (cost=0.00..294.48 rows=6197 width=0)
                                                 Index Cond: (path && '{37d3c46c-8e72-4217-b709-8c5ed8c847df}'::uuid[])
    

    The cost is higher here because the query plan is more complicated than necessary, because we don't have a lot of processes, so the OR is fine here. The UNION may be necessary later if we end up choosing to preserve a lot of processes in Arkindex and not clean them up.

  • Will be merged once we actually enter this release…

  • Erwan Rouchet approved this merge request

    approved this merge request

  • Erwan Rouchet added 13 commits

    added 13 commits

    Compare with previous version

  • Erwan Rouchet enabled an automatic merge when the pipeline for de5f43a6 succeeds

    enabled an automatic merge when the pipeline for de5f43a6 succeeds

  • merged

Please register or sign in to reply
Loading