Delete training links upon element deletion
Closes #1488 (closed)
Merge request reports
Activity
assigned to @vrigal
added 7 commits
-
6723dd21...a6b43fdd - 4 commits from branch
master
- e93eb9db - Delete training links upon element deletion
- 407cdc5a - Update trash handler for element deletion
- 5a1efe69 - Update tests
Toggle commit list-
6723dd21...a6b43fdd - 4 commits from branch
changed milestone to %Arkindex 1.5.0
The
WHERE … OR …
can be a risk for performance because, unlike SQLite, PostgreSQL does not apply the OR-by-UNION technique which turns theWHERE
clause into a singleWHERE train_folder_id IN (SELECT <id> UNION SELECT element_id FROM …);
, which can then be optimized down to a more typicalINNER JOIN
. UsingOR
with Postgres often makes it give up on all indices instead and just use sequential scans everywhere. This is listed among the database performance best practices.I did some
EXPLAIN
locally and in preprod to see if this is affected here:Update on process_process (cost=23792.10..23930.55 rows=0 width=0) -> Seq Scan on process_process (cost=23792.10..23930.55 rows=2082 width=22) Filter: ((train_folder_id = '37d3c46c-8e72-4217-b709-8c5ed8c847df'::uuid) OR (hashed SubPlan 1)) SubPlan 1 -> Bitmap Heap Scan on documents_elementpath (cost=296.02..23776.61 rows=6197 width=16) Recheck Cond: (path && '{37d3c46c-8e72-4217-b709-8c5ed8c847df}'::uuid[]) -> Bitmap Index Scan on documents_e_path_15a4b8_gin (cost=0.00..294.48 rows=6197 width=0) Index Cond: (path && '{37d3c46c-8e72-4217-b709-8c5ed8c847df}'::uuid[])
It does use the GIN index, pfew!
With an explicit
UNION
, the cost is the same:arkindex_dev=# EXPLAIN UPDATE process_process SET train_folder_id = NULL WHERE train_folder_id IN ( SELECT '37d3c46c-8e72-4217-b709-8c5ed8c847df'::uuid UNION SELECT element_id FROM documents_elementpath WHERE path && ARRAY['37d3c46c-8e72-4217-b709-8c5ed8c847df']::uuid[] ); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Update on process_process (cost=24086.52..24215.08 rows=0 width=0) -> Hash Join (cost=24086.52..24215.08 rows=93 width=62) Hash Cond: (process_process.train_folder_id = "ANY_subquery".uuid) -> Seq Scan on process_process (cost=0.00..117.63 rows=4163 width=22) -> Hash (cost=24009.04..24009.04 rows=6198 width=56) -> Subquery Scan on "ANY_subquery" (cost=23885.08..24009.04 rows=6198 width=56) -> HashAggregate (cost=23885.08..23947.06 rows=6198 width=16) Group Key: ('37d3c46c-8e72-4217-b709-8c5ed8c847df'::uuid) -> Append (cost=0.00..23869.59 rows=6198 width=16) -> Result (cost=0.00..0.01 rows=1 width=16) -> Bitmap Heap Scan on documents_elementpath (cost=296.02..23776.61 rows=6197 width=16) Recheck Cond: (path && '{37d3c46c-8e72-4217-b709-8c5ed8c847df}'::uuid[]) -> Bitmap Index Scan on documents_e_path_15a4b8_gin (cost=0.00..294.48 rows=6197 width=0) Index Cond: (path && '{37d3c46c-8e72-4217-b709-8c5ed8c847df}'::uuid[])
The cost is higher here because the query plan is more complicated than necessary, because we don't have a lot of processes, so the
OR
is fine here. TheUNION
may be necessary later if we end up choosing to preserve a lot of processes in Arkindex and not clean them up.added 13 commits
-
abbccd94...3062f2fb - 10 commits from branch
master
- 73ebbdeb - Delete training links upon element deletion
- a385379a - Update trash handler for element deletion
- de5f43a6 - Update tests
Toggle commit list-
abbccd94...3062f2fb - 10 commits from branch
enabled an automatic merge when the pipeline for de5f43a6 succeeds