Optimize WorkerActivity deletion when deleting a corpus with no worker activities
@mlbonhomme reported that the Delete WorkerActivity
message was being shown for a very long time while deleting empty corpora before the deletion failed for an unrelated reason (#1476 (closed)). This deletion appears to be very slow in prod, preprod, demo, and locally, and how slow it is only depends on how large the instance is.
The query in question is this one, which uses an OR
between two joins. Using OR
is a common cause of large performance issues as suddenly the query planner loses a lot of the optimizations it could make, and in some cases even a UNION ALL
can be quicker.
I tried to do some EXPLAIN
with both filters at once, and with two separate queries deleting for each filter, on a random empty corpus, with zero worker activities:
Query | Estimated cost | Real execution time |
---|---|---|
Delete activities related to elements | 48413.39 | 68.313 ms |
Delete activities related to processes | 186241.20 | 1214.563 ms |
Delete both at once using OR
|
2833755.56 | 26541.183 ms |
Same thing, but on a corpus with 8.5 million worker activities (almost all the activities in my database):
Query | Estimated cost | Real execution time |
---|---|---|
Delete activities related to elements | 2120087.15 | 18882.327 ms |
Delete activities related to processes | 207938.36 | 156229.964 ms |
Delete both at once using OR
|
3079521.62 | 44057.445 ms |
Delete activities related to processes after having deleted activities related to elements | 207938.36 | 362.000 ms |
If we split the query into two queries, one through elements and one through processes, then either way, deleting a corpus without any activities will be faster. But if we delete a corpus with worker activities, we should go through elements first, as deleting on processes first is extremely slow and deleting on processes is fast (it's just an extra precaution, so the second query is always faster).
We should probably do some further testing using BEGIN; EXPLAIN ANALYZE DELETE …; ROLLBACK;
in other instances to see if this behavior is also observed on other databases.