Ignore restarted tasks when computing process states
Closes #1880
The `Process.state` implementation was fast, but updating the state filter on `ListProcess` took a while, because that filter does not simply filter on the state. If it only checked the state of any task in the last run, a process whose last run contains one task in each state would show up no matter which state you filter by. The state filter actually needs to find processes that have tasks in the given state and that also have no tasks in any higher-priority state.
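To make those semantics concrete, here is a minimal pure-Python sketch with no Django or database involved. The state names, their priority order, and the `Task` shape are hypothetical, chosen only for illustration. It contrasts a naive filter ("any task in the last run has the state") with the filter described above ("a task has the state, and no task has a higher-priority state").

```python
from dataclasses import dataclass

# Hypothetical state names, ordered from highest to lowest priority.
# The real states and their ordering are an assumption for this sketch.
STATE_PRIORITY = ["failed", "running", "pending", "completed"]


@dataclass(frozen=True)
class Task:
    process_id: int
    run: int
    state: str


def last_run_states(tasks, process_id):
    """Return the set of task states in the last (highest) run of a process."""
    runs = [t.run for t in tasks if t.process_id == process_id]
    if not runs:
        return set()
    last_run = max(runs)
    return {t.state for t in tasks if t.process_id == process_id and t.run == last_run}


def process_state(tasks, process_id):
    """A process takes the highest-priority state found in its last run."""
    states = last_run_states(tasks, process_id)
    return next((s for s in STATE_PRIORITY if s in states), None)


def naive_filter(tasks, wanted):
    """Wrong: matches any process with a task in `wanted` in its last run."""
    return {pid for pid in {t.process_id for t in tasks}
            if wanted in last_run_states(tasks, pid)}


def state_filter(tasks, wanted):
    """Right: a task in `wanted`, and no task in a higher-priority state."""
    higher = set(STATE_PRIORITY[: STATE_PRIORITY.index(wanted)])
    return {pid for pid in {t.process_id for t in tasks}
            if wanted in last_run_states(tasks, pid)
            and not (last_run_states(tasks, pid) & higher)}


tasks = [
    Task(1, 1, "failed"),     # earlier run, ignored: only the last run counts
    Task(1, 2, "running"),
    Task(1, 2, "completed"),
    Task(2, 1, "pending"),
    Task(2, 1, "completed"),
    Task(3, 1, "completed"),
]
print(sorted(naive_filter(tasks, "completed")))  # [1, 2, 3]
print(sorted(state_filter(tasks, "completed")))  # [3]
```

With the naive version, process 1 would appear under both `running` and `completed`. The correct filter only returns each process under the state that `process_state` computes for it.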
A proper way to do it would be to use two `Exists()` calls: one to find processes where a task in the given state exists, and another to exclude processes where a task in a higher-priority state exists. However, this is not possible because we need to filter on the last run, a number that can only be computed through `MAX(run)`, which cannot be used in a `WHERE` clause. Implementing this would require a subquery in a `FROM` clause, which the ORM does not support. I also tried using a `FilteredRelation`, which adds a custom `ON` clause to a `JOIN` and can sometimes replace a subquery, but it would not work there either.
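For reference, this is roughly what the two-`Exists()` query looks like when written by hand, with the last run computed in a subquery in `FROM`. This is only a sketch. It uses raw SQL on SQLite (through Python's `sqlite3`) so that it runs anywhere, not PostgreSQL, and it is not what this MR implements. The `process`/`task` schema, the index, and the state priority order are all hypothetical.

```python
import sqlite3

# Hypothetical priority order, highest first (an assumption for this sketch).
STATE_PRIORITY = ["failed", "running", "pending", "completed"]

SCHEMA = """
CREATE TABLE process (id INTEGER PRIMARY KEY);
CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    process_id INTEGER NOT NULL REFERENCES process(id),
    run INTEGER NOT NULL,
    state TEXT NOT NULL
);
CREATE INDEX task_process_run_state ON task (process_id, run, state);
"""


def processes_in_state(conn, wanted):
    higher = STATE_PRIORITY[: STATE_PRIORITY.index(wanted)]
    # "IN (NULL)" never matches, which covers the highest-priority state.
    placeholders = ", ".join("?" * len(higher)) or "NULL"
    # MAX(run) is an aggregate, so the last run is computed in a derived
    # table joined in FROM, then both EXISTS checks compare against it.
    query = f"""
        SELECT p.id
        FROM process p
        JOIN (
            SELECT process_id, MAX(run) AS last_run
            FROM task
            GROUP BY process_id
        ) lr ON lr.process_id = p.id
        WHERE EXISTS (
            SELECT 1 FROM task t
            WHERE t.process_id = p.id AND t.run = lr.last_run AND t.state = ?
        )
        AND NOT EXISTS (
            SELECT 1 FROM task t
            WHERE t.process_id = p.id AND t.run = lr.last_run
              AND t.state IN ({placeholders})
        )
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(query, [wanted, *higher])]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO process (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO task (process_id, run, state) VALUES (?, ?, ?)",
    [
        (1, 1, "failed"),
        (1, 2, "running"),
        (1, 2, "completed"),
        (2, 1, "pending"),
        (2, 1, "completed"),
        (3, 1, "completed"),
    ],
)
print(processes_in_state(conn, "completed"))  # [3]
```

The derived table `lr` is the part the Django ORM cannot express. Without it, the only place left for `MAX(run)` would be the `WHERE` clause, and SQL rejects aggregates there.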
This instead relies on Django implementing a `~Q(tasks__…)` as a `NOT EXISTS`, which it does not do in all cases, and on PostgreSQL choosing the blessed `Nested Loop Anti Join`, which is the only way to execute a `NOT EXISTS` while using indexes and without doing sequential scans on every table. This is similar to the trouble we had a long time ago, when we never managed to implement an efficient `best_classes=false` filter.
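Why that plan matters can be shown with a toy model of an anti join driven by index lookups. Each outer row costs one index probe, and the probe stops at its first match. The sketch below is only an illustration of the idea, not PostgreSQL's executor. The dict stands in for a B-tree index, and the row shapes and state names are hypothetical.

```python
def build_index(tasks):
    """Map (process_id, run) to the task states stored under that key.

    This dict stands in for a B-tree index on (process_id, run, state).
    """
    index = {}
    for process_id, run, state in tasks:
        index.setdefault((process_id, run), []).append(state)
    return index


def nested_loop_anti_join(outer_rows, index, excluded_states):
    """Yield the process id of each outer row that has no matching inner row."""
    for process_id, last_run in outer_rows:
        matched = False
        # Inner side: probe the index for this one outer row instead of
        # scanning every task.
        for state in index.get((process_id, last_run), []):
            if state in excluded_states:
                matched = True
                break  # one match is enough to discard the outer row
        if not matched:
            yield process_id


tasks = [
    (1, 1, "failed"),
    (1, 2, "running"),
    (1, 2, "completed"),
    (2, 1, "pending"),
    (2, 1, "completed"),
    (3, 1, "completed"),
]
index = build_index(tasks)
# Outer rows: (process id, last run) pairs that passed the EXISTS side.
outer = [(1, 2), (2, 1), (3, 1)]
print(list(nested_loop_anti_join(outer, index, {"failed", "running", "pending"})))  # [3]
```

If the planner falls back to another strategy, the inner side may become a scan over the whole task table instead of one small lookup per process. Running `EXPLAIN` on the generated query shows which plan PostgreSQL actually picked.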
I tried to add some `assertExactQueries` on `ListProcess` to test those implicit behaviors. However, the endpoint needs various `prefetch_related` calls, and these retrieve a ton of data with UUIDs in an order we cannot control, so actually writing that test would take way too long.