Allow selecting direct children in processes

https://redmine.teklia.com/issues/10734

The inconsistencies between the element API filters and the process filters are causing issues for new users, because what they see before creating a process is not what the process selects. Making those APIs and ListProcessElements match fully is not easy: performance is crucial in this endpoint, and it works and is used differently. We can avoid the most common error by allowing child elements to be selected non-recursively, since running a process on every page in a folder is among the most common use cases.

Process.load_children should be redefined as an EnumField. The new ProcessChildrenOption enum includes:

  • none, load no children (default)
  • direct, only load direct children, non-recursively
  • all, load all children recursively
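As a rough illustration of the three options, here is a minimal sketch using a plain Python Enum. Only the three values come from this description; the member names and the parse_load_children helper are hypothetical, and the real field would go through the project's EnumField rather than this helper.

```python
from enum import Enum


class ProcessChildrenOption(Enum):
    # Member names are illustrative; only the string values are specified above.
    NoChildren = "none"  # load no children (default)
    Direct = "direct"    # only load direct children, non-recursively
    All = "all"          # load all children recursively


def parse_load_children(raw=None):
    """Hypothetical helper: map a raw payload value to the enum, defaulting to none."""
    if raw is None:
        return ProcessChildrenOption.NoChildren
    # Raises ValueError for anything other than the three known values.
    return ProcessChildrenOption(raw)
```

Every value is short enough to fit in the varchar(10) column used by the migration.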

The migration should use a RunSQL operation whose state_operations contain the AlterField, so that the column is actually converted by these queries:

  • ALTER TABLE process_process ALTER COLUMN load_children TYPE varchar(10) USING CASE WHEN load_children THEN 'all' ELSE 'none' END;
  • Reverse: ALTER TABLE process_process ALTER COLUMN load_children TYPE boolean USING load_children <> 'none';

This will perform the data migration directly. load_children=False becomes none, and True becomes all.
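The value mapping done by the two USING expressions can be mirrored in plain Python to check what a round trip does. This is only a model of the SQL, not the migration itself, and the function names are hypothetical.

```python
def forward(load_children: bool) -> str:
    # Mirrors: CASE WHEN load_children THEN 'all' ELSE 'none' END
    return "all" if load_children else "none"


def reverse(load_children: str) -> bool:
    # Mirrors: load_children <> 'none'
    return load_children != "none"


# Existing boolean values survive a forward-then-reverse round trip.
assert reverse(forward(True)) is True
assert reverse(forward(False)) is False
```

One consequence worth keeping in mind: the reverse query maps direct to True, so a process set to direct would come back as all if the migration were reverted and then re-applied.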

Process.list_elements needs to be updated to support these new options:

  • On processes whose modes are not Workers or Export, return an empty queryset

  • If there is one element assigned as Process.element:

    • Build an initial Q filter that only looks for this element by ID
    • If load_children is not none:
      • Add a | Q(paths__path__overlap=[element_id]) to include children
      • If load_children is direct, also add paths__path__last=element_id to restrict to direct children
  • Try to list the id and corpus_id found in Process.elements.all(). If the list is not empty:

    • Fail if any corpus_id is not the corpus ID of the process
    • Build the initial Q filter on the list of id
    • If load_children is not none:
      • Add a | Q(paths__path__overlap=element_ids) to include children
      • If load_children is direct, also add paths__path__last__in=element_ids to restrict to direct children
  • Otherwise the process is running on the whole corpus:

    • When load_children is none, return an empty queryset
    • With direct, return all elements in the corpus with paths__path=[]
    • With all, return all elements in the corpus
  • On all queries, the filters returned by _get_filters() should be applied, so that the name/type/class filters continue to work.
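The selection rules above can be modelled in memory to make the expected results explicit. This is a sketch of the semantics only, not the Django queryset: the Element dataclass and the list_elements function here are hypothetical stand-ins, the mode check, the corpus_id check and _get_filters() are left out, and the single-element and element-list cases are merged into one selected argument.

```python
from dataclasses import dataclass
from enum import Enum


class ProcessChildrenOption(Enum):
    NoChildren = "none"
    Direct = "direct"
    All = "all"


@dataclass(frozen=True)
class Element:
    id: str
    # One tuple per path: ancestor IDs from the top down.
    # An empty tuple means the element sits at the top level of the corpus.
    paths: tuple = ((),)


def list_elements(elements, option, selected=None):
    """Model of the rules above; selected=None means the whole corpus."""
    if selected is None:
        if option is ProcessChildrenOption.NoChildren:
            return []
        if option is ProcessChildrenOption.Direct:
            # Counterpart of paths__path=[]: top-level elements only.
            return [e for e in elements if () in e.paths]
        return list(elements)

    ids = set(selected)

    def matches(element):
        if element.id in ids:  # initial filter on the selected IDs
            return True
        if option is ProcessChildrenOption.NoChildren:
            return False
        for path in element.paths:
            if ids.isdisjoint(path):  # counterpart of the overlap filter
                continue
            if option is ProcessChildrenOption.Direct and path[-1] not in ids:
                continue  # counterpart of the last filter
            return True
        return False

    return [e for e in elements if matches(e)]


corpus = [
    Element("folder"),
    Element("page1", (("folder",),)),
    Element("page2", (("folder",),)),
    Element("line1", (("folder", "page1"),)),
]
```

With this data, selecting folder with direct yields the folder and its two pages but not line1, which is the "every page in a folder" case described at the top.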

Performance tests are necessary because multiple options can have issues:

  • The ElementPath filters are using OR and might need to use a UNION instead, as we know this can be an issue
  • We already know that running a process from a large selection can have issues as we are loading every element in RAM, and this might not be avoidable
  • paths__path__last__in=[…] might not be using an index at all
  • paths__path__overlap alone without a last filter is sometimes slower because last can use a faster B-tree index