The best class filter and display options of element lists do not use indexes
It is becoming common knowledge that the best_class
filter and the with_best_classes
option on ListElements
, ListElementChildren
and ListElementParents
are painfully slow and can often timeout on large corpora, especially best_class=false
(elements without any best class). We even had to disable with_best_classes
by default (frontend#478 (closed)).
With a random corpus that has elements with and without best classes, I copy-pasted the generated SQL from ListElements
with ?best_class=True&with_best_classes=True
and had a look in the query plan. On the COUNT(*)
query, the filter slows the whole query down, and on the query that retrieves 20 elements the display option slows the query down. Here is the node where best classes are filtered:
Index Scan using documents_classification_element_id_58ae52e0 on documents_classification as documents_classification (cost=0.43..2.22 rows=1 width=16) (actual=0.003..0.003 rows=0 loops=105793)
Filter: (((state)::text = 'validated'::text) OR high_confidence)
Index Cond: (element_id = documents_element.id)
While an index scan is indeed used, the actual index condition is for elements, and the "Filter" is just applied in a loop, not from the index itself; no indexes are actually used. We can try creating a custom index on a classification's element_id
with state = 'validated' OR high_confidence
, though I do not know if this would help with best_class=False
, its opposite and currently the slowest option.