Skip to content

The best class filter and display options of element lists do not use indexes

It is becoming common knowledge that the best_class filter and the with_best_classes option on ListElements, ListElementChildren and ListElementParents are painfully slow and can often timeout on large corpora, especially best_class=false (elements without any best class). We even had to disable with_best_classes by default (frontend#478 (closed)).

With a random corpus that has elements with and without best classes, I copy-pasted the generated SQL from ListElements with ?best_class=True&with_best_classes=True and had a look in the query plan. On the COUNT(*) query, the filter slows the whole query down, and on the query that retrieves 20 elements the display option slows the query down. Here is the node where best classes are filtered:

Index Scan using documents_classification_element_id_58ae52e0 on documents_classification as documents_classification (cost=0.43..2.22 rows=1 width=16) (actual=0.003..0.003 rows=0 loops=105793)
Filter: (((state)::text = 'validated'::text) OR high_confidence)
Index Cond: (element_id = documents_element.id)

While an index scan is indeed used, the actual index condition is for elements, and the "Filter" is just applied in a loop, not from the index itself; no indexes are actually used. We can try creating a custom index on a classification's element_id with state = 'validated' OR high_confidence, though I do not know if this would help with best_class=False, its opposite and currently the slowest option.