quickwit-oss/tantivy

About 15% regression on string sorts

Open

#2776 opened on Dec 15, 2025

(12 comments) (0 reactions) (0 assignees) Rust (8,354 stars) (499 forks) batch import
good first issue

Description

Since https://github.com/quickwit-oss/tantivy/pull/2726 was merged, string sorts have regressed by about 15%.

I have not had time to triage this yet, but I strongly suspect the cause is that impl SortKeyComputer for SortByString performs an individual lookup in the column's dictionary for each term: https://github.com/quickwit-oss/tantivy/blob/d0e16001357b0238645d2e09db59e913b09fee07/src/collector/sort_key/sort_by_string.rs#L62-L72

When ordering by strings, the resulting values will be sequential in the column's dictionary. Because the dictionary is compressed, each of these lookups will decompress a block of the term dictionary, and since the values are potentially contiguous in the dictionary, this can mean that we decompress the same block multiple times.


Previously on main, this used sorted_ords_to_term_cb to batch convert the TermOrdinals into terms. One way to get this performance back would be to change SegmentSortKeyComputer::convert_segment_sort_key into a batch method.
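
To illustrate the suspected cost difference, here is a minimal toy model. It is not tantivy's code: ToyDictionary, decode_block, ord_to_term, and sorted_ords_to_terms are hypothetical names invented for this sketch, and the "blocks" are plain vectors standing in for compressed dictionary blocks. It only counts how many block decodes each strategy would trigger.

```rust
use std::cell::Cell;

/// Toy stand-in for a block-compressed term dictionary.
/// Hypothetical type for illustration only, not tantivy's dictionary.
struct ToyDictionary {
    /// Each inner Vec models one compressed block of sorted terms.
    blocks: Vec<Vec<String>>,
    block_len: usize,
    /// Counts how many times a block was "decompressed".
    decodes: Cell<usize>,
}

impl ToyDictionary {
    fn new(sorted_terms: &[&str], block_len: usize) -> Self {
        assert!(block_len > 0, "block_len must be non-zero");
        let blocks = sorted_terms
            .chunks(block_len)
            .map(|chunk| chunk.iter().map(|t| t.to_string()).collect())
            .collect();
        ToyDictionary { blocks, block_len, decodes: Cell::new(0) }
    }

    /// Models decompressing one block; here it only bumps the counter.
    fn decode_block(&self, block_id: usize) -> &[String] {
        self.decodes.set(self.decodes.get() + 1);
        &self.blocks[block_id]
    }

    /// Individual lookup: every call pays for one block decode,
    /// even if the previous call hit the same block.
    fn ord_to_term(&self, ord: usize) -> String {
        let block = self.decode_block(ord / self.block_len);
        block[ord % self.block_len].clone()
    }

    /// Batch lookup over ascending ordinals: a block is decoded only
    /// when the ordinal moves past the block that is currently held.
    fn sorted_ords_to_terms(&self, sorted_ords: &[usize]) -> Vec<String> {
        let mut terms = Vec::with_capacity(sorted_ords.len());
        let mut current: Option<(usize, &[String])> = None;
        for &ord in sorted_ords {
            let block_id = ord / self.block_len;
            let block = match current {
                Some((id, block)) if id == block_id => block,
                _ => {
                    let block = self.decode_block(block_id);
                    current = Some((block_id, block));
                    block
                }
            };
            terms.push(block[ord % self.block_len].clone());
        }
        terms
    }
}

/// Returns (decodes with individual lookups, decodes with one batch lookup)
/// for the same ascending ordinals.
fn decode_counts(dict: &ToyDictionary, sorted_ords: &[usize]) -> (usize, usize) {
    dict.decodes.set(0);
    for &ord in sorted_ords {
        let _ = dict.ord_to_term(ord);
    }
    let individual = dict.decodes.get();

    dict.decodes.set(0);
    let _ = dict.sorted_ords_to_terms(sorted_ords);
    (individual, dict.decodes.get())
}
```

With eight terms in blocks of four and the ascending ordinals 0 through 5, the individual path decodes a block six times while the batch path decodes only two. A real batch version of convert_segment_sort_key would presumably need to sort the ordinals first if they do not already arrive in dictionary order; that step is left out of this sketch.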
