Description
Since https://github.com/quickwit-oss/tantivy/pull/2726, string sorts have regressed by about 15%.
I have not had time to triage this yet, but I strongly suspect that it is caused by impl SortKeyComputer for SortByString performing a separate lookup in the column's dictionary for each term:
https://github.com/quickwit-oss/tantivy/blob/d0e16001357b0238645d2e09db59e913b09fee07/src/collector/sort_key/sort_by_string.rs#L62-L72
When ordering by strings, the resulting values tend to be close together in the column's dictionary. Because the dictionary is compressed, each of these lookups decompresses a block of the term dictionary, and since the values are potentially contiguous in the dictionary, several of them can land in the same block, which means we decompress that block multiple times.
Previously on main, this used sorted_ords_to_term_cb to batch convert the TermOrdinals into terms. One way to get this performance back would be to change SegmentSortKeyComputer::convert_segment_sort_key into a batch method.
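To make the suspected cost concrete, here is a minimal, self-contained sketch of the difference between per-ordinal and batched lookups. It is a toy model only: ToyDictionary, ord_to_term, sorted_ords_to_terms and decompression_counts are hypothetical names invented for this illustration, not tantivy's types or signatures, and "decompression" is simulated with a counter.

```rust
use std::cell::Cell;

/// Toy stand-in for a block-compressed term dictionary.
/// Hypothetical type for illustration; not tantivy's dictionary.
struct ToyDictionary {
    blocks: Vec<Vec<String>>,
    block_size: usize,
    /// Counts simulated block decompressions.
    decompressions: Cell<usize>,
}

impl ToyDictionary {
    fn new(terms: &[&str], block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        let blocks = terms
            .chunks(block_size)
            .map(|chunk| chunk.iter().map(|t| t.to_string()).collect())
            .collect();
        ToyDictionary { blocks, block_size, decompressions: Cell::new(0) }
    }

    /// Simulates decompressing one block and records that it happened.
    fn decompress_block(&self, block_id: usize) -> Option<&[String]> {
        let block = self.blocks.get(block_id)?;
        self.decompressions.set(self.decompressions.get() + 1);
        Some(block.as_slice())
    }

    /// Per-ordinal lookup: every call decompresses a block.
    fn ord_to_term(&self, ord: usize) -> Option<&str> {
        let block = self.decompress_block(ord / self.block_size)?;
        block.get(ord % self.block_size).map(String::as_str)
    }

    /// Batched lookup: `sorted_ords` must be ascending, so each block
    /// is decompressed at most once.
    fn sorted_ords_to_terms(&self, sorted_ords: &[usize]) -> Option<Vec<&str>> {
        let mut current: Option<(usize, &[String])> = None;
        let mut out = Vec::with_capacity(sorted_ords.len());
        for &ord in sorted_ords {
            let block_id = ord / self.block_size;
            let block = match current {
                // Same block as the previous ordinal: reuse it.
                Some((id, block)) if id == block_id => block,
                _ => {
                    let block = self.decompress_block(block_id)?;
                    current = Some((block_id, block));
                    block
                }
            };
            out.push(block.get(ord % self.block_size)?.as_str());
        }
        Some(out)
    }
}

/// Returns (decompressions with per-ordinal lookups, decompressions when batched)
/// for six contiguous ordinals spread over two blocks of four terms.
fn decompression_counts() -> (usize, usize) {
    let terms = ["apple", "banana", "cherry", "date", "elder", "fig", "grape", "honey"];
    let dict = ToyDictionary::new(&terms, 4);
    let ords: [usize; 6] = [0, 1, 2, 3, 4, 5];

    let one_by_one: Vec<&str> = ords.iter().filter_map(|&ord| dict.ord_to_term(ord)).collect();
    let per_ord = dict.decompressions.replace(0);

    let batched_terms = dict.sorted_ords_to_terms(&ords).unwrap_or_default();
    let batched = dict.decompressions.get();

    assert_eq!(one_by_one, batched_terms);
    (per_ord, batched)
}
```

In this toy setup, decompression_counts() returns (6, 2): six simulated decompressions for the per-ordinal path versus two for the batched path. The real ratio depends on the dictionary's actual block size and on how the result ordinals are distributed, so this only shows the shape of the problem, not the measured 15%.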