apache/lucene

Should we use the new `expectedVisitedNodes` estimator to speed up filtered vector search?

Open

#14,845 opened on 2025年6月25日

GitHub で見る
 (6 comments) (0 reactions) (0 assignees)Java (2,179 stars) (879 forks)batch import
good first issuetype:enhancement

説明

Description

@benwtrent 's PR at #14836 introduced an estimator for the number of visited nodes based on k and the number of vectors called expectedVisitedNodes(int k, int size).

Currently, filtered vector search works roughly this way:

  1. compute a per-leaf k value based on the global k value
  2. if (filterCost <= perLeafK) do an exact search
  3. else run an approximate search, stop after visiting filterCost+1 nodes
  4. if the approximate search stopped early (and is thus incomplete), run an exact search

Could/should we update step 2 above to do an exact search if (filterCost <= expectedVisitedNodes(perLeafK, size))? This should help save approximate searches that are almost certainly going to be stopped early and ignored anyway?

コントリビューターガイド