apache/lucene

Should we use the new `expectedVisitedNodes` estimator to speed up filtered vector search?

Open

#14.845 geöffnet am 25. Juni 2025

Auf GitHub ansehen
 (6 Kommentare) (0 Reaktionen) (0 zugewiesene Personen)Java (2.179 Stars) (879 Forks)batch import
good first issuetype:enhancement

Beschreibung

Description

@benwtrent 's PR at #14836 introduced an estimator for the number of visited nodes based on k and the number of vectors called expectedVisitedNodes(int k, int size).

Currently, filtered vector search works roughly this way:

  1. compute a per-leaf k value based on the global k value
  2. if (filterCost <= perLeafK) do an exact search
  3. else run an approximate search, stop after visiting filterCost+1 nodes
  4. if the approximate search stopped early (and is thus incomplete), run an exact search

Could/should we update step 2 above to do an exact search if (filterCost <= expectedVisitedNodes(perLeafK, size))? This should help save approximate searches that are almost certainly going to be stopped early and ignored anyway?

Contributor Guide