quickwit-oss/tantivy

Missing search results with in memory index

Open

#1824 opened on Jan 23, 2023

7 comments · 0 reactions · 0 assignees · Rust · 8,354 stars · 499 forks
good first issue, help wanted

Description

I've been working on a small command-line app to merge BibTeX files and have been using tantivy to index paper titles. I noticed some strange behavior when I do the following (from the Python extension):

  1. Create an in-memory index
  2. Call writer.add_document in a for loop
  3. Call writer.commit() outside the for loop

The search results I'd expect to be exact matches (duplicate detection) are not showing up. I reran the same code several times, and every so often I do get the correct results, which suggests to me that some index entries are getting lost on commit. If I call writer.commit() inside the for loop, the issue goes away. Likewise, if I use a file-based index, the issue also goes away.

Reading the docs, it looks like calling writer.commit() after the for loop rather than inside it should work, even with an in-memory index. Is there something I'm missing, or might this be a bug?

If it's not a known issue already and you can't reproduce it, I can share the relevant code once I finish my CLI.

