milvus-io/milvus

[Enhancement]: Proposal: document common RAG failure modes on Milvus using the WFGY 16-problem map

Open

#47964 opened on Feb 28, 2026

Labels: good first issue, kind/enhancement, stale

Description

Is there an existing issue for this?

  • I have searched the existing issues

What would you like to be added?

Hi Milvus team,

Milvus is widely used as the vector database behind RAG pipelines. When those pipelines fail, teams often cannot tell whether the root cause is the vector DB layer, the embeddings, or the application logic.

I maintain the WFGY RAG 16 Problem Map, an open-source diagnostic map for RAG and LLM applications.

Repo (MIT): https://github.com/onestardao/WFGY

ProblemMap reference: https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md

Proposal: Add a short documentation guide such as:

Debugging RAG on Milvus with the WFGY 16-problem map

The guide would:

  • Map common Milvus-backed RAG failure symptoms to a small subset of the 16 failure categories (ingest gaps, indexing issues, filtering mismatch, retrieval drift, embedding mismatch, stale updates, etc.).
  • Provide a practical checklist of what to inspect in Milvus (collection schema and config, scalar filter expressions, index choices, update patterns, consistency level, recall/latency tradeoffs).
  • Provide minimal remediation steps and verification tests so users can confirm the fix.
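As a rough illustration of the kind of verification test the guide could include, here is a minimal sketch that compares the ids returned by an approximate search against an exact brute-force top-k and reports recall@k. It is plain Python with no Milvus client calls. The helper names (`cosine`, `brute_force_top_k`, `recall_at_k`) and the sample data are hypothetical, and `ann_ids` is a stand-in for whatever ids a real Milvus search would return.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, vectors, k):
    """Exact top-k ids by cosine similarity; `vectors` maps id -> vector."""
    ranked = sorted(vectors, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]

def recall_at_k(retrieved_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(retrieved_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical sample: a small reference set kept outside the vector DB.
vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [-1.0, 0.0]}
query = [1.0, 0.05]

exact = brute_force_top_k(query, vectors, k=2)

# Assumption: these ids came back from an approximate search for the same query.
ann_ids = [1, 3]

print("exact top-k:", exact)
print("recall@2:", recall_at_k(ann_ids, exact))
```

Running the same comparison before and after a fix (for example, after a re-index or a parameter change) gives users a concrete number to confirm that recall actually improved.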

If this aligns with Milvus docs direction, I can prepare a first draft as a PR.

Why is this needed?

RAG failures are frequently misattributed to the LLM. In practice, many recurring production issues come from the vector store and retrieval setup:

  • incomplete ingestion or partial updates
  • index fragmentation or recall regressions after re-indexing
  • filter expression mistakes that silently exclude relevant entities
  • embedding dimension mismatch or inconsistent normalization
  • update skew and stale results after incremental ingestion
  • retrieval drift over time even when the DB is healthy
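To make one of these concrete, the embedding mismatch case can be caught before insertion with a small pre-ingest check. The sketch below is only an illustration in plain Python: `check_embeddings` is a hypothetical helper, `expected_dim` stands for the dimension of the collection's vector field, and the unit-norm check assumes the pipeline expects normalized vectors (for example, when inner product is used as a cosine proxy).

```python
import math

def check_embeddings(vectors, expected_dim, tol=1e-3):
    """Return (index, problem) tuples for vectors that look wrong.

    Flags vectors whose length differs from `expected_dim`, and vectors
    whose L2 norm is further than `tol` from 1.0.
    """
    problems = []
    for i, v in enumerate(vectors):
        if len(v) != expected_dim:
            problems.append((i, f"dimension {len(v)} != {expected_dim}"))
            continue
        norm = math.sqrt(sum(x * x for x in v))
        if abs(norm - 1.0) > tol:
            problems.append((i, f"not unit-normalized (norm={norm:.3f})"))
    return problems

# Hypothetical batch: one good vector, one unnormalized, one with the wrong dimension.
batch = [[0.6, 0.8], [3.0, 4.0], [1.0, 0.0, 0.0]]
problems = check_embeddings(batch, expected_dim=2)

for index, problem in problems:
    print(f"vector {index}: {problem}")
```

A check like this separates "the embeddings were wrong before they reached Milvus" from "Milvus returned unexpected results", which is the distinction the guide is meant to help with.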

A structured failure-mode checklist helps users debug faster, reduces repeated troubleshooting questions, and clarifies which issues are pipeline configuration vs. Milvus itself.

Anything else?

The WFGY 16-problem map has been referenced or integrated in several RAG-related projects and research contexts, including:

  • RAGFlow (RAG troubleshooting docs)
  • LlamaIndex (RAG diagnostics documentation)
  • ToolUniverse (Harvard MIMS Lab)
  • Rankify (University of Innsbruck)
  • Multimodal RAG Survey (QCRI LLM Lab)
  • curated lists such as Awesome LLM Apps and Awesome Data Science (academic)

Happy to tailor the Milvus guide to match your documentation style and keep it focused on Milvus-specific checks and reproducible verification steps.
