Case Study: Debugging a RAG Assistant with a Vector DB and a k-NN LM Memory

You are the on-call ML engineer for an internal "Release Notes Q&A" assistant used by Sales Engineering. The assistant must answer questions about product behavior changes and must cite the exact release note paragraph(s) it used. The system is implemented as follows:

1. The user question is embedded and used to retrieve the top-k text chunks from a vector database of release notes.
2. Those chunks are inserted into the prompt, and the LLM generates an answer with citations (RAG).
3. In parallel, the LLM also uses a k-NN language modeling (k-NN LM) datastore, built from last quarter's support chat transcripts, to improve next-token prediction during generation.
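Steps (1)–(2) can be sketched with a toy, self-contained retriever. The bag-of-words "embedder," the in-memory chunk list, and the chunk ids below are all invented stand-ins for the real encoder and vector DB, not the production system:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words counts as a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented release-note chunks; ids exist so answers can cite them.
CHUNKS = [
    ("RN-4.1-03", "v4.1 supports OAuth1 and OAuth2 for the legacy connector"),
    ("RN-4.2-07", "v4.2 removes OAuth1 from the legacy connector use OAuth2"),
]

def retrieve(question: str, k: int = 2):
    """Step (1): rank chunks by similarity to the embedded question."""
    q = embed(question)
    return sorted(CHUNKS, key=lambda c: cosine(q, embed(c[1])), reverse=True)[:k]

def build_prompt(question: str, chunks) -> str:
    """Step (2): insert retrieved chunks, with ids for citation, into the prompt."""
    ctx = "\n".join(f"[{cid}] {text}" for cid, text in chunks)
    return f"Context:\n{ctx}\n\nAnswer with citations:\n{question}"
```

Note that step (3), the k-NN LM datastore, operates at decoding time and is invisible in this prompt-level view, which is exactly why it matters in the incident below.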

Incident: After yesterday’s release, users ask: “Does v4.2 still support OAuth1 for the legacy connector?” The correct answer in the new release notes is “No, OAuth1 was removed in v4.2; use OAuth2.” However, the assistant often answers “Yes, OAuth1 is supported,” and sometimes even cites a retrieved chunk that mentions “OAuth2 migration,” but the generated sentence still claims OAuth1 support. Logs show: the vector DB retrieval returns a chunk explicitly stating OAuth1 removal in position #2 of the retrieved list; the k-NN LM neighbors for several tokens around “OAuth1” come mostly from older transcripts where agents repeatedly said OAuth1 was supported.
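The interaction in the logs can be reproduced with a toy k-NN LM interpolation, p(w) = λ·p_kNN(w) + (1−λ)·p_LM(w). The probabilities below are invented to mirror the incident (a grounded LM that slightly prefers the correct token, neighbors that overwhelmingly come from stale transcripts), not measured from the real system:

```python
from collections import Counter

def knn_lm_next(p_lm: dict, neighbor_tokens: list, lam: float) -> dict:
    """Interpolate the LM's next-token distribution with the empirical
    distribution over retrieved nearest-neighbor continuations."""
    counts = Counter(neighbor_tokens)
    total = sum(counts.values())
    p_knn = {w: c / total for w, c in counts.items()}
    vocab = set(p_lm) | set(p_knn)
    return {w: lam * p_knn.get(w, 0.0) + (1 - lam) * p_lm.get(w, 0.0)
            for w in vocab}

# The LM, conditioned on the retrieved chunk, leans toward the correct token:
p_lm = {"removed": 0.55, "supported": 0.45}
# But 9 of 10 neighbors come from last quarter's transcripts, where agents
# repeatedly said OAuth1 was supported:
neighbors = ["supported"] * 9 + ["removed"] * 1

mixed = knn_lm_next(p_lm, neighbors, lam=0.5)
# argmax flips to "supported": 0.5*0.9 + 0.5*0.45 = 0.675
# versus "removed":            0.5*0.1 + 0.5*0.55 = 0.325
```

This is the failure in miniature: the retrieved chunk can be correct and even cited, yet the token-level k-NN mixture drags generation back toward the stale answer.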

As the incident owner, identify the most likely root cause of the wrong answer, in terms of how text retrieval for RAG, grounding with external sources, the vector database, and the k-NN LM interact during generation. Then propose ONE concrete change (to retrieval, prompting/grounding, or k-NN LM integration) that would most directly prevent this specific failure mode while preserving the ability to cite sources.
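For calibration only (the exercise asks you to choose and defend your own fix), here is what one candidate change to the k-NN LM integration could look like: filtering neighbors by source version before building the k-NN distribution, so stale transcript entries cannot outvote current release notes. The `(token, version)` datastore format is hypothetical:

```python
from collections import Counter

def knn_dist_filtered(neighbors, min_version: tuple) -> dict:
    """neighbors: list of (token, source_version) pairs. Keep only entries
    whose source is at least as recent as the product version under
    discussion, then normalize the surviving counts."""
    kept = [tok for tok, ver in neighbors if ver >= min_version]
    counts = Counter(kept)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Nine stale v4.1-era neighbors, one current v4.2-era neighbor:
neighbors = [("supported", (4, 1))] * 9 + [("removed", (4, 2))]
p_knn = knn_dist_filtered(neighbors, min_version=(4, 2))
# Only the v4.2-era neighbor survives, so p_knn = {"removed": 1.0}
```

Because this change touches only the k-NN side, the RAG citations are unaffected.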

Updated 2026-02-06

Tags: Foundations of Large Language Models Course · Computing Sciences · Ch.2 Generative Models · Ch.3 Prompting · Ch.5 Inference
