Essay

Post-Incident Analysis: Why a RAG Assistant Hallucinated Despite “Having the Docs”

You are the on-call ML engineer for an internal LLM assistant used by Sales Ops to answer questions about current pricing rules and contract clauses. The system is advertised as “RAG-powered” and uses a vector database of chunked policy documents (updated nightly). Last week, several answers were confidently wrong and could not be traced to any cited source. Logs show: (1) the retriever returned top-k snippets that were semantically similar but from an older policy version; (2) the prompt included the retrieved snippets, but the model still produced details not present in them; and (3) a teammate proposes replacing the whole approach with k-NN language modeling by storing past hidden states and next tokens from historical Q&A transcripts in a datastore.
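Failure mode (1) above — snippets that are semantically similar but come from a superseded policy version — can be caught before the prompt is assembled with a simple metadata filter on the retrieved chunks. The sketch below is illustrative only: the `Chunk` type, `policy_version` field, and `CURRENT_VERSION` constant are assumed names, not the API of any particular vector-database client.

```python
# Sketch: drop retrieved chunks whose policy version is stale, so that
# high-similarity snippets from old documents never reach the prompt.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    doc_id: str
    policy_version: str  # version tag stamped during the nightly ingest
    score: float         # similarity score from the retriever

CURRENT_VERSION = "2026-01"  # assumed: updated by the nightly ingest job

def filter_current(chunks: list[Chunk]) -> list[Chunk]:
    """Keep only chunks from the current policy version."""
    return [c for c in chunks if c.policy_version == CURRENT_VERSION]

# Example: the top-scoring chunk is from an older policy version.
retrieved = [
    Chunk("Discounts capped at 15%", "pricing.md", "2025-07", 0.92),  # stale
    Chunk("Discounts capped at 10%", "pricing.md", "2026-01", 0.88),
]
grounded = filter_current(retrieved)
```

A stronger variant pushes the same version predicate into the vector database's metadata filter so stale chunks never consume top-k slots in the first place.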

Write an incident review that (a) diagnoses the most likely failure modes across text retrieval, vector database content/indexing, and grounding behavior in the generation step; (b) proposes concrete, testable changes to the RAG pipeline (retrieval strategy, datastore/versioning, and prompting/answer-format constraints) that would reduce ungrounded claims; and (c) evaluates whether k-NN LM would actually address the root causes or introduce new risks, given that it influences next-token prediction via nearest neighbors rather than explicitly supplying verifiable source text. Your answer should make the tradeoffs explicit and include at least two specific metrics or checks you would add to detect regressions (e.g., retrieval relevance, citation faithfulness, or version freshness).
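One of the regression checks asked for in the last sentence — citation faithfulness — can be approximated cheaply with lexical overlap between each answer sentence and the retrieved snippets. This is a hedged sketch: production systems typically use an NLI model or an LLM judge instead, and the `supported` helper and its 0.5 overlap threshold are illustrative choices, not a standard.

```python
# Crude citation-faithfulness proxy: fraction of answer sentences that
# have lexical support in at least one retrieved snippet. Sentences with
# no support are exactly the "details not present in them" from the logs.

def supported(sentence: str, snippets: list[str],
              min_overlap: float = 0.5) -> bool:
    """True if >= min_overlap of the sentence's words appear in a snippet."""
    words = set(sentence.lower().split())
    if not words:
        return True
    return any(
        len(words & set(s.lower().split())) / len(words) >= min_overlap
        for s in snippets
    )

def faithfulness(answer_sentences: list[str], snippets: list[str]) -> float:
    """Fraction of answer sentences supported by the retrieved snippets."""
    if not answer_sentences:
        return 1.0
    hits = sum(supported(s, snippets) for s in answer_sentences)
    return hits / len(answer_sentences)

# Example: one grounded sentence, one ungrounded (hallucinated) sentence.
snips = ["the discount cap is 10 percent for enterprise deals"]
answer = ["the discount cap is 10 percent", "payment terms are net 90"]
score = faithfulness(answer, snips)
```

Tracked over time, a drop in this score (or in a retrieval-relevance metric such as recall@k against a labeled query set) is an early signal that ungrounded claims are creeping back in.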


Updated 2026-02-06


