Essay

Architecture Decision Memo: Unifying Vector-DB RAG and k-NN LM for a Global Policy Assistant

You are leading an architecture review for an internal “Policy & Procedures” assistant used by Legal, HR, and Finance. The assistant must (1) answer questions with citations to the exact policy passages used, (2) stay current as policies change weekly, and (3) support multi-turn chats where later turns depend on earlier answers. Your team proposes a single vector database to support both: (a) a RAG pipeline that retrieves the top-k relevant policy snippets to include in the prompt, and (b) a k-NN language modeling (k-NN LM) component that retrieves nearest-neighbor hidden-state entries to influence next-token prediction during generation.

Write a decision memo that evaluates whether using one shared vector database for both retrieval paths is a good idea. In your answer, you must:

  • Explain how text retrieval for RAG and k-NN LM differ in what they store as vectors, what the query vector represents, and what the retrieved items are used for.
  • Propose a concrete design (one shared store or two separate stores) and justify it using the grounding/citation requirement and the need for rapid updates.
  • Identify at least two failure modes that could occur if the retrieval design is wrong (e.g., correct-sounding but ungrounded answers, stale policy usage, irrelevant retrieval due to embedding mismatch), and describe how you would detect/mitigate them.

Assume you cannot fine-tune the base LLM and must rely on retrieval-time mechanisms.
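
For orientation, the contrast between the two retrieval paths named in the scenario can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed design: the class and function names, the policy IDs, the squared-distance metric, the temperature, and the interpolation weight `lam` are all hypothetical. The k-NN LM part follows the commonly described formulation in which a distribution over retrieved next tokens is interpolated with the base model's distribution.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PassageEntry:
    """RAG store entry (hypothetical): key = text embedding, value = citable passage."""
    embedding: List[float]
    policy_id: str
    text: str


@dataclass
class TokenEntry:
    """k-NN LM store entry (hypothetical): key = context hidden state, value = next token."""
    hidden_state: List[float]
    next_token: str


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance; the metric is an assumption of this sketch."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rag_retrieve(query_embedding: List[float], store: List[PassageEntry], k: int) -> List[PassageEntry]:
    """Return the k passages nearest the query embedding, to be placed in the prompt and cited."""
    return sorted(store, key=lambda e: sq_dist(query_embedding, e.embedding))[:k]


def knn_distribution(hidden_state: List[float], store: List[TokenEntry], k: int,
                     temperature: float = 1.0) -> Dict[str, float]:
    """Softmax over negative distances of the k nearest entries, summed per next token."""
    nearest = sorted(store, key=lambda e: sq_dist(hidden_state, e.hidden_state))[:k]
    weights = [math.exp(-sq_dist(hidden_state, e.hidden_state) / temperature) for e in nearest]
    total = sum(weights)
    dist: Dict[str, float] = {}
    for entry, weight in zip(nearest, weights):
        dist[entry.next_token] = dist.get(entry.next_token, 0.0) + weight / total
    return dist


def interpolate(p_lm: Dict[str, float], p_knn: Dict[str, float], lam: float) -> Dict[str, float]:
    """Mix the two next-token distributions: lam * p_knn + (1 - lam) * p_lm."""
    tokens = set(p_lm) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_lm.get(t, 0.0) for t in tokens}


# RAG path: the query is an embedded question; the result is text with a citation handle.
passages = [
    PassageEntry([1.0, 0.0], "HR-1", "Leave policy passage"),
    PassageEntry([0.0, 1.0], "FIN-2", "Expense policy passage"),
]
top = rag_retrieve([0.9, 0.1], passages, k=1)

# k-NN LM path: the query is a decoder hidden state; the result reweights next-token odds.
token_store = [
    TokenEntry([0.0, 0.0], "days"),
    TokenEntry([0.0, 0.0], "days"),
    TokenEntry([3.0, 4.0], "weeks"),
]
p_knn = knn_distribution([0.0, 0.0], token_store, k=3)
mixed = interpolate({"days": 0.5, "weeks": 0.5}, {"days": 1.0}, lam=0.25)
```

In this sketch the two stores differ in key space (text embeddings versus hidden states), in value type (passages versus single tokens), and in how the result is consumed (prompt context with a citation versus a probability adjustment). Only the RAG entries carry a `policy_id` that could back a citation, which is one reason a memo might weigh separate stores or at least separate indexes.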

Updated 2026-02-06

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.3 Prompting - Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models
