Case Study: Preventing Hallucinated Compliance Claims in an API-Enabled LLM for Vendor Risk Reviews

You are designing an internal LLM assistant used by procurement and security teams to draft vendor risk review summaries. The assistant can call two internal APIs during inference: (1) get_vendor_certifications(vendor_id), which returns a list of current certifications with expiry dates, and (2) get_latest_security_incidents(vendor_id), which returns incident summaries from the last 12 months. A recent near-miss occurred: the assistant confidently wrote, “Vendor X is SOC 2 Type II certified through next year and has had no security incidents,” when in fact the certification had expired two months earlier and a medium-severity incident had occurred last quarter. Leadership requires a redesign that (a) reduces the chance of false claims, (b) keeps median response time under 8 seconds, and (c) produces an auditable trail showing why the final statements were made.
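For concreteness, here is a minimal sketch of what grounding a certification claim against the first API's output could look like. The Certification fields and the is_certification_current helper are hypothetical, invented for illustration; the scenario above does not specify the response schema.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical response shape; the real API schema is not specified above.
@dataclass
class Certification:
    name: str          # e.g. "SOC 2 Type II"
    expires_on: date   # expiry date returned by get_vendor_certifications

def is_certification_current(certs: list[Certification], name: str) -> bool:
    """True only if a certification with this name exists and is unexpired.

    Grounding the near-miss claim this way would have blocked it: a
    "SOC 2 Type II" entry existed, but its expires_on was two months past.
    """
    return any(c.name == name and c.expires_on >= date.today() for c in certs)
```

Under this design, a compliance sentence would be allowed into the draft only if a check like this passes against the live API response.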

Propose a single end-to-end inference workflow (not training) that integrates deliberate-then-generate self-reflection, predict-then-verify with a verifier, and tool use with the two APIs above. Your answer must specify (i) when and how many candidate drafts are generated, (ii) what the verifier checks (and whether it is outcome-based, process-based, or both), (iii) how API results are used to ground or block claims, and (iv) how the workflow meets the 8-second latency constraint while still improving reliability. Be concrete about the sequence of steps and the key tradeoffs you are making.
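To make the moving parts concrete before you answer, here is one hedged sketch of how such a workflow could be wired together. Every function here (get_vendor_certifications, get_latest_security_incidents, llm_draft, llm_verify) is a stub or placeholder invented for illustration; this is one possible shape, not the expected answer, and it deliberately leaves the tradeoff analysis (choice of k, verifier design, latency math) to you.

```python
import concurrent.futures as cf

# Stubs standing in for the internal APIs and model calls described above;
# a real system would replace these with actual clients.
def get_vendor_certifications(vendor_id: str) -> list[dict]:
    return []  # stub: returns current certifications with expiry dates

def get_latest_security_incidents(vendor_id: str) -> list[dict]:
    return []  # stub: returns incident summaries from the last 12 months

def llm_draft(evidence: dict) -> str:
    return "draft summary"  # stub: deliberate-then-generate draft call

def llm_verify(draft: str, evidence: dict) -> dict:
    # stub: a verifier that checks each claim in the draft against evidence
    return {"all_claims_grounded": False, "failed_claims": ["(stub)"]}

def vendor_review_workflow(vendor_id: str, k: int = 2) -> dict:
    """One possible sequence: tools first, then draft, then verify.

    (1) Call both APIs in parallel so tool latency is the max, not the
        sum, of the two calls, which helps meet the latency budget.
    (2) Generate up to k candidate drafts, each grounded in the fetched
        evidence rather than corrected after the fact.
    (3) Verify each draft: outcome-based (do claims match the evidence?)
        and/or process-based (does each claim cite an evidence field?).
    (4) Return the first passing draft plus the evidence and verifier
        verdicts, which together form the auditable trail.
    """
    with cf.ThreadPoolExecutor() as pool:
        certs = pool.submit(get_vendor_certifications, vendor_id)
        incidents = pool.submit(get_latest_security_incidents, vendor_id)
        evidence = {"certs": certs.result(), "incidents": incidents.result()}

    audit = []
    for _ in range(k):
        draft = llm_draft(evidence)
        verdict = llm_verify(draft, evidence)
        audit.append(verdict)
        if verdict["all_claims_grounded"]:
            return {"summary": draft, "evidence": evidence, "audit": audit}

    # No draft passed: fall back rather than emit an unverified claim.
    return {"summary": None, "evidence": evidence, "audit": audit,
            "status": "escalate to human review"}
```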
