Case Study: Shipping a Tool-Using LLM Assistant with Built-In Verification Under Latency Constraints

You are the product owner for an internal LLM assistant used by Customer Operations to answer: (1) “Where is order #12345 right now?” and (2) “Can I promise delivery by Friday?” The assistant can call two external APIs: get_tracking(order_id) (returns the latest scan location and timestamp) and get_inventory(sku, warehouse) (returns the available-to-promise quantity). In a recent incident, the assistant confidently promised Friday delivery based on a stale tracking scan and an incorrect assumption about inventory allocation rules. Leadership now requires: (a) fewer than two API calls per user request on average, (b) a measurable reduction in incorrect commitments, and (c) an auditable record of why the assistant made a commitment.
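The case study does not specify what the two APIs return beyond a one-line description. A minimal sketch of plausible return types can make the design questions concrete; all field names here are assumptions, not a documented schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TrackingScan:
    # Hypothetical return shape for get_tracking(order_id);
    # field names are illustrative assumptions.
    order_id: str
    location: str
    scanned_at: datetime  # timestamp of the *latest* scan; may be stale

@dataclass
class InventoryATP:
    # Hypothetical return shape for get_inventory(sku, warehouse).
    sku: str
    warehouse: str
    available_to_promise: int

# Example records the assistant might receive:
scan = TrackingScan("12345", "DC-PHX outbound dock",
                    datetime(2026, 2, 5, 14, 30, tzinfo=timezone.utc))
atp = InventoryATP("SKU-881", "PHX-1", available_to_promise=12)
```

Keeping the timestamp explicit in the tracking type matters later: the verifier needs it to detect exactly the kind of stale scan that caused the incident.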

As the designer, propose a single end-to-end inference workflow that integrates: deliberate analysis before answering, self-reflection, predict-then-verify with a verifier, and tool use with the APIs above. Your answer must specify (i) when and why the model calls each API (or chooses not to), (ii) what the model generates as multiple “candidates” (what varies across candidates), (iii) what the verifier checks and what evidence it uses (including how it handles stale timestamps), and (iv) what the final user-facing response should contain to be auditable while minimizing overconfident promises.
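One way to make the predict-then-verify requirement concrete is a small rule-based gatekeeper: each candidate proposes an answer and (optionally) a delivery commitment, and the verifier rejects any commitment not backed by a sufficiently fresh tracking scan and positive available-to-promise inventory, recording its reasons for the audit log. This is a minimal sketch under assumed thresholds and data shapes; nothing here (the 24-hour freshness window, the evidence keys, the Candidate/Verdict types) is prescribed by the case study:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

MAX_SCAN_AGE = timedelta(hours=24)  # assumed freshness threshold

@dataclass
class Candidate:
    answer: str                # user-facing draft response
    commits_to_friday: bool    # does this candidate make a promise?
    evidence: dict             # e.g. {"scanned_at": datetime, "atp": int}

@dataclass
class Verdict:
    approved: bool
    reasons: list = field(default_factory=list)  # auditable rationale

def verify(cand: Candidate, now: datetime) -> Verdict:
    """Rule-based verifier: a delivery commitment must cite a fresh
    tracking scan and positive available-to-promise inventory."""
    reasons = []
    if cand.commits_to_friday:
        scanned_at = cand.evidence.get("scanned_at")
        if scanned_at is None or now - scanned_at > MAX_SCAN_AGE:
            reasons.append("tracking scan missing or stale")
        if cand.evidence.get("atp", 0) <= 0:
            reasons.append("no available-to-promise inventory")
    return Verdict(approved=not reasons,
                   reasons=reasons or ["checks passed"])
```

In use, the assistant would generate a few candidates (e.g. one that commits and one that hedges), run each through `verify`, surface the first approved candidate, and attach the `Verdict.reasons` list to the response as the audit record; if every committing candidate is rejected, it falls back to a hedged answer that states what evidence was missing.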


Updated 2026-02-06

Tags

Ch.3 Prompting - Foundations of Large Language Models
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences