Case Study

Case Review: Preventing Incorrect Refund Commitments in an LLM + Payments API Assistant

You are reviewing an internal pilot of an LLM-powered customer support assistant for a subscription product. The assistant can call two external APIs:

  • get_invoice(customer_id, invoice_id) → returns line items, taxes, discounts, currency, and current payment status.
  • create_refund(invoice_id, amount, currency, reason) → executes a refund immediately and returns a refund confirmation ID.

Incident: A customer asked, “I was double-charged on invoice INV-8841—refund the extra charge.” The assistant responded confidently: “You were charged twice; I’ve refunded $49.99,” and then called create_refund(INV-8841, 49.99, "USD", "duplicate charge"). Later, finance found that the invoice was in EUR, that the apparent “double charge” was an authorization hold followed by its capture (a single payment), and that the correct action was to explain this to the customer and issue no refund. The team wants a redesign that (1) minimizes extra model calls and latency, (2) reduces the chance of executing an incorrect refund, and (3) still uses the LLM to handle ambiguous customer language.

As the reviewer, propose a single end-to-end workflow (not a list of unrelated tips) that integrates: (a) a deliberate-then-generate step, (b) a predict-then-verify mechanism with an explicit verifier, (c) self-reflection to catch overconfident claims, and (d) safe external API tool use. Your answer must specify where in the flow the model generates candidates, what the verifier checks (outcome-level vs. step-level), what evidence must be pulled from get_invoice, and the exact gating rule that prevents create_refund from being called when uncertainty or mismatches (e.g., currency or payment status) are detected.

Updated 2026-02-06

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models
