Essay

Design Review: Combining Tool Use, DTG, and Predict-then-Verify for a High-Stakes API Workflow

You are reviewing a proposed architecture for an internal LLM assistant used by Finance Operations to (1) draft a vendor-payment approval note and (2) optionally trigger an external API call create_payment(vendor_id, amount, invoice_id) that will schedule a real payment. The team has observed two failure modes: (a) the model sometimes hallucinates invoice details when the user’s message is incomplete, and (b) when the model does call the API, it occasionally chooses the wrong invoice_id among several similar open invoices.
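One way to make the required arguments of create_payment explicit is to gate every proposed call before anything reaches the real API. The sketch below assumes the model's proposed call arrives as a Python dict. `REQUIRED_ARGS` and `gate_payment_call` are hypothetical names for illustration, not part of any existing system.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical: the arguments create_payment requires, taken from the scenario.
REQUIRED_ARGS = ("vendor_id", "amount", "invoice_id")


def gate_payment_call(proposed: dict) -> dict:
    """Decide whether a model-proposed create_payment call may proceed to verification.

    This gate never calls the payment API. Missing or malformed arguments produce an
    'ask_user' decision instead of letting the model fill the gap itself."""
    missing = [name for name in REQUIRED_ARGS if proposed.get(name) in (None, "")]
    if missing:
        return {"decision": "ask_user", "missing": missing}
    try:
        amount = Decimal(str(proposed["amount"]))
    except InvalidOperation:
        return {"decision": "ask_user", "invalid": ["amount"]}
    if not amount.is_finite() or amount <= 0:
        return {"decision": "ask_user", "invalid": ["amount"]}
    # Normalise the amount to Decimal so later checks compare exact values.
    return {"decision": "ready_for_verification", "args": {**proposed, "amount": amount}}
```

This gate only catches absent or malformed arguments. A plausible but hallucinated invoice_id would still pass it, so it complements a verifier that checks arguments against the invoice system rather than replacing one.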

Write a design critique and improvement plan that integrates: (i) deliberate-then-generate prompting (the model must first surface likely error types and uncertainties before drafting the approval note), (ii) a predict-then-verify strategy that generates multiple candidate action plans (including whether to call the API at all) and selects among them, (iii) an explicit verifier component (describe what it checks and whether it is outcome-based, process-based, or both), and (iv) safe tool use with the external API (describe gating, required arguments, and what happens when required data is missing).
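A predict-then-verify selector might look like the sketch below. It assumes the model has already been sampled several times and each sample parsed into a plan. `CandidatePlan`, `select_plan`, the action names, and the `min_votes` agreement threshold are hypothetical choices. The verifier is passed in as a function. It can therefore combine outcome-based checks on the final arguments with process-based checks on how the plan was reached, such as whether the deliberate step left uncertainties unresolved.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CandidatePlan:
    # Hypothetical shape of one sampled plan after parsing the model's output.
    action: str                                    # "call_api", "draft_only", or "ask_user"
    args: dict = field(default_factory=dict)       # proposed create_payment arguments, if any
    uncertainties: List[str] = field(default_factory=list)  # raised in the deliberate step


# A verifier returns failure reasons for a plan; an empty list means the plan passes.
Verifier = Callable[[CandidatePlan], List[str]]


def select_plan(candidates: List[CandidatePlan], verify: Verifier,
                min_votes: int = 2) -> CandidatePlan:
    """Choose among sampled plans, preferring an API call only when it is well supported."""
    passing = [c for c in candidates if not verify(c)]
    # Process check: an API plan that still lists open uncertainties is not eligible.
    api_plans = [c for c in passing if c.action == "call_api" and not c.uncertainties]
    if api_plans:
        distinct_args = {tuple(sorted(c.args.items())) for c in api_plans}
        if len(distinct_args) > 1:
            # Verified candidates disagree (e.g. on invoice_id): ask rather than guess.
            return CandidatePlan("ask_user", uncertainties=["candidates disagree on arguments"])
        if len(api_plans) >= min_votes:
            return api_plans[0]
    # Otherwise draft the note without scheduling a payment, if any draft plan passed.
    drafts = [c for c in passing if c.action == "draft_only"]
    if drafts:
        return drafts[0]
    return CandidatePlan("ask_user", uncertainties=["no candidate plan passed verification"])
```

Requiring several independently sampled plans to agree before calling the API trades extra latency and token cost for lower risk. Disagreement among verified candidates is treated as a signal to ask the user, not as a tie to break.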

In your answer, explain the tradeoffs you are making (latency, cost, and risk), and give at least two concrete examples of verifier checks that would specifically reduce the two observed failure modes without relying on the model’s pre-trained knowledge alone.
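Two verifier checks that rely on the finance system's own records, rather than on the model's knowledge, could be sketched as follows. Several details are assumptions made for the sketch: the record layout (`invoice_id`, `vendor_id`, `amount`, `po_number`), the idea that the draft note's claims are first extracted into a dict, and both function names. The first check targets hallucinated invoice details. The second targets choosing the wrong invoice_id among similar open invoices.

```python
from decimal import Decimal
from typing import Dict, List

# Hypothetical invoice record as retrieved from the finance system of record.
Invoice = Dict[str, object]


def check_note_against_record(claimed: Dict[str, object], record: Invoice) -> List[str]:
    """Failure mode (a): every invoice detail the draft note asserts must match the
    retrieved record; a field the record lacks counts as unsupported, not trusted."""
    failures = []
    for name, value in claimed.items():
        if name not in record:
            failures.append(f"unsupported field in note: {name}")
        elif record[name] != value:
            failures.append(f"{name}: note says {value!r}, record says {record[name]!r}")
    return failures


def check_invoice_choice(args: Dict[str, object], open_invoices: List[Invoice],
                         user_hints: Dict[str, object]) -> List[str]:
    """Failure mode (b): the chosen invoice_id must be the single open invoice for this
    vendor consistent with everything the user actually stated."""
    vendor_invoices = [inv for inv in open_invoices if inv["vendor_id"] == args["vendor_id"]]
    matches = [inv for inv in vendor_invoices
               if all(inv.get(k) == v for k, v in user_hints.items())]
    if len(matches) != 1:
        return [f"{len(matches)} open invoices match the user's details; ask which one"]
    chosen = matches[0]
    failures = []
    if chosen["invoice_id"] != args["invoice_id"]:
        failures.append(f"plan picked {args['invoice_id']}, evidence points to {chosen['invoice_id']}")
    # Compare amounts as Decimal so "1200" and "1200.00" are treated as equal.
    if Decimal(str(chosen["amount"])) != Decimal(str(args["amount"])):
        failures.append(f"amount {args['amount']} != open balance {chosen['amount']}")
    return failures
```

In both checks the reference data comes from retrieval. An invented invoice detail therefore fails instead of being trusted, and an ambiguous request yields a clarification question instead of a guessed invoice_id.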


Updated 2026-02-06


Tags: Ch.3 Prompting - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Ch.5 Inference - Foundations of Large Language Models
