Essay

Debugging Decoding: Balancing Determinism, Diversity, and Length in a Regulated Product

You own the generation layer for an internal, regulated customer-support assistant. Two issues are reported after a model upgrade:

  1. For short answers (target: 1–2 sentences), the assistant often produces overly long, meandering responses.
  2. For longer answers (target: 6–10 sentences), the assistant is repetitive and sometimes “locks in” early to a suboptimal phrasing that later forces awkward continuations.

You are not allowed to change the model weights, only the decoding strategy and its parameters. Propose a single coherent decoding policy (you may use different settings by response-length tier, but keep the approach consistent) that addresses both issues. In your justification, explicitly explain how your choices combine:

  (a) a deterministic search method (greedy or beam search) versus a sampling method (top-k or top-p),
  (b) temperature scaling, and
  (c) a length penalty.

Your answer must describe the tradeoffs you are making (e.g., predictability vs. diversity, local vs. global sequence quality, and how length controls interact with search/sampling) and why your policy would reduce both overlong short answers and repetitive long answers in production.
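To make the three mechanisms in the prompt concrete, here is a minimal, self-contained sketch of (a) a per-tier policy table, (b) one temperature-scaled top-p sampling step, and (c) a GNMT-style length penalty as used when re-scoring beam hypotheses. All names (`POLICY`, `sample_top_p`, `length_penalized_score`) and parameter values are illustrative assumptions for this exercise, not a prescribed answer.

```python
import math
import random

# Hypothetical per-tier settings: deterministic search for short answers,
# sampling for long ones. Values are placeholders, not tuned recommendations.
POLICY = {
    # Short tier: beam search with no length reward (alpha near 0), so the
    # search's natural preference for shorter, higher-log-prob hypotheses
    # keeps 1-2 sentence answers short and predictable.
    "short": {"method": "beam", "num_beams": 4, "alpha": 0.0},
    # Long tier: temperature-scaled nucleus sampling, which injects enough
    # diversity to avoid the repetitive, locked-in continuations of search.
    "long": {"method": "top_p", "temperature": 0.8, "top_p": 0.9},
}

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_top_p(logits, temperature=0.8, top_p=0.9, rng=None):
    """One decoding step: scale logits by temperature (must be > 0), keep the
    smallest set of tokens whose probability mass reaches top_p, and sample
    from that renormalized nucleus. Returns the chosen token index."""
    rng = rng or random.Random(0)
    probs = softmax([x / temperature for x in logits])
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in nucleus)
    r = rng.random() * mass
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]

def length_penalized_score(logprob_sum, length, alpha=0.6):
    """GNMT-style length normalization for ranking beam hypotheses:
    score = sum of token log-probs / ((5 + length) / 6) ** alpha.
    alpha = 0 disables the penalty; larger alpha boosts longer hypotheses,
    offsetting beam search's bias toward short, high-log-prob sequences."""
    return logprob_sum / (((5 + length) / 6) ** alpha)
```

For example, with logits `[0.0, 5.0, 1.0]` nearly all mass sits on index 1, so the nucleus collapses to a single token and sampling becomes effectively deterministic; this is how low temperature and small `top_p` trade diversity for predictability at one extreme.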

Updated 2026-02-06

Ch.5 Inference - Foundations of Large Language Models