Essay

Selecting and Justifying a Decoding Policy for Two Production Use Cases

You are deploying the same LLM behind two internal products:

  1. A compliance assistant that drafts short, auditable policy answers. Reproducibility is required (the same input should yield the same output), and answers must not become overly long.
  2. A brainstorming assistant for product managers where novelty and variety are valued, but outputs must remain coherent and not drift into low-probability “nonsense.”

Write a recommendation memo that proposes a decoding configuration for each product. Your memo must:

  • Choose between greedy decoding, beam search, top-k sampling, and top-p (nucleus) sampling for each product, and justify the choice in terms of determinism vs. diversity and how candidate-set pruning works.
  • Specify how you would use temperature scaling in the sampling-based configuration(s) (e.g., higher/lower temperature) and explain the expected effect on the renormalized token probabilities.
  • Explain whether and how you would apply a length penalty (or length normalization) in the deterministic configuration(s), including the failure mode it is intended to prevent.
  • Explicitly discuss at least one tradeoff you are accepting in each product (e.g., quality vs. diversity, compute vs. optimality, brevity vs. completeness) and why it is appropriate for that product’s constraints.
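
For reference, the mechanisms named in the requirements above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the configuration of any particular inference library: the function names, the `alpha` parameter, and the simple `sum / length**alpha` normalization are assumptions chosen for clarity, and real frameworks use their own variants.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature.
    Temperature < 1 sharpens the distribution; > 1 flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def _renormalize(probs, keep):
    """Zero out pruned tokens and rescale the kept ones to sum to 1."""
    keep = set(keep)
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep a fixed number (k) of the most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p,
    so the candidate-set size adapts to the shape of the distribution."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

def length_normalized_score(token_logprobs, alpha=1.0):
    """Hypothetical scoring rule: summed log-probability divided by
    length**alpha. With alpha = 0 this is the raw sum, which tends to
    favor shorter hypotheses because every extra token adds a negative term."""
    return sum(token_logprobs) / (len(token_logprobs) ** alpha)
```

A memo can use helpers like these to show concrete numbers, for example comparing the probability of the top token at temperature 0.5 and 2.0, or comparing how a short and a long hypothesis rank before and after normalization.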

Updated 2026-02-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Data Science
