Learn Before
Analyzing the Trade-off in Penalized Decoding
A language model is configured to generate creative, one-sentence story prompts. To encourage novelty, it uses the decoding objective: argmax [Pr(y|x) - λ * Penalty(x, y)], where the penalty score is higher for more common or generic prompts. For a given input, the model considers two candidate outputs:
- Output A (Common): "A young hero discovers they have magical powers."
  - Probability Pr(y|x) = 0.8; Penalty Score = 0.9
- Output B (Novel): "A sentient teapot searches for its missing lid across a desert of sugar."
  - Probability Pr(y|x) = 0.5; Penalty Score = 0.1
Analyze how the model's final choice between Output A and Output B is influenced by the value of the hyperparameter λ. Specifically, explain which output is likely to be chosen when λ is very small (e.g., 0.01) versus when it is very large (e.g., 1.0), and justify your reasoning based on the components of the objective formula.
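The trade-off can be checked numerically. The sketch below (using only the candidate values given above) scores both outputs under the objective Pr(y|x) - λ * Penalty(x, y) for a small and a large λ:

```python
# Penalized decoding score: Pr(y|x) - lambda * Penalty(x, y)
def score(prob, penalty, lam):
    return prob - lam * penalty

# (probability, penalty) pairs from the two candidate outputs
candidates = {
    "A (common)": (0.8, 0.9),  # high probability, high penalty
    "B (novel)":  (0.5, 0.1),  # lower probability, low penalty
}

for lam in (0.01, 1.0):
    for name, (p, pen) in candidates.items():
        print(f"lambda={lam}: {name} score = {score(p, pen, lam):.3f}")
    best = max(candidates, key=lambda name: score(*candidates[name], lam))
    print(f"  -> chosen: {best}")
```

With λ = 0.01 the penalty term is negligible, so Output A's higher probability dominates (0.791 vs. 0.499); with λ = 1.0 the penalty dominates and Output B wins (0.4 vs. -0.1).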
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Penalty Function in Controllable Decoding
A developer is using a language model for text summarization. The model's outputs are generally fluent but suffer from excessive repetition of certain phrases. To address this, the developer employs a decoding objective that penalizes repetition, formulated as:
argmax [Pr(y|x) - λ * Penalty(x, y)], where Penalty(x, y) increases with the amount of repetition in the output y. How should the developer adjust the hyperparameter λ to make the summaries less repetitive?
Analyzing the Trade-off in Penalized Decoding
Consider the decoding objective for controllable text generation:
ŷ = argmax [Pr(y|x) - λ * Penalty(x, y)]. If the hyperparameter λ is set to 0, the objective simplifies to finding the output with the highest conditional probability, effectively ignoring any penalty.
Greedy Search with Penalty Objective
Sampling-based Search with Penalty Objective