Learn Before
Analyzing the Trade-off in Penalized Decoding
A language model is configured to generate creative, one-sentence story prompts. To encourage novelty, it uses the decoding objective: argmax [Pr(y|x) - λ * Penalty(x, y)], where the penalty score is higher for more common or generic prompts. For a given input, the model considers two candidate outputs:
- Output A (Common): "A young hero discovers they have magical powers."
  - Probability Pr(y|x) = 0.8; Penalty Score = 0.9
- Output B (Novel): "A sentient teapot searches for its missing lid across a desert of sugar."
  - Probability Pr(y|x) = 0.5; Penalty Score = 0.1
Analyze how the model's final choice between Output A and Output B is influenced by the value of the hyperparameter λ. Specifically, explain which output is likely to be chosen when λ is very small (e.g., 0.01) versus when it is very large (e.g., 1.0), and justify your reasoning based on the components of the objective formula.
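The trade-off can be checked numerically. The sketch below (using only the candidate values given above) scores both outputs under the objective Pr(y|x) - λ * Penalty(x, y) for a small and a large λ:

```python
# Penalized decoding score: Pr(y|x) - lambda * Penalty(x, y)
def score(prob, penalty, lam):
    return prob - lam * penalty

# (probability, penalty) pairs from the two candidate outputs
candidates = {
    "A (common)": (0.8, 0.9),  # high probability, high penalty
    "B (novel)":  (0.5, 0.1),  # lower probability, low penalty
}

for lam in (0.01, 1.0):
    for name, (p, pen) in candidates.items():
        print(f"lambda={lam}: {name} score = {score(p, pen, lam):.3f}")
    best = max(candidates, key=lambda name: score(*candidates[name], lam))
    print(f"  -> chosen: {best}")
```

With λ = 0.01 the penalty term is negligible, so Output A's higher probability dominates (0.791 vs. 0.499); with λ = 1.0 the penalty dominates and Output B wins (0.4 vs. -0.1).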
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Penalty Function in Controllable Decoding
A developer is using a language model for text summarization. The model's outputs are generally fluent but suffer from excessive repetition of certain phrases. To address this, the developer employs a decoding objective that penalizes repetition, formulated as:
argmax [Pr(y|x) - λ * Penalty(x, y)], where Penalty(x, y) increases with the amount of repetition in the output y. How should the developer adjust the hyperparameter λ to make the summaries less repetitive?
Analyzing the Trade-off in Penalized Decoding
Consider the decoding objective for controllable text generation:
ŷ = argmax [Pr(y|x) - λ * Penalty(x, y)]. If the hyperparameter λ is set to 0, the objective simplifies to finding the output with the highest conditional probability, effectively ignoring any penalty.
Greedy Search with Penalty Objective
Sampling-based Search with Penalty Objective