Decoding Objective with Penalty Term
To make text generation more controllable, the standard decoding objective can be extended with a penalty term. The extended objective seeks the output sequence that maximizes the conditional probability while penalizing or rewarding specific behaviors, as measured by a penalty function. The general form of this objective is:
ŷ = argmax_y [Pr(y|x) - λ * Penalty(x, y)]
Here, Penalty(x, y) measures how much the sequence y violates the desired constraints given the input x, and λ determines the strength of the penalty.
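
As a rough illustration of the objective Pr(y|x) - λ * Penalty(x, y), the sketch below rescores a small, fixed set of candidate outputs. It rests on several assumptions that do not come from the source: the candidate probabilities are invented toy values, `repetition_penalty` is a hypothetical penalty that counts repeated bigrams, and `penalized_argmax` is an illustrative helper name. In practice the argmax ranges over a very large space of sequences and is approximated by a search procedure rather than enumerated.

```python
from collections import Counter


def repetition_penalty(x, y):
    """Hypothetical Penalty(x, y): count of repeated bigram occurrences in y.

    The input x is accepted for interface consistency but unused here.
    """
    tokens = y.split()
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Each occurrence of a bigram beyond its first adds 1 to the penalty.
    return sum(count - 1 for count in bigram_counts.values())


def penalized_argmax(x, candidates, lam, penalty=repetition_penalty):
    """Return the candidate y maximizing Pr(y|x) - lam * Penalty(x, y).

    `candidates` maps each candidate output y to an assumed Pr(y|x).
    """
    return max(candidates, key=lambda y: candidates[y] - lam * penalty(x, y))


# Toy probabilities, invented for illustration only.
x = "news article text"
candidates = {
    "the study showed the study showed a link": 0.40,
    "the study showed a link": 0.35,
}

# lam = 0 ignores the penalty and picks the most probable candidate.
best_unpenalized = penalized_argmax(x, candidates, lam=0.0)
# A larger lam shifts the choice toward the less repetitive candidate.
best_penalized = penalized_argmax(x, candidates, lam=0.1)
```

With these toy numbers, setting `lam=0.0` selects the more probable but repetitive candidate, while `lam=0.1` selects the non-repetitive one, which shows how λ trades probability against constraint violation.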

Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Decoding Objective with Penalty Term
A language model is being used to generate one-sentence summaries of news articles. The initial outputs are often too long and contain repetitive phrases (e.g., 'The study showed the research indicated that...'). To improve the quality of the summaries, a penalty term is added to the decoding process. Which of the following penalty strategies would be most effective at addressing both of the identified issues?
Evaluating a Penalty Term for Creative Writing
A language model is exhibiting several undesirable behaviors during text generation. Match each problem with the penalty term specifically designed to mitigate it.
Learn After
Penalty Function in Controllable Decoding
A developer is using a language model for text summarization. The model's outputs are generally fluent but suffer from excessive repetition of certain phrases. To address this, the developer employs a decoding objective that penalizes repetition, formulated as:
argmax [Pr(y|x) - λ * Penalty(x, y)], where Penalty(x, y) increases with the amount of repetition in the output y. How should the developer adjust the hyperparameter λ to make the summaries less repetitive?
Analyzing the Trade-off in Penalized Decoding
Consider the decoding objective for controllable text generation:
ŷ = argmax [Pr(y|x) - λ * Penalty(x, y)]. If the hyperparameter λ is set to 0, the objective simplifies to finding the output with the highest conditional probability, effectively ignoring any penalty.
Greedy Search with Penalty Objective
Sampling-based Search with Penalty Objective