Learn Before
Penalty Function in Controllable Decoding
The penalty function, denoted as Penalty(x, y), defines the cost or degree to which a generated output sequence y exhibits undesirable behaviors or violates constraints given the input x. Its flexible design allows it to be implemented in two general ways: assessing the final 'surface form' of the generated text, or evaluating the internal hidden states of the large language model during the generation process.
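The two implementation routes described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the choice of repeated n-grams as the surface-form signal, and the idea of projecting hidden vectors onto a hypothetical "undesirable" direction are not taken from the source, which does not prescribe a specific formula.

```python
from collections import Counter
from typing import Sequence


def surface_penalty(y_tokens: Sequence[str], n: int = 2) -> float:
    """Surface-form penalty (illustrative): number of repeated n-grams in the output y."""
    ngrams = [tuple(y_tokens[i:i + n]) for i in range(len(y_tokens) - n + 1)]
    counts = Counter(ngrams)
    # Each occurrence beyond the first counts as one unit of penalty.
    return float(sum(c - 1 for c in counts.values()))


def hidden_state_penalty(
    hidden_states: Sequence[Sequence[float]],
    direction: Sequence[float],
) -> float:
    """Hidden-state penalty (illustrative): mean positive projection of each
    generation step's hidden vector onto a hypothetical 'undesirable' direction."""
    if not hidden_states:
        return 0.0
    total = 0.0
    for h in hidden_states:
        projection = sum(a * b for a, b in zip(h, direction))
        total += max(0.0, projection)  # only penalize alignment with the direction
    return total / len(hidden_states)
```

The first function only needs the generated text, so it can be applied to any model's output. The second needs access to the model's internal activations during generation; here they are passed in as plain lists of floats to keep the sketch self-contained.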
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Penalty Function in Controllable Decoding
A developer is using a language model for text summarization. The model's outputs are generally fluent but suffer from excessive repetition of certain phrases. To address this, the developer employs a decoding objective that penalizes repetition, formulated as:
argmax [Pr(y|x) - λ * Penalty(x, y)], where Penalty(x, y) increases with the amount of repetition in the output y. How should the developer adjust the hyperparameter λ to make the summaries less repetitive?
Analyzing the Trade-off in Penalized Decoding
Consider the decoding objective for controllable text generation:
ŷ = argmax [Pr(y|x) - λ * Penalty(x, y)]. If the hyperparameter λ is set to 0, the objective simplifies to finding the output with the highest conditional probability, effectively ignoring any penalty.
Greedy Search with Penalty Objective
Sampling-based Search with Penalty Objective
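The role of the weighting hyperparameter λ in the penalized objective from the items above can be seen in a small sketch. It scores a fixed set of candidate outputs rather than running a real search, and the candidate texts, their probabilities, and the repeated-token penalty are all made-up values chosen for illustration, not results from any model.

```python
from collections import Counter


def repetition_penalty(tokens):
    """Illustrative Penalty(x, y): number of repeated token occurrences in y."""
    counts = Counter(tokens)
    return sum(c - 1 for c in counts.values())


def penalized_argmax(candidates, lam):
    """Return the candidate text maximizing Pr(y|x) - lam * Penalty(x, y).

    candidates: list of (text, probability) pairs; probabilities are hypothetical.
    """
    def score(candidate):
        text, prob = candidate
        return prob - lam * repetition_penalty(text.split())

    return max(candidates, key=score)[0]


# Hypothetical candidates: the repetitive one has the higher model probability.
candidates = [
    ("the report says the report says sales rose", 0.6),
    ("the report says sales rose", 0.4),
]

print(penalized_argmax(candidates, lam=0.0))  # penalty ignored
print(penalized_argmax(candidates, lam=0.1))  # penalty weighted in
```

With `lam=0.0` the score is just the probability, so the higher-probability (repetitive) candidate is selected. With `lam=0.1` the repetitive candidate loses 0.1 for each of its three repeated tokens, which drops its score below that of the non-repetitive candidate.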
Learn After
Flexibility of the Penalty Function
Repetition Penalty
Length Penalty
Diversity Penalty
Constraint-based Penalty
Penalty Functions Based on Hidden States
A developer is building a system to generate empathetic and cautious responses for a customer service chatbot. To achieve this, they want to implement a penalty function that discourages the model from adopting an 'overly confident' or 'assertive' internal state during the text generation process, rather than simply penalizing specific words in the final output. Which of the following penalty function designs best aligns with this goal of operating on the model's internal representations?
Comparing Penalty Function Implementations
A team is developing a text generation model and is considering two different ways to penalize undesirable outputs. Match each proposed penalty mechanism with the implementation approach it represents.