Learn Before
Objective Function for Policy Optimization
An objective function can be formulated to guide the training of a sequence generation model, often in the context of policy optimization. It is calculated by summing the weighted log-probabilities of the policy over each step of the generated sequence. In general form it is Σ_t w_t · log π_θ(y_t | X, y_<t), where π_θ is the model's policy (a probability distribution over outputs) and w_t is the value of a function that assigns a weight or advantage to each step t.
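
The calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, not a training implementation: the function name, the per-step probabilities, and the weights are all hypothetical values chosen for the example, and a real system would obtain the probabilities from the model and the weights from a reward or advantage estimate.

```python
import math

def weighted_log_prob_objective(step_probs, weights):
    """Sum of weight * log(probability) over the steps of one generated sequence."""
    if len(step_probs) != len(weights):
        raise ValueError("need exactly one weight per generated step")
    return sum(w * math.log(p) for p, w in zip(step_probs, weights))

# Hypothetical probabilities the policy assigned to each token it generated.
step_probs = [0.5, 0.25, 0.8]
# Hypothetical per-step weights (advantages); a negative weight discourages a step.
weights = [1.0, -0.5, 2.0]

objective = weighted_log_prob_objective(step_probs, weights)
print(objective)
```

Because every log-probability is at most zero, a positive weight rewards raising the probability of that step, while a negative weight rewards lowering it when the objective is maximized.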

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Objective Function for Policy Optimization
A language model, with parameters represented by θ, is translating the English sentence 'Hello, how are you?' into French. It has already generated the partial translation 'Bonjour, comment'. The model is now deciding the next word. What does the expression π_θ('allez' | 'Hello, how are you?', 'Bonjour, comment') represent in this context?
Match each component of the policy notation π_θ(y_t | X, y_<t) to its correct description in the context of an autoregressive language model.
Appropriateness of Autoregressive Notation
Learn After
A sequence generation model is being trained to maximize this weighted log-probability objective function. The training goal is to specifically penalize the model for using repetitive phrasing. Which of the following strategies for designing the weighting function would best accomplish this?
Analysis of a Simplified Objective Function
Debugging Model Behavior via the Objective Function