Analysis of Policy Alignment with Preference Data
Based on the case study provided, analyze whether the current policy π_θ is well-aligned with the preference expressed in this data point. Justify your conclusion by explaining how the probabilities of the preferred and dispreferred responses have changed relative to the reference policy and how this impacts the training objective.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model is being fine-tuned using a dataset of prompts (x), preferred responses (y_a), and dispreferred responses (y_b). The training objective is to minimize the following loss function:

L(θ) = −E_{(x, y_a, y_b)} [ log σ( β log (π_θ(y_a|x) / π_ref(y_a|x)) − β log (π_θ(y_b|x) / π_ref(y_b|x)) ) ]

In this framework, the probability that response y_a is preferred over y_b, denoted as p(y_a ≻ y_b | x), is computed directly from the likelihoods of each response under the current policy being trained and a fixed reference policy.
Based on this formulation, what is the most significant advantage of this training approach?
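The per-example loss described above can be made concrete with a small numeric sketch. The function below is an illustration, not a reference implementation; the β value and the log-probabilities are toy numbers chosen so that the policy favors y_a more strongly than the reference does.

```python
import math

def dpo_loss(logp_theta_a, logp_theta_b, logp_ref_a, logp_ref_b, beta=0.1):
    """Per-example preference loss: -log sigmoid(beta * margin),
    where the margin compares how much more the policy favors y_a
    over y_b relative to the reference policy."""
    margin = beta * ((logp_theta_a - logp_ref_a) - (logp_theta_b - logp_ref_b))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy values (assumed): the policy has raised the likelihood of y_a
# and lowered that of y_b relative to the reference.
loss = dpo_loss(logp_theta_a=-2.0, logp_theta_b=-4.0,
                logp_ref_a=-3.0, logp_ref_b=-3.0, beta=0.5)
```

A larger positive margin drives the loss toward zero, which is the sense in which minimizing this loss pushes the policy to prefer y_a over y_b more than the reference does.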
Consider a single data point (x, y_a, y_b) from a preference dataset, where y_a is the preferred response and y_b is the dispreferred response. In a training framework that directly optimizes a policy π_θ against a fixed reference policy π_ref by maximizing the log-probability of the preference data, if the policy π_θ currently assigns an equal likelihood to both responses (i.e., π_θ(y_a|x) = π_θ(y_b|x)), the loss contribution from this data point will be zero.

Analysis of Policy Alignment with Preference Data
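The equal-likelihood claim above can be checked numerically. A minimal sketch, assuming toy log-probabilities where the reference policy also assigns equal likelihood to both responses:

```python
import math

beta = 0.1
# Assumed toy values: equal likelihood under both the policy and the reference.
logp_theta_a = logp_theta_b = logp_ref_a = logp_ref_b = -3.0

margin = beta * ((logp_theta_a - logp_ref_a) - (logp_theta_b - logp_ref_b))
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
# margin = 0, so the loss is -log sigmoid(0) = log 2 ≈ 0.693 — nonzero.
```

Equal likelihoods under π_θ make the margin zero (given equal reference likelihoods), and -log σ(0) = log 2, so the data point still contributes a nonzero loss.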
Comparison of DPO and RLHF Loss Functions