Learn Before
Approximating Expected Loss with Empirical Loss
In practice, the theoretical reward model loss, which is defined as an expectation over the entire data distribution, is replaced with an empirical loss computed as a summation (average) over the collected dataset Dr. This substitution is justified when the preference pairs (x, ya, yb) in Dr are treated as uniform samples from the underlying distribution, so the expectation can be approximated directly from the available data.
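This approximation can be sketched in a few lines of Python. The reward function below is a hypothetical stand-in for a trained rϕ (any scalar-scoring function would do), and the per-pair loss follows the Bradley-Terry pairwise form, −log σ(rϕ(x, ya) − rϕ(x, yb)):

```python
import math

def reward(x, y):
    # Hypothetical stand-in for a trained reward model r_phi(x, y):
    # a toy deterministic score used purely for illustration.
    return 0.1 * len(y) + 0.05 * len(x)

def pairwise_loss(x, y_a, y_b):
    # Bradley-Terry pairwise ranking loss for one preference pair,
    # where y_a is the preferred response: -log sigma(r(x,y_a) - r(x,y_b)).
    margin = reward(x, y_a) - reward(x, y_b)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def empirical_loss(dataset):
    # Empirical loss: the average per-pair loss over the collected
    # dataset Dr, which approximates the expectation under the
    # uniform-sampling assumption.
    return sum(pairwise_loss(x, ya, yb) for x, ya, yb in dataset) / len(dataset)

# Toy dataset of (x, ya, yb) triples, with ya preferred over yb.
dataset = [
    ("what is rlhf", "a detailed helpful answer", "short"),
    ("define loss", "the objective minimized during training", "idk"),
]
print(empirical_loss(dataset))
```

Because the dataset triples are assumed to be uniform samples, the printed average is a Monte Carlo estimate of the expected loss; with more collected pairs from the same distribution, the estimate tightens.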
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Empirical Reward Model Loss Formula using Bradley-Terry Model
A reward model is trained to learn human preferences by minimizing the following loss function, which is an expectation over a preference dataset Dr:

L(ϕ) = −E_(x, ya, yb)∼Dr [log σ(rϕ(x, ya) − rϕ(x, yb))]

In this dataset, ya represents a response preferred over response yb for a given input x. What is the primary effect of successfully minimizing this loss function on the model's behavior?
Reward Model Training Diagnosis
Composition of Reward Model Parameters (ϕ)
Approximating Expected Loss with Empirical Loss
Empirical Reward Model Loss Formula
Impact of Prediction Confidence on Reward Model Loss
Learn After
A team is training a model to predict user preferences between two generated text responses. The training objective is to minimize the average loss calculated over a collected dataset of preferences. However, the data collection was flawed, resulting in a dataset that primarily contains preferences from a very specific, non-representative group of users. What is the most significant risk of using the average loss on this particular dataset as the primary metric for training the model?
From Theory to Practice: Expected vs. Empirical Loss
If a dataset used for training a preference model is extremely large, the average loss calculated over this dataset is guaranteed to be a highly accurate approximation of the theoretical loss over the entire data distribution, even if the data was collected from a narrow, specific user group.
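The point behind the last two items can be made concrete with a toy simulation (all numbers here are hypothetical). Suppose the full user population is an even mix of two groups with different preference rates, but the collected dataset only samples one group. However large the dataset grows, its average loss converges to that group's expectation, not the population's:

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy setup: the model scores response A above response B by a fixed margin.
MARGIN = 1.0
LOSS_A = -math.log(sigmoid(MARGIN))    # loss when the annotator prefers A
LOSS_B = -math.log(sigmoid(-MARGIN))   # loss when the annotator prefers B

def sample_pair_loss(p_prefer_a):
    # One preference pair: with probability p_prefer_a the annotator prefers A.
    return LOSS_A if random.random() < p_prefer_a else LOSS_B

def average_loss(p_prefer_a, n):
    # Empirical (average) loss over n sampled preference pairs.
    return sum(sample_pair_loss(p_prefer_a) for _ in range(n)) / n

# Hypothetical population: 50% of users prefer A 90% of the time,
# the other 50% prefer A only 50% of the time.
true_expectation = 0.5 * (0.9 * LOSS_A + 0.1 * LOSS_B) \
                 + 0.5 * (0.5 * LOSS_A + 0.5 * LOSS_B)

# Narrow dataset: only the p = 0.9 group, no matter how large n grows.
biased_estimate = average_loss(0.9, 100_000)
print(round(biased_estimate, 3), round(true_expectation, 3))
```

Even at n = 100,000 the empirical average settles near the narrow group's expectation and remains a biased estimate of the population loss: dataset size controls the variance of the approximation, not its sampling bias.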