Evaluating Reference Model Selection in Reward-Based Training
In a reward-based training process for a language model, a fixed 'reference model' is used to regularize the policy updates, typically through a KL-divergence penalty, preventing the main model from drifting too far from a known, stable distribution. Evaluate the trade-offs involved in choosing this reference model. Specifically, compare the potential outcomes of using the initial, pre-trained base model versus using a model that has already undergone initial instruction-based fine-tuning as the reference.
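To ground the question, here is a minimal sketch of how a frozen reference model typically enters the reward signal: the scalar reward is shaped by a per-token KL penalty between the policy and the reference, so the penalty grows as the policy drifts from the reference distribution. The names (`kl_penalized_reward`, `kl_coef`) and the function signature are illustrative assumptions, not the API of any particular RLHF library.

```python
# A minimal sketch (assumed names, not a specific library's API) of a
# KL-penalized reward, r_total = r(x, y) - kl_coef * KL(policy || reference).
import torch
import torch.nn.functional as F

def kl_penalized_reward(policy_logits, ref_logits, token_ids, reward, kl_coef=0.1):
    """Shape a scalar reward with a per-token KL penalty toward the reference.

    policy_logits, ref_logits: (seq_len, vocab) logits for the sampled response,
        from the trainable policy and the frozen reference model respectively.
    token_ids: (seq_len,) tokens actually sampled from the policy.
    reward: scalar score from the reward model.
    """
    policy_logp = F.log_softmax(policy_logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)

    # Log-probability ratio of the sampled tokens: a sample-based estimate
    # of the KL divergence between policy and reference.
    idx = token_ids.unsqueeze(-1)
    token_kl = (policy_logp.gather(-1, idx) - ref_logp.gather(-1, idx)).squeeze(-1)

    # Penalize divergence from the frozen reference distribution.
    return reward - kl_coef * token_kl.sum()
```

Note that the same mechanism applies regardless of which checkpoint is frozen as the reference: choosing the pre-trained base model anchors the policy to raw pre-training behavior, while choosing an instruction-tuned checkpoint anchors it to already-aligned behavior, which is exactly the trade-off the question asks you to evaluate.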
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
During the fine-tuning of a language model using a reward signal, a team observes that the model's outputs are becoming nonsensical, even though they receive high reward scores. The model is essentially 'gaming' the reward system. Which component in this training setup is specifically intended to mitigate this issue by penalizing the model for deviating too far from its initial, coherent language patterns?
Diagnosing Training Stagnation in a Reward-Based System