Learn Before
A research team is training a model to score the quality of text responses. The training data consists of pairs of responses; in each pair, one response is labeled 'better' than the other. The model's objective is to assign a higher score to the 'better' response in every pair. The team trains two models, Model A and Model B, and discovers that their internal parameters differ significantly. Nevertheless, both models achieve 100% accuracy on the training data, assigning the higher score to the 'better' response in every pair. What fundamental principle of model training does this outcome best demonstrate?
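For intuition, here is a minimal sketch (using hypothetical toy scores, not the team's data or models) of how two scoring functions with different parameters can agree on every pairwise ranking: any strictly increasing transform of one model's scores preserves all pairwise orderings, so a pairwise preference objective pins down only the ordering of scores, not the parameters that produce them.

```python
# Minimal sketch (hypothetical toy data): pairwise preference data
# underdetermines a reward model, because any strictly increasing
# transform of the scores preserves every pairwise ranking.

def reward_a(score: float) -> float:
    """Model A: identity scoring (stand-in for one set of parameters)."""
    return score

def reward_b(score: float) -> float:
    """Model B: a different parameterization -- a strictly increasing
    (here, affine) transform of Model A's scores."""
    return 2.0 * score + 3.0

# Toy preference pairs: (better_response_score, worse_response_score).
# The raw scores are illustrative latent qualities, not real data.
pairs = [(0.9, 0.1), (0.7, 0.3), (0.55, 0.5)]

for name, reward in [("Model A", reward_a), ("Model B", reward_b)]:
    # Accuracy = fraction of pairs where the 'better' response scores higher.
    correct = sum(reward(better) > reward(worse) for better, worse in pairs)
    print(f"{name}: {correct}/{len(pairs)} pairs ranked correctly")

# Both models rank every pair correctly (100% training accuracy) even
# though their outputs -- and, in a real network, their parameters --
# differ: the objective constrains only the order of the scores.
```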
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Role of Regularization in Mitigating Reward Model Underdetermination
Reward Transformation Formula
Analyzing Reward Model Discrepancies
Explaining Score Discrepancies in Trained Models