Optimizing an AI Quality Scorer
Based on the scenario provided, describe the fundamental optimization goal for training the 'quality scoring' model. What kind of function is being optimized, and what does this optimization process aim to achieve with respect to the model's scores and the human preference data?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reward Model Training via Ranking Loss Minimization
A team is training a neural network to evaluate the quality of different text outputs generated in response to a prompt. The training data consists of many examples, where each example includes a prompt, a pair of generated text outputs (Output A and Output B), and a label indicating which output was preferred by a human evaluator. The network's goal is to learn to assign a single numerical score to any given output. Which of the following best describes the fundamental objective that guides the adjustment of the network's parameters during this training process?
The Role of a Loss Function in Reward Model Training
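
The setup described in the related items above (a scorer trained on human-labeled pairs of outputs by minimizing a ranking loss) can be illustrated with a small sketch. The page does not say which loss is used, so this assumes the common pairwise logistic (Bradley-Terry style) form, loss = -log(sigmoid(s_preferred - s_rejected)). The function names `pairwise_ranking_loss` and `mean_ranking_loss` are hypothetical and chosen only for illustration. Under this assumption, minimizing the loss pushes the score of the human-preferred output above the score of the rejected one.

```python
import math


def pairwise_ranking_loss(score_preferred, score_rejected):
    """Assumed pairwise logistic loss: -log(sigmoid(s_preferred - s_rejected)).

    The loss is small when the preferred output scores well above the
    rejected one, and large when the ordering is reversed.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_ranking_loss(score_pairs):
    """Average the pairwise loss over (preferred, rejected) score pairs."""
    return sum(pairwise_ranking_loss(w, l) for w, l in score_pairs) / len(score_pairs)


# Hypothetical scores for three labeled comparisons.
agrees_with_humans = [(2.0, 0.5), (1.2, -0.3), (0.8, 0.1)]
disagrees_with_humans = [(0.5, 2.0), (-0.3, 1.2), (0.1, 0.8)]

# Scores that agree with the human labels give the lower average loss.
print(mean_ranking_loss(agrees_with_humans) < mean_ranking_loss(disagrees_with_humans))
```

In actual training the scores would come from the network, and a gradient-based optimizer would adjust its parameters to lower this average loss over the preference dataset. Only the difference between the two scores matters here, so the absolute scale of the scores is not pinned down by the loss.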