Learn Before
Diagnosing Issues in a Chatbot Training Pipeline
Based on the training methodology and observed issues described in the case study, identify and explain the two primary limitations of using absolute, independent scores for feedback. Connect each limitation directly to one of the problems the team is facing.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing Issues in a Chatbot Training Pipeline
A team trains a reward model using a pointwise method where human annotators assign an absolute quality score from 1 to 10 to each generated text. The team finds that the final language model, trained using this reward model, performs poorly on prompts that differ even slightly from the training data. Which statement best analyzes the fundamental reason for this poor generalization?
A reward model is trained using a method where human annotators assign an absolute quality score to each response. The model's high sensitivity to disagreements among annotators is primarily a result of the regression algorithm's inherent difficulty in processing a wide numerical range of scores.