Diagnosing Flawed AI Training
A development team is refining a large language model using a two-stage training process. In the first stage, they train a 'scoring' model on a dataset of human-ranked conversations to predict which responses humans would prefer. In the second stage, they use this scoring model to provide feedback to the main language model, encouraging it to generate responses that receive high scores. After deployment, the team notices that the model often produces responses that are grammatically perfect and very polite, but are also extremely vague and non-committal, effectively avoiding answering the user's actual question. During internal testing, they confirm that the scoring model consistently gives these vague responses very high ratings. Based on this information, which of the two stages is the most likely root cause of the model's unhelpful behavior, and why?
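The two stages described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the team's actual pipeline: the two features (politeness, specificity), the preference pairs, and the function names are all hypothetical, the scoring model is a simple linear scorer fitted with a pairwise (Bradley-Terry style) loss, and stage 2 is replaced by picking the highest-scoring candidate rather than running real reinforcement learning.

```python
import math

# Hypothetical hand-made features: (politeness, specificity), each in [0, 1].
# Each pair is (preferred, rejected), as ranked by imaginary annotators who
# happened to favor polite answers over specific ones.
PREFERENCES = [
    ((0.9, 0.2), (0.3, 0.8)),
    ((0.8, 0.1), (0.4, 0.9)),
    ((0.9, 0.3), (0.2, 0.7)),
    ((0.7, 0.2), (0.5, 0.9)),
]

def score(weights, features):
    """Linear score: weighted sum of the response features."""
    return sum(w * x for w, x in zip(weights, features))

def train_scoring_model(pairs, steps=500, lr=0.5):
    """Stage 1: fit the scorer so preferred responses outscore rejected ones."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        for chosen, rejected in pairs:
            margin = score(weights, chosen) - score(weights, rejected)
            # Gradient of log(sigmoid(margin)) w.r.t. margin is 1 - sigmoid(margin).
            grad_scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(len(weights)):
                weights[i] += lr * grad_scale * (chosen[i] - rejected[i])
    return weights

def pick_response(weights, candidates):
    """Stage 2 stand-in: choose the candidate the scorer rates highest."""
    return max(candidates, key=lambda name: score(weights, candidates[name]))

weights = train_scoring_model(PREFERENCES)
candidates = {
    "vague_but_polite": (0.95, 0.05),
    "direct_and_specific": (0.40, 0.95),
}
chosen = pick_response(weights, candidates)
print(weights, chosen)
```

In this sketch the scorer ends up with a positive weight on politeness and a negative weight on specificity, so anything that optimizes against it is pushed toward the vague response, even though the optimization step itself is working as intended.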
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reward Model Learning in RLHF
A team is training a conversational agent to be more helpful. Their strategy involves having a human user interact with the agent. After each response from the agent, the human provides a numerical score indicating its quality. This score is immediately used as a signal to update the agent's internal strategy for generating the next response. This direct-feedback loop is repeated thousands of times. The team observes that this training process is prohibitively slow and costly. Based on the typical two-stage process for this kind of training, what is the most significant flaw in the team's approach?
A common method for aligning a language model with human preferences involves two major phases. Arrange the following descriptions of these phases in the correct chronological order.
A team is implementing a system to align a language model with human preferences. The process involves several distinct activities. Match each activity described below to the primary learning stage it belongs to.