Learn Before
A team is developing a large language model aligned with human preferences using a reinforcement learning approach. Arrange the following key phases of their training pipeline into the correct chronological order.
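For reference, the canonical RLHF pipeline runs in a fixed order: supervised fine-tuning of a pretrained model on human-written demonstrations, then reward-model training on human preference comparisons, then RL optimization of the policy against that reward model. Below is a minimal Python sketch of that ordering; every function name is a hypothetical placeholder, not any specific library's API.

```python
# Minimal sketch of the standard three-phase RLHF ordering, assuming the
# usual pipeline (pretraining -> SFT -> reward model -> RL fine-tuning).
# All names here are illustrative placeholders.

def supervised_fine_tune(pretrained_lm, demonstrations):
    """Phase 1: fine-tune the pretrained LM on human-written demonstrations."""
    return pretrained_lm  # placeholder: returns the (notionally updated) model

def train_reward_model(sft_model, preference_pairs):
    """Phase 2: fit a reward model to human preference comparisons."""
    return "reward_model"  # placeholder

def rl_fine_tune(sft_model, reward_model):
    """Phase 3: optimize the policy (e.g., with PPO) against the reward model."""
    return sft_model  # placeholder: returns the aligned policy

def rlhf_pipeline(pretrained_lm, demonstrations, preference_pairs):
    sft_model = supervised_fine_tune(pretrained_lm, demonstrations)
    reward_model = train_reward_model(sft_model, preference_pairs)
    return rl_fine_tune(sft_model, reward_model)
```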
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Model Initialization Strategy in RLHF
Diagnosing a Flawed Alignment Process
An AI development team is in the final stage of a three-part alignment process for their language model. They observe that the model's outputs are becoming increasingly nonsensical, even though the reward scores assigned during this final stage are consistently high. The team has already confirmed that the initial models were set up correctly and that the dataset of human preferences used in the second stage is high-quality. Based on this information, what is the most probable cause of the model's deteriorating performance?
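For context on the failure mode this question probes: the RL stage of RLHF typically adds a KL penalty that keeps the policy close to the reference (SFT) model. If that penalty is missing or its coefficient is too small, the policy can over-optimize, or "hack", the reward model, earning high reward scores while producing degenerate text. A minimal sketch of the KL-penalized reward follows, assuming per-token log-probabilities are available; all names are illustrative.

```python
# Sketch of the KL-regularized reward used in the RL stage of RLHF:
#   r_total = r_RM(x, y) - beta * KL(pi_policy || pi_reference)
# With beta too small (or the penalty absent), the policy can drift far
# from the reference model and "hack" the reward model. Names are
# illustrative, not a specific library's API.

def kl_penalized_reward(rm_score, policy_logprobs, ref_logprobs, beta=0.1):
    """Combine the reward-model score with a sequence-level KL penalty."""
    # Monte Carlo estimate of KL(pi_policy || pi_ref) over the sampled tokens.
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return rm_score - beta * kl_estimate

# Example: a high reward-model score is discounted when the policy's
# token log-probs diverge sharply from the reference model's.
print(kl_penalized_reward(
    rm_score=8.0,
    policy_logprobs=[-0.1, -0.2, -0.1],
    ref_logprobs=[-2.5, -3.0, -2.8],
))
```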