Learn Before
An AI team is fine-tuning a language model to write compelling short stories. The model generates a story one token at a time. However, they find the model's outputs are becoming repetitive and nonsensical. Their current process involves having a reward model evaluate the entire 500-token story only after it is fully completed, providing a single quality score at the very end. Which of the following best explains why this training setup is failing?
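The failure mode this question points at is easiest to see in code. Below is a minimal runnable sketch, under illustrative assumptions (a 500-token story, a single terminal score of 0.3, undiscounted returns), of how one end-of-sequence score propagates back to every token position; none of these numbers come from the card itself.

```python
# A minimal sketch (not the card's answer key) of the sparse-reward setup the
# question describes: one quality score arrives only after the final token.
# sequence_length and terminal_reward are illustrative assumptions, and the
# return is left undiscounted for simplicity.

sequence_length = 500    # tokens in one generated story
terminal_reward = 0.3    # single end-of-story score from the reward model

# Per-token reward: zero at every step except the last.
rewards = [0.0] * (sequence_length - 1) + [terminal_reward]

# Undiscounted return-to-go at each token position.
returns, running = [], 0.0
for r in reversed(rewards):
    running += r
    returns.append(running)
returns.reverse()

# Every token position receives the identical return, so a policy-gradient
# update cannot tell the tokens that made the story repetitive apart from
# the ones that helped it.
print(set(returns))  # {0.3}
```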
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Parameterization of the LLM Policy
A language model is being trained to generate helpful and harmless responses using feedback from a separate quality-assessment model. Arrange the following events into the correct chronological sequence for a single iteration of this training loop. (A toy sketch of one such iteration appears after this list.)
In the iterative process of refining a language model using feedback, different components of the model's operation correspond to formal concepts from learning theory. Match each formal concept to its specific implementation in this language-model training scenario. (The sketch after this list notes the usual correspondences in its comments.)
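For orientation, here is a minimal, self-contained sketch of one iteration of such a feedback loop, with comments noting the reinforcement-learning correspondences the matching card alludes to. Every name and number below is a hypothetical toy stand-in, not an excerpt from the cards or their answer keys.

```python
import random

random.seed(0)

# Hypothetical toy stand-ins: the "policy" is a biased coin over two canned
# replies, and the "reward model" prefers the longer one.
def generate(policy, prompt):
    # The policy: the LLM's distribution over outputs (actions = emitted tokens).
    return "a longer reply" if random.random() < policy["p_long"] else "short"

def reward_model(prompt, response):
    # The reward: a scalar quality score for the generated response.
    return len(response) / 20.0

def training_iteration(policy, prompts):
    prompt = random.choice(prompts)          # 1. a prompt is sampled (initial state)
    response = generate(policy, prompt)      # 2. the current policy generates a response
    reward = reward_model(prompt, response)  # 3. the quality-assessment model scores it
    if "longer" in response:                 # 4. the policy is updated toward
        policy["p_long"] = min(1.0, policy["p_long"] + 0.1 * reward)  # higher reward
    return reward

policy = {"p_long": 0.5}
for _ in range(3):
    training_iteration(policy, ["Write a helpful, harmless reply."])
print(policy)  # the coin drifts toward the higher-reward reply
```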