A language model fine-tuned using feedback is in the middle of generating a response. For a single, specific token to be chosen and its quality assessed, several internal events must occur. Arrange the following events in the correct chronological order for one generation step.
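As an illustration only, here is a minimal sketch of one plausible ordering for a single generation step, assuming a policy that scores candidate tokens, a sampling step, and a separate value estimate for the updated context. The toy vocabulary and the functions `policy_logits`, `value_estimate`, and `generation_step` are hypothetical stand-ins, not part of any real library or of the course material.

```python
import math
import random

# Hypothetical toy vocabulary, for illustration only.
VOCAB = ["doing", "reading", "watching"]


def policy_logits(context):
    """Hypothetical stand-in for the policy network: one score per token.

    The scores are fixed here; a real policy would compute them from the context.
    """
    return [2.0, 1.0, 0.5]


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def value_estimate(context):
    """Hypothetical stand-in for a value (critic) model scoring a context."""
    return 0.1 * len(context)


def generation_step(context, rng):
    # 1. The policy reads the current context (the state) and scores each token.
    probs = softmax(policy_logits(context))
    # 2. One token (the action) is sampled from that distribution.
    idx = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
    token = VOCAB[idx]
    # 3. The chosen token is appended, producing the next state.
    new_context = context + [token]
    # 4. The quality of the new state is assessed.
    value = value_estimate(new_context)
    return token, probs[idx], value


rng = random.Random(0)
context = ["The", "best", "way", "to", "learn", "is", "by"]
token, prob, value = generation_step(context, rng)
print(token, round(prob, 3), round(value, 3))
```

The key design point in this sketch is that assessment comes last: the value estimate is computed on the context that already includes the sampled token, so it cannot run before sampling.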
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An RLHF-tuned language model has generated the partial sentence: 'The best way to learn is by'. The model's policy is now considering 'doing' as the next token. Which statement best analyzes the interaction of the core components at this specific moment of generation?
Diagnosing Component Outputs in Text Generation