Learn Before
Evaluating a Sparse Reward Strategy
A team is training a language model to generate short stories. The training process provides a single quality score only after the entire story is written. For all intermediate steps (e.g., after each word is generated), the score is zero. Describe one potential advantage and one potential disadvantage of this training approach.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is training a text-generation model where a single quality score is assigned only after a complete multi-sentence response is generated. For all intermediate steps (i.e., before the final word), a default score of 0 is used. The team notices the model struggles to maintain a consistent narrative thread throughout its responses. Which statement best analyzes the relationship between this scoring method and the model's behavior?
A language model is being trained using a reward model that provides a single quality score for a complete, generated response. If the model generates the four-token sequence ['The', 'cat', 'sat', '.'], which of the following reward lists best represents the standard sparse reward assignment for this process, where r_t is the reward at step t?
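The standard sparse assignment described above (zero reward at every intermediate step, with the full quality score arriving only on the final token) can be sketched in a few lines. This is an illustrative sketch only; the function name and the example score of 0.9 are assumptions, not part of the original question:

```python
def sparse_rewards(tokens, final_score):
    """Return per-step rewards: 0 for every step except the last,
    which receives the single sequence-level quality score."""
    rewards = [0.0] * len(tokens)
    rewards[-1] = final_score  # only the final step is rewarded
    return rewards

# Example with the four-token sequence from the question:
tokens = ['The', 'cat', 'sat', '.']
print(sparse_rewards(tokens, 0.9))  # [0.0, 0.0, 0.0, 0.9]
```

Under this scheme, the reward list for the question's sequence is [0, 0, 0, R], where R is the score assigned to the completed response.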