Learn Before
AI Training Feedback Strategy
A team is training an AI to generate concise summaries of long research papers. Their current method has human evaluators score each sentence the AI produces, one at a time. Despite this sentence-by-sentence feedback, the final summaries often lack logical flow and fail to capture the main argument of the original paper.
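The tension in this scenario is between dense, per-sentence rewards and a single, holistic reward for the completed summary. The sketch below is a minimal illustration of that contrast, not the team's actual setup; the scoring heuristics (sentence length, summary completeness) are hypothetical stand-ins for human judgments. It shows how a per-sentence scorer can reward each sentence in isolation while saying nothing about flow, and how a single end-of-summary score can still shape earlier steps through a discounted return.

```python
# Minimal sketch: dense per-sentence rewards vs. a sparse, summary-level reward.
# The scoring heuristics are hypothetical placeholders for human evaluation.

from typing import List


def per_sentence_rewards(sentences: List[str]) -> List[float]:
    """Dense feedback: each sentence is scored in isolation.

    Hypothetical heuristic: longer sentences score higher. A reward like this
    ignores how sentences connect, so a summary can score well sentence by
    sentence and still lack logical flow.
    """
    return [min(len(s.split()) / 20.0, 1.0) for s in sentences]


def holistic_reward(sentences: List[str]) -> float:
    """Sparse feedback: one score for the complete summary.

    Hypothetical heuristic: reward summaries that look like complete documents,
    standing in for a human judging overall coherence and coverage.
    """
    text = " ".join(sentences)
    return 1.0 if len(sentences) >= 3 and len(text.split()) >= 40 else 0.2


def returns_from_sparse_reward(num_steps: int, final_reward: float,
                               gamma: float = 0.95) -> List[float]:
    """Propagate a single end-of-episode reward back to earlier steps.

    With a sparse reward, earlier sentences receive credit only through the
    discounted return, which is how one trajectory-level score can still
    influence every generation step.
    """
    return [final_reward * gamma ** (num_steps - 1 - t) for t in range(num_steps)]


if __name__ == "__main__":
    summary = [
        "The paper studies reward design for long-form generation.",
        "It compares per-sentence scoring with document-level scoring.",
        "Document-level scores better reflect coherence and argument structure.",
    ]
    print("dense rewards :", per_sentence_rewards(summary))
    r = holistic_reward(summary)
    print("sparse reward :", r)
    print("returns       :", returns_from_sparse_reward(len(summary), r))
```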
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Effectiveness of Sparse but Informative Human Feedback in RLHF
A team is training a language model to write compelling short stories. They decide to provide a single quality score (reward) only after an entire story is completed, rather than scoring each sentence as it is generated. Which of the following statements best analyzes the primary justification for this training strategy?
AI Training Feedback Strategy
When training a language model to write a multi-paragraph summary of a document, rewarding each correctly structured sentence as it is generated is a more practical and effective approach than providing a single, holistic reward based on the quality of the final, complete summary.