Learn Before
Calculating a Missing Segment Score
A language model's output is divided into four segments. A reward model assigns a score to each segment, and the total reward for the output is the sum of the segment scores. The first three segments score +1.2, -0.5, and +0.8. If the total reward for the entire output is 2.0, what is the score for the fourth segment? Explain your calculation.
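One way to check the arithmetic is a short Python sketch. It assumes the total reward is the plain, unweighted sum of the segment scores; the function name `missing_segment_score` is hypothetical and chosen only for illustration.

```python
from typing import Sequence


def missing_segment_score(known_scores: Sequence[float], total_reward: float) -> float:
    """Return the one unknown segment score.

    Assumption: the total reward is the unweighted sum of all segment
    scores, so the missing score is the total minus the known scores.
    """
    return total_reward - sum(known_scores)


# Scores for the first three segments, as given in the question.
known = [1.2, -0.5, 0.8]

# Total reward for the whole four-segment output.
total = 2.0

fourth = missing_segment_score(known, total)

# Round to hide floating-point noise before printing.
print(round(fourth, 6))  # prints 0.5
```

Under that assumption the known scores sum to 1.5, so the fourth segment must contribute 2.0 - 1.5 = 0.5. If the aggregation were weighted or averaged instead, the subtraction step would need to change accordingly.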
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Application of Segment-Based Total Reward in Policy Training
A language model generates a three-segment response to a user's prompt. A separate reward model evaluates each segment, considering the full context of the prompt and the complete response, and assigns the following scores: Segment 1: 0.8, Segment 2: -0.3, Segment 3: 0.5. According to the principle of aggregating segment-based scores, what is the total reward for the entire generated response?
Analyzing Reward Model Behavior
Calculating a Missing Segment Score