Learn Before
Role of Context in Segment-Based Reward
When calculating a reward score for a segment of generated text, the reward is often written as $r(x, y, \bar{y}_k)$, where $x$ is the prompt, $y$ is the complete generated output, and $\bar{y}_k$ is the segment being scored. Explain why the complete generated output, $y$, is included as an input to the reward model when the goal is to score only a specific segment, $\bar{y}_k$.
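The question can be made concrete with a small sketch. This is a toy heuristic, not a trained reward model: the function names, the word-overlap relevance score, and the repetition penalty are all assumptions chosen for illustration, and the complete output is represented as a list of segments. It shows one way the same segment can deserve different scores depending on the rest of the output, which a scorer that sees only the prompt and the segment cannot detect.

```python
def _words(text: str) -> set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def reward_without_context(prompt: str, segment: str) -> float:
    """Hypothetical context-free score: fraction of segment words found in the prompt."""
    seg = _words(segment)
    return len(seg & _words(prompt)) / len(seg) if seg else 0.0


def reward_with_context(prompt: str, output: list[str], k: int) -> float:
    """Hypothetical score for segment k given the prompt and the complete output.

    Starts from the context-free relevance score and subtracts a penalty
    (an assumed value of 1.0) if the same segment already appeared earlier.
    """
    segment = output[k]
    score = reward_without_context(prompt, segment)
    if segment in output[:k]:
        score -= 1.0  # redundancy is only visible with the full output
    return score


prompt = "What is the capital of France?"
segment = "Paris is the capital of France."
output_a = [segment, "It lies on the Seine."]  # segment appears once
output_b = [segment, segment]                   # segment is repeated

# Without the full output, both occurrences get the same score.
print(reward_without_context(prompt, segment))

# With the full output, the repeated occurrence scores lower.
print(reward_with_context(prompt, output_a, 0))
print(reward_with_context(prompt, output_b, 1))
```

Repetition is only one example of a context-dependent property. Coherence with earlier segments, contradiction of a later statement, or whether the segment contributes to a correct final answer all depend on the complete output in the same way.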
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Total Reward as Sum of Segment-Based Scores
Examples of Constant Segment-Based Reward Functions
A team is developing a reward model to score segments of text generated by a language model. The standard approach calculates a segment's score using the initial prompt, the complete generated output, and the specific segment being evaluated. To improve efficiency, a developer suggests modifying the process to calculate the score using only the initial prompt and the specific segment, omitting the rest of the generated output. What is the most significant analytical flaw in this modified approach?
Inputs for Segment-Based Reward Calculation