Learn Before
Inputs for Segment-Based Reward Calculation
A language model is given a prompt and generates an output, which is then split into two segments for evaluation. A reward model needs to calculate the score for the second segment ('Segment 2'). Based on the information provided below, what are the three distinct pieces of information that must be fed into the reward model to compute this score?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Total Reward as Sum of Segment-Based Scores
Examples of Constant Segment-Based Reward Functions
A team is developing a reward model to score segments of text generated by a language model. The standard approach calculates a segment's score using the initial prompt, the complete generated output, and the specific segment being evaluated. To improve efficiency, a developer suggests modifying the process to calculate the score using only the initial prompt and the specific segment, omitting the rest of the generated output. What is the most significant analytical flaw in this modified approach?
Inputs for Segment-Based Reward Calculation
Role of Context in Segment-Based Reward
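
The related question above names three inputs for scoring a segment: the initial prompt, the complete generated output, and the segment being evaluated. The sketch below is one way to picture that interface and why dropping the full output loses information. Everything in it is an assumption made for illustration: the names `segment_reward`, `total_reward`, and `Span`, the choice to identify a segment by character offsets into the output, and the toy scoring rule (word novelty times a crude prompt-overlap check), which stands in for a learned reward model and is not taken from the course material.

```python
from typing import List, Tuple

# Assumed representation: a segment is a (start, end) character span of the output.
Span = Tuple[int, int]


def segment_reward(prompt: str, output: str, span: Span) -> float:
    """Toy stand-in for a learned reward model (hypothetical scoring rule)."""
    start, end = span
    words = output[start:end].lower().split()
    if not words:
        return 0.0
    # Words produced before this segment; these are only available
    # when the full output is passed in, not just the segment.
    earlier = set(output[:start].lower().split())
    novelty = sum(w not in earlier for w in words) / len(words)
    # Crude relevance check: does the segment share any word with the prompt?
    relevance = 1.0 if set(prompt.lower().split()) & set(words) else 0.5
    return novelty * relevance


def total_reward(prompt: str, output: str, spans: List[Span]) -> float:
    """Total reward as the sum of the segment-based scores."""
    return sum(segment_reward(prompt, output, s) for s in spans)


prompt = "Describe the sky"
output = "The sky is blue. The sky is blue."
spans = [(0, 16), (16, len(output))]  # Segment 1 and Segment 2

# Segment 2 scored with the full output: the repetition is visible.
with_context = segment_reward(prompt, output, spans[1])

# Segment 2 scored from the prompt and the segment alone: the repetition is hidden.
segment_2 = output[16:]
without_context = segment_reward(prompt, segment_2, (0, len(segment_2)))
```

Under this toy rule, Segment 2 scores 0.0 when the full output is supplied and 1.0 when it is scored in isolation, since a scorer that never sees Segment 1 cannot tell that Segment 2 merely repeats it. A real reward model would differ in every detail, but the same dependence on surrounding context is what the prompt-plus-segment shortcut gives up.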