Segment-Based Reward Score Formula
The reward score for the $k$-th segment of a generated output is calculated by a reward model that takes the full context into account. Evaluating a segment involves three inputs: the original input prompt, the entire generated output sequence, and the segment itself. The formula is expressed as:

$$r_k = R(x, y, \bar{y}_k)$$

where $r_k$ denotes the reward for the $k$-th segment, $R(\cdot)$ is the reward model function, $x$ represents the initial prompt, $y$ is the complete output sequence, and $\bar{y}_k$ is the segment under evaluation.
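The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `segment_rewards` and `toy_reward_model` are hypothetical names, the segments are assumed to be joined by single spaces to form the full output, and the toy scorer is a hand-written stand-in for a learned reward model. It starts each segment at 1.0 and subtracts 1.0 when the same claim appears negated elsewhere in the output.

```python
from typing import Callable, List

# A reward model maps (prompt, full output, segment) to a scalar score.
RewardModel = Callable[[str, str, str], float]


def segment_rewards(reward_model: RewardModel, prompt: str,
                    segments: List[str]) -> List[float]:
    """Score every segment, passing the prompt and the full output as context."""
    # Assumption: the complete output is the segments joined by single spaces.
    full_output = " ".join(segments)
    return [reward_model(prompt, full_output, seg) for seg in segments]


def toy_reward_model(prompt: str, full_output: str, segment: str) -> float:
    """Hypothetical stand-in for a learned reward model (ignores the prompt)."""
    score = 1.0
    # Context-dependent check: is the segment's claim negated elsewhere
    # in the same output? This is impossible to detect from the segment alone.
    negated = segment.replace(" is ", " is not ")
    if negated != segment and negated in full_output:
        score -= 1.0
    return score


prompt = "Describe the sky."
segments = ["The sky is blue.", "Grass grows fast.", "The sky is not blue."]
rewards = segment_rewards(toy_reward_model, prompt, segments)
print(rewards)
```

In this toy run the first segment is penalized because the full output later contradicts it. A scorer that received only the prompt and the segment would have no way to notice the contradiction, which is why the full output is one of the three inputs.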

Related
A team is developing a system to score individual sentences within a long, multi-paragraph response generated by a model. They observe that the system sometimes gives a high score to a sentence that, while well-written in isolation, directly contradicts information presented in a previous paragraph of the same response. Which of the following is the most likely reason for this evaluation error?
Designing a Context-Aware Reward Model
A reward model is designed to evaluate the quality of a specific sentence within a longer, AI-generated response. For the model to accurately score the sentence, it requires three distinct pieces of information as input. Match each required input component with its primary role in the evaluation process.
Learn After
Total Reward as Sum of Segment-Based Scores
Examples of Constant Segment-Based Reward Functions
A team is developing a reward model to score segments of text generated by a language model. The standard approach calculates a segment's score using the initial prompt, the complete generated output, and the specific segment being evaluated. To improve efficiency, a developer suggests modifying the process to calculate the score using only the initial prompt and the specific segment, omitting the rest of the generated output. What is the most significant analytical flaw in this modified approach?
Inputs for Segment-Based Reward Calculation
Role of Context in Segment-Based Reward