Concept

Input Formulation for Segment-Based Reward Computation

When computing the reward for a specific segment of an output, the reward model takes three inputs: the original prompt $\mathbf{x}$, the complete generated output sequence $\mathbf{y}$, and the segment being evaluated, $\bar{\mathbf{y}}_k$. Supplying all three gives the reward model the full context it needs to assess the quality of the individual segment, rather than judging the segment in isolation.
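The input formulation can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen for illustration: `SegmentRewardInput`, `build_segment_inputs`, and `toy_reward` do not come from the source, the sketch assumes the segments concatenate to the full output, and `toy_reward` is only a placeholder for a learned reward model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SegmentRewardInput:
    """The three inputs the reward model receives for one segment."""
    prompt: str   # x: the original prompt
    output: str   # y: the complete generated output sequence
    segment: str  # y_bar_k: the segment being evaluated


def build_segment_inputs(prompt: str, segments: List[str]) -> List[SegmentRewardInput]:
    """Pair each segment with the prompt and the full output it belongs to."""
    # Assumption: the segments, joined in order, reproduce the full output y.
    output = "".join(segments)
    return [SegmentRewardInput(prompt, output, seg) for seg in segments]


def toy_reward(inp: SegmentRewardInput) -> float:
    """Placeholder reward (NOT a real reward model): share of the output
    covered by the segment. A learned model would score quality instead."""
    return len(inp.segment) / len(inp.output)


def score_segments(
    prompt: str,
    segments: List[str],
    reward_fn: Callable[[SegmentRewardInput], float] = toy_reward,
) -> List[float]:
    """Compute one reward per segment, each conditioned on (x, y, y_bar_k)."""
    return [reward_fn(inp) for inp in build_segment_inputs(prompt, segments)]
```

Note that every call to the reward function sees the same prompt and the same full output; only the segment changes from one call to the next. That is what lets a reward model judge a segment against the rest of the response.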

Updated 2026-05-03

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences