Short Answer

Role of Context in Segment-Based Reward

In the context of calculating a reward score for a segment of a generated text, the formula is often expressed as rk=r(x,y,yˉk)r^k = r(\mathbf{x}, \mathbf{y}, \bar{\mathbf{y}}_k) Explain why the complete generated output, y\mathbf{y}, is included as an input to the reward model rr when the goal is to score only a specific segment, yˉk\bar{\mathbf{y}}_k.

0

1

Updated 2025-10-08

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science