
Sequence Representation for Reward Calculation in RLHF

To produce a single vector representing an entire prompt-response sequence in a reward model, the sequence is processed from left to right using forced decoding. Because language modeling restricts each position to attending only to its left context, the output of the top-most Transformer layer at any earlier position cannot encapsulate the full sequence. To resolve this, a special end symbol, ⟨/s⟩, is appended to the sequence. The output vector of the Transformer layer stack at this final position, which has access to every preceding token, is then used as the representation of the entire sequence.
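A minimal sketch of this idea in NumPy, assuming the hidden states of the top Transformer layer are already available and that the reward head is a simple linear projection (the function and variable names here are illustrative, not from the source):

```python
import numpy as np

def sequence_reward(hidden_states: np.ndarray, w: np.ndarray, b: float = 0.0) -> float:
    """Score a prompt-response pair with a linear reward head.

    hidden_states: array of shape [T, d], the top-layer Transformer outputs
    for the sequence with the end symbol (e.g. "</s>") appended as the final
    token. Because decoding is causal (left-to-right), only the final
    position has attended to every token, so its vector serves as the
    representation of the whole sequence.
    """
    h_last = hidden_states[-1]        # vector at the end-symbol position
    return float(h_last @ w + b)      # scalar reward

# toy example: T = 4 positions (3 tokens + end symbol), hidden size d = 5
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 5))           # stand-in for real Transformer outputs
w = rng.normal(size=5)                # reward-head weights
r = sequence_reward(H, w)
```

In a real RLHF pipeline the `hidden_states` would come from a forward pass of the reward model over the concatenated prompt and response, and `w` and `b` would be trained on preference comparisons; only the selection of the final position and the scalar projection are shown here.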


Updated 2026-04-20



Ch.2 Generative Models - Foundations of Large Language Models