Reward Model Implementation using a Pre-trained LLM
A common method for building a reward model is to adapt a pre-trained Large Language Model (LLM). The input prompt and the response are concatenated into a single sequence, which is processed from left to right using forced decoding. Because a language model restricts each position to attending only to its left context, representations at earlier positions cannot capture the full sequence; only the final position can see every token. A special symbol (e.g., an end-of-sequence token) is therefore appended to the end of the sequence, and the output of the top-most Transformer layer at this final position is taken as the comprehensive representation of the entire sequence. This vector is then mapped to a single scalar reward, typically with a linear layer that replaces the usual language-model output head.
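A minimal sketch of this procedure is shown below, assuming a Hugging Face transformers causal LM ("gpt2" is only a stand-in) and a hypothetical linear value head; it is an illustration of the idea, not the course's reference implementation.

```python
# Sketch: adapt a pre-trained causal LM into a reward model.
# Assumptions: Hugging Face transformers, GPT-2 as the backbone,
# and a hypothetical linear "value head" producing the scalar reward.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RewardModel(nn.Module):
    def __init__(self, base_name: str = "gpt2"):
        super().__init__()
        # Pre-trained decoder-only LM used as the sequence encoder.
        self.backbone = AutoModel.from_pretrained(base_name)
        # The LM head is replaced by a linear layer mapping the final
        # hidden state to a single scalar reward.
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Forced decoding over the concatenated prompt + response:
        # the model only encodes the given tokens, it generates nothing new.
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Index of the last real (non-padding) token in each sequence; because
        # of the causal mask, only this position attends to the whole sequence.
        last_idx = attention_mask.sum(dim=1) - 1
        last_hidden = hidden[torch.arange(hidden.size(0)), last_idx]
        # One scalar reward per sequence.
        return self.value_head(last_hidden).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = RewardModel("gpt2")

prompt = "How do I sort a list in Python?"
response = " Use the built-in sorted() function or list.sort()."
# Concatenate prompt and response, appending the end-of-sequence symbol
# whose final-layer state serves as the whole-sequence representation.
text = prompt + response + tokenizer.eos_token
batch = tokenizer(text, return_tensors="pt", padding=True)
with torch.no_grad():
    reward = model(batch["input_ids"], batch["attention_mask"])
print(reward.item())  # a single scalar quality score
```

Note that the value head is initialized randomly here; in RLHF it would subsequently be trained, for example with a pair-wise ranking loss over preferred and dispreferred responses.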

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Troubleshooting a Reward Model's Architecture
Both a standard generative language model and an RLHF reward model are often based on the same core architecture (e.g., a Transformer decoder). What is the key architectural modification that allows the reward model to produce a single scalar quality score for a given text, rather than generating a new sequence of text?
Adapting a Language Model for Reward Prediction
Function and Inputs of the RLHF Reward Model
Sequence-Level Evaluation in Reward Models
Learn After
Pair-wise Ranking Loss Formula for RLHF Reward Model
Input Formulation for the RLHF Reward Model
Diagram of Reward Score Calculation using an LLM
An engineer is implementing a reward model by adapting a pre-trained language model. After feeding a concatenated prompt and response sequence into the model, they have access to the final layer's hidden state vector for each token in the sequence. To derive a single scalar reward score from these vectors, which of the following procedures should they implement?
You are tasked with implementing a reward model to score a response generated for a given prompt. Arrange the following steps in the correct chronological order to transform the prompt-response pair into a final scalar reward score.
Reward Model Implementation Analysis