Concept

Reward Model Implementation using a Pre-trained LLM

A common method for creating a reward model is to adapt a pre-trained Large Language Model (LLM). Given an input prompt $\mathbf{x}$ and a response $\mathbf{y}_k$, they are concatenated to form a single sequence $\mathrm{seq}_k = [\mathbf{x}, \mathbf{y}_k]$, which is processed from left to right using forced decoding. Because language models restrict each position to attending only to its left context, no position except the last can capture the full sequence. Instead, a special symbol (e.g., $\langle /s \rangle$) is appended to the end of the sequence, and the output of the top-most Transformer layer at this final position is selected as the comprehensive representation of the entire sequence. A small output head can then map this representation to a scalar reward.
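
The sketch below illustrates this construction, assuming a HuggingFace-style causal LM backbone. The model name `gpt2`, the `RewardModel` class, and the linear `value_head` are illustrative assumptions, not details given in the text above.

```python
# A minimal sketch of a reward model built on a pre-trained causal LM.
# Assumes the HuggingFace `transformers` library; "gpt2" and the linear
# value head are illustrative choices, not prescribed by the source text.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RewardModel(nn.Module):
    def __init__(self, backbone_name: str = "gpt2"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        hidden_size = self.backbone.config.hidden_size
        # Head mapping the final-position representation to a scalar reward.
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        # Forced decoding: the whole sequence [x, y_k, </s>] is run through
        # the LM in one left-to-right pass; no tokens are sampled.
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state               # [batch, seq_len, hidden]
        # Index of the last non-padding token, i.e. the appended </s> symbol;
        # only this position has attended to the entire sequence.
        last_idx = attention_mask.sum(dim=1) - 1     # [batch]
        batch_idx = torch.arange(hidden.size(0), device=hidden.device)
        seq_repr = hidden[batch_idx, last_idx]       # [batch, hidden]
        return self.value_head(seq_repr).squeeze(-1) # one scalar per sequence

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = RewardModel("gpt2")

prompt, response = "What is RLHF?", " A method for aligning LLMs."
# Concatenate prompt and response, then append the end-of-sequence symbol.
text = prompt + response + tokenizer.eos_token
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(enc["input_ids"], enc["attention_mask"])
print(reward)  # scalar reward for the full prompt-response sequence
```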



Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.2 Generative Models - Foundations of Large Language Models