Input Formulation for the RLHF Reward Model
To evaluate a response, the reward model processes a sequence created by concatenating the original input prompt x with the generated output y. This combined sequence, denoted [x, y], is fed into the model from left to right to derive its representation.
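A minimal sketch of this input formulation, using NumPy with a toy embedding table standing in for a pre-trained language model (the function and variable names here are illustrative assumptions, not a specific library API):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_backbone(input_ids, emb):
    # Stand-in for a pre-trained LM: maps token ids to hidden-state vectors.
    # A real reward model would run a Transformer here instead.
    return emb[input_ids]                     # (seq_len, hidden)

def reward_score(input_ids, emb, w, b):
    # Feed the concatenated sequence [x, y] through the backbone left to
    # right, take the final token's hidden state as the sequence
    # representation, and map it to a single scalar with a linear head.
    hidden = toy_backbone(input_ids, emb)     # (seq_len, hidden)
    last = hidden[-1]                         # representation of [x, y]
    return float(last @ w + b)                # scalar reward

vocab_size, hidden_size = 100, 16
emb = rng.normal(size=(vocab_size, hidden_size))   # toy embedding table
w, b = rng.normal(size=hidden_size), 0.0           # toy linear value head

x = np.array([3, 7, 9])                       # tokenized prompt (hypothetical ids)
y = np.array([4, 2])                          # tokenized response (hypothetical ids)
r = reward_score(np.concatenate([x, y]), emb, w, b)   # score for [x, y]
```

The key point the sketch illustrates is that the prompt and response are concatenated before scoring, so the reward reflects the response in the context of its prompt rather than in isolation.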
References
Reference of Foundations of Large Language Models Course
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Input Formulation for the RLHF Reward Model
Diagram of Reward Score Calculation using an LLM
An engineer is implementing a reward model by adapting a pre-trained language model. After feeding a concatenated prompt and response sequence into the model, they have access to the final layer's hidden state vector for each token in the sequence. To derive a single scalar reward score from these vectors, which of the following procedures should they implement?
You are tasked with implementing a reward model to score a response generated for a given prompt. Arrange the following steps in the correct chronological order to transform the prompt-response pair into a final scalar reward score.
Reward Model Implementation Analysis
Learn After
Sequence Representation for Reward Calculation in RLHF
A team is developing a model to automatically assign a quality score to an AI-generated response. To do this, the model must be given some text as input. Which of the following best explains why the model should be given the original prompt concatenated with the AI's response, instead of just the AI's response alone?
Reward Model Input Preparation
Debugging a Reward Model's Input Formulation