Learn Before
Reward Model Input Preparation
A data scientist is preparing a single data instance to be evaluated by a reward model. The instance consists of a prompt, x = 'Explain the water cycle in simple terms.', and a generated response, y = 'Water evaporates from oceans, forms clouds, and then falls back as rain.'. Describe the exact structure of the input sequence that should be fed into the reward model for this instance to ensure a meaningful evaluation.
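The expected structure can be illustrated with a minimal sketch: the prompt and response are joined into one contiguous sequence [x; y], with the prompt first so the response is scored in context. The separator and helper name here are assumptions for illustration, not a prescribed template.

```python
def build_reward_input(prompt: str, response: str, sep: str = "\n") -> str:
    """Concatenate prompt x and response y into a single sequence [x; y].

    The reward model evaluates the response *in the context of* the
    prompt, so both must appear in one input sequence, prompt first.
    The separator is an illustrative choice; real systems often use a
    chat template or special tokens instead.
    """
    return prompt + sep + response

x = "Explain the water cycle in simple terms."
y = "Water evaporates from oceans, forms clouds, and then falls back as rain."

sequence = build_reward_input(x, y)
print(sequence)
```

In practice this string would then be tokenized and passed through the reward model, which typically reads a scalar score from the representation of the final token.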
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sequence Representation for Reward Calculation in RLHF
A team is developing a model to automatically assign a quality score to an AI-generated response. To do this, the model must be given some text as input. Which of the following best explains why the model should be given the original prompt concatenated with the AI's response, instead of just the AI's response alone?
Reward Model Input Preparation
Debugging a Reward Model's Input Formulation