Short Answer

Reward Model Input Preparation

A data scientist is preparing a single data instance to be evaluated by a reward model. The instance consists of a prompt x = 'Explain the water cycle in simple terms.' and a generated response y = 'Water evaporates from oceans, forms clouds, and then falls back as rain.' Describe the exact structure of the input sequence that should be fed into the reward model for this instance so that the evaluation is meaningful.
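To make the question concrete, here is a minimal sketch of the idea that a reward model scores a response in the context of its prompt, so x and y are joined into a single sequence [x, y] rather than scored separately. It assumes hypothetical special tokens `<s>`, `<sep>`, and `</s>` and uses a whitespace split in place of a real subword tokenizer. The helper `build_reward_input` is also hypothetical; an actual model would use its own tokenizer and chat template.

```python
from typing import List

# Hypothetical special tokens (assumption); real reward models use whatever
# markers their tokenizer and chat template define.
BOS, SEP, EOS = "<s>", "<sep>", "</s>"


def build_reward_input(prompt: str, response: str) -> List[str]:
    """Concatenate prompt x and response y into one sequence [x, y].

    The response only makes sense relative to the prompt, so the reward
    model must see both; scoring y alone could not judge whether it
    actually answers x.
    """
    # Whitespace split stands in for a real subword tokenizer (assumption).
    x_tokens = prompt.split()
    y_tokens = response.split()
    return [BOS] + x_tokens + [SEP] + y_tokens + [EOS]


x = "Explain the water cycle in simple terms."
y = "Water evaporates from oceans, forms clouds, and then falls back as rain."
seq = build_reward_input(x, y)
print(" ".join(seq))
```

In many reward-model setups the scalar score is read from the representation at the final position (here `</s>`), which has attended to every token of both x and y. This is a common convention rather than the only possible design.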


Updated 2025-10-05


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science