1Cademy - Interpreting Reward Model Notation

Learn Before

Notation for the RLHF Reward Model

Short Answer

Interpreting Reward Model Notation

A language model's performance is evaluated using the function $r(\mathbf{x}, y)$. In your own words, describe what this function calculates and what each of its components ($r$, $\mathbf{x}$, and $y$) represents in this context.
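As a rough aid for reading the notation, the shape of $r(\mathbf{x}, y)$ can be sketched as a function that takes two arguments and returns a single number. The sketch below assumes $\mathbf{x}$ is the input prompt and $y$ is the model's response, as in the rainy-day scenario elsewhere on this page. The name `toy_reward` and its word-overlap scoring rule are hypothetical placeholders; a real reward model is learned from data and is not computed this way.

```python
from typing import Callable


def toy_reward(x: str, y: str) -> float:
    """Hypothetical stand-in for a reward model: maps (prompt x, response y) to a scalar.

    The word-overlap heuristic is purely illustrative and is NOT how a
    learned reward model scores responses.
    """
    prompt_words = set(x.lower().split())
    response_words = set(y.lower().split())
    if not response_words:
        return 0.0
    # Fraction of distinct response words that also appear in the prompt.
    return len(prompt_words & response_words) / len(response_words)


# r has the same signature as the notation r(x, y): two inputs, one scalar output.
r: Callable[[str, str], float] = toy_reward

x = "Write a short poem about a rainy day."  # the input prompt
y = "The sky weeps, and the world listens."  # the generated response
score = r(x, y)                               # a single scalar quality score
print(score)
```

Only the signature matters here: whatever sits inside the function, it consumes a prompt together with a response and produces one score for that pair.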

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Comprehension in Revised Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

A language model is given the input prompt, 'Write a short poem about a rainy day.' It generates the response, 'The sky weeps, and the world listens.' A separate evaluation model then assesses this response for the given prompt and assigns it a quality score of 9.2. If this evaluation process is represented by the function $r(\mathbf{x}, y)$, which option correctly assigns the elements of this scenario to the function's variables?
In the context of evaluating a language model's output, a function is commonly expressed as $r(\mathbf{x}, y)$. Match each component of this notation to its correct description.
Reward Function as a Linear Transformation of the Last Hidden State
Aggregated Reward as the Sum of Segment-Based Rewards