1Cademy - Reward Model Behavior Analysis

Learn Before

Reward Function as a Linear Transformation of the Last Hidden State

Case Study

Reward Model Behavior Analysis

Based on the provided formula for the reward model, explain the most likely reason why the contradictory response described in the case study received a high reward.
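
As a minimal sketch of the kind of formula the question refers to, the reward can be read as a function of a single vector: the hidden state of the final token. The toy example below assumes the form $r = \mathbf{h}_{\text{last}} \mathbf{W}_r$ with made-up numbers; the function name `reward` and the arrays `seq_a`, `seq_b`, and `w_r` are hypothetical and not taken from any real model. It only illustrates that, under this formula, two sequences whose final hidden states coincide receive the same score, whatever their earlier hidden states look like.

```python
def reward(hidden_states, w_r):
    """Toy version of r = h_last W_r using plain Python lists.

    hidden_states: one vector per token (hypothetical values).
    w_r: learned weights, here a single column so the result is a scalar.
    """
    h_last = hidden_states[-1]  # only the final token's vector is used
    return sum(h * w for h, w in zip(h_last, w_r))


# Hypothetical weights and hidden states, chosen only for illustration.
w_r = [0.5, -1.0, 2.0]

# Two sequences with different earlier states but the same final state.
seq_a = [[0.1, 0.2, 0.3], [0.9, -0.4, 0.0], [1.0, 0.0, 0.5]]
seq_b = [[-0.7, 0.8, 0.1], [0.2, 0.2, -0.6], [1.0, 0.0, 0.5]]

r_a = reward(seq_a, w_r)
r_b = reward(seq_b, w_r)
print(r_a, r_b)  # both scores are identical
```

In an actual Transformer the final hidden state is itself computed from the earlier tokens, so it is better thought of as a compressed summary than as an isolated vector. The sketch does not model that; it only shows that the score sees nothing beyond what that one summary vector encodes.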

Updated 2025-10-07


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science

A reward model for a generative text model calculates a quality score for a given output using the formula $r = \mathbf{h}_{\text{last}} \mathbf{W}_r$. In this formula, $\mathbf{h}_{\text{last}}$ is the vector representation of the final token in the generated text, and $\mathbf{W}_r$ is a learned weight matrix that transforms this vector into a scalar score, $r$. What is a primary conceptual limitation of this specific reward calculation method, especially when evaluating lengthy and complex te
Evaluating a Reward Calculation Method
