Case Study

Reward Model Objective Calculation

You are training a reward model where the goal is to align the model's predicted scores with human-provided scores. The training process aims to maximize the objective function $\mathcal{L}_{\text{point}} = -[\varphi(\mathbf{x}, \mathbf{y}) - r(\mathbf{x}, \mathbf{y})]^2$. Based on the case study below, calculate the initial objective value and analyze the effect of a model update.
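The calculation can be sketched numerically. The concrete scores from the case study are not reproduced here, so the values below are hypothetical, and the assignment of $\varphi$ to the prediction and $r$ to the human score is an assumption; since the squared error is symmetric, swapping the roles does not change the objective value.

```python
def pointwise_objective(predicted_score: float, human_score: float) -> float:
    """Negated squared error between predicted and human scores.

    The maximum value is 0, reached when the prediction matches the
    human score exactly, so maximizing this objective aligns the two.
    """
    return -((predicted_score - human_score) ** 2)

# Hypothetical scores for one (x, y) pair -- not taken from the case study.
human = 0.9           # assumed human-provided score
pred_initial = 0.5    # assumed prediction before the model update
pred_updated = 0.8    # assumed prediction after the model update

initial = pointwise_objective(pred_initial, human)
updated = pointwise_objective(pred_updated, human)

# The objective rises toward 0 as the prediction approaches the human score.
print(f"initial objective: {initial:.2f}")
print(f"updated objective: {updated:.2f}")
```

With these assumed numbers, the update moves the prediction closer to the human score, so the (negative) objective increases toward its maximum of zero.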


Updated 2025-10-04


Tags: Ch.4 Alignment - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science