Concept

General Loss Minimization Objective for Reward Model Training

In Reinforcement Learning from Human Feedback, the training of the reward model is framed as a loss minimization problem. The general objective is to minimize a loss function, denoted L(·), which depends on the input prompt x, a set of model-generated outputs (e.g., {y1, y2}), and the reward model r(·) itself. By minimizing this loss function, the reward model learns to assign scores to outputs in a manner that reflects the collected human preference data.
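One common instantiation of this objective (given here as an illustrative sketch, not necessarily the exact formulation used in the chapter) is the pairwise Bradley-Terry loss, where for each prompt x a preferred output y_chosen and a dispreferred output y_rejected are scored, and the loss -log σ(r(x, y_chosen) - r(x, y_rejected)) is minimized. The PyTorch snippet below assumes a hypothetical `reward_model(x, y)` callable that returns a scalar score tensor per example.

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(reward_model, x, y_chosen, y_rejected):
    """Pairwise (Bradley-Terry style) loss for reward model training.

    Minimizing this loss pushes r(x, y_chosen) above r(x, y_rejected),
    so the reward model's scores reflect human preference comparisons.
    `reward_model(x, y)` is an assumed interface returning a scalar score.
    """
    r_chosen = reward_model(x, y_chosen)      # score of the preferred output
    r_rejected = reward_model(x, y_rejected)  # score of the dispreferred output
    # -log sigmoid(margin): small when the preferred output scores higher
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

Averaging this loss over a dataset of human preference pairs and taking gradient steps on the reward model's parameters realizes the general objective described above.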

Updated 2025-10-06

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
