Activity (Process)

Training a Reward Model with Preference Data

A reward model is trained on the collected preference labels. Each training example consists of the original input prompt, the pair of generated outputs, and the label recording which of the two outputs was preferred. These examples are used to adjust the model's parameters so that it learns to predict which response a human would prefer.
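The text above does not say which training objective is used. One common choice is a pairwise logistic (Bradley-Terry style) loss, which pushes the score of the preferred output above the score of the rejected one. The sketch below illustrates that idea under strong simplifying assumptions: each (prompt, output) pair is represented by a made-up two-dimensional feature vector, the reward model is a plain linear function, and every name (`reward`, `pairwise_loss`, `train`, `pairs`) is hypothetical. A real reward model would typically be a language model with a scalar output head.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def reward(weights, features):
    """Toy linear reward: dot product of weights and a feature vector."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """-log sigmoid(r(chosen) - r(rejected)), written in a stable form."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def train(pairs, dim, lr=0.1, epochs=200):
    """Plain stochastic gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -sigmoid(-margin)
            grad_scale = -sigmoid(-margin)
            for i in range(dim):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights

# Hypothetical preference data: (features of preferred output,
# features of rejected output). The numbers are invented for illustration.
pairs = [
    ([1.0, 0.2], [0.1, 0.9]),
    ([0.8, 0.5], [0.3, 0.4]),
    ([0.9, 0.1], [0.2, 0.3]),
]

weights = train(pairs, dim=2)
mean_loss = sum(pairwise_loss(weights, c, r) for c, r in pairs) / len(pairs)
print("learned weights:", weights)
print("mean pairwise loss:", mean_loss)
```

After training, the model should assign a higher reward to the preferred output in each pair. The loss only depends on the difference between the two rewards, so the model learns a relative ranking rather than an absolute score.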

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.2 Generative Models - Foundations of Large Language Models

Ch.4 Alignment - Foundations of Large Language Models
