Sequence-Level Evaluation in Reward Models
When an RLHF reward model evaluates the relationship between an input prompt and a complete output sequence, it focuses on the full semantic content rather than token-level accuracy. At each intermediate position in the output sequence, the model assigns a default value of 0 (or another predetermined value). The actual scalar reward score is generated only at the final position (i.e., the last token), reflecting the quality of the completed text.
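Below is a minimal PyTorch sketch of this idea, assuming a Hugging Face-style Transformer backbone that returns per-position hidden states; the names `RewardModel`, `backbone`, and `value_head` are illustrative assumptions, not taken from the source.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative reward model: a Transformer backbone plus a scalar value head."""

    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                     # e.g., a Transformer decoder (assumed)
        self.value_head = nn.Linear(hidden_size, 1)  # maps each hidden state to one scalar

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Hidden states for every position: [batch, seq_len, hidden_size].
        hidden = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state

        # Per-position scalars: [batch, seq_len]. Scores at intermediate
        # positions are discarded (effectively the default value of 0);
        # only the final real token carries the sequence-level reward.
        scores = self.value_head(hidden).squeeze(-1)
        last_index = attention_mask.sum(dim=1) - 1   # index of the last non-padding token
        reward = scores.gather(1, last_index.unsqueeze(1)).squeeze(1)
        return reward                                # [batch]: one scalar per sequence
```

Note that the head still produces a score at every position; reading out only the last one is what makes the evaluation sequence-level rather than token-level.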