Troubleshooting a Reward Model's Architecture
Based on the typical architecture of a model designed to score text quality according to human preferences, identify the component that is most likely misconfigured or missing, and explain why this would cause the observed issue.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reward Model Implementation using a Pre-trained LLM
Both a standard generative language model and an RLHF reward model are often based on the same core architecture (e.g., a Transformer decoder). What is the key architectural modification that allows the reward model to produce a single scalar quality score for a given text, rather than generating a new sequence of text?
Adapting a Language Model for Reward Prediction
Function and Inputs of the RLHF Reward Model
Sequence-Level Evaluation in Reward Models
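The architectural modification asked about above can be sketched in a few lines: a generative LM ends in a vocabulary-sized softmax head, while a reward model replaces that head with a linear "value head" that maps a pooled hidden state (commonly the final token's) to a single scalar. This is a minimal numpy sketch under those assumptions; `reward_head`, the shapes, and the random weights are all hypothetical and stand in for the output of a real Transformer decoder.

```python
import numpy as np

def reward_head(hidden_states: np.ndarray, w: np.ndarray, b: float) -> float:
    """Map a sequence's hidden states to one scalar reward.

    hidden_states: (seq_len, d_model) activations from the decoder's
    final layer (hypothetical stand-in for a real Transformer's output).
    w: (d_model,) weight vector of the value head; b: its bias.
    A generative LM would instead project to (seq_len, vocab_size)
    and sample next tokens; here we pool to a single score.
    """
    last_hidden = hidden_states[-1]        # pool: last-token hidden state
    return float(last_hidden @ w + b)      # single scalar quality score

# Toy demonstration with random values (illustrative only).
rng = np.random.default_rng(0)
d_model = 8
h = rng.standard_normal((5, d_model))      # a 5-token sequence
w = rng.standard_normal(d_model)
score = reward_head(h, w, b=0.0)
```

If a reward model emits a score per token, or a vocabulary-sized vector, instead of one number per sequence, this value head (or its pooling step) is the component most likely missing or misconfigured.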