Learn Before
From Representation to Reward
In a reward model, a network first produces a high-dimensional vector that represents the combined meaning of a prompt and its response. This vector is then passed through a final output layer to produce the reward score. Analyze the fundamental difference between the information encoded in the high-dimensional vector and the information represented by the final single numerical score. What is the purpose of this transformation?
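As a starting point for the analysis, the transformation described above can be sketched as a single linear projection. This is a minimal illustration, not the formula used by any particular reward model: the function name `reward_head`, the 8-dimensional size, and the random values are all hypothetical, and a real model would learn the weights during training.

```python
import random


def reward_head(hidden, weights, bias=0.0):
    """Project a d-dimensional representation onto a single scalar reward.

    Assumption: the output layer is a plain linear map (dot product plus bias).
    """
    if len(hidden) != len(weights):
        raise ValueError("hidden and weights must have the same length")
    return sum(h * w for h, w in zip(hidden, weights)) + bias


# Hypothetical representation of one (prompt, response) pair.
rng = random.Random(0)
d_model = 8  # illustrative size; real models use thousands of dimensions
hidden = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
weights = [rng.gauss(0.0, 0.1) for _ in range(d_model)]  # stand-in for learned weights

score = reward_head(hidden, weights)
print(f"representation: {d_model} numbers -> reward: 1 number ({score:.4f})")

# Two different representations can collapse to the same score:
# the projection keeps only the component along `weights`.
same_a = reward_head([1.0, 0.0], [1.0, 1.0])
same_b = reward_head([0.0, 1.0], [1.0, 1.0])
print(same_a == same_b)
```

The last two calls show that the mapping is many-to-one: distinct vectors yield the same score, so the scalar cannot be inverted back into the representation it came from.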
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reward Score Formula for LLM-based Reward Models
End-of-Sequence Reward Assignment in RLHF
In a system designed to evaluate the quality of generated text, a complex neural network first processes a prompt and its corresponding response, ultimately producing a high-dimensional vector that captures the nuanced meaning and relationship between them. What is the essential final step required to convert this complex vector into a practical, usable evaluation, and what is the nature of its output?
Troubleshooting a Reward Model's Output
From Representation to Reward