Learn Before
Reward Function as a Linear Transformation of the Last Hidden State
The formula r(x, y) = W · h_last defines a reward function where the reward for a given prompt x and generated output y is calculated as a linear function of the final hidden state, h_last, of the language model that produced y. Here, h_last is the vector representation of the last token in the output sequence, and W is a learned weight matrix or vector that transforms this hidden state into a scalar reward value.
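As a rough illustration, the sketch below (PyTorch, with hypothetical names such as LinearRewardHead) shows one way such a linear reward head could be computed: the last token's hidden state is taken from the model's output and mapped to a scalar with a learned weight vector. It assumes right-padded batches, so the final sequence position actually holds the last generated token; a real implementation would index the last non-padding position instead.

```python
# Minimal sketch, not the book's reference implementation:
# r(x, y) = W · h_last, a learned linear map from the last hidden state to a scalar reward.
import torch
import torch.nn as nn

class LinearRewardHead(nn.Module):
    """Maps the final token's hidden state to a scalar reward."""
    def __init__(self, hidden_size: int):
        super().__init__()
        # W: learned weight vector, implemented as a 1-output linear layer without bias.
        self.w = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq_len, hidden_size], produced by the language
        # model that processed the prompt x concatenated with the output y.
        h_last = hidden_states[:, -1, :]      # vector for the last token of y
        return self.w(h_last).squeeze(-1)     # one scalar reward per sequence

# Toy usage: random tensors stand in for a real model's hidden states.
hidden = torch.randn(2, 10, 768)              # batch of 2, 10 tokens, d = 768
rewards = LinearRewardHead(768)(hidden)       # shape: [2]
print(rewards)
```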

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A language model is given the input prompt, 'Write a short poem about a rainy day.' It generates the response, 'The sky weeps, and the world listens.' A separate evaluation model then assesses this response for the given prompt and assigns it a quality score of 9.2. If this evaluation process is represented by the function r(x, y), which option correctly assigns the elements of this scenario to the function's variables?
In the context of evaluating a language model's output, a reward function is commonly expressed as r(x, y). Match each component of this notation to its correct description.
Reward Function as a Linear Transformation of the Last Hidden State
Aggregated Reward as the Sum of Segment-Based Rewards
Interpreting Reward Model Notation
Learn After
A reward model for a generative text model calculates a quality score for a given output using the formula r(x, y) = W · h_last. In this formula, h_last is the vector representation of the final token in the generated text, and W is a learned weight matrix that transforms this vector into a scalar score, r(x, y). What is a primary conceptual limitation of this specific reward calculation method, especially when evaluating lengthy and complex text?
Reward Model Behavior Analysis
Evaluating a Reward Calculation Method