Learn Before
Critique of the Last-Token Reward Calculation Method
A common method for calculating a reward score from a language model involves applying a linear transformation to the hidden state vector corresponding only to the final token of a given text sequence. Critically evaluate this approach. Discuss one significant advantage and one significant disadvantage of using only the final token's representation for this purpose.
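The method described above can be sketched in a few lines of NumPy. This is a minimal illustration, not an actual reward-model implementation: the sequence length, hidden size, and random weights are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 12, 4096  # assumed sizes for illustration
# One hidden state vector per token, as produced by a language model
hidden_states = rng.standard_normal((seq_len, d_model))

# Reward head: a learned linear transformation (random weights stand in here)
W_r = rng.standard_normal((d_model, 1))

# Use ONLY the final token's hidden state to compute the scalar reward
h_last = hidden_states[-1]   # shape (4096,)
reward = float(h_last @ W_r) # a single scalar score
```

Note that every hidden state except the last is discarded at scoring time, which is exactly the property the critique asks you to weigh.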
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team training a reward model observes a peculiar behavior: the model consistently assigns higher scores to generated text that ends with the phrase '...and that is the final answer.', even when the main body of the text is of poor quality. The reward score is calculated by applying a linear transformation to the hidden state vector corresponding to the final token of the input sequence. Which of the following provides the most direct explanation for this behavior?
Critique of the Last-Token Reward Calculation Method
An engineer is implementing a reward model where the final scalar score r is computed from the last hidden state vector h_last using the formula r = h_last * W_r. If the hidden state vector h_last has dimensions [1 x 4096], what must be the dimensions of the weight matrix W_r for the formula to produce a single scalar value?
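The required shape can be verified numerically. A minimal sketch with NumPy, using the variable names from the question and random values purely for illustration: for [1 x 4096] times W_r to yield a scalar, W_r must be [4096 x 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Last hidden state as a row vector: shape [1 x 4096]
h_last = rng.standard_normal((1, 4096))

# For r = h_last @ W_r to produce a single scalar (shape [1 x 1]),
# W_r must have shape [4096 x 1]
W_r = rng.standard_normal((4096, 1))

r = h_last @ W_r
print(r.shape)  # (1, 1) -- a single scalar value
```

The inner dimensions must match (4096 with 4096), and the outer dimensions (1 and 1) determine the result's shape.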