Learn Before
Critique of the Last-Token Reward Calculation Method
A common method for calculating a reward score from a language model involves applying a linear transformation to the hidden state vector corresponding only to the final token of a given text sequence. Critically evaluate this approach. Discuss one significant advantage and one significant disadvantage of using only the final token's representation for this purpose.
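The method described above can be sketched in a few lines of NumPy. This is a minimal illustration, not an actual reward-model implementation: the sequence length, hidden size, and random weights are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 12, 4096  # assumed sizes for illustration
# One hidden state vector per token, as produced by a language model
hidden_states = rng.standard_normal((seq_len, d_model))

# Reward head: a learned linear transformation (random weights stand in here)
W_r = rng.standard_normal((d_model, 1))

# Use ONLY the final token's hidden state to compute the scalar reward
h_last = hidden_states[-1]   # shape (4096,)
reward = float(h_last @ W_r) # a single scalar score
```

Note that every hidden state except the last is discarded at scoring time, which is exactly the property the critique asks you to weigh.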
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team training a reward model observes a peculiar behavior: the model consistently assigns higher scores to generated text that ends with the phrase '...and that is the final answer.', even when the main body of the text is of poor quality. The reward score is calculated by applying a linear transformation to the hidden state vector corresponding to the final token of the input sequence. Which of the following provides the most direct explanation for this behavior?
Critique of the Last-Token Reward Calculation Method
An engineer is implementing a reward model where the final scalar score r is computed from the last hidden state vector h_last using the formula r = h_last * W_r. If the hidden state vector h_last has dimensions [1 x 4096], what must be the dimensions of the weight matrix W_r for the formula to produce a single scalar value?
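The required shape can be verified numerically. A minimal sketch with NumPy, using the variable names from the question and random values purely for illustration: for [1 x 4096] times W_r to yield a scalar, W_r must be [4096 x 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Last hidden state as a row vector: shape [1 x 4096]
h_last = rng.standard_normal((1, 4096))

# For r = h_last @ W_r to produce a single scalar (shape [1 x 1]),
# W_r must have shape [4096 x 1]
W_r = rng.standard_normal((4096, 1))

r = h_last @ W_r
print(r.shape)  # (1, 1) -- a single scalar value
```

The inner dimensions must match (4096 with 4096), and the outer dimensions (1 and 1) determine the result's shape.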