1Cademy - Evaluating a Reward Calculation Method

Learn Before

Reward Function as a Linear Transformation of the Last Hidden State

Short Answer

Evaluating a Reward Calculation Method

A language model's reward function is defined by the equation $r = \mathbf{h}_{\text{last}} \mathbf{W}_r$, where $r$ is the scalar reward, $\mathbf{h}_{\text{last}}$ is the hidden-state vector of the final token in the generated output, and $\mathbf{W}_r$ is a learned weight matrix (with a single output column, so that $r$ is a scalar). Based on this formula, explain one significant advantage and one significant disadvantage of this approach for evaluating the quality of a generated text.
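As a concrete reading of the formula, here is a minimal NumPy sketch of the computation. It is only an illustration under assumptions: the function name `reward_from_last_hidden`, the hidden size of 3, and all numeric values are hypothetical and not taken from any particular model.

```python
import numpy as np

def reward_from_last_hidden(hidden_states: np.ndarray, w_r: np.ndarray) -> float:
    """Compute r = h_last @ W_r.

    hidden_states: array of shape (seq_len, d), one row per generated token
                   (assumed layout, for illustration only).
    w_r:           array of shape (d, 1), the learned reward weights.
    """
    h_last = hidden_states[-1]        # hidden state of the final token, shape (d,)
    r = h_last @ w_r                  # shape (1,), a single value
    return float(r[0])

# Hypothetical values: 2 tokens, hidden size 3.
hidden_states = np.array([[9.0, 9.0, 9.0],
                          [1.0, 2.0, 3.0]])
w_r = np.array([[0.5], [0.0], [-1.0]])

reward = reward_from_last_hidden(hidden_states, w_r)
print(reward)  # 1*0.5 + 2*0.0 + 3*(-1.0) = -2.5
```

Note that the linear map reads only the last row of `hidden_states`; whatever the reward reflects about earlier tokens has to be already encoded in that final vector by the model itself.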

Updated 2025-10-08
