Short Answer

Evaluating a Reward Calculation Method

A language model's reward function is defined by the equation $r = \mathbf{h}_{\text{last}} \mathbf{W}_r$, where $r$ is the scalar reward, $\mathbf{h}_{\text{last}}$ is the vector representation of the final token in the generated output, and $\mathbf{W}_r$ is a learned weight matrix. Based on this formula, explain one significant advantage and one significant disadvantage of this approach for evaluating the quality of a generated text.
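
To make the formula concrete, here is a minimal sketch of the computation in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function name `reward_from_last_token`, the toy hidden states, and the weight values are all hypothetical, and the weight matrix is assumed to have shape d × 1 so that it can be stored as a flat list of length d.

```python
def reward_from_last_token(hidden_states, w_r):
    """Compute the scalar reward r = h_last · W_r.

    hidden_states: one hidden vector (list of floats) per generated token.
    w_r: learned weights, assumed here to be a d x 1 matrix stored as a
         flat list of length d (an assumption made for this sketch).
    """
    h_last = hidden_states[-1]  # representation of the final token
    if len(h_last) != len(w_r):
        raise ValueError("hidden size and weight size must match")
    # Dot product of the final-token vector with the weights.
    return sum(h * w for h, w in zip(h_last, w_r))


# Hypothetical toy values: 3 tokens, hidden size 4.
hidden_states = [
    [0.5, -1.0, 0.0, 2.0],
    [1.0, 0.0, -0.5, 0.5],
    [0.2, 0.4, 0.6, 0.8],
]
w_r = [1.0, -1.0, 0.5, 0.25]

reward = reward_from_last_token(hidden_states, w_r)
print(reward)
```

As in the formula, the function reads only the last row of `hidden_states`; the earlier rows enter the reward only to the extent that the model has already encoded them into that final vector.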

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science