Short Answer

From Representation to Reward

In a reward model, a network first produces a high-dimensional vector that represents the combined meaning of a prompt and its response. This vector is then passed through a final output layer to produce the reward score. Analyze the fundamental difference between the information encoded in the high-dimensional vector and the information represented by the final single numerical score. What is the purpose of this transformation?
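
The transformation described in the question can be sketched in a few lines. This is only an illustration under stated assumptions: the final output layer is taken to be a single linear projection (the question says only "final output layer"), and the names `reward_head`, `weights`, `bias`, `hidden_a`, and `hidden_b`, along with the 4-dimensional vectors, are hypothetical values chosen for readability, not taken from any particular model.

```python
def reward_head(hidden, weights, bias):
    """Assumed linear output layer: project a d-dimensional vector onto one scalar."""
    if len(hidden) != len(weights):
        raise ValueError("hidden and weights must have the same length")
    return sum(h * w for h, w in zip(hidden, weights)) + bias


# Hypothetical learned parameters of the output layer (d = 4 for readability).
weights = [0.5, -1.0, 0.0, 2.0]
bias = 0.1

# Two hypothetical prompt-response representations. They differ only in the
# third dimension, which the output layer weights at zero.
hidden_a = [1.0, 0.0, 3.0, 0.5]
hidden_b = [1.0, 0.0, -7.0, 0.5]

score_a = reward_head(hidden_a, weights, bias)
score_b = reward_head(hidden_b, weights, bias)

print(score_a, score_b)
```

In this sketch the two vectors are different representations, yet they receive the same score, because the projection keeps only the direction the output layer has learned to weight and discards the rest. That is one way to frame the contrast the question asks about: the vector can encode many features at once, while the scalar places every prompt-response pair on a single ordered scale that can be compared and optimized.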

Updated 2025-10-10

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science