1Cademy - An engineer is building a system to generate a single quality score for a models text response based on an initial prompt. Their proposed process is as follows: 1. Concatenate the prompt tokens and the response tokens into a single sequence. 2. Feed this combined sequence into a language model to get a final-layer hidden state vector for every token. 3. Average all of these hidden state vectors to create a single representative vector. 4. Pass this single vector through a linear layer to produ

Learn Before

Diagram of Reward Score Calculation using an LLM

Multiple Choice

An engineer is building a system to generate a single quality score for a model's text response based on an initial prompt. Their proposed process is as follows:

Concatenate the prompt tokens and the response tokens into a single sequence.
Feed this combined sequence into a language model to get a final-layer hidden state vector for every token.
Average all of these hidden state vectors to create a single representative vector.
Pass this single vector through a linear layer to produ

Updated 2025-10-02

Contributors are:

Who are from:

Learn Before

Related