Diagram of Reward Score Calculation using an LLM
The process of calculating a reward score using a Transformer-based LLM is illustrated by a data flow. First, input prompt tokens () are concatenated with response tokens (), followed by a special end-of-sequence token like ⟨EOS⟩. This combined sequence is fed into a Transformer Decoder (LLM), which outputs a hidden state representation for each token position (). The final hidden state, , corresponding to the ⟨EOS⟩ token, is selected to represent the entire sequence. This vector is then transformed by a linear mapping layer with weights to produce a single scalar value, which serves as the reward score.
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pair-wise Ranking Loss Formula for RLHF Reward Model
Input Formulation for the RLHF Reward Model
Diagram of Reward Score Calculation using an LLM
An engineer is implementing a reward model by adapting a pre-trained language model. After feeding a concatenated prompt and response sequence into the model, they have access to the final layer's hidden state vector for each token in the sequence. To derive a single scalar reward score from these vectors, which of the following procedures should they implement?
You are tasked with implementing a reward model to score a response generated for a given prompt. Arrange the following steps in the correct chronological order to transform the prompt-response pair into a final scalar reward score.
Reward Model Implementation Analysis
Learn After
An engineer is building a system to generate a single quality score for a model's text response based on an initial prompt. Their proposed process is as follows:
- Concatenate the prompt tokens and the response tokens into a single sequence.
- Feed this combined sequence into a language model to get a final-layer hidden state vector for every token.
- Average all of these hidden state vectors to create a single representative vector.
- Pass this single vector through a linear layer to produce the final scalar score.
Which statement best identifies a critical flaw in this proposed method for this specific task?
You are tasked with designing a system that uses a language model to generate a single numerical score representing the quality of a given text response to a prompt. Arrange the following steps into the correct logical sequence for this process.
Reward Model Implementation Debugging