Learn Before
Rating LLM Outputs for Reward Models
To train a reward model, one straightforward approach is to ask human annotators to assign a numerical rating to each individual Large Language Model (LLM) output. In this scenario, the learning problem for the reward model can be framed as a regression task: given an output, predict the rating an annotator would assign to it.
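The snippet below is a minimal sketch of this regression framing, assuming a PyTorch setup. The `RewardModel` class, the toy `EmbeddingBag` encoder, and the synthetic ratings are illustrative stand-ins; in practice the encoder would be a pre-trained LLM backbone and the ratings would come from human annotators.

```python
# Minimal sketch: reward-model training framed as regression.
# The tiny encoder and synthetic data below are illustrative
# assumptions, not a reference implementation.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, vocab_size: int = 1000, hidden_dim: int = 64):
        super().__init__()
        # Stand-in for a pre-trained LLM backbone: mean-pooled token embeddings.
        self.encoder = nn.EmbeddingBag(vocab_size, hidden_dim, mode="mean")
        # Regression head: maps the pooled representation to one scalar score.
        self.score_head = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.score_head(self.encoder(token_ids)).squeeze(-1)

# Toy batch: 4 "LLM outputs" (token-id sequences) with annotator ratings.
token_ids = torch.randint(0, 1000, (4, 16))   # (batch, seq_len)
ratings = torch.tensor([0.2, 0.9, 0.5, 0.7])  # human-assigned scores

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # standard regression objective

for step in range(100):
    optimizer.zero_grad()
    predicted = model(token_ids)        # one scalar score per output
    loss = loss_fn(predicted, ratings)  # fit the annotator ratings directly
    loss.backward()
    optimizer.step()
```

Mean-squared error is the conventional loss for this regression objective; other regression losses (e.g., L1) would slot into the same training loop.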
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
A development team has a pre-trained language model and wants to fine-tune it to produce responses that are more helpful and safe. Their strategy involves first creating a separate model whose sole job is to score how good a given response is, based on human preferences. Which of the following best describes the data and objective used to train this specific 'scoring' model?
You are tasked with aligning a large language model to better follow human preferences using a reward-based approach. Arrange the following high-level stages of the process into the correct chronological order.
Diagnosing Reward Model Failure
Rating LLM Outputs for Reward Models
Challenges of Rating LLM Outputs