Reward Model Objective Calculation
You are training a reward model where the goal is to align the model's predicted scores, $\hat{s}_i$, with the human-provided scores, $s_i$. The training process aims to maximize the objective function $J = -\sum_{i=1}^{N} (s_i - \hat{s}_i)^2$. Based on the case study below, calculate the initial objective value and analyze the effect of a model update.
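A minimal sketch of the objective calculation. The scores below are hypothetical placeholders, since the case-study values are not included in this card; only the formula $J = -\sum_i (s_i - \hat{s}_i)^2$ comes from the text.

```python
# Hypothetical case-study data (assumed for illustration, not from the card).
human_scores = [4.0, 2.0, 5.0]   # s_i: human-provided ratings
predicted = [3.5, 2.5, 4.0]      # s_hat_i: reward-model outputs

# Objective J = -sum_i (s_i - s_hat_i)^2.
# Training maximizes J, which is equivalent to minimizing the
# squared error between predicted and human scores.
J = -sum((s - p) ** 2 for s, p in zip(human_scores, predicted))
print(J)  # -1.5
```

A model update that moves each prediction closer to its human score shrinks every squared term, so J increases toward its maximum of 0.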
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning engineer is training a reward model where the goal is to align the model's predicted scores, $\hat{s}_i$, with human-provided scores, $s_i$. The standard approach is to maximize the objective function $J = -\sum_{i=1}^{N} (s_i - \hat{s}_i)^2$. Suppose the engineer makes a mistake and instead configures the training process to maximize the squared error, effectively removing the negative sign from the objective: $J' = \sum_{i=1}^{N} (s_i - \hat{s}_i)^2$. What would be the most likely effect on the model's behavior during training?
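The sign-flip bug can be demonstrated with a one-parameter sketch (hypothetical values, assuming plain gradient ascent on each objective): ascending $-(s - p)^2$ pulls the prediction toward the human score, while ascending $(s - p)^2$ pushes it away without bound.

```python
s = 3.0    # human-provided score (assumed value)
lr = 0.1   # learning rate

p_good = 0.0  # prediction trained on the correct objective, max -(s - p)^2
p_bad = 0.0   # prediction trained on the buggy objective,   max  (s - p)^2

for _ in range(50):
    # d/dp [-(s - p)^2] = 2*(s - p): ascent moves p toward s
    p_good += lr * 2 * (s - p_good)
    # d/dp [(s - p)^2] = -2*(s - p): ascent moves p away from s
    p_bad += lr * (-2) * (s - p_bad)

print(round(p_good, 3))  # converges to ~3.0, the human score
print(round(p_bad, 3))   # diverges: the error grows each step
```

So the most likely effect of the bug is divergence: the model is rewarded for making its predictions as far from the human scores as possible, and the loss grows without bound instead of shrinking.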
Reward Model Objective Calculation
Pointwise Rating Loss (L_rating) Formula
In the context of training a model to predict scores for a given input-output pair, consider the following objective function: $L_{\text{rating}} = -\sum_{i=1}^{N} (s_i - \hat{s}_i)^2$. Match each component of the formula to its correct description.