Learn Before
Diagnosing a Flawed Training Objective
A developer is training a language model to generate helpful responses. They define a utility function, U, where higher values correspond to more helpful outputs. During training, they observe that their loss function, L(θ), is steadily decreasing, which typically indicates successful training. However, manual evaluation shows the model's responses are becoming progressively less helpful. Given that the training objective is to maximize the expected utility, what is the most likely error in the definition of L(θ) in relation to the expected utility, E[U], that would explain this outcome? Justify your answer.
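One way to explore the scenario is a toy experiment comparing the two possible signs relating L(θ) to the utility. The sketch below is not the developer's actual setup: the one-parameter `utility` function, its peak at 3.0, the learning rate, and the `train` helper are all hypothetical, chosen only so the effect is easy to see. It runs plain gradient descent on L(θ) = sign · U(θ) and records the loss at every step.

```python
def utility(theta):
    # Hypothetical toy utility (an assumption for illustration):
    # highest, i.e. "most helpful", at theta = 3.0.
    return -(theta - 3.0) ** 2


def utility_grad(theta):
    # Derivative of the toy utility with respect to theta.
    return -2.0 * (theta - 3.0)


def train(sign, theta=0.0, lr=0.1, steps=50):
    """Gradient descent on L(theta) = sign * U(theta).

    Returns the final parameter and the loss recorded before each update.
    """
    losses = []
    for _ in range(steps):
        losses.append(sign * utility(theta))
        # Descent step: move against the gradient of the loss.
        theta -= lr * sign * utility_grad(theta)
    return theta, losses


# Case 1: loss defined as +U, so minimizing the loss minimizes utility.
theta_bad, losses_bad = train(sign=+1.0)

# Case 2: loss defined as -U, so minimizing the loss maximizes utility.
theta_good, losses_good = train(sign=-1.0)

print("loss = +U:", "final utility", utility(theta_bad))
print("loss = -U:", "final utility", utility(theta_good))
```

In both runs the recorded loss falls at every step, so a decreasing loss curve alone cannot distinguish them. Only the final utility reveals whether the loss was tied to the utility with the intended sign.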
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Gradient Utility for Sequence Generation
A research team is training a language model to generate helpful and harmless dialogue responses. They define a utility function for a given input x and a generated response y as: U(x, y) = (0.8 * Helpfulness_Score) - (0.2 * Harmfulness_Score). The team's objective is to find the model parameters, θ, that maximize the average utility across a large dataset of interactions. Which of the following loss functions, L(θ), should the team minimize to achieve this objective?
A machine learning model is being trained with the objective of maximizing a specific utility function, U(x, y; θ), which measures the quality of its outputs. The loss function used for training is defined as L(θ) = E_{(x,y)~D}[U(x, y; θ)]. True or False: Minimizing this loss function L(θ) will successfully train the model to achieve its objective.
Diagnosing a Flawed Training Objective