Multiple Choice

An engineer is training a language model whose objective is to adjust the model's parameters to maximize a utility score for its generated outputs. The loss function is defined as the negative of the expected utility score. During a training run, the engineer observes that the loss value consistently increases over several iterations (e.g., from -15.0 to -12.5 to -10.0). What is the most direct interpretation of this observation?
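The relationship the question tests can be reproduced with a minimal sketch: since loss = -E[utility], a rising loss directly implies a falling expected utility. The utility samples below are hypothetical values chosen to match the loss sequence in the question.

```python
# Hypothetical utility scores sampled for generated outputs at three
# successive training iterations (values chosen to match the question).
utility_samples = [
    [14.0, 16.0, 15.0],  # iteration 1: mean utility 15.0
    [12.0, 13.0, 12.5],  # iteration 2: mean utility 12.5
    [9.0, 11.0, 10.0],   # iteration 3: mean utility 10.0
]

def loss(samples):
    """Loss defined as the negative of the expected (mean) utility score."""
    return -sum(samples) / len(samples)

losses = [loss(batch) for batch in utility_samples]
print(losses)  # [-15.0, -12.5, -10.0]
# The loss is increasing, so the expected utility (15.0 -> 12.5 -> 10.0)
# is decreasing: the model is getting worse at the objective.
```

Because the loss is just the negated utility, "loss increasing toward zero" and "expected utility decreasing" are the same observation stated two ways.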

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science