1Cademy - Debugging Model Behavior via the Objective Function

Learn Before

Objective Function for Policy Optimization

Case Study

Debugging Model Behavior via the Objective Function

A language model is being trained to generate factual summaries of news articles. Training aims to maximize the objective function $U(\mathbf{x}, \mathbf{y}; \theta) = \sum_{t=1}^{T} A(\mathbf{x}, y_t, \mathbf{y}_{<t}) \log \pi_\theta(y_t|\mathbf{x}, \mathbf{y}_{<t})$. The weighting function $A(\cdot)$ is designed to assign a large negative value to any generated statement that is factually incorrect relative to the source article, and a small, constant positive value to each correct statement. After training, the model consistently produces overly cautious and brief summaries, such as 'The article discusses a topic,' instead of detailed, informative ones. Analyze why the model might exhibit this behavior, explaining specifically how the design of the weighting function $A(\cdot)$ interacts with the overall objective function to produce this outcome.
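
One way to start the analysis is to look at the expected value of the weight $A(\cdot)$ for different kinds of statements. Because $A(\cdot)$ multiplies $\log \pi_\theta$, its sign decides whether maximizing $U$ raises or lowers the probability of a statement. The sketch below is only an illustration under assumed numbers: the reward of 0.1, the penalty of -10.0, the 5% error rate for detailed statements, and the function names are all hypothetical and do not come from the case study.

```python
def expected_weight(p_error, reward_correct=0.1, penalty_incorrect=-10.0):
    """Expected value of A for a statement that is wrong with probability p_error.

    reward_correct and penalty_incorrect are assumed, illustrative values.
    """
    return (1.0 - p_error) * reward_correct + p_error * penalty_incorrect


def break_even_error_rate(reward_correct=0.1, penalty_incorrect=-10.0):
    """Error rate at which the expected weight is exactly zero."""
    return reward_correct / (reward_correct - penalty_incorrect)


# A vague statement ("The article discusses a topic") is assumed to be never wrong.
vague = expected_weight(p_error=0.0)

# A detailed, informative statement is assumed to be wrong 5% of the time.
detailed = expected_weight(p_error=0.05)

threshold = break_even_error_rate()

print(f"expected weight, vague statement:    {vague:+.3f}")
print(f"expected weight, detailed statement: {detailed:+.3f}")
print(f"break-even error rate:               {threshold:.4f}")
```

Under these assumed values, any type of statement that is wrong more often than roughly 1% of the time has a negative expected weight, so on average the update lowers its probability, while statements that carry no risk of error keep a small positive weight. Other choices of reward and penalty move the threshold but leave the asymmetry in place as long as the penalty is much larger in magnitude than the reward.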


Updated 2025-10-09
