1Cademy - A sequence generation model is being trained to maximize the objective function $U(\mathbf{x}, \mathbf{y}; \theta) = \sum_{t=1}^{T} A(\mathbf{x}, y_t, \mathbf{y}_{<t}) \log \pi_\theta(y_t|\mathbf{x}, \mathbf{y}_{<t})$. The training goal is to specifically penalize the model for using repetitive phrasing. Which of the following strategies for designing the weighting function $A(\cdot)$ would best accomplish this?

Learn Before

Objective Function for Sequence Generation Policy Optimization

Multiple Choice

A sequence generation model is being trained to maximize the objective function $U(\mathbf{x}, \mathbf{y}; \theta) = \sum_{t=1}^{T} A(\mathbf{x}, y_t, \mathbf{y}_{<t}) \log \pi_\theta(y_t|\mathbf{x}, \mathbf{y}_{<t})$. The training goal is to specifically penalize the model for using repetitive phrasing. Which of the following strategies for designing the weighting function $A(\cdot)$ would best accomplish this?
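
To make the role of $A(\cdot)$ concrete, here is a minimal sketch of one possible weighting scheme, offered as an illustration rather than as the intended answer. It assumes that "repetitive phrasing" can be approximated by repeated n-grams: a token gets a negative weight when it completes an n-gram that already occurred earlier in the sequence, and a positive weight otherwise. The names `repetition_weights` and `objective`, the parameters `n`, `penalty`, and `default`, and the choice to ignore the input $\mathbf{x}$ are all hypothetical simplifications.

```python
import math


def repetition_weights(tokens, n=2, penalty=-1.0, default=1.0):
    """Hypothetical A(x, y_t, y_<t), one weight per position t.

    A token receives `penalty` if it completes an n-gram that already
    appeared earlier in the sequence, and `default` otherwise.
    The input x is ignored in this simplified sketch.
    """
    seen = set()
    weights = []
    for t in range(len(tokens)):
        if t + 1 < n:
            # Not enough tokens yet to form an n-gram ending at position t.
            weights.append(default)
            continue
        ngram = tuple(tokens[t + 1 - n : t + 1])
        weights.append(penalty if ngram in seen else default)
        seen.add(ngram)
    return weights


def objective(weights, log_probs):
    """U = sum_t A_t * log pi(y_t | x, y_<t) for a single sequence."""
    return sum(a * lp for a, lp in zip(weights, log_probs))


# Toy sequence in which the bigram ("the", "cat") occurs twice.
tokens = ["the", "cat", "sat", "the", "cat"]
weights = repetition_weights(tokens, n=2)

# Assumed per-token log-probabilities; a real model would supply these.
log_probs = [math.log(0.5)] * len(tokens)
u = objective(weights, log_probs)
```

Because the weight on the repeated token is negative, maximizing $U$ pushes $\log \pi_\theta$ for that token down, while the positive weights on the other tokens push their log-probabilities up.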

Updated 2025-10-07
