Formula

Decoding Objective with Penalty Term

To achieve more controllable text generation, the standard decoding objective can be extended with a penalty term. The extended objective seeks the optimal output sequence $\hat{\mathbf{y}}$ by maximizing the conditional probability while simultaneously penalizing specific undesired behaviors through a penalty function. The general form of this new objective is:

$$\hat{\mathbf{y}} = \argmax_{\mathbf{y} \in \mathcal{Y}} \big[ \Pr(\mathbf{y}|\mathbf{x}) - \lambda \cdot \mathrm{Penalty}(\mathbf{x},\mathbf{y}) \big]$$

Here, $\mathrm{Penalty}(\mathbf{x},\mathbf{y})$ measures how much the sequence $\mathbf{y}$ violates the desired constraints given the input $\mathbf{x}$, and $\lambda$ determines the strength of the penalty.
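The objective above can be sketched as a candidate-rescoring step. The sketch below is a minimal illustration, not a method from the text: `candidates`, `prob`, and the repetition-based `penalty` are all hypothetical stand-ins chosen for the example, with the input $\mathbf{x}$ left implicit.

```python
def penalized_decode(candidates, prob, penalty, lam=0.1):
    """Return the candidate maximizing Pr(y|x) - lam * Penalty(x, y).

    candidates : list of token sequences (the search space Y)
    prob       : hypothetical function giving Pr(y|x) for a sequence y
    penalty    : hypothetical function measuring constraint violation
    lam        : penalty strength (the lambda in the objective)
    """
    return max(candidates, key=lambda y: prob(y) - lam * penalty(y))

# Toy example: a repetition penalty. The more probable candidate
# ("a","a","a") repeats tokens, so the penalized objective prefers
# the slightly less probable but non-repetitive one.
def toy_prob(y):
    return {("a", "b", "c"): 0.30, ("a", "a", "a"): 0.35}[tuple(y)]

def repetition_penalty(y):
    return len(y) - len(set(y))  # count of repeated tokens

best = penalized_decode([["a", "b", "c"], ["a", "a", "a"]],
                        toy_prob, repetition_penalty, lam=0.1)
print(best)  # ["a", "b", "c"]: 0.30 - 0 beats 0.35 - 0.1 * 2
```

With $\lambda = 0$ the objective reduces to standard maximum-probability decoding; raising $\lambda$ trades likelihood for constraint satisfaction.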


Updated 2026-05-05


Ch.5 Inference - Foundations of Large Language Models