Formula

Decoding Objective with Penalty Term

To achieve more controllable text generation, the standard decoding objective can be extended with a penalty term. The extended objective seeks the optimal output sequence $\hat{\mathbf{y}}$ by maximizing the conditional probability while simultaneously penalizing specific undesired behaviors through a penalty function. The general form of this new objective is:

$$\hat{\mathbf{y}} = \argmax_{\mathbf{y} \in \mathcal{Y}} \big[ \Pr(\mathbf{y}|\mathbf{x}) - \lambda \cdot \mathrm{Penalty}(\mathbf{x},\mathbf{y}) \big]$$

Here, $\mathrm{Penalty}(\mathbf{x},\mathbf{y})$ measures how much the sequence $\mathbf{y}$ violates the desired constraints given the input $\mathbf{x}$, and $\lambda$ determines the strength of the penalty.
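The objective above can be sketched as a candidate-rescoring step. The sketch below is a minimal illustration, not a method from the text: `candidates`, `prob`, and the repetition-based `penalty` are all hypothetical stand-ins chosen for the example, with the input $\mathbf{x}$ left implicit.

```python
def penalized_decode(candidates, prob, penalty, lam=0.1):
    """Return the candidate maximizing Pr(y|x) - lam * Penalty(x, y).

    candidates : list of token sequences (the search space Y)
    prob       : hypothetical function giving Pr(y|x) for a sequence y
    penalty    : hypothetical function measuring constraint violation
    lam        : penalty strength (the lambda in the objective)
    """
    return max(candidates, key=lambda y: prob(y) - lam * penalty(y))

# Toy example: a repetition penalty. The more probable candidate
# ("a","a","a") repeats tokens, so the penalized objective prefers
# the slightly less probable but non-repetitive one.
def toy_prob(y):
    return {("a", "b", "c"): 0.30, ("a", "a", "a"): 0.35}[tuple(y)]

def repetition_penalty(y):
    return len(y) - len(set(y))  # count of repeated tokens

best = penalized_decode([["a", "b", "c"], ["a", "a", "a"]],
                        toy_prob, repetition_penalty, lam=0.1)
print(best)  # ["a", "b", "c"]: 0.30 - 0 beats 0.35 - 0.1 * 2
```

With $\lambda = 0$ the objective reduces to standard maximum-probability decoding; raising $\lambda$ trades likelihood for constraint satisfaction.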


Updated 2026-05-05


Ch.5 Inference - Foundations of Large Language Models