1Cademy - Penalty Function in Controllable Decoding

Definition

Penalty Function in Controllable Decoding

The penalty function, denoted $\mathrm{Penalty}(\mathbf{x},\mathbf{y})$, quantifies the degree to which a generated output sequence $\mathbf{y}$ exhibits undesirable behavior or violates constraints, given the input $\mathbf{x}$. Its design is flexible, and it can be implemented in two general ways: by assessing the final surface form of the generated text, or by evaluating the internal hidden states of the large language model during generation.
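
The two implementation styles can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`surface_penalty`, `hidden_state_penalty`, `penalized_score`), the banned-word list, the linear probe weights, and the weighting factor `lam` are hypothetical, and the combined score of the form log-probability minus `lam` times the penalty is one common way to fold a penalty into a decoding objective, not necessarily the exact form used elsewhere.

```python
import math
from typing import Sequence


def surface_penalty(x: str, y: str, banned: Sequence[str]) -> float:
    """Surface-form penalty: fraction of whitespace tokens in y that are banned.

    The input x is accepted to match the Penalty(x, y) signature but is
    unused in this toy example.
    """
    tokens = y.lower().split()
    if not tokens:
        return 0.0
    banned_set = {w.lower() for w in banned}
    hits = sum(t.strip(".,!?") in banned_set for t in tokens)
    return hits / len(tokens)


def hidden_state_penalty(
    hidden_states: Sequence[Sequence[float]],
    probe_weights: Sequence[float],
    probe_bias: float = 0.0,
) -> float:
    """Hidden-state penalty: mean sigmoid score of a (hypothetical) linear
    probe applied to the hidden vector at each generation step."""
    if not hidden_states:
        return 0.0
    total = 0.0
    for h in hidden_states:
        logit = sum(w * v for w, v in zip(probe_weights, h)) + probe_bias
        total += 1.0 / (1.0 + math.exp(-logit))
    return total / len(hidden_states)


def penalized_score(log_prob: float, penalty: float, lam: float = 1.0) -> float:
    """Assumed objective: model log-probability minus a weighted penalty."""
    return log_prob - lam * penalty


# Rerank two candidate outputs (with made-up log-probabilities) by penalized score.
candidates = [("this is bad text", -1.0), ("this is fine text", -1.2)]
best = max(
    candidates,
    key=lambda c: penalized_score(c[1], surface_penalty("", c[0], ["bad"])),
)
```

In this sketch the surface-form penalty needs only the finished text, so it can be applied to any model's output, whereas the hidden-state penalty requires access to the model's internal activations at each step.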