
Objective Function for Context Compression into Soft Prompts

The problem of approximating a long context with a continuous representation can be formalized as an optimization task. Given a user input $\mathbf{z}$ and its full context $\mathbf{c}$, the goal is to learn a compressed representation $\sigma$ such that the model's prediction conditioned on $\sigma$ closely matches the prediction conditioned on $\mathbf{c}$. This objective is expressed as $\hat{\sigma} = \argmin_{\sigma} s(\hat{\mathbf{y}}, \hat{\mathbf{y}}_{\sigma})$, where $\hat{\mathbf{y}} = \argmax_{\mathbf{y}} \Pr(\mathbf{y} \mid \mathbf{c}, \mathbf{z})$ is the prediction with the full context, $\hat{\mathbf{y}}_{\sigma} = \argmax_{\mathbf{y}} \Pr(\mathbf{y} \mid \sigma, \mathbf{z})$ is the prediction with the compressed context, and $s(\cdot, \cdot)$ is a loss or similarity measure between the two predictions.
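The objective above can be sketched numerically. The following is a minimal toy illustration, not the actual method from the chapter: it assumes a hypothetical linear-softmax "model" (`predict`), represents $\mathbf{c}$, $\mathbf{z}$, and $\sigma$ as single vectors rather than token sequences, and instantiates $s(\cdot,\cdot)$ as the KL divergence between the two predictive distributions, optimized by plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 4                          # toy vocab size and embedding dim

W = rng.normal(size=(V, d))          # hypothetical output head of the "model"

def predict(ctx_vec, z_vec):
    # Toy stand-in for Pr(y | context, z): average the context and input
    # vectors, project to vocab logits, apply softmax.
    h = (ctx_vec + z_vec) / 2
    logits = W @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

c = rng.normal(size=d)               # full-context representation c
z = rng.normal(size=d)               # user-input representation z
p_full = predict(c, z)               # target distribution Pr(y | c, z)

# Optimize the soft prompt sigma so that Pr(y | sigma, z) matches p_full,
# i.e. minimize s(y_hat, y_hat_sigma) with s = KL(p_full || p_sigma).
sigma = np.zeros(d)
lr = 0.1
for _ in range(5000):
    p_sig = predict(sigma, z)
    # Gradient of KL(p_full || p_sig) w.r.t. the logits is (p_sig - p_full);
    # the chain rule through h = (sigma + z)/2 contributes the factor W^T / 2.
    grad = W.T @ (p_sig - p_full) / 2
    sigma -= lr * grad

kl = float(np.sum(p_full * np.log(p_full / predict(sigma, z))))
```

In this toy setup $\sigma = \mathbf{c}$ is an exact minimizer, so the KL divergence is driven toward zero; in practice $\sigma$ is much shorter than the context it replaces, and the compression is lossy by design.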

Updated 2026-04-30

Tags

Foundations of Large Language Models

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
