Formula

KL Divergence Objective for Distilling Context into Soft Prompts

An alternative objective for distilling a full context into continuous soft prompt embeddings is to minimize the Kullback-Leibler (KL) divergence between the output distributions of the teacher and student models:

$$\hat{\sigma} = \operatorname*{arg\,min}_{\sigma}\ \mathrm{KL}\big(\Pr(\cdot \mid \mathbf{c}, \mathbf{z}) \,\big\|\, \Pr(\cdot \mid \sigma, \mathbf{z})\big)$$

This directly aligns the student model's probability distribution given the compressed context $\sigma$ and input $\mathbf{z}$ with the teacher model's distribution given the full context $\mathbf{c}$ and the same input $\mathbf{z}$.
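As a minimal sketch of this objective (not a full soft-prompt implementation), the snippet below fits a student next-token distribution to a fixed teacher distribution by gradient descent on the forward KL divergence. In a real setup $\sigma$ would be learnable prompt embeddings fed through the same frozen LM; here the student is parameterized directly by logits so the example stays self-contained. All names, sizes, and values are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q):
    # Forward KL divergence KL(p || q) for two discrete distributions.
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical teacher distribution Pr(. | c, z): next-token probabilities
# given the full context c and input z (random here for illustration).
rng = np.random.default_rng(0)
p = softmax(rng.normal(size=8))

# Student distribution Pr(. | sigma, z), parameterized directly by logits.
# In the actual method these logits would come from the LM conditioned on
# the learnable soft prompt sigma.
student_logits = np.zeros(8)

for step in range(500):
    q = softmax(student_logits)
    # For a softmax student, the gradient of KL(p || q) w.r.t. the logits
    # is simply q - p.
    student_logits -= 0.5 * (q - p)

q = softmax(student_logits)
print(kl(p, q))  # close to 0 after optimization
```

Because the objective is convex in the student logits, plain gradient descent drives the KL divergence toward zero; with soft prompt embeddings as the actual parameters, the same loss is optimized through the frozen model by backpropagation.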

Updated 2026-04-30

Tags

Foundations of Large Language Models

Ch.3 Prompting - Foundations of Large Language Models
