Formula

KL Divergence Objective for Distilling Context into Soft Prompts

An alternative objective for distilling a full context into continuous soft prompt embeddings is to minimize the Kullback-Leibler (KL) divergence between the output distributions of the teacher and student models:

$$\hat{\sigma} = \operatorname*{arg\,min}_{\sigma}\ \mathrm{KL}\big(\Pr(\cdot \mid \mathbf{c}, \mathbf{z}) \,\big\|\, \Pr(\cdot \mid \sigma, \mathbf{z})\big)$$

This directly aligns the student model's probability distribution given the compressed context $\sigma$ and input $\mathbf{z}$ with the teacher model's distribution given the full context $\mathbf{c}$ and the same input $\mathbf{z}$.
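As a minimal sketch of this objective (not a full soft-prompt implementation), the snippet below fits a student next-token distribution to a fixed teacher distribution by gradient descent on the forward KL divergence. In a real setup $\sigma$ would be learnable prompt embeddings fed through the same frozen LM; here the student is parameterized directly by logits so the example stays self-contained. All names, sizes, and values are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q):
    # Forward KL divergence KL(p || q) for two discrete distributions.
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Hypothetical teacher distribution Pr(. | c, z): next-token probabilities
# given the full context c and input z (random here for illustration).
rng = np.random.default_rng(0)
p = softmax(rng.normal(size=8))

# Student distribution Pr(. | sigma, z), parameterized directly by logits.
# In the actual method these logits would come from the LM conditioned on
# the learnable soft prompt sigma.
student_logits = np.zeros(8)

for step in range(500):
    q = softmax(student_logits)
    # For a softmax student, the gradient of KL(p || q) w.r.t. the logits
    # is simply q - p.
    student_logits -= 0.5 * (q - p)

q = softmax(student_logits)
print(kl(p, q))  # close to 0 after optimization
```

Because the objective is convex in the student logits, plain gradient descent drives the KL divergence toward zero; with soft prompt embeddings as the actual parameters, the same loss is optimized through the frozen model by backpropagation.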

Updated 2026-04-30

Tags

Foundations of Large Language Models

Ch.3 Prompting - Foundations of Large Language Models
