Formula

Log-Likelihood Objective for Distilling Context into Soft Prompts

When applying knowledge distillation to compress context into soft prompts, a simple training objective is to maximize the log-likelihood of the teacher model's prediction given the compressed representation. This is formalized as $\hat{\sigma} = \arg\max_{\sigma} \log \Pr(\hat{\mathbf{y}} \mid \sigma, \mathrm{z})$, where $\hat{\mathbf{y}}$ is the prediction produced by the teacher model using the full context, $\sigma$ denotes the continuous (soft) prompt embeddings, and $\mathrm{z}$ is the user input.
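Below is a minimal sketch of this objective in PyTorch. It uses a toy embedding-level model; the names `embed` and `lm`, the shapes, and the training loop are illustrative assumptions, not from the source. The key point it shows is that only the soft prompt $\sigma$ is optimized, by minimizing the negative log-likelihood of the teacher prediction $\hat{\mathbf{y}}$ given the sequence $[\sigma; \mathrm{z}]$.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, d_model, prompt_len = 100, 32, 4

embed = torch.nn.Embedding(vocab_size, d_model)  # token embedding table
lm = torch.nn.Linear(d_model, vocab_size)        # toy stand-in for the frozen LM

# The language model stays frozen; only the soft prompt is trained.
for p in list(embed.parameters()) + list(lm.parameters()):
    p.requires_grad_(False)

# sigma: trainable continuous prompt embeddings (the compressed context)
sigma = torch.nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

z = torch.randint(vocab_size, (6,))      # user input token ids
y_hat = torch.randint(vocab_size, (3,))  # teacher's prediction made with the full context

optimizer = torch.optim.Adam([sigma], lr=1e-2)

for step in range(100):
    # Build the input sequence [sigma; z; y_hat[:-1]] at the embedding level,
    # teacher-forcing the target tokens as in standard causal LM training.
    inputs = torch.cat([sigma, embed(z), embed(y_hat[:-1])], dim=0)
    logits = lm(inputs)  # (seq_len, vocab_size)

    # Positions whose next-token predictions should match y_hat: the last
    # token of z and each teacher-forced target token except the last.
    start = prompt_len + z.numel() - 1
    pred_logits = logits[start : start + y_hat.numel()]

    # Objective: maximize log Pr(y_hat | sigma, z), i.e. minimize the NLL.
    loss = F.cross_entropy(pred_logits, y_hat)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the gradient flows only into `sigma`, the distilled soft prompt learns to stand in for the full context that produced $\hat{\mathbf{y}}$, while the underlying model is left untouched.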
