Formula

Sequence-Level Loss in Context Distillation

A commonly used loss function for context distillation is the sequence-level loss, which calculates the error over an entire sequence. It takes the basic form:

\mathrm{Loss} = -\sum_{\mathbf{y}} \mathrm{Pr}^{t}(\mathbf{y}|\mathbf{c},\mathbf{z}) \log \mathrm{Pr}_{\theta}^{s}(\mathbf{y}|\mathbf{c}',\mathbf{z})

where \mathbf{c} is the original instruction, \mathbf{c}' is the simplified instruction, and \mathbf{z} is the user input. However, this function is computationally infeasible in practice because it requires summing over an exponentially large number of possible output sequences \mathbf{y}.
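The blow-up is easy to see numerically. The toy sketch below (all names and the factorized per-token distributions are illustrative assumptions, not the book's setup; real LMs condition each token on the prefix) enumerates every output sequence to compute the exact loss, then estimates the same quantity by sampling sequences from the teacher instead of enumerating them:

```python
import itertools, math, random

# Toy setup: a 4-token vocabulary and length-6 outputs. Even this tiny
# case already has 4**6 = 4096 sequences; a real model has |V|**L.
random.seed(0)
VOCAB = ["a", "b", "c", "d"]
LENGTH = 6

def random_dist(n):
    """A random categorical distribution over n items."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Illustrative assumption: teacher and student factorize per position.
teacher = [random_dist(len(VOCAB)) for _ in range(LENGTH)]
student = [random_dist(len(VOCAB)) for _ in range(LENGTH)]

def seq_prob(dists, seq):
    """Probability of a whole sequence under a factorized model."""
    p = 1.0
    for t, tok in enumerate(seq):
        p *= dists[t][VOCAB.index(tok)]
    return p

# Exact loss: sum over every possible output sequence y.
exact, count = 0.0, 0
for y in itertools.product(VOCAB, repeat=LENGTH):
    exact -= seq_prob(teacher, y) * math.log(seq_prob(student, y))
    count += 1

# Monte Carlo estimate: sample y from the teacher, average -log Pr^s(y).
def sample(dists):
    return [random.choices(VOCAB, weights=d)[0] for d in dists]

samples = [sample(teacher) for _ in range(5000)]
approx = -sum(math.log(seq_prob(student, y)) for y in samples) / len(samples)

print(count)                              # number of terms in the exact sum
print(round(exact, 3), round(approx, 3))  # the two values should be close
```

Sampling from the teacher works because the loss is an expectation under \mathrm{Pr}^{t}: a handful of teacher samples gives an unbiased estimate, while the exact sum grows as |V|^L and is hopeless for realistic vocabularies and lengths.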

Updated 2026-04-30
