Sequence-Level Loss in Context Distillation
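A commonly used loss function for context distillation is the sequence-level loss, which measures the error over an entire output sequence rather than token by token. It takes the basic form:

$$\mathrm{Loss} = -\sum_{\mathbf{y}} \Pr^{t}(\mathbf{y} \mid \mathbf{c}, \mathbf{z}) \, \log \Pr^{s}_{\theta}(\mathbf{y} \mid \mathbf{c}', \mathbf{z})$$

where $\mathbf{c}$ is the original instruction, $\mathbf{c}'$ is the simplified instruction, and $\mathbf{z}$ is the user input. Here $\Pr^{t}$ is the teacher model, which sees the original instruction, and $\Pr^{s}_{\theta}$ is the student model with parameters $\theta$, which sees only the simplified one. However, this function is computationally infeasible in practice because it requires summing over all possible outputs $\mathbf{y}$, and the number of such outputs grows exponentially with the output length.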
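To make the cost of the sum over outputs concrete, the following is a minimal sketch, not a real distillation setup. It assumes the loss is the cross-entropy between a teacher distribution (conditioned on the original instruction and the user input) and a student distribution (conditioned on the simplified instruction and the same input). The three-token vocabulary, the fixed output length, and the two toy "models" with independent tokens are all hypothetical stand-ins chosen so that every output sequence can be enumerated.

```python
import itertools
import math

# Hypothetical toy setup: a 3-token vocabulary and outputs of fixed length 4.
VOCAB = ["a", "b", "c"]
LENGTH = 4

# Toy per-token distributions standing in for the two models.
# teacher ~ Pr^t(. | original instruction, user input)
# student ~ Pr^s_theta(. | simplified instruction, user input)
teacher = {"a": 0.7, "b": 0.2, "c": 0.1}
student = {"a": 0.5, "b": 0.3, "c": 0.2}


def seq_prob(seq, token_probs):
    """Probability of a sequence when tokens are drawn independently."""
    p = 1.0
    for tok in seq:
        p *= token_probs[tok]
    return p


def exact_sequence_loss(teacher, student, vocab, length):
    """Sum -Pr^t(y) * log Pr^s(y) over every possible output sequence y."""
    loss = 0.0
    n_terms = 0
    for seq in itertools.product(vocab, repeat=length):
        loss -= seq_prob(seq, teacher) * math.log(seq_prob(seq, student))
        n_terms += 1
    return loss, n_terms


loss, n_terms = exact_sequence_loss(teacher, student, VOCAB, LENGTH)
print(f"terms summed: {n_terms}, loss: {loss:.4f}")

# The number of terms is |V| ** length, so it explodes at realistic sizes.
realistic_terms = 50_000 ** 20  # assumed 50k-token vocabulary, 20-token output
print(f"terms at realistic size: about 10^{len(str(realistic_terms)) - 1}")
```

With 3 tokens and length 4 the sum has only 81 terms, but the same enumeration at a realistic vocabulary size and output length is far out of reach, which is why practical methods replace the full sum with an approximation.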
Updated 2026-04-30
Tags
Foundations of Large Language Models
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences