Context Distillation into Prompt Embeddings
Applying knowledge distillation to context compression involves treating the model conditioned on the full context as the teacher and the model conditioned on the compressed context as the student. Unlike standard context distillation, where the compressed context still consists of discrete tokens, this method distills the context into real-valued vectors that act as prompt embeddings. Furthermore, the teacher and student are not required to share the same architecture; typically, a stronger model serves as the teacher, while a smaller, more efficient model acts as the student.
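A minimal sketch of this setup, assuming a PyTorch and Hugging Face Transformers environment; the model names, example context, number of soft tokens, and training hyperparameters are illustrative placeholders rather than a prescribed recipe:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Teacher (stronger model, sees the full context) and student (smaller model,
# sees only the learned prompt embeddings). Both share the GPT-2 vocabulary,
# so their output distributions are directly comparable.
teacher = AutoModelForCausalLM.from_pretrained("gpt2-large")
student = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

context = "Translate English to French. Example: cat -> chat. Example: dog -> chien."
query = "Translate: house ->"

# Learnable real-valued prompt embeddings that will absorb the context.
num_soft_tokens = 8
soft_prompt = torch.nn.Parameter(
    torch.randn(num_soft_tokens, student.config.hidden_size) * 0.02
)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

full_ids = tokenizer(context + " " + query, return_tensors="pt").input_ids
query_ids = tokenizer(query, return_tensors="pt").input_ids

for step in range(100):
    # Teacher prediction conditioned on the full discrete context plus the query.
    with torch.no_grad():
        teacher_logits = teacher(full_ids).logits[:, -1, :]

    # Student prediction conditioned on the soft prompt prepended to the query.
    query_embeds = student.get_input_embeddings()(query_ids)
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), query_embeds], dim=1)
    student_logits = student(inputs_embeds=inputs_embeds).logits[:, -1, :]

    # Distillation loss: match the student's next-token distribution to the
    # teacher's. Only the soft prompt is updated; both models stay frozen.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After training, the soft prompt can be cached and prepended to new queries in place of the original context, so the student never has to re-encode the full context at inference time.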