Concept
Teacher Model Constraints in Soft Prompt Distillation
A major limitation of compressing full contexts into continuous representations is that it requires a teacher model capable of processing the entire long input sequence. If the context is excessively long, running a standard Large Language Model over it becomes computationally costly or outright infeasible. Consequently, this approach often relies on efficient long-context techniques, such as a fixed-size Key-Value (KV) cache or efficient Transformer architectures, to keep the teacher model's processing tractable.
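The core idea can be illustrated with a minimal toy sketch (all names and the setup below are hypothetical, not from the source): a "teacher" produces a target representation from the full long context, and a small set of soft prompt vectors is optimized by gradient descent so a "student" reproduces that representation without ever seeing the raw context. Here the expensive teacher forward pass is stood in for by a simple mean-pool over the whole sequence.

```python
import numpy as np

# Toy sketch of soft prompt distillation (hypothetical setup).
rng = np.random.default_rng(0)

d = 16          # embedding dimension
ctx_len = 512   # long context the teacher must process in full
k = 4           # number of learned soft prompt vectors (the compressed context)

context = rng.normal(size=(ctx_len, d))

# "Teacher": mean-pools the entire long context. This stands in for the
# costly full forward pass the paragraph above describes.
teacher_repr = context.mean(axis=0)

# "Student" side: k trainable soft prompt vectors, randomly initialized.
soft_prompts = rng.normal(size=(k, d))

lr = 8.0
for step in range(200):
    student_repr = soft_prompts.mean(axis=0)
    # MSE distillation loss: mean over d of (student - teacher)^2.
    # Gradient w.r.t. each prompt row is 2 * (student - teacher) / (d * k).
    grad = 2.0 * (student_repr - teacher_repr) / (d * k)
    soft_prompts -= lr * grad  # same gradient broadcast to every prompt row

# After training, the k soft prompts summarize the 512-token context:
final_error = np.abs(soft_prompts.mean(axis=0) - teacher_repr).max()
```

In a real system the teacher would be a long-context LLM (hence the need for tricks like a fixed-size KV cache), the student loss would typically be a KL divergence over output distributions rather than an MSE on pooled states, and the soft prompts would be optimized with backpropagation through the student model.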
Updated 2026-04-30
Tags
Foundations of Large Language Models
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences