Concept

Teacher Model Constraints in Soft Prompt Distillation

A major limitation of compressing a full context into continuous representations (a soft prompt) is that distillation requires a teacher model capable of processing the entire long input sequence. When the context is very long, running a standard Large Language Model over it becomes computationally costly or outright infeasible. This approach therefore often relies on efficient long-context methods, such as a fixed-size Key-Value (KV) cache or efficient Transformer architectures, to keep the teacher model's forward pass tractable.
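
The sketch below illustrates this setup under toy assumptions: a small frozen Transformer acts as its own teacher, and only a short soft prompt is trained so that the model conditioned on it matches the teacher's next-token distribution for the same query. TinyLM and all size constants are hypothetical stand-ins, not the API of any specific library; the line to notice is the teacher forward pass, which must consume the full long context and is exactly the cost described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of soft-prompt distillation. TinyLM, VOCAB, N_SOFT, etc.
# are illustrative assumptions, not from any particular paper or library.
VOCAB, D_MODEL, N_SOFT = 1000, 64, 8

class TinyLM(nn.Module):
    """Tiny causal LM that can also consume a prefix of raw embeddings.
    Positional encodings are omitted for brevity."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, token_ids, prefix_embeds=None):
        x = self.embed(token_ids)
        if prefix_embeds is not None:  # prepend learned soft-prompt vectors
            x = torch.cat([prefix_embeds.expand(x.size(0), -1, -1), x], dim=1)
        # Causal mask so each position only attends to earlier ones.
        mask = torch.triu(torch.full((x.size(1), x.size(1)), float("-inf")),
                          diagonal=1)
        return self.head(self.blocks(x, mask=mask))

model = TinyLM()
for p in model.parameters():      # LM is frozen; it acts as its own teacher
    p.requires_grad_(False)

soft_prompt = nn.Parameter(torch.randn(1, N_SOFT, D_MODEL) * 0.02)
opt = torch.optim.Adam([soft_prompt], lr=1e-2)

long_context = torch.randint(0, VOCAB, (1, 256))  # stands in for a long document
query = torch.randint(0, VOCAB, (1, 16))          # text following the context

for step in range(200):
    with torch.no_grad():
        # Teacher pass: the *entire* long context must fit in one forward
        # pass. This is the bottleneck the text describes; real systems
        # substitute a fixed-size KV cache or an efficient long-context
        # architecture at precisely this step.
        full = torch.cat([long_context, query], dim=1)
        t_logits = model(full)[:, -query.size(1):]
    # Student pass: N_SOFT learned vectors replace the 256-token context.
    s_logits = model(query, prefix_embeds=soft_prompt)[:, -query.size(1):]
    loss = F.kl_div(F.log_softmax(s_logits, -1), F.softmax(t_logits, -1),
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Matching the teacher's output distribution with a KL divergence, rather than hard token labels, is the usual distillation choice here: it transfers the teacher's full predictive distribution into the soft prompt rather than a single sampled continuation.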


Tags

Foundations of Large Language Models

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences