
Sequential Context Compression with an RNN-like Mechanism

To compress a long context into a soft prompt, as proposed by Chevalier et al. (2023), the context is first divided into a series of segments. The method introduces a set of summary tokens, denoted $\langle \mathrm{g}_1 \rangle, \ldots, \langle \mathrm{g}_{\kappa} \rangle$. A fine-tuned Transformer model then operates in a Recurrent Neural Network (RNN) fashion, iteratively updating a memory state. At each step $i$, the model takes the current text segment, the previous memory state $\sigma^{<i}$, and the summary tokens as input. The last-layer hidden representations of the summary tokens are extracted to form the updated memory state. The memory state produced after processing the last segment serves as a fixed-size representation of the entire long context.
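
To make the loop concrete, below is a minimal PyTorch sketch of this recurrent compression scheme. It is an illustration under assumptions, not the authors' implementation: the class name `RecurrentContextCompressor`, the tiny `nn.TransformerEncoder` standing in for the fine-tuned Transformer LM, and all hyperparameters are placeholders chosen for brevity.

```python
import torch
import torch.nn as nn


class RecurrentContextCompressor(nn.Module):
    """Sketch of RNN-style context compression with summary tokens.

    A small TransformerEncoder stands in for the fine-tuned Transformer;
    names and sizes are illustrative, not the method's actual API.
    """

    def __init__(self, vocab_size=32000, d_model=256, num_summary_tokens=8,
                 nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable embeddings for the summary tokens <g_1>, ..., <g_k>.
        self.summary_embed = nn.Parameter(
            torch.randn(num_summary_tokens, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.num_summary_tokens = num_summary_tokens

    def forward(self, segments):
        """segments: list of LongTensors of shape [batch, seg_len]."""
        memory = None  # no memory state before the first segment
        for seg_ids in segments:
            batch = seg_ids.size(0)
            seg_emb = self.embed(seg_ids)                                    # [B, L, D]
            summary = self.summary_embed.unsqueeze(0).expand(batch, -1, -1)  # [B, k, D]
            # Input: previous memory state, current segment, summary tokens.
            parts = [seg_emb, summary] if memory is None else [memory, seg_emb, summary]
            hidden = self.encoder(torch.cat(parts, dim=1))
            # Last-layer hidden states at the summary positions become the new memory.
            memory = hidden[:, -self.num_summary_tokens:, :]
        return memory  # fixed-size soft prompt summarizing the whole context


# Usage: compress three 16-token segments into an 8-vector soft prompt.
compressor = RecurrentContextCompressor()
segments = [torch.randint(0, 32000, (2, 16)) for _ in range(3)]
soft_prompt = compressor(segments)
print(soft_prompt.shape)  # torch.Size([2, 8, 256])
```

The key design point the sketch mirrors is that the memory has a constant size ($\kappa$ vectors) regardless of how many segments are processed, so arbitrarily long contexts reduce to a fixed-size soft prompt.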


