Learn Before
Sequential Context Compression with an RNN-like Mechanism
To compress a long context into a soft prompt, as proposed by Chevalier et al. (2023), the context is first divided into a series of segments. The method introduces a set of special summary tokens. A fine-tuned Transformer model operates in a Recurrent Neural Network (RNN) fashion, iteratively updating a memory state: at each step, the model takes the current text segment, the memory state from the previous step, and the summary tokens as input. The last-layer hidden states corresponding to the summary tokens are then extracted to form the updated memory state. The final memory state, produced after processing the last segment, serves as a complete, fixed-size representation of the entire long context.
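The loop below is a minimal sketch of this procedure, not the authors' exact implementation. It assumes a HuggingFace-style Transformer (`model`) that accepts pre-computed embeddings via `inputs_embeds` and returns `last_hidden_state`; the function name, segment format, and tensor shapes are illustrative assumptions.

```python
import torch

def compress_context(model, segment_embeds, summary_embeds):
    """Sketch of RNN-like context compression.

    segment_embeds: list of tensors, each (seg_len, d_model) -- embedded text segments
    summary_embeds: tensor (k, d_model) -- embeddings of the k summary tokens
    Returns the final memory state, a tensor of shape (k, d_model).
    """
    memory = None                      # no memory before the first segment
    k = summary_embeds.size(0)
    for seg in segment_embeds:
        # Input at each step: [previous memory] + current segment + summary tokens
        parts = [seg, summary_embeds] if memory is None else [memory, seg, summary_embeds]
        inputs = torch.cat(parts, dim=0).unsqueeze(0)            # (1, L, d_model)
        hidden = model(inputs_embeds=inputs).last_hidden_state   # (1, L, d_model)
        # Updated memory = last-layer hidden states at the summary-token positions
        memory = hidden[0, -k:, :]
    return memory  # fixed-size representation of the entire long context
```

The final `memory` tensor can then be prepended to a downstream prompt as a soft prompt in place of the full original context.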
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Related
Optimization Goal for Soft Prompt Learning via Context Compression
Challenge of Context Compression for Long Sequences
Prompt as a Form of Context
A research team is developing a system where a very long, detailed set of instructions is 'compressed' into a compact, learnable set of numerical values. This compact representation is then used to guide a language model in performing a specific task, aiming to replicate the performance that would be achieved if the model had processed the full set of instructions. What is the most significant practical challenge the team will face when implementing this 'compression' process?
Applying Context Compression for a Specialized Task
Sequential Context Compression with an RNN-like Mechanism
The Goal of Context Compression for Soft Prompts
Learn After
Evaluating a Proposed Modification to a Sequential Processing Model
A Transformer model is adapted to compress a long text by processing it sequentially in segments. Arrange the following steps to accurately describe how this model iteratively builds a complete representation of the text.
When a Transformer model is fine-tuned to compress a long context by sequentially processing text segments, it updates a memory state at each step. What is the most critical function of incorporating the memory state from the previous step when encoding the current text segment?