Length-Aware Context Selection and Compression for RAG
Length-aware context selection and compression is the line of retrieval-augmented generation (RAG) work that treats the token budget of the generator's input as a first-class constraint and selects, truncates, or compresses retrieved context to fit it. Two complementary traditions are typically cited together. Knapsack-style selection (e.g., Riedhammer et al., Interspeech 2008) formulates the choice of which retrieved units to include under a length cap as a 0/1 knapsack problem: maximize a utility (such as expected ROUGE or relevance) subject to a token-budget constraint. Prompt compression methods such as LLMLingua (Jiang et al., EMNLP 2023) and LongLLMLingua (Jiang et al., ACL 2024) instead compress the assembled prompt at the token level under an explicit budget controller, with LongLLMLingua adding question-aware compression and dynamic per-document compression ratios for RAG. The common motivation is that retrieving a fixed number of passages does not fix the number of tokens passed to the generator, because passage lengths vary, so practical RAG deployments must apply a separate length-aware policy on top of top-k retrieval. This is the body of work that papers cite when they argue that token-cap effects are an axis distinct from retrieval ranking and should be analyzed separately from ordering and serialization effects.
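To make the knapsack formulation concrete, the following is a minimal sketch of 0/1 knapsack selection over retrieved passages using a textbook dynamic program. It is not the procedure of any cited paper; the function name `select_passages`, the per-passage token counts, and the utility scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def select_passages(
    token_counts: List[int], utilities: List[float], budget: int
) -> Tuple[List[int], float]:
    """Pick a subset of passages maximizing total utility within a token budget.

    Illustrative 0/1 knapsack dynamic program; inputs are assumed, not
    taken from any specific RAG system.
    """
    n = len(token_counts)
    # best[i][b] = max utility using only the first i passages within b tokens
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, gain = token_counts[i - 1], utilities[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip passage i-1
            if cost <= b and best[i - 1][b - cost] + gain > best[i][b]:
                best[i][b] = best[i - 1][b - cost] + gain  # option 2: take it

    # Walk the table backwards to recover which passages were taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= token_counts[i - 1]
    chosen.reverse()
    return chosen, best[n][budget]


if __name__ == "__main__":
    # Four retrieved passages with assumed token lengths and relevance scores.
    tokens = [120, 80, 200, 60]
    scores = [0.9, 0.6, 1.0, 0.5]
    picked, total = select_passages(tokens, scores, budget=260)
    print(picked, total)
```

In this toy input the highest-scoring passage (200 tokens) is left out, because three shorter passages fit the same 260-token cap with more total utility. That is the sense in which a length-aware policy can disagree with a pure relevance ranking.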
Auditable Strict-Parity Evaluation of Prerequisite-Graph Retrieval for RAG under Leakage Controls