1Cademy - A large language model is tasked with processing a very long input document. To prepare for generating a response, it computes the Key-Value cache for the entire document in a single, large forward pass before any new tokens are produced. What is the most significant computational challenge or trade-off inherent to this all-at-once approach?

Learn Before

Prefilling in One Go (Standard Prefilling)

Multiple Choice

A large language model is tasked with processing a very long input document. To prepare for generating a response, it computes the Key-Value cache for the entire document in a single, large forward pass before any new tokens are produced. What is the most significant computational challenge or trade-off inherent to this 'all-at-once' approach?
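One way to reason about the question is to estimate how the sizes involved in a single-pass prefill grow with document length. The sketch below is illustrative only: the function name `prefill_footprint` and the model dimensions (32 layers, 32 heads, head size 128, 2 bytes per value) are assumptions, not taken from any particular model, and it assumes a naive attention implementation that materializes the full score matrix.

```python
def prefill_footprint(seq_len, n_layers=32, n_heads=32, head_dim=128, bytes_per_value=2):
    """Rough size estimates for prefilling seq_len tokens in one forward pass.

    All default dimensions are hypothetical and chosen only for illustration.
    """
    # KV cache: one key vector and one value vector per token, per head, per layer.
    kv_cache_bytes = 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_value
    # Naive attention builds a seq_len x seq_len score matrix per head.
    # Counted here for a single layer, since layers run one after another.
    score_matrix_bytes = n_heads * seq_len * seq_len * bytes_per_value
    return kv_cache_bytes, score_matrix_bytes


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        kv, scores = prefill_footprint(n)
        print(f"{n:>7} tokens: KV cache {kv / 2**30:.2f} GiB, "
              f"attention scores per layer {scores / 2**30:.2f} GiB")
```

Doubling the document length doubles the first estimate and quadruples the second, which is a useful observation to keep in mind when weighing the answer options. Fused attention kernels may avoid storing the full score matrix, so the second figure is an upper-bound style estimate under the naive assumption.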

Updated 2025-09-29
