Learn Before
A language model generates text token by token. At each step i, an attention operation computes an output using a query vector and a memory component. In a standard causal implementation, this memory component is defined as the complete set of key and value vectors from all steps 1 through i, i.e., every step up to and including the current one. Based on this definition, what is the direct relationship between the size of this memory component and the length of the generated sequence i?
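The following is a minimal sketch, assuming a toy single-head attention with NumPy and a hypothetical embedding size, of how the cached key/value memory is built during causal generation: one key and one value vector are appended per step, so the cache holds exactly i entries after step i.

```python
# Toy illustration (not a real model): the KV cache grows by one
# key and one value vector per generated token, so its size is O(i).
import numpy as np

d_model = 8            # hypothetical embedding size
keys, values = [], []  # the "memory component": all cached K/V vectors

def attend(query, keys, values):
    """Scaled dot-product attention of one query over all cached K/V."""
    K = np.stack(keys)                    # shape (i, d_model)
    V = np.stack(values)                  # shape (i, d_model)
    scores = K @ query / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                    # attention output for this step

rng = np.random.default_rng(0)
for i in range(1, 6):                     # generate 5 tokens
    x = rng.standard_normal(d_model)      # stand-in for the step-i hidden state
    keys.append(x)                        # cache grows by one key ...
    values.append(x)                      # ... and one value per step
    out = attend(x, keys, values)
    print(f"step {i}: cached K/V pairs = {len(keys)}")  # equals i
```

Running the loop prints a cache size equal to the current step index, which is the linear relationship the question asks about.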
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Sparse Attention with a Fixed Key-Value Subset
Evaluating Memory Models in Attention Mechanisms
Evaluating an Attention Mechanism for a Real-Time Application