Learn Before
An engineer modifies a large language model by doubling the number of attention heads per layer while simultaneously halving the dimensionality of each head's key/value vectors. Assuming all other parameters (like the number of layers and sequence length) remain constant, how does this architectural change affect the multi-dimensional structure of the model's key-value (KV) cache?
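The trade-off in the question can be checked with a quick calculation. This is a minimal sketch assuming a typical per-layer cache layout of (2, num_heads, seq_len, head_dim) — one tensor each for keys and values — and illustrative sizes; the specific numbers (32 layers, 16 heads, etc.) are hypothetical, not taken from the question:

```python
# Sketch: count elements in a KV cache with an assumed layout of
# (num_layers, 2, num_heads, seq_len, head_dim) — the "2" covers K and V.
def kv_cache_elements(num_layers, num_heads, seq_len, head_dim):
    return num_layers * 2 * num_heads * seq_len * head_dim

# Illustrative baseline: 32 layers, 16 heads of dimension 128, 2048 tokens.
base = kv_cache_elements(num_layers=32, num_heads=16, seq_len=2048, head_dim=128)

# The modification: double the heads (16 -> 32), halve head_dim (128 -> 64).
modified = kv_cache_elements(num_layers=32, num_heads=32, seq_len=2048, head_dim=64)

# Total element count is unchanged; only the head-related axes reshape
# from (16, 128) to (32, 64).
assert base == modified
```

So the cache's total memory footprint stays the same, while its multi-dimensional structure changes: the heads axis doubles in size and the per-head dimension axis halves.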
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
KV Cache Structure Trade-offs
Calculating KV Cache Size per Token