Learn Before
An engineer is configuring an attention layer with 32 query heads. This layer uses a grouped-query approach where query heads are partitioned into groups, with each group sharing a single key and value projection. Match each configuration for the number of key-value groups to its resulting characteristic.
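As a hedged illustration of the reasoning behind this matching exercise, the sketch below classifies a choice of key-value group count for a given number of query heads. The function name `describe_kv_grouping` and the returned field names are hypothetical, introduced only for this example. It assumes the number of query heads divides evenly by the number of groups, and that key-value cache size scales linearly with the number of key-value groups.

```python
def describe_kv_grouping(num_query_heads: int, num_kv_groups: int) -> dict:
    """Summarize how query heads share key/value projections (illustrative helper)."""
    # Assumption: query heads are split evenly across the key-value groups.
    if num_kv_groups < 1 or num_query_heads % num_kv_groups != 0:
        raise ValueError("num_kv_groups must be a positive divisor of num_query_heads")

    heads_per_group = num_query_heads // num_kv_groups

    if num_kv_groups == 1:
        # All query heads share one key/value projection.
        variant = "multi-query attention (MQA)"
    elif num_kv_groups == num_query_heads:
        # Every query head has its own key/value projection.
        variant = "multi-head attention (MHA)"
    else:
        # Intermediate trade-off between the two extremes.
        variant = "grouped-query attention (GQA)"

    return {
        "kv_groups": num_kv_groups,
        "query_heads_per_group": heads_per_group,
        "variant": variant,
        # Key-value cache size relative to standard multi-head attention.
        "kv_cache_vs_mha": num_kv_groups / num_query_heads,
    }


if __name__ == "__main__":
    for groups in (1, 8, 32):
        print(describe_kv_grouping(32, groups))
```

With 32 query heads, the two extremes (1 group and 32 groups) reduce to multi-query and standard multi-head attention respectively, while values in between trade key-value cache size against the number of distinct key/value projections.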
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer is designing a large language model and is deciding on the architecture for its attention layers. The model is configured to have 64 query heads. The engineer uses an attention variant where these query heads are partitioned into groups, and all heads within a group share the same key and value projections. If the engineer sets the number of key-value groups to 1, which statement best analyzes the resulting configuration?
Optimizing Attention Mechanisms for Different Applications