Learn Before
In a multi-head attention mechanism where the overall model dimension is d_model and there are τ parallel attention heads (where τ > 1), the output vector of a single attention head has a dimension of d_model.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Multi-Head Attention Output Calculation
In a multi-head attention mechanism, the model's overall embedding dimension is 768. If this mechanism is configured with 12 separate, parallel attention heads, what is the dimension of the output vector produced by a single one of these heads?
Relationship Between Head and Model Dimensions
In a multi-head attention mechanism where the overall model dimension is d_model and there are τ parallel attention heads (where τ > 1), the output vector of a single attention head has a dimension of d_model.
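One way to evaluate this statement is to work through the dimensions under the common convention in which d_model is split evenly across the τ heads, so that each head outputs a vector of dimension d_model / τ and the concatenation of all heads restores d_model. The sketch below assumes that convention; the function name head_output_dim is hypothetical and chosen only for illustration, and the example values (768 and 12) are taken from the related question above.

```python
def head_output_dim(d_model: int, num_heads: int) -> int:
    """Per-head output dimension, ASSUMING d_model is split evenly across heads.

    This is the common convention, not the only possible design.
    """
    if num_heads < 1:
        raise ValueError("num_heads must be at least 1")
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads under this convention")
    return d_model // num_heads


d_model, num_heads = 768, 12
d_head = head_output_dim(d_model, num_heads)

# Concatenating the outputs of all heads recovers the model dimension.
concat_dim = d_head * num_heads

print(d_head, concat_dim)  # 64 768
```

Under this assumption, a single head's output has dimension d_model only when τ = 1; for any τ > 1 it is strictly smaller than d_model.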