Short Answer

Relationship Between Head and Model Dimensions

A transformer model has an overall embedding dimension, which we will call d_model. Inside this model, a multi-head attention layer is configured with a certain number of parallel attention heads, which we will call τ. Each of these heads produces an output vector with its own dimension, d_h. Describe the mathematical relationship between d_model, τ, and d_h. Furthermore, explain why this specific relationship is crucial for integrating the multi-head attention layer's final output back into the model's subsequent layers.
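To make the relationship concrete, here is a minimal sketch of the standard convention (as in the original Transformer), under which d_h = d_model / τ, so that concatenating the τ head outputs yields a d_model-sized vector again. The concrete values (d_model = 512, τ = 8) and variable names are illustrative assumptions, not part of the question.

```python
import torch

d_model, num_heads = 512, 8        # illustrative values; num_heads plays the role of tau
d_h = d_model // num_heads         # standard convention: d_h = d_model / tau

# One output vector per head for a single token (random stand-ins for real attention outputs).
head_outputs = [torch.randn(d_h) for _ in range(num_heads)]

# Concatenating the tau per-head outputs restores a d_model-sized vector,
# so it can be projected by W_O and added back into the residual stream.
concat = torch.cat(head_outputs, dim=-1)
assert concat.shape[-1] == num_heads * d_h == d_model
```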

Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science