Learn Before
In a neural network component, an input representation of dimension 512 is processed by 8 parallel 'heads'. For each head, a 'key' vector is produced by multiplying the input representation by a head-specific weight matrix. The 'key' vectors from all heads are concatenated, resulting in a final combined dimension of 512. What is the shape of the weight matrix used to produce the 'key' vector for a single head?
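The shape arithmetic behind the question can be sketched in NumPy (the variable names and the use of random matrices here are illustrative assumptions, not part of the original card): with a model dimension of 512 split across 8 heads whose concatenated keys recover dimension 512, each head's key dimension is 512 / 8 = 64, so each per-head key projection matrix has shape (512, 64).

```python
import numpy as np

d_model = 512              # input representation dimension
n_heads = 8                # number of parallel heads
d_k = d_model // n_heads   # per-head key dimension: 512 / 8 = 64

# One key projection matrix per head (hypothetical example weights).
W_k_heads = [np.random.randn(d_model, d_k) for _ in range(n_heads)]

x = np.random.randn(d_model)          # a single input representation
key = x @ W_k_heads[0]                # one head's key vector, shape (64,)

# Concatenating all heads' keys recovers the combined dimension 512.
keys = np.concatenate([x @ W for W in W_k_heads])

assert key.shape == (d_k,)
assert keys.shape == (d_model,)
```

So a single head's key weight matrix maps a 512-dimensional input to a 64-dimensional key, giving shape 512 x 64.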
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Determining the Number of Attention Heads
Debugging a Multi-Head Attention Layer