Learn Before
In a neural network component, an input representation of dimension 512 is processed by 8 parallel 'heads'. For each head, a 'key' vector is produced by multiplying the input representation by a head-specific weight matrix. The 'key' vectors from all heads are concatenated, resulting in a final combined dimension of 512. What is the shape of the weight matrix used to produce the 'key' vector for a single head?
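The shape arithmetic behind the question can be sketched in NumPy (the variable names and the use of random matrices here are illustrative assumptions, not part of the original card): with a model dimension of 512 split across 8 heads whose concatenated keys recover dimension 512, each head's key dimension is 512 / 8 = 64, so each per-head key projection matrix has shape (512, 64).

```python
import numpy as np

d_model = 512              # input representation dimension
n_heads = 8                # number of parallel heads
d_k = d_model // n_heads   # per-head key dimension: 512 / 8 = 64

# One key projection matrix per head (hypothetical example weights).
W_k_heads = [np.random.randn(d_model, d_k) for _ in range(n_heads)]

x = np.random.randn(d_model)          # a single input representation
key = x @ W_k_heads[0]                # one head's key vector, shape (64,)

# Concatenating all heads' keys recovers the combined dimension 512.
keys = np.concatenate([x @ W for W in W_k_heads])

assert key.shape == (d_k,)
assert keys.shape == (d_model,)
```

So a single head's key weight matrix maps a 512-dimensional input to a 64-dimensional key, giving shape 512 x 64.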
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Determining the Number of Attention Heads
Debugging a Multi-Head Attention Layer