Multiple Choice

In a neural network component, an input representation of dimension 512 is processed by 8 parallel 'heads'. For each head, a 'key' vector is produced by multiplying the input representation by a weight matrix specific to that head. The 'key' vectors from all heads are then concatenated, giving a combined dimension of 512. What is the shape of the weight matrix used to produce the 'key' vector for a single head?
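The shape arithmetic behind this question can be checked with a minimal sketch. It assumes that every head produces a key vector of equal size and that the input is treated as a row vector multiplied on the left of the weight matrix. Under the opposite (column-vector) convention the matrix would be transposed. The names `d_model`, `num_heads`, `d_head`, `random_matrix`, and `vec_mat` are illustrative and do not come from the question.

```python
import random

d_model = 512                    # input representation dimension (from the question)
num_heads = 8                    # number of parallel heads (from the question)
d_head = d_model // num_heads    # per-head key size, assuming equal-sized heads

random.seed(0)
x = [random.gauss(0, 1) for _ in range(d_model)]


def random_matrix(rows, cols):
    """Return a rows x cols matrix of random values as nested lists."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def vec_mat(v, m):
    """Multiply a row vector of length rows by a rows x cols matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


# One key weight matrix per head, each mapping d_model inputs to d_head outputs.
key_weights = [random_matrix(d_model, d_head) for _ in range(num_heads)]

# Each head produces its own key vector; concatenating them restores d_model.
keys = [vec_mat(x, w) for w in key_weights]
combined = [value for key in keys for value in key]

print(len(key_weights[0]), len(key_weights[0][0]))  # rows, columns of one head's matrix
print(len(keys[0]))                                 # size of one head's key vector
print(len(combined))                                # size after concatenation
```

Under these assumptions each head's key vector has 512 / 8 = 64 entries, so a single head's weight matrix maps 512 inputs to 64 outputs.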

Updated 2025-10-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science