Learn Before
Debugging a Dimensionality Mismatch
Based on the provided scenario, identify the correct shape for the single-stream 'key' weight matrix and explain the fundamental reason for the engineer's error.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a neural network component that uses parallel processing 'channels' to analyze input, an input representation with a dimension of 512 is transformed. This transformation is split across 8 parallel channels. For the 'key' transformation, the total dimension across all 8 channels is also 512. What is the shape of the learnable weight matrix used for the 'key' transformation within a single one of these channels?
Debugging a Dimensionality Mismatch
Calculating Weight Matrix Dimensions in a Multi-Head Attention Layer
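The arithmetic behind the related 512-dimension, 8-channel question can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name `per_head_key_shape` and its parameter names are hypothetical, and it assumes the common convention that a row-vector input is multiplied on the right by the weight matrix, so the shape is (input dimension, per-channel key dimension). Under the opposite convention the two numbers would be swapped.

```python
from typing import Tuple


def per_head_key_shape(d_model: int, num_heads: int, d_k_total: int) -> Tuple[int, int]:
    """Return the (rows, cols) shape of one channel's 'key' weight matrix.

    Assumes the row-vector convention x @ W, where x has d_model entries.
    All names here are hypothetical and chosen for illustration only.
    """
    if d_k_total % num_heads != 0:
        raise ValueError("total key dimension must divide evenly across channels")
    # Each channel still reads the full input, so the row count stays d_model.
    # Only the output (key) dimension is split across the channels.
    d_k_per_head = d_k_total // num_heads
    return (d_model, d_k_per_head)


shape = per_head_key_shape(d_model=512, num_heads=8, d_k_total=512)
print(shape)  # (512, 64)

# Sanity check: the 8 per-channel matrices together hold as many weights
# as a single 512 x 512 matrix would.
params_per_head = shape[0] * shape[1]
print(params_per_head * 8 == 512 * 512)  # True
```

The point the sketch makes is that splitting the transformation across channels divides only the output dimension (512 / 8 = 64), while every channel still receives the full 512-dimensional input.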