Impact of Architectural Changes on a Value Weight Matrix
A machine learning engineer is designing a model where input vectors have a dimension of d=1024. The initial design uses 8 parallel processing streams (τ=8). The engineer considers increasing the number of streams to 16 (τ=16) while keeping the overall dimension d constant. Explain how this change affects the column dimension of the value weight matrix for each individual stream and describe the conceptual trade-off of this decision.
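The arithmetic behind this question can be sketched in a few lines. This is a minimal illustration, assuming the common multi-head convention in which each stream's value weight matrix has shape d × (d/τ); the function name `value_weight_shape` is hypothetical and does not come from the course material.

```python
def value_weight_shape(d: int, tau: int) -> tuple[int, int]:
    """Shape of one stream's value weight matrix.

    Assumes the convention d x (d / tau), so d must be divisible by tau.
    """
    if d % tau != 0:
        raise ValueError("d must be divisible by the number of streams tau")
    return (d, d // tau)


d = 1024
for tau in (8, 16):
    rows, cols = value_weight_shape(d, tau)
    # Summed over all streams, the value projections hold the same
    # number of parameters regardless of tau: tau * d * (d / tau) = d * d.
    total_params = tau * rows * cols
    print(f"tau={tau}: per-stream shape = {rows} x {cols}, "
          f"total value parameters = {total_params}")
```

Under that assumption, doubling the number of streams halves each stream's column dimension while the combined parameter count stays at d². The trade-off is that the model gains more independent streams, each of which projects the input into a smaller subspace.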
Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a neural network's attention mechanism, an input vector has a dimension of 512. This mechanism uses 8 parallel processing streams to handle different aspects of the input. A specific weight matrix is used to transform the input for each stream. What are the dimensions of this transformation matrix for a single stream?
Evaluating Design Choices for a Value Weight Matrix