Learn Before
A language model is processing an input sentence that has been broken down into 5 distinct tokens. The input to the first processing layer is represented as a matrix containing 5 separate vectors, one for each token. Why is it fundamentally important for the model to maintain this structure—a sequence of individual vectors—as the input to each subsequent layer, rather than, for example, averaging or concatenating them into a single vector?
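A minimal sketch of why this matters, using single-head self-attention with random NumPy arrays standing in for learned weights (the dimensions, variable names, and the omission of the feed-forward sublayer are illustrative assumptions, not the course's notation). The key point it shows: a layer maps a matrix of per-token vectors to a matrix of the same shape, and the pairwise attention scores only exist because there are separate token vectors to compare.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model = 5, 8                   # 5 tokens, illustrative model width

X = rng.normal(size=(n_tokens, d_model))   # layer input: one row per token

# Random projections stand in for the learned query/key/value weights.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v        # each is (5, d_model)

scores = Q @ K.T / np.sqrt(d_model)        # (5, 5): one score per token PAIR
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ V                          # (5, d_model): one vector per token

print(X.shape, "->", out.shape)            # (5, 8) -> (5, 8): structure preserved

# If the tokens were averaged into a single vector before the layer, there
# would be no pairs left to attend over: the (5, 5) score matrix collapses
# to (1, 1), and every later layer loses all token-level distinctions.
X_avg = X.mean(axis=0, keepdims=True)      # (1, d_model)
print((X_avg @ W_q) @ (X_avg @ W_k).T / np.sqrt(d_model))  # a single scalar
```

Because the output shape matches the input shape, layers can be stacked arbitrarily deep while every token keeps its own progressively refined representation; concatenating into one long vector would instead fix the architecture to one sequence length and discard this uniform, stackable interface.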
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Transformer Layer Output Formula
General Formula for a Transformer Layer
Input Composition in a Prefix-Tuned Transformer Layer
Structure of a Transformer Layer's Input
When a Transformer model processes a sentence with 12 tokens, the input to the fifth layer is a matrix of 12 individual vectors, one per token, each refined by the first four layers rather than aggregated into a single sentence-level vector.