Learn Before
A language model is processing an input sentence that has been broken down into 5 distinct tokens. The input to the first processing layer is represented as a matrix containing 5 separate vectors, one for each token. Why is it fundamentally important for the model to maintain this structure—a sequence of individual vectors—as the input to each subsequent layer, rather than, for example, averaging or concatenating them into a single vector?
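A minimal sketch of why this matters, using single-head self-attention with random NumPy arrays standing in for learned weights (the dimensions, variable names, and the omission of the feed-forward sublayer are illustrative assumptions, not the course's notation). The key point it shows: a layer maps a matrix of per-token vectors to a matrix of the same shape, and the pairwise attention scores only exist because there are separate token vectors to compare.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model = 5, 8                   # 5 tokens, illustrative model width

X = rng.normal(size=(n_tokens, d_model))   # layer input: one row per token

# Random projections stand in for the learned query/key/value weights.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v        # each is (5, d_model)

scores = Q @ K.T / np.sqrt(d_model)        # (5, 5): one score per token PAIR
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ V                          # (5, d_model): one vector per token

print(X.shape, "->", out.shape)            # (5, 8) -> (5, 8): structure preserved

# If the tokens were averaged into a single vector before the layer, there
# would be no pairs left to attend over: the (5, 5) score matrix collapses
# to (1, 1), and every later layer loses all token-level distinctions.
X_avg = X.mean(axis=0, keepdims=True)      # (1, d_model)
print((X_avg @ W_q) @ (X_avg @ W_k).T / np.sqrt(d_model))  # a single scalar
```

Because the output shape matches the input shape, layers can be stacked arbitrarily deep while every token keeps its own progressively refined representation; concatenating into one long vector would instead fix the architecture to one sequence length and discard this uniform, stackable interface.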
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Transformer Layer Output Formula
General Formula for a Transformer Layer
Input Composition in a Prefix-Tuned Transformer Layer
Structure of a Transformer Layer's Input
When a Transformer model processes a sentence with 12 tokens, the input to the fifth layer is a matrix of 12 individual vectors, one per token, each refined by the first four layers rather than aggregated into a single sentence-level vector.