Learn Before
  • Input Representation in a Transformer Layer

    Definition icon
True/False

When a Transformer model processes a sentence with 12 tokens, the input to the fifth layer is a single, high-dimensional vector that represents the aggregated meaning of the entire sentence as computed by the first four layers.
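The statement is false: a Transformer layer's input is a matrix with one vector per token, and every layer preserves that shape, so the fifth layer still receives 12 separate token vectors. A minimal NumPy sketch (a toy attention-style mixing step, with made-up dimensions, not a real Transformer implementation) illustrates this shape invariant:

```python
import numpy as np

# Toy illustration: each layer maps a (num_tokens, d_model) matrix to a
# matrix of the SAME shape, so layer 5 still sees 12 token vectors,
# not one aggregated sentence vector. Dimensions are arbitrary.
num_tokens, d_model = 12, 16
rng = np.random.default_rng(0)

def toy_layer(X):
    """Attention-like mixing that preserves the (tokens, d_model) shape."""
    scores = X @ X.T / np.sqrt(X.shape[1])           # (12, 12) token-pair scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return weights @ X                               # still (12, 16)

X = rng.standard_normal((num_tokens, d_model))       # input to layer 1
for _ in range(4):                                   # pass through four layers
    X = toy_layer(X)
print(X.shape)                                       # input to layer 5: (12, 16)
```

Because attention mixes information *across* the token vectors without collapsing them, each position retains its own contextualized representation at every depth.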


Updated 2025-10-08

Contributors:

Gemini AI 🏆 2

From:

Google 🏆 2

Tags
  • Ch.3 Prompting - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
  • Analysis in Bloom's Taxonomy
  • Cognitive Psychology
  • Psychology
  • Social Science
  • Empirical Science
  • Science

Related
  • Transformer Layer Output Formula

  • General Formula for a Transformer Layer

  • Input Composition in a Prefix-Tuned Transformer Layer

  • A language model is processing an input sentence that has been broken down into 5 distinct tokens. The input to the first processing layer is represented as a matrix containing 5 separate vectors, one for each token. Why is it fundamentally important for the model to maintain this structure—a sequence of individual vectors—as the input to each subsequent layer, rather than, for example, averaging or concatenating them into a single vector?

  • Structure of a Transformer Layer's Input

© 1Cademy 2026