Layer-wise Processing in Transformer Inference
During inference, each layer of a Transformer performs a two-step computation: it first applies a self-attention function (Att_qkv) to its input, then passes the result through a feed-forward network (FFN), which is the same two-layer network applied independently at every position. The output for each position is a d-dimensional vector that represents the current token while incorporating contextual information from all preceding tokens in the sequence (the "left context").
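As a rough sketch of this two-step computation (not a definitive implementation), the NumPy code below runs single-head causal self-attention followed by a position-wise two-layer FFN. Residual connections, layer normalization, and multi-head splitting are deliberately omitted for clarity, and all names here (att_qkv, ffn, Wq, Wk, Wv, W1, b1, W2, b2) are illustrative, not taken from the source.

```python
import numpy as np

def att_qkv(X, Wq, Wk, Wv):
    """Single-head causal self-attention over a sequence X of shape (n, d).

    A lower-triangular mask ensures each position attends only to itself
    and to preceding positions (the "left context").
    """
    d = X.shape[-1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)                   # (n, n) attention logits
    mask = np.triu(np.ones_like(scores), k=1)       # 1s mark future positions
    scores = np.where(mask == 1, -np.inf, scores)   # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (n, d) contextual vectors

def ffn(H, W1, b1, W2, b2):
    """Two-layer FFN applied independently at each position (same weights)."""
    return np.maximum(0.0, H @ W1 + b1) @ W2 + b2   # ReLU activation

def layer(X, p):
    """One Transformer layer at inference time: Att_qkv first, then the FFN."""
    H = att_qkv(X, p["Wq"], p["Wk"], p["Wv"])
    return ffn(H, p["W1"], p["b1"], p["W2"], p["b2"])
```

Each row of the matrix returned by layer is the d-dimensional representation of one token, computed only from that token and the tokens to its left.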