Layer-wise Processing in Transformer Inference

During inference, each layer of a Transformer executes a two-step process: it first applies a self-attention function (Att_qkv) to its input, then passes the result through a feed-forward network (FFN). The output of this sequence is a d-dimensional vector that represents the current token while incorporating contextual information from all preceding tokens in the sequence (the "left context").
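
To make the two steps concrete, below is a minimal NumPy sketch of a single layer's forward pass. The causal mask is what restricts each token to its left context. The names (causal_self_attention, ffn, the toy dimensions) are illustrative assumptions, and the residual connections and layer normalization that a full Transformer layer also applies are omitted for brevity.

# Minimal sketch of one Transformer layer during inference (NumPy).
# Illustrative only: no residual connections or layer normalization.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(H, Wq, Wk, Wv):
    """Att_qkv: each position attends only to itself and earlier positions."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # (n, n) attention logits
    mask = np.triu(np.ones_like(scores), k=1)   # 1s above the diagonal
    scores = np.where(mask == 1, -1e9, scores)  # block the right context
    return softmax(scores) @ V                  # (n, d) context vectors

def ffn(X, W1, b1, W2, b2):
    """Position-wise feed-forward network with a ReLU activation."""
    return np.maximum(0, X @ W1 + b1) @ W2 + b2

# Toy dimensions: a sequence of n tokens, model width d.
n, d, d_ff = 5, 8, 32
rng = np.random.default_rng(0)
H = rng.normal(size=(n, d))                     # input token representations

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W1, b1 = rng.normal(size=(d, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d)), np.zeros(d)

# Step 1: self-attention over the left context; Step 2: FFN.
A = causal_self_attention(H, Wq, Wk, Wv)
out = ffn(A, W1, b1, W2, b2)
print(out.shape)                                # (5, 8)

Running the sketch prints (5, 8): one d-dimensional vector per token, each computed only from that token and the tokens preceding it.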
