Learn Before
A standard processing block in a Transformer model consists of two main sub-layers applied in sequence. The first sub-layer's primary role is to relate different positions of the input sequence to compute a new representation for each position. The second sub-layer then applies an identical non-linear transformation to each position's representation independently. How does the core computational function, denoted F(·), implemented within each of these sub-layers differ between the two?
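The contrast can be made concrete with a minimal sketch of a standard post-norm Transformer encoder block (PyTorch is assumed here; the layer sizes are illustrative choices, not taken from the card). The first core function mixes information across positions via self-attention, while the second applies the same feed-forward network to every position independently, each wrapped in a residual connection and layer normalization.

```python
# Minimal sketch of a standard post-norm Transformer encoder block, contrasting
# the core function F(.) of each sub-layer:
#   Sub-layer 1: multi-head self-attention -- F(x) relates all positions to each other.
#   Sub-layer 2: position-wise feed-forward network -- F(x) applies the same weights
#                to every position independently.
# Hyperparameters (d_model, n_heads, d_ff) are illustrative, not from the source.
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        # Core function of sub-layer 1: self-attention mixes information across positions.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Core function of sub-layer 2: the same two-layer MLP is applied at every position.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # Sub-layer 1: F(x) = SelfAttention(x); each output depends on *all* positions.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)      # residual connection + layer normalization
        # Sub-layer 2: F(x) = FFN(x); each position is transformed independently.
        x = self.norm2(x + self.ffn(x))   # residual connection + layer normalization
        return x


if __name__ == "__main__":
    block = TransformerBlock()
    tokens = torch.randn(2, 10, 512)      # (batch=2, seq_len=10, d_model=512)
    print(block(tokens).shape)            # torch.Size([2, 10, 512])
```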
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Scaled Dot-Product Attention
Multi-Head Self-Attention Function
Purpose and Structure of the Feed-Forward Network (FFN) in Transformers
Identifying Core Functions in a Transformer Block
A standard processing block in a certain neural network architecture consists of two main sub-layers. Each sub-layer's computation can be described as applying a core function, F(·), within a structure that also includes a residual connection and layer normalization. Match each sub-layer type with the correct description of its core computational function, F(·).