Learn Before
Component Roles in a Transformer Block
Describe the distinct computational roles of the self-attention and the feed-forward network sub-layers within a single Transformer block. Explain why both are considered essential for the block's overall function of processing sequential data.
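To make the two roles concrete, here is a minimal sketch of a single post-norm Transformer block in PyTorch (dimensions and names are illustrative assumptions, not from this card): self-attention is the only sub-layer that mixes information across positions, while the feed-forward network transforms each position's representation independently.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Minimal post-norm Transformer block. Self-attention mixes
    information ACROSS positions; the feed-forward network (FFN)
    transforms each position INDEPENDENTLY."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(          # applied position-wise
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sub-layer 1: self-attention gathers context from the other tokens.
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)       # residual, then LayerNorm (post-norm)
        # Sub-layer 2: the FFN re-processes each token's representation in isolation.
        x = self.norm2(x + self.ffn(x))
        return x

# x: (batch, seq_len, d_model); the block preserves the input shape.
block = TransformerBlock()
out = block(torch.randn(2, 16, 512))
```

The residual-plus-LayerNorm wrapping around each sub-layer matches the post-normalization scheme referenced under Related below.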
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Post-Normalization in a Transformer Sub-layer
A standard Transformer block processes an input sequence through two main sub-layers using a post-normalization scheme. Arrange the following operations in the correct order from start to finish for a single block.
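For reference, in the post-normalization scheme this card asks about, each sub-layer is applied first, the residual is added next, and LayerNorm comes last. A standard formulation (a sketch of the scheme, not the card's hidden answer options):

```latex
% Post-normalization: sub-layer, then residual addition, then LayerNorm.
h = \mathrm{LayerNorm}\bigl(x + \mathrm{SelfAttention}(x)\bigr) \\
y = \mathrm{LayerNorm}\bigl(h + \mathrm{FFN}(h)\bigr)
```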
A language model built with Transformer blocks consistently produces grammatically correct sentences, but the sentences lack contextual coherence. For instance, given the input 'The scientist carefully placed the sample under the microscope to observe its...', the model generates '...color is a vibrant shade of the car.' Which sub-layer within the Transformer blocks is most likely failing to perform its primary function, leading to this specific type of error?
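This scenario points at the self-attention sub-layer, the one responsible for integrating context across tokens. As a minimal, self-contained check of that division of labor (PyTorch, with hypothetical sizes): perturbing one token changes every token's attention output, but leaves the other tokens' feed-forward outputs untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 8
attn = nn.MultiheadAttention(d, num_heads=2, batch_first=True)
ffn = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, d))

x = torch.randn(1, 4, d)           # a 4-token sequence
x2 = x.clone()
x2[0, 3] = torch.randn(d)          # perturb only the LAST token

with torch.no_grad():
    a1, _ = attn(x, x, x, need_weights=False)
    a2, _ = attn(x2, x2, x2, need_weights=False)
    f1, f2 = ffn(x), ffn(x2)

# Attention at token 0 shifts: it attends to the changed token.
print(torch.allclose(a1[0, 0], a2[0, 0]))   # False
# The FFN at token 0 is unchanged: it never sees other positions.
print(torch.allclose(f1[0, 0], f2[0, 0]))   # True
```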
Component Roles in a Transformer Block
Transformer Block Inputs and Outputs Notation