Overview of a Transformer

Transformers are sequence-to-sequence models that consist of an encoder and a decoder, each of which is a stack of L identical blocks.
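To make the overall shape concrete, here is a minimal sketch using PyTorch's built-in nn.Transformer, whose encoder and decoder are each a stack of identical blocks; the hyperparameter values (512-dimensional model, 8 heads, 6 layers) are illustrative assumptions, not values given in the text above.

```python
import torch
import torch.nn as nn

# Encoder and decoder are each a stack of identical blocks (the "L" in the text).
model = nn.Transformer(
    d_model=512,            # width of every block
    nhead=8,                # attention heads per block
    num_encoder_layers=6,   # L encoder blocks
    num_decoder_layers=6,   # L decoder blocks
    dim_feedforward=2048,   # hidden width of each block's FFN
    batch_first=True,
)

src = torch.rand(2, 10, 512)  # (batch, source length, d_model)
tgt = torch.rand(2, 7, 512)   # (batch, target length, d_model)
out = model(src, tgt)         # -> shape (2, 7, 512)
```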

Each encoder block is composed of a multi-head self-attention module and a position-wise feed-forward network (FFN).
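A from-scratch sketch of one encoder block, assuming the residual connections and layer normalization around each sub-layer used in the original Transformer (the text above does not mention them); the class name EncoderBlock and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoder block: multi-head self-attention + position-wise FFN."""

    def __init__(self, d_model=512, nhead=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(        # position-wise FFN: the same two-layer
            nn.Linear(d_model, d_ff),    # MLP applied independently at every
            nn.ReLU(),                   # sequence position
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        attn_out, _ = self.self_attn(x, x, x)    # queries, keys, values all = x
        x = self.norm1(x + attn_out)             # residual + layer norm
        return self.norm2(x + self.ffn(x))       # residual + layer norm

y = EncoderBlock()(torch.rand(2, 10, 512))       # -> shape (2, 10, 512)
```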

Each decoder block likewise contains a multi-head self-attention module and an FFN, with two differences: its self-attention is masked, so that each position can attend only to earlier positions, and it adds a cross-attention module that attends over the encoder's output.
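A matching sketch of one decoder block, under the same assumptions (residuals and layer norms; illustrative names and sizes). The boolean causal mask implements the masked self-attention, and memory stands for the encoder's output sequence.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: masked self-attention, cross-attention over the
    encoder output, then a position-wise FFN."""

    def __init__(self, d_model=512, nhead=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory):
        # Masked (causal) self-attention: True entries are blocked, so
        # position i may only attend to positions <= i.
        t = x.size(1)
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + attn_out)
        # Cross-attention: queries come from the decoder, keys/values
        # from the encoder's output.
        cross_out, _ = self.cross_attn(x, memory, memory)
        x = self.norm2(x + cross_out)
        return self.norm3(x + self.ffn(x))

memory = torch.rand(2, 10, 512)                    # encoder output
y = DecoderBlock()(torch.rand(2, 7, 512), memory)  # -> shape (2, 7, 512)
```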
