Learn Before
Core Components of a Transformer Decoding Network
A Transformer decoding network, often denoted Dec(·), is fundamentally constructed from an embedding network followed by a stack of identical layers. The embedding network maps the input tokens to their initial vector representations, while each stacked layer, comprising a (causally masked) self-attention module and a Feed-Forward Network (FFN), performs the main sequence processing.
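The structure described above can be sketched as a minimal NumPy implementation. This is an illustrative simplification, not the book's definition: it uses a single attention head, omits layer normalization, positional encodings, and the output softmax head, and all dimension choices (`d=16`, `d_ff=32`, two layers) are arbitrary assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class DecoderLayer:
    """One stacked layer: masked self-attention followed by a two-layer FFN."""
    def __init__(self, d, d_ff, rng):
        self.Wq = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wk = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wv = rng.standard_normal((d, d)) / np.sqrt(d)
        self.W1 = rng.standard_normal((d, d_ff)) / np.sqrt(d)
        self.W2 = rng.standard_normal((d_ff, d)) / np.sqrt(d_ff)

    def __call__(self, x):
        n, d = x.shape
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        scores = q @ k.T / np.sqrt(d)
        # Causal mask: position i may only attend to positions <= i.
        scores[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf
        x = x + softmax(scores) @ v           # self-attention + residual
        h = np.maximum(0.0, x @ self.W1)      # FFN hidden layer (ReLU)
        return x + h @ self.W2                # FFN output + residual

class Decoder:
    """Dec(·): an embedding network plus a stack of identical layers."""
    def __init__(self, vocab, d=16, d_ff=32, n_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        self.emb = rng.standard_normal((vocab, d)) / np.sqrt(d)
        self.layers = [DecoderLayer(d, d_ff, rng) for _ in range(n_layers)]

    def __call__(self, token_ids):
        x = self.emb[token_ids]               # embedding network
        for layer in self.layers:             # stacked layers
            x = layer(x)
        return x

dec = Decoder(vocab=10)
out = dec(np.array([1, 2, 3]))
print(out.shape)  # (3, 16): one hidden vector per input position
```

Because of the causal mask, the representation at each position depends only on that position and earlier ones, which is what lets the network predict each next token from the words that came before it.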
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Core Components of a Transformer Decoding Network
Masked Self-Attention in Transformer Decoders
A developer is building a model designed to generate text sequentially, where each new word is predicted based on the words that came before it. They consider modifying the model by removing the specific constraint that prevents a position in the sequence from attending to subsequent positions. What is the most likely consequence of this change on the model's training and generation capabilities?
A standard Transformer decoder block contains two distinct attention sub-layers. Which statement accurately differentiates the roles and data sources for these two sub-layers?
Within a single decoder block of a standard Transformer architecture, information is processed through three main computational sub-layers. Arrange these sub-layers in the correct operational sequence.
Learn After
Layer-wise Processing in Transformer Inference
Formula for KV Cache Prefilling
A researcher is building a sequence processing model and describes one of its core layers. The layer is designed to first apply a self-attention mechanism to its input sequence, and then, for each position in the sequence, it applies the same two-layer neural network independently. Based on this description, which statement accurately identifies a potential flaw or misunderstanding in the researcher's design compared to a standard Transformer decoding network layer?
A single token's data is being processed by a standard Transformer decoding network. Arrange the following operations in the correct sequence as the data flows through the network's core components, starting from the initial input.
Diagnosing a Faulty Decoding Network
Match each core component of a Transformer decoding network to its primary function within the network's architecture.
Next-Token Probability Calculation in a Transformer Decoder