Learn Before
Transformer Block Inputs and Outputs Notation
In a Transformer decoder architecture, the inputs to a Transformer block are denoted by the sequence H = (h_1, …, h_m), where h_i is the vector representation of the i-th token. After processing through the entire model of L stacked blocks, the outputs of the last Transformer block (the L-th block) are denoted as H^L = (h_1^L, …, h_m^L).
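The notation can be illustrated with a minimal sketch: H^0 enters the first block, each block maps H^(l-1) to H^l, and H^L leaves the last block with the same sequence shape. The block below is a hypothetical stand-in (a residual linear map followed by layer normalization), not a full attention sub-layer.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each token vector to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def toy_block(h, w):
    # stand-in for a Transformer block: residual connection + normalization
    return layer_norm(h + h @ w)

rng = np.random.default_rng(0)
m, d, L = 4, 8, 3                   # m tokens, model dimension d, L blocks
H = rng.standard_normal((m, d))     # H^0: input sequence to the first block
for l in range(1, L + 1):
    W = rng.standard_normal((d, d)) * 0.1
    H = toy_block(H, W)             # H now holds H^l
# H is H^L, the output of the last (L-th) block; shape is unchanged
print(H.shape)  # (4, 8)
```

The key point the sketch shows is that every block consumes and produces a sequence of the same shape (m, d), which is what lets L blocks be stacked.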
Tags
Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Post-Normalization in a Transformer Sub-layer
A standard Transformer block processes an input sequence through two main sub-layers using a post-normalization scheme. Arrange the following operations in the correct order from start to finish for a single block.
A language model built with Transformer blocks consistently produces grammatically correct sentences, but the sentences lack contextual coherence. For instance, given the input 'The scientist carefully placed the sample under the microscope to observe its...', the model generates '...color is a vibrant shade of the car.' Which sub-layer within the Transformer blocks is most likely failing to perform its primary function, leading to this specific type of error?
Component Roles in a Transformer Block